I'm new to Bela and I'm working on a low-latency convolution patch for my Bela Mini.

I'm using bsaylor's [partconv~] external and it runs at about 30% CPU with a block size of 32 and a window size of 256, convolving the input with a wav file of about half a second.

If I try and drop the block size to 16 I get a bunch of underruns and glitched audio. If I try and use a wav file longer than that, the same thing happens.

I've also tried a vanilla convolution patch as well as the convolve~ external with no luck.

So my question is: is this the limit of Bela using Pure Data for convolution?

I don't have much experience coding in C++, but if there were examples I could probably sort it out very slowly. I've also worked with SC a bit. Would that be more efficient?

I'm using Heavy for some CPU-intensive patches on my larger Bela, which has been working great, but it doesn't support the required objects, correct?

Any help is much appreciated.

    zillaomg If I try and drop the block size to 16 I get a bunch of underruns and glitched audio. If I try and use a wav file longer than that, the same thing happens.

    This may be due to the way the external was built or the way it is written, or both. Did you build it yourself? How? On normal Pd, the compile-time blocksize (DEFDACBLKSIZE) is 64, while on Bela it is 16. This may mean some externals wouldn't work well if a) they rely on that value being 64, and/or b) they were built for "generic" ARM instead of Bela. Whether it would work gracefully at a smaller blocksize also depends on the way the DSP code is implemented. One solution could be to re-compile Bela's libpd with DEFDACBLKSIZE = 64.
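
    As a hypothetical illustration of point a): a Pd external's perform routine receives the actual block size at run time, and code that ignores that argument and hard-codes 64 samples will misbehave (or read and write out of bounds) when libpd runs it at 16 samples. This is only a sketch of the pattern, not the code of any particular external:

    // sketch of a Pd external's perform routine (names are illustrative)
    #include "m_pd.h"

    static t_int *myext_tilde_perform(t_int *w)
    {
        t_sample *in  = (t_sample *)(w[1]);
        t_sample *out = (t_sample *)(w[2]);
        int n = (int)(w[3]); // block size Pd passes in: 64 on vanilla Pd, 16 on Bela by default

        // fragile: a loop like "for (int i = 0; i < 64; ++i)" silently assumes DEFDACBLKSIZE == 64
        // robust: always iterate over the n that was passed in
        for (int i = 0; i < n; ++i)
            out[i] = in[i];

        return w + 4; // advance past the three arguments consumed above
    }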

    zillaomg I've also tried a vanilla convolution patch as well as the convolve~ external with no luck.

    Vanilla implementations tend to be very CPU-intensive, and could also rely on having DEFDACBLKSIZE == 64.

    zillaomg I'm using Heavy for some CPU-intensive patches on my larger Bela, which has been working great, but it doesn't support the required objects, correct?

    correct.

      giuliomoro This may be due to the way the external was built or the way it is written, or both. Did you build it yourself? How? On normal Pd, the compile-time blocksize (DEFDACBLKSIZE) is 64, while on Bela it is 16. This may mean some externals wouldn't work well if a) they rely on that value being 64, and/or b) they were built for "generic" ARM instead of Bela. Whether it would work gracefully at a smaller blocksize also depends on the way the DSP code is implemented. One solution could be to re-compile Bela's libpd with DEFDACBLKSIZE = 64.

      So if I recompiled libpd with DEFDACBLKSIZE = 64, could I still change the patch block size to run smaller than that? How would I do that? In the IDE or with [block~] in the patch?

      I realized I used a generic build. How do I compile it from source? I've done it with other externals, but I've never had to create a makefile before.

        zillaomg So if I recompiled libpd with DEFDACBLKSIZE = 64, could I still change the patch block size to run smaller than that? How would I do that? In the IDE or with [block~] in the patch?

        No, that would be the minimum.

        zillaomg How do I compile from source?

        Where is the source?

          It should be enough to copy the source to the board and from within that folder do:

          make PD_INCLUDE=/usr/local/include/libpd/

          Then copy the resulting .pd_linux binaries into the pd-externals folder:

          mkdir -p /root/Bela/projects/pd-externals/ #create it if it doesn't exist
          cp *pd_linux /root/Bela/projects/pd-externals/

          If you need more assistance with doing this from the terminal, let me know.

          Whoops, I was trying from ./pd-lib-builder for some reason.

          On second thought I may have compiled from source this way. Will try again when I get home.

          Thanks

          Recompiled and performance is the same.

          I can get away with using an impulse response of about 300-400 ms if the block size is 32.

          If anyone has any suggestions, they are more than welcome.

            zillaomg I can get away with using an impulse response of about 300-400 ms if the block size is 32.

            Is this the Bela program's block size or the partsize argument you pass to [partconv~]? As far as I understand from a quick look at the help file and the source code, partconv~ uses single-threaded, evenly-partitioned convolution, which is not particularly CPU-efficient. More modern partitioned-convolution techniques split the convolution into blocks of increasing size. As the convolution is implemented via an FFT, the CPU efficiency increases dramatically for larger blocks, allowing much longer (> 10 seconds) impulse responses to be processed while keeping the latency low (because the first few blocks are small enough). I am not aware of a Pd external that implements this.

            You could try increasing Bela's blocksize to 128 and playing around with the [partconv~] partition size.
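
            To make the non-uniform idea concrete, here is a rough sketch (not any particular implementation; the doubling scheme and the two-blocks-per-size pattern are just common choices) of how such a schedule covers a long impulse response while the first partition, which sets the latency, stays small:

            // Sketch: build a non-uniform partition schedule for a long impulse response.
            #include <cstdio>
            #include <vector>

            int main()
            {
                const int irLength = 10 * 44100; // e.g. a 10-second IR at 44.1 kHz
                const int firstPartition = 64;   // the latency is set by this first block
                std::vector<int> partitions;
                int covered = 0;
                for (int size = firstPartition; covered < irLength; size *= 2) {
                    // two partitions of each size before doubling
                    for (int i = 0; i < 2 && covered < irLength; ++i) {
                        partitions.push_back(size);
                        covered += size;
                    }
                }
                for (int p : partitions)
                    printf("%d ", p);
                printf("\n%zu partitions cover %d samples; latency stays at %d samples\n",
                       partitions.size(), covered, firstPartition);
                return 0;
            }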

              giuliomoro Is this the Bela program's block size or the partsize argument you pass to [partconv~]?

              32 is Bela's block size.

              I tried various partition sizes on the convolution object, and the only way I got it working with a longer impulse response was with Bela's block size at 128, the convolution object at 2048 and a [block~ 64] object.

              Not sure why this is the case (why it won't work without that block~ object).

              Either way the latency is too much, because I'll be performing on it with live cello input.

              Right now I think I'll live with the shorter impulse response and try and code it in SuperCollider in the upcoming months. That will be more efficient, right?

                zillaomg Maybe you could look into the Axoloti convolution object; it works in "realtime" and I use it with an IR for double bass.

                  lokki

                  Thanks for the reply.

                  Yeah, I was actually using the convolution object with Axoloti, but that was also with a very short response, and it overloaded the CPU with everything else I had going on.

                  I'm fine having a short response when just trying to get a cello sound but my goal was to be able to select different responses on the fly and improvise with them, e.g., select a longer cymbal and get that texture while playing with the body or with a contact mic.

                  That's actually the main reason I got Bela: I thought that the increased CPU and RAM could run everything. I can do some heavier granular stuff, which is cool, along with some other more CPU-intensive stuff, but without coding in C it doesn't seem like I'll be able to run everything in the same patch.

                  I'll keep working on it though and also hope that the new axoloti2 comes out soon :-)

                  If you're willing to get your feet wet in the API, the NEON C++ libraries that come with Bela have some very efficient FIR filter algorithms built in.

                  Using the ARM-specific versions of the filters, I'm able to run eight 512-tap FIR filters in parallel, which only consumes ~40% of the CPU. The same filter bank using multiple resolutions costs less than half that. I can also use the scope to analyze the spectrum of each filter band while the program is running, for only a ~15% hit in CPU.

                  My point is that Bela is very, very powerful and you can get a lot more out of it if you dig in past the user-friendly stuff. Right now I'm working on making an easy-to-use front end for some of the NEON functions to try and help folks out and show them the power of Bela. Keep an eye out for that on this forum if you're interested, and best of luck on your project!
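
                  In case it helps anyone reading along, here is a minimal sketch of what a single NE10 FIR filter can look like in a Bela render.cpp. The include path and the function names (ne10_fir_init_float / ne10_fir_float_neon) are what I remember from the NE10 headers, and the coefficients here are just a placeholder pass-through, so treat this as a starting point to check against the headers on your board rather than a finished example:

                  #include <Bela.h>
                  #include <libraries/ne10/NE10.h> // older images may use <ne10/NE10.h>
                  #include <vector>

                  const unsigned int kNumTaps = 512;
                  std::vector<ne10_float32_t> gCoeffs(kNumTaps, 0.0f); // fill with real FIR coefficients
                  std::vector<ne10_float32_t> gState, gIn, gOut;
                  ne10_fir_instance_f32_t gFir;

                  bool setup(BelaContext *context, void *userData)
                  {
                      gCoeffs[0] = 1.0f; // placeholder: unity impulse = pass-through
                      gState.resize(kNumTaps + context->audioFrames - 1, 0.0f);
                      gIn.resize(context->audioFrames);
                      gOut.resize(context->audioFrames);
                      ne10_fir_init_float(&gFir, kNumTaps, gCoeffs.data(), gState.data(), context->audioFrames);
                      return true;
                  }

                  void render(BelaContext *context, void *userData)
                  {
                      for(unsigned int n = 0; n < context->audioFrames; ++n)
                          gIn[n] = audioRead(context, n, 0); // left input only, for simplicity
                      ne10_fir_float_neon(&gFir, gIn.data(), gOut.data(), context->audioFrames);
                      for(unsigned int n = 0; n < context->audioFrames; ++n)
                          for(unsigned int ch = 0; ch < context->audioOutChannels; ++ch)
                              audioWrite(context, n, ch, gOut[n]);
                  }

                  void cleanup(BelaContext *context, void *userData) {}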

                    I think you'll get to about 11000 taps (roughly 0.25 seconds at 44.1 kHz) using NE10 if you do it all in the time domain. A partitioned convolution approach would be able to use a frequency-domain approach for the later and longer partitions, comfortably stretching above 10 seconds. However, I am not aware of an open-source implementation of this algorithm, though I haven't searched extensively.

                    Very interesting. 11,000 is a ton of headroom. I wonder how many more you can get by using the multi-resolution or sparse FIR algorithms.

                    Applying the filters in the frequency domain is a cool idea. Recently I had the idea to use the 16n fader bank as a controller for a Bela graphic EQ: you make the shape of the power spectrum you want with the 16 faders, then use NE10 interpolation to create a higher-resolution version to multiply with the input signal. You could also use it as a waveform designer.

                    I've already made an FIR filter bank patch on Bela with the 16n as a controller by doing convolution in the time domain. The frequency-domain approach is very interesting; I might have to try it, especially if it's more efficient.
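
                    For anyone curious, the interpolation step described above can be sketched in plain C++ (NE10 has its own interpolation routines, but simple linear interpolation shows the idea): stretch 16 fader values into a per-bin gain curve, which you would then multiply with the FFT of the input inside an overlap-add scheme. All of the names and sizes here are illustrative:

                    #include <cstdio>
                    #include <vector>

                    // Linearly interpolate the fader values up to nBins per-bin gains.
                    std::vector<float> fadersToGains(const std::vector<float>& faders, unsigned int nBins)
                    {
                        std::vector<float> gains(nBins);
                        for (unsigned int b = 0; b < nBins; ++b) {
                            float pos = (float)b / (nBins - 1) * (faders.size() - 1);
                            unsigned int i = (unsigned int)pos;
                            float frac = pos - i;
                            float next = (i + 1 < faders.size()) ? faders[i + 1] : faders[i];
                            gains[b] = faders[i] * (1.0f - frac) + next * frac;
                        }
                        return gains;
                    }

                    int main()
                    {
                        std::vector<float> faders(16, 1.0f); // 16n fader positions, 0..1
                        faders[8] = 0.0f;                    // e.g. pull one band all the way down
                        std::vector<float> gains = fadersToGains(faders, 513); // 1024-point FFT -> 513 bins
                        for (unsigned int b = 0; b < gains.size(); b += 64)
                            printf("bin %u: gain %.2f\n", b, gains[b]);
                        return 0;
                    }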

                    Don't mean to hijack this thread, just wanted to share the success I had figuring out how to use the NE10 libraries. Bela rocks 🤘

                    matt

                    matt If you're willing to get your feet wet in the API, the NEON C++ libraries that come with Bela have some very efficient FIR filter algorithms built in.


                    Thanks for the reply!

                    I'm definitely interested, but as I have very little experience coding (my background is in music theory) I don't really know where to start.

                    I have a tiny bit of C++ experience, though not in DSP. I'm a quick study though, so if you could point me in the right direction I would definitely give it a shot.

                      a month later

                      maybe a bit late to the party, but has anybody seen this?

                      Yup, Mathieu was in the master's program with me when he did this. This was before Bela, so he didn't have the advantage of Xenomai scheduling and had to increase the blocksize to achieve reliable performance for long reverbs. Some people have tried running something similar on Bela and got more than 10 s of stereo reverb with a block size of 32 samples. Unfortunately their code is not publicly available.

                      2 months later

                      Just an update: I recompiled Pd to a default block size of 64, and the William Brent convolve~ object works with a block size of 256 with wav files up to 1 second (16-bit, 44.1 kHz).

                      Not great, but it works. Will try and tweak it more.

                      zillaomg I have a tiny bit of C++ experience, though not in DSP. I'm a quick study though, so if you could point me in the right direction I would definitely give it a shot.

                      I'm going out on a limb guessing that your music theory background gives you more insight into DSP concepts than you might think. I never got deep into music theory, but I started out in the arts before I went into engineering, and I was pleasantly surprised how much my minimal exposure to music theory prepared me for signal processing concepts. What I determined is that you don't have to be a math guru to "get it". Just add, subtract, multiply and imagine pitches and tones changing over time. If you are comfortable with Pd, the C++ just adds some twists to expressing what you're thinking, but you can get it if you have the time to invest in the technical aspect up front.

                      The coding part is easy, because there is a lot of detail you don't actually need to understand about C or C++ in order to code DSP algorithms on Bela. A lot of the lower level "hard" stuff has been done by a team of brilliant individuals who know it at that level.

                      By saying the coding part is easy, I don't want to trivialize the steep learning curve, but once you get some basic programs running, then your imagination grows, and your confidence grows with it. It takes time and attention, including time away from creating music.

                      As for pointing you toward a path, look at Bela examples and study the code until you understand somewhat how it works. Then start to modify it and observe how your changes to the code work out. It will seem slow at first, but if you keep at it you will start to imagine how to use what you are learning to do what you really want to do.

                      Here are some resources that have been helpful to me over the years of tinkering with this stuff:
                      https://www.learncpp.com/
                      https://www.dsprelated.com/ (search forums and blogs for topics of interest)
                      Last but not least, Bela example code is full of good ideas and coding concepts worth understanding.

                      My method was to study open-source code that was interesting to me, going line by line and searching and reading about any new syntax or concepts I saw in the code that I didn't understand.

                      I think there are a lot of resources out there where you can balance the amount of technical learning needed against musical creativity. Some things, like partitioned convolution with non-uniform block sizes, are not unreasonably difficult to grasp in principle, but are very tedious to implement in practice. This is more about a level of commitment than a level of skill or intelligence.

                      If you don't have the level of commitment needed to develop an open-source project for this, then welcome to the club of mortal individuals who wait for somebody with the right combination of time, talent, money and interests... or much, much faster hardware 😃