Dynamic Range Compressor

ryjobil · Jun 29, 2017

math-neon wins the prize! It looks like this implementation of expf() takes only about 1.5x the CPU time of a single floating-point multiply.

I was originally hesitant to use this thinking expf() was going to be computationally expensive and figured it wouldn't make an audible difference, but I was wrong on both counts. It's computationally cheap and makes a very significant difference in the sound.

Once I implemented this I started hearing a sound that really brought to mind the sound of a Dynacomp, only on Bela you don't have to mess around with that bias trimmer.

Of course, my discovery that e^-x with math-neon costs little more than a multiply makes implementing a traditional feed-forward compressor trivial... although my feedback compressor performs well enough that I'm not seeing a need for yet another compressor.

Now this compressor has added features:
Linear feedback gain mode (the original design)
Exponential feedback gain mode (approximately log-linear transfer function)
Parallel compression (wet/dry mix)
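
To make the three modes listed above concrete, here is a minimal sketch of how a feedback compressor with a switchable gain law and a wet/dry mix could be structured. It is not the code from this project: the struct, its parameter names, and the default values are all hypothetical, and the one-pole envelope detector is just one common choice. `std::exp` is used so the sketch builds anywhere; on Bela, `expf_neon()` from math-neon could be substituted.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical names and defaults, chosen for illustration only.
struct FeedbackCompressor {
    float threshold   = 0.1f;   // envelope level above which gain reduction starts
    float amount      = 4.0f;   // strength of the gain reduction
    float attack      = 0.01f;  // one-pole coefficient used while the level rises
    float release     = 0.001f; // one-pole coefficient used while the level falls
    float mix         = 1.0f;   // 0 = dry only, 1 = compressed only
    bool  exponential = false;  // false = linear gain law, true = exponential

    float envelope = 0.0f;      // detector state
    float gain     = 1.0f;      // gain that will be applied to the next sample

    float process(float in) {
        // Feedback topology: the detector measures the compressed output.
        float wet   = in * gain;
        float level = std::fabs(wet);
        float coeff = (level > envelope) ? attack : release;
        envelope += coeff * (level - envelope);

        // Amount by which the envelope exceeds the threshold (never negative).
        float over = std::max(0.0f, envelope - threshold);
        if (exponential)
            gain = std::exp(-amount * over);              // e^-x gain law
        else
            gain = std::max(0.0f, 1.0f - amount * over);  // linear law, clamped at 0

        // Parallel compression: blend compressed and dry signals.
        return mix * wet + (1.0f - mix) * in;
    }
};
```

With `mix` at 0 the input passes through unchanged, and with a sustained loud input the stored `gain` settles somewhere between 0 and 1.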

Check it out if you like compressors like I do. I'm a compressor nerd (in case it isn't self-evident).

These can sound really cool on drums and percussion instruments. I have added enough parameterized control that it's not only a guitar effect any more.

giuliomoro · Jun 29, 2017

Yeah, from the measurements I made back then, expf_neon was 5.5 times faster than expf and still remarkably precise, so it's good news that it has been put to good use!

ryjobil · Jun 29, 2017

giuliomoro I was able to reproduce a similar result with my test program. I compared the two expf() functions just to sanity-check my own benchmark, so that I could believe the results of the comparison to a multiply.

It might be a good addition to the math-neon benchmark wiki page to add a benchmark section for floating point multiplication, addition and subtraction as some reference points for comparison.
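
A reference-point benchmark of this kind could be sketched roughly as follows. This is not the test program mentioned above: `nsPerCall` and `relativeCost` are hypothetical helpers, the standard-library `std::exp` stands in for `expf_neon()`, and the measured time includes loop overhead, so the ratio is only a rough indication.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Average time in nanoseconds per call of op(x), measured over n calls.
// Loop overhead is included in the figure.
template <typename Op>
double nsPerCall(Op op, std::size_t n) {
    volatile float sink = 0.0f;  // volatile store keeps the work from being optimized away
    float x = 0.5f;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        sink = op(x);
        x += 1e-7f;              // vary the input slightly between calls
    }
    auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() / n;
}

// Rough ratio of the time for e^-x to the time for one multiply.
double relativeCost(std::size_t n) {
    double mul = nsPerCall([](float x) { return x * 1.0001f; }, n);
    double ex  = nsPerCall([](float x) { return std::exp(-x); }, n);
    return ex / mul;
}
```

Calling `relativeCost` with a large iteration count (a million or more) reduces the influence of timer resolution. Addition, subtraction, and division could be measured by passing other lambdas to `nsPerCall`.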

giuliomoro · Jun 29, 2017

ryjobil It might be a good addition to the math-neon benchmark wiki page to add a benchmark section for floating point multiplication, addition and subtraction as some reference points for comparison.

Also division could help, I guess, as that is probably very very slow (no hardware division on this NEON unit).

ryjobil · Jun 29, 2017

giuliomoro no hardware division on this NEON unit

// Perform division a/b without a hardware divide
// (only valid for b > 0, since logf(b) is undefined otherwise)
ans = a*expf(-logf(b));

Or probably even faster:

// 1/b = (1/sqrt(b))^2, only valid for b > 0
den = invsqrtf_neon(b);
den *= den;
ans = a*den;

It would make me laugh if the neon FPU w/ fast math lib can do those faster than

a/b
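
Before timing the two tricks above, it is worth checking that they agree with plain division. The sketch below does that with standard-library calls only. The function names are hypothetical, and `1.0f / std::sqrt(b)` is merely a stand-in for `invsqrtf_neon(b)` so the check compiles without math-neon (it uses a divide itself, so it says nothing about speed). Both identities only hold for b > 0.

```cpp
#include <cmath>

// Reference: ordinary division.
inline float divPlain(float a, float b) { return a / b; }

// a/b computed as a * e^(-ln b). Only valid for b > 0.
inline float divExpLog(float a, float b) {
    return a * std::exp(-std::log(b));
}

// a/b computed as a * (1/sqrt(b))^2. Only valid for b > 0.
inline float divInvSqrt(float a, float b) {
    float den = 1.0f / std::sqrt(b);  // stand-in for invsqrtf_neon(b)
    return a * den * den;
}
```

Comparing `divExpLog(a, b)` and `divInvSqrt(a, b)` against `divPlain(a, b)` over the range of values the compressor actually sees would show whether the approximation error is acceptable.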

ryjobil · Jun 30, 2017

Bump. I added some more plots to the original post to capture the exponential feedback mode.

Updated the block diagram to more completely capture the functions that have been implemented.

chrion · Oct 20, 2019

Wow, thanks for the contribution, really!
I don't have a Bela board yet, which means I can't test it or tell how much load a stereo instance would put on the chip. Anyone?

ryjobil · Oct 20, 2019

chrion Wow, thanks for the contribution, really!

It's always encouraging to see somebody interested. It makes the sharing more fun.

One instance uses about 2% CPU time, so it's not a big deal to add several instances. Below is how I came up with the 2% figure.

Comparing the CPU with (mono) compressor active, then bypassed, I get the following at 8 audio frames per block:
active: 25.4 % CPU, realtime
bypass: 23.3 % CPU, realtime
Compressor CPU Usage: 2.1% per channel

It takes about 23% CPU just for Bela to process the ADC inputs and perform basic pass-through, so the compressor benchmark is based upon additional CPU needed when the compressor is active.

My interest in minimizing CPU usage is that I want this basic block to be easy to use when a lot of instances are needed, such as in multi-channel and/or multi-band compression. There was never any concern that a few instances of the compressor would be a problem on Bela.
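
The per-instance figure above can be turned into a rough budget for multi-channel or multi-band setups. The sketch below uses only the two measurements quoted in this post. The function names are hypothetical, and the linear scaling with instance count is an assumption, not something that was measured.

```cpp
// Figures reported above, at 8 audio frames per block.
constexpr float kActivePercent = 25.4f;  // compressor active
constexpr float kBypassPercent = 23.3f;  // compressor bypassed (pass-through baseline)

// Extra CPU attributable to one mono compressor instance.
constexpr float perInstancePercent() {
    return kActivePercent - kBypassPercent;
}

// Estimated total CPU for a number of instances.
// Assumes the cost scales linearly with the instance count (not measured).
constexpr float estimatedTotalPercent(int instances) {
    return kBypassPercent + instances * perInstancePercent();
}
```

Under that assumption, a stereo pair would come to roughly 27.5% total on this setup, and a 4-band stereo arrangement (8 instances) to roughly 40%.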

chrion · Oct 25, 2019

Those numbers sound fantastic to me. But man! 23% gone at the get-go? My goal is to make an MPC2000XL type of sampler, and I hope the BBB holds up. If only the BBB had the power of an RPi 4... the possibilities would be almost endless, especially for really nice-looking GUIs.

giuliomoro · Oct 25, 2019

chrion 23% gone at the get

That may be an old figure and also a bit excessive. I think a more accurate and recent one is 12% with a block size of 16 and 6% with a block size of 128 (with --high-performance-mode enabled).

ryjobil · Oct 28, 2019

giuliomoro That may be an old figure and also a bit excessive.

Yes, I have not updated software on my BBB for 2 years. Any improvements made since then would not be reflected in that figure. Also the sketch was scanning and filtering the ADC inputs looking for control set-point changes. That may have been worth a few percent, so this figure does exaggerate the performance hit.

giuliomoro 12% with a block size of 16

16% with block size of 8 on my PocketBeagle (Bela Mini) which has more recent software, running the audio and analog pass-through example. No measurable change when analog channel pass-through is commented out.

I do believe the 2% figure for the compressor remains valid.