[UPDATE]
Deleted text here and updated original post to include improvements.
math-neon wins the prize! It looks like this implementation of expf() takes only about 1.5x the CPU time of a single floating-point multiply.
I was originally hesitant to use this, thinking expf() was going to be computationally expensive and figuring it wouldn't make an audible difference, but I was wrong on both counts. It's computationally cheap and makes a very significant difference in the sound.
Once I implemented this I started hearing a sound that really brought to mind the sound of a Dynacomp, only on Bela you don't have to mess around with that bias trimmer.
Of course, my discovery that e^-x with math-neon costs little more than a multiply makes implementing a traditional feed-forward compressor trivial... although my feedback compressor performs well enough that I'm not seeing a need for yet another compressor.
Now this compressor has added features:
Linear feedback gain mode (the original design)
Exponential feedback gain mode (approximately log-linear transfer function)
Parallel compression (wet/dry mix)
Check it out if you like compressors like I do. I'm a compressor nerd (in case it isn't self-evident).
These can sound really cool on drums and percussion instruments. I have added enough parameterized control that it's no longer only a guitar effect.
Yeah, from the measurements I made back then expf_neon was 5.5 times faster than expf and still remarkably precise, so good news that it has been put to good use!
giuliomoro I was able to reproduce a similar result with my test program. I did the comparison between the two expf() functions just to sanity check my own benchmark so I could believe the results of comparison to a multiply.
It might be a good addition to the math-neon benchmark wiki page to add a benchmark section for floating point multiplication, addition and subtraction as some reference points for comparison.
ryjobil It might be a good addition to the math-neon benchmark wiki page to add a benchmark section for floating point multiplication, addition and subtraction as some reference points for comparison.
Also division could help, I guess, as that is probably very very slow (no hardware division on this NEON unit).
giuliomoro no hardware division on this NEON unit
//Perform division
// a/b, valid only for b > 0 because of logf(b)
ans = a*expf(-logf(b));
Or probably even faster:
den = invsqrtf_neon(b);
den *= den;
ans = a*den;
It would make me laugh if the NEON FPU with the fast math lib can do those faster than a/b.
Bump. I added some more plots to the original post to capture the exponential feedback mode.
Updated the block diagram to more completely capture the functions that have been implemented.
Wow, thanks for the contribution, really!
I don't have a Bela board yet, which means I can't test it or tell how much load on the chip a stereo instance would cause. Anyone?
chrion Wow, thanks for the contribution, really!
It's always encouraging to see somebody interested. Makes the sharing more fun.
One instance uses about 2% CPU time. Not a big deal to add several instances. Below is how I came up with 2% CPU usage.
Comparing the CPU with (mono) compressor active, then bypassed, I get the following at 8 audio frames per block:
active: 25.4 % CPU, realtime
bypass: 23.3 % CPU, realtime
Compressor CPU Usage: 2.1% per channel
It takes about 23% CPU just for Bela to process the ADC inputs and perform basic pass-through, so the compressor benchmark is based upon additional CPU needed when the compressor is active.
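The subtraction behind the per-channel figure, and a rough way to extend it to many instances, can be written as a short C sketch. The function names and the budget parameter are hypothetical, and the assumption that CPU cost scales linearly with the number of instances is not something measured in this thread.

```c
/* CPU cost of one compressor instance in percent: the load with the
   compressor active minus the load with it bypassed. */
float per_instance_cpu(float active_pct, float bypass_pct)
{
    return active_pct - bypass_pct;
}

/* Rough number of instances that fit under budget_pct, ASSUMING each
   extra instance costs the same as the first (linear scaling). */
int max_instances(float baseline_pct, float per_instance_pct, float budget_pct)
{
    if (per_instance_pct <= 0.0f || budget_pct <= baseline_pct)
        return 0;
    return (int)((budget_pct - baseline_pct) / per_instance_pct);
}
```

With the measurements above, per_instance_cpu(25.4f, 23.3f) gives the 2.1% figure. The result of max_instances() is only an estimate and should be checked against a real multi-instance run.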
My interest in making CPU usage minimum is because I want this basic block to be easily implemented when a lot of instances are used, such as multi-channel and/or multi-band compression. There was never any concern that a few instances of the compressor would be a problem on Bela.
Those numbers sound fantastic to me. But man! 23% gone at the get-go? My goal is to make an MPC2000XL type of sampler, and I hope the BBB holds up. If only the BBB had the power of an RPi4... the possibilities would be almost endless, especially really nice-looking GUIs.
chrion 23% gone at the get
That may be an old figure and also a bit excessive. I think a more accurate and recent one is 12% with a block size of 16 and 6% with a block size of 128 (with --high-performance-mode enabled).
giuliomoro That may be an old figure and also a bit excessive.
Yes, I have not updated software on my BBB for 2 years. Any improvements made since then would not be reflected in that figure. Also the sketch was scanning and filtering the ADC inputs looking for control set-point changes. That may have been worth a few percent, so this figure does exaggerate the performance hit.
giuliomoro 12% with a block size of 16
16% with block size of 8 on my PocketBeagle (Bela Mini) which has more recent software, running the audio and analog pass-through example. No measurable change when analog channel pass-through is commented out.
I do believe the 2% figure for the compressor remains valid.