[UPDATE]
I deleted the text here and updated the original post to include the improvements.

14 days later

math-neon wins the prize! It looks like this implementation of expf() takes only about 1.5x the CPU time of a single floating-point multiply.

I was originally hesitant to use this, thinking expf() would be computationally expensive and figuring it wouldn't make an audible difference, but I was wrong on both counts. It's computationally cheap and makes a very significant difference in the sound.

Once I implemented this, I started hearing a sound that really brought to mind a Dynacomp, except that on Bela you don't have to mess around with that bias trimmer.

Of course, my discovery that e^(-x) with math-neon costs little more than a multiply makes implementing a traditional feed-forward compressor trivial... although my feedback compressor performs well enough that I don't see a need for yet another compressor.

This compressor now has the following added features:
Linear feedback gain mode (the original design)
Exponential feedback gain mode (approximately log-linear transfer function)
Parallel compression (wet/dry mix)

Check it out if you like compressors like I do. I'm a compressor nerd (in case it isn't self-evident).

These can sound really cool on drums and percussion instruments. I have added enough parameter control that it's no longer just a guitar effect.

Yeah, from the measurements I made back then, expf_neon was 5.5 times faster than expf and still remarkably precise, so I'm glad to see it put to good use!

    giuliomoro I was able to reproduce a similar result with my test program. I compared the two expf() functions only to sanity-check my own benchmark, so that I could trust its comparison against a multiply.

    It might be a good addition to the math-neon benchmark wiki page to add a benchmark section for floating point multiplication, addition and subtraction as some reference points for comparison.

      ryjobil It might be a good addition to the math-neon benchmark wiki page to add a benchmark section for floating point multiplication, addition and subtraction as some reference points for comparison.

      Division could also help, I guess, as it is probably very slow (no hardware division on this NEON unit).

        giuliomoro no hardware division on this NEON unit

        // Perform division: a/b == a * e^(-ln b)
        // (only valid for b > 0, since logf() needs a positive argument)
        float ans = a * expf(-logf(b));

        Or probably even faster:

        float den = invsqrtf_neon(b); // 1/sqrt(b), requires b > 0
        den *= den;                   // (1/sqrt(b))^2 == 1/b
        float ans = a * den;

        🙂 It would make me laugh if the NEON FPU with the fast math library could do those faster than

        a/b

        Bump. I added some more plots to the original post to capture the exponential feedback mode.

        Updated the block diagram to more completely capture the functions that have been implemented.

        2 years later

        Wow, thanks for the contribution, really!
        I don't have a Bela board yet, so I can't test it or tell how much load a stereo instance would put on the chip. Anyone?

          chrion Wow, thanks for the contribution, really!

          It's always encouraging to see somebody interested. Makes the sharing more fun 😃

          One instance uses about 2% CPU time, so adding several instances is not a big deal. Here is how I came up with the 2% figure.

          Comparing CPU usage with the (mono) compressor active and then bypassed, I get the following at 8 audio frames per block:
          active: 25.4% CPU, realtime
          bypass: 23.3% CPU, realtime
          Compressor CPU usage: 2.1% per channel

          It takes about 23% CPU just for Bela to process the ADC inputs and perform basic pass-through, so the compressor benchmark is based on the additional CPU needed when the compressor is active.
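The arithmetic behind the 2.1% figure, and a rough way to budget instances for multi-channel or multi-band use, can be sketched as follows. This assumes every instance adds the same fixed cost and that costs add linearly, which real builds may not follow exactly (cache behaviour, block size); the safety margin is a hypothetical parameter, not something measured in this thread.

```c
/* CPU attributable to the compressor: difference between two readings of the
 * same sketch, in percentage points (active minus bypassed). */
float overhead_pct(float active_pct, float bypass_pct)
{
    return active_pct - bypass_pct;
}

/* Rough count of instances that fit above the pass-through baseline, assuming
 * every instance costs the same and costs add linearly. margin_pct is a
 * hypothetical safety margin left unused. */
int instances_that_fit(float baseline_pct, float per_instance_pct, float margin_pct)
{
    float headroom = 100.0f - baseline_pct - margin_pct;
    if (headroom <= 0.0f || per_instance_pct <= 0.0f)
        return 0;
    return (int)(headroom / per_instance_pct);
}
```

With the numbers above, overhead_pct(25.4f, 23.3f) gives about 2.1 points per mono instance (roughly 4.2 for a stereo pair under the linear-scaling assumption), and instances_that_fit(23.3f, 2.1f, 10.0f) returns 31.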

          I want to keep CPU usage to a minimum because I want this basic block to be easy to use when many instances are needed, such as in multi-channel and/or multi-band compression. There was never any concern that a few instances of the compressor would be a problem on Bela.

          5 days later

          Those numbers sound fantastic to me. But man! 23% gone at the get-go? My goal is to make an MPC2000XL-type sampler; I hope the BBB holds up. If only the BBB had the power of an RPi 4... the possibilities would be almost endless. Especially for really nice-looking GUIs 🙂

            chrion 23% gone at the get

            That may be an old figure and also a bit excessive. I think a more accurate and recent one is 12% with a block size of 16 and 6% with a block size of 128 (with --high-performance-mode enabled).

              giuliomoro That may be an old figure and also a bit excessive.

              Yes, I have not updated the software on my BBB for 2 years, so any improvements made since then are not reflected in that figure. Also, the sketch was scanning and filtering the ADC inputs looking for control set-point changes. That may have accounted for a few percent, so this figure does exaggerate the performance hit.

              giuliomoro 12% with a block size of 16

              I get 16% with a block size of 8 on my PocketBeagle (Bela Mini), which has more recent software, running the audio and analog pass-through example. There is no measurable change when the analog channel pass-through is commented out.

              I do believe the 2% figure for the compressor remains valid.