Hello,
I'm writing a program that performs real-time convolution on every input audio sample. Right now the code is a simple for loop that looks like this:
float y = 0.0f;
for (unsigned int i = 0; i < numTaps; ++i) {
    y += b[i] * buffer[numTaps - i];
    if (bufIndex - i == 0) {
        break;
    }
}
I'd like to optimize this piece of code so that the program runs faster and can support longer buffers. I saw that Bela ships with the Ne10 library, which contains functions optimized for ARM processors. However, I also read somewhere that the C++ compiler already auto-vectorizes code like this for ARM NEON. Is this true?
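In case it helps frame the question, here is a minimal sketch of the kind of restructuring I have in mind. It is not my actual code: firDot and its parameters are hypothetical names, and it assumes the history is stored linearly with the newest sample first (x[0] is the current sample, x[k] is k samples old), which is not how my buffer is laid out right now. The idea is to drop the early break and split the sum into four independent accumulators, which as far as I understand is the shape a 4-lane NEON register would need.

```cpp
#include <cstddef>

// Hypothetical helper: FIR dot product over a linear history.
// Assumption: x[0] is the newest sample and x[k] is k samples old,
// and both b and x hold at least numTaps floats.
float firDot(const float* b, const float* x, std::size_t numTaps)
{
    // Four independent partial sums, so no single accumulator forms a
    // serial dependency chain across iterations.
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;

    std::size_t i = 0;
    for (; i + 4 <= numTaps; i += 4) {
        acc0 += b[i]     * x[i];
        acc1 += b[i + 1] * x[i + 1];
        acc2 += b[i + 2] * x[i + 2];
        acc3 += b[i + 3] * x[i + 3];
    }

    // Combine the partial sums, then handle the leftover taps
    // when numTaps is not a multiple of 4.
    float y = (acc0 + acc1) + (acc2 + acc3);
    for (; i < numTaps; ++i) {
        y += b[i] * x[i];
    }
    return y;
}
```

Note that this sums the products in a different order than my original loop, so the result can differ slightly in the last bits for general float inputs.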
Thanks in advance!