ARM_neon in C++

freddyliu

Hello,

I'm writing a program that performs real-time convolution on every input audio sample. Right now the code I have is a simple for loop that looks like this:

		float y = 0.0f;
		for (unsigned int i = 0; i < numTaps; ++i) {
			y += b[i] * buffer[numTaps - i];
			if (bufIndex - i == 0) {
				break;
			}
		}

I'm looking to optimize this piece of code so that the program can run faster and support longer buffers. I saw that Bela includes the Ne10 library, which contains functions optimized for ARM processors. However, I also read somewhere that the C++ compiler already vectorizes code automatically to take advantage of ARM NEON. Is this true?

Thanks in advance!

giuliomoro

freddyliu Is this true?

The answer lies in the assembly! Add CPPFLAGS=-save-temps=obj to the Make parameters box in the Project settings in the IDE, then force a recompile of your program. Then, in the IDE, go to settings -> show hidden files and look for the .s files in the build/ directory that appears in the project. Those are the intermediate assembly files generated as part of the compilation. Find the lines that correspond to your loop and check whether they use NEON instructions.
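When you read the assembly, it helps to give the compiler a loop it can realistically vectorize. Here is a minimal sketch, not a drop-in replacement for your code: `firDot` is a hypothetical helper (it is not part of Bela or Ne10), and it assumes the history buffer is already arranged newest-first so that both arrays are read forwards. The point is that there is no early `break` and the trip count is known before the loop starts.

```cpp
#include <cstddef>

// Hypothetical helper (not from Bela or Ne10): dot product of the filter
// taps with a window of past samples.
// Assumption: `history` holds the last numTaps samples, newest first,
// so history[i] pairs with b[i] and both arrays are read forwards.
float firDot(const float* __restrict b,
             const float* __restrict history,
             std::size_t numTaps)
{
    float y = 0.0f;
    // Fixed trip count, no data-dependent exit, unit-stride loads:
    // the shape an auto-vectorizer handles best.
    for (std::size_t i = 0; i < numTaps; ++i) {
        y += b[i] * history[i];
    }
    return y;
}
```

Even with a loop like this, a float sum is normally only vectorized if the compiler is allowed to reorder the additions, for example with -ffast-math. On 32-bit ARM, GCC additionally documents that it will not use NEON for floating-point auto-vectorization unless -funsafe-math-optimizations is given. If it worked, you should see instructions such as vld1.32 and vmla.f32 operating on q registers in the .s file.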

Or do it the fast and convenient way: use the Convolver library http://docs.bela.io/classConvolver.html, which uses Ne10 under the hood.
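If you would rather not depend on the compiler or on a library, a third option is to write the inner loop with NEON intrinsics yourself. The following is only a sketch under assumptions: `firDotNeon` is a hypothetical name, the history buffer is assumed to be stored newest-first (so history[i] pairs with b[i]), and the code falls back to a plain scalar loop when NEON is not available at compile time.

```cpp
#include <cstddef>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from Bela or Ne10): dot product of taps and
// history, four floats per iteration when NEON is available.
// Assumption: history[i] is the sample that pairs with b[i].
float firDotNeon(const float* b, const float* history, std::size_t numTaps)
{
    float y = 0.0f;
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= numTaps; i += 4) {
        // acc += b[i..i+3] * history[i..i+3], lane by lane
        acc = vmlaq_f32(acc, vld1q_f32(b + i), vld1q_f32(history + i));
    }
    // Horizontal sum of the four accumulator lanes
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    y = vget_lane_f32(half, 0);
#endif
    // Scalar tail: leftover taps on NEON builds, every tap otherwise
    for (; i < numTaps; ++i) {
        y += b[i] * history[i];
    }
    return y;
}
```

Because the four lanes are summed separately and combined at the end, the result can differ from the scalar loop in the last bits. That is normally inaudible, but worth knowing if you compare outputs sample by sample.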