Mode Switches with 4x32bit Float Neon Instructions

William

Hi,

I'm working on a granular / convolution effect. Because time-domain convolution is expensive, I wanted to use Neon instructions to speed up the rendering. It works fine with 2x floats, basically using the SIMD to speed up stereo processing (I do two convolutions in parallel).
However, when I switched to the 4x single-precision Neon types / instructions, I basically get swamped with mode switches. I wrote a small wrapper around the GCC Neon functions / types, and the 2x float and 4x float versions look identical (minus the matching functions, obviously), so I'm a bit lost as to why the 2x version works but the 4x version somehow trashes the system.

Is there anything I'm missing here?

giuliomoro

Hi,
no idea why this would be the case without having an in-depth look at your code.

What I have seen in the past, though, were some very weird bugs introduced by using NEON instructions that assume the memory is aligned to a 16-byte boundary when it was not actually aligned, so you may want to check that for a start.
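
A quick way to run that check is to test the buffer address before handing it to the 4x float code. The following is only a minimal sketch; isAligned is a hypothetical helper name, not something from the thread or from a library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if `pointer` sits on an `alignment`-byte
// boundary. `alignment` is assumed to be a non-zero power of two (e.g. 16).
inline bool isAligned(const void* pointer, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(pointer) & (alignment - 1)) == 0;
}
```

Something like assert(isAligned(buffer, 16)) at the point where the buffers are allocated should catch a misaligned buffer early, outside the audio callback.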

William

Hi,

thanks for the hint. I'll do another round of bug hunting before I give up. Alignment actually does seem to be the issue, though. In particular, alignas(16) and similar attributes are ignored for heap allocations, because 'new' only takes the max alignment (that of std::max_align_t) into account.

So all my buffers aren't properly aligned.

Is there a good way to get around this alignment issue in C++?

AndyCap

How about allocating a bit more memory than you need and offsetting into it at the correct alignment?
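
A minimal sketch of that idea, assuming 16-byte alignment and float samples. AlignedFloatBuffer and its members are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: over-allocates a raw block with plain new[] and
// exposes a 16-byte-aligned float pointer somewhere inside it.
struct AlignedFloatBuffer {
    static constexpr std::size_t kAlignment = 16;

    std::unique_ptr<unsigned char[]> storage; // owns the over-allocated block
    float* data = nullptr;                    // aligned view into `storage`
    std::size_t size = 0;                     // number of usable floats

    explicit AlignedFloatBuffer(std::size_t count)
        // kAlignment - 1 spare bytes are enough to reach the next boundary.
        // The trailing () zero-fills the block.
        : storage(new unsigned char[count * sizeof(float) + kAlignment - 1]()),
          size(count)
    {
        const auto address = reinterpret_cast<std::uintptr_t>(storage.get());
        // Round the start address up to the next multiple of kAlignment.
        const auto aligned = (address + kAlignment - 1)
                             & ~static_cast<std::uintptr_t>(kAlignment - 1);
        data = reinterpret_cast<float*>(aligned);
        assert(aligned % kAlignment == 0);
    }
};
```

The original pointer has to be kept around (here in storage) because that is the one that must be freed; only data should be passed to the SIMD code.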

giuliomoro

I guess you could override the new operator if you really wanted to?
In practice I would have the C++ class hold pointers to aligned buffers (and NOT try to align the class instance itself), so either do the alignment manually, as AndyCap suggests, or use posix_memalign() or some of the NE10 allocators.
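
For the posix_memalign() route, a rough sketch could look like the following. FreeDeleter, AlignedFloats and makeAlignedFloats are hypothetical names; only posix_memalign() and free() are real calls, and this assumes a POSIX system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::free
#include <memory>
#include <new>       // std::bad_alloc
#include <stdlib.h>  // posix_memalign (POSIX, not standard C++)

// Memory from posix_memalign() must be released with free(), not delete[].
struct FreeDeleter {
    void operator()(void* pointer) const noexcept { std::free(pointer); }
};

using AlignedFloats = std::unique_ptr<float[], FreeDeleter>;

// Hypothetical helper: allocates `count` floats on an `alignment`-byte boundary.
// `alignment` must be a power of two and a multiple of sizeof(void*).
// The returned memory is NOT initialised.
inline AlignedFloats makeAlignedFloats(std::size_t count, std::size_t alignment = 16)
{
    void* raw = nullptr;
    if (posix_memalign(&raw, alignment, count * sizeof(float)) != 0)
        throw std::bad_alloc();
    return AlignedFloats(static_cast<float*>(raw));
}
```

A class would then hold an AlignedFloats member for each buffer, so the buffers are aligned regardless of how the class instance itself was allocated.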

William

Manual alignment it is then. 🙂 That's what happens when you go from programming for consumer systems (Mac, Win) to embedded stuff. 😃

EDIT:
Yep, manually aligning the data worked nicely.