Thanks for your help. I have implemented a flag that can be passed at build time to PRU.cpp, which allows the use of uint16_t directly. However, I am unsure whether I can implement support for the expander, because I don't have one, so it is mostly unsupported under this flag at the moment. I don’t know if anyone else needs this, so I won’t open a pull request yet. Let me know if it would be useful to you.
The fork is here:
Do you mean the USE_NEON_FORMAT_CONVERSION flag? That’s interesting. Does clang have good support for vectorisation on ARM?
The main reason I want this is that I am writing a small debouncer for pad triggers (always 8 channels) that I want to be extremely lightweight. With a 16-bit int as the datatype, all channels of a frame fit into a single NEON quadword (8 × 16 bits = 128 bits). I am now wondering whether my assembly will be slower than letting the compiler figure it out with some tricks. I have read that compilers don’t always find the best vectorisation for ARM, so I guess I will have to run some benchmarks.
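For the benchmark, a compiler-friendly baseline could look roughly like the sketch below. It is not the code from my fork: the names PadDebouncer, threshold and holdoffFrames, and the simple threshold-plus-lock-out logic, are assumptions made up for illustration. The loop has a fixed trip count of 8 and no branches, which is the kind of shape an auto-vectoriser has a chance of mapping onto one quadword.

```cpp
#include <array>
#include <cstdint>

// Hypothetical sketch, not the code from the fork.
constexpr int kChannels = 8;
static_assert(kChannels * sizeof(uint16_t) == 16,
              "one frame should fill exactly one 128-bit quadword");

struct PadDebouncer {
    // Frames left before each channel is allowed to trigger again.
    std::array<uint16_t, kChannels> holdoff{};
    uint16_t threshold = 2048;   // assumed trigger level
    uint16_t holdoffFrames = 64; // assumed lock-out length, in frames

    // Processes one frame of 8 samples.
    // Returns a bitmask with bit c set if channel c triggered on this frame.
    uint8_t process(const uint16_t* frame) {
        std::array<uint16_t, kChannels> fired{};
        for (int c = 0; c < kChannels; ++c) {
            // Comparisons yield 0 or 1, so the update below needs no branches.
            const uint16_t armed = (holdoff[c] == 0);
            const uint16_t above = (frame[c] >= threshold);
            const uint16_t hit = armed & above;
            const uint16_t counting = (holdoff[c] != 0);
            // hit and counting are never both 1: either reload the counter,
            // count it down by one, or leave it at zero.
            holdoff[c] = static_cast<uint16_t>(hit * holdoffFrames +
                                               counting * (holdoff[c] - 1));
            fired[c] = hit;
        }
        // Pack the per-channel results into one byte.
        uint8_t mask = 0;
        for (int c = 0; c < kChannels; ++c) {
            mask |= static_cast<uint8_t>(fired[c] << c);
        }
        return mask;
    }
};
```

Building this with optimisation enabled and comparing the generated assembly and timing against the hand-written version should show whether the compiler actually keeps the whole frame in one register. I have not measured it, so treat it only as a starting point for the comparison.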
Anyway, for anyone who might be interested, it’s here:
Please let me know if and how I can speed this up any further, or if I’m doing something wrong. This is my first time working with assembly (inline, at the moment) on ARM/Bela.