I'm trying to do overlap-add STFT processing on the Bela. I would like to do fast vector computations on audio blocks, such as adding two STFTs together bin by bin. I've tried to do this with the Vector Math functions that the NE10 library provides, since I figured they might make the code more efficient.
However, performing these operations has led to segmentation faults. Here is a minimal example:
#include <Bela.h>
#include <ne10/NE10.h>
#include <cstdlib> // rand(), RAND_MAX

// Length of the vectors being added
ne10_uint32_t ARR_LEN = 2048;

bool setup(BelaContext *context, void *userData)
{
    return true;
}

void render(BelaContext *context, void *userData)
{
    // Three local (stack-allocated) arrays of ARR_LEN floats each
    ne10_float32_t src1[ARR_LEN];
    ne10_float32_t src2[ARR_LEN];
    ne10_float32_t dst[ARR_LEN];

    // Generate test input values for `src1` and `src2` using `rand()`
    for (ne10_uint32_t i = 0; i < ARR_LEN; i++)
    {
        src1[i] = (ne10_float32_t) rand() / RAND_MAX * 5.0f;
        src2[i] = (ne10_float32_t) rand() / RAND_MAX * 5.0f;
    }

    // Element-wise add src1 and src2, store result in dst
    ne10_add_float_neon(dst, src1, src2, ARR_LEN);
}

void cleanup(BelaContext *context, void *userData)
{
}
In this example, I add two vectors, src1 and src2, element-wise and store the result in dst. With a vector length of ARR_LEN = 2048, I get a segmentation fault. With an ARR_LEN of 1024 or smaller, the code runs without errors.
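For reference, the stack footprint of the three local arrays at the two lengths can be worked out with a short sketch. It assumes ne10_float32_t is a 4-byte float (NE10 defines it as a typedef of float); sample_t and localArrayBytes are hypothetical names used only for illustration. The sketch only computes sizes; it does not establish how much stack the audio thread actually has available.

```cpp
#include <cstddef>

// Stand-in for ne10_float32_t (assumption: a plain 32-bit float).
typedef float sample_t;
static_assert(sizeof(sample_t) == 4, "sketch assumes 4-byte floats");

// Bytes occupied by the three local arrays (src1, src2, dst) in render()
// for a given vector length. Hypothetical helper, not part of NE10 or Bela.
constexpr std::size_t localArrayBytes(std::size_t len)
{
    return 3 * len * sizeof(sample_t);
}
```

Under these assumptions, localArrayBytes(1024) is 12288 bytes and localArrayBytes(2048) is 24576 bytes, so the failing case places twice as much data on the stack of render() as the working one.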
What is causing this behavior, and how can I perform this kind of vector operation on larger arrays?
In my actual code I would like to have a function that takes an STFT as an array argument, for example void do_something(ne10_fft_cpx_float32_t *stft), and performs vector computations on it, such as the addition in my example. Inside that function I would need local temporary arrays like src1, src2 and dst. However, I run into these segmentation faults as soon as the arrays get slightly larger.
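To make the intended shape of such a function concrete, here is a minimal sketch of one possible arrangement in which the temporary storage is allocated once outside the function instead of as local arrays. It rests on assumptions: that the crash is related to the size of the local arrays (not confirmed here), that cpx_t can stand in for ne10_fft_cpx_float32_t (a struct with float members r and i), and that a plain loop can stand in for the NE10 vector call. The names cpx_t, StftScratch and add_stfts are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ne10_fft_cpx_float32_t (real part r, imaginary part i).
struct cpx_t { float r; float i; };

// Scratch storage sized once (for example from setup()) and reused on every call,
// so the processing function itself needs no large local arrays.
struct StftScratch
{
    std::vector<cpx_t> dst;
    explicit StftScratch(std::size_t bins) : dst(bins) {}
};

// Adds two STFT frames bin by bin into scratch.dst.
// The loop is a placeholder for a vectorized add; it is not an NE10 call.
// Returns false (and does nothing) if the scratch buffer is too small.
bool add_stfts(const cpx_t *a, const cpx_t *b, std::size_t bins, StftScratch &scratch)
{
    if (bins > scratch.dst.size())
        return false;
    for (std::size_t k = 0; k < bins; ++k)
    {
        scratch.dst[k].r = a[k].r + b[k].r;
        scratch.dst[k].i = a[k].i + b[k].i;
    }
    return true;
}
```

In this arrangement a StftScratch object would be constructed once with the largest frame size needed, then passed by reference into every call, so no memory is allocated while audio is being processed.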