I am trying to do real-time speech enhancement. As a first step I want to read the speech input, apply windowing with 50% overlap, take the FFT and then the IFFT, add up three segments, and output the middle part of the result (the part with complete information), to check that the output from the speaker is similar to the input from the mic. Plain feedthrough works, but after adding the FFT calculations I keep hearing beeping sounds when I run the program. Does anyone know how to solve this?
Thanks in advance 🙂
The following is my code; the diagram illustrates the concept.

#include <Bela.h>
#include <cmath>
#include <vector>
#include <complex>
#include <libraries/Fft/Fft.h>
#include <libraries/Scope/Scope.h>

// feedthrough + hanning

unsigned int fftsize = 128;
unsigned int hopsize = fftsize / 2;
std::vector<float> window;
std::vector<float> inputBuffer(2 * fftsize, 0.0f); // Initialize the buffer size to 256 and value to 0
std::vector<float> outputBuffer(2 * fftsize, 0.0f); // Adjusted to hold the entire processed range
std::vector<float> fftfirst_real(fftsize, 0.0f); // Buffer to store real part of FFT output
std::vector<float> fftfirst_img(fftsize, 0.0f); // Buffer to store imaginary part of FFT output

unsigned int bufferIndex = 0; // Index to keep track of buffer filling
bool bufferFull = false; // Flag to check if buffer is full

Fft fft; // FFT processor

// Hanning window function
void hanningWindow(std::vector<float>& window, unsigned int size) {
    for (unsigned int i = 0; i < size; ++i) {  //++i is pre-increment order
        window[i] = 0.5 * (1 - cos(2 * M_PI * i / (size - 1)));
    }
}

bool setup(BelaContext *context, void *userData) {
    window.resize(fftsize);  // Properly resize the window vector before calling the hanning function
    hanningWindow(window, fftsize);  // Calling the Hanning window function to initialize the window vector to value of fftsize
    
    if (fft.setup(fftsize) != 0) { // Ensure the FFT setup is correct
        rt_printf("FFT setup failed\n");
        return false;
    }

    rt_printf("Number of input channels: %d\n", context->audioInChannels);
    rt_printf("Number of output channels: %d\n", context->audioOutChannels);
    return true;
}

void process_segment(const std::vector<float>& inputBuffer, unsigned int startIndex) { // process_segment function can read from inputBuffer but cannot modify it
    std::vector<float> windowedBuffer(fftsize, 0.0f);
    
    // Apply window and perform FFT
    for (unsigned int i = 0; i < fftsize; ++i) {
        windowedBuffer[i] = inputBuffer[startIndex + i] * window[i];
    }
    fft.fft(windowedBuffer);
    
    // Store FFT output in real and imaginary parts
    for (unsigned int i = 0; i < fftsize; ++i) {
        fftfirst_real[i] = fft.fdr(i);
        fftfirst_img[i] = fft.fdi(i);
    }

    // Perform IFFT
    fft.ifft(fftfirst_real, fftfirst_img);
}

void render(BelaContext *context, void *userData) {
    for (unsigned int n = 0; n < context->audioFrames; n++) {
        // Read audio input from channel 0 (microphone)
        float in = audioRead(context, n, 0); // in is a local var within the render function
        
        inputBuffer[bufferIndex] = in; // Store the input in the doubled-sized buffer (i call it inputBuffer here)
        bufferIndex++;
        
        // Check if buffer is full
        if (bufferIndex >= 2 * fftsize) {
            bufferFull = true;
            bufferIndex = fftsize; // Reset the buffer index for the next set of samples
        }
    }
    
    // Check if the buffer is full and process the samples
    if (bufferFull) {
        // Reset the buffer full flag
        bufferFull = false;
        std::fill(outputBuffer.begin(), outputBuffer.end(), 0.0f);
        
        // Process three overlapping segments
        process_segment(inputBuffer, 0); // Process first segment
        for (unsigned int i = 0; i < fftsize; ++i) {
            outputBuffer[i] += fft.td(i);
        }
        
        process_segment(inputBuffer, hopsize); // Process second segment
        for (unsigned int i = 0; i < fftsize; ++i) {
            outputBuffer[hopsize + i] += fft.td(i);
        }
        
        process_segment(inputBuffer, fftsize); // Process third segment
        for (unsigned int i = 0; i < fftsize; ++i) {
            outputBuffer[2 * hopsize + i] += fft.td(i);
        }
        
        // Write the processed output buffer to the audio output
        for (unsigned int n = 0; n < context->audioFrames; n++) {
            if (n >= hopsize && n < 3 * hopsize) {
                audioWrite(context, n, 0, outputBuffer[n]);
            }
        }
    }
}

void cleanup(BelaContext *context, void *userData) {
    fft.cleanup(); // Properly clean up the FFT resources
}
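The question relies on windowing with 50% overlap and adding the segments back together. For that overlap-add to reproduce the input, the windows spaced hopsize = fftsize / 2 apart must sum to a constant. hanningWindow() above divides by (size - 1), which is the symmetric Hann window; the periodic form divides by size and sums to exactly 1 at 50% overlap. A minimal check, with illustrative function names that are not part of Bela: for fftsize = 128 the symmetric window leaves a periodic ripple of about 1.2%, which is probably not the main source of the beeping but is easy to remove.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Symmetric Hann, as computed by hanningWindow() above (divides by size - 1).
std::vector<float> hannSymmetric(unsigned int size) {
    std::vector<float> w(size);
    for (unsigned int i = 0; i < size; ++i)
        w[i] = static_cast<float>(0.5 * (1.0 - std::cos(2.0 * kPi * i / (size - 1))));
    return w;
}

// Periodic Hann (divides by size): at a hop of size / 2 the two
// overlapping windows add up to exactly 1, apart from rounding.
std::vector<float> hannPeriodic(unsigned int size) {
    std::vector<float> w(size);
    for (unsigned int i = 0; i < size; ++i)
        w[i] = static_cast<float>(0.5 * (1.0 - std::cos(2.0 * kPi * i / size)));
    return w;
}

// Largest deviation from 1 of w[i] + w[i + size / 2], i.e. how far
// 50%-overlap-add with this window is from reproducing the input exactly.
float colaError(const std::vector<float>& w) {
    assert(w.size() % 2 == 0);
    const std::size_t hop = w.size() / 2;
    float maxErr = 0.0f;
    for (std::size_t i = 0; i < hop; ++i)
        maxErr = std::max(maxErr, std::fabs(w[i] + w[i + hop] - 1.0f));
    return maxErr;
}
```

Swapping `(size - 1)` for `size` in hanningWindow() is the only change needed to use the periodic form.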

Could it be that the FFT -> processing -> IFFT chain is not ready by the time the audio thread needs it? Try increasing the block size and the step size.
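To put numbers on "not ready in time": render() has to return within one block period, blockSize / sampleRate. In the posted code all three FFT/IFFT pairs run inside the single callback in which bufferFull is set, while the other callbacks only copy samples, so the load is concentrated in one callback out of every fftsize / blockSize. The helpers below are hypothetical and only do the arithmetic; at 44.1 kHz a 16-frame block leaves about 0.36 ms and a 128-frame block about 2.9 ms.

```cpp
#include <cassert>

// Time available to render() per callback, in milliseconds: the callback
// must finish before the next blockSize frames are due.
double blockBudgetMs(unsigned int blockSize, double sampleRate) {
    assert(sampleRate > 0.0);
    return 1000.0 * blockSize / sampleRate;
}

// Number of callbacks between two processing runs when a run is triggered
// every `hop` new samples (assumes hop is a multiple of blockSize).
unsigned int callbacksPerRun(unsigned int hop, unsigned int blockSize) {
    assert(blockSize > 0 && hop % blockSize == 0);
    return hop / blockSize;
}
```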

13 days later

Hi, thanks for the prompt reply!
There is still noise (a buzzing sound) after increasing the block size; it is the same issue as the one I describe below.
Now I have this issue with the project:

#include <Bela.h>
#include <cmath>
#include <vector>
#include <complex>
#include <libraries/Fft/Fft.h>
#include <libraries/Scope/Scope.h>

// feedthrough + hanning

unsigned int fftsize = 128;
unsigned int hopsize = fftsize / 2;
std::vector<float> window;
std::vector<float> inputBuffer(2 * fftsize, 0.0f); // Initialize the buffer size to 256 and value to 0
std::vector<float> outputBuffer(2 * fftsize, 0.0f); // Adjusted to hold the entire processed range
std::vector<float> fftfirst_real(fftsize, 0.0f); // Buffer to store real part of FFT output
std::vector<float> fftfirst_img(fftsize, 0.0f); // Buffer to store imaginary part of FFT output

unsigned int bufferIndex = 0; // Index to keep track of buffer filling
bool bufferFull = false; // Flag to check if buffer is full

Fft fft; // FFT processor

// Hanning window function
void hanningWindow(std::vector<float>& window, unsigned int size) {
    for (unsigned int i = 0; i < size; i++) {
        window[i] = 0.5 * (1 - cos(2 * M_PI * i / (size - 1)));
    }
}

bool setup(BelaContext *context, void *userData) {
    window.resize(fftsize); // Properly resize the window vector before calling the hanning function
    hanningWindow(window, fftsize); // Calling the Hanning window function to initialize the window vector to value of fftsize

    if (fft.setup(fftsize) != 0) { // Ensure the FFT setup is correct
        rt_printf("FFT setup failed\n");
        return false;
    }

    rt_printf("Number of input channels: %d\n", context->audioInChannels);
    rt_printf("Number of output channels: %d\n", context->audioOutChannels);
    return true;
}

void process_segment(const std::vector<float>& inputBuffer, unsigned int startIndex, std::vector<float>& outputSegment) {
    std::vector<float> windowedBuffer(fftsize, 0.0f);

    // Apply window and perform FFT
    for (unsigned int i = 0; i < fftsize; i++) {
        windowedBuffer[i] = inputBuffer[startIndex + i] * window[i];
    }

    fft.fft(windowedBuffer);

    // Store FFT output in real and imaginary parts
    for (unsigned int i = 0; i < fftsize; i++) {
        fftfirst_real[i] = fft.fdr(i);
        fftfirst_img[i] = fft.fdi(i);
    }

    // Perform IFFT
    fft.ifft(fftfirst_real, fftfirst_img);

    // Store the IFFT output
    for (unsigned int i = 0; i < fftsize; i++) {
        outputSegment[i] = fft.td(i);
    }
}

void render(BelaContext *context, void *userData) {
    for (unsigned int n = 0; n < context->audioFrames; n++) {
        // Read audio input from channel 0 (microphone)
        float in = audioRead(context, n, 0); // in is a local var within the render function

        inputBuffer[bufferIndex] = in; // Store the input in the double-sized buffer (I call it inputBuffer here)
        bufferIndex++;

        // Check if buffer is full
        if (bufferIndex >= 2 * fftsize) {
            bufferFull = true;
            bufferIndex = fftsize; // Set to 128 (129th), ready to overwrite the second half
        }
    }

    // Check if the buffer is full and process the samples
    if (bufferFull) {
        // Reset the buffer full flag
        bufferFull = false;
        std::fill(outputBuffer.begin(), outputBuffer.end(), 0.0f);

        std::vector<float> tempoutput(fftsize, 0.0f);

        // Process three overlapping segments
        process_segment(inputBuffer, 0, tempoutput); // Process first segment
        for (unsigned int i = 0; i < fftsize; i++) {
            outputBuffer[i] += tempoutput[i];
        }

        process_segment(inputBuffer, hopsize, tempoutput); // Process second segment
        for (unsigned int i = 0; i < fftsize; i++) {
            outputBuffer[hopsize + i] += tempoutput[i];
        }

        process_segment(inputBuffer, fftsize, tempoutput); // Process third segment
        for (unsigned int i = 0; i < fftsize; i++) {
            outputBuffer[2 * hopsize + i] += tempoutput[i];
        }

        // Write the processed output buffer to the audio output
        for (unsigned int n = 0; n < context->audioFrames; n++) {
            for (unsigned int channel = 0; channel < context->audioOutChannels; channel++) {
                if (n >= hopsize && n < 3 * hopsize) {
                    audioWrite(context, n, channel, 10 * outputBuffer[n]);
                }
            }
        }

        // Preserve the second half of the inputBuffer and shift it to the first half
        rt_printf("Buffer before shifting: \n");
        for (unsigned int i = 0; i < fftsize; i++) {
            rt_printf("%f ", inputBuffer[i]);
        }
        rt_printf("\n");

        for (unsigned int i = 0; i < fftsize; i++) {
            inputBuffer[i] = inputBuffer[fftsize + i];
        }

        rt_printf("Buffer after shifting: \n");
        for (unsigned int i = 0; i < fftsize; i++) {
            rt_printf("%f ", inputBuffer[i]);
        }
        rt_printf("\n");

        // Print the second half of the buffer for verification
        rt_printf("Second half of the buffer before shifting to first half: \n");
        for (unsigned int i = fftsize; i < 2 * fftsize; i++) {
            rt_printf("%f ", inputBuffer[i]);
        }
        rt_printf("\n");

        // Print the first half of the buffer after shifting
        rt_printf("First half of the buffer after shifting: \n");
        for (unsigned int i = 0; i < fftsize; i++) {
            rt_printf("%f ", inputBuffer[i]);
        }
        rt_printf("\n");
    }
}

void cleanup(BelaContext *context, void *userData) {
    fft.cleanup(); // Properly clean up the FFT resources
}
The above is my current code. I changed the buffer-shifting part and I think the logic is now correct, but there is still a buzzing sound from the speaker output.
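One part of render() that stands out is the output stage. audioWrite() is only called in callbacks where processing ran, and only for frames n with hopsize <= n < 3 * hopsize, using the frame index n of the current block as the index into outputBuffer. With a block size of 64 frames or fewer that condition is never true, so nothing processed reaches the output; with larger blocks only part of each block is written. A structure that produces exactly one output sample per input frame, with a read position that persists across callbacks, avoids this. The sketch below assumes a periodic Hann window and 50% overlap; OverlapAdd and processFrame are hypothetical names, and processFrame is a placeholder for the FFT -> processing -> IFFT step (for feedthrough, FFT followed by IFFT is the identity, so it leaves the frame unchanged). It also needs only one FFT/IFFT pair per hop, instead of three per fftsize new samples, because earlier frames are not recomputed.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical placeholder for FFT -> processing -> IFFT on one frame.
// FFT followed by IFFT is the identity, so feedthrough leaves it unchanged.
void processFrame(std::vector<float>& frame) { (void)frame; }

// Sample-by-sample 50%-overlap analysis and overlap-add with a periodic
// Hann window. Output equals input delayed by fftSize samples.
class OverlapAdd {
public:
    explicit OverlapAdd(unsigned int fftSize)
        : n_(fftSize), hop_(fftSize / 2), window_(fftSize),
          in_(fftSize, 0.0f), out_(fftSize, 0.0f), frame_(fftSize) {
        assert(fftSize % 2 == 0);
        const double pi = 3.14159265358979323846;
        for (unsigned int i = 0; i < n_; ++i)
            window_[i] = static_cast<float>(0.5 * (1.0 - std::cos(2.0 * pi * i / n_)));
    }

    // Call once per audio frame; returns the sample to write to the output.
    float tick(float x) {
        in_[hop_ + pos_] = x;        // newest samples fill the second half
        const float y = out_[pos_];  // first half of out_ holds finished samples
        if (++pos_ == hop_) {
            pos_ = 0;
            runHop();
        }
        return y;
    }

private:
    void runHop() {
        for (unsigned int i = 0; i < n_; ++i)
            frame_[i] = in_[i] * window_[i];
        processFrame(frame_);
        // Slide the accumulator by one hop and clear the freed half.
        for (unsigned int i = 0; i < hop_; ++i) {
            out_[i] = out_[hop_ + i];
            out_[hop_ + i] = 0.0f;
        }
        // Overlap-add the new frame.
        for (unsigned int i = 0; i < n_; ++i)
            out_[i] += frame_[i];
        // The newest hop of input becomes the older half of the next frame.
        for (unsigned int i = 0; i < hop_; ++i)
            in_[i] = in_[hop_ + i];
    }

    unsigned int n_, hop_, pos_ = 0;
    std::vector<float> window_, in_, out_, frame_;
};
```

In render() this would be called once per frame, for example `float y = ola.tick(audioRead(context, n, 0));` followed by `audioWrite(context, n, channel, y)` for each output channel, with `ola` a global object. Note that runHop() still does all of its work in the one frame where a hop completes.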

The buffer shifting works like this:
preserve the second half of the inputBuffer and shift it to the first half, so that the next call to render() fills the new samples into the second half of the buffer:

    for (unsigned int i = 0; i < fftsize; i++) {
        inputBuffer[i] = inputBuffer[fftsize + i];
    }

In the code I print the inputBuffer values before and after the buffer shift and compare them. Sometimes I see a value mismatch; can anyone suggest what to do?
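Separately from the printing, the shift in the posted render() only runs after the whole block has been read. If the buffer fills before the end of a block, which happens in this code when the block size is larger than fftsize, the rest of that block overwrites the second half before it is processed and shifted, so some input is never processed. A minimal sketch that processes and shifts at the exact sample where the buffer fills, so it does not depend on the block size; SlidingInput and onFull are hypothetical names, and onFull stands for the three-segment processing.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Collects input into a 2 * fftSize buffer. At the exact sample where the
// buffer fills, it hands the buffer to onFull() and then moves the second
// half to the first half, so any block size is handled correctly.
class SlidingInput {
public:
    explicit SlidingInput(unsigned int fftSize)
        : n_(fftSize), buf_(2 * fftSize, 0.0f), index_(0) {
        assert(fftSize > 0);
    }

    template <typename OnFull>
    void push(const float* block, unsigned int frames, OnFull onFull) {
        for (unsigned int k = 0; k < frames; ++k) {
            buf_[index_++] = block[k];
            if (index_ == 2 * n_) {
                onFull(buf_);  // process all 2 * fftSize samples
                // Keep the newest half; the next samples refill the second half.
                std::copy(buf_.begin() + n_, buf_.end(), buf_.begin());
                index_ = n_;
            }
        }
    }

private:
    unsigned int n_;
    std::vector<float> buf_;
    unsigned int index_;
};
```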

On the other hand, I have been trying to use gdb to set breakpoints and debug, running it from cmd or Windows PowerShell, but I think the Bela interface can do this as well, right? (the part circled in red)

Sorry, I don't know why not all of my code went into the 'insert code' block.

    v41827 Sorry, I don't know why not all of my code went into the 'insert code' block.

    If you press the button with more than one line selected, it will use ``` for a code block; otherwise it will only put ` ... ` around inline code.

    v41827 On the other hand, I have been trying to use gdb to set breakpoints and debug, running it from cmd or Windows PowerShell, but I think the Bela interface can do this as well, right? (the part circled in red)

    It's not very convenient to use gdb there. The best way is to ssh onto the board, cd into /root/Bela/projects/projectname and run gdb ./projectname there.

    I see you are calling process_segment() three times from within the audio thread... does that not give you dropouts? Or have you increased the block size enough that you get no 'dropout detected' messages?

    The

        std::vector<float> windowedBuffer(fftsize, 0.0f);

    is reallocated every time the function is entered. As long as the application is single-threaded, make it a global or static to avoid a pointless allocation on every call. The same applies to std::vector<float> tempoutput(fftsize, 0.0f); in render().
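A minimal sketch of that suggestion, assuming a single audio thread: the scratch vectors are sized once (for instance as globals, or in setup()) and reused on every call. Scratch and windowSegment are hypothetical names; the windowed vector can then be passed to fft.fft() exactly as the local windowedBuffer was.

```cpp
#include <cassert>
#include <vector>

// Scratch buffers sized once and reused by every call, so nothing is
// allocated on the audio thread.
struct Scratch {
    explicit Scratch(unsigned int fftSize)
        : windowed(fftSize, 0.0f), segment(fftSize, 0.0f) {}
    std::vector<float> windowed;  // replaces the local windowedBuffer
    std::vector<float> segment;   // replaces tempoutput in render()
};

// Windowing step of process_segment(), writing into preallocated memory.
void windowSegment(const std::vector<float>& input, unsigned int start,
                   const std::vector<float>& window, Scratch& s) {
    assert(start + window.size() <= input.size());
    assert(s.windowed.size() == window.size());
    for (unsigned int i = 0; i < window.size(); ++i)
        s.windowed[i] = input[start + i] * window[i];
}
```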

     std::fill(outputBuffer.begin(), outputBuffer.end(), 0.0f);

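    that's another costly operation, which you could save by writing with = instead of += wherever a segment is the first to reach a region of outputBuffer: the whole first loop, plus the second half of the second and third segments. Changing only the first loop would leave stale values in outputBuffer from index fftsize to 2 * fftsize - 1, which the first segment never writes.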
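Dropping std::fill means every element of outputBuffer has to be assigned by the first segment that reaches it and accumulated by the next one. A sketch for the three segments at offsets 0, hopsize and fftsize, using a hypothetical function name:

```cpp
#include <cassert>
#include <vector>

// Adds three fftSize-long segments at offsets 0, hop and fftSize into a
// 2 * fftSize output without clearing it first: each region is assigned by
// the first segment that reaches it and accumulated by the next one.
void accumulateThree(const std::vector<float>& s0, const std::vector<float>& s1,
                     const std::vector<float>& s2, std::vector<float>& out) {
    const unsigned int n = static_cast<unsigned int>(s0.size());
    const unsigned int hop = n / 2;
    assert(s1.size() == n && s2.size() == n && out.size() == 2 * n);
    for (unsigned int i = 0; i < n; ++i) out[i] = s0[i];          // [0, n): first writer
    for (unsigned int i = 0; i < hop; ++i) out[hop + i] += s1[i];  // [hop, n): overlaps s0
    for (unsigned int i = hop; i < n; ++i) out[hop + i] = s1[i];   // [n, n + hop): first writer
    for (unsigned int i = 0; i < hop; ++i) out[n + i] += s2[i];    // [n, n + hop): overlaps s1
    for (unsigned int i = hop; i < n; ++i) out[n + i] = s2[i];     // [n + hop, 2n): first writer
}
```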

    The rt_printf() calls themselves may cause underruns, as you are printing so much data per callback.
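One way to keep some diagnostics while cutting the printing load is to let a message through only every so often, and to print a summary value rather than whole buffers. A minimal sketch of a hypothetical throttle:

```cpp
#include <cassert>

// Lets a diagnostic through only once every `period` calls, so the
// real-time thread prints occasionally instead of on every block.
class PrintThrottle {
public:
    explicit PrintThrottle(unsigned int period) : period_(period) {
        assert(period > 0);
    }

    // Returns true on every period-th call.
    bool due() {
        if (++count_ >= period_) {
            count_ = 0;
            return true;
        }
        return false;
    }

private:
    unsigned int period_, count_ = 0;
};
```

In render() this could guard a single line, for example `if (printThrottle.due()) rt_printf("%f\n", someValue);`.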

    v41827 In the code I print the inputBuffer values before and after the buffer shift and compare them. Sometimes I see a value mismatch; can anyone suggest what to do?

    As far as I understand you are doing this:

    1- print Buffer before shifting [prints first half]
    2- copy second half to first half
    3- print Buffer after shifting [prints first half]
    4- print Second half of the buffer before shifting to first half [prints second half]
    5- print First half of the buffer after shifting: [prints first half]

    So 1- is expected to differ from everything else, as it is old data that gets overwritten in 2-. 3- and 5- print the very same data (the same memory locations), and 4- should also print the same data, because the second half was copied to the first half in 2-.
    Keep in mind that rt_printf() writes to a circular buffer that another thread reads and prints to stdout. As you are printing a lot of data from the real-time thread without giving the printing thread a chance to run, you may end up with lost or corrupted data in the printout. Could this be the cause of the alleged 'value mismatch' you see? Try removing 1- (useless) and 5- (a repetition of 3-); that may help get rid of the corrupted output.
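Instead of printing four buffers, the shift can be checked in place and reported as a single number. A minimal sketch with hypothetical names; snapshot is preallocated so nothing is allocated on the audio thread. If this reports 0 while the printed buffers still seem to disagree, that points to the printing rather than the shift.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shifts the second half of buf into the first half (as in the posted code)
// and returns the largest difference between the new first half and a
// snapshot of the second half taken before the copy. 0 means an exact shift.
float shiftAndCheck(std::vector<float>& buf, std::vector<float>& snapshot) {
    const std::size_t half = buf.size() / 2;
    assert(snapshot.size() == half);
    // Remember what the second half held before the shift.
    std::copy(buf.begin() + half, buf.end(), snapshot.begin());
    // The shift itself: second half -> first half.
    std::copy(buf.begin() + half, buf.end(), buf.begin());
    float maxDiff = 0.0f;
    for (std::size_t i = 0; i < half; ++i)
        maxDiff = std::max(maxDiff, std::fabs(buf[i] - snapshot[i]));
    return maxDiff;
}
```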