EDIT:
To summarize everything below into simpler steps:
1) Figure out how to make a background task and test it with a printf statement or something. Look at the examples in the Bela git repo for how to schedule a non-RT task. Your system will choke if you try to do this kind of processing inside the RT audio thread.
This is out of sequence from the advice I gave earlier, but thinking practically: implementing cross-correlation without a non-RT environment to run it in means doing something like processing wave files on your computer instead, and that mixes in an extra challenge for somebody new to programming (learning WAV file I/O and type-casting).
2) Prevent the task from being rescheduled until it is complete (something like a bool gTaskCompleted variable that the audio thread clears when it schedules the task and the non-RT thread sets when it completes; the audio thread tests it on every loop to decide whether to reschedule). This is essentially a semaphore (really a spin-lock). Normally a spin-lock is considered a crude and inefficient way to synchronize threads, but in the current context it doesn't cost you much (just an if() statement each loop). You might use the usleep() function in the non-RT thread to make it take half a second to complete, then do a printf() when it finishes so you can see the background task ran.
3) You need a circular buffer (delay line) for each channel in the RT audio thread even though the output of the Mic Input will not be delayed. This is just for recording data to be processed: 1 for Line In and 1 for Mic In. Line In channel is larger than Mic In channel by an additional number of samples representing something more than the maximum delay you expect between the 2 channels. For the Line In channel this circular buffer doubles as your delay line since you can pull your Line output from this buffer lagging the write pointer by however much delay you need.
4) Every time gTaskCompleted is set, "unwrap" the circular buffers into a pair of linear buffers (something that doesn't change while you are processing it). Remember, Line is longer than Mic by the maximum expected delay between the 2. These 2 linear buffers are what you process with the cross-correlation function. When you get cross-correlation implemented, print an integer value for the delay between these 2 buffers, or maybe write it out to a log file that you can examine in a spreadsheet or even just a text editor. Here is an example of using fprintf() (C-style): http://www.cplusplus.com/reference/cstdio/fprintf/
5) When you feel like your log file is giving you reasonable values for delay when signal level is high enough to be valid, then you can work on the peak/level detector to trigger processing.
6) Implement peak level detector. Just a really rough sketch below (Copy/paste ok if you understand how to fill in the missing pieces):
#define T_ATK (10.0f/1000.0f) //Attack time (s) on peak detector to avoid activating on brief glitches, pops, noise, etc. NB: plain 10/1000 is integer division and evaluates to 0.
#define T_RLS (50.0f/1000.0f) //Release time (s) to hold the peak value between cycles
float gLineLevel;
float gMicLevel;
float gaa, gab, gra, grb; //1-pole LPF coefficients, 1 set for attack, 1 set for release
.
.
.
//Initialize
float dt = 1.0f/sampleRate; //get the sample rate from context (context->audioSampleRate); note that 1/SampleRate truncates to 0 if SampleRate is an integer type
//setup coefficients
// see https://en.wikipedia.org/wiki/Low-pass_filter#Simple_infinite_impulse_response_filter
gaa = dt/(dt + T_ATK);
gra = dt/(dt + T_RLS);
gab = 1.0 - gaa;
grb = 1.0 - gra;
gLineLevel = 0.0;
gMicLevel = 0.0;
.
.
.
//In render loop, do this for every sample:
// This is a 1/2-wave rectifier. I doubt you will need more than this for simple
// signal level detection. To do full-wave, use fabsf(input_mic[n]) (abs() is for ints).
if(input_mic[n] > gMicLevel) {
	gMicLevel = gaa*input_mic[n] + gab*gMicLevel; //run 1-pole LPF with attack time
} else {
	gMicLevel = gra*input_mic[n] + grb*gMicLevel; //run 1-pole LPF with release time
}
//Repeat above for line input channel, then:
if( (gMicLevel > gProcessingThreshold) && (gLineLevel > gProcessingThreshold) && (gTaskCompleted == true) ) {
//needs some logic to check that it has been above the threshold enough during the past
//MAX_LENGTH samples so you know the majority of your circular buffers are full of valid audio
//frames instead of noise or silence.
//then populate buffers and schedule background processing.
}
Here is a sketch of brute-force cross-correlation:
//EDIT: See following post. I worked out the details to make it into code that compiles and works as expected.
After several of these cycles, do some statistics on the computed delay times to gain confidence in the mean (remove extreme outliers). After that you should be able to stop computing cross-correlation unless you want to periodically re-evaluate to make sure somebody didn't move the mic.
I don't know your level of knowledge in signal processing theory, but a cross-correlation is a convolution between two signals where one is time-reversed. This convolution can be performed by a single point-wise multiplication in the frequency domain + the cost of FFT and iFFT instead of NxN multiplications.
Compare these:
https://en.wikipedia.org/wiki/Cross-correlation
https://en.wikipedia.org/wiki/Convolution
You will see the formulas are identical except for the sign on τ, which is merely a time-reference reversal: a minor detail to keep in mind when interpreting the results of a convolution in the context of cross-correlation.
You can do this efficiently with an FFT:
http://dsp.stackexchange.com/questions/736/how-do-i-implement-cross-correlation-to-prove-two-audio-files-are-similar
But you probably want to prove that out to yourself in MATLAB (octave) or similar environment before you attempt to implement it in C++.
Because you can push this to the background, outside of the RT audio thread, and you don't need to keep up with it in real time, the brute-force method may be fast enough. If you do want to optimize for performance, I would put off the FFT convolution method until the very last thing you do.
Here is an example for how to schedule a low priority auxiliary task:
https://github.com/BelaPlatform/Bela/blob/master/examples/11-Extras/oscillator-bank/render.cpp