• CTAG-ALSA
  • Round-trip latency measurement not consistent in CTAG

Hi there!
I am using the CTAG (Beast) version for a project where I need to know exactly the round-trip latency of the system, so I was measuring it as follows:
- Loopback from output to input using a mini jack to mini jack cable
- 48 kHz sample rate and 64 samples blocksize
- I load a 5-second exponential sine sweep from a wav file in the init function, then play it in the audio thread while recording the audio inputs. I store the input signal in a wav file as part of the cleanup function.

I then compute the IR by spectral division input/output in MATLAB, compensate for the 2*BLOCKSIZE delay (I actually want to know the HW delay) and get my delay by looking at the Dirac delta.
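For readers who want to reproduce the analysis step outside MATLAB, here is a minimal sketch of the same idea in plain Python: divide the spectrum of the recorded signal by the spectrum of the played signal, take the inverse transform to get the IR, and read the delay off the largest peak. It is not the script used for the measurements above. The function names, the regularisation constant `eps` and the synthetic white-noise test signal (instead of the sine sweep) are assumptions made for illustration, and the small FFT is only there so the sketch runs without extra packages.

```python
import cmath
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def estimate_delay(played, recorded, eps=1e-9):
    """Return the sample index of the largest peak of the IR recorded/played."""
    X, Y = fft(played), fft(recorded)
    # Regularised spectral division: H = Y * conj(X) / (|X|^2 + eps)
    H = [y * x.conjugate() / (abs(x) ** 2 + eps) for x, y in zip(X, Y)]
    h = ifft(H)
    return max(range(len(h)), key=lambda i: abs(h[i]))


def hardware_delay(total_delay, blocksize):
    """Remove the 2 * blocksize buffering delay, as described in the post."""
    return total_delay - 2 * blocksize


# Synthetic check (assumption): white noise, circularly delayed by 218 samples.
rng = random.Random(0)
N = 1024
played = [rng.uniform(-1.0, 1.0) for _ in range(N)]
recorded = [played[(i - 218) % N] for i in range(N)]
total = estimate_delay(played, recorded)
print(total, hardware_delay(total, 64))
```

With a real loopback recording the two signals would come from the wav files, zero-padded to a power of two before calling `estimate_delay`.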

The problem is that most of the time I get consistent results (58 samples delay), but sometimes I get 58+32 or 58+64 samples delay, i.e. half a blocksize or one full blocksize of extra delay. This happens randomly, and the unexpected delay seems to be more likely when Scope is enabled.

I would say the code is right. I have even tried using MLS sequences, and the problem remains.

So my question is: is this somehow expected, with a known explanation, or am I doing something wrong?

Thanks!

    apeiro - 48 kHz sample rate and 64 samples blocksize

    this uses an internal FIFO that, until recently, made the roundtrip latency start at the lowest value and then increase towards the nominal value every time you got an underrun in the audio thread. This has now been fixed on the dev branch. Could you test again after updating your board to the latest commit of the dev branch?

    Thanks for the quick reply!
    Right, with the dev branch it seems I always get a consistent result.
    So for 48000 Hz I am getting a round-trip latency of:
    - Blocksize 64: 90+64x2 = 218 samples (4.54ms latency)
    - Blocksize 32: 58+32x2 = 122 samples (2.54 ms)
    Are those the expected values?
    Thanks!
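The conversion from samples to milliseconds used in the figures above can be checked with a few lines. This is only a sketch of the arithmetic; the function names are made up for illustration, and the hardware delays (90 and 58 samples) are the measured values quoted in the post.

```python
def round_trip_samples(hw_delay, blocksize):
    """Total round-trip latency: hardware delay plus 2 * blocksize of buffering."""
    return hw_delay + 2 * blocksize


def samples_to_ms(samples, sample_rate_hz):
    """Convert a latency in samples to milliseconds."""
    return 1000.0 * samples / sample_rate_hz


# (blocksize, measured hardware delay) pairs taken from the post above
for blocksize, hw in ((64, 90), (32, 58)):
    total = round_trip_samples(hw, blocksize)
    print(blocksize, total, round(samples_to_ms(total, 48000), 2))
```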

    The values make sense. I'd expect about 8 samples fewer than that: the roundtrip group delay of the codec is 48 samples. There will be a couple of samples lost in the McASP serialiser/deserialiser, so you should have 50 + blocksize * 2 at blocksizes up to 32. When going above 32 with the CTAG BEAST there is this internal FIFO of size 32, so there you should have 50 + blocksize * 2 + 32. You are measuring 8 samples above these values. Not sure where they are coming from ... maybe the McASP FIFO? I'd need to do a deeper analysis.
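The expected-versus-measured comparison in this reply can be written out as a small sketch. The constants come from the reply itself, with "a couple of samples" taken as exactly 2 so that the fixed part adds up to 50; the names are hypothetical and this is not code from the Bela repository.

```python
CODEC_GROUP_DELAY = 48  # round-trip group delay of the codec, from the reply
MCASP_OVERHEAD = 2      # "a couple of samples"; 2 is an assumption giving 50 total
BEAST_FIFO_SIZE = 32    # internal FIFO used on CTAG BEAST above blocksize 32


def expected_beast_round_trip(blocksize):
    """Expected round-trip latency in samples on CTAG BEAST, per the reply."""
    fifo = BEAST_FIFO_SIZE if blocksize > 32 else 0
    return CODEC_GROUP_DELAY + MCASP_OVERHEAD + 2 * blocksize + fifo


# Measured totals reported earlier in the thread (dev branch, 48 kHz)
measured = {32: 122, 64: 218}
for blocksize, got in measured.items():
    expected = expected_beast_round_trip(blocksize)
    print(blocksize, expected, got, got - expected)
```

Both blocksizes show the same unexplained surplus of 8 samples, which fits the guess that it comes from a fixed stage rather than from something that scales with blocksize.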

    Thanks! So just to confirm, both in the master and in the dev branches, the extra FIFO is only for blocksize > 32. In the dev branch the FIFO size is now fixed to 32, in master it may be variable.

    21 days later

    Hi Giulio,
    Some more questions about the roundtrip latency...

    I am now comparing BEAST Vs. FACE in terms of latency and these are the numbers I get (for the values below I have subtracted the 2*blocksize delay):

    BEAST (Fs = 48 kHz)
    BS 32: 58 samples
    BS 64: 90 samples
    BS 128: 90 samples

    FACE (Fs = 48 kHz)
    BS 32: 58 samples
    BS 64: 58 samples
    BS 128: 90 samples

    So it looks like for the FACE setup, the extra buffer is added for BS > 64 only.

    The questions:
    1) Are these numbers expected?
    2) Whatever the latency is, I need to be sure it is always a fixed number. Is that also the case for the FACE setup? Is there any case in which the latency might not be fixed?

    (I am using the dev branch, so the question is related to it)

    Thanks!

      apeiro So it looks like for the FACE setup, the extra buffer is added for BS > 64 only.

      The questions:
      1) Are these numbers expected?

      yes. See https://github.com/BelaPlatform/Bela/blob/master/core/RTAudio.cpp#L627-L637

      Is there any case for which the latency could be not fixed?

      Now that I fixed it as discussed above, there is no known situation where latency is not fixed.

      The rationale behind all this: the PRU operates on PRU RAM to read and write inputs and outputs. The space available in PRU RAM is limited (a handful of kilobytes) and it needs to hold all the data for audio channels, analog channels and digital channels. With large channel counts and larger blocksizes, the space available in PRU RAM may no longer be sufficient. When that is the case, we keep the I/O buffer at a smaller blocksize (128 for Bela cape, 64 for CTAG FACE, 32 for CTAG BEAST) and then use a FIFO to achieve the user-requested blocksize.
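A simplified sketch of that decision, checked against the hardware delays measured earlier in the thread. It assumes the per-board caps quoted above always apply; the actual logic lives in the linked RTAudio.cpp and may also depend on channel counts. The table and function names are hypothetical.

```python
# Largest I/O blocksize kept in PRU RAM per board, as quoted in the reply.
MAX_IO_BLOCKSIZE = {"Bela cape": 128, "CTAG FACE": 64, "CTAG BEAST": 32}


def io_plan(board, requested_blocksize):
    """Return (I/O blocksize, whether a FIFO is needed) for a requested blocksize."""
    io_blocksize = min(requested_blocksize, MAX_IO_BLOCKSIZE[board])
    return io_blocksize, requested_blocksize > io_blocksize


# Hardware delays (2 * blocksize already subtracted) measured in the thread.
measured_hw_delay = {
    ("CTAG BEAST", 32): 58, ("CTAG BEAST", 64): 90, ("CTAG BEAST", 128): 90,
    ("CTAG FACE", 32): 58, ("CTAG FACE", 64): 58, ("CTAG FACE", 128): 90,
}

for (board, blocksize), delay in measured_hw_delay.items():
    io_blocksize, uses_fifo = io_plan(board, blocksize)
    print(board, blocksize, io_blocksize, uses_fifo, delay)
```

In the measurements, every configuration where this sketch predicts a FIFO shows 90 samples of hardware delay, and every one without shows 58.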