Sensors increasingly off scale the more Bela stays on?

robinm · May 6, 2022

Hi everybody, I have a weird situation occurring on a Bela running Pure Data with 6 sensors, not sure about the cause. I'm running a quite heavy PD patch on Bela (average 70% CPU usage) on boot. I've noticed that 3 ultrasonic sensors I'm using start to get off scale and keep increasing their measured values, until I reboot Bela and run the code again. The "jumps" seem to always amount to about 50 additional cm every 20-30 minutes or so. I've only measured them with the IDE running, but I can hear from the sound produced that they go off scale even when not connected to the computer.
I have tried different power sources (computer and a dedicated power supply), not sure what else to try. Judging by how it behaves, I thought it might involve the audio block size, which is now set to 2048 (it's apparently too heavy to reduce it without glitches..), because the jumps always occur at roughly the same interval and always by the same amount.

Does anybody have any idea about the source of this problem, or could anybody suggest other ways to troubleshoot?
Thank you all, Robin

giuliomoro · May 7, 2022

How are you measuring the distance in your Pd patch? 50cm is what you'd get from an offset of 128 samples, which is the block size Bela uses internally when you run it at 2048 samples. Not sure how those would interact without seeing the relevant part (distance measurement) of your patch, though.
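
To make the 128-samples-to-50cm relation concrete, here is a minimal sketch of the arithmetic. It assumes a 44.1 kHz sample rate and a speed of sound of about 343 m/s (neither figure is stated in this thread), and the function name is made up for the example:

```python
SAMPLE_RATE_HZ = 44100.0     # assumed sample rate
SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at about 20 degrees C


def offset_to_distance_cm(offset_samples,
                          sample_rate_hz=SAMPLE_RATE_HZ,
                          speed_m_s=SPEED_OF_SOUND_M_S):
    """Distance error (cm) caused by an echo-timing error of offset_samples."""
    delay_s = offset_samples / sample_rate_hz
    # The ultrasonic pulse travels to the target and back,
    # so only half of the path length counts as distance.
    return delay_s * speed_m_s / 2.0 * 100.0


print(round(offset_to_distance_cm(128), 1))  # 49.8
```

Under these assumptions, a timing slip of one 128-sample block shifts every reading by just under 50cm, and a slip of two blocks by just under 100cm.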

robinm · May 7, 2022

Hi Giulio,
I'm using this idea that you introduced me to a couple of months ago, trying to reduce crosstalk:
https://forum.bela.io/d/2171-render-cpp-for-3-ultrasonic-sensors-beginner/2

[image: screenshot of the distance-measurement part of the Pd patch]

digital in 17 receives 1-0 values from a radar sensor, which turns on and off the snapshot metro/pulse metro for all distance sensors.

Besides this, it's pretty much identical to the basic ultrasonic distance sensor PD patch on the Learn section of the Bela website.

If you meant physically how I'm measuring it, I'm using just flat non-reflective surfaces like cardboard or thick fabric in front of each of them to check how off scale they get. That's where my empirical measurements come from.

As you see, I've tweaked the correction offset enough to relate measured distances to their right scale when I turn the patch on, and I didn't notice any problem until I started to keep the patch and the circuit on for longer and longer times.
Any idea what might be happening?

giuliomoro · May 7, 2022

hmmm I can't see anything immediately wrong. I am wondering whether this has anything to do with the underlying buffering used to achieve 2048 samples, where something related to the digital I/O gets screwed up over time (it surely hasn't been tested as much as other parts of our codebase) ... One way to verify that would be to run the code replacing the digital I/O pair of one sensor with analog I/O, assuming you have a Bela cape with analog outs. E.g.:
- replace [adc~ 12] with [adc~ 10] (analog in 7)
- replace [dac~ 11] with [dac~ 10] (analog out 7)
and verify whether this misbehaviour still affects all three channels.

Btw, could you verify whether all three sensors start adding those 50cm at exactly the same time? I assumed it would be the case, but just double checking ...

robinm · May 8, 2022

I've been trying to use a pair of analog I/O, but sensors seem to get way too noisy when connected there. Am I supposed to use the same voltage divider I'd use while connecting HCSR04 to digital I/O? I'm asking because I'm getting quite meaningless values from analog I/O and I'm wondering if it might depend on that, since I've tried different 5V and GND pins across Bela, different individual sensors and cable sets and different pin pairs, and still get completely messed up readings.

I will make some more attempts to use analog outputs and digital inputs, but if the aim was to avoid digital ins I'll run into the same problem again.

About your question, yes, all 3 sensors display the same behavior after the same time. They basically go coherently off scale - this makes me think of a couple more "drastic" solutions, or possible solutions. One is to just wait some time before subtracting 50 from each of them, repeatedly - it's not a very clean or elegant way, but it could work. The other would be to hard power-cycle Bela every now and then.. the way the patch works now, every 6 seconds of no movement detected by the radar all the synth sequences etc. are cleared anyway. So I'm thinking since this installation is gonna be on for a day, I might just take 2 technical minutes between each group coming in to literally reboot Bela manually (unless there is a way to program cyclical reboots of Bela).
I'm getting too sloppy with these ideas tho, I hope I can sort it out in a neater way.

giuliomoro · May 8, 2022

robinm I've been trying to use a pair of analog I/O, but sensors seem to get way too noisy when connected there.

you are right, sorry, there is some thresholding needed. Between [adc~ 10] and [rzero~ 1] add a "signal greater than", e.g.: [expr~ $v1>=0.5] or the below may be slightly cheaper

[image: Pd patch fragment showing the cheaper thresholding alternative]

(this is inspired by [hv.gte] from heavylib).
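
As a rough illustration of what this thresholding does, here is a plain-Python sketch (not the Pd objects themselves). The sample values and both helper names are made up for the example; the only thing taken from the thread is the `>= 0.5` comparison:

```python
def threshold(samples, level=0.5):
    """Map each analog sample to 1.0 if it is >= level, else 0.0,
    mirroring the comparison in [expr~ $v1>=0.5]."""
    return [1.0 if s >= level else 0.0 for s in samples]


def first_pulse_width(binary):
    """Length in samples of the first run of non-zero values."""
    width = 0
    for b in binary:
        if b:
            width += 1
        elif width:
            break  # the first pulse has ended
    return width


# Hypothetical noisy analog reading of an echo pulse.
noisy = [0.02, 0.10, 0.70, 0.90, 0.60, 0.20, 0.05]
clean = threshold(noisy)
print(clean)                     # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(first_pulse_width(clean))  # 3
```

Once the signal only takes the values 0 and 1, whatever follows it sees clean edges, as it would from a digital input, instead of the noise floor of the analog input.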

robinm · May 8, 2022

Yes you nailed it, it worked with [expr~ $v1>=0.5]. I don't know why but the other slightly cheaper way doesn't seem to make any difference.
Anyway, unfortunately moving to analog I/O didn't solve the drifting problem. I have been running a test this past couple of hours and noticed now that it's already reading +50cm for the sensor connected to the analog I/O.

Does it make sense to run tests with all sensors to analog I/O? or should I consider other workarounds?
I'm all ears (well, eyes)

giuliomoro · May 8, 2022

Ok, next try the following (still with one sensor on analog ins and the others on digitals). Disable the more CPU-intensive parts of the patch (essentially run only the code in your screenshot above, or little more than that). Run it with the same settings: do you get the same issue?
Then scale back the blocksize to 512 first and then 64: do you still get the same issue?

robinm · May 9, 2022

It seems to be only related to CPU consumption. After a few hours of running a patch with just the stuff in the screenshot, no drift appears. I had to adjust the offset value a bit and kept the blocksize at 2048, so that it could be comparable with the entire patch at least on that end. Seems like I should basically get rid of some heavy stuff in PD and bring it well below 80% CPU usage, or is there any other way I'm not seeing?

giuliomoro · May 9, 2022

Hmmm this is weird unless there is something in your patch that changes its behaviour the longer the patch runs, which somehow has a cascade effect on the rest of the stuff? Namely, as floats are single precision, if you are increasing e.g.: a counter without ever wrapping it, it may show some issues after the patch has run for a long time.

Just to be sure, can you confirm that when your patch runs for a long time with all the processing enabled it is the actual readings from the sensors that have an unexpected offset? For instance, I want to make sure we are not in a situation such as "oh the pitch of the oscillator is off by the equivalent of 50cm, therefore there's something wrong with the sensor", when the problem is actually with the oscillator or something else between the sensor reading and the oscillator.

robinm · May 9, 2022

well firstly thanks for taking the issue seriously. I do mean readings from sensors like the actual distance (the one obtained from that stuff in the screenshot, printed out right after the offset correction value). I mentioned at the beginning that I noticed this effect sonically, from the effect of an increased minimum distance on synth parameters, but then I made sure to print distance values as close to the source as possible to minimize other possible effects and check the actual readings. So yea, it really changes the minimum detected distance in blocks, 50cm at a time (but not so constant in time as I thought at first), regardless of analog and digital IO. It seems to have stopped occurring when I loaded just the stuff in the screenshot, dropping CPU consumption to like 20-25%. I've run it for just 2 hours or so, so I can't ensure it's completely clear for like 6 hours or more, but by that point in all the other cases I'd get drift up to 100cm already. If you think it's unlikely that a CPU overload would cause this, I'll stop focusing so much on it, but as of now the two things do seem correlated..

giuliomoro Namely, as floats are single precision, if you are increasing e.g.: a counter without ever wrapping it, it may show some issues after a long time the patch runs.

I'm not sure exactly what you mean in reference to a counter, could you be a bit more specific? I am using quite a lot of counters so I wanna make sure that I'm not causing this problem by a dirty usage of [f] and [+1] and so on.

giuliomoro · May 9, 2022

What I was referring to is that one could equivalently write a counter in these two ways:
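[image: two Pd counter patches side by side; the left one wraps the stored value back to 0, the right one does not]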

however, only the left-hand side one will work forever. The right-hand side one won't, because it doesn't wrap the value stored in the [f ] back to 0, which means that if the patch runs for long enough, the number will eventually become so big that it can no longer be represented exactly as a single-precision float (which is what Pd uses internally) and it will start misbehaving. There are other cases where long-running patches start misbehaving, often triggered by something like this. However, this doesn't seem to be the case for your patch, as the code in the screenshot has none of these problems, and as you are checking the values as soon as they are output by each sensor reader, I don't see how a problem elsewhere in the patch could cause that.
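
The single-precision limit can be reproduced outside Pd. The following is a minimal Python sketch, not a translation of the two patches: `to_float32`, `unwrapped_step` and `wrapped_step` are names made up for the example, and `struct` is used only to round values to 32-bit floats.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def unwrapped_step(count):
    """One step of a counter that never wraps, stored as float32."""
    return to_float32(count + 1.0)


def wrapped_step(count, limit):
    """One step of a counter that wraps back to 0 when it reaches limit."""
    return to_float32((count + 1.0) % limit)


stuck = float(2 ** 24)                 # 16777216
print(unwrapped_step(stuck) == stuck)  # True: adding 1 no longer changes the value
print(wrapped_step(7.0, 8.0))          # 0.0: the wrapped counter stays small
```

Above 2^24 a single-precision float can no longer represent every integer, so an unwrapped counter stops advancing by 1. As a rough order of magnitude, a counter incremented once per millisecond would reach that value after about 4.7 hours.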

Unfortunately this means I have to look back at one of the ugliest parts in the Bela core code where this issue could be coming from ... Could you try your code with 1024 and 4096 block size? Does the issue persist and does it occur at the same point in time (i.e.: after 30 minutes)?

In the meantime, have you tried restarting the patch without restarting the board? This is done by tapping the button that's on the cape. That should give you about 2 seconds downtime instead of 10-15 that you'd get otherwise.

robinm · May 9, 2022

I see what you mean. I've corrected them wherever I've found stuff like the right one inside my patch, I'll keep this in mind from now on.

giuliomoro Unfortunately this means I have to look back at one of the ugliest parts in the Bela core code where this issue could be coming from

i'm sorry you have to face your darkest demons. we can't run away forever :<
The issue persists at 1024 and 4096 block size, seemingly after a similar amount of time (hard to point at a specific moment, but I have noticed in both cases an increase in all distance values comparable to the issue we're talking about).

giuliomoro This is done by tapping the button that's on the cape

you're talking about the white button which says "OFF"?

robinm · May 9, 2022

Oh no actually my mistake. It seems to not appear on 4096 block size. For some reason I avoided this setting until now, since it seemed impossible to find a proper correction offset in this case (I tried some time ago and abandoned it to stick to 2048). The more I tried to calibrate it to start from as close to 0 as possible, the less it seemed to work (and kept losing the calibration). Now it motivates me to try to sort this part out, as it might be a neat solution for the drifting problem.
I'm going to run some more tests tomorrow to make sure I'm not double seeing stuff, but if that was the case - and the issue doesn't appear at 4096 - what would be the explanation for it?

giuliomoro · May 10, 2022

robinm you're talking about the white button which says "OFF"?

yes! Perhaps a misleading label ...

robinm I'm going to run some more tests tomorrow to make sure I'm not double seeing stuff, but if that was the case - and the issue doesn't appear at 4096 - what would be the explanation for it?

I have no idea ... as this seems to be somehow related to CPU usage, maybe this affects it ... but it is a very weird behaviour nevertheless ...

robinm For some reason I avoided this setting until now, since it seemed impossible to find a proper correction offset in this case (I tried some time ago and abandoned it to stick to 2048). The more I tried to calibrate it to start from as close to 0 as possible, the less it seemed to work (and kept losing the calibration).

This is also hard to explain for me .... at some point you may want to share your full patch so I can do some tests on my side ...

robinm · May 10, 2022

giuliomoro at some point you may want to share your full patch so I can do some tests on my side ...

Sure thing. Can I attach it here somehow? (I'm sorry about the dumb question, I couldn't find a way to do it)

Anyway, after a few more tests today the problem seems to still appear at 4096, and it still seems to relate to recurrent CPU peaks (probably caused by an overlapping of expensive processes that occur when the radar sensor turns to 1, I'm working on it to see if I can at least distribute them better).

giuliomoro yes! Perhaps a misleading label ...

I mean it does what it says.. it stops the patch from running. Is there a way to reboot it in a similar way too?
I have to specify that my efforts so far have gone into keeping this entire project "off the box", meaning no computer in sight, so any rebooting/restarting process should preferably work without the need for the IDE.

giuliomoro · May 11, 2022

robinm it stops the patch from running. Is there a way to reboot it in a similar way too?

When the patch is running on boot, it will restart automatically when stopped. So you stop it by tapping that button and it will restart in a couple of seconds.

giuliomoro · May 11, 2022

robinm Sure thing. Can I attach it here somehow? (I'm sorry about the dumb question, I couldn't find a way to do it)

Best option is to put it on github or a file sharing system (e.g.: onedrive, google drive, dropbox) and share the link here

giuliomoro · May 11, 2022

robinm (probably caused by an overlapping of expensive processes that occur when the radar sensor turns to 1, I'm working on it to see if I can at least distribute them better).

now this makes more sense. I think I have an intuition as to why a CPU spike would cause the large-blocksize audio thread to start drifting away from the other thread performing I/O at 128 samples per block.

robinm · May 11, 2022

Great, I've put the entire Bela project here:
https://listahaskoliislands-my.sharepoint.com/:u:/g/personal/robin19_lhi_is/ETvYB-Nt0j9DjyR10uCodsMBOP9uCoKY1DhvAiRgrxkhKw?e=dhoQpH

Let me know if you're able to access the link (should be public). I wanna clean it up before I put it on Github..
nonetheless it should be all commented enough for you to get oriented, if you need to take a look inside.
Sensor settings are in the top-left area of the patch.

giuliomoro I think I have an intuition as to why a CPU spike would cause the large-blocksize audio thread to start drifting away from the other thread performing I/O at 128 samples per block.

I tried to use delays of 100ms for each instance of the radar sensor input, so that every time it turned to 1 it would switch processes on one by one (the ones that I thought were overlapping, wherever synchronicity wasn't needed). It didn't change anything at all in the CPU consumption, and the drifting occurred anyway. It's possible that I'm missing something stupid and obvious on the way, so if you happen to do tests in that sense I'd be happy to hear what you get.