Killing aux tasks

Ward · Jul 25, 2023

Hey, I have an aux task that sometimes gets stuck in an infinite loop.

If I detect that it does not schedule anymore, I would like to be able to kill the aux task so I can restart it, and thus exit the infinite loop.

It seems there is a function to stop all auxtasks but not a specific one..

giuliomoro · Jul 25, 2023

Ward It seems there is a function to stop all auxtasks but not a specific one..

That's a function for internal use only.

In general, it's not a good idea to cancel a thread. This is because by cancelling it at an arbitrary point, you may leak resources, and/or have them left in an inconsistent state. The best approach to solving your issue of the task that gets stuck in an infinite loop is to fix the underlying cause for the unexpected infinite loop, or at least detect it from the thread itself and resolve it by exiting the infinite loop.
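As a rough illustration of detecting the problem from inside the thread, the sketch below walks a linked structure with a step budget and an abort flag, so a corrupted (cyclic) structure ends the walk instead of hanging it. `Node`, `WalkResult`, `gAbortRequested` and `guardedWalk` are hypothetical names for this example and are not part of the Bela API.

```cpp
#include <atomic>
#include <cstddef>

struct Node { int value; Node* next; };          // hypothetical data structure

enum class WalkResult { Found, NotFound, Aborted };

// Another thread may set this flag to ask the walk to give up.
std::atomic<bool> gAbortRequested{false};

// Walks the list looking for `target`, but gives up after `maxSteps` nodes
// or as soon as an abort is requested, so it cannot loop forever.
WalkResult guardedWalk(const Node* head, int target, std::size_t maxSteps)
{
    std::size_t steps = 0;
    for(const Node* n = head; n != nullptr; n = n->next) {
        if(gAbortRequested.load() || ++steps > maxSteps)
            return WalkResult::Aborted;
        if(n->value == target)
            return WalkResult::Found;
    }
    return WalkResult::NotFound;
}
```

On a list whose last node points back to the first, searching for a missing value returns `WalkResult::Aborted` once the budget is used up, and the task function can then return normally and be scheduled again.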

Failing that, and only as a last resort, you can use pthread_cancel(), combined with pthread_setcanceltype() and pthread_cleanup_push(), a strategy which has enough pitfalls that I refer you to the relevant man pages. To get the thread's opaque pthread_t id to be used in some of these calls, call pthread_self() from within the thread; do not use the AuxiliaryTask object.
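For reference, a minimal sketch of that last-resort pattern with plain POSIX threads (not the Bela aux task API) could look like the following. The names `onCancel`, `stuckWorker` and `cancelStuckWorker` are made up for the example. It uses deferred cancellation, which only takes effect at cancellation points such as pthread_testcancel(); a loop that never reaches one would need PTHREAD_CANCEL_ASYNCHRONOUS, which is where most of the pitfalls come from. With an aux task you would store the value of pthread_self() from inside the task rather than use the id returned by pthread_create().

```cpp
#include <pthread.h>
#include <unistd.h>
#include <atomic>

static std::atomic<bool> gCleanupRan{false};

// Runs when the thread is cancelled: release locks, buffers, files here.
static void onCancel(void*)
{
    gCleanupRan = true;
}

static void* stuckWorker(void*)
{
    // Deferred is the default type; it is set explicitly for clarity.
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, nullptr);
    pthread_cleanup_push(onCancel, nullptr);
    for(;;) {
        // ... work that may never finish ...
        pthread_testcancel(); // explicit cancellation point
    }
    pthread_cleanup_pop(0); // never reached, but must pair with the push
    return nullptr;
}

// Starts the worker, cancels it, and reports whether it was actually
// cancelled and whether the cleanup handler ran.
bool cancelStuckWorker()
{
    pthread_t id;
    if(pthread_create(&id, nullptr, stuckWorker, nullptr) != 0)
        return false;
    usleep(10000); // let the worker spin for a moment
    pthread_cancel(id);
    void* result = nullptr;
    pthread_join(id, &result);
    return result == PTHREAD_CANCELED && gCleanupRan;
}
```

Anything the thread holds that is not released in the cleanup handler stays leaked or locked after cancellation, which is why exiting the loop from within the thread is preferable.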

Ward · May 6, 2024

Can higher priority tasks yield/interrupt lower priority tasks?

I got a low prio task that recursively searches through a tree which results in dropouts when audio CPU is 70%+ and the tree is large. I added usleep(0) to every x steps searching through the tree and that solved the dropouts.

Am I right in assuming that when the audio task is requested but another task is still running, the audio task has to wait for the other task to yield?

giuliomoro · May 6, 2024

Ward Am I right in assuming that when the audio task is requested but another task is still running, the audio task has to wait for the other task to yield?

As long as the audio task has higher priority than the other task, this shouldn't be the case.

What priority does your task have? The fact that usleep(0) (which is roughly equivalent to pthread_yield()) works makes me think you may have a thread that has the same priority as the audio thread.

Ward · May 8, 2024

So the audio task is 95 right?

I checked and these are the tasks I have running.

IO task (80): handles MIDI over UART and OSC, doesn't seem to cause trouble, but not extensively tested.
GUI task (60): scheduled around 30 times per second. Generates graphics in a 256x64 x 4bit frame buffer. Also sometimes parses JSON to/from files. Doesn't seem to cause trouble (as long as I don't accidentally program infinite loops haha)
Tree query task (35): searches a tree for points matching certain data. Occasionally called when the search data has changed. If Bela_cpuMonitoringGet()->percentage is 70+% and the tree is queried for a lot of points (with recursive calls to the same function) it can create dropouts. In one instance where I had a very large tree and a search that basically had it traverse the whole tree I got it to drop multiple blocks and trigger a CPU timeout watchdog.

giuliomoro · May 8, 2024

Lower priority tasks won't preempt the audio thread. See the toy example below to confirm that: even though the aux task never stops, the audio thread produces pristine audio output until the watchdog intervenes. The watchdog intervenes when one thread (any thread) does not block, yield or sleep for over a set time. This time by default is 2 seconds but it can be increased for testing purposes, e.g.: echo 5 > /sys/module/xenomai/parameters/watchdog_timeout.

#include <Bela.h>
#include <cmath>

float gFrequency = 440.0;
float gPhase;
float gInverseSampleRate;

AuxiliaryTask task;

void taskFun(void*)
{
	while(!Bela_stopRequested())
	{
		// waste time
	}
}

bool setup(BelaContext *context, void *userData)
{
	gInverseSampleRate = 1.0 / context->audioSampleRate;
	gPhase = 0.0;
	task = Bela_createAuxiliaryTask(taskFun, 35, "taskFun");
	return true;
}

void render(BelaContext *context, void *userData)
{
	if(0 == context->audioFramesElapsed)
		Bela_scheduleAuxiliaryTask(task);
	for(unsigned int n = 0; n < context->audioFrames; n++) {
		float out = 0.8 * sinf(gPhase);
		gPhase += 2.0 * M_PI * gFrequency * gInverseSampleRate;
		if(gPhase > 2.0 * M_PI)
			gPhase -= 2.0 * M_PI;
		for(unsigned int channel = 0; channel < context->audioOutChannels; channel++) {
			audioWrite(context, n, channel, out);
		}
	}
}

void cleanup(BelaContext *context, void *userData)
{
}

Is it possible that what you are observing is that the auxiliary task triggers some action which in turn causes the audio thread to perform several expensive operations all at once, causing the dropouts?

giuliomoro · May 17, 2024

Actually, by coincidence, I just came across something very similar, but I struggle to reproduce it ... currently under investigation.

giuliomoro · May 18, 2024

@Ward are you using the Scope in your project and is it possible that it is the cause of the issue? The Scope's trigger thread is where I came across it.

Ward · May 18, 2024

I have occasionally used the scope but it is currently not included or used

giuliomoro · May 18, 2024

Did you manage to narrow down the issue and make a minimum example that reproduces it?

Ward · May 21, 2024

No I haven't spent any time on reproducing it. Just arrived home from Superbooth, I'll let you know once I find something

giuliomoro · May 21, 2024

OK, I verified that what I am observing in the Scope is not actually a scheduling issue per se. The message queue between the two threads becomes full, so the writing thread (the audio thread in my case) has to stop and wait for the other thread to complete what it is currently doing and retrieve the next element from the queue before it can write the next one, resulting in a priority inversion. The user error in this case is that messages are sent to the queue faster than they are processed, i.e. the callback runs longer than the period between successive messages being sent to the queue. This is when using the AuxTaskRT class (which implements an RT thread and a message queue), but I think a similar issue could be encountered when using the Pipe class for passing messages to a thread started in any way (e.g. with the Bela_...AuxiliaryTask() API), if one writes to the queue faster than the thread can empty it and it eventually fills up.

Ward · May 22, 2024

I think this might explain what was happening.

Every time a parameter changes in a SynthVoice that uses tree data this happens:

if (change) {
    pipe.writeRt(/*struct with tree query parameters*/);
    Bela_scheduleAuxiliaryTask(treeQueryTask);
}

This is in the audio thread.

It is very much possible that data is sent faster than it is consumed. There can be up to 8 SynthVoices per track, and if each track has selected the SynthVoice type that needs tree data there are up to 40 voices. If all of these voices are playing and the parameters are modulated, each voice will write into the pipe every audio block (64 samples). If the trees are also very large, so that queries take some time, it is very much possible that the Pipe is not emptied fast enough.
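To get a feel for how quickly this can go wrong, here is a back-of-the-envelope simulation. The 40 writes per block come from the numbers above; the queue capacity and the number of queries the task completes per block are made-up assumptions, and `firstBlockedBlock` is a hypothetical helper, not a Bela function.

```cpp
#include <algorithm>

// Returns the 1-based block number at which a write would find the queue
// full, or -1 if that does not happen within `maxBlocks` blocks.
// Each block the sender writes `writesPerBlock` messages, then the receiver
// removes up to `readsPerBlock` of them.
long firstBlockedBlock(long capacity, long writesPerBlock,
                       long readsPerBlock, long maxBlocks)
{
    long fill = 0;
    for(long block = 1; block <= maxBlocks; ++block) {
        if(fill + writesPerBlock > capacity)
            return block; // the writer would stall or fail here
        fill += writesPerBlock;
        fill -= std::min(fill, readsPerBlock);
    }
    return -1;
}
```

With an assumed capacity of 64 messages and 8 queries completed per block, `firstBlockedBlock(64, 40, 8, 1000)` returns 2, and even a 1024-message queue only lasts until block 32. As long as more is written than read on average, a bigger queue only delays the problem.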

This means that my fix of adding usleep(0) into the recursive tree query doesn't really solve the issue. Dropouts did decrease quite a bit with usleep(0) but I think I still heard the occasional dropout during demonstrations at Superbooth.

Now that I'm typing up this post I realize that I don't need to request new data every block, but only once every 8 blocks or even less often. I'm going to try and see if it works by simply sending requests less often.
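One possible shape for that throttling, as a sketch only: a small counter that remembers whether anything changed and lets a request through at most once every N blocks. `BlockThrottle` and `shouldSend` are hypothetical names for this example.

```cpp
// Lets a request through at most once every `period` blocks, and only if
// something changed since the last request that was let through.
class BlockThrottle
{
public:
    explicit BlockThrottle(unsigned int period)
        : period_(period ? period : 1) {}

    // Call once per audio block. `changed` tells whether a parameter
    // changed during this block.
    bool shouldSend(bool changed)
    {
        pending_ = pending_ || changed;
        bool due = (0 == counter_);
        counter_ = (counter_ + 1) % period_;
        if(due && pending_) {
            pending_ = false;
            return true;
        }
        return false;
    }

private:
    unsigned int period_;
    unsigned int counter_ = 0;
    bool pending_ = false;
};
```

In the audio thread this would wrap the existing write, e.g. `if(throttle.shouldSend(change)) { /* writeRt and schedule */ }`. With a period of 8 and a change on every block, 64 blocks produce 8 requests instead of 64.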

If that doesn't work perhaps another solution would be to empty the pipe at once, look for duplicate requests or requests that have been superseded by newer requests and then process only the newest requests. Is there a way to read the messages in Pipe in LIFO?

giuliomoro · May 22, 2024

Ward If that doesn't work perhaps another solution would be to empty the pipe at once, look for duplicate requests or requests that have been superseded by newer requests and then process only the newest requests. Is there a way to read the messages in Pipe in LIFO?

No, but you can drain the pipe on the receiver side before processing the messages:

MyStruct mystruct;
bool read = false;
while(pipe.readNonRt(mystruct) > 0)
{
    // read all pending messages and discard them apart from the last one.
    // this should be cheap enough and not use too much CPU
    read = true;
}
if(read)
{
    // process content of mystruct, which contains the latest message
}

though it may be more complex in your case.

Once you have appropriate "reasonable" throttling on the sending side, this will take care of quickly draining the pipe avoiding the effort of parsing/processing stale messages and - most importantly in this case - it will ensure the queue doesn't fill up.
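For the more complex case where several voices share one pipe, one option is to drain everything into a container first and then keep only the newest request per voice. The sketch below assumes a hypothetical `TreeQuery` struct with a `voice` id, and takes a plain `std::vector` that would be filled by a drain loop like the one above, so it can be tried without Bela.

```cpp
#include <map>
#include <vector>

struct TreeQuery { int voice; float param; };    // hypothetical message type

// Given all drained messages in arrival order, returns only the newest
// request for each voice, ordered by voice id.
std::vector<TreeQuery> coalesce(const std::vector<TreeQuery>& drained)
{
    std::map<int, TreeQuery> newest;
    for(const TreeQuery& q : drained)
        newest[q.voice] = q; // a later message overwrites an earlier one
    std::vector<TreeQuery> result;
    for(const auto& entry : newest)
        result.push_back(entry.second);
    return result;
}
```

The expensive tree query then runs once per voice instead of once per message. `std::map` allocates, which should be acceptable in the low-priority task but not in the audio thread.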

Ward · May 29, 2024

To prevent the pipe from overflowing I now only write into the pipe when new data is needed, and when the data we need is different from what we already have. I also now fully drain the pipe and then only handle the valid messages.
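The sender-side check can be as small as remembering the last request and comparing against it. This is only a sketch of the idea with hypothetical names (`QueryParams`, `ChangeGate`, `needsRequest`), not the actual project code.

```cpp
struct QueryParams { float position; float radius; }; // hypothetical fields

inline bool operator==(const QueryParams& a, const QueryParams& b)
{
    return a.position == b.position && a.radius == b.radius;
}

// Remembers the last request and only reports true when the wanted
// parameters differ from it (or when nothing was requested yet).
class ChangeGate
{
public:
    bool needsRequest(const QueryParams& wanted)
    {
        if(hasLast_ && wanted == last_)
            return false;
        last_ = wanted;
        hasLast_ = true;
        return true;
    }

private:
    QueryParams last_{};
    bool hasLast_ = false;
};
```

Each voice would own one gate and only write into the pipe when `needsRequest()` returns true, so repeated blocks with identical parameters cost nothing.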

Querying the tree does not cause underruns anymore and I was able to remove this bit:

Ward I added usleep(0) to every x steps searching through the tree and that solved the dropouts.

So I think it is safe to say the underruns were indeed caused by filling the pipe faster than it was emptied. Thanks for the help!