measuring performance in render.cpp

Ward · May 6, 2022

So in my instrument I want to be able to have some form of CPU measurement to print on a display.

Can one acquire the CPU percentage that the IDE displays in C++? Here I read that "Monitoring CPU and mode switches from the IDE carries quite some CPU overhead", which means monitoring CPU is itself wasting CPU, right?

In the IDE I get about ~14% but in /proc/xenomai/sched/stat I get the following:

CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
  0  0      0          3660892    0          0     00018000   90.6  [ROOT]
  0  7605   9          11         28         0     000600c0    0.0  poly-workstatio
  0  7617   2          3          4          0     000480c0    0.0  0x42f80
  0  7619   2          3          4          0     000480c0    0.0  0x4313c
  0  7620   1          17932      19054      0     00048046    8.6  bela-audio
  0  0      0          2545781    0          0     00000000    0.3  [IRQ16: [timer]]
  0  0      0          8976       0          0     00000000    0.3  [IRQ21: rtdm_pruss_irq_irq]

Am I missing something? How does 8.6% become 14%? Is the IDE CPU percentage a sum of /proc/xenomai/sched/stat and ps -p 'myAppPID' -o %cpu --no-headers?

giuliomoro · May 7, 2022

/proc/xenomai/sched/stat only lists the time spent in primary (Xenomai) mode, which is why some of your threads have exactly 0 CPU time. So you should get the time they spend in secondary (Linux) mode from ps. Furthermore, the total amount of time that Linux (and ps) knows about is the one marked as [ROOT] in /proc/xenomai/sched/stat.
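As a rough illustration of reading the primary-mode share, the sketch below pulls the %CPU column for a named thread out of text laid out like the /proc/xenomai/sched/stat listing quoted earlier in this thread. The function name and the column parsing are assumptions based only on that listing. It is not how the IDE does it, and the result would still need to be combined with the secondary-mode figure from ps.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: looks for the row whose NAME column equals threadName
// and stores its %CPU value in percentOut. Returns false if no row matches.
// Assumed column order: CPU PID MSW CSW XSC PF STAT %CPU NAME
bool primaryModeCpuPercent(const std::string& statText, const std::string& threadName, double& percentOut)
{
	assert(!threadName.empty());
	std::istringstream lines(statText);
	std::string line;
	while (std::getline(lines, line)) {
		std::istringstream row(line);
		int cpu, pid;
		unsigned long msw, csw, xsc, pf;
		std::string stat;
		double percent;
		// The header row fails here because "CPU" is not a number.
		if (!(row >> cpu >> pid >> msw >> csw >> xsc >> pf >> stat >> percent))
			continue;
		// NAME may contain spaces (e.g. "[IRQ16: [timer]]"), so take the rest of the line.
		std::string name;
		std::getline(row >> std::ws, name);
		if (name == threadName) {
			percentOut = percent;
			return true;
		}
	}
	return false;
}
```

The function takes the file contents as a string, so the file itself can be read from a non-real-time thread.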

The details of how that's done in the IDE are here https://github.com/BelaPlatform/Bela/blob/master/IDE/src/CPUMonitor.ts

On the other hand, if all you care about is the CPU usage of the audio thread, you can get a pretty good estimate for relatively cheap by using the functions showcased here (API is not finalised and may change) https://github.com/BelaPlatform/Bela/tree/dev/examples/Extras/cpu-monitoring (requires the latest dev branch).

infinitedigits · May 10, 2022

Great thread. I'm so glad to see Bela getting its own API for profiling the audio thread!

I recently rolled my own version of this: it takes the time at the beginning of the render block and again at the end, then prints the result (https://github.com/schollz/palms/blob/5734b4b66b594575f8a87f7d3dbbf3a3d30fa9a2/render.cpp#L152-L160), shown as a percentage of the total time available for executing a single block (assumed to be the number of samples in the block / sample rate). It's been really useful for seeing when I'm overloading it. So far I've seen that percentages >70-80% will drop a block every few minutes, whereas percentages <70% never drop a block.
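The arithmetic behind that percentage can be sketched as below. The function names are made up for illustration, and the elapsed time is taken as an input because how it is measured is a separate question.

```cpp
#include <cassert>

// Time available to process one block, in seconds:
// number of frames in the block divided by the sample rate.
double blockBudgetSeconds(unsigned int framesPerBlock, double sampleRate)
{
	assert(sampleRate > 0);
	return framesPerBlock / sampleRate;
}

// Elapsed processing time expressed as a percentage of the block budget.
double blockLoadPercent(double elapsedSeconds, unsigned int framesPerBlock, double sampleRate)
{
	return 100.0 * elapsedSeconds / blockBudgetSeconds(framesPerBlock, sampleRate);
}
```

For example, a block of 16 frames at 44100 Hz leaves a budget of about 363 microseconds, so a block that takes half of that to compute reads as roughly 50%.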

EDIT: Do not do this! See giuliomoro's comment below

giuliomoro · May 10, 2022

Thanks for sharing that. One issue with your approach is that std::chrono causes a mode switch (because it calls into the Linux kernel) and is therefore not real-time safe (it may even be the cause of the dropouts). You should be able to achieve 100% CPU usage without dropouts by adding --high-performance-mode to your command line options (at the expense of the IDE becoming unresponsive, of course, but you can recover by stopping the program with the button on the Bela cape). I just realised that this option is not exposed to SuperCollider, while it probably should be (or be enabled by default).

giuliomoro · May 10, 2022

Another issue with using std::chrono is that it may not give an accurate measurement of time (because it goes through Linux, etc). Btw, just a note that there could perhaps be a better way of measuring CPU time of a Xenomai thread (see https://www.xenomai.org/pipermail/xenomai/2022-May/047672.html ).

infinitedigits · May 10, 2022

Cool! I did not know that. Thanks for steering me back to the real-time track; I will give the cpu-monitoring a try instead!

giuliomoro · May 10, 2022

Btw it would be nice to implement a minimum of stats (e.g. min, max, or even the 95th percentile, though that's significantly more complex than the others). Currently we are only taking the mean of the last count measurements, but if you have code paths with different CPU load (and/or your code is severely affected by cache misses and is memory-intensive, or something memory-intensive is running alongside), it may be good to see what the max is, as that's the one that matters the most for real-time purposes.
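A minimal sketch of what per-window min/max/mean tracking could look like is below. LoadStats and its members are hypothetical names, not part of the Bela API; the struct only shows that min and max cost one comparison each per measurement on top of the running sum.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical helper, not part of the Bela API.
// Collects `count` measurements, then publishes mean, minimum and maximum.
struct LoadStats {
	explicit LoadStats(unsigned int windowSize) : count(windowSize)
	{
		assert(windowSize > 0);
	}

	// Feed one measurement (e.g. a CPU percentage for one block).
	// Returns true when a window has just completed and the
	// published values below have been refreshed.
	bool add(double percent)
	{
		sum += percent;
		runningMin = std::min(runningMin, percent);
		runningMax = std::max(runningMax, percent);
		if (++currentCount < count)
			return false;
		mean = sum / count;
		minimum = runningMin;
		maximum = runningMax;
		// Reset the accumulators for the next window.
		sum = 0;
		runningMin = std::numeric_limits<double>::infinity();
		runningMax = -std::numeric_limits<double>::infinity();
		currentCount = 0;
		return true;
	}

	// Values from the last completed window.
	double mean = 0;
	double minimum = 0;
	double maximum = 0;

private:
	unsigned int count;
	unsigned int currentCount = 0;
	double sum = 0;
	double runningMin = std::numeric_limits<double>::infinity();
	double runningMax = -std::numeric_limits<double>::infinity();
};
```

A percentile would need the individual measurements to be stored and (partially) sorted, which is why it is the more complex option.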

Ward · Feb 23, 2023

giuliomoro The details of how that's done in the IDE are here https://github.com/BelaPlatform/Bela/blob/master/IDE/src/CPUMonitor.ts

This is very handy, thanks!

I'm running this example as currently included in the most recent image. I get the following results:
CPU value displayed in IDE: 15%
total: 8%
noise: 0.85%
filter: 0.75%

The value the IDE reports is the total CPU usage including Linux, the IDE, the scope and GUI server and the user's application? The value total reports is all time spent in the void render() function? What does .count = 100 do? Is the CPU percentage averaged over 100 measurements?

Assuming high performance mode is enabled, how high can the value reported by Bela_cpuMonitoringGet() get before underruns occur / blocks are dropped? 99%?

My application has quite a dynamic CPU load due to various synthesis models, polyphony, audio effect chains. I'm monitoring context->underrunCount to notify the user if blocks are dropped. I want to undertake actions like muting old or barely audible synth voices when the max CPU crosses a certain threshold (e.g. 95%). Can I set the count to 1 and then check the value against a threshold?

Does setting count to a lower value mean more instructions are executed to measure the CPU? Is there a maximum number of BelaCpuData instances I can add (e.g. to measure what synthesizer or audio effect chain on which track uses the most CPU)?

giuliomoro · Feb 28, 2023

Ward The value the IDE reports is the total CPU usage including Linux, the IDE, the scope and GUI server and the user's application?

It doesn't include any IDE CPU consumption, only CPU usage of the Bela project.

Ward The value total reports is all time spent in the void render() function?

yes

Ward What does .count = 100 do? Is the cpu percentage averaged over 100 measurements?

yes. See https://github.com/BelaPlatform/Bela/blob/dev/core/RTAudio.cpp#L1130-L1156

Ward Assuming high performance mode is enabled, how high can the value reported by Bela_cpuMonitoringGet() get before underruns occur / blocks are dropped? 99%?

In principle, 100%.

Ward Does setting count to a lower value mean more instructions are executed to measure the CPU?

one double division:

	if(data->count == data->currentCount)
	{
		data->percentage = (double(data->busy) / data->total) * 100;
		data->busy = 0;
		data->total = 0;
		data->currentCount = 0;
	}

Ward Can I set the count to 1 and then check the value against a threshold?

sure.
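One way such a threshold check could be wired to voice muting is sketched below. Voice, quietestActiveVoice and shedLoad are hypothetical names for illustration; loadPercent stands for whatever percentage the monitoring reports, and a real synth would probably fade a voice out rather than cut it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical voice description; a real synth would have its own.
struct Voice {
	bool active;
	float level; // current output level, e.g. envelope amplitude
};

// Returns the index of the quietest active voice, or -1 if none is active.
int quietestActiveVoice(const std::vector<Voice>& voices)
{
	int best = -1;
	for (std::size_t i = 0; i < voices.size(); ++i) {
		if (!voices[i].active)
			continue;
		if (best < 0 || voices[i].level < voices[best].level)
			best = static_cast<int>(i);
	}
	return best;
}

// Deactivates the quietest active voice if the measured load is above
// the threshold. Returns true if a voice was deactivated.
bool shedLoad(std::vector<Voice>& voices, double loadPercent, double thresholdPercent)
{
	assert(thresholdPercent > 0);
	if (loadPercent <= thresholdPercent)
		return false;
	int idx = quietestActiveVoice(voices);
	if (idx < 0)
		return false;
	voices[idx].active = false;
	return true;
}
```

The vector is only read and modified in place, never resized, so calling this from the audio thread does not allocate memory.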

Ward Is there a maximum number of BelaCpuData instances I can add (e.g. to measure what synthesizer or audio effect chain on which track uses the most CPU)?

no hard limit, but each call to Bela_cpuTic() and Bela_cpuToc() has a (small) overhead associated with the clock_gettime() call. To estimate the overhead you can do something like this in render():

Bela_cpuToc(estimateCpuData);
Bela_cpuToc(dummyCpuData);
Bela_cpuTic(dummyCpuData);
Bela_cpuTic(estimateCpuData);

(with properly initialised estimateCpuData and dummyCpuData) and look at estimateCpuData->percentage every so often.