Hi, pretty new to Bela, and this is my first forum post!
I couldn't find anything on how to use the Cycle Counter for execution time profiling anywhere, but after a couple of evenings playing around and Googling, I've managed to piece it all together and got it working.
Thought I'd share it for anyone else interested.
The ARM CPU has a 32-bit cycle counter register (called CCNT) that increments every clock cycle (it can also be set to increment every 64 clock cycles instead, for timing longer intervals).
By reading this register before and after a piece of code and taking the difference, you can tell how many clock cycles the code took to execute.
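As a rough illustration of the read-and-subtract idea, here is a minimal sketch. The helper names `readCycleCounter` and `elapsedCycles` are made up for this example and are not taken from the linked projects. The `mrc` instruction is the standard ARMv7-A encoding for reading the cycle count register, and it only works once user access has been enabled. On non-ARM builds the read is stubbed out so the sketch still compiles.

```cpp
#include <cstdint>

// Hypothetical helper: read the 32-bit cycle counter on ARMv7-A.
// Requires user-mode access to have been enabled beforehand.
static inline uint32_t readCycleCounter()
{
#if defined(__arm__)
    uint32_t value;
    asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
    return value;
#else
    return 0; // placeholder so the sketch compiles off-target
#endif
}

// Hypothetical helper: cycles elapsed between two readings.
// Unsigned subtraction wraps modulo 2^32, so the result is still
// correct if the counter overflowed once between the two reads.
static inline uint32_t elapsedCycles(uint32_t start, uint32_t end)
{
    return end - start;
}
```

Typical use would be `uint32_t t0 = readCycleCounter();`, then the code under test, then `uint32_t cycles = elapsedCycles(t0, readCycleCounter());`.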
If you do this in a high-priority primary thread (e.g. in render()), you will get the true execution time of your code, without interruptions from other tasks.
You can use this to directly measure the execution time of any piece(s) of code you want.
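To turn a cycle count into a time, divide by the CPU clock rate. The sketch below assumes a 1 GHz clock (the nominal rate of the BeagleBone Black that Bela runs on; adjust `kAssumedCpuHz` if yours differs), and the function names are hypothetical. Under that assumption 14,500 cycles is about 14.5 microseconds, and the 32-bit counter wraps after roughly 4.3 seconds, or roughly 275 seconds with the divide-by-64 option.

```cpp
#include <cstdint>

// Assumption: CPU clock of 1 GHz. Change this to match your board.
constexpr double kAssumedCpuHz = 1.0e9;

// Convert a raw counter reading (or difference) to microseconds.
// If the counter is in divide-by-64 mode, each count is 64 cycles.
constexpr double cyclesToMicroseconds(uint32_t count, bool divideBy64,
                                      double cpuHz = kAssumedCpuHz)
{
    return static_cast<double>(count) * (divideBy64 ? 64.0 : 1.0) / cpuHz * 1.0e6;
}

// Time in seconds before the 32-bit counter wraps around.
constexpr double wrapPeriodSeconds(bool divideBy64,
                                   double cpuHz = kAssumedCpuHz)
{
    return 4294967296.0 * (divideBy64 ? 64.0 : 1.0) / cpuHz;
}
```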
By default, however, this register cannot be read by user programs (attempting to do so generates an illegal instruction error - ouch!). A kernel module can be used to change the register settings to allow user access.
So, there are three things that you need to do:
1. Build and install a kernel module that allows user code to access the cycle counter.
2. Configure the counter registers.
3. Read the cycle counter at appropriate points in your code, whilst running in a primary (real-time) thread.
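For step 2, a minimal sketch of what configuring the counter can look like from user code, once the kernel module has enabled user access. The bit positions are the ARMv7-A ones for the performance monitor control register (enable, cycle counter reset, divide-by-64) and the count enable set register (bit 31 for the cycle counter). The function and constant names are invented for this example, and the linked projects may do it differently.

```cpp
#include <cstdint>

// ARMv7-A performance monitor control register (PMCR) bits.
constexpr uint32_t kPmcrEnable     = 1u << 0; // E: enable all counters
constexpr uint32_t kPmcrResetCycle = 1u << 2; // C: reset cycle counter to 0
constexpr uint32_t kPmcrDivideBy64 = 1u << 3; // D: count every 64th cycle
// Count enable set register (PMCNTENSET): bit 31 is the cycle counter.
constexpr uint32_t kCycleCounterBit = 1u << 31;

// Hypothetical helper: build the value to write to PMCR.
constexpr uint32_t makePmcrValue(bool resetCounter, bool divideBy64)
{
    return kPmcrEnable
         | (resetCounter ? kPmcrResetCycle : 0u)
         | (divideBy64 ? kPmcrDivideBy64 : 0u);
}

// Hypothetical helper: reset and start the cycle counter.
// Only does anything on 32-bit ARM; a no-op elsewhere so it compiles.
static inline void configureCycleCounter(bool divideBy64)
{
#if defined(__arm__)
    asm volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(makePmcrValue(true, divideBy64)));
    asm volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(kCycleCounterBit));
#else
    (void)divideBy64;
#endif
}
```

Calling `configureCycleCounter(false)` once, e.g. from setup(), would be enough before taking readings in render().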
To make this as easy as possible, I've created a couple of github projects:
Firstly, [https://github.com/sjbaines/kernelModuleCycleCounter]
This isn't actually a Bela project - it is a simple project to build the required kernel module that then allows Bela projects to access the CCNT.
However, you can still drag it into the IDE as an easy way to transfer the files, though the IDE will complain a little.
If you prefer, you can copy the contents manually to wherever you want on your Bela - no need for it to be within the projects folder.
Once the files are copied, follow the instructions from the top of the C file to make and install the kernel module.
(Basically this is just make to build, then insmod to load the module).
Once the module is loaded, the Cycle Counter can be used.
The second project [https://github.com/sjbaines/belaCycleCountProfileTest/tree/main] provides a simple example framework which times the execution of a bunch of small sections of code, both from setup and from render. In both cases, the code is run in primary mode, so the timing shouldn't be affected by anything else.
Example output (truncated):
SETUP: Cycles for 'profileTest_empty': 448 45 445 32 32 32 32 32
SETUP: Cycles for 'profileTest_sin': 32662 14691 14593 14479 14477 14583 14437 14465
SETUP: Cycles for 'profileTest_sinf': 14821 14476 14531 14585 14421 14502 14584 14506
SETUP: Cycles for 'profileTest_sinf_neon': 17413 8800 8759 8763 8765 9009 8808 8760
RENDER: Cycles for 'profileTest_empty': 413 44 231 44 44 234 45 244
RENDER: Cycles for 'profileTest_sin': 18533 14989 14824 15317 15410 15106 14623 15879
RENDER: Cycles for 'profileTest_sinf': 16995 16176 15500 16080 15668 15705 14833 15023
RENDER: Cycles for 'profileTest_sinf_neon': 9848 9463 8850 8850 9274 9424 9093 9043
The 'SETUP' results are the cycle counts for running the named function a bunch of times in primary mode from setup() using AuxTasks.
The 'RENDER' results are the same directly from within the render thread, for comparison.
I am surprised at how much variation there is in the counts from run to run, especially between the first and second runs of 'profileTest_sinf_neon', where the first takes almost an additional 9k cycles.
The tests all run in primary mode, so nothing else should be interrupting them.
If anyone can explain sources of this variation, I'd be interested.
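Whatever the cause, one common way to compare functions despite run-to-run variation is to look at the minimum and median of several runs rather than any single run. The sketch below is only a suggestion and is not part of the linked test project; `CycleSummary` and `summariseCycles` are made-up names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of a set of cycle counts.
struct CycleSummary {
    uint32_t min;
    uint32_t median;
    uint32_t max;
};

// Sort a copy of the samples and pick out min, median and max.
// For an even number of samples the median is the midpoint of the
// two middle values (rounded down). Returns zeros for empty input.
CycleSummary summariseCycles(std::vector<uint32_t> samples)
{
    if (samples.empty())
        return {0u, 0u, 0u};
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    const uint32_t median = (n % 2 == 1)
        ? samples[n / 2]
        : samples[n / 2 - 1] + (samples[n / 2] - samples[n / 2 - 1]) / 2;
    return {samples.front(), median, samples.back()};
}
```

Fed the eight SETUP counts for 'profileTest_sinf_neon', this gives a minimum of 8759 and a median of 8782, so the 17413-cycle first run barely shifts the summary.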
Anyway, hope this is of use to someone!
Cheers - Steve