OK, this is a bit of a repeat of a previous post, but I wanted to dive into it a bit more...
Background:
A while back I ported the Mutable Instruments modules to Pd externals and released them for the Organelle. Naturally I'm now using them on Bela, and I'm planning to release them for Bela and Salt!
Source code (including makefile) is here:
https://github.com/TheTechnobear/Mi4Pd
The issue is that I'm seeing pretty poor performance on Bela.
Let's take one external as an example: rngs~, aka Rings.
At a high level, the Organelle and Bela both use a 1 GHz single-core ARM, so they are 'similar'.
On an Organelle I can run 5 instances and CPU reaches around 80% or less, with no audio issues, and the Organelle stays responsive.
(pd -audiobuf 4, i.e. 4 ms, so ~176 samples?)
On Bela/Salt, 2 instances bring it to its knees: unresponsive etc.
(libpd with -p128; I tried 256 and it made little difference really, still nowhere near Organelle levels)
Let's also consider that the Rings code is written to run on an STM32F4 at 168 MHz, so really the Organelle is more in the ballpark of what I'd expect. (IIRC the STM32F4 is a Cortex-M4 with a single-precision FPU, not NEON.)
Bela is performing more like a single 168 MHz F4; surely that's not right?
OK, there are some differences:
Organelle: Cortex-A9 at 1 GHz, running at 44100 Hz SR, pd -audiobuf 4 = ~176 samples
Bela: Cortex-A8 at 1 GHz, running at 48000 Hz SR, 128 samples
But I really don't think this accounts for a ~3x difference in performance.
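To put rough numbers on those differences, here's a minimal sketch of the arithmetic: converting -audiobuf from ms to samples, the real-time deadline per block, and the per-second workload ratio. The helper names are just for illustration; it assumes -audiobuf is in milliseconds, and it ignores differences between the A9 and A8 cores themselves (e.g. the A9 is out-of-order, the A8 in-order).

```cpp
#include <cassert>
#include <cmath>

// Pd's -audiobuf is given in milliseconds; convert to whole samples (rounded down).
inline int msToSamples(double ms, double sampleRate) {
    assert(ms >= 0.0 && sampleRate > 0.0);
    return static_cast<int>(std::floor(ms * sampleRate / 1000.0));
}

// Real-time deadline for rendering one block, in microseconds.
inline double blockDeadlineUs(int blockSize, double sampleRate) {
    assert(blockSize > 0 && sampleRate > 0.0);
    return 1e6 * blockSize / sampleRate;
}

// How many more samples per second platform B must compute than platform A.
// Block size cancels out: per-second work depends only on the sample rate.
inline double workloadRatio(double sampleRateA, double sampleRateB) {
    assert(sampleRateA > 0.0 && sampleRateB > 0.0);
    return sampleRateB / sampleRateA;
}
```

With these, msToSamples(4, 44100) gives 176, the deadlines come out at roughly 3991 µs (Organelle) vs 2667 µs (Bela), and workloadRatio(44100, 48000) is about 1.09. The shorter Bela deadline is matched by fewer samples per block, so the configuration alone accounts for roughly 9% more work per second; only fixed per-callback costs grow with the smaller block.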
Also bear in mind that the Organelle is also running another 'mother' process and OSC for the display and 4 pots... and is running a 'vanilla' Arch Linux distro.
I don't think the difference is in Pd either...
When running in libpd or pd, it's the rngs~ external that is eating CPU. The overhead of Pd is going to be negligible, especially when running multiple instances, as Pd is really only glue that calls the render function on the external.
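The "Pd is just glue" argument can be sketched with a simplified host model. This is not Pd's actual DSP chain (real externals register t_int* perform routines); PerformFn and runHost are hypothetical names. It assumes each DSP tick calls every instance's perform routine once on a fixed-size block.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for an external's perform routine: processes n samples in place.
using PerformFn = std::function<void(float* buf, int n)>;

// Simplified host "glue": for each DSP tick, call every instance's perform
// routine once on a tickSize-sample block. Returns the number of calls made.
inline long runHost(const std::vector<PerformFn>& instances, std::vector<float>& buf,
                    int ticks, int tickSize) {
    assert(tickSize > 0 && static_cast<int>(buf.size()) >= tickSize);
    long calls = 0;
    for (int t = 0; t < ticks; ++t) {
        for (const auto& perform : instances) {
            perform(buf.data(), tickSize);  // all the real DSP work happens in here
            ++calls;
        }
    }
    return calls;
}
```

At 48 kHz with 64-sample ticks there are 750 ticks per second, so two instances cost 1,500 dispatches per second. If the host works roughly like this, its overhead is per call rather than per sample, which is why it should be small next to the DSP inside each call.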
Compiler options: OK, you can see these in the CMakefile...
I've got the recommended ones for Bela, but they don't help, and actually make little difference compared to using the same ones I use for the Organelle.
Audio mode switches: there are none after the initial startup.
Monitoring
I'm using 'top' on the Organelle, and looking at /proc/xenomai/sched/stat on Bela.
On Bela I can see most of the load (87%) is on the Bela audio thread.
So... I'm really wondering what to look at next.
Compiler options:
I don't think this is the cause.
libpd:
Perhaps some oddity with buffer size?
I'm thinking of getting rngs~ to report the buffer size and then comparing what I see on the Organelle and Bela,
but the code copes with different buffer sizes.
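One common way a wrapper copes with arbitrary host buffer sizes is to split each host block into chunks no larger than the DSP core's internal block. This is a minimal sketch under that assumption, not necessarily how Mi4Pd does it; kInternalBlock = 24 and renderChunked are hypothetical, and the real Rings block size may differ.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical internal block size of the DSP core (the real value may differ).
constexpr int kInternalBlock = 24;

// Render an arbitrary host buffer by splitting it into internal-sized chunks.
// `render` stands in for the DSP core's process call. Returns the chunk count.
template <typename RenderFn>
int renderChunked(float* out, int hostBlock, RenderFn render) {
    assert(hostBlock >= 0);
    int chunks = 0;
    for (int offset = 0; offset < hostBlock; offset += kInternalBlock) {
        // The last chunk may be shorter than kInternalBlock.
        const int n = std::min(kInternalBlock, hostBlock - offset);
        render(out + offset, n);
        ++chunks;
    }
    return chunks;
}
```

With this scheme a 176-sample block becomes 8 chunks and a 128-sample block 6 chunks, but the total number of samples rendered is the same. Buffer size would then change the call count, not the per-sample work.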
It's more effort, but I could compile Rings as a native Bela render.cpp.
I'd only be able to run one instance, but I could see whether the load for one instance was significantly different from running one instance inside libpd (it shouldn't be)... but it's a few hours of work that I'd probably 'bin' later.
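A cheaper middle ground might be a plain user-space timing harness that calls the same render function on both boards, outside Pd/libpd entirely. This is a sketch under assumptions: measureLoad is a hypothetical name, it measures wall-clock time with std::chrono::steady_clock on a normal (non-Xenomai) thread, so it won't match /proc/xenomai/sched/stat exactly, but it should show whether raw per-block cost differs by ~3x.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Average wall-clock time of `render` per block, as a fraction of the block
// deadline (blockSize / sampleRate). ~1.0 means one instance uses the whole budget.
template <typename RenderFn>
double measureLoad(RenderFn render, int blockSize, double sampleRate, int iterations) {
    assert(blockSize > 0 && sampleRate > 0.0 && iterations > 0);
    std::vector<float> buf(static_cast<std::size_t>(blockSize), 0.0f);
    volatile float sink = 0.0f;  // read the output so the work isn't optimised away
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        render(buf.data(), blockSize);
        sink = buf[0];
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    const double perBlockSec = elapsed.count() / iterations;
    const double deadlineSec = blockSize / sampleRate;
    return perBlockSec / deadlineSec;
}
```

Running it with the same render call, block size and sample rate on the Organelle and Bela would compare the CPUs and compiler output directly, with Pd taken out of the picture.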
Also, more generally, I'd say this thread is not about this one case...
It feels like libpd performance on a number of my patches is actually quite poor, so perhaps there is something amiss in the libpd render that can be fine-tuned... or something in the compile options.
I'm really not sure.
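When looking at the libpd render, it may help to remember that Pd computes DSP in fixed ticks, 64 samples by default (libpd reports the value via libpd_blocksize()), and a host typically passes a tick count to libpd_process_float(ticks, in, out). The sketch below (planTicks and TickPlan are hypothetical names) shows how a host block maps onto ticks.

```cpp
#include <cassert>

// Pd's default DSP tick size; libpd_blocksize() reports the actual value.
constexpr int kPdTick = 64;

struct TickPlan {
    int ticks;     // whole Pd ticks that fit in one host callback
    int leftover;  // samples that would need buffering across callbacks
};

inline TickPlan planTicks(int hostFrames, int pdTick = kPdTick) {
    assert(hostFrames > 0 && pdTick > 0);
    return {hostFrames / pdTick, hostFrames % pdTick};
}
```

Bela's 128 and 256 frames map to exactly 2 and 4 ticks with nothing left over, so the per-tick work is the same either way. Assuming default settings, the ~176 samples from -audiobuf on the Organelle is a latency buffer rather than the size Pd computes in; Pd still runs 64-sample ticks there too.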
Finally, it's quite possible I'm missing something,
perhaps in how I'm measuring things, my expectations, or something else?
So really I'm here just to get some ideas of what to look at / try...
It's not a criticism... I'm really just seeing if we can improve things.
Thoughts on what to try?
Footnotes
Testing...
With the above source, you can build and then just throw on a Pd patch; if you'd like me to put together a test patch I can do that too.
The last tests were on the Salt from the workshop, which is running 3.3?
(PS: for this build, you should remove your global GitHub credentials.)