the smallest one that fails!
heavy on newest bela release
if i try "list" in gdb i get:
1 ../sysdeps/arm/crti.S: No such file or directory.
backtrace, then frame x to move to frame x of the ones listed. I will have a look at what you sent me later today.
i don't really have an idea what this tells me... but i get the feeling this is not too promising
root@bela:~/Bela/projects/reverb# gdb reverb
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from reverb...done.
(gdb) backtrace
No stack.
(gdb) run
Starting program: /root/Bela/projects/reverb/reverb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0xb693d450 (LWP 822)]
__hv_noteout: 0xd1d4ac2
__hv_ctlout: 0xe5e2a040
__hv_pgmout: 0x8753e39e
__hv_touchout: 0x476d4387
__hv_polytouchout: 0xd5aca9d1
__hv_bendout: 0xe8458013
__hv_midiout: 0x6511de55
: 0
Thread 1 "reverb" received signal SIGBUS, Bus error.
0x00041e48 in sLine_init ()
(gdb) list
1 ../sysdeps/arm/crti.S: No such file or directory.
(gdb) backtrace
#0 0x00041e48 in sLine_init ()
#1 0x0004474c in Heavy_bela::Heavy_bela (this=0xa2e58, sampleRate=44100,
poolKb=10, inQueueKb=2, outQueueKb=0)
at /root/Bela/projects/reverb/Heavy_bela.cpp:66
#2 0x000446c8 in hv_bela_new_with_options (sampleRate=44100, poolKb=10,
inQueueKb=2, outQueueKb=0) at /root/Bela/projects/reverb/Heavy_bela.cpp:50
#3 0x00067c14 in setup (context=0x8a9f0 <gContext>, userData=0x0)
at /root/Bela/projects/reverb/render.cpp:399
#4 0x00026b3c in Bela_initAudio ()
#5 0x00019312 in main ()
Warning: the current language does not match this frame.
(gdb) frame 0
#0 0x00041e48 in sLine_init ()
(gdb) frame 1
#1 0x0004474c in Heavy_bela::Heavy_bela (this=0xa2e58, sampleRate=44100,
poolKb=10, inQueueKb=2, outQueueKb=0)
at /root/Bela/projects/reverb/Heavy_bela.cpp:66
66 numBytes += sLine_init(&sLine_Eco7brLC);
(gdb) frame 2
#2 0x000446c8 in hv_bela_new_with_options (sampleRate=44100, poolKb=10,
inQueueKb=2, outQueueKb=0) at /root/Bela/projects/reverb/Heavy_bela.cpp:50
50 return new Heavy_bela(sampleRate, poolKb, inQueueKb, outQueueKb);
(gdb) frame 3
#3 0x00067c14 in setup (context=0x8a9f0 <gContext>, userData=0x0)
at /root/Bela/projects/reverb/render.cpp:399
399 gHeavyContext = hv_bela_new_with_options(context->audioSampleRate, 10, 2, 0);
(gdb) frame 4
#4 0x00026b3c in Bela_initAudio ()
(gdb) frame 5
#5 0x00019312 in main ()
(gdb)
well, so it's indeed an alignment issue as in the earlier issue. First off, I should have asked you to compile with CPPFLAGS="-O0 -g" CFLAGS="-O0 -g" (as it affects both C++ and C files), then you'd be able to see:
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0xb693c450 (LWP 28552)]
__hv_noteout: 0xd1d4ac2
__hv_ctlout: 0xe5e2a040
__hv_pgmout: 0x8753e39e
__hv_touchout: 0x476d4387
__hv_polytouchout: 0xd5aca9d1
__hv_bendout: 0xe8458013
__hv_midiout: 0x6511de55
: 0
>
Thread 1 "reverbtest" received signal SIGBUS, Bus error.
sLine_init (o=0xa6d48) at /root/Bela/projects/reverbtest/HvSignalLine.c:32
32 o->x = vdupq_n_f32(0.0f);
Then you'd inspect the address of o->x:
(gdb) print &o->x
$3 = (float32x4_t *) 0xa6d58
(gdb)
and you see that it is a float32x4_t which should be aligned to 16 bytes, but is actually aligned to 8 bytes.
Turns out the problem is that the whole Heavy_bela object should be aligned to 16 bytes, but it is not. I will have to dig deeper into this.
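To make the arithmetic behind that reading explicit, here is a minimal sketch (the helper name isAligned is made up for illustration, it is not part of Heavy or Bela). It checks the address printed by gdb against both alignments at compile time:

```cpp
#include <cstddef>
#include <cstdint>

// True if addr is a multiple of alignment (alignment must be a power of two).
constexpr bool isAligned(std::uintptr_t addr, std::size_t alignment)
{
    return (addr & (alignment - 1)) == 0;
}

// Address of o->x as printed by gdb in the session above.
constexpr std::uintptr_t kAddrOfX = 0xa6d58;

// 0x58 is 88 in decimal: a multiple of 8 but not of 16.
static_assert(isAligned(kAddrOfX, 8), "0xa6d58 is a multiple of 8");
static_assert(!isAligned(kAddrOfX, 16), "0xa6d58 is not a multiple of 16");
```

If the file compiles, both statements about 0xa6d58 hold.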
giuliomoro I will have to dig deeper into this.
i see, so would one option be to change the heavy generated c files before compiling? or is there no way around the bug in clang?
thanks for looking into this and i guess i will stay with gcc for the moment.
digging deeper, it seems it really is an issue with clang.
This simple test program
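#include <arm_neon.h>
#include <stdio.h>
#include <stdint.h>
#include <assert.h>
class MyClass {
public:
    float32x4_t c;
};
int main()
{
    // alignof yields a size_t, so it is printed with %zu
    printf("alignof(MyClass): %zu\n", alignof(MyClass));
    printf("alignof(float32x4_t): %zu\n", alignof(float32x4_t));
    for(int n = 0; n < 100; ++n)
    {
        // intentionally never deleted, so every iteration gets a new address
        auto ptr = new MyClass;
        printf("ptr: %p\n", (void*)ptr);
        printf("c: %p\n", (void*)&ptr->c);
        // fails as soon as new returns a pointer that is not aligned for MyClass
        assert(((uintptr_t)ptr & (alignof(MyClass) - 1)) == 0);
    }
    return 0;
}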
fails with clang. It works with gcc, but then gcc claims that alignof(float32x4_t) is 8 instead of 16, which I don't understand ...
So, a current workaround would be to override operator new in the Heavy_bela class, adding the following to the Heavy_bela class declaration in Heavy_bela.hpp:
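#include <new>
#include <cstdlib> // aligned_alloc, free
....
class Heavy_bela {
...
    void* operator new(size_t sz) {
        // aligned_alloc expects a size that is a multiple of the alignment
        sz = (sz + 15) & ~(size_t)15;
        auto ptr = aligned_alloc(16, sz);
        if(!ptr)
            throw std::bad_alloc();
        return ptr;
    }
    // memory obtained from aligned_alloc is released with free
    void operator delete(void* ptr) {
        free(ptr);
    }
...
};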
however, the issue is fairly worrying and I need to find out if this is related to the version of clang, and if there is a more up-to-date one we can use instead.
If this turns out to be needed, we could add the change to the hvcc template files.
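For anyone who wants to try the workaround without a Bela board, here is a minimal sketch of the same idea on a stand-in class. Vec4Holder and allAllocationsAligned are hypothetical names, and alignas(16) float[4] only imitates a float32x4_t member; this is not the code that goes into Heavy_bela.hpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h> // aligned_alloc, free
#include <vector>

// Hypothetical stand-in for a class that holds a float32x4_t member.
struct Vec4Holder {
    alignas(16) float lanes[4];

    static void* operator new(std::size_t sz)
    {
        // Round the size up to a multiple of the alignment for aligned_alloc.
        sz = (sz + 15) & ~static_cast<std::size_t>(15);
        void* ptr = aligned_alloc(16, sz);
        if(!ptr)
            throw std::bad_alloc();
        return ptr;
    }
    static void operator delete(void* ptr) noexcept
    {
        free(ptr); // matches aligned_alloc
    }
};

// Allocates `count` objects, keeps them alive so the addresses differ,
// and reports whether every one of them was 16-byte aligned.
bool allAllocationsAligned(int count)
{
    std::vector<Vec4Holder*> objects;
    bool ok = true;
    for(int n = 0; n < count; ++n)
    {
        Vec4Holder* p = new Vec4Holder;
        objects.push_back(p);
        ok = ok && (reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    }
    for(Vec4Holder* p : objects)
        delete p;
    return ok;
}
```

Calling allAllocationsAligned(100) mirrors the loop of the test program above, with the class-specific operator new in place.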
While I was looking into the performance align I noticed there is a Clang command line flag to set the default new() align, might be worth looking at that?
What I found is an ancient discussion among gcc devs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15795 . They mention that according to the C++ standard, the alignment of the pointer returned by operator new is not guaranteed to suit over-aligned types. As such, the behaviour observed with clang is the expected behaviour; however, this makes all C++ code using the default operator new for classes with intrinsics unusable on Linux, where (at least on the board) the underlying call to malloc() returns values aligned to 8 bytes. On Mac (and I shall assume iOS), malloc() always returns a pointer aligned to 16 bytes, making this not an issue for many intrinsics.
So yes, please, if you found a suitable flag for clang, tell us!
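To see what malloc() actually hands out on a given system, a small measurement along these lines could help. It is only a sketch: observedMallocAlignment is a made-up helper, and the result depends on the C library and on the requested size, so no particular value is claimed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Largest power of two (capped at `cap`) that divides every address
// returned by `count` calls to malloc(size).
std::size_t observedMallocAlignment(int count, std::size_t size, std::size_t cap = 4096)
{
    std::vector<void*> blocks;
    std::size_t alignment = cap;
    for(int n = 0; n < count; ++n)
    {
        void* p = std::malloc(size);
        if(!p)
            break;
        blocks.push_back(p); // keep the block so the next call returns a new address
        const auto addr = reinterpret_cast<std::uintptr_t>(p);
        while(alignment > 1 && (addr & (alignment - 1)) != 0)
            alignment /= 2;
    }
    for(void* p : blocks)
        std::free(p);
    return alignment;
}
```

Printing observedMallocAlignment(100, 64) next to alignof(std::max_align_t) and alignof(float32x4_t) on the board should show whether the vector type asks for more than malloc() provides.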
Try:
-faligned-new=16
looks like that option is not supported on the clang / gcc on the board. It was probably introduced more recently.
Ah, what version of clang is on the board?
It's 3.9 on the board. clang 6.0 seems to have it, and you can install it with apt-get from stretch backports (though I haven't tried: I just downloaded it from llvm.org).
Haven't managed to make it produce the desired code yet, though.
For whatever reason, on clang-3.9, alignof(float32x4_t) is 16 bytes, while on clang-6 (and on gcc-6.3), it is 8 bytes. The clang-3.9 vs clang-6 difference is particularly weird, because their respective arm_neon.h includes are very similar ....
I'm a bit confused!
I thought the original issue was that 3.9 wasn't aligning to 16?
this is for android (arm) but seems to talk about the clang problem:
AndyCap I thought the original issue was that 3.9 wasn't aligning to 16?
The original issue is that on clang-3.9, alignof(float32x4_t) is 16, and it (probably) generates assembly code that assumes the memory to be aligned. If the memory is not actually aligned to a 16-byte boundary, you get a SIGBUS.
However, when allocating with operator new an object of a class that contains one or more float32x4_t, there is no guarantee that these will actually be allocated with the appropriate alignment. The underlying issue is that according to the C++ standard, the alignment of a pointer returned by operator new is not guaranteed to suit over-aligned types. Hence the workaround above with a custom allocator. It seems that C++17 offers some more clever support for aligned heap allocation, but I didn't get into the details of that.
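For reference, the C++17 support mentioned here is the std::align_val_t overload of operator new. Below is a minimal sketch, assuming a C++17 compiler and standard library (which, going by this thread, the clang-3.9 on the board does not provide). Vec4, makeAligned and destroyAligned are hypothetical names, with Vec4 standing in for a class with a float32x4_t member.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for a class with a float32x4_t member.
struct alignas(16) Vec4 {
    float lanes[4];
};

// Explicitly request 16-byte aligned storage, then construct in place.
Vec4* makeAligned()
{
    void* raw = ::operator new(sizeof(Vec4), std::align_val_t{16});
    return new(raw) Vec4{};
}

// Storage from the aligned overload must go back through the aligned delete.
void destroyAligned(Vec4* p)
{
    p->~Vec4();
    ::operator delete(p, std::align_val_t{16});
}
```

In C++17 a plain new Vec4 is also routed to the aligned overload automatically whenever alignof(Vec4) exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__, so the explicit calls are spelled out here only to show what happens underneath.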
What is surprising is that while alignof(float32x4_t) is 16 bytes on clang-3.9, it is actually 8 bytes on gcc-6.3 and clang-6.0. I guess that this could explain both:
- the performance penalty for gcc-6.3 vs clang-3.9 when using intrinsics
- the lack of SIGBUS on gcc-6.3: the generated assembly does not make assumptions about alignment that will then be neglected at runtime.
Seeing how alignof(float32x4_t) has become 8 bytes in clang-6.0, I am afraid this may also come with a performance penalty (but I haven't verified it yet). However, I am wondering where that difference comes from, as the arm_neon.h files for the two versions of clang provide identical definitions for all the neon vector types.
Incidentally, given how the change in alignof can also change the size of structs and classes, this suggests that header files for libraries should abstract away such classes and structs, so that the library will work fine even if the compiler used to compile it is different from the one used to build the user code that uses it.
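A small sketch of that last point, using made-up types (Vec8, Vec16 and Voice are hypothetical; the alignas values imitate the two alignof results reported above, and a 4-byte float is assumed):

```cpp
#include <cstddef>

// Same 16-byte payload, different alignment requirement.
struct Vec8  { alignas(8)  float lanes[4]; };
struct Vec16 { alignas(16) float lanes[4]; };

// A struct as it might appear in a library header.
template <typename Vec>
struct Voice {
    char enabled; // one byte, followed by padding up to the alignment of Vec
    Vec state;
};

static_assert(sizeof(Vec8) == sizeof(Vec16), "the payload itself has the same size");
static_assert(sizeof(Voice<Vec8>) < sizeof(Voice<Vec16>), "but the enclosing struct grows with the alignment");
```

With 8-byte alignment the state member sits at offset 8 and the struct takes 24 bytes; with 16-byte alignment it sits at offset 16 and the struct takes 32. Two compilers that disagree on alignof would therefore disagree on the layout of the same header.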
so where exactly do i put this in the file? in the public part of the class? sorry if this is obvious....
yes
giuliomoro thanks, this works!
Did some tests for CPU usage with the three compilers. On a v0.3.6b image:
Techno-world:
heavy/gcc-6.3 25%
heavy/clang-3.9 21%
heavy/clang-6.0 24%
Your reverbtest patch:
heavy/gcc-6.3 16%
heavy/clang-3.9 18%
heavy/clang-6.0 16%
noting that, in both cases, clang-3.9 needs the fix above in order to run.
The fact that clang-6.0 is slower on techno-world is pretty annoying, though.
giuliomoro interesting...
so is that the value the IDE shows, or how do you measure? might need to check on my actual granular patch...
It's actually the value you see by doing
watch -n 0.5 cat /proc/xenomai/sched/stat
which is what the IDE grabs and displays. Running the above in the terminal is more lightweight than through the IDE, so you can run it even if the CPU usage is so high that the IDE is not responsive.
ok, so i notice something funny:
when the IDE is not running my patch is running at 69 to 70 %, when i run the IDE that value fluctuates from 58 to 70 %
The CPU usage indicator in the IDE takes into account all the threads of the program, and both their Xenomai CPU usage (as shown in /proc/xenomai/sched/stat) and their Linux CPU usage (as shown by the top command). So for instance, if you have the scope window open, and you are using the Scope, then the CPU usage will differ between the IDE and the terminal, but in the IDE it should be higher. So I don't think that explains your case.
Is there anything in the patch that would change the CPU usage over time? Wanna send it over?
giuliomoro Is there anything in the patch that would change the CPU usage over time? Wanna send it over?
i don't think so. also when the IDE is not running the CPU usage stays the same all the time. note that i used the "watch" method in both cases (with and without IDE) to measure usage.
sure, i'll send the patch.
I have included the aligned new allocator in the heavy template file on my repo: https://github.com/giuliomoro/hvcc/commit/35a5bda6063025e2fc99904e1c47702152924c08 so now the fix will be automatically applied to newly built projects. Also fixed behaviour (on the Bela side) when the expected MIDI interface is not connected (https://github.com/BelaPlatform/Bela/commit/19349c4760b03e19fb3dfd343b604cebd1c506f9 ).
As for the performance of your patch, it runs at about 26% for me, not sure if it's because I don't have any buttons/MIDI hooked up.
what? 26% with clang? which version? how could not connected buttons have an influence on cpu load? i will check out your repo and see. since i have a spare bela i can also try with a bareboard
that's clang-3.9, stock compiler on the board. Maybe you still have some files that were built with -g -O0 above? -g adds debugging info to the file, but -O0 disables all optimizations. Maybe try cleaning the project (delete the build/ folder in there, or equivalently run make -C ~/Bela PROJECT=projectName clean) and rebuild?
lokki ? how could not connected buttons have an influence on cpu load?
I don't know if there is something you do with buttons and MIDI that changes the CPU processing? Your patch is too complicated for me to understand it at a glance!
giuliomoro I don't know if there is something you do with buttons and MIDI that changes the CPU processing? Your patch is too complicated for me to understand it at a glance!
sorry, of course it is a mess actually
i only start writing to tables and change which tables are played back with the buttons. midi is only for note input to change playback speed and aftertouch to scroll through the table when in "scrub" mode. i am now recompiling with your new hvcc repo, and i will disconnect all midi controllers to see if it makes a difference in load. but my CPU was a steady 69 to 70 % regardless of any midi activity or button presses. that is why i was so astonished, the simple fact that a midi device and buttons are connected should not make that much of a difference.
EDIT: ahem stupid me i guess, should i be looking at the bela-audio load? that is as you suggested 26-28% i was looking at the ROOT process which takes 70%..
Every 0.5s: cat /proc/xenomai/sched/stat bela: Wed May 8 19:10:56 2019
CPU PID MSW CSW XSC PF STAT %CPU NAME
0 0 0 4353657 0 0 00018000 70.0 [ROOT]
0 1681 9 11 26 0 000600c0 0.0 granular
0 1691 2 3 4 0 000480c0 0.0 bela-midiIn_hw:0,0,0
0 1692 2 3 4 0 000480c0 0.0 bela-midiOut_hw:0,0,0
0 1693 5 761787 761794 0 00048046 28.1 bela-audio
0 0 0 2427976 0 0 00000000 0.7 [IRQ16: [timer]]
0 0 0 380980 0 0 00000000 0.7 [IRQ180: rtdm_pruss_irq_irq]
does that look reasonable? does ROOT just take all that is left to get to 100% then? (i seem to remember some discussion in the forum between you and thetechnobear where this came up as well)
lokki EDIT: ahem stupid me i guess, should i be looking at the bela-audio load? that is as you suggested 26-28% i was looking at the ROOT process which takes 70%..
Yes!
lokki ? does ROOT just take all that is left to get to 100% then?
yes. That is the time dedicated to Linux (including threads from your Bela program that run in secondary mode). For reference, if you were using 70% of the CPU for audio, the IDE would basically stop working, and commands on the terminal would be fairly sloppy.
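To avoid reading the wrong row again, the figure for one thread can be pulled out of that text programmatically. This is only a sketch: cpuPercentFor is a made-up helper, and the column layout (seven fields, then %CPU, then NAME) is assumed from the output pasted above rather than taken from any Xenomai documentation.

```cpp
#include <sstream>
#include <string>

// Returns the %CPU value of the row whose NAME column equals `name`,
// or -1 if no such row is found.
double cpuPercentFor(const std::string& statText, const std::string& name)
{
    std::istringstream lines(statText);
    std::string line;
    while(std::getline(lines, line))
    {
        std::istringstream fields(line);
        std::string skipped;
        bool ok = true;
        // CPU PID MSW CSW XSC PF STAT come before the %CPU column
        for(int n = 0; n < 7 && ok; ++n)
            ok = static_cast<bool>(fields >> skipped);
        double cpu = 0;
        if(!ok || !(fields >> cpu))
            continue; // header line or a line with too few fields
        // NAME may contain spaces, e.g. "[IRQ16: [timer]]", so take the rest of the line
        std::string rowName;
        std::getline(fields >> std::ws, rowName);
        if(rowName == name)
            return cpu;
    }
    return -1;
}
```

Fed with the contents of /proc/xenomai/sched/stat, cpuPercentFor(text, "bela-audio") gives the audio thread load, while cpuPercentFor(text, "[ROOT]") gives the share left to Linux.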
ah good. sorry for the noise then
one more question regarding cpu load:
i get some underrun one block dropped messages when i run my patch. however i can never ever hear any dropouts, could those messages be wrong or am i just lucky i never hear them (maybe because all changes to table playback speed and direction only happen via a samplehold~ of the controlling phasor)