Using Bela as a controller for MRP

giuliomoro · Jan 19, 2019

wa3573 I actually stuck with boost for the circular_buffer, even though I had already implemented one; if it's not broken, why fix it?

A good reason would be to avoid the dependency on boost, especially if that is the only component you are using from the whole library. Looking at the API of boost::circular_buffer, it should be fairly easy to replicate with std::vector and an extra index.
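As a rough illustration of that suggestion, here is a minimal sketch of a fixed-capacity ring buffer built on std::vector plus a head index and an element count. It only covers a small subset of boost::circular_buffer (push_back overwriting the oldest element when full, indexed access from oldest to newest, size/capacity/full); the class name RingBuffer is hypothetical, and it assumes T is default-constructible because the storage is allocated once, up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal replacement for a subset of boost::circular_buffer.
// Storage is allocated once in the constructor; push_back() never allocates.
template <typename T>
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity) : data_(capacity) {
        assert(capacity > 0);
    }

    // Append an element; when the buffer is full, overwrite the oldest one.
    void push_back(const T& value) {
        data_[(head_ + size_) % data_.size()] = value;
        if (size_ < data_.size())
            ++size_;
        else
            head_ = (head_ + 1) % data_.size(); // oldest element was overwritten
    }

    // Index 0 is the oldest element, size() - 1 the newest.
    T& operator[](std::size_t i) { return data_[(head_ + i) % data_.size()]; }
    const T& operator[](std::size_t i) const { return data_[(head_ + i) % data_.size()]; }

    T& front() { return (*this)[0]; }
    T& back() { return (*this)[size_ - 1]; }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return data_.size(); }
    bool empty() const { return size_ == 0; }
    bool full() const { return size_ == data_.size(); }
    void clear() { head_ = 0; size_ = 0; }

private:
    std::vector<T> data_;
    std::size_t head_ = 0; // position of the oldest element in data_
    std::size_t size_ = 0; // number of valid elements
};
```

Because the storage never grows, this is also safe to use from a thread that must not allocate memory.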

wa3573 cut down on CPU load

One place where I think the original TK software is pretty inefficient is that a new mapping object is created every time the key position exceeds a given threshold. I'd suggest pre-allocating all the objects and only calling Trigger() once the key position exceeds a given threshold, until the key press is completed.
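A minimal sketch of that idea, assuming one mapping object per key: everything is allocated when the bank is constructed, an idle key costs nothing, and Trigger() is only called between the press threshold and the end of the press. KeyMapping, MappingBank, the hysteresis (a separate release threshold marking "press completed") and the Trigger(float) signature are all hypothetical stand-ins, not the actual TouchKeys classes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical per-key mapping; Trigger() stands in for the real mapping update.
class KeyMapping {
public:
    void reset() { triggerCount_ = 0; lastPosition_ = 0.f; }
    void Trigger(float position) { lastPosition_ = position; ++triggerCount_; }
    unsigned triggerCount() const { return triggerCount_; }
private:
    float lastPosition_ = 0.f;
    unsigned triggerCount_ = 0;
};

// All mappings are allocated up front, so nothing is created or destroyed
// while keys are being played.
template <std::size_t NumKeys>
class MappingBank {
public:
    MappingBank(float pressThreshold, float releaseThreshold)
        : pressThreshold_(pressThreshold), releaseThreshold_(releaseThreshold) {}

    void update(std::size_t key, float position) {
        assert(key < NumKeys);
        if (!active_[key]) {
            if (position < pressThreshold_)
                return;                 // idle key: no work at all
            mappings_[key].reset();     // new press: reuse the pre-allocated object
            active_[key] = true;
        } else if (position < releaseThreshold_) {
            active_[key] = false;       // press completed: stop triggering
            return;
        }
        mappings_[key].Trigger(position);
    }

    const KeyMapping& mapping(std::size_t key) const { return mappings_[key]; }
    bool isActive(std::size_t key) const { return active_[key]; }

private:
    std::array<KeyMapping, NumKeys> mappings_{};
    std::array<bool, NumKeys> active_{}; // value-initialized to false
    float pressThreshold_;
    float releaseThreshold_;
};
```

Using a release threshold lower than the press threshold is one way of deciding when the press is "completed" without the key flickering between active and idle; the actual criterion in TouchKeys may differ.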

wa3573 Since I used the pthread library to retrofit the TouchKeys source code

How many threads do you have currently?

wa3573 Since I used the pthread library to retrofit the TouchKeys source code, I assume I could just use the Xenomai wraps as is done here

The use of plain or __wrap() threads really depends on what each thread is doing. What I normally do, as a general rule, is:
- a thread with real-time requirements, and which is real-time-safe, should be created as a Xenomai thread (using __wrap_pthread... functions)
- a thread that is not real-time safe (e.g.: it accesses Linux drivers, does disk or network or USB I/O) should be created as a regular thread (using pthread_... functions).
However, I am not 100% positive that this is the best approach in all cases.

Things become more complicated when using synchronization primitives: any Xenomai thread should only use mutexes, condition variables, semaphores and message queues that are themselves managed by Xenomai (i.e.: created, modified and accessed with the __wrap_ functions). Failing to do so would cause a "mode switch", that is, the thread would temporarily turn into a Linux thread, thus losing real-time guarantees. You definitely don't want this for threads that have real-time requirements. On the other hand, non-Xenomai threads cannot access synchronization primitives that were created by Xenomai.
There is only one synchronization primitive, which also works as a message-passing mechanism, that allows Xenomai threads to communicate with non-Xenomai threads (even across processes) without causing a mode switch in the Xenomai thread: XDDP (cross-domain datagram protocol). This is what we use, e.g., in the Midi and AuxTaskNonRT classes. Check out the Xenomai XDDP examples for more details.

So, I think (but I haven't benchmarked it) that XDDP is the fastest way of communicating between an RT and a non-RT task, where the former is a Xenomai task and the latter is not. However, it makes the code less portable, because it relies on this custom protocol.
As an alternative, if a real-time thread and a non-real-time thread need to share resources, I sometimes use an unorthodox approach: make them both Xenomai threads. You can keep the priority of the non-real-time thread at 0, and this thread will switch automatically between primary ("Xenomai RT-safe") and secondary ("Linux") mode whenever needed. E.g.: a thread that does serial I/O but also shares a mutex with the audio thread would switch to primary mode when it tries to get the mutex, and would switch back to secondary mode when doing I/O. This comes with a performance penalty: every time the thread switches mode, it wastes some CPU cycles. I am not sure exactly how many; I think you lose about 20-40 microseconds for each mode switch.
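To put that per-switch cost into perspective, here is a back-of-the-envelope sketch assuming every mode switch costs a fixed amount of time: a thread that switches mode 1000 times per second at about 30 µs per switch would spend roughly 3% of Bela's single CPU core on switching alone. The function name is hypothetical, and the per-switch cost is the rough estimate quoted above, not a measurement.

```cpp
#include <cassert>
#include <cmath>

// Fraction of one CPU core spent on mode switches, assuming each switch costs
// a fixed time. microsecondsPerSwitch is an estimate (roughly 20-40 us).
double modeSwitchCpuFraction(double switchesPerSecond, double microsecondsPerSwitch) {
    return switchesPerSecond * microsecondsPerSwitch * 1e-6; // CPU seconds per second
}
```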

If you are planning on making all of your threads Xenomai threads, and all of your pthread_ calls Xenomai calls, then you could actually avoid writing the __wrap_ calls by hand altogether: Xenomai provides some linker flags that will just turn all of your pthread_ (and some more) calls into the equivalent Xenomai calls. See the documentation here (section: Under the hood: the --wrap flag). The Bela Makefile currently deliberately removes these linker flags (see the Makefile source where it assigns DEFAULT_XENOMAI_LDFLAGS), so you will have to re-add them for your build.

To sum up, I would encourage you to implement all you need using the regular pthread_ functions, then compile and link your application adding the appropriate LDFLAGS. Try to run it this way and see if it all works fine. If you have one thread that switches mode very often (you can monitor the MSW column in the thread stats in /proc/xenomai/sched/stat while the program is running), so much that it becomes a performance issue, then you may want to consider moving that thread (or those threads) to a regular Linux thread (calling __real_pthread_...), and use XDDP to communicate with it from an RT thread (if needed).
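If you want to watch the MSW column programmatically rather than by eye, here is a minimal parsing sketch. It assumes the file has a header row containing "MSW" and "NAME" columns, whitespace-separated fields, and the thread name as the last column; the column positions are looked up from the header rather than hard-coded. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct ThreadModeSwitches {
    std::string name;
    std::uint64_t msw; // mode switch count reported for this thread
};

// Parses text laid out like /proc/xenomai/sched/stat (layout assumed, see above).
std::vector<ThreadModeSwitches> parseModeSwitches(std::istream& in) {
    std::vector<ThreadModeSwitches> result;
    std::string line;
    if (!std::getline(in, line))
        return result;

    // Find the MSW and NAME columns in the header row.
    std::istringstream header(line);
    std::vector<std::string> columns;
    for (std::string col; header >> col;)
        columns.push_back(col);
    std::size_t mswCol = columns.size(), nameCol = columns.size();
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (columns[i] == "MSW") mswCol = i;
        if (columns[i] == "NAME") nameCol = i;
    }
    if (mswCol == columns.size() || nameCol == columns.size())
        return result; // unexpected format: give up

    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::vector<std::string> fields;
        for (std::string f; row >> f;)
            fields.push_back(f);
        if (fields.size() <= nameCol || fields.size() <= mswCol)
            continue; // skip blank or truncated lines
        std::string name = fields[nameCol];
        for (std::size_t i = nameCol + 1; i < fields.size(); ++i)
            name += " " + fields[i]; // rejoin names containing spaces
        result.push_back({name, std::stoull(fields[mswCol])});
    }
    return result;
}
```

In practice you would read the file twice (e.g. via std::ifstream, one second apart) and compare the counts: threads whose MSW keeps increasing are the ones switching modes.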

Hope this helps.

giuliomoro · Jan 19, 2019

Another note: I'd discourage using automatic symbol wrapping in combination with some standard C++ library classes (e.g.: std::thread, std::mutex). See here https://www.xenomai.org/pipermail/xenomai/2017-March/037223.html

wa3573 · Jan 20, 2019

giuliomoro A good reason would be to avoid the dependency on boost, especially if that is the only component you are using from the whole library. Looking at the API of boost::circular_buffer, it should be fairly easy to replicate with std::vector and an extra index.

I have actually built it successfully with the boost version of the buffer. It's a header-only include for that particular object. There are also some other uses of boost, like bind() and prior(), but those have std:: library counterparts that could replace them. I simply felt that, in this case, I could count on the reliability of boost more than on my own implementation. Either way, I still have my implementation (you can see it here, if you wish: https://github.com/wa3573/Drexel_MRP_Key_Scanner/blob/157246dea25403c95c2b11f91594e68a77b3a80c/serial-piano-scanner/Utility/circular_buffer.h), in case it turns out that boost is causing problems.

giuliomoro I'd suggest pre-allocating all the objects and only calling Trigger() once the key position exceeds a given threshold, until the key press is completed.

Indeed, good call.

giuliomoro How many threads do you have currently?

Currently there is the main program controller thread, an IO thread (which handles USB communication), an RGB LED control thread, a Mapping Scheduler thread and a future events Scheduler. The LED control thread will be disabled until we can verify the functionality of the rest of the program, and may never be enabled at all, depending on performance. The scheduling threads may turn out to be unnecessary, and it would be ideal to remove them.

giuliomoro The use of plain or __wrap() threads really depends on what each thread is doing.

Great, that clears things up well for me.

giuliomoro To sum up, I would encourage you to implement all you need using the regular pthread, then compile and link your application adding the appropriate LDFLAGS. Try to run it this way and see if it all works fine. If you have one thread that switches mode very often (you can monitor the MSW column in the threads stats in /proc/xenomai/sched/stat while the program is running), so much that it becomes a performance issue, then you may want to consider moving that (those) thread(s) to a regular Linux thread (calling __real_pthread...), and use XDDP to communicate with it from a RT thread (if needed).

That makes sense to me, I will definitely go this route and let you know how it turns out.

giuliomoro Another note: I'd discourage using automatic symbol wrapping in combination with some standard C++ library classes (e.g.: std::thread, std::mutex). See here https://www.xenomai.org/pipermail/xenomai/2017-March/037223.html

Yeah, I have stuck with only the pthread library, as I was under the impression the standard C++ library didn't necessarily play well with Xenomai, after reading this thread: https://forum.bela.io/d/530-audio-thread-mode-switch-std-thread/21 and this message: https://www.xenomai.org/pipermail/xenomai/2017-March/037220.html

Thanks again for your help, this has all been extremely useful.

giuliomoro · Jan 20, 2019

wa3573 Yeah, I have stuck with only the pthread class, as I was under the impression the standard C++ library didn't necessarily play well with Xenomai,

Don't get me wrong: you can use std::thread, std::condition_variable_any and std::mutex with no problem at all within a Xenomai application, as long as you do not enable automatic symbol wrapping. If you do, then most likely these will stop working, depending on what happened with symbol wrapping.

I did a limited re-implementation of std::mutex and std::condition_variable_any here that automatically turns the current thread to a Xenomai one if needed.

wa3573 · Feb 22, 2019

Hey Giulio, hope you're doing well. I am in the process of integrating Touchkeys with the Bela core, and am running into a strange issue. I compiled a test project (external, not in the IDE) containing a single source file (main.cpp), which is simply a combination of the default libpd_render.cpp and default_main.cpp files. I attempted to attach it to this reply, but the upload is not completing. The source compiles fine, but when running, it always crashes on the first call to new Midi() in openMidiDevice(). Here's some debug info:

[New Thread 0xb6987450 (LWP 724)]
Running Pd 0.48-2
Audio channels in use: 2
Analog channels in use: 8
Digital channels in use: 16
Thread 1 "bela_custom_mai" hit Breakpoint 1, openMidiDevice (name="hw:1,0,0", verboseSuccess=false, verboseError=false) at ../Main.cpp:86
86 Midi* newMidi = new Midi();
(gdb) step
bela_custom_main: malloc.c:2406: sysmalloc: Assertion '(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

When I swap new for malloc(), I get memory corruption at the same spot, every time. This is confusing to me, as I am not sure what could be causing memory corruption with the default renderer and default main files.

Could this have anything to do with the optimizations used (or not) when compiling? I believe I was still using -O0, but I will double-check whether -O3 yields the same results. Pd projects compiled in the IDE do not crash. Any advice or ideas would be appreciated, thanks!

giuliomoro · Feb 22, 2019

Hmm, not sure. I would guess there is something wrong before that line. You can paste the full file in your post, but 600 lines in here can be hard to parse, so I'd recommend you put it on GitHub and post a link here so we can review it there.

wa3573 · Feb 22, 2019

https://github.com/wa3573/misc/blob/master/Main.cpp

Indeed, I figure the problem is before the first call to openMidiDevice(). The only explicit memory allocation I see before that point is the malloc() at line 346, char* str = (char*)malloc(sizeof(char) * strSize);, but this is freed just afterwards.

I did not change anything from the default files, which is why I am stumped; I am thinking that perhaps a problem is being introduced during linking.

giuliomoro · Feb 22, 2019

That builds and runs fine for me. Do you know which version of the core code you have? The image number would also be of interest (grep v0 /etc/motd).

I checked that earlier memory allocation again and it looks good, can you try commenting out lines 343 to 352 and see if that changes anything?

Also, can you try compiling simply default_libpd_render.cpp in your project folder without the lines from main.cpp?

wa3573 · Feb 22, 2019

Interesting. What flags did you use for the compiler/linker? Just so we're on the same page. From what I understand, g++ on Bela is just an alias for arm-linux-gnueabihf-g++, correct? As in, it will link against those libraries without having to call that linker/compiler explicitly. I recently updated the core, around when we started this project, so it is recent. However, I will post that info in a little while.

I will try all of that and get back to you with the info shortly. However, if I do not include the lines from main.cpp, the linker complains about an undefined reference to main(), presumably because I am compiling this outside of the IDE.

giuliomoro · Feb 22, 2019

Hmm ok, I had just built it as a Bela program. Why don't you send me the full command line you use to build the file?

g++ should be an alias for arm-linux-gnueabihf-g++, yes. We use clang++ for all of the Bela stuff, but g++ should work equally well. To see all the flags used by Bela for compilation and linking, just add AT= to your command line when building a project. For instance, after modifying some files in the project myProject, run this (equivalent to adding AT= in the "Make parameters" field in the IDE and hitting the "build" button):

make -C ~/Bela PROJECT=myProject AT=

wa3573 · Feb 22, 2019

Thanks, that AT= addition was helpful just to see how it was building internally. Here's the output for that when pointed towards a test PD project without the custom render and main files:

/usr/bin/clang++ -Llib/ -pthread -o "/root/Bela/projects/test/test" build/core/FormatConvert.o build/core/OscillatorBank_routines.o build/core/math_runfast.o build/core/Gpio.o build/core/I2c_Codec.o build/core/PulseIn.o build/core/scope_ws.o build/core/RTAudio.o build/core/UdpClient.o build/core/WriteFile.o build/core/RTAudioCommandLine.o build/core/OSCClient.o build/core/WriteFile_c.o build/core/AuxTaskRT.o build/core/board_detect.o build/core/AuxTaskNonRT.o build/core/Midi.o build/core/AuxiliaryTasks.o build/core/I2c_TouchKey.o build/core/Midi_c.o build/core/Scope.o build/core/PruBinary.o build/core/PRU.o build/core/UdpServer.o build/core/OSCServer.o build/core/GPIOcontrol.o build/core/Spi_Codec.o build/core/JSONValue.o build/core/DigitalChannelManager.o build/core/JSON.o ./build/core/default_main.o ./build/core/default_libpd_render.o -Wl,--no-as-needed -L/usr/xenomai/lib -lcobalt -lmodechk -lpthread -lrt -lprussdrv -lstdc++ -Wl,--no-as-needed -L/usr/xenomai/lib -lcobalt -lmodechk -lpthread -lrt -lasound -lseasocks -lNE10 -lmathneon -lsndfile -lpd -lpthread

So, I've tried using that combination, building as follows:

Invoking: GCC C++ Compiler
g++ -std=c++14 -I/usr/local/include/libpd/ -I/home/juniper/Downloads/liblo-0.29 -I/home/juniper/Downloads/boost_1_69_0 -I/root/Bela/include -pthread -O3 -g3 -Wall -c -fmessage-length=0 -MMD -MP -MF"Main.d" -MT"Main.o" -o "Main.o" "../Main.cpp"

Invoking: GCC C++ Linker
g++ -o "bela_custom_main" ./Main.o -L/root/Bela/lib/ -pthread -lbelaextra -lbela -llo -Wl,--no-as-needed -L/usr/xenomai/lib -lcobalt -lmodechk -lpthread -lrt -lprussdrv -lstdc++ -Wl,--no-as-needed -L/usr/xenomai/lib -lcobalt -lmodechk -lpthread -lrt -lasound -lseasocks -lNE10 -lmathneon -lsndfile -lpd -lpthread

This compiles and links successfully, but I get the same error. Same thing if I switch out g++ for clang++.

grep v0 /etc/motd
Bela image, v0.3.6b, 23 October 2018

Not sure where to look for the core code version.

giuliomoro · Feb 22, 2019

Right, the problem is that in the compilation step you don't have the -D command-line options that Bela uses.

If I run

make -C ~/Bela PROJECT=test-we run AT=

I get

clang++ -I/root/Bela/projects/test-we -I./include -I./build/pru/ -I/usr/xenomai/include/cobalt -I/usr/xenomai/include -march=armv7-a -mfpu=vfp3 -D_GNU_SOURCE -D_REENTRANT -fasynchronous-unwind-tables -D__COBALT__ -D__COBALT_WRAP__ -DXENOMAI_SKIN_posix -DXENOMAI_MAJOR=3 -O3 -march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -ftree-vectorize -ffast-math -DNDEBUG -DBELA_USE_RTDM -I/root/Bela/resources/stretch/include -std=c++11 -DNDEBUG -Wall -c -fmessage-length=0 -U_FORTIFY_SOURCE -MMD -MP -MF"/root/Bela/projects/test-we/build/Main.d" -o "/root/Bela/projects/test-we/build/Main.o" "/root/Bela/projects/test-we/Main.cpp"

Some of those defines are needed for the header files to work properly. include/Midi.h, for instance, requires XENOMAI_SKIN_native or XENOMAI_SKIN_posix to be defined. This will change at some point when we drop support for the native skin, but for now it's important to have them. Also, I think you should make sure you use the same include paths (including the xenomai ones).

wa3573 · Feb 22, 2019

Ah! Thanks so much. That did it. Of course, I was running AT= on a project without any custom source files, so the compiler was never invoked at all :facepalm:

FYI, we have decided to use OSC for all the communication between the Touchkeys process and Pd. This simplifies things, and the actual MIDI on/off we require is just encapsulated in an OSC message automatically anyway.
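Since the note on/off data now travels as OSC, here is a minimal sketch of what such a message looks like on the wire under the OSC 1.0 encoding rules: a NUL-terminated address padded to a multiple of 4 bytes, a type tag string starting with ',' padded the same way, then each int32 argument in big-endian order. In the actual build liblo (-llo) does this for you (e.g. via lo_send()); the function names and the "/n" address used in the usage check are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends an OSC-string: the characters, a terminating NUL, then NUL padding
// up to the next multiple of 4 bytes.
static void appendOscString(std::vector<std::uint8_t>& out, const std::string& s) {
    out.insert(out.end(), s.begin(), s.end());
    std::size_t pad = 4 - (s.size() % 4); // always at least one NUL
    out.insert(out.end(), pad, std::uint8_t{0});
}

// Appends a 32-bit integer in big-endian byte order, as OSC requires.
static void appendInt32(std::vector<std::uint8_t>& out, std::int32_t v) {
    std::uint32_t u = static_cast<std::uint32_t>(v);
    out.push_back(static_cast<std::uint8_t>(u >> 24));
    out.push_back(static_cast<std::uint8_t>(u >> 16));
    out.push_back(static_cast<std::uint8_t>(u >> 8));
    out.push_back(static_cast<std::uint8_t>(u));
}

// Builds an OSC message whose arguments are all int32 (type tags ",i...i").
std::vector<std::uint8_t> encodeOscIntMessage(const std::string& address,
                                              const std::vector<std::int32_t>& args) {
    std::vector<std::uint8_t> out;
    appendOscString(out, address);
    appendOscString(out, "," + std::string(args.size(), 'i'));
    for (std::int32_t a : args)
        appendInt32(out, a);
    return out;
}
```

For example, a note-on carrying note 60 and velocity 127 to the address "/n" encodes to 16 bytes: "/n" plus two NULs, ",ii" plus one NUL, then the two integers.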

As you suggested, I have edited the way the Mappings work to avoid unnecessary reallocation during a state change. My solution was to add a member MRPMapping* mrpMapping_; to the PianoKey class, which is pre-allocated when the class is instantiated. The mapping is simply reset() and engage()d when active, and disengage()d when idle. Seems to work well so far. That was certainly a good suggestion!

The next stage is seeing how it runs on the Bela, as is (likely extremely slow, if at all) and then seeing what we can strip out. My gut would be to look at the Scheduler classes first, and try to eliminate them. I think there is a chance that all actions (except for timeouts, obviously) could simply be performed immediately, without too much ill effect. We may not need timeouts at all, really, since we are not receiving from an OSC or MIDI device (other than sysex messages from the external audio routing hardware). If we do, then we could simply delegate only the timeout actions to the scheduler, and perform all other actions (mappings, mainly) immediately. It's a theory that is worth testing, at least.
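One way to test that theory, sketched under the assumption that only timeouts genuinely need deferring: ordinary actions run immediately on the calling thread, while timed actions go into a min-heap that the existing loop polls (e.g. once per sensor frame). Dispatcher, runNow(), runAt() and poll() are hypothetical names, not the TouchKeys Scheduler API. Note that std::function may allocate, which is acceptable in the non-real-time TouchKeys process but not in the audio thread.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical minimal dispatcher: immediate actions run inline; only
// timeouts are queued and fired later by poll().
class Dispatcher {
public:
    using Clock = std::chrono::steady_clock;
    using Action = std::function<void()>;

    void runNow(const Action& a) { a(); }

    void runAt(Clock::time_point when, Action a) {
        timeouts_.push({when, std::move(a)});
    }

    // Fires every queued action whose time has come; returns how many ran.
    int poll(Clock::time_point now) {
        int fired = 0;
        while (!timeouts_.empty() && timeouts_.top().when <= now) {
            Action a = timeouts_.top().action;
            timeouts_.pop();
            a();
            ++fired;
        }
        return fired;
    }

private:
    struct Timed {
        Clock::time_point when;
        Action action;
        // Reversed comparison so std::priority_queue yields the earliest time first.
        bool operator<(const Timed& other) const { return when > other.when; }
    };
    std::priority_queue<Timed> timeouts_;
};
```

Passing the current time into poll() keeps the dispatcher free of its own thread, which would remove one of the scheduler threads mentioned earlier.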

Let me know if you have any more insights into the matter; you've been more than helpful so far.

Edit: For reference/posterity, here's the final compile command:
g++ -std=c++14 -I/usr/local/include/libpd/ -I/home/juniper/Downloads/liblo-0.29 -I/home/juniper/Downloads/boost_1_69_0 -I/root/Bela/include -I/root/Bela/build/pru/ -I/usr/xenomai/include/cobalt -I/usr/xenomai/include -march=armv7-a -mfpu=vfp3 -D_GNU_SOURCE -D_REENTRANT -fasynchronous-unwind-tables -D__COBALT__ -D__COBALT_WRAP__ -DXENOMAI_SKIN_posix -DXENOMAI_MAJOR=3 -march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -ftree-vectorize -ffast-math -DNDEBUG -DBELA_USE_RTDM -I/root/Bela/resources/stretch/include -DNDEBUG -O3 -g3 -Wall -c -fmessage-length=0 -MMD -MP -MF"Main.d" -MT"Main.o" -o "Main.o" "../Main.cpp"

wa3573 · Feb 26, 2019

One thing I would like to try, and I will let you know how this works out:

If we communicate through USB (UART), we are not using Bela's real-time capabilities for anything but the audio thread, which does the heavy lifting for our Pd patch. Hence, I think it is worth trying to communicate with Bela via Ethernet from a dedicated board (likely a Raspberry Pi), which would run the Touchkeys code and generate OSC messages.

My thinking is this: although it would add another layer of latency, that latency may be acceptable, and, more importantly, it would ensure that the audio processing blocks always have enough time to complete each cycle, maintaining their real-time promise, and that other threads don't have to compete for the CPU time that is left over. In essence, this would constitute a multi-core machine (in the abstract, at least).

The RPi 3 B+ has a 1.4 GHz quad-core Cortex-A53 CPU and 1 GB of RAM, so the increased performance may actually outweigh any latency introduced by the Ethernet interface. I will post updates once we get some testing done on this.

giuliomoro · Feb 27, 2019

Did you hit the CPU limit on Bela already?
Do you have a sense of how much CPU is needed by the audio engine and by the MRP software on Bela?
Adding a computer to the game seems more hassle than is needed, if it's not strictly necessary.

wa3573 · Feb 27, 2019

I haven't been able to test the CPU load yet. That is more of a contingency plan than anything else. We are also trying to incorporate other features (Wi-Fi-enabled parameter adjustment and presets, machine learning features), which might end up pushing us over the line. Ideally, yes, we wouldn't go that route.

wa3573 · Feb 28, 2019


Something is not right here. I have everything working on my personal (Ubuntu Linux) machine, to the point where I can run my adapted Touchkeys code on my machine and send OSC messages to Bela, which is running our Pd patch, and everything works as expected and is quite responsive.

However, once I attempt to run Touchkeys on the Bela at the same time, it hangs while attempting to open the Touchkey device. gdb does not seem to play well with the Bela core, so I cannot see the exact line where it is hanging (blocking, perhaps?), but by deduction it is this line: device_ = open(inputDevicePath, O_RDWR | O_NOCTTY | O_NDELAY); Is there any reason this would not work on Bela? That is, should I use the same flags as you did here: https://github.com/giuliomoro/serial-piano-scanner/blob/master/SerialInterface.cpp? Namely _handle = open(portname, O_RDWR | O_NOCTTY | O_SYNC).

I am attempting to run all threads as vanilla pthreads. Is it possible my pthread calls are being wrapped (and turned into real-time threads) because of the flags being passed to the compiler? Or does that happen only during linking? Do you think there is something else that I need to change about the LDFLAGS I posted above?

For now, I will attempt building the Touchkeys code alone, with only the normal CFLAGS and LDFLAGS, to rule those out as the culprits and to allow me to use gdb if needed.

Thanks again.

giuliomoro · Feb 28, 2019

wa3573 Invoking: GCC C++ Linker
g++ -o "bela_custom_main" ./Main.o -L/root/Bela/lib/ -pthread -lbelaextra -lbela -llo -Wl,--no-as-needed -L/usr/xenomai/lib -lcobalt -lmodechk -lpthread -lrt -lprussdrv -lstdc++ -Wl,--no-as-needed -L/usr/xenomai/lib -lcobalt -lmodechk -lpthread -lrt -lasound -lseasocks -lNE10 -lmathneon -lsndfile -lpd -lpthread

Nothing here is automatically wrapping pthread calls.

wa3573 That is should I use the same flags as you did here: https://github.com/giuliomoro/serial-piano-scanner/blob/master/SerialInterface.cpp. Namely _handle = open(portname, O_RDWR | O_NOCTTY | O_SYNC).

That was used to open an actual UART port. I have not tried opening a USB serial instead.

Can you simply cat /dev/tty... from the terminal for the port you want to use? Does it work?

wa3573 · Feb 28, 2019

So, in this case portname is actually "/dev/serial/by-id/usb-APM_TouchKeys_B6A358563433-if00". On both my machine and Bela, cat /dev/serial/by-id/usb-APM_TouchKeys_B6A358563433-if00 echoes terminal input until it receives a SIGINT, as expected. I'm not sure if /dev/tty... ports would work for our case, because we will (likely) be using a USB hub, as we need to connect to two USB interfaces (the key sensors and the MRP routing hardware). Not sure how the tty... ports are assigned in that case.

So, after compiling and running the Touchkeys app alone on Bela, the same issue persists. I can confirm open() is not functioning as expected.

Thread 1 "serial-piano-sc" hit Breakpoint 1, 0xb6fbfbea in open () from /lib/arm-linux-gnueabihf/libpthread.so.0
(gdb) step
Single stepping until exit from function open,
which has no line number information.
Starting the TouchKeys on /dev/serial/by-id/usb-APM_TouchKeys_8D73428C5751-if00 ... failed: Failed to open
[Thread 0xb2879450 (LWP 1048) exited]

giuliomoro · Feb 28, 2019

What is the return value of open() ? Send me the exact code you are using, and I can try with the piano scanner I have here.
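To get at that return value, here is a minimal sketch that opens the port, prints errno via strerror() if open() fails, and then puts the port into raw mode with termios. On Linux O_NDELAY is the same flag as O_NONBLOCK; the sketch opens non-blocking so that open() itself cannot wait on modem lines, then clears the flag for normal blocking reads. The function name openSerialPort is hypothetical, the baud rate is deliberately left untouched (it may need setting for other devices), and whether any of this explains the hang described above is untested.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

// Opens a serial device in raw mode. Returns the file descriptor, or -1 on
// error after printing the reason to stderr.
int openSerialPort(const char* path) {
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0) {
        int err = errno; // save before any other call can overwrite it
        std::fprintf(stderr, "open(%s) failed: %s (errno %d)\n", path, std::strerror(err), err);
        return -1;
    }
    // Now that the device is open, switch back to blocking reads.
    int flags = fcntl(fd, F_GETFL);
    if (flags >= 0)
        fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);

    termios tio;
    if (tcgetattr(fd, &tio) != 0) { // fails with ENOTTY if path is not a terminal
        int err = errno;
        std::fprintf(stderr, "tcgetattr(%s) failed: %s\n", path, std::strerror(err));
        close(fd);
        return -1;
    }
    cfmakeraw(&tio);     // no echo, no line buffering, no character translation
    tio.c_cc[VMIN] = 1;  // read() returns once at least one byte is available
    tio.c_cc[VTIME] = 0; // no inter-byte timeout
    if (tcsetattr(fd, TCSANOW, &tio) != 0) {
        int err = errno;
        std::fprintf(stderr, "tcsetattr(%s) failed: %s\n", path, std::strerror(err));
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling it with the /dev/serial/by-id/... path and checking whether it returns at all, and with which errno, should narrow down whether open() or the configuration that follows it is the step that blocks.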