So, I've had an AI for a few months, but PhD submission / IDE / Trill / Soul and other things meant we haven't done a lot with it yet.
Current state is:
- we can run audio with ALSA; however, one of the internal clocks seems to be a few MHz off, so when requesting a 48kHz sampling rate the actual rate comes out slightly different. This at least validates that the pinouts should 99% be fine and the Bela cape should work without hardware modifications.
- haven't tested SPI, it seems that the driver doesn't load properly, so that's the next thing I will try out
- the PRU code will need a fair bit of rewriting (besides the offsets, the McASP is slightly different on the AM5729 vs the AM3358)
- haven't looked for a Xenomai build yet, however this is the very last step (all the code base can run with some minor mods without Xenomai).
As soon as I have managed to run the audio codec in ALSA mode on the release image (my experiments so far were on a pre-release image), I will post it here. I think the CTAG codec could also be made to work fairly easily (although @henrix may have mentioned that the driver broke on recent Linux releases?). It also seems that to change the pinmuxing you have to rebuild U-Boot and bake in the pinmux settings (the .dtb/.dtbo files are only read/used by Linux to load the appropriate drivers, but the pins have to be set BEFORE Linux starts, though I don't remember exactly why).
In terms of raw CPU performance, generating 100 seconds of techno-world (without I/O, simply crunching the numbers) takes:
- 48.2s on BBB (at 1GHz)
- 9.4s on the AI (at 1GHz)
- 6.3s on the AI (at 1.5GHz)
- 0.94s on my laptop (i5 2.9GHz)
So, for the same clock, using mainly VFP instructions (as libpd
does), the AI is about 5x faster (as expected, because the VFP itself should be 10x faster, but then there are load/store operations in there as well). I would expect fully-optimized NEON code to run about the same on the two, although the A15 could have some extra boost in that one too (or not: the NEON on the A9, for instance, is slightly slower than on the A8).
The main issue with the AI going forward, in my opinion, is heat dissipation. It ships with a heatsink, but I understand it needs a fan to be able to run at full speed (1.5GHz). This means that a cape cannot fit the normal way, as some extra room has to be left for the fan. I am using some extra stackable headers to leave more vertical space for the fan, but that could be an issue in environments where space is at a premium (e.g. eurorack modules (wink wink nudge nudge)).
(the following picture is staged: I am just resting the board on a heatsink for the purpose of taking the picture)
The image it ships with relies heavily on the CPU governor for thermal throttling, but keep in mind that this would have to be disabled on a Xenomai image (and is, either way, not very RT-friendly). I haven't looked at the GPU/DSP/EVE/M4 yet, although I would want at least to be able to switch them off when unused, to reduce power consumption; it seems that this is difficult/impossible at present, although maybe their clocks can be dropped. My understanding is that currently you may be able to run at 1GHz with just the heatsink, but I haven't seen this confirmed anywhere yet (the system reference manual is quickly coming together but is not yet complete).
@henrix worked on a DSP library for the DSPs on the X15, which should work just fine on the AI (the AM5728 on the X15 is the same SoC as the AM5729 on the AI, except it doesn't have the EVEs); here it is, though I haven't looked at it in years.
The AI actually has a dual PRU subsystem, which means 2x of everything that was already on the BBB (2 cores, 1 UART, 1 industrial Ethernet, 1 IRQ controller, 28k of data RAM, etc.). I don't think there are any immediate consequences for our application (we manage to do everything on one PRU core, use only one of the 10 channels on the interrupt controller, and nothing else), but I think it could help for some higher-bandwidth applications (such as beaglelogic), or running several of these in parallel, or if there were custom communication to be implemented, though - honestly - in that case I'd probably go for one of the M4s instead 🙂.
A final thought: the AI is a great board, but its heat dissipation and cost (note: it comes with built-in wifi) do not make it ideal for all situations, and taking advantage of all its extra features (DSP/EVE/GPU) from within the Bela environment would require considerable effort from both us and the user. While we hope that existing and future Bela users will have the possibility to upgrade their Bela setup to include the AI, and enjoy the associated "free" speedup, we also have to strike the right balance between how much time we can pour into it vs supporting the existing community and working towards new products.