Ah, I hadn't seen this, that's right!
But TensorFlow Lite and TensorFlow Lite for Microcontrollers are not exactly the same thing.
TFL4MCU is a subset of TFLite that is specifically designed to run embedded on microcontrollers in (sort of) real time.
It's a C++ library that you include in your build system. When you compile your application, you build it together with a pre-processed trained network that has been optimised for a small memory footprint and fast, lightweight execution. This means you can run the network without all the massive, ugly overhead of full TensorFlow Lite, with just the pieces needed to run it on something like an STM32. This is attractive even on the BBB, because you are still resource constrained, running in something close to real time with a fairly limited memory budget.
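To make the "compile the trained network into your application" idea a bit more concrete, here's a minimal, self-contained C++ sketch. It is NOT the TFLM API: it hard-codes a made-up, hypothetical two-input dense layer (`kWeights`, `kBias` and `RunInference` are names invented purely for illustration) as const data baked into the binary, with every buffer sized at compile time and no heap allocation. In real TFLM the trained model is typically embedded as a byte array (the .tflite flatbuffer) and run through `tflite::MicroInterpreter` with a fixed-size tensor arena, but the principle of "everything known at build time, nothing allocated at run time" is the same.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model dimensions, fixed at compile time.
constexpr std::size_t kInputs = 2;
constexpr std::size_t kOutputs = 1;

// Hypothetical "trained" parameters, stored as const data in the binary
// (on an MCU these would typically live in flash rather than RAM).
const float kWeights[kOutputs][kInputs] = {{0.5f, -0.25f}};
const float kBias[kOutputs] = {0.1f};

// One inference step: a dense layer followed by ReLU.
// Input and output sizes are template-free but fixed by the constants above,
// so the memory needed is known at build time and nothing touches the heap.
std::array<float, kOutputs> RunInference(const std::array<float, kInputs>& input) {
  std::array<float, kOutputs> output{};
  for (std::size_t o = 0; o < kOutputs; ++o) {
    float acc = kBias[o];
    for (std::size_t i = 0; i < kInputs; ++i) {
      acc += kWeights[o][i] * input[i];
    }
    output[o] = acc > 0.0f ? acc : 0.0f;  // ReLU activation
  }
  return output;
}
```

With these made-up weights, RunInference({1.0f, 2.0f}) should come out at roughly 0.1 (0.1 + 0.5 * 1 - 0.25 * 2), and any input that pushes the sum below zero gets clamped to 0 by the ReLU.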
Here's the part of the TF git repo where the microcontroller code starts:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro
Because ARM Cortex-M7s are already officially supported, it wasn't too bad to get it building for the Daisy, which is an STM32H750, but there is still loads of stuff on that project that needs finishing before I could use it as a platform for exploration. Bela is such a (relatively) mature ecosystem that I was thinking, if I could compile for the ARM Cortex-A8 in the BeagleBone Black that Bela runs on, I could do the same kinds of things, though probably more as a proof of concept.