thetechnobear: giuliomoro, do you think there is a better way to detect Bela?
Not sure.
Truth is, there is nothing special about Bela in the way you build externals (except if you want to use POSIX wrappers for Xenomai, but that is a different story), so the compiler flags I suggested above are good for any Cortex-A8. In fact, the Pi 2 has four Cortex-A7 cores, which are also ARMv7 with Neon, so the same flags (except for -mtune=cortex-a8, which would become -mtune=cortex-a7) should work fine for it as well.
Perhaps try to grep /proc/cpuinfo:
processor : 0
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 993.47
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc08
CPU revision : 2
Hardware : Generic AM33XX (Flattened Device Tree)
Revision : 0000
Serial : 0000000000000000
If you find things like neon in there, these will be good indicators that you need -mfpu=neon, and probably --fast-math. Note: -mfpu=neon is needed to actually use the Neon FPU, because otherwise the compiler would tend to use the vfpv3 for most stuff, as the latter is IEEE 754-compliant while Neon is not. But we tend not to care about denormals and/or signed NaNs when doing audio, so we are happy with the faster, non-compliant Neon. Similarly, -ffast-math tells the compiler to feel free to use the non-compliant unit and to re-order operations. Note: I just found out that -ffast-math is actually supported by clang. I guess --fast-math, which I was using earlier, was just a legacy gcc option? Bottom line: you can use -ffast-math regardless of the compiler.
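To make the "grep for neon" idea concrete, here is a minimal sketch in C that looks for a whole-word token on the Features line of /proc/cpuinfo. The function names are made up for illustration, it is Linux-only, and it assumes the ARM layout shown above (x86 kernels call the line "flags" instead, so it would simply report false there).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `feature` appears as a whole word on a "Features" line
 * of a /proc/cpuinfo-style text (e.g. "neon", "vfpv3"). */
bool cpuinfo_text_has_feature(const char *text, const char *feature)
{
    size_t flen = strlen(feature);
    const char *line = text;

    if (flen == 0)
        return false;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len >= 8 && strncmp(line, "Features", 8) == 0) {
            const char *end = line + len;
            const char *p = memchr(line, ':', len);

            if (p) {
                p++; /* skip the colon */
                while (p < end) {
                    /* skip blanks, then measure one token */
                    while (p < end && (*p == ' ' || *p == '\t'))
                        p++;
                    const char *tok = p;
                    while (p < end && *p != ' ' && *p != '\t')
                        p++;
                    if ((size_t)(p - tok) == flen &&
                        strncmp(tok, feature, flen) == 0)
                        return true;
                }
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return false;
}

/* Convenience wrapper: read the start of /proc/cpuinfo and test it.
 * Returns false if the file cannot be opened. */
bool cpu_has_feature(const char *feature)
{
    char buf[8192];
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (!f)
        return false;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return cpuinfo_text_has_feature(buf, feature);
}
```

A build script could call cpu_has_feature("neon") and only then add -mfpu=neon. Matching whole tokens matters here, because a plain substring search for vfp would also match vfpv3.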
However, grepping for neon is not enough to justify -mtune=cortex-a8. In practice, I am not even sure -mtune=cortex-a8 is needed when compiling natively. You can check whether -mtune=cortex-a8 is implied by running gcc with --verbose and without -mtune. If it is implied, it would also be interesting to see how gcc detects that.
-ftree-vectorize is good for any architecture which has a SIMD unit (that is: probably all architectures that you would build Pd externals for, these days). As far as I can tell from the GCC documentation, it simply turns on both -ftree-loop-vectorize and -ftree-slp-vectorize, and gcc already enables those two at -O3, so it only needs spelling out at lower optimisation levels. More here.
-mfloat-abi=hard is system dependent, but I am not sure how you would detect it. Again, the presence of neon or vfp would probably make you think that there is a floating point unit and that -mfloat-abi=hard is appropriate, because it would be silly - and slower - to run soft-float. Again, this can probably be omitted when compiling natively, and again you can check with --verbose.
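Besides reading the --verbose output, another way to see what the compiler ends up targeting is to look at its predefined macros. The sketch below relies on __ARM_NEON / __ARM_NEON__ (Neon enabled), __ARM_PCS_VFP (hard-float calling convention on 32-bit ARM) and __FAST_MATH__ (set by gcc and clang under -ffast-math). The helper function names are made up for illustration.

```c
#include <stdio.h>

/* 1 if the compiler was told (or defaulted) to generate Neon code. */
int target_has_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the 32-bit ARM hard-float calling convention is in use. */
int target_is_hard_float(void)
{
#if defined(__ARM_PCS_VFP)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the translation unit was compiled with -ffast-math. */
int target_is_fast_math(void)
{
#if defined(__FAST_MATH__)
    return 1;
#else
    return 0;
#endif
}

/* Print a one-line summary, e.g. from a tiny test program in a
 * configure-style check. */
void print_target_summary(FILE *out)
{
    fprintf(out, "neon=%d hard-float=%d fast-math=%d\n",
            target_has_neon(), target_is_hard_float(),
            target_is_fast_math());
}
```

Compiling this natively with and without the flags in question should show whether they change anything. Running gcc -dM -E - < /dev/null gives the full list of predefined macros if you prefer to grep that instead.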
Last, if from /proc/cpuinfo you get
Hardware : Generic AM33XX (Flattened Device Tree)
there is a pretty good chance you are on a BeagleBone, and if you get that, plus something from uname -a | grep -i xeno, then you most likely are on Bela.
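The heuristic above could be sketched in C roughly as follows. This is only a guess at the platform, not a reliable test: it assumes the AM33XX string shows up in /proc/cpuinfo and that a Xenomai kernel has "xeno" somewhere in its release or version string. The enum and function names are invented for this example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>      /* strncasecmp (POSIX) */
#include <sys/utsname.h>  /* uname (POSIX) */

typedef enum { BOARD_UNKNOWN, BOARD_BEAGLEBONE, BOARD_BELA } board_guess_t;

/* Case-insensitive substring search, like grep -i. */
static bool contains_nocase(const char *hay, const char *needle)
{
    size_t nlen = strlen(needle);

    for (; *hay; hay++)
        if (strncasecmp(hay, needle, nlen) == 0)
            return true;
    return false;
}

/* Pure decision function, so it can be tried on canned strings. */
board_guess_t guess_board(const char *cpuinfo_text, const char *kernel_text)
{
    if (!strstr(cpuinfo_text, "AM33XX"))
        return BOARD_UNKNOWN;
    return contains_nocase(kernel_text, "xeno") ? BOARD_BELA
                                                : BOARD_BEAGLEBONE;
}

/* Gather the two inputs from the running system (Linux only). */
board_guess_t guess_board_from_system(void)
{
    char cpuinfo[8192] = "";
    char kernel[512] = "";
    struct utsname u;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f) {
        size_t n = fread(cpuinfo, 1, sizeof cpuinfo - 1, f);
        cpuinfo[n] = '\0';
        fclose(f);
    }
    /* release and version are two of the fields that uname -a prints */
    if (uname(&u) == 0)
        snprintf(kernel, sizeof kernel, "%s %s", u.release, u.version);
    return guess_board(cpuinfo, kernel);
}
```

Keeping the decision separate from the file and uname reading makes it easy to check the logic on a desktop machine by feeding it sample strings.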
Forgot to mention: sometimes a great performance boost on the Cortex-A8 can be achieved by using single-precision constants. This is because, in code like
float b = 1;
float a = 0.5 * b;
the second line will do something along the lines of:
- notice that 0.5 is a double constant
- extend b to a double and do a full 64-bit precision double multiplication
- narrow down the result to a float and store it into a
Double operations on the A8 are supported only on the vfp, therefore the multiplication above would never run on the Neon unit.
If you write the code, you could fix it by adding an f to the floating point constant:
float b = 1;
float a = 0.5f * b;
or by type-casting the constant, or by assigning it to a variable before using it (best is to type-cast to something like sample_t, which is also appropriately typedef'd elsewhere in the code).
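A quick way to convince yourself of the promotion is to ask the compiler for the size of each expression: sizeof reports the size of the expression's type without evaluating it. In this sketch sample_t is a stand-in typedef (assumed to be float here) and the function names are invented for illustration. It assumes the code is compiled without -fsingle-precision-constant.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical project-wide sample type; assumed to be float here. */
typedef float sample_t;

/* 0.5 is a double constant, so b is promoted and the product is a double. */
size_t size_with_double_constant(float b)
{
    return sizeof(0.5 * b);
}

/* 0.5f is a float constant, so the whole product stays in single precision. */
size_t size_with_float_constant(float b)
{
    return sizeof(0.5f * b);
}

/* Casting the constant to sample_t has the same effect as the f suffix. */
size_t size_with_cast_constant(float b)
{
    return sizeof((sample_t)0.5 * b);
}

/* Print the three sizes next to sizeof(float) and sizeof(double). */
void print_promotion_report(void)
{
    printf("float=%zu double=%zu\n", sizeof(float), sizeof(double));
    printf("0.5 * b           -> %zu bytes\n", size_with_double_constant(1.0f));
    printf("0.5f * b          -> %zu bytes\n", size_with_float_constant(1.0f));
    printf("(sample_t)0.5 * b -> %zu bytes\n", size_with_cast_constant(1.0f));
}
```

To hunt for these in an existing codebase, gcc's -Wdouble-promotion warning flags implicit float-to-double promotions, which may be a gentler first step than changing how constants are interpreted globally.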
Instead of changing the source code, a low-hanging fruit is to pass gcc the option -fsingle-precision-constant (unsupported by clang). However, this may lead to precision issues where the programmer actually wanted to store a double-precision constant, as it would turn ALL floating point constants to single precision (including those assigned to a double variable). Therefore, the program
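#include <stdio.h>
int main(void)
{
    /* rounds to 100.0f either way: a float cannot hold this many digits */
    float a = 99.99999999;
    /* keeps its digits only if the constant stays double precision */
    double b = 99.9999999;
    printf("a: %.20f,\nb: %.20f\n", a, b);
    return 0;
}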
would produce:
without -fsingle-precision-constant:
a: 100.00000000000000000000,
b: 99.99999990000000593682
with -fsingle-precision-constant:
a: 100.00000000000000000000,
b: 100.00000000000000000000
Whether this is a problem or not is very much dependent on the codebase, so I would advise against adding it to the compiler flags by default.