So, you will first want to convert all the files to 16-bit WAV, as it's faster to load (and I don't think there is any MP3 decoder on Bela, though you could install one if you wish).
Then, without getting language-specific yet: if you can load all the files from disk into RAM at startup, this will be much easier to implement. Otherwise, you should keep only the initial fragment (the head) of each file in RAM and load the rest from disk once you start playing the head. To find out which approach will work in your case, read on.
Each second of mono audio takes 44100*4 = 176400 bytes when loaded into RAM (assuming a 44.1kHz sample rate and that you are storing each sample as a 32-bit float). A reasonable RAM usage for your program is about 200MB. You could probably push this a bit further (maybe 300MB?), but the OOM (out-of-memory) killer may eventually kill your program.
Assuming the 200MB threshold, this means you have about 200000000/(44100*4) = 1133.79 seconds (roughly 19 minutes) of monophonic audio you can store in memory. Could you go through all your files and get an estimate of whether you are above or below this threshold? Note that the above measure is for MONOPHONIC signals. Stereo audio files take twice as much RAM per second, so please account for that in your computations. Also, if you have some stereo files where the two channels are exactly the same, and could be downmixed to mono, make a note of that.
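If it helps, here is a rough sketch of how you could get that estimate without checking each file by hand. It is only an illustration, not something that ships with Bela: the function names (`inspect_wav`, `estimate_folder`, `channels_identical`) are made up for this example, it uses Python's standard `wave` module (so it only reads plain PCM WAV files, i.e. run it after the conversion), and it assumes the same 4 bytes per sample and 200MB budget as above. It counts the actual number of frames in each file, so it does not depend on the sample rate being 44100.

```python
import wave
from pathlib import Path

BYTES_PER_SAMPLE_IN_RAM = 4       # assumption: samples stored as 32-bit floats
RAM_BUDGET_BYTES = 200_000_000    # assumption: the 200MB threshold from above


def channels_identical(wav, chunk_frames=65536):
    """Return True if both channels of an open 2-channel file hold the same data."""
    width = wav.getsampwidth()
    wav.rewind()
    while True:
        data = wav.readframes(chunk_frames)
        if not data:
            return True
        # Frames are interleaved: left sample, right sample, left sample, ...
        for i in range(0, len(data), 2 * width):
            if data[i:i + width] != data[i + width:i + 2 * width]:
                return False


def inspect_wav(path):
    """Return (seconds, channels, ram_bytes, identical_channels) for one file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        channels = wav.getnchannels()
        seconds = frames / wav.getframerate()
        ram_bytes = frames * channels * BYTES_PER_SAMPLE_IN_RAM
        identical = channels == 2 and channels_identical(wav)
    return seconds, channels, ram_bytes, identical


def estimate_folder(folder):
    """Print a per-file report and return the total RAM needed, in bytes."""
    total = 0
    # Note: "*.wav" is case-sensitive on Linux, so "FILE.WAV" would be skipped.
    for path in sorted(Path(folder).glob("*.wav")):
        seconds, channels, ram_bytes, identical = inspect_wav(path)
        total += ram_bytes
        note = "  <- identical channels, could be mono" if identical else ""
        print(f"{path.name}: {seconds:.1f}s, {channels}ch, "
              f"{ram_bytes / 1e6:.1f}MB{note}")
    verdict = "fits in" if total <= RAM_BUDGET_BYTES else "exceeds"
    print(f"Total: {total / 1e6:.1f}MB, {verdict} the "
          f"{RAM_BUDGET_BYTES / 1e6:.0f}MB budget")
    return total


# Example call (the folder name is a placeholder):
# estimate_folder("my_samples")
```

The channel comparison is done sample by sample in pure Python, so it may be slow on long files; it is probably easier to run this on your computer than on the board. Stereo files flagged as identical are still counted at their full stereo size in the total, so the real figure after downmixing those would be lower.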