• General
  • Bela crashing after some hours...

Hi again!!

For my sound installation I use a Bela with a microSD in FAT32 (only to manage soundfiles, the system remains on Bela' eMMC). Rsync copies files on it from a remote computer on the network.
Then a PD patch on the Bela reads the files. The structure is working pretty good now, but for some reason it stops working after some hours.
I explained about rsync and microSD because it could be related.
Here is the syslog:

241224-belasyslog.txt
369kB

-it begins with the boot
-Dec 23 12:28:15 I run the project
-runs as it should
-Dec 23 20:25:34 mmcblk0 error (which is memory card)
-Dec 23 21:42:12 rsync error
then everything goes wrong, with 'out of memory' messages and more...
syslog continues if you need, it seemed to be repeating .

It seems like you are running out of RAM and the oom killer is invoked. It seems that node gets killed for it, I assume you started the program from within the IDE? Still, it seems weird that it would kill the IDE instead of the Bela process, as the latter should be detached and as such I wouldn't expect it to be even killed when the IDE dies ...are you doing anything that increases the RAM usage over time ?

Run top -p $(pidof YOURPROJECTNAME) -p $(pidof node) while the program is running to see how much RAM node and YOURPROJECTNAME are using.

root@bela:~# top -p $(pidof CommunityEcho) -p $(pidof node)

top - 16:52:00 up  2:46,  1 user,  load average: 0.47, 0.35, 0.29
Tasks:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.2 us, 11.4 sy,  0.0 ni, 85.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :   499284 total,    77484 free,   100768 used,   321032 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   371944 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                                                    
  422 root      20   0  137196  47356  11980 S  5.3  9.5   5:34.49 node                                                                                                                                                                       
  649 root      20   0   29320  25784   5364 S  3.0  5.2   4:27.59 CommunityEcho

No chance the fifo service is guilty?
https://forum.bela.io/d/5382-bela-log

    rph-r root@bela:~# top -p $(pidof CommunityEcho) -p $(pidof node)

    that looks good, does the amount of ram in use increase over time?

    rph-r No chance the fifo service is guilty?
    https://forum.bela.io/d/5382-bela-log

    Possibly? And that would be a prime suspect, given how that's different from virtually anything else that's been running on Bela in the last 10 years ... although running out of memory with the fifo should only happen if you are not running the service and the content of the fifo gets bigger and bigger , but even then it shouldn't cause node to crash...

    ok, I came this afternoon and the patch was running since 20h. The router had crashed (an old Linksys for my tests, but normally works). After restarting the router, I've noticed the Bela patch is still running. The only thing that doesn't work is the rsync copy of files.
    Here is the "top":

    PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                                                    
       64 root      20   0       0      0      0 S 13.4  0.0  17:41.06 mmcqd/0                                                                                                                                                                    
       27 root      20   0       0      0      0 S  6.1  0.0   5:37.26 kswapd0                                                                                                                                                                    
     7942 root      20   0    7172   2576   1940 R  5.9  0.5   0:01.39 top                                                                                                                                                                        
    28225 root      20   0   29320  25784   5364 S  4.9  5.2  30:14.12 CommunityEcho                                                                                                                                                              
     7960 root      20   0    4620   1756   1520 R  3.7  0.4   0:00.15 ps                                                                                                                                                                         
    26496 root      20   0       0      0      0 D  3.4  0.0   1:06.30 kworker/u2:2                                                                                                                                                               
      422 root      20   0  138084  48524  11384 S  2.2  9.7  43:51.50 node         

    The 13% CPU process mmcqd is used to write on SD card.

    ok, actually it was transferring files, about 100 files that couldn't be copied this night have just been copied. So until now Bela is working, I don't know what made crash the router.
    I keep it running and see later or tomorrow what's happening.

    What files are you copying with rsync, how big are they and how often do you copy them??

    .wav, around 220Mo, every 10mn
    Normally it takes les than 10seconds and no log complains.
    The micro SD card is a middle quality, I will try tomorrow with the hi-q I've bought.

    Are you copying them to the SD card or from it? Are you also writing those files to the SD card from the Bela program?

    No, the files are recorded on the remote computer, then send to the Bela, who only read them...

    Ok,
    it crashed again after some days. I can only access Bela by SSH, but fortunately I can access it.
    The router didn't crashed this time.
    Here is the "top":

      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
     9697 root      20   0    5508    932    640 R 21.2  0.2   0:05.77 systemd-cgroups
       27 root      20   0       0      0      0 R 19.5  0.0 346:47.83 kswapd0
     9694 root      20   0    7712   1400    224 R 17.0  0.3   0:25.11 top
        1 root      20   0   37752  13020    604 R 16.2  2.6  71:35.11 systemd
     9682 root      20   0    8712   2244   1880 R 14.6  0.4   0:28.18 systemd-journal
     9548 root      20   0   22340    860     20 S  4.6  0.2   0:13.31 rsyslogd
     8669 root      20   0    5632    364     12 S  3.9  0.1   0:41.63 systemd-logind
     6780 root      20   0    4332    440      8 D  2.8  0.1   0:04.61 rsync
     9882 root      20   0    4332    468     16 D  2.5  0.1   0:08.66 rsync
       67 root      20   0       0      0      0 S  1.7  0.0  27:19.77 mmcqd/1
      171 message+  20   0    5328    380      8 S  1.2  0.1   3:05.38 dbus-daemon
     5535 root      20   0    4332    456      4 D  1.0  0.1   0:07.60 rsync
     4405 root      20   0    4332    448      8 D  0.6  0.1   0:07.20 rsync
       82 root      20   0       0      0      0 S  0.4  0.0   2:28.69 jbd2/mmcblk1p2-
     9560 root      20   0       0      0      0 I  0.4  0.0   0:00.28 kworker/u2:2
     9668 root      20   0       0      0      0 I  0.4  0.0   0:00.88 kworker/0:2
        8 root      20   0       0      0      0 I  0.2  0.0  13:10.20 rcu_preempt
     4171 root      20   0    4332    444      8 D  0.2  0.1   0:08.05 rsync
     6935 root      20   0    4332    476     44 D  0.1  0.1   0:07.60 rsync
     9676 root      20   0    8500    604      8 S  0.1  0.1   0:10.33 sshd
        7 root      20   0       0      0      0 S  0.1  0.0   5:01.73 ksoftirqd/0
       81 root       0 -20       0      0      0 I  0.1  0.0   2:34.13 kworker/0:1H
    12093 root      20   0    4332    560     80 D  0.1  0.1   0:07.94 rsync

    and this is the free output:

    total        used        free      shared  buff/cache   available
    Mem:         499284      388028        3120       28268      108136       71052
    Swap:             0           0           0

    You have several rsync processes running at the same time, is that expected? Is it possible that you are spawning a new rsync at regular intervals but that the the previous ones don't end in time before the new one starts, thus resulting in the many rsyncs running at once that we are observing?

    rsync runs on the remote computer every 10mn, launched by a cron task. As I said the files are normally copied in 10s. I understand a new instance starts before the other is finished, I guess I can manage that. But still, something is going wrong if it takes more than 10mn. I'll check that...

    it may be that once the board is under load it takes much longer? 10s to 10 minutes seems like a long time but worth checking. You may want to change your host script so that it runs in a loop: rsync then wait 10 minutes and repeat. This way you are sure it won't run more than one instance at a time. This may be jittery, so if it has to be exactly 10 minutes, you may have to adjust things.

    I tried without rsync, only reading the files. It is still running after 24h. However there are errors in syslog concerning the microSD card:
    Jan 5 04:52:05 bela kernel: [53496.586774] mmcblk0: error -110 transferring data, sector 53229792, nr 256, cmd response 0x900, card status 0xb00
    I try with another card, in ext4

    SD card sounds like a reasonable culprit in this case ...

    besides those errors - which I do not understand - does the program run fine?

    14 days later

    Yes. It is running since >one week now with no errors, with all features enabled:
    -reading files constantly
    -sending log to the remote computer with a fifo as a service
    -receiving files through rsync

    I guess the other card was too slow to constantly read 4 tracks @44100Kz/16bits and receiving the files at the same time.

    maybe it is interesting to mention the µSD card types.
    The one that doesn't work: SDXC I, class 10 U3, A1, V30
    The new one: SDXC I, class ? U3, A2, V30

    so not a large difference, only A1->A2. Maybe the first one is a cheap one, the second is a Sandisk Extreme Pro.