is there a one block delay introduced when i use subpatches or send~ and receive~ pairs on bela? (either in libpd or heavy) i seem to have a vague memory, that PD vanilla behaves that way...

hmm, my tests seem to indicate that they don't :-)

It may depend on the order of creation? Whatever happens in Pd also happens in libpd and therefore Bela. Not sure about Heavy.

    got an in-depth answer on the pd list, putting it here for reference.


    first let's make clear what is causing latency.
    e.g. consider three signal objects that are chained up "A->B->C".
    when turning DSP "on", those objects will start calculating samples
    based on their input.
    every 1.45ms ("each DSP tick", i.e. 64 samples at 44.1kHz) all 3
    objects will need to calculate 64 samples.
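As a quick check of where the 1.45ms figure comes from, here is a small sketch. It assumes a 44.1kHz sample rate, which the answer above does not state explicitly, and `block_duration_ms` is just an illustrative helper name.

```python
def block_duration_ms(block_size=64, sample_rate=44100):
    """Duration of one DSP tick in milliseconds.

    Assumes Pd's default block size of 64 samples and a 44.1kHz
    sample rate (the sample rate is an assumption, not from the thread).
    """
    return 1000.0 * block_size / sample_rate


print(round(block_duration_ms(), 2))  # 1.45
```

At other sample rates or block sizes the tick duration changes accordingly, so any "one block" delay discussed here scales with it.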

    if "A" does its calculations before "B" and "B" before "C", then the 3
    objects will do their calculations with zero latency. that is: if all
    objects just pass their input samples to their output, and at the
    beginning of the DSP tick the "A" object gets a single sample value of
    1 (with all other samples before and after being 0), then it will read
    this pulse and pass it on to "B" which will read the pulse and pass it
    on to "C" which in turn will pass it to its output. once all the
    calculations are done, the pulse has passed through all objects.

    conversely, what happens if "C" does its calculation before "B" and "B"
    before "A"? we still feed the pulse to "A", but since "C" is being
    executed first, its input has all zeros which it passes to the final
    output, then "B" will read all zeros, passing them to its output, and
    finally "A" will pass the pulse to its output.
    all DSP calculations are now done, and at the output we get silence (but
    there's a pulse lingering between "A" and "B")
    in the next DSP tick, "C" will first read its input (which is the output
    that "B" 'just' (in the last DSP-tick) created, that is: zeros) and pass
    it on; then "B" will read its input (which is the output that "A" just
    created: a pulse) and pass it on.
    all DSP calculations are now done, and at the output we get silence (but
    now the pulse is lingering between "B" and "C").
    in the next DSP tick, "C" will again read its input (which is the output
    that "B" just created: a pulse).
    once all DSP calculations are done, the output (of "C") will be a pulse.

    comparing this to the 1st A, 2nd B, 3rd C calculation above, we see that
    the output is late by 2 DSP ticks, which is 2 blocks (each 64 samples,
    usually).
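The A/B/C thought experiment above can be sketched as a toy simulation. This is not how Pd is implemented; `run_chain` and its data layout are made up purely for illustration. Each object copies its input block to its output block, and the only thing that varies is the order in which the three objects are processed within a tick.

```python
BLOCK = 64  # Pd's default block size in samples


def run_chain(order, n_ticks=4):
    """Simulate three pass-through objects A->B->C processed in `order`.

    Returns the index of the DSP tick in which the pulse first shows up
    at the output of "C", or None if it never arrives within n_ticks.
    """
    silence = [0.0] * BLOCK
    # out[name] is the block most recently written by that object
    out = {"A": silence, "B": silence, "C": silence}
    # which object feeds which (None means "external input")
    source = {"A": None, "B": "A", "C": "B"}
    for tick in range(n_ticks):
        # external input: a single-sample pulse in the first tick only
        ext = [1.0] + [0.0] * (BLOCK - 1) if tick == 0 else silence
        for name in order:
            src = source[name]
            out[name] = list(ext if src is None else out[src])
        if any(out["C"]):
            return tick
    return None


print(run_chain("ABC"))  # 0 -> pulse arrives in the same tick
print(run_chain("CBA"))  # 2 -> pulse arrives two ticks (blocks) late
```

With the "correct" order the pulse comes out in the tick it went in; with the reversed order it takes two extra ticks, matching the walk-through above.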

    obviously it is better to do the calculations in the correct order,
    because it allows you to achieve zero latency in a whole chain of objects.
    So Pd tries hard to do exactly that: if two objects are connected with a
    signal connection, the source object will always be processed before the
    sink object.
    this is done by the Pd-scheduler that sorts the DSP graph (as a directed
    acyclic graph that expresses the inter-object dependencies) whenever you
    turn DSP "on".
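To illustrate the idea of sorting a DSP graph so that sources run before sinks, here is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). It is not Pd's actual sorting code; `dsp_order` and the connection-list format are assumptions made for this example.

```python
from graphlib import TopologicalSorter


def dsp_order(connections):
    """Return a processing order in which every source precedes its sinks.

    `connections` is a list of (source, sink) pairs, one per signal
    connection. The format is hypothetical, chosen for illustration.
    """
    preds = {}  # maps each object to the set of objects feeding it
    for src, dst in connections:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    return list(TopologicalSorter(preds).static_order())


# the connections can be listed in any order; the sort recovers A, B, C
print(dsp_order([("B", "C"), ("A", "B")]))  # ['A', 'B', 'C']
```

Objects that have no path between them can come out in either relative order, since a topological sort only constrains objects that are actually connected.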

    incidentally, this also works for meta-objects ("abstractions" or
    "sub-patches"): if you connect two abstractions with signal connections
    ("X->Y"), then the entire source abstraction (all signal objects
    within "X") will be processed before the entire sink abstraction (all
    signal objects within "Y").

    now what's different with implicit connections ([s~], [r~],...)?
    simply put: the Pd-scheduler (that does the sorting of the DSP
    calculations) does not know that a [s~ foo] object is connected to a
    [receive~ foo].
    these two objects are connected via their own logic, but the
    Pd-scheduler doesn't know anything about their inner logic and just
    looks at their explicit connections to determine which object needs to
    be evaluated first.

    depending on the patch, there are three possibilities:
    1) somehow the [r~ foo] is connected explicitly to [s~ foo], typically
    if you are doing feedback.
    in this case, the [r~] must be processed before the [s~], so the [s~]
    didn't have the chance yet to generate the new sample block, leaving
    [r~] with the last sample block - introducing a delay of one block.

    2) somehow the [s~ foo] is connected explicitly to [r~ foo]. this is a
    bit harder to achieve, since [s~] has no outlet and [r~] has no inlet.
    however, if we put both into abstractions/subpatches with iolet~s, then
    we can connect these iolets. since the entire source abstraction is
    processed before the entire sink abstraction, any [s~] in the source
    abstraction will also be processed before any [r~] in the sink abstraction.
    whenever [r~] does its thing, [s~] will already have produced a new
    block of samples, so [r~] will get the fresh stuff and no delay occurs.

    3) the [s~] and [r~] are not connected at all (e.g. "[adc~ 1]->[s~ foo]
    [r~ foo]->[dac~ 1]")
    in this case, Pd doesn't know whether it should first process the [s~]
    or the [r~] and will pick one "randomly". depending on which it picked,
    you will get a block delay or not.
    this is the signal equivalent of a message "fan out", where the order
    of execution is likewise undefined.
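Case 3 can be sketched as a toy model in the same spirit: a send and a receive that share one buffer, with nothing telling the scheduler which should run first. This is an illustration only, not libpd or Heavy code; `delay_in_blocks` and the `"send"`/`"receive"` labels are made up for the example.

```python
BLOCK = 64  # Pd's default block size in samples


def delay_in_blocks(order, n_ticks=3):
    """Toy model of "[adc~ 1]->[s~ foo]" and "[r~ foo]->[dac~ 1]".

    `order` lists "send" and "receive" in the order they are processed
    within one DSP tick. Returns how many ticks late the pulse reaches
    the dac, or None if it does not arrive within n_ticks.
    """
    silence = [0.0] * BLOCK
    bus = silence  # buffer shared by [s~ foo] and [r~ foo]
    for tick in range(n_ticks):
        # a single-sample pulse arrives at the adc in the first tick only
        adc = [1.0] + [0.0] * (BLOCK - 1) if tick == 0 else silence
        dac = silence
        for obj in order:
            if obj == "send":
                bus = list(adc)   # [s~] writes the fresh block
            else:
                dac = list(bus)   # [r~] reads whatever is in the buffer
        if any(dac):
            return tick
    return None


print(delay_in_blocks(["send", "receive"]))  # 0 -> no delay
print(delay_in_blocks(["receive", "send"]))  # 1 -> one block of delay
```

The two calls differ only in processing order, which is exactly the thing an explicit signal connection would pin down.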

    i don’t fully get the strategy to avoid them.
    do i have to put them into a subpatch and connect inlet~s and outlet~s?

    yes.
    the only safe way to guarantee a certain order of execution for signal
    objects is by making their dependencies explicit.
    the only way to make dependencies explicit is to use signal connections
    (which involves inlet~ and outlet~ when it comes to subpatches)

    (somewhat defeating the purpose of the s~ and r~ objects)

    only somewhat.
    there are multiple purposes of [s~] and [r~] objects.
    avoiding explicit connections is the worst use.

    also, inlet~ and outlet~ never add latency, right?

    yes and no.

    they don't add latency by themselves, as they are only crutches to make
    explicit connections between windows.

    but if your subpatch runs on a higher blocksize (e.g. 1024 instead of
    the default 64, using [block~ 1024]) then the [inlet~]/[outlet~]
    objects are the place where the actual re-blocking happens.