Would be grateful for any help/advice!
I'm working on a multi-effect for saxophone using Bela. I already have a few effects working well. One of them is a time-domain pitch shifter using the AMDF (average magnitude difference function), as described in this paper:
This pitch-shifting algorithm works rather well, and I feel the responsiveness and latency are much better than in some other pitch-shifting effects I've tried. But I stumbled upon the PSOLA algorithm and was intrigued that it supposedly preserves the formants of the signal!
So I read up a bit on the PSOLA algorithm and had a go at implementing it on the Bela. I now have something that works. Kinda...
But there are some things that aren't totally clear to me:
The "pitch marks". Must they be positioned on a high-energy peak, and if so, why? I'm currently using the AMDF for the pitch tracking, and it gives me the period but doesn't locate peaks in the periodic signal.
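To make the pitch-mark question concrete, here is a minimal C++ sketch of the two steps involved. It is not the code running on the Bela. `amdfPeriod` is a plain AMDF period search, and `snapToPeak` is a hypothetical helper that moves a rough mark to the largest-magnitude sample nearby. All names, the lag range, and the use of |x| as the "energy peak" are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// AMDF period search: returns the lag in [minLag, maxLag] (in samples)
// with the smallest mean |x[n] - x[n + lag]|.
std::size_t amdfPeriod(const std::vector<float>& x,
                       std::size_t minLag, std::size_t maxLag)
{
    assert(minLag >= 1 && minLag <= maxLag && maxLag < x.size());
    const std::size_t count = x.size() - maxLag; // same number of terms for every lag
    std::size_t bestLag = minLag;
    float bestValue = std::numeric_limits<float>::max();
    for (std::size_t lag = minLag; lag <= maxLag; ++lag) {
        float sum = 0.0f;
        for (std::size_t n = 0; n < count; ++n)
            sum += std::fabs(x[n] - x[n + lag]);
        const float value = sum / count;
        if (value < bestValue) {
            bestValue = value;
            bestLag = lag;
        }
    }
    return bestLag;
}

// Hypothetical helper: move a rough pitch mark to the sample with the
// largest |x| within +/- radius samples (first one wins on ties).
std::size_t snapToPeak(const std::vector<float>& x,
                       std::size_t mark, std::size_t radius)
{
    assert(mark < x.size());
    const std::size_t lo = mark > radius ? mark - radius : 0;
    const std::size_t hi = std::min(mark + radius, x.size() - 1);
    std::size_t best = lo;
    for (std::size_t i = lo; i <= hi; ++i)
        if (std::fabs(x[i]) > std::fabs(x[best]))
            best = i;
    return best;
}
```

With a period estimate P from `amdfPeriod`, one could step through the signal in hops of P and call `snapToPeak` on each rough mark with a radius of about P/2. Whether that is required or merely helpful is the open question here.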
The grains/splices/segments. Must/should they be two periods long, as I've read in many places? Why? Should there always be a new grain for every period, regardless of grain size (i.e. with a grain size of two periods, there would always be two grains overlapping)?
The windowing of the grains. Must/should it be the same window regardless of the distance between grains, or does it make sense to aim for something crossfade-like between grains? Or at least make the grain size/window larger when the grains are far apart? As of now, a downshift of two octaves introduces a lot of zero samples between the grains.
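The gap problem can be illustrated with a small C++ sketch that sums Hann windows placed at a fixed spacing. This is only an illustration under assumptions: the function name `overlapAddEnvelope` is hypothetical, the window is a periodic Hann, and the grain spacing is taken to be constant.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of Hann windows of length grainLen, one placed every hop samples.
// A value near 1 means the grains crossfade evenly; 0 means silence.
std::vector<float> overlapAddEnvelope(std::size_t grainLen,
                                      std::size_t hop,
                                      std::size_t total)
{
    assert(grainLen >= 2 && hop >= 1);
    const float pi = 3.14159265358979f;
    std::vector<float> env(total, 0.0f);
    for (std::size_t start = 0; start < total; start += hop)
        for (std::size_t i = 0; i < grainLen && start + i < total; ++i)
            env[start + i] += 0.5f - 0.5f * std::cos(2.0f * pi * i / grainLen);
    return env;
}
```

For a period of 100 samples and 2-period grains, `overlapAddEnvelope(200, 100, 1000)` stays close to 1 after the first hop, while `overlapAddEnvelope(200, 400, 1000)` (two octaves down, so grains 4 periods apart) is zero for half of every hop.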
Intuitively, I have a feeling that the windowing/grain size is key in creating the pitch shift. Consider a grain/window size of 3 periods and a pitch ratio of 0.5 (shifting down 1 octave). The grains are then placed 2 periods apart, so consecutive grains overlap by exactly one period, basically resulting in no pitch shift.
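The arithmetic behind that case can be sketched as follows, assuming the usual PSOLA convention that synthesis marks are spaced one analysis period divided by the pitch ratio apart. `grainOverlapInPeriods` is a hypothetical helper name, not part of any library.

```cpp
#include <cassert>

// Overlap between consecutive grains, measured in analysis periods.
// Positive: grains overlap. Negative: there is a silent gap of that size.
// Assumes a synthesis spacing of (1 / pitchRatio) periods.
double grainOverlapInPeriods(double grainPeriods, double pitchRatio)
{
    assert(grainPeriods > 0.0 && pitchRatio > 0.0);
    const double spacingPeriods = 1.0 / pitchRatio;
    return grainPeriods - spacingPeriods;
}
```

For 3-period grains at a ratio of 0.5 this gives an overlap of 1 period, and for 2-period grains at a ratio of 0.25 it gives -2, i.e. two periods of silence between grains.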
Would be happy to get a better understanding of this. Grateful for any help!