A common trick to reduce the CPU cost of cutoff changes in filters is to recompute the coefficients only every so often (e.g. once per block) and then apply a smoothing filter (e.g. a one-pole) to the coefficients themselves. If you are responding to non-smooth inputs (e.g. occasional messages from a GUI or a Trill sensor), you could even use one one-pole for the cutoff frequency (smoothing it over, say, about 10 ms) and then a one-pole for each coefficient (smoothing it over about one block). This was suggested in a paper by jos IIRC, but I can't find it right now.
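To make the two-stage smoothing above concrete, here is a minimal C++ sketch. It is only an illustration under my own assumptions, not the method from the paper: the names `OnePole`, `lowpassCoefficients` and `SmoothedLowpass` are made up, the coefficient formulas are the usual RBJ cookbook lowpass, the filter runs as a Direct Form I biquad, and the time constants (10 ms for the cutoff, a quarter of a block for each coefficient) are just plausible starting values.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// One-pole lowpass used as a parameter smoother: y = x + a * (y - x).
struct OnePole {
    float a = 0.0f; // feedback coefficient in [0, 1)
    float y = 0.0f; // current smoothed value

    // `rate` is how often process() is called, in Hz.
    void setTimeConstant(float seconds, float rate) {
        a = std::exp(-1.0f / (seconds * rate));
    }
    void reset(float value) { y = value; }
    float process(float x) {
        y = x + a * (y - x);
        return y;
    }
};

// Normalised RBJ-cookbook lowpass coefficients {b0, b1, b2, a1, a2}.
// This is the expensive part (sin, cos, divisions), done once per block.
inline std::array<float, 5> lowpassCoefficients(float cutoffHz, float q, float sampleRate) {
    const float w0 = 2.0f * 3.14159265358979f * cutoffHz / sampleRate;
    const float cosw = std::cos(w0);
    const float alpha = std::sin(w0) / (2.0f * q);
    const float a0 = 1.0f + alpha;
    const float b1 = (1.0f - cosw) / a0;
    return {0.5f * b1, b1, 0.5f * b1, -2.0f * cosw / a0, (1.0f - alpha) / a0};
}

// Hypothetical helper class: biquad lowpass with a block-rate smoother on
// the cutoff and a sample-rate smoother on each coefficient.
class SmoothedLowpass {
public:
    SmoothedLowpass(float sampleRate, std::size_t blockSize, float cutoffHz, float q)
        : fs_(sampleRate), q_(q), targetCutoff_(cutoffHz) {
        const float block = static_cast<float>(blockSize);
        // Assumption: 10 ms time constant, updated once per block.
        cutoffSmoother_.setTimeConstant(0.010f, sampleRate / block);
        cutoffSmoother_.reset(cutoffHz);
        const std::array<float, 5> c = lowpassCoefficients(cutoffHz, q, sampleRate);
        for (std::size_t k = 0; k < 5; ++k) {
            // Assumption: a quarter-block time constant, so each coefficient
            // gets most of the way to its target within one block.
            coeff_[k].setTimeConstant(0.25f * block / sampleRate, sampleRate);
            coeff_[k].reset(c[k]);
        }
    }

    // Safe to call with abrupt values, e.g. from a GUI or sensor message.
    void setCutoff(float cutoffHz) { targetCutoff_ = cutoffHz; }
    float currentCutoff() const { return cutoffSmoother_.y; }

    void processBlock(const float* in, float* out, std::size_t n) {
        // Once per block: smooth the cutoff, then recompute target coefficients.
        const float fc = cutoffSmoother_.process(targetCutoff_);
        const std::array<float, 5> target = lowpassCoefficients(fc, q_, fs_);
        for (std::size_t i = 0; i < n; ++i) {
            // Once per sample: only the cheap one-poles on the coefficients.
            const float b0 = coeff_[0].process(target[0]);
            const float b1 = coeff_[1].process(target[1]);
            const float b2 = coeff_[2].process(target[2]);
            const float a1 = coeff_[3].process(target[3]);
            const float a2 = coeff_[4].process(target[4]);
            // Direct Form I biquad.
            const float x = in[i];
            const float y = b0 * x + b1 * x1_ + b2 * x2_ - a1 * y1_ - a2 * y2_;
            x2_ = x1_; x1_ = x;
            y2_ = y1_; y1_ = y;
            out[i] = y;
        }
    }

private:
    float fs_;
    float q_;
    float targetCutoff_;
    OnePole cutoffSmoother_;
    std::array<OnePole, 5> coeff_;
    float x1_ = 0.0f, x2_ = 0.0f, y1_ = 0.0f, y2_ = 0.0f;
};
```

In a render callback you would call `setCutoff()` whenever a new value arrives and `processBlock()` once per audio block. The trigonometry then runs once per block rather than once per sample, while the per-sample cost of tracking the cutoff is five multiply-adds for the coefficient smoothers.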
Then there are more rigorous approaches that investigate how the filter topology affects the filter's transient response (see here). I haven't read the paper since 2015, but IIRC the gist was that biquads are not inherently bad at step response; rather, some topologies handle it better than others.