I know this is old, but I thought I would add this about filtering the analog outputs. @andrew mentioned more than a year ago that the stair-stepping is in the ultrasonic frequency range and inaudible.
While this may be true, there are 2 things to keep in mind:
#1) Shannon's sampling theorem: You can perfectly reproduce a band-limited signal only if its bandwidth is less than 1/2 the sampling rate. In other words, your maximum bandwidth from a 22.05 kHz sampling rate is about 11 kHz. What comes out of the DAC is the audio signal modulated onto a pulse train. The same spectrum repeats every 22.05 kHz, with each copy spanning 11.025 kHz above and below a multiple of the sampling rate, starting at 0 Hz. The 0 Hz to 11.025 kHz image is what you hear...nominally.
Anything between 11.025 kHz and 22.05 kHz is noise, and this is not what I consider ultrasonic. 22.05 kHz down to 11.025 kHz is a mirror image of everything from 0 Hz to 11.025 kHz.
As an experiment, play a 6 kHz tone out of the Analog Output back into the audio input (which is sampled at 44.1 kHz) and look at the FFT. You will see a tone at 6 kHz and a tone at 16.05 kHz (22.05 kHz minus 6 kHz) in the spectrum.
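To make the arithmetic behind that experiment concrete, here is a minimal Python sketch that lists where a sampled tone shows up in the spectrum. The function name and constants are my own, and it only computes frequencies: it ignores the sinc roll-off of the DAC's hold and any aliasing in the input ADC.

```python
def image_frequencies(tone_hz, fs_hz, max_hz):
    """List the spectral components of a sampled tone up to max_hz.

    A tone at f sampled at fs appears at f and at k*fs - f and k*fs + f
    for every positive integer k.
    """
    components = {tone_hz}
    k = 1
    while k * fs_hz - tone_hz <= max_hz:
        components.add(k * fs_hz - tone_hz)
        if k * fs_hz + tone_hz <= max_hz:
            components.add(k * fs_hz + tone_hz)
        k += 1
    return sorted(c for c in components if 0 <= c <= max_hz)


DAC_RATE_HZ = 22050.0           # output sampling rate from the post
ADC_NYQUIST_HZ = 44100.0 / 2    # highest frequency the 44.1 kHz input can show

print(image_frequencies(6000.0, DAC_RATE_HZ, ADC_NYQUIST_HZ))
# -> [6000.0, 16050.0]
```

Raising max_hz shows the further images (28.05 kHz, 38.1 kHz, and so on) that the 44.1 kHz input cannot display directly.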
#2) Class D amplifiers and/or filters with long transient response:
If the ultrasonic noise remains in the signal, it can still intermodulate with other high-frequency noise sources to create cross-product terms that ARE audible. Class D amplifiers will have a switching frequency that is unlikely to be a perfectly synchronized multiple of 22.05 kHz. Any difference will create both inaudible and audible tones in the output spectrum.
Filters with long-duration transient responses also have an opportunity to create audible cross-products if there is any amount of nonlinearity in the high-frequency range of the following amplifier stages.
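As a rough illustration of where such cross-products could land, the sketch below takes the difference between a switching frequency and each nearby image of a tone and keeps the ones that fall in the audio band. The 300 kHz switching frequency and 1 kHz tone are made-up values for illustration only. It says nothing about amplitude, which depends on how much image energy is left and how nonlinear the amplifier is.

```python
def audible_cross_products(tone_hz, fs_hz, switch_hz, audible_max_hz=20000.0):
    """Return (image_hz, difference_hz) pairs whose difference is audible.

    Only first-order difference terms |switch - image| are considered.
    """
    products = []
    k_max = int((switch_hz + audible_max_hz) // fs_hz) + 1
    for k in range(1, k_max + 1):
        # Images of the tone sit on either side of each multiple of fs.
        for image_hz in (k * fs_hz - tone_hz, k * fs_hz + tone_hz):
            diff_hz = abs(switch_hz - image_hz)
            if diff_hz <= audible_max_hz:
                products.append((image_hz, diff_hz))
    return products


# Hypothetical values: 1 kHz tone, 22.05 kHz DAC rate, 300 kHz Class D switching.
for image_hz, diff_hz in audible_cross_products(1000.0, 22050.0, 300000.0):
    print(f"image at {image_hz:.0f} Hz -> difference tone at {diff_hz:.0f} Hz")
```

With these numbers the difference tones come out at 14350, 12350, 7700 and 9700 Hz, none of which is harmonically related to the 1 kHz tone.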
The best thing to do is to create an output filter that has fairly good stop-band rejection at 11.025 kHz and above.
Practically speaking, this indicates a filter cut-off at 6 kHz to 8 kHz, depending on how aggressively you want to filter the output and how many stages you use.
For example, a 6th-order low-pass with a 5.5 kHz cut-off rolls off at roughly 6 x (-6 dB/octave) = -36 dB/octave, which yields about -36 dB rejection at 11 kHz (one octave above cut-off) and -72 dB rejection at 22 kHz (two octaves above, comparable to 12-bit SNR).
Ideally one would aim for -90 dB or better rejection at 11 kHz to maintain 16-bit SNR levels (roughly 96 dB), but as you can see, at 6 dB/octave per order that would require a 15th- or 16th-order filter at 5.5 kHz, or much higher order filters if you want more bandwidth.
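The rule-of-thumb numbers above can be checked with a few lines of Python. This is only a sketch: the function names are mine, the 6 dB/octave-per-order figure is the asymptotic slope, and the Butterworth formula is included as one common filter type for comparison, not because the post specifies it.

```python
import math


def asymptotic_rejection_db(order, cutoff_hz, freq_hz):
    """Rule-of-thumb rejection: 6 dB per octave per filter order."""
    octaves = math.log2(freq_hz / cutoff_hz)
    return -6.0 * order * octaves


def butterworth_rejection_db(order, cutoff_hz, freq_hz):
    """Magnitude of an ideal Butterworth low-pass at freq_hz, in dB."""
    ratio = freq_hz / cutoff_hz
    return -10.0 * math.log10(1.0 + ratio ** (2 * order))


def order_for_rejection(target_db, cutoff_hz, freq_hz):
    """Smallest order whose rule-of-thumb rejection reaches target_db (negative)."""
    octaves = math.log2(freq_hz / cutoff_hz)
    return math.ceil(-target_db / (6.0 * octaves))


print(asymptotic_rejection_db(6, 5500.0, 11000.0))   # -> -36.0
print(asymptotic_rejection_db(6, 5500.0, 22000.0))   # -> -72.0
print(order_for_rejection(-90.0, 5500.0, 11000.0))   # -> 15
print(order_for_rejection(-96.0, 5500.0, 11000.0))   # -> 16
print(order_for_rejection(-90.0, 8000.0, 11025.0))   # much higher with more bandwidth
```

By this rule of thumb, -90 dB at one octave works out to order 15 and -96 dB (closer to the theoretical 16-bit figure) to order 16. Moving the cut-off up to 8 kHz pushes the required order past 30, which is the bandwidth trade-off in numbers.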
This is the trade-off, but any amount of filtering is an improvement. The advantage with audio signals is that the relative amplitude typically decreases with increasing frequency. The strongest images are therefore the mirrors of the low-frequency content, which sit just below the sampling rate, so practically speaking you will take care of most of the noise if you have good rejection within 5 kHz of the sampling rate (in this case, good rejection from about 17 kHz upward).
As for filter design, the Multiple Feedback structure maintains the best high-frequency characteristic (the Sallen-Key structures start to do unexpected things at higher frequencies due to op amp characteristics). The multiple feedback filter design is pretty straightforward with easy-to-find online calculators. One such example is here:
http://sim.okawa-denshi.jp/en/MultipleFB3Lowkeisan.htm
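If you want to sanity-check component values from a calculator, the sketch below computes the natural frequency, Q, and DC gain of a single second-order Multiple Feedback low-pass stage from its parts. The topology and component labels are my assumption and may not match the calculator's labeling: R1 from the input to a summing node, C1 from that node to ground, R2 from the node to the op amp output, R3 from the node to the inverting input, and C2 from the inverting input to the output, with an ideal op amp. The example component values are hypothetical.

```python
import math


def mfb_lowpass_params(r1, r2, r3, c1, c2):
    """Return (f0_hz, q, dc_gain) for one MFB low-pass stage.

    Derived for the assumed topology with an ideal op amp:
    H(s) = -(1/(R1*R3*C1*C2)) / (s^2 + s*(1/C1)*(1/R1 + 1/R2 + 1/R3) + 1/(R2*R3*C1*C2))
    """
    w0 = 1.0 / math.sqrt(r2 * r3 * c1 * c2)               # natural frequency, rad/s
    damping = (1.0 / c1) * (1.0 / r1 + 1.0 / r2 + 1.0 / r3)  # s^1 coefficient
    return w0 / (2.0 * math.pi), w0 / damping, -r2 / r1


# Hypothetical parts: three 10 kOhm resistors, C1 = 6.8 nF, C2 = 1.5 nF.
f0_hz, q, gain = mfb_lowpass_params(10e3, 10e3, 10e3, 6.8e-9, 1.5e-9)
print(f"f0 = {f0_hz:.0f} Hz, Q = {q:.3f}, DC gain = {gain:.1f}")
```

A Q near 0.707 corresponds to a Butterworth response for a single stage, in which case f0 is also the -3 dB point. A higher-order filter cascades several stages with different Q values, which is what the online calculators work out for you.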