|Noise characteristics of PCM vs. DSD - image found here.|
I. DSD-to-PCM - foobar SACD plug-in & AuI ConverteR
So, I downloaded the newest SACD plug-in, currently at version 0.7.7 dated 2015/03/16. I deleted the DSDIFF plug-in from my computer so there are no interactions, and installed the new files.
Notice that the SACD plugin has a configuration panel for settings:
Because this plug-in does not directly output to 24/96, I figured I would try the highest output rate (352.8kHz) and use the best samplerate converter (SRC) I have (the excellent iZotope RX 4) to bring it down to 24/96 for analysis as before. Here are the settings I used in iZotope RX 4:
|Sharpest "max" filter for 24/96 in iZotope RX 4, linear phase without suppression of pre-ringing - nothing fancy...|
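As an aside, going from 352.8kHz to 96kHz is not an integer ratio, which is part of why a good SRC matters here. A minimal sketch of the arithmetic follows; the function name `resample_ratio` is made up for illustration and has nothing to do with how iZotope RX 4 works internally:

```python
from fractions import Fraction


def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Return dst/src as a reduced fraction L/M.

    A textbook rational resampler would upsample by L, low-pass filter,
    then keep every M-th sample. Real converters may do this differently.
    """
    return Fraction(dst_hz, src_hz)


# 352.8 kHz plug-in output down to the 96 kHz used for analysis.
ratio = resample_ratio(352_800, 96_000)
print(f"upsample by {ratio.numerator}, downsample by {ratio.denominator}")
```

So the conversion works out to 40/147, rather than a simple "drop every Nth sample" operation.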
The other parameter we can play with in the SACD plug-in is the DSD2PCM mathematics setting. By default, it's the standard fixed-point integer mode. Let us also analyze the result from the highest precision mode "Multistage (Double Precision)".
As well, I downloaded the AuI ConverteR 48x44 software. The current demo is version 4.1.20. Other than setting the output for WAV 24/96, I left the rest of the settings to default.
Using the exact same procedure as last week, here's a summary of what I got:
Interesting... It looks like the SACD plug-in is actually about the same as the old DSDIFF 1.4 (<1dB difference) in terms of noise level and dynamic range. Notice that, just like last week's results from XLD, going from fixed-point to double-precision floating-point calculations made no difference here.
AuI ConverteR resulted in some impressive numbers! Let's see what it's doing in detail...
As you can see, AuI ConverteR is using a sharp "brick-wall" filter right at ~20kHz to remove essentially everything above 20kHz. As such, have a look at what it does with the noise profile:
Wow. That is an impressively sharp, precise filter at 20kHz! I can approximate that effect with iZotope RX 4's EQ plug-in with a low-pass at 20kHz, high Q of 25 or so (not shown) but AuI ConverteR looks even cleaner with less noise floor irregularity.
That little bit of high-frequency "rippling" with DSDIFF is probably a result of the resampling algorithm. Otherwise, foobar DSDIFF and the newer SACD plug-ins appear very similar.
Basically, this is what we can say at this point...
1. foobar SACD plug-in works about the same as the old DSDIFF plug-in. I would not be surprised if the algorithm (DSD2PCM) is essentially the same if we look "under the hood".
2. The AuI ConverteR software puts up some impressive numbers. This is done with a very strong low-pass filter. If you feel there is no need to retain frequencies >20kHz, then this will clearly get the job done.
II. All that noise!
But wait, there's more! SACD Plug-in also has a 30kHz lowpass mode - "Direct - (Double Precision, 30kHz LF)". Hmmm, I wonder how that looks?
Engaging the 30kHz lowpass mode really resulted in a step down in calculated accuracy. Here's a look at the graphs:
Indeed, the 30kHz low-pass filter is doing the job (yellow).
We can see the effect of that 30kHz filter on the noise floor... Certainly not the prettiest filter out there! Realize that although the differences are there in these graphs with a synthetic test signal, we're talking about noise down below -150dB (below 20kHz). It's just not an issue in terms of audibility.
Now, let us see if we can do it better by using iZotope RX 4 to do the 30kHz low-pass filtering instead of the algorithm used by SACD plug-in. Here's a simple setting:
Low-pass filter at 30kHz as seen, Q = 5.0 (not too steep), linear-phase FIR with FFT size of 32k.
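For anyone curious what a linear-phase FIR low-pass looks like in code, here is a rough sketch using a plain windowed-sinc design in pure Python. To be clear, this is not iZotope RX 4's algorithm and it does not reproduce the Q = 5.0 or 32k FFT settings above; the tap count (255), the Blackman window, and the function names are all assumptions chosen just to illustrate the idea of a 30kHz low-pass at a 352.8kHz sample rate:

```python
import math


def lowpass_fir(cutoff_hz: float, fs_hz: float, num_taps: int = 255) -> list[float]:
    """Design a linear-phase low-pass FIR (windowed sinc, Blackman window)."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd tap count for a symmetric filter")
    fc = cutoff_hz / fs_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) // 2       # centre tap index
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (infinitely long) low-pass impulse response, sampled at k.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        # Blackman window tapers the truncated sinc to reduce stop-band ripple.
        x = 2 * math.pi * n / (num_taps - 1)
        window = 0.42 - 0.5 * math.cos(x) + 0.08 * math.cos(2 * x)
        taps.append(ideal * window)
    gain = sum(taps)                # normalise so DC passes at 0 dB
    return [t / gain for t in taps]


def magnitude_db(taps: list[float], freq_hz: float, fs_hz: float) -> float:
    """Evaluate the filter's magnitude response at one frequency, in dB."""
    w = 2 * math.pi * freq_hz / fs_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20 * math.log10(max(math.hypot(re, im), 1e-12))


FS_HZ = 352_800
taps = lowpass_fir(30_000, FS_HZ)
for f in (10_000, 20_000, 100_000):
    print(f"{f / 1000:.0f} kHz: {magnitude_db(taps, f, FS_HZ):.1f} dB")
```

The symmetric taps are what make the filter linear phase, and that same symmetry is the source of the matching pre- and post-ringing seen in impulse responses.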
III. Impulse Response and DSD-to-PCM Converters
Despite the inherent noise in DSD, we can drop overall noise levels substantially with a good low-pass filter. In fact, since a picture is worth a thousand words, this is what a 15kHz (-12dBFS) sine wave looks like when we compare the unfiltered foobar SACD plug-in output at 352.8kHz with the 30kHz iZotope RX 4 low-pass filtering (again, this is with Saracon as the encoding software for PCM-to-DSD):
This is what all that extra high-frequency noise looks like in DSD when you don't filter it out at all. Notice that DSD128 is significantly less noisy. The question is, just how much noise reduction should we actually do? (You can also see the noise through an analogue oscilloscope - as shown here.)
As noted by Juergen in the comments to the previous post, there is also the matter of time-domain behaviour, which can be skewed as we apply various filters.
Let's see what a 24/96 impulse looks like after going through the DSD encoder [Saracon] and most of the decoders I looked at (DSD-to-PCM converter output set to 24/352 for each, AudioGate's max was 192kHz):
When I convert this waveform to DSD64 and DSD128 with Saracon and then back to PCM with the foobar SACD plug-in to 24/352 unfiltered (retaining all that ultrasonic noise), you get the 2nd and 3rd left images. Notice again the amount of noise in the signal and again, we see the superiority of DSD128. From a time domain perspective, the SACD conversion process is excellent. The shape and timing of the impulse would be completely retained since the 2.8224 MHz sampling rate of DSD64 provides ~29 samples within each 96kHz time period.
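The "~29 samples" figure is just the ratio of the two sample rates. A quick sketch of that arithmetic (the constant and function names here are my own, purely for illustration):

```python
DSD64_RATE_HZ = 64 * 44_100    # 2,822,400 Hz (2.8224 MHz)
DSD128_RATE_HZ = 128 * 44_100  # 5,644,800 Hz


def dsd_samples_per_pcm_period(dsd_rate_hz: int, pcm_rate_hz: int) -> float:
    """How many 1-bit DSD samples fall within one PCM sample period."""
    return dsd_rate_hz / pcm_rate_hz


print(dsd_samples_per_pcm_period(DSD64_RATE_HZ, 96_000))
print(dsd_samples_per_pcm_period(DSD128_RATE_HZ, 96_000))
```

DSD128 doubles the count, although each of those samples is only a single bit, so this says something about timing granularity rather than amplitude resolution.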
When we use iZotope RX 4 with 30kHz low-pass filtering (4th left image from the top), the "impulse" amplitude is significantly reduced and we see the corresponding ringing pattern as the high frequency noise is removed and no longer obstructing the picture.
AudioGate and Saracon both look very similar. Both use linear phase filters with characteristic symmetrical pre- and post-ringing. Whereas AudioGate allows high frequencies through (and thus produces a well-formed impulse), we see the effect of Saracon's filter (pre- and post-ringing at ~30kHz). JRiver looks like it uses an intermediate phase filter (with 24kHz or 30kHz low-pass) which minimizes but does not remove pre-ringing. Comparatively, we see that DSD Master is using a form of minimum phase filter that removes the pre-ringing but augments the post-ringing.
AuI ConverteR is an interesting case. As we saw above with the RightMark tests, it implements a very sharp ~20kHz low-pass filter. This impulse response looks to be linear phase with accentuated pre- and post-ringing due to the sharpness of the filter; the "price" to pay I suppose.
I'll leave you to decide how you feel about this information and whether you think the relative time domain effects resulting from implementation of the filters are audible. Back in 2013 I had a listen to some filter settings off the TEAC DAC and had difficulty noticing much of a difference; again here, I listen and fail to convince myself that I have any clear preferences among the converters, even when using the ABX Comparator. So far I'm using headphones (Sennheiser HD800 + TEAC UD501 DAC, ASIO driver playing DSD64 converted to 24/352kHz) so perhaps I need to try again with the speaker system. You guys up for an internet "blind" test to see if there's a preference towards linear phase vs. minimum phase upsampling???
IV. Conclusion
I hope we can appreciate the compromises we face with DSD to PCM conversion. How much noise can we tolerate from the 1-bit quantization when we move the signal to PCM? What's the best frequency to set a low-pass filter assuming one believes it's necessary? What parameters should we use to filter (minimum / linear / intermediate phase, sharp vs. gradual roll-off...)? What's the best sampling rate to spit out the PCM data (eg. do we need to produce >96kHz files if we roll-off before 48kHz)?
As I noted last week, I really am not convinced that these differences are audible beyond volume level changes and whether the ultrasonic noise causes problems for one's audio system (eg. intermodulation distortions, interaction with tweeter ultrasonic peaks, and other non-linearities). This is why I don't think there's any point in "crowning" any software package as being superior. Although it's interesting to demonstrate and experiment with, I suspect these are all rather obsessive, academic results of interest mainly to audio geeks :-).
You can perhaps imagine, after "penning" these last 2 posts, I'm pretty well done with talking about DSD for a while. The most interesting question for me currently, as suggested by the discussions with the previous post, is this whole notion of just how much significance we should place on resolution in the time-domain irrespective of audible frequencies.
If it is significant (I hesitate to use the word "important" since that should be obvious by now if it is the case), then how much is enough? Should we take research like this paper by Kunchur (2008) seriously? Or is it possible that for practical purposes, it doesn't really matter that much when we're listening to real music as opposed to test signals? In any case, I have a strong suspicion that we will be revisiting this in the days ahead since this seems like an area that will be brought out when Meridian's MQA becomes available, as I suspect they will emphasize time parameters, digital filter types, and samplerate given their apparent satisfaction with 16-bit resolution.
Finally, it has come to my attention that there was much unhappiness regarding a recent blog post on the importance of noise (here also) in digital audio reproduction to the point of using speculation to support an underlying belief that expensive ethernet cables could somehow impart beneficial effects (as you know, I found no evidence of significance in my testing with various types of ethernet cables). As usual, no empirical data or real-life examples were provided and support came from more testimony from the like-minded and some links that are at best tangential to high-fidelity audio. It looks like bans from commenting were issued for what seems like rather fair statements calling out the obvious lack of substance. I guess that's how people not felt to be "true believers in the audiophile experience" are dealt with. IMO, this is unfortunate behaviour for a site reporting on mature audio computing technology.
There is much that can be said, argued and refuted in that article, but I think for most reasonable audiophiles it's rather obvious and many excellent points can be found in the comments... What is of relevance to this blog entry is that if one believes that expensive ethernet cabling can reduce noise in the "system" (in a way that appears difficult for these people to produce empirical evidence for), why would any audiophile who subscribes to this theory even want to listen to DSD64 (where the noise is obviously demonstrable and a potential cause of distortion)? Or even consider DSD64 superior to 24/192 at times? Would it not be just as likely that some folks actually like the ultrasonic noise and what it actually is doing through the system? Perhaps similar to how some tube-lovers talk about certain types of distortion being unobjectionable? In fact, back in late 2013, I posted on my impressions with realtime PCM-to-DSD transcoding with JRiver 19 and felt that DSD64 did impart a subtle change to the sound. I wouldn't say that I felt the sonic difference compelled me to convert all my PCM files to DSD for listening, but it was an interesting effect. Maybe that's why some people would prefer a DAC that purposely converts PCM to DSD like the PS Audio DirectStream DAC (the signal is purposely downsampled to DSD128, and then only noise filtered by 80kHz according to this review).
I'm also thoroughly enjoying David Byrne's book How Music Works (2012, with 2013 update) - check it out for entertaining reading!
Enjoy the music folks :-).
Note that Adobe Audition renders the PCM data with a linear phase interpolation filter. For reference, here are renderings of some impulse waveforms using Audacity, which does not do the fancy interpolation: