A couple weeks ago, Whackamus posed this interesting comment and question which I thought would be a good topic to discuss and explore in greater detail and with some examples/samples:
"I've been reading your blog for years. Or for almost four years, at any rate. I have to thank you for doing what you do. I've likewise always wanted to ask you a question, too, but I don't know how the bleep to to contact you. In any case, since I've been fretting over it afresh, I thought I'd just post it here. If you ever do decide to get to/address it, that'd be great. If not -- hey, no sweat. :)
In any case, I read the following (tonight) on the Stereophile forums:
"I personally think that MQA has some noble goals, in terms of getting as close to the original master as possible, but I think that is far less important than the elimination of the damaging pre-ringing distortion. This has been the bane of digital playback for 30 years, and over-sampling and various filter techniques have tried to deal with it, with limited success."
I won't say that I've never heard ringing -- because I probably have -- but I will say that I've never explicitly said: "Aha! Eureka! Thar be ringing!" Because -- outside of maybe a blurring during transients? -- I have no idea what it sounds like. But my question is less about MY having heard ringing than about the AUDIBILITY of ringing -- pre, post, or otherwise. In a quality DAC (which I've got to assume most of the folks posting on Stereophile.com have access to), how audible are ringing effects? Or, rather, how COMMON are they? I kind of imagine that the Meitners, Lavrys, Levinsons, Stuarts, etc. of the audio world take great care to minimize (pre-/post-)ringing effects and to eliminate ringing in the audible realm. I likewise imagine that both such things are doable, inasmuch as most of us have been enjoying digital audio for decades now. But the Stereophile poster makes it seem as if ringing is the apodeictic bane of digital audio. What am I missing?"
Beautiful question! Like other audiophiles, I've seen the "dreaded ringing" (like the "dreaded jitter") occupy the minds of hobbyists over the years as a nemesis which must be slaughtered! Typically, we see images like this in magazines, which are of course extremely frightening to look at:
Before freaking out, let's think this through.
Since 2013, I have been exploring this phenomenon, trying to figure out for myself just how much of a problem it is in terms of the magnitude of audible effect. Folks might want to have a look at previous articles on this:
MEASUREMENTS: Digital Filters and Impulse Response... (TEAC UD-501)
MEASUREMENTS: "Pulse Response" - 5kHz & 10kHz.
Consider for a moment what an "impulse" is in the digital world. It's a sharp transient where the signal jumps from a baseline of 0 to full amplitude for a single sample and then drops straight back to 0. Numerically it looks like this (+32767 being the largest signed number for 16 bits, and -32768 the smallest):
...0, 0, 0, 0, +32767, 0, 0, 0, 0...
I think it's useful to see it as a number sequence of discrete sample points rather than some kind of waveform image as a start. When we look at images of this data in an audio editor where the "points" are conveniently connected for us, we are actually seeing the calculated interpolation as applied by the software. The line drawing we see is simply the result of whatever interpolation function the editor applies.
When I measure an "impulse response", basically what I'm asking the DAC to reproduce (typically with a 16/44.1 signal) is that sudden sharp transition of exactly one sample in duration, asking the device to interpolate all the individual samples around that discontinuity with the filter function programmed into it. For a typical 8x oversampling DAC, that 44.1kHz is upsampled to 44.1 x 8 = 352.8kHz; in other words, 8 output samples are produced for every original point (conceptually, the original value plus 7 interpolated ones). Realize that a "Dirac impulse" (the idealized single point spike) is not inherent in natural sounds. We do not get instantaneous transitions like this that suddenly start and stop the air waves in real-life physical systems. Nor would single impulses like this sound any good anyhow! We can of course model it in a computer, just as in electrical systems we can show true square waves even though in nature, ideal square waves of vertical slope do not exist either.
Suppose we start with the most basic DAC, one that does NOT offer an anti-imaging filter. A system where that single impulse point is held over the sample duration. This results in a square waveform representing that impulse across the time of the single sample as shown in the image above. When we do this, our digital data gets converted to an analogue electrical output with all the ultrasonic components of square waves - remember, an ideal square wave is a composite of all the odd-order harmonics ad infinitum. Instead of smooth sine waves, we see these blocky "digital" tracings and if we are to pass the "impulse" data through like this, the result is literally an unmodified square wave. This is what's called a "zero order hold" model of signal reconstruction; more commonly known in the audiophile world as the "non-oversampling" DAC, "NOS" DAC, and people like Audio Note might call it "1X oversampling".
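As a rough numerical sketch of the zero-order hold idea, the snippet below holds each sample of a 1kHz sine for 8 output samples and then checks a few individual frequencies. The function names and the choice of a 1kHz test tone are mine for illustration only; this models the staircase mathematically and ignores any analogue stage a real NOS DAC would have.

```python
import cmath
import math

FS_IN = 44_100          # CD sample rate (Hz)
HOLD = 8                # each sample is held for 8 output samples
FS_OUT = FS_IN * HOLD   # 352.8 kHz, the same rate an 8x oversampling DAC runs at

def zero_order_hold(samples, hold=HOLD):
    """Repeat every input sample `hold` times (staircase output, no filter)."""
    return [s for s in samples for _ in range(hold)]

def tone_amplitude(signal, freq_hz, fs):
    """Amplitude of one frequency component via a single-bin DFT."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / fs)
              for i, x in enumerate(signal))
    return 2 * abs(acc) / n

# 441 samples = exactly 10 cycles of a 1 kHz sine at 44.1 kHz
sine = [math.sin(2 * math.pi * 1000 * i / FS_IN) for i in range(441)]
staircase = zero_order_hold(sine)

# 1 kHz is the tone itself; 43.1 and 45.1 kHz are its images around 44.1 kHz;
# 10 kHz is a control frequency where nothing should appear.
for f in (1_000, 10_000, 43_100, 45_100):
    level = tone_amplitude(staircase, f, FS_OUT)
    print(f"{f / 1000:5.1f} kHz: {level:.4f}")
```

The images around 44.1kHz come out a few percent of the size of the tone itself, which is the ultrasonic content that an anti-imaging filter is meant to remove.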
You may have seen people show images of the squarish digital waveform, like this image of a blocky sine wave:
|Image from this Kickstarter project.|
By turning off the digital filter on my TEAC UD-501 DAC, I can listen to this, measure it and demonstrate the effect of the lack of filtering.
Notice the "jaggy" unfiltered 1kHz sine wave at 16/44, with a rather "nice" looking impulse response measured without significant ringing (since this is an actual recording using a 24/192 ADC, note the "Gibbs Phenomenon" with the impulse waveform - see below).
But look at the "Digital Filter Composite" (again, thanks to Jürgen Reis for suggesting the use of this measurement method):
We see a terribly "dirty" result when examined in the frequency domain. Tons of noise beyond Nyquist (22.05kHz), plus the 19 and 20kHz sine waves are echoed across the spectrum. As much as some would want us to believe that time domain qualities are extremely important down to the ringing, remember that for human hearing, the frequency domain is no doubt essential to get right (the cochlea performs a kind of frequency analysis, loosely comparable to an FFT, and similarly this is how cochlear implants function to artificially aid in hearing when the natural cochlea fails).
Remember, digital audio is by definition bandwidth limited. That is, when we sample using the CD samplerate of 44.1kHz, reconstruction of the waveform is accurate, per the Nyquist-Shannon theorem, up to Fs/2, or the "Nyquist frequency" of 22.05kHz for the CD. When we reconstruct the output and do not bandwidth limit the signal, as in the case of these NOS DACs, notice all the harmonics and distortion products seeping through beyond 22.05kHz. The analogue to this in the world of video and digital photography would be Moiré patterns either in the fine details or in the color banding of the image. We clearly recognize this as unwanted "detail" which was not found in the original image we captured.
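To see where those "echoes" of the 19 and 20kHz tones should land, here is a small sketch that lists the image frequencies (k x Fs plus or minus the tone) up to 100kHz. The helper name and the 100kHz ceiling are arbitrary choices for illustration; it only predicts where the images sit, not how strong they are on any particular DAC.

```python
def image_frequencies(tone_hz, fs_hz, max_hz):
    """Frequencies where an unfiltered DAC mirrors `tone_hz`, up to `max_hz`."""
    images = []
    k = 1
    while k * fs_hz - tone_hz <= max_hz:
        images.append(k * fs_hz - tone_hz)      # lower image around k * Fs
        if k * fs_hz + tone_hz <= max_hz:
            images.append(k * fs_hz + tone_hz)  # upper image around k * Fs
        k += 1
    return images

for tone in (19_000, 20_000):
    print(tone, image_frequencies(tone, 44_100, 100_000))
```

For the 19kHz tone, the first image sits at 25.1kHz, only a few kHz above Nyquist, which is why a reasonably steep filter is needed to knock it down.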
So, how do we remove all that extra high frequency distortion? We use a filter of course! And in modern DACs this is typically done with a digital oversampling process that interpolates the data so it doesn't look like these nasty square waveforms any more, but rather something approximating the sinusoidal physical air waves that we eventually hear, while suppressing frequencies not represented in the original digital signal as best we can.
Enter the Whittaker-Shannon interpolation formula - commonly known as the sinc filter. This is the mathematically "ideal" impulse response for a brick-wall low-pass filter. Behold... "Ringing":
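The same shape can be reproduced numerically. The sketch below applies the Whittaker-Shannon formula directly to the single-sample impulse from earlier and evaluates it between the sample points. It is the textbook ideal, not the filter of any real DAC, and the function names are just for illustration.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Whittaker-Shannon interpolation at time t (measured in samples)."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

# ...0, 0, 0, 0, +32767, 0, 0, 0, 0...
impulse = [0] * 8 + [32767] + [0] * 8
centre = 8  # index of the single non-zero sample

for offset in (-2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    value = reconstruct(impulse, centre + offset)
    print(f"{offset:+.1f} samples: {value:10.1f}")
```

At the whole-sample positions the curve passes through the original zeros, while between them it swings alternately negative and positive, symmetrically before and after the peak. That in-between swinging is the pre- and post-ringing.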
A filter function that respects the bandwidth limited nature of the sampling theorem obviously means that the output waveform when faced with such an extreme input as the unnatural "impulse" should interpolate the signal with minimal seepage beyond the Nyquist frequency. You will see this ringing phenomenon wherever there are sudden transients containing constituent frequencies above Nyquist. For example square waves will show the "Gibbs Phenomenon" during the transitions:
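For the square wave case, a quick way to see the Gibbs overshoot is to build a 1kHz square wave from only those odd harmonics that fit below the CD Nyquist frequency. This is a minimal sketch using the standard Fourier series of a square wave; the 1kHz fundamental and the grid resolution are arbitrary choices of mine.

```python
import math

def bandlimited_square(t, fundamental_hz, limit_hz):
    """Sum the odd harmonics of a +/-1 square wave that fit below limit_hz."""
    total = 0.0
    k = 1
    while k * fundamental_hz <= limit_hz:
        total += math.sin(2 * math.pi * k * fundamental_hz * t) / k
        k += 2
    return 4 / math.pi * total

F0 = 1_000        # 1 kHz square wave
NYQUIST = 22_050  # CD Nyquist frequency

# Scan one period on a fine grid and record the highest peak.
steps = 8_000
peak = max(bandlimited_square(i / (steps * F0), F0, NYQUIST) for i in range(steps))
print(f"peak = {peak:.3f} (an ideal square wave would top out at 1.000)")
```

The peak lands noticeably above 1.0 right next to each transition, and adding more harmonics narrows the overshoot without making it go away.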
Despite ringing in the time domain, when we examine the frequency domain, things look much nicer! Here then again is my TEAC UD-501, but with a sharp/steep 8X oversampling anti-imaging filter turned on:
As you can see, sine waves are smoothed out and the frequency-domain FFT composite demonstrates the benefit of the filter - good suppression of high frequency imaging; a relatively sharp "cliff" around 22.05kHz, and clean 19 & 20kHz signals with no high amplitude harmonics and intermodulation products. IMO, this is a much better result than a NOS DAC.
Which brings us to the main issue. Whereas frequency domain imaging distortion and intermodulation distortion clearly can be audible (for an example of this, go download Monty's "Intermod Tests" and have a listen), just how audible is the impulse ringing which is unavoidable for a steep low-pass filter? Specifically, how audible is the pre-ringing (because post-ringing will likely be masked naturally by reverb trails)?
IMO, the audibility is minimal if at all. Here's why:
1. The ringing is typically at Nyquist. For CD samplerate, this is 22.05kHz folks. What human can hear a low amplitude pre-ringing coming about a millisecond before an impulse at this frequency? Remember that the amplitude of the ringing is correlated to the amplitude of the "impulse". When you see measurements of the impulse response, typically this is at 100% amplitude (like that +32767 above) so the ringing you see is really a "worst case scenario", not representative of actual music.
2. Microphones and ADCs are bandwidth limited devices. Most microphones have little frequency response above 20kHz anyway as discussed recently. Remember, as I noted above, square waves and certainly single sample impulse signals are not natural sonic phenomena. Furthermore, the analogue signal from the microphone will typically be filtered by the ADC's low-pass filter as well which we never talk or obsess about in the audiophile world!
You can in fact take some music you have and upsample it from 44.1kHz to 176.4kHz with a steep upsampler that demonstrates strong ringing with an impulse response. Have a look in an audio editor with the "Spectral Frequency Display" and see if you notice much ringing being added around the Nyquist frequency. I have done this many times and cannot recall ever having seen any strong ringing other than with artificial test signals.
3. Empirical evidence is lacking. Talk is cheap and testimony is legion, whether from folks like the fellow quoted above by Whackamus, from Bob Stuart, or from audiophile folk heroes like John Swenson. There seems to be this belief out there that digital filters play a huge role in the sound and need to be specially tuned by the "gurus". I suppose promoting this point of view allows manufacturers to differentiate themselves with their version of digital filtering and allows talk of fancy terminology like an FPGA programmed to perform the signal processing. Furthermore, these claims seem to be gobbled up by the mainstream audiophile media as some kind of massive step forward in digital audio design!
Seriously folks, many audiophiles feel that NOS DACs sound great to them, yet most digital audio gear is designed with relatively steep filters that ring, and generally people don't complain. So how much difference is there really? I have never seen a purely subjective reviewer come out and say "Aha! I know this device used a steep filter and I hear ringing!" without knowing a priori what the impulse response for the device looked like. The difference is clearly not very obvious.
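To put rough numbers on point 1 above, here is a small sketch of the arithmetic: the period of ringing at Nyquist, how many cycles fit in a millisecond, and how far below the peak the lobes of an ideal sinc response sit. The ideal sinc is an assumption on my part; real DAC filters are windowed and will differ in detail.

```python
import math

def ring_period_us(sample_rate_hz):
    """Period of ringing at the Nyquist frequency, in microseconds."""
    return 1e6 / (sample_rate_hz / 2)

def sinc_lobe_db(offset_samples):
    """Level of an ideal sinc response at a given offset, relative to its peak."""
    x = math.pi * offset_samples
    return 20 * math.log10(abs(math.sin(x) / x))

print(f"ringing period at 44.1 kHz: {ring_period_us(44_100):.2f} us")
print(f"cycles in 1 ms: {1000 / ring_period_us(44_100):.1f}")
for offset in (1.5, 2.5, 10.5, 20.5):
    print(f"lobe {offset:4.1f} samples from the peak: {sinc_lobe_db(offset):6.1f} dB")
```

Because the filter is linear, these levels are relative to the transient itself: an impulse at -20dBFS pushes every lobe down by a further 20dB compared with the full-scale test signal.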
You might recall that we looked at one part of the audibility question last year on this blog with a little blind test:
INTERNET BLIND TEST: Linear vs. Minimum Phase Upsampling Filters
Using naturally recorded music starting at 24/44, a comparison was made between two upsampling filters (interpolation to 176.4kHz) with impulse responses looking like this:
Guess what, as a group, there was no evidence in the blind test results that the 45 audiophiles who tried this test actually had a significant subjective preference for one or the other filter setting. You would think that the linear phase filter with the long pre-echo would be less desirable if the effects were all that big. (See the results beginning here: The Linear vs. Minimum Phase Upsampling Filters Test [Part I]: RESULTS.)
[Please folks, let's not bring up Meridian's AES 2014 paper: The Audibility of Typical Digital Audio Filters in a High-Fidelity Playback System which confounds all kinds of things like sub-optimal dithering and as far as I can tell, didn't convincingly prove what the title claims.]
Having said this, am I saying then that filtering settings are not important? Well, I guess that depends on how one defines "important". I do want the low-pass filtering because I believe clean frequency domain performance is important - NOS would not be my preference. A flat frequency response to 20kHz, reasonable suppression of imaging, and maybe modest suppression of impulse response ringing IMO is good enough. Therefore I suspect the majority of typical settings used by DAC manufacturers would be fine if not indistinguishable.
Whether one hears it or not, as I suggested above, I think there's nothing wrong with achieving modest suppression of the ringing, especially the pre-ringing... It's a "perfectionist audio" argument rather than empirical claims of audibility I believe. What could be done? Here are a few options.
1. Go high-res. With 88.2kHz samplerate, Nyquist would be 44.1kHz, and ringing at that frequency would be way beyond the hearing ability of humans. Basically we've bought even more insurance in the event that in some situations the 22.05kHz ringing from a steep "brick wall" filter may seep into the audible range. Furthermore, it's unlikely many speakers would be able to reproduce this frequency without significant attenuation. Whether one uses a sharp digital filter or a weak one or even none at all will make little difference. Of course, not all albums currently are available in high-res (and sadly very few are deserving to be called high-resolution recordings). Note that this does not include albums that are merely upsampled; those carry the ringing of whatever algorithm was used, which may in fact be worse than your DAC's interpolation.
2. Use a minimum phase filter setting. Technically this isn't reducing ringing, just removing the pre-ringing component. Over the years, we've seen minimum phase settings be used in all kinds of devices from the iPhone 4/6, to the Samsung Galaxy Note 5, and even motherboards like the Gigabyte GA-Z170X-Gaming 7 a couple weeks back. Obviously even inexpensive devices can be programmed to do this. I've been using iZotope RX 5 these days as an easy tool to experiment and listen to different settings. Changing the "Pre-ringing" setting to 0 will result in a minimum phase filter.
|iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with linear phase interpolation.|
|iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with minimum phase interpolation, same steepness.|
For the sake of completeness, there are "intermediate phase" settings you can use for filter design. We have actually seen this type of setting used over the years in my hardware tests, like the old WD TV Live! This can be demonstrated by using an intermediate setting in iZotope with the "Pre-ringing" set to 0.5:
3. Use a slow roll-off setting. Many DACs, including my TEAC UD-501, have a slow roll-off filter setting these days. One can easily do this in iZotope RX 5 by changing the "Filter steepness" setting:
|iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with steepness setting of "200". Lots of ringing.|
|iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with steepness setting of "10". Ringing obviously attenuated.|
Like many things in nature, the act of "beautifying" one characteristic will result in less ideal performance in another domain. It would be great if we could have a nice and clean sharp low-pass filter but this would be at the expense of time domain ringing and potential temporal smear demonstrated by the impulse response. Conversely, reduction of ringing in the time domain means the strength of the low-pass filter will be reduced and the ability to suppress imaging will weaken.
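This trade-off can be sketched with two simple linear phase low-pass filters at the 176.4kHz output rate: a short, gentle one and a long, steep one. I'm assuming a Hann-windowed sinc design with 17 and 257 taps purely for illustration; iZotope's "steepness" parameter is not documented here and almost certainly maps to something different.

```python
import cmath
import math

def windowed_sinc(num_taps, cutoff):
    """Linear phase low-pass FIR: a sinc shaped by a Hann window.

    cutoff is in cycles per sample (0.125 = 22.05 kHz at 176.4 kHz).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = 2 * cutoff * (n - mid)
        ideal = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hann)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC

def gain(taps, freq):
    """Magnitude response at freq (cycles per sample)."""
    return abs(sum(h * cmath.exp(-2j * math.pi * freq * n)
                   for n, h in enumerate(taps)))

def ringing_extent(taps, threshold=0.01):
    """Number of taps above 1% of the peak: a crude measure of ringing length."""
    peak = max(abs(h) for h in taps)
    return sum(1 for h in taps if abs(h) > threshold * peak)

FS = 176_400
gentle = windowed_sinc(17, 22_050 / FS)
steep = windowed_sinc(257, 22_050 / FS)

for name, taps in (("gentle", gentle), ("steep", steep)):
    level_db = 20 * math.log10(gain(taps, 26_460 / FS))
    print(f"{name}: {ringing_extent(taps)} taps above 1%, "
          f"{level_db:.1f} dB at 26.46 kHz")
```

The long filter suppresses content just above Nyquist far better, but its impulse response stays above the 1% mark for many more samples; the short filter is the other way around.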
Of course, there's nothing to stop us from combining points 2 and 3. For example, we can model what I found with the PonoPlayer with these settings:
Using my Focusrite Forte ADC, here are the actual measured impulse response and "digital filter composites" from the PonoPlayer compared to test tones played back using my TEAC UD-501 with 16/44 files upsampled to 24/176.4 using the filter settings in iZotope above:
Pretty close, right? In fact, I should have used an even weaker filter setting in iZotope to approximate the PonoPlayer. I think a steepness factor of 1.2 would be very close. It's of course unlikely that the "filter composite" image would look exactly the same... These are quite different DACs after all, with different analogue electronics, and the 64-bit iZotope RX calculations would likely differ from the mathematical precision in the PonoPlayer hardware.
There is an important point here though. If you know one of the transform pairs, like what the impulse response looks like, you'll be able to predict the frequency domain result. As you can see, it looks like Ayre used a very gentle minimum phase filter setting that allows significant amounts of frequencies >22.05kHz to pass through when playing 44.1kHz music. The designers obviously felt that this was a desirable balance for this device and the target audience.
Go experiment. Have a listen to a NOS DAC or if your DAC allows the filter to be turned off, give that a try. Go try listening to various filter settings with SoX or even easier, iZotope RX with all these parameters to play with. Try some unsighted listening and see if you can consistently tell a difference. Try different types of music. For example, an aggressive, over-compressed "loud" mastering, with clipping may excite more ringing and imaging distortions (but then this kind of music is inherently distorted anyway).
No matter how much we obsess over the design of these filters, realize that there are a multitude of other extremely important factors in ultimate sound quality. No matter how picky we become as consumers, there's nothing we can do about the production side. For example, what do we know about the quality of the ADC used to convert the original performance and the nature of the low pass filtering used (see this article on the use of analogue vs. digital filters before an ADC)? Even more important is the quality of the mastering job. We have already seen examples of suboptimal studio mixes, pseudo 24-bit audio, and music resellers providing nothing more than Loudness Wars "hi-res" files. Unless the DAC digital filter settings are truly atrocious, do we honestly think it would make much difference given all the factors outside of our control?
Let me know about your experiences when experimenting with digital filters. Do you think the difference in magnitude is worth exploring further? Also, let me know if you come across conclusions from actual listening tests where these filter settings were assessed in a controlled fashion.
Realize that back in 2006, before ringing was brought to the spotlight with Meridian and their "apodizing" filter setting or Ayre and their whitepaper around 2009, Stereophile had an interesting article on this already. Despite the main writer wringing his hands about the importance of these filters, notice that the editors admitted to not being able to hear much difference. I concur. Certainly if I were a manufacturer looking to squeeze everything out of a design, I might want to customize the filtering to taste based on the hardware and target audience. But as consumers listening to all sorts of music with variable quality out of our control, I'd be pretty happy with a typical linear phase anti-imaging filter of moderate steepness.
For those who want to read more, consider this article in Secrets of Home Theater and High Fidelity:
Up-sampling, Aliasing, Filtering, and Ringing: A Clarification of Terminology
Notice the article above is focused on ringing in video (specifically 4K video and quality of upsampling like 1080P to 4K). Digital signal processing concepts of course apply to video as well as audio. One big difference with audio is that time only goes in one direction... You can get away with more post-ringing whereas in video, around sharp transitions, pre- and post- effects may both be very noticeable in the image.
For those who remember their maths, here's a YouTube video discussing "impulse response", "convolution", "Laplace transform", etc... Have fun!
A great resource to check out:
Infinite Wave SRC Comparisons
Nice interactive website to look at the various sample rate converters on the market. You can easily flip between frequency sweeps to look at imaging artifacts, cleanliness of test signal, transition bands, and impulse response ringing.
To end off this post, let's talk about a couple of items in the blogosphere lately.
First, I find it rather odd that a digital audio site would post an article like this ("Sampling: What Nyquist Didn't Say, And What To Do About It"). As a general practical article on the limits of the sampling theorem, pragmatic questions including whether one needs a filter in some instances, and how to select them in real-life engineering applications (e.g. digital sampling of EKGs...), this is a great article. But what does this tell us about practical implications in audio and how is this applicable to audibility of high-fidelity playback? Sure, filters need to be selected for the application and obviously for different purposes, one can and should understand the waveform being sampled. Furthermore, sampling rate obviously needs to be commensurate with the frequency of the event being recorded. But the CD sampling rate was decreed as 44.1kHz, we generally know that humans can't hear above 20kHz (the 22.05kHz Nyquist frequency sits about 10% above that 20kHz audibility threshold), digital audio has had at least 3 decades to refine the sound quality including filters, and as discussed above, there are some reasonable compromises to keep in mind which can be understood without a PhD in theoretical physics. Without some useful conclusions in articles like this about high-fidelity audio when posted on an audio site targeted at non-technical audiences, a typical audiophile probably leaves scratching his/her head with more questions than answers thinking there's something terribly complex and mystical in all this. IMO, this is not the case and it does the hobby a disservice to promote unnecessary uncertainties typical of FUD.
Second is of course the recent brouhaha around the audibility of high resolution audio (Reiss' "A Meta-Analysis of High Resolution Audio Perception Evaluation" in the AES). That's nice. Does it mean that Neil Young's interviews with musicians in a car, seeing them "blown away" by the sound, are suddenly vindicated? Should we now storm HDTracks/Pono/etc. to re-buy all our favourite albums in hi-res now that it's "official"? Should we now demand audio streaming sites to carry hi-res material and greatly anticipate Tidal's MQA stream?
Of course not! Mark Waldrep (aka Dr. AIX) has already reminded us that the vast majority of what's being peddled as "hi-res" isn't higher-than-CD resolution anyway. Remember folks, this paper is a meta-analytic compilation of 18 other research papers, most of which used experimental audio signals recorded in true high-resolution. We don't know how many of these are using actual music to test. Also have a look at Table 1 and see just how disparate the methodologies are and ponder as to whether many of these methods have bearing on listening and enjoying music! Even including papers where training was used, the composite score of "% correct" identification as summarized by the typical meta-analytic "forest plot" in Figure 2 was 52.3% out of 12,645 total trials (range of 50.6-54.0%)! (And this forest plot did not include the Meyer & Moran 2007 results which were summarized elsewhere in the paper.)
Seriously folks, if we're trying to decide whether a high-res album sounds different from a CD 16/44 (of the same mastering of course), it should not need a meta-analysis. As a consumer, I can go on HDTracks this morning and see that a 24/192 version of Eric Clapton's recent album I Still Do costs US$27.98. And the CD on Amazon is US$10.90. It looks like both the CD and download are from the same DR11 master. The question for me in considering the purchase is not whether they may sound different, but rather whether this difference justifies paying roughly 2.5 times the price!? In this context, does a 52.3% accuracy rate in a research setting sound like a valuable proposition to grab the high-resolution version?
You know guys, the fact that we're even going through the contortions of complex statistical analysis after >15 years since the release of SACD and DVD-A clearly indicates that those who claim to hear "obvious" differences are plainly wrong. When a meta-analysis is used in science to gather data far and wide to find and declare statistical significance of this kind of tiny magnitude, it just means that the "signal to noise" ratio is poor and that the magnitude of the effect is obviously academic. The author stated just as much: "In summary, these results imply that, though the effect is perhaps small and difficult to detect, the perceived fidelity of an audio recording and playback chain is affected by operating beyond conventional consumer oriented levels." Notice the careful wording... In no way does it imply that these "small" and "difficult to detect" differences are necessarily "better" as audiophiles always desire to promote. I like this wording and think Dr. Reiss did a fantastic job putting this together. By the way, these results are of no surprise as we've been talking about this for years!
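The "significant but tiny" point can be illustrated with a back-of-envelope calculation on the pooled figures quoted above (52.3% correct over 12,645 trials). This sketch treats every trial as an independent coin flip, which is my simplifying assumption and not how the paper's meta-analysis weights its studies, so take the exact numbers with a grain of salt.

```python
import math

def binomial_z(hit_rate, trials, chance=0.5):
    """z-score of an observed hit rate against pure guessing (normal approximation)."""
    standard_error = math.sqrt(chance * (1 - chance) / trials)
    return (hit_rate - chance) / standard_error

def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

z = binomial_z(0.523, 12_645)
print(f"z = {z:.2f}, p = {two_sided_p(z):.1e}")
print(f"advantage over guessing: {0.523 - 0.5:.1%} of trials")
```

With that many trials even a small edge over 50% is very unlikely to be chance, yet the edge itself remains only a couple of percentage points. Statistical significance says nothing about whether the difference is large enough to matter when listening to music.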
To me, if I were an investor in companies primarily targeting the "hi-res audio" segment after all these years, I would not welcome these results. High time to take more chips off the table, hopefully with a profit, because it's clear that when the market stabilizes, hype subsides, and value is priced in, the markups will have to be minimal. Of course this doesn't mean music should not be produced in the best resolution possible (especially classical, jazz, and other acoustic genres). Just that the lack of value as currently priced is actually painfully clear.
Have a great week everyone... Happy Canada Day. Happy Independence Day to the American friends!
It's summer and time to get into the great outdoors with the family. I've got some camping, trips to the tropics coming up, and planning to hit a few beaches along the way. Might not get a chance to post as much :-).
As always... Enjoy the music!