Archimago's Musings: MUSINGS: Digital Interpolation Filters and Ringing (plus other Nyquist discussions and "proof" of High-Resolution Audio audibility)

Friday, 1 July 2016

A couple weeks ago, Whackamus posed this interesting comment and question which I thought would be a good topic to discuss and explore in greater detail and with some examples/samples:

"I've been reading your blog for years. Or for almost four years, at any rate. I have to thank you for doing what you do. I've likewise always wanted to ask you a question, too, but I don't know how the bleep to contact you. In any case, since I've been fretting over it afresh, I thought I'd just post it here. If you ever do decide to get to/address it, that'd be great. If not -- hey, no sweat. :)

In any case, I read the following (tonight) on the Stereophile forums:

"I personally think that MQA has some noble goals, in terms of getting as close to the original master as possible, but I think that is far less important than the elimination of the damaging pre-ringing distortion. This has been the bane of digital playback for 30 years, and over-sampling and various filter techniques have tried to deal with it, with limited success."

I won't say that I've never heard ringing -- because I probably have -- but I will say that I've never explicitly said: "Aha! Eureka! Thar be ringing!" Because -- outside of maybe a blurring during transients? -- I have no idea what it sounds like. But my question is less about MY having heard ringing than about the AUDIBILITY of ringing -- pre, post, or otherwise. In a quality DAC (which I've got to assume most of the folks posting on Stereophile.com have access to), how audible are ringing effects? Or, rather, how COMMON are they? I kind of imagine that the Meitners, Lavrys, Levinsons, Stuarts, etc. of the audio world take great care to minimize (pre-/post-)ringing effects and to eliminate ringing in the audible realm. I likewise imagine that both such things are doable, inasmuch as most of us have been enjoying digital audio for decades now. But the Stereophile poster makes it seem as if ringing is the apodeictic bane of digital audio. What am I missing?"

Beautiful question! Like other audiophiles, I've seen the "dreaded ringing" (like the "dreaded jitter") treated over the years as a nemesis which must be slaughtered! Typically, we see images like this in magazines, which are of course extremely frightening to look at:

Terrible! That nice little clean digital "impulse" with defined onset and offset has become mangled into this "time-smeared" mess with all kinds of "unnatural" ringing. Most horribly of course is that "pre-ringing" before the main waveform itself (what kind of Hellspawn is an "echo" before the sound itself???!!!). Isn't it unbelievable how awful digital audio is?!

Before freaking out, let's think this through.

Since 2013, I had been exploring this phenomenon and trying to figure out for myself just how much of a problem this is from the perspective of magnitude of audible effect. Folks might want to have a look at previous articles on this:
MEASUREMENTS: Digital Filters and Impulse Response... (TEAC UD-501)
MEASUREMENTS: "Pulse Response" - 5kHz & 10kHz.

Consider for a moment what an "impulse" is in the digital world. It's a sharp transition or transient where from a baseline of 0, it instantaneously goes up to full amplitude. Numerically it looks like this (+32767 being the largest signed number for 16-bits, and -32768 the smallest):

...0, 0, 0, 0, +32767, 0, 0, 0, 0...

I think it's useful to see it as a number sequence of discrete sample points rather than some kind of waveform image as a start. When we look at images of this data with an audio editor where the "points" are conveniently connected for us, we are actually seeing the calculated interpolation as applied by the software. How this interpolation happens depends on the function the software applies, and the line drawing we see in the editor is its result.

When I measure an "impulse response", basically what I'm asking the DAC to reproduce (typically with a 16/44.1 signal) is that sudden sharp transition of exactly one sample in duration, asking the device to interpolate all the individual samples around that discontinuity with the filter function programmed into it. For a typical 8x oversampling DAC, that 44.1kHz is upsampled to 44.1 x 8 = 352.8kHz; that is, 8 output samples are calculated for every input sample. Realize that a "Dirac impulse" (the idealized single point spike) is not inherent in natural sounds. We do not get instantaneous transitions like this that suddenly start and stop the air waves in real life physical systems. Nor would single impulses like this sound any good anyhow! We can model it in a computer of course, just like in electrical systems we can show true square waves even though in nature, ideal square waves of vertical slope do not exist either.
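As a rough illustration of what "interpolating around the impulse" means, here is a minimal Python sketch that upsamples the single-sample impulse by 8 using a Hann-windowed sinc. The window, the `half_width` of 16 input samples, and the function names are assumptions made for this example; they are not the filter inside any particular DAC.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def upsample(samples, factor=8, half_width=16):
    """Interpolate with a Hann-windowed sinc spanning +/- half_width input samples."""
    out = []
    for i in range(len(samples) * factor):
        t = i / factor  # output position, in units of input samples
        acc = 0.0
        for n, s in enumerate(samples):
            d = t - n
            if s != 0 and abs(d) < half_width:
                window = 0.5 + 0.5 * math.cos(math.pi * d / half_width)
                acc += s * sinc(d) * window
        out.append(acc)
    return out

# ...0, 0, 0, 0, +32767, 0, 0, 0, 0... as a 16-bit impulse
impulse = [0] * 16 + [32767] + [0] * 16
y = upsample(impulse)
centre = 16 * 8  # index of the original impulse in the 8x output

# Print a few interpolated values on either side of the impulse
for k in (-12, -8, -4, 0, 4, 8, 12):
    print(k, round(y[centre + k], 1))
```

In this sketch the original sample positions keep their values (full scale at the impulse, essentially zero elsewhere), while the points in between swing positive and negative symmetrically on both sides. That symmetric swing is the pre- and post-ringing seen in impulse response plots.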

Suppose we start with the most basic DAC, one that does NOT offer an anti-imaging filter. A system where that single impulse point is held over the sample duration. This results in a square waveform representing that impulse across the time of the single sample as shown in the image above. When we do this, our digital data gets converted to an analogue electrical output with all the ultrasonic components of square waves - remember, an ideal square wave is a composite of all the odd-order harmonics ad infinitum. Instead of smooth sine waves, we see these blocky "digital" tracings and if we are to pass the "impulse" data through like this, the result is literally an unmodified square wave. This is what's called a "zero order hold" model of signal reconstruction; more commonly known in the audiophile world as the "non-oversampling" DAC, "NOS" DAC, and people like Audio Note might call it "1X oversampling".
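A small Python sketch of the zero-order hold idea, under the textbook assumption that an ideal hold has a magnitude response of |sin(pi*f/Fs) / (pi*f/Fs)|. The function names and the `repeat` factor are illustrative only, and a real NOS DAC's analogue stage will add its own behaviour on top of this.

```python
import math

FS = 44100.0  # CD sample rate

def zero_order_hold(samples, repeat=8):
    """Hold each sample value for `repeat` output points (a staircase)."""
    return [s for s in samples for _ in range(repeat)]

def zoh_gain_db(f_hz, fs=FS):
    """Magnitude response of an ideal zero-order hold, in dB relative to DC."""
    x = f_hz / fs
    mag = 1.0 if x == 0 else abs(math.sin(math.pi * x) / (math.pi * x))
    return 20 * math.log10(mag)

print(zero_order_hold([0, 32767, 0], repeat=4))
for f in (1000, 20000, 24100, 43100):
    print(f, "Hz:", round(zoh_gain_db(f), 2), "dB")
```

Under this model the hold is about 3 dB down at 20kHz, yet the first image of a 20kHz tone (at 24.1kHz) is attenuated by less than 5 dB, so the hold by itself does very little to suppress content above Nyquist.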

When you see people show images of the squarish digital waveform like this image of a blocky sine wave:

Image from this Kickstarter project.

That's literally what NOS DAC output looks like. And these days, few DACs show disregard for artifacts like this anymore thankfully!

By turning off the digital filter on my TEAC UD-501 DAC, I can listen to this, measure it and demonstrate the effect of the lack of filtering.

Notice the "jaggy" unfiltered 1kHz sine wave at 16/44, with a rather "nice" looking impulse response measured without significant ringing (since this is an actual recording using a 24/192 ADC, note the "Gibbs Phenomenon" with the impulse waveform - see below).

But look at the "Digital Filter Composite" (again, thanks to Jürgen Reis for suggesting the use of this measurement method):

We see a terribly "dirty" result when examined in the frequency domain. Tons of noise beyond Nyquist (22.05kHz), plus the 19 and 20kHz sine waves are echoed across the spectrum. As much as some would want us to believe that time domain qualities are extremely important down to the ringing, remember that for human hearing, the frequency domain is no doubt essential to get right (the cochlea performs a type of FFT processing, and similarly this is how cochlear implants function to artificially aid in hearing when the natural cochlea fails).

Remember, digital audio is by definition bandwidth limited. That is, when we sample using a CD samplerate of 44.1kHz, reconstruction of the waveform is accurate based on Nyquist-Shannon theorem up to Fs/2, or the "Nyquist frequency" of 22.05kHz for the CD. When we reconstruct the output and do not bandwidth limit the signal, as in the case of these NOS DACs, notice all the harmonics and distortion products seeping through beyond 22.05kHz. The analog to this in the world of video and digital photography would be Moiré patterns either in the fine details or in the color banding of the image. We clearly recognize this as unwanted "detail" which was not found in the original image we captured.
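To see where the "echoes" of the 19kHz and 20kHz test tones land when no reconstruction filter is used, a quick Python sketch can list the image frequencies at k x Fs +/- f. The function name and the 100kHz upper limit are arbitrary choices for this example.

```python
def image_frequencies(f_hz, fs_hz=44100.0, max_hz=100000.0):
    """List the images of a tone at k*fs - f and k*fs + f, up to max_hz."""
    images = []
    k = 1
    while k * fs_hz - f_hz <= max_hz:
        images.append(k * fs_hz - f_hz)
        if k * fs_hz + f_hz <= max_hz:
            images.append(k * fs_hz + f_hz)
        k += 1
    return images

for tone in (19000.0, 20000.0):
    print(tone, "->", image_frequencies(tone))
```

The first images sit at 25.1kHz and 24.1kHz, just above the 22.05kHz Nyquist frequency, which is why a useful anti-imaging filter has to fall off quickly.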

So, how do we remove all that extra high frequency distortion? We use a filter of course! And in modern DAC's this is typically done with a digital oversampling process that interpolates the data so it doesn't look like these nasty square waveforms any more, but rather something approximating the sinusoidal physical air waves that we eventually hear, while suppressing frequencies not represented in the original digital signal as best we can.

Enter the Whittaker-Shannon interpolation formula - commonly known as the sinc filter. This is the mathematically "ideal" impulse response for a brick-wall low-pass filter. Behold... "Ringing":
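For reference, the interpolation formula named above is usually written as follows. The symbols are chosen for this note rather than taken from the post: x[n] are the samples, F_s is the sample rate and T is the sample period.

```latex
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u},
\qquad
T = \frac{1}{F_s}
```

For a lone impulse of height A at n = 0 the sum collapses to x(t) = A sinc(t/T). This curve passes through zero at every other sample instant and oscillates at F_s/2 on both sides of the peak, which is the ringing in question.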

A filter function that respects the bandwidth limited nature of the sampling theorem obviously means that the output waveform when faced with such an extreme input as the unnatural "impulse" should interpolate the signal with minimal seepage beyond the Nyquist frequency. You will see this ringing phenomenon wherever there are sudden transients containing constituent frequencies above Nyquist. For example square waves will show the "Gibbs Phenomenon" during the transitions:
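The Gibbs overshoot can be reproduced with a few lines of Python by building a 1kHz square wave from only those odd harmonics that fit below the 22.05kHz Nyquist frequency. This is a sketch using the textbook Fourier series of a square wave; the 1kHz fundamental and the number of plotted points are arbitrary choices.

```python
import math

def bandlimited_square(f0=1000.0, fs=44100.0, points=4410):
    """One period of a +/-1 square wave built from odd harmonics below fs/2."""
    nyquist = fs / 2
    harmonics = [k for k in range(1, int(nyquist // f0) + 1, 2)]
    wave = []
    for i in range(points):
        phase = 2 * math.pi * i / points
        wave.append(4 / math.pi * sum(math.sin(k * phase) / k for k in harmonics))
    return harmonics, wave

harmonics, wave = bandlimited_square()
peak = max(wave)
print("harmonics kept:", harmonics)
print("peak value:", round(peak, 3), "(an ideal square wave would stop at 1.0)")
```

The peak comes out above 1.0 because the harmonics needed to square off the edge are missing. This overshoot and ripple near each transition is a consequence of band-limiting, not a defect of any particular DAC.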

Despite ringing in the time domain, when we examine the frequency domain, things look much nicer! Here then again is my TEAC UD-501, but with a sharp/steep 8X oversampling anti-imaging filter turned on:

As you can see, sine waves are smoothed out and the frequency-domain FFT composite demonstrates the benefit of the filter - good suppression of high frequency imaging; a relatively sharp "cliff" around 22.05kHz, and clean 19 & 20kHz signals with no high amplitude harmonics and intermodulation products. IMO, this is a much better result than a NOS DAC.

Which brings us to the main issue. Whereas frequency domain imaging distortion and intermodulation distortion clearly can be audible (for an example of this, go download Monty's "Intermod Tests" and have a listen), just how audible is the impulse ringing which is unavoidable for a steep low-pass filter? Specifically, how audible is the pre-ringing (because post-ringing will likely be masked naturally by reverb trails)?

IMO, the audibility is minimal if at all. Here's why:
1. The ringing is typically at Nyquist. For CD samplerate, this is 22.05kHz folks. What human can hear a low amplitude pre-ringing coming about a millisecond before an impulse at this frequency? Remember that the amplitude of the ringing is correlated to the amplitude of the "impulse". When you see measurements of the impulse response, typically this is at 100% amplitude (like that +32767 above) so the ringing you see is really a "worst case scenario", not representative of actual music.

2. Microphones and ADCs are bandwidth limited devices. Most microphones have little frequency response above 20kHz anyway as discussed recently. Remember, as I noted above, square waves and certainly single sample impulse signals are not natural sonic phenomena. Furthermore, the analogue signal from the microphone will typically be filtered by the ADC's low-pass filter as well which we never talk or obsess about in the audiophile world!

You can in fact take some music you have and upsample it from 44.1kHz to 176.4kHz with a steep upsampler that demonstrates strong ringing with an impulse response. Have a look in an audio editor with the "Spectral Frequency Display" and see if you notice much ringing being added around the Nyquist frequency. I have done this many times and cannot recall ever having seen any strong ringing other than with artificial test signals.

3. Empirical evidence is lacking. Talk is cheap and testimony is legion, including from folks like the fellow quoted above by Whackamus, Bob Stuart, and audiophile folk heroes like John Swenson. There seems to be this belief out there that digital filters somehow play a huge role in the sound and that somehow they need to be specially tuned by the "gurus". I suppose promoting this point of view allows manufacturers to differentiate themselves with their version of digital filtering and allows talk of fancy terminology like an FPGA programmed to perform the signal processing. Furthermore, these claims seem to be gobbled up by the mainstream audiophile media as some kind of massive step forward in digital audio design!

Seriously folks, many audiophiles feel that NOS DACs sound great to them, yet most digital audio is designed with relatively steep filters with ringing and generally people don't complain. So how much difference is there really? I have never seen a purely subjective reviewer come out and say "Aha! I know this device used a steep filter and I hear ringing!" without them knowing what the impulse response for the device looked like a priori. The difference is clearly not very obvious.
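To put rough numbers on point 1 above, here is a small Python sketch assuming an ideal sinc reconstruction of a 44.1kHz impulse. It computes the period of ringing at the Nyquist frequency and the level of the first ringing lobe, sampled midway between original samples at an offset of 1.5 sample periods. Because the filter is linear, that lobe sits at a fixed level relative to the impulse, whatever the impulse level. The function names are made up for this example.

```python
import math

FS = 44100.0

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

# Ringing at Nyquist (22.05kHz) has this period, in microseconds
ringing_period_us = 1e6 / (FS / 2)

def first_lobe_dbfs(impulse_dbfs):
    """Level of the first ringing lobe (offset 1.5 samples) for a given impulse level."""
    lobe_rel_db = 20 * math.log10(abs(sinc(1.5)))
    return impulse_dbfs + lobe_rel_db

print("ringing period (us):", round(ringing_period_us, 2))
for level in (0.0, -20.0, -40.0):
    print("impulse at", level, "dBFS -> first lobe at", round(first_lobe_dbfs(level), 1), "dBFS")
```

Lobes further from the impulse are lower still, so the full-scale plots seen in magazines really are the worst case.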

You might recall that we looked at one part of the audibility question last year on this blog with a little blind test:
INTERNET BLIND TEST: Linear vs. Minimum Phase Upsampling Filters
Using naturally recorded music starting at 24/44, a comparison was made between two upsampling filters (interpolation to 176.4kHz) with impulse responses looking like this:

Guess what, as a group, there was no evidence in the blind test results that the 45 audiophiles who tried this test actually had a significant subjective preference for one or the other filter setting. You would think that the linear phase filter with the long pre-echo would be less desirable if the effects were all that big. (See the results beginning here: The Linear vs. Minimum Phase Upsampling Filters Test [Part I]: RESULTS.)

[Please folks, let's not bring up Meridian's AES 2014 paper: The Audibility of Typical Digital Audio Filters in a High-Fidelity Playback System which confounds all kinds of things like sub-optimal dithering and as far as I can tell, didn't convincingly prove what the title claims.]

Having said this, am I saying then that filtering settings are not important? Well, I guess that depends on how one defines "important". I do want the low-pass filtering because I believe clean frequency domain performance is important - NOS would not be my preference. A flat frequency response to 20kHz, reasonable suppression of imaging, and maybe modest suppression of impulse response ringing IMO is good enough. Therefore I suspect the majority of typical settings used by DAC manufacturers would be fine if not indistinguishable.

Whether one hears it or not, as I suggested above, I think there's nothing wrong with achieving modest suppression of the ringing, especially the pre-ringing... It's a "perfectionist audio" argument rather than empirical claims of audibility I believe. What could be done? Here are a few options.

1. Go high-res. With 88.2kHz samplerate, Nyquist would be 44.1kHz, and ringing at that frequency would be way beyond the hearing ability of humans. Basically we've bought even more insurance in the event that in some situations the 22.05kHz ringing from a steep "brick wall" filter may seep into the audible range. Furthermore, it's unlikely many speakers would be able to reproduce this frequency without significant attenuation. Whether one uses a sharp digital filter or a weak one or even none at all will make little difference. Of course, not all albums currently are available in high-res (and sadly very few are deserving to be called high-resolution recordings). Note that this does not include albums that are just upsampled; upsampling bakes in the ringing of the algorithm used, which may in fact be worse than your DAC's interpolation.

2. Use a minimum phase filter setting. Technically this isn't reducing ringing, just removing the pre-ringing component. Over the years, we've seen minimum phase settings be used in all kinds of devices from the iPhone 4/6, to the Samsung Galaxy Note 5, and even motherboards like the Gigabyte GA-Z170X-Gaming 7 a couple weeks back. Obviously even inexpensive devices can be programmed to do this. I've been using iZotope RX 5 these days as an easy tool to experiment and listen to different settings. Changing the "Pre-ringing" setting to 0 will result in a minimum phase filter.

iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with linear phase interpolation.

iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with minimum phase interpolation, same steepness.

Notice that the pre-ringing energy has been transferred to the post-ringing amplitude and duration when using the minimum phase setting. Another compromise is that there is a phase shift in the frequency domain when using minimum phase settings (not shown, but you can see a graph of this in my previous post). Finally, we can appreciate also that more energy has been transferred to the post-ringing "side lobe", and the amplitude of the initial "main lobe" isn't as strong for the same filter steepness setting. I have not heard this talked about much; sure, perhaps masking with removal of the pre-ringing is a good thing, but there is more smearing of the energy across time with a strict minimum phase setting.

For the sake of completeness, there are "intermediate phase" settings you can use for filter design. We actually have seen this type of setting used over the years in my hardware tests like the old WD TV Live! This can be demonstrated by using an intermediate setting in iZotope with the "Pre-ringing" set to 0.5:

Notice at this setting, we see very mild pre-ringing with most of the energy transferred to the post-ringing like with minimum phase though the amount of post-ringing energy isn't as strong if we were to quantify it.
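The linear versus minimum phase comparison in option 2 can be sketched in plain Python. The code below designs a symmetric (linear phase) windowed-sinc low-pass, then derives an approximately minimum phase filter with the same magnitude response using the homomorphic (folded cepstrum) method. This is a generic textbook technique and is not a claim about how iZotope RX implements its "Pre-ringing" control; the tap count, cutoff, FFT size and magnitude floor are all assumptions chosen to keep the example small.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; fine for a few hundred points."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1.0 / n if inverse else 1.0
    return [
        scale * sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]

def lowpass_linear_phase(taps=63, cutoff=0.25):
    """Hann-windowed sinc; cutoff in cycles per sample. Symmetric, so linear phase."""
    mid = (taps - 1) / 2
    h = []
    for i in range(taps):
        d = i - mid
        ideal = 2 * cutoff if d == 0 else math.sin(2 * math.pi * cutoff * d) / (math.pi * d)
        h.append(ideal * (0.5 - 0.5 * math.cos(2 * math.pi * i / (taps - 1))))
    return h

def to_minimum_phase(h, n_fft=256, floor=1e-6):
    """Keep the magnitude response, fold the real cepstrum so the result is causal."""
    spectrum = dft(list(h) + [0.0] * (n_fft - len(h)))
    log_mag = [math.log(max(abs(v), floor)) for v in spectrum]
    cep = [v.real for v in dft(log_mag, inverse=True)]
    half = n_fft // 2
    folded = [cep[0]] + [2 * c for c in cep[1:half]] + [cep[half]] + [0.0] * (half - 1)
    min_spectrum = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in dft(min_spectrum, inverse=True)][:len(h)]

def peak_index(h):
    return max(range(len(h)), key=lambda i: abs(h[i]))

def early_energy_fraction(h, count=16):
    """Share of the filter's energy contained in its first `count` taps."""
    return sum(v * v for v in h[:count]) / sum(v * v for v in h)

h_lin = lowpass_linear_phase()
h_min = to_minimum_phase(h_lin)
print("peak tap, linear phase: ", peak_index(h_lin))
print("peak tap, minimum phase:", peak_index(h_min))
print("energy in first 16 taps, linear: ", early_energy_fraction(h_lin))
print("energy in first 16 taps, minimum:", early_energy_fraction(h_min))
```

In the linear phase filter the peak sits in the middle with ringing on both sides. In the minimum phase version the peak moves to the front and the ringing that used to precede it ends up after it, consistent with the pre-ringing energy being transferred to the post-ringing side.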

3. Use a slow roll-off setting. Many DACs including my TEAC UD-501 have a slow roll-off filter setting these days. One can easily do this in iZotope RX 5 by changing the "Filter steepness" setting:

iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with steepness setting of "200". Lots of ringing.

iZotope RX 5 - Upsampling of 44.1kHz to 176.4kHz with steepness setting of "10". Ringing obviously attenuated.

As you can see, a less strong, more gentle filter will allow some imaging frequencies above Nyquist to pass through. However, clearly the ringing is less intense. This "Fourier transformation" correlation is important to keep in mind: lower ringing in the time domain implies a less steep filter and likely more imaging artifacts in the frequency domain. This is why when you see a "nice" looking impulse response like that of the PonoPlayer or emm Labs DAC2X, the first thing you should also wonder about is "does this device have a weak anti-imaging reconstruction filter?"

Like many things in nature, the act of "beautifying" one characteristic will result in less ideal performance in another domain. It would be great if we could have a nice and clean sharp low-pass filter but this would be at the expense of time domain ringing and potential temporal smear demonstrated by the impulse response. Conversely, reduction of ringing in the time domain means the strength of the low-pass filter will be reduced and the ability to suppress imaging will weaken.
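The trade-off described above can be checked numerically. The following Python sketch compares a short ("gentle") and a long ("steep") Hann-windowed sinc low-pass at a 176.4kHz output rate with the cutoff at 22.05kHz. The tap counts of 31 and 255 and the 1% ringing threshold are arbitrary assumptions for illustration and do not correspond to iZotope's "Filter steepness" numbers.

```python
import cmath
import math

FS_OUT = 176400.0  # 4x oversampled output rate
CUTOFF = 22050.0   # Nyquist frequency of the 44.1kHz source

def windowed_sinc(taps, cutoff_hz=CUTOFF, fs=FS_OUT):
    """Hann-windowed sinc low-pass, normalized to unity gain at DC."""
    fc = cutoff_hz / fs
    mid = (taps - 1) / 2
    h = []
    for i in range(taps):
        d = i - mid
        ideal = 2 * fc if d == 0 else math.sin(2 * math.pi * fc * d) / (math.pi * d)
        h.append(ideal * (0.5 - 0.5 * math.cos(2 * math.pi * i / (taps - 1))))
    gain = sum(h)
    return [v / gain for v in h]

def gain_db(h, f_hz, fs=FS_OUT):
    """Filter gain at one frequency, from the discrete-time Fourier transform."""
    resp = sum(v * cmath.exp(-2j * math.pi * f_hz / fs * n) for n, v in enumerate(h))
    return 20 * math.log10(max(abs(resp), 1e-12))

def ringing_span_us(h, threshold=0.01, fs=FS_OUT):
    """Time between the first and last tap exceeding `threshold` of the peak."""
    peak = max(abs(v) for v in h)
    idx = [i for i, v in enumerate(h) if abs(v) > threshold * peak]
    return (idx[-1] - idx[0]) / fs * 1e6

gentle = windowed_sinc(31)
steep = windowed_sinc(255)
for name, h in (("gentle", gentle), ("steep", steep)):
    print(name,
          "| 20kHz:", round(gain_db(h, 20000.0), 1), "dB",
          "| 24.1kHz image:", round(gain_db(h, 24100.0), 1), "dB",
          "| ringing span:", round(ringing_span_us(h)), "us")
```

The longer filter keeps 20kHz closer to full level and pushes the 24.1kHz image down much further, but its impulse response rings over a much longer span. The shorter filter shows the opposite balance.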

Of course, there's nothing to stop us from combining points 2 and 3. For example, we can model what I found with the PonoPlayer with these settings:

Using my Focusrite Forte ADC, here are the actual measured impulse response and "digital filter composites" from the PonoPlayer compared to test tones played back using my TEAC UD-501 with 16/44 files upsampled to 24/176.4 using the filter settings in iZotope above:

Pretty close, right? In fact, I should have used an even weaker filter setting in iZotope to approximate the PonoPlayer. I think a steepness factor of 1.2 would be very close. It's of course unlikely that the "filter composite" image would look exactly the same... These are quite different DACs after all, with different analogue electronics, and the 64-bit iZotope RX calculations likely differ in mathematical precision from the PonoPlayer hardware.

There is an important point here though. If you know one of the transform pairs, like what the impulse response looks like, you'll be able to predict the frequency domain result. As you can see, it looks like Ayre used a very gentle minimum phase filter setting that allows significant amounts of frequencies >22.05kHz to pass through when playing 44.1kHz music. The designers obviously felt that this was a desirable balance for this device and the target audience.

Conclusions:
Go experiment. Have a listen to a NOS DAC or if your DAC allows the filter to be turned off, give that a try. Go try listening to various filter settings with SoX or even easier, iZotope RX with all these parameters to play with. Try some unsighted listening and see if you can consistently tell a difference. Try different types of music. For example, an aggressive, over-compressed "loud" mastering, with clipping may excite more ringing and imaging distortions (but then this kind of music is inherently distorted anyway).

No matter how much we obsess over the design of these filters, realize that there are a multitude of other extremely important factors in ultimate sound quality. No matter how picky we become as consumers, there's nothing we can do about the production side. For example, what do we know about the quality of the ADC used to convert the original performance and the nature of the low pass filtering used (see this article on the use of analogue vs. digital filters before an ADC)? Even more importantly, the quality of the mastering job. We have already seen examples of suboptimal studio mixes, pseudo 24-bit audio, and music resellers providing nothing more than Loudness Wars "hi-res" files. Unless the DAC digital filter settings are truly atrocious, do we honestly think it would make much difference given all the factors outside of our control?

Let me know about your experiences when experimenting with digital filters. Do you think the difference in magnitude is worth exploring further? Also, let me know if you come across conclusions from actual listening tests where these filter settings were assessed in a controlled fashion.

Realize that back in 2006, before ringing was brought to the spotlight with Meridian and their "apodizing" filter setting or Ayre and their whitepaper around 2009, Stereophile had an interesting article on this already. Despite the main writer wringing his hands about the importance of these filters, notice that the editors admitted to not being able to hear much difference. I concur. Certainly if I were a manufacturer looking to squeeze everything out of a design, I might want to customize the filtering to taste based on the hardware and target audience. But as consumers listening to all sorts of music with variable quality out of our control, I'd be pretty happy with a typical linear phase anti-imaging filter of moderate steepness.

For those who want to read more, consider this article in Secrets of Home Theater and High Fidelity:
Up-sampling, Aliasing, Filtering, and Ringing: A Clarification of Terminology

Notice the article above is focused on ringing in video (specifically 4K video and quality of upsampling like 1080P to 4K). Digital signal processing concepts of course apply to video as well as audio. One big difference with audio is that time only goes in one direction... You can get away with more post-ringing whereas in video, around sharp transitions, pre- and post- effects may both be very noticeable in the image.

For those who remember their maths, here's a YouTube video discussing "impulse response", "convolution", "Laplace transform", etc... Have fun!

Addendum:
A great resource to check out:
Infinite Wave SRC Comparisons
Nice interactive website to look at the various sample rate converters on the market. You can easily flip between frequency sweeps to look at imaging artifacts, cleanliness of test signal, transition bands, and impulse response ringing.

-------------

To end off this post, let's talk about a couple of items in the blogosphere lately.

First, I find it rather odd that a digital audio site would post an article like this ("Sampling: What Nyquist Didn't Say, And What To Do About It"). As a general practical article on the limits of the sampling theorem, pragmatic questions including whether one needs a filter in some instances, and how to select them in real-life engineering applications (eg. digital sampling of EKGs...), this is a great article. But what does this tell us about practical implications in audio and how is this applicable to audibility of high-fidelity playback? Sure, filters need to be selected for the application and obviously for different purposes, one can and should understand the waveform being sampled. Furthermore, sampling rate obviously needs to be commensurate with the frequency of the event being recorded. But CD sampling rate was decreed as 44.1kHz, we generally know that humans can't hear above 20kHz (the 22.05kHz Nyquist frequency sits about 10% above that 20kHz audibility threshold), digital audio has had at least 3 decades to refine the sound quality including filters, and as discussed above, there are some reasonable compromises to keep in mind which can be understood without a PhD in theoretical physics. Without some useful conclusions in articles like this about high-fidelity audio when posted on an audio site targeted at non-technical audiences, a typical audiophile probably leaves scratching his/her head with more questions than answers thinking there's something terribly complex and mystical in all this. IMO, this is not the case and it does the hobby a disservice to promote unnecessary uncertainties typical of FUD.

Second is of course the recent brouhaha around the audibility of high resolution audio (Reiss' "A Meta-Analysis of High Resolution Audio Perception Evaluation" in the AES). That's nice. Does it mean that suddenly Neil Young's interviews with musicians in a car and seeing them "blown away" from the sound is now true? Should we now storm HDTracks/Pono/etc. to re-buy all our favourite albums in hi-res now that it's "official"? Should we now demand audio streaming sites to carry hi-res material and greatly anticipate Tidal's MQA stream?

Of course not! Mark Waldrep (aka Dr. AIX) has already reminded us that the vast majority of what's being peddled as "hi-res" isn't higher-than-CD resolution anyway. Remember folks, this paper is a meta-analytic compilation of 18 other research papers, most of which used experimental audio signals recorded in true high-resolution. We don't know how many of these are using actual music to test. Also have a look at Table 1 and see just how disparate the methodologies are and ponder as to whether many of these methods have bearing on listening and enjoying music! Even including papers where training was used, the composite score of "% correct" identification as summarized by the typical meta-analytic "forest plot" in Figure 2 was 52.3% out of 12,645 total trials (range of 50.6-54.0%)! (And this forest plot did not include the Meyer & Moran 2007 results which were summarized elsewhere in the paper.)

Seriously folks, if we're trying to decide whether a high-res album sounds different from a CD 16/44 (of the same mastering of course), it should not need a meta-analysis. As a consumer, I can go on HDTracks this morning and see that a 24/192 version of Eric Clapton's recent album I Still Do costs US$27.98. And the CD on Amazon is US$10.90. It looks like both the CD and download are from the same DR11 master. The question for me in considering the purchase is not whether they may sound different, but rather does this difference justify paying roughly 2.5 times the price!? In this context, does a 52.3% accuracy rate in a research setting sound like a compelling value proposition to grab the high-resolution version?

You know guys, the fact that we're even going through the contortions of complex statistical analysis after >15 years since the release of SACD and DVD-A clearly indicates that those who claim to hear "obvious" differences are plainly wrong. When a meta-analysis is used in science to gather data far and wide to find and declare statistical significance of this kind of tiny magnitude, it just means that the "signal to noise" ratio is poor and that the magnitude of the effect is obviously academic. The author stated just as much: "In summary, these results imply that, though the effect is perhaps small and difficult to detect, the perceived fidelity of an audio recording and playback chain is affected by operating beyond conventional consumer oriented levels." Notice the careful wording... In no way does it imply that these "small" and "difficult to detect" differences are necessarily "better" as audiophiles always desire to promote. I like this wording and think Dr. Reiss did a fantastic job putting this together. By the way, these results are of no surprise as we've been talking about this for years!
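A back-of-the-envelope calculation shows how a tiny effect can still be "statistically significant" once enough trials are pooled. The Python sketch below treats the 12,645 trials as one big coin-flip experiment and computes a z-score against pure guessing. This naive pooling is an assumption made purely for illustration; it is not the method used in the meta-analysis, which weights the individual studies.

```python
import math

def pooled_z_score(p_hat, n_trials, p_null=0.5):
    """Normal-approximation z-score of an observed proportion against guessing."""
    standard_error = math.sqrt(p_null * (1 - p_null) / n_trials)
    return (p_hat - p_null) / standard_error

print("52.3% correct over 12,645 trials: z =", round(pooled_z_score(0.523, 12645), 2))
print("52.3% correct over 100 trials:    z =", round(pooled_z_score(0.523, 100), 2))
```

With thousands of trials, a couple of percentage points above chance clears the usual significance thresholds comfortably, while the same hit rate in a 100-trial listening session would be indistinguishable from guessing. Significance here says the effect is probably not zero, not that it is large.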

To me, if I were an investor in companies primarily targeting the "hi-res audio" segment after all these years, these results are actually not to be welcomed. High time to take more chips off the table hopefully with a profit because it's clear that when the market stabilizes, hype subsides, and value is priced in, the markups will have to be minimal. Of course this doesn't mean music should not be produced in the best resolution possible (especially classical, jazz, and other acoustic genres). Just that the lack of value as currently priced is actually painfully clear.

----------

Have a great week everyone... Happy Canada Day. Happy Independence Day to the American friends!

It's summer and time to get into the great outdoors with the family. I've got some camping, trips to the tropics coming up, and planning to hit a few beaches along the way. Might not get a chance to post as much :-).

As always... Enjoy the music!

58 comments:

Gurpreet, 1 July 2016 at 18:10
Rob Watts, the designer of Chord DACs like Hugo and Mojo, has written on Head-Fi that an impulse input into a DAC is an "illegal" signal as it is not bandwidth limited. So, does it mean that to apply the Shannon-Nyquist theorem, both input and output have to be bandwidth limited (which in the case of RedBook would be 22KHz)?
Gadgety, 2 July 2016 at 07:47
Interesting read. I seem to remember that higher resolution allows for less steep, more benign and hence less ringing filters? I believe these effects must be tiny in comparison with the speaker-in-a-room reproduction, and speaker element ringing, room reflections, standing waves etc. Interesting the comparison with video/visual material. MadVR in the moving image domain does massive oversampling afaik, and applies various types of filters, which can be turned on or off to see the effects of various combinations. HQPlayer seems to do something similar in the audio domain.

As for Izotope software it would be fun if it had a randomizer for different filters to allow some blind playback. Would be even more fun if I as a consumer could get the music files and open them up, and play with the settings to essentially remaster aspects of the sound on the fly, like it's done in the Izotope demo videos. Yeah, I know, that would be the last thing the record companies, and probably the artists would want. It would be fun, though.
VK, 3 July 2016 at 12:52
Nice post as always.

About the hi-res thing, I don't get it: if it is so much better than 44/16, why do they have to "preach" so hard? Isn't the difference obvious?
The way some sites put it, it seems that anyone can hear the difference, and it's clear by now that only very few trained ears can distinguish 44/16 from 96/24+ (by the way, I think we knew this already...).

So... nothing really new in this "meta" paper.

Best regards!
Arve, 3 July 2016 at 17:21
On the meta-analysis paper: The Theiss/Hawksford study should have been eliminated, and the inclusion of it puts the rest of the sources in the study in question. I'm just going to repeat something from Reddit's /r/audiophile :

First off, listening level isn't controlled for, so there is no guarantee that the difference isn't merely in identifying the differing noise floor.

The bit and sample rate conversion is not controlled for. There are pretty huge variations in the performance of sample rate converters, as evidenced by these measurements [1] and without having characterized the performance of one, the paper is pretty much only testing the performance of the sample rate converter itself.

Going further, the Hawksford study is not controlled for audible intermodulation artifacts, something I hardly think people even considered in 1997. (This is also a general criticism of the entire meta-analysis: systems that try to reproduce ultrasonics without being capable of it can demonstrably yield audible and measurable artifacts, and any study that doesn't control for this is basically invalid.)

(Yes, one can argue that explicit mention of listening levels shouldn't _need_ to be included. But listening level and intermodulation artifacts are two issues known to be error sources in ABX comparisons, and given that this study has results very far off from all of the other studies, and is the one study that pulls the numbers into "significance" territory, it should be viewed much more rigorously.)
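
The intermodulation concern can be illustrated numerically. This is a minimal sketch, assuming two ultrasonic tones (24 kHz and 27 kHz) and a made-up mild second-order nonlinearity standing in for an amplifier or tweeter that cannot handle ultrasonics cleanly; it does not model any equipment used in the studies.

```python
import numpy as np

fs = 96000
n = 9600                                   # 0.1 s of signal, so FFT bins are 10 Hz apart
t = np.arange(n) / fs

# Two tones that are both above the audible band.
x = 0.5 * np.sin(2 * np.pi * 24000 * t) + 0.5 * np.sin(2 * np.pi * 27000 * t)

# Hypothetical playback nonlinearity: a small squared term added to the signal.
y = x + 0.1 * x ** 2

def level_db(sig, freq_hz):
    """Amplitude of the FFT bin at freq_hz, in dB relative to a full-scale sine."""
    spec = np.abs(np.fft.rfft(sig)) / (len(sig) / 2)
    return 20 * np.log10(spec[round(freq_hz * len(sig) / fs)] + 1e-12)

# The difference tone lands at 27 kHz - 24 kHz = 3 kHz, well inside the audible band.
print(f"3 kHz level in the clean signal:     {level_db(x, 3000):.0f} dB")
print(f"3 kHz level after the nonlinearity:  {level_db(y, 3000):.0f} dB")
```

The clean signal has nothing at 3 kHz, while the distorted one has a clear difference tone there. (In this simulation the sum-frequency products fold back within the ultrasonic range, which does not affect the 3 kHz result.) A listener could therefore "pass" a hi-res test by hearing the artifact rather than the ultrasonic content itself.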

Anonymous, 4 July 2016 at 01:03
Only audiophiles could beat themselves up this way! The standard linear phase filter is the 'correct' filter. Any filter that moves the ringing to the post- side is changing the signal phase. Only if the ringing was at non-ultrasonic frequencies would any of this be an issue. It isn't, so it isn't.

Linear phase speaker crossover filters are more interesting because, by listening to a single driver, the pre-ringing is audible, apparently. I say, "apparently", because a single driver already sounds mighty odd, anyway - the pre-ringing is just a small effect that, maybe, some people might enjoy listening for. The slopes used in speaker crossovers are very shallow in comparison to a "brick wall" so the ringing is low. But of course the whole idea is that, combined with the theoretical ringing of the neighbouring drivers which is in the opposite phase, the ringing sums to zero.

A person could worry about these things, or they could simply accept that by embarking on the path of serious investigation and listening tests, they are going to ruin their enjoyment of their system, lose money, and possibly spoil their enjoyment of music altogether. Instead of buying several DACs, they could just buy some better speakers - which could indeed be audibly different, as opposed to the DACs which won't be.

Honza, 4 July 2016 at 02:01
I repost under the correct account. Personally, when I need it, I use SoX with a 95% passband, aliasing ON and linear phase. I know that the default is aliasing off, but I prefer less ringing and some aliasing. Standard 44.1 kHz material I do not resample (upsample); I leave that to the DAC if it needs it. Maybe my preference for that setting is caused by the lower ringing made possible through aliasing. As is written here http://src.infinitewave.ca/help.html "... It should be noted that the "ringing" of filters during SRC is mostly concentrated near Nyquist frequency because this range contains variations of the frequency response (here it is usually a range of 20-24 kHz). Even though it is in the ultrasonic range, there is some evidence that excessive ringing of an SRC filter negatively affects the overall sound, smearing the stereo image and reducing the clarity of bass ... " So some caution about ringing is good, I think.
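
For readers who want to try settings along these lines, here is a minimal sketch of how such a SoX invocation might be assembled. It assumes the `rate` effect's documented override options (`-L` for linear phase, `-b` for bandwidth in percent, `-a` to allow aliasing); the file names, the 48 kHz target rate and the helper name `sox_rate_command` are placeholders, and the quality preset actually used in the comment above is not stated, so none is set here. Check the SoX manual for your version before relying on it.

```python
import shlex

def sox_rate_command(src, dst, target_rate_hz, bandwidth_pct=95.0, allow_aliasing=True):
    """Build (but do not run) a SoX command line that uses the `rate` effect."""
    cmd = ["sox", src, dst, "rate", "-L", "-b", f"{bandwidth_pct:g}"]
    if allow_aliasing:
        cmd.append("-a")            # permit aliasing/imaging above the passband
    cmd.append(str(target_rate_hz))
    return cmd

# Hypothetical file names and target rate, for illustration only.
print(shlex.join(sox_rate_command("in.flac", "out.flac", 48000)))
```

The list can be passed to `subprocess.run` if SoX is installed; building it separately makes it easy to generate both variants (with and without `-a`) for a blind comparison.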

Honza, 4 July 2016 at 10:26
And one more addition: on cheaper onboard DACs like the Realtek ALC892 it seems to me beneficial (subjectively) to enable software upsampling of the CD rate of 44.1 kHz to 96 kHz, probably because filtering is better at this rate. Maybe this can also be tested here in the future; I do not have the necessary tools ready.

Honza, 4 July 2016 at 10:34
P.S. I wonder how many people were as confused as I was when they set their cheap onboard sound card to e.g. 96 kHz, obtained different (perceived as better) sound, attributed that to the sampling rate itself, and thus started to prefer 96 kHz recordings just because of that experience. In my experience that is not an attribute of the recording but an attribute of some (cheaper but otherwise good) DACs that have worse filtering at 44.1 kHz, whereas the situation at 96 kHz is much better because even a worse filter performs well at this rate, since it acts far from audible frequencies.

pelmazo, 6 July 2016 at 01:07
Congrats for another excellent article, Archimago!

I have, over time, and under the impression of the kind of misconceptions you are addressing, come to an even more pointed way of describing this impulse reconstruction stuff:

Firstly, I believe now, it is necessary to clearly refute the diagram that shows the supposedly original impulse as rectangular, as plain wrong. You do that in spirit, and with the correct arguments, but I think the point needs to be made as forceful as possible, because it is the origin of the misconception. The digital data is merely a stream of numbers. A stream of numbers, where a single nonzero number is embedded in a stream of zeros, can be regarded as representing a pulse of some sort, but certainly not a rectangular one. Drawing it in this rectangular form is tacitly introducing a zero-order-hold function, and such a function does not produce a valid rendition of the signal. Full stop.

This means, that if you want to represent the original data stream graphically, you would have to put dots on the diagram, not lines. It would be OK to draw lines from each dot to the horizontal axis, but that's as far as you can take it while still being valid. Lines to connect the dots with each other would already be an attempt at reconstruction, which goes beyond merely showing the "original" data.

Secondly, every attempt at reconstructing a waveform from the data stream, which is what a DAC is supposed to do, is basically asking the question: What is the analog waveform which would have produced this stream of data when fed into an (ideal) ADC? This is what reconstruction means: You want to reconstruct an analog waveform that you imagine has been present at an ADC's input. The digital data stream is, in this sense, not the original; it is an intermediate representation. You can of course produce a data stream artificially, with no ADC involved anywhere, but this data stream only gets its meaning when you associate it with an analog waveform that would have led to this data stream when fed to an ADC. It is this waveform which you are seeking to reconstruct.

Now, we know that for this to work, the waveform must have been bandwidth limited to half the sampling frequency. Otherwise the signal representation would become ambiguous. In other words, the waveform that you imagine was fed to the ADC to produce our data stream, must have been bandwidth limited. Hence it can't have been a rectangular pulse, because that's not bandwidth limited. Try to come up with a waveform that is properly bandwidth limited, yet would produce the stream of numbers shown when fed to an ADC. This is the crucial question here: Which analog waveform would go through all the dots formed by the data stream, and at the same time be properly bandwidth limited? This is the correct reconstruction, and hence the waveform the DAC must produce.

This answer is difficult for many to figure out, since the waveform is shown in the time domain, while the bandwidth condition says something in the frequency domain, and you have to make the proper connection between the two. If you manage to work this out, you find that the only way to solve this is via a waveform that shows this apparent ringing. In other words, the DAC that rings is right; it comes at least close to the correct answer. If it produced anything that more resembled a rectangular pulse, it would be violating the preconditions of the whole system.

It turns out there never was a rectangular pulse, it had been a chimera all along, a misinterpretation of what the data stream actually means.
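
The argument above can be checked numerically. This is a minimal sketch, assuming ideal Whittaker-Shannon (sinc) interpolation over a short window of 41 samples at 44.1 kHz; a real DAC filter is a finite approximation of this, and the function name `reconstruct` is made up for illustration.

```python
import numpy as np

fs = 44100.0
samples = np.zeros(41)
samples[20] = 1.0          # a single nonzero number embedded in a stream of zeros

def reconstruct(t_sec):
    """Sum of shifted sinc functions: the band-limited curve through the samples."""
    k = np.arange(len(samples))
    return np.sum(samples * np.sinc(fs * t_sec[:, None] - k), axis=1)

t_on_grid = np.arange(len(samples)) / fs             # the sampling instants
t_between = (np.arange(len(samples) - 1) + 0.5) / fs  # halfway between them

on_grid = reconstruct(t_on_grid)
between = reconstruct(t_between)

print("passes through every sample:", np.allclose(on_grid, samples))
print("values 2.5, 1.5, 0.5 sample periods before the peak:", between[17:20])
print("values 0.5, 1.5, 2.5 sample periods after the peak: ", between[20:23])
```

The reconstructed curve hits every "dot" exactly, yet between the dots it oscillates symmetrically before and after the peak. That oscillation is the "ringing": it is what the band-limited waveform through those dots looks like, not something the filter adds to a rectangular pulse.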

JR_Audio, 6 July 2016 at 12:37
Great Summary

Hi Archimago.

I have just found the time to read this blog post, and congratulations: this is a nice summary of digital filters. Nice work.

@Honza: If you up-sample from 44k1 to 96k, you need a digital filter set at 44k1 within this process, no matter what you are doing.
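
To illustrate why upsampling always involves a filter tied to the original rate, here is a minimal sketch. It assumes a simple 2x upsample (44.1 kHz to 88.2 kHz rather than 96 kHz, to keep the ratio an integer), a 1 kHz test tone, and a hypothetical 255-tap Blackman-windowed sinc low-pass; none of these choices come from the comment above.

```python
import numpy as np

fs_in = 44100
n = 4410                                   # 0.1 s, a whole number of 1 kHz cycles
x = np.sin(2 * np.pi * 1000 * np.arange(n) / fs_in)

# Step 1: zero-stuffing doubles the rate to 88.2 kHz but leaves an image at 43.1 kHz.
fs_out = 2 * fs_in
up = np.zeros(2 * n)
up[::2] = x

# Step 2: low-pass at the original Nyquist frequency (22.05 kHz = fs_out / 4).
taps = 255                                 # hypothetical length, chosen for illustration
k = np.arange(taps) - (taps - 1) / 2
h = np.sinc(0.5 * k) * np.blackman(taps)   # gain of about 2 makes up for the inserted zeros

# Circular convolution via the FFT (the test tone is periodic within this window).
filtered = np.fft.irfft(np.fft.rfft(up) * np.fft.rfft(h, len(up)), len(up))

def level_db(sig, freq_hz):
    """Amplitude of the FFT bin at freq_hz, in dB relative to a full-scale sine."""
    spec = np.abs(np.fft.rfft(sig)) / (len(sig) / 2)
    return 20 * np.log10(spec[round(freq_hz * len(sig) / fs_out)] + 1e-12)

for name, sig in (("zero-stuffed", up), ("filtered", filtered)):
    print(f"{name}: 1 kHz at {level_db(sig, 1000):.1f} dB, "
          f"43.1 kHz image at {level_db(sig, 43100):.1f} dB")
```

Before filtering, the image is as strong as the tone itself; after the low-pass it is pushed far down while the tone returns to full level. Whether that filter runs in software or inside the DAC chip, one of them has to do it.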

Juergen

Anonymous, 6 July 2016 at 15:29
I am a big fan. But, I think there is still considerable confusion, misunderstanding, etc. on the hi rez issue. Your comments at the end of your post on the Reiss paper perhaps amplify that.

It is helpful to first go through the Mark Waldrep comments you also cited. We have to take him with a grain of salt. But, I think it is clear that in today's marketplace, there is a lot of "fake" hi rez out there. Hi rez really has to start with the original recording. Analog or RBCD masters, though upsampled to hi rez specs, are fakes.

To really get the advantages of hi rez, one must have it through the entire chain, from recording through playback. I believe many audiophiles, and even "scientific" testers like Meyer & Moran, have missed this. They used quite a few analog remasterings in their testing. Many audiophiles, sometimes angrily, insist it is all BS, because there is no difference to their ears in listening to RBCD vs. what they understood to be "hi rez". Many download websites have unfortunately delivered fakes rather than the real deal.

But, it is merely the GIGO concept. Years ago, a friend wanted to have all his videos transferred from VHS tape to DVD because, as we know, DVD had much better picture quality. I, with hesitance, had to tell him the quality would be no better.

The point is many negative audiophile conclusions may have been reached based on an inferior sample of recordings where the hi rez was not truly hi rez. Pop music engineering practices are typically troublesome, because usually we have no idea exactly what went on in creating the master from which the playable recording was derived. I do not think a hi rez remastering of some oldie rock classic from analog tape, for example, will be at all revealing of what hi rez can do. Ditto for an RBCD resolution digital master.

Me? I am predominantly a classical music listener to natively recorded hi rez in multichannel (Mch), which is a more recent development over the last 15 years or so. To me the advantages are clear, though perhaps subtle. It is an improvement, not a gee whiz breakthrough. I say that based on my own testing of hi rez vs. RBCD in stereo. But, it is fairly clear consistently in comparisons of the RBCD layer vs. the DSD stereo layer, both from the same hi rez stereo master on a hybrid SACD. Volume matching can be a bit tricky for that test, however.

But, though the question has largely been ignored due to the niche status of hi rez, I find considerable comfort in the Reiss paper that I am not just deluding myself about the potential sonic advantages of hi rez. Many other test subjects, particularly ones trained in what to listen for, in careful testing seem to hear the difference, too, depending on the quality of the experiment.

But, it is all perceptual. We know in full detail what the measurements say about RBCD vs. hi rez. You gotta go with what you think sounds best. Papers are useful, but your own listening comparisons with reasonable controls are best.

StevenS, 7 July 2016 at 11:32
True, Reiss does write "In summary, these results imply that, though the effect is perhaps small and difficult to detect, the perceived fidelity of an audio recording and playback chain is affected by operating beyond conventional consumer oriented levels." But in speaking to the press, he said "Audio purists and industry should welcome these findings -- our study finds high resolution audio has a small but important advantage in its quality of reproduction over standard audio content." To my reading, that is a claim that goes well beyond what his data supports, and it hints at a bias of his own. https://www.sciencedaily.com/releases/2016/06/160627214255.htm

pelmazo, 10 July 2016 at 03:55
Before the introduction of the CD, there were a number of perceptual studies, so I don't think you are right here. The discussion about which sample rate to choose, and which bit depth to implement, went on for several years at the advent of digital audio, and of course perceptual studies were done to inform and underpin the decision.

You can't, of course, expect that the results would be completely unanimous, hence at some point you have to jump and decide on the basis of the available information. And at that point in time, it seemed to be quite clear from the perceptual studies that were available, that a bandwidth up to 20 kHz was already generous and offered considerable margin. Similarly, 16 bits were also regarded as ample, which is perhaps illustrated by the fact that initially, PCM processors that allowed digital audio recording on analog consumer grade VTR boxes, considered 13 bits to be sufficient for a consumer grade format. It didn't prevent them from being hailed as a great step forward in fidelity, several years prior to the introduction of the CD.

35 years on from there, we still have no convincing and clear evidence that the choice of wordlength and sampling rate made back then was inappropriate, and needs to be increased to provide appreciably better fidelity for the consumer.
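
For a sense of scale in the 13-bit versus 16-bit comparison, here is a minimal sketch using the textbook figure for an ideal quantizer (full-scale sine against uniform quantization noise, roughly 6.02 dB per bit plus 1.76 dB). It ignores dither, noise shaping and real converter performance, and the function name is made up for illustration.

```python
import math

def ideal_dynamic_range_db(bits):
    """Ideal SNR of a full-scale sine with uniform quantization: 20*log10(2**bits * sqrt(1.5))."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (13, 16, 24):
    print(f"{bits} bits: about {ideal_dynamic_range_db(bits):.1f} dB")
```

Each extra bit buys about 6 dB, so the three bits separating those early 13-bit PCM processors from the CD's 16 bits amount to roughly 18 dB of additional theoretical dynamic range.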

pelmazo, 10 July 2016 at 03:57
My previous comment was supposed to be an answer to Fitzcaraldo215, but the indentation somehow didn't happen. Sorry for that.

Honza, 12 July 2016 at 00:04
And one more thing about perceptual testing. While it is very important for evaluation, we have to think carefully about it. Imagine, for example, that the CD standard had been set at 15-bit/42 kHz (not impossible). Do you think that perceptual tests would have led us to adopt e.g. a 24/48 container when it became generally available? I do not think so. We would have a lot of studies telling us that the difference between 15/42 and 24/48 is not statistically significant, similar to those that tell us the same about 16/44.1 vs 24/48; it is just that in that case the margin would be smaller. So, perceptual tests are important, but we also have to evaluate the technical parameters of a recording: whether it records additional audio information or not, where "audio" means what can be heard by humans under imaginable and reasonable circumstances (e.g. 20 Hz-20 kHz and appropriate time slices), and whether it provides an undistorted record of what was originally performed.

AJ Soundfield, 19 July 2016 at 14:51
Always amazing when 70+ year old hearing males using speakers with response that plummets above 20k and would explode way before >16bit dynamic range, can clearly hear the benefits of Hi-Re$ higher sample rates and greater word length. Must be that dreaded "time smear" that Reiss mistakenly called "unknown reasons" from the data mining.
Now if they could just say exactly what specific track and what specifically to "listen" for, to hear these elusive benefits. But alas, they never do. Like the Hypersonic effect, you just feel better, for "unknown reasons".
Well, like the magic cable guys would say, you must "experience it for yourself" (translation, buy it).
Eyes open so can hear that time smear now.

Danzy Jones, 12 March 2017 at 14:23
Hi
This is probably off topic but related? I'm trying to understand some of the concepts discussed here and finding your blog very helpful.
I have a question (if you have the time or inclination!) -
I digitize my vinyl and use Sound Studio (a cheap but excellent app) on the Mac. Unlike any other software I know, it lets me record at up to 2.88 MHz (!) from a 192 kHz/32-bit feed. This is resampled to 960 kHz and saved. I then use iZotope RX to resample to 48 kHz/32-bit, as it will accept up to c. 960 kHz.
The result is astonishing to my ears, better than a straight 192 kHz recording by far. It has a similar clarity to SACD (JRiver) conversions I've done to 384 kHz -> 48 kHz PCM.
Am I doing something wrong?! I see I'm basically upsampling on the fly.
I'd be interested in your thoughts

allhifi, 30 October 2017 at 01:47
Archimago: Your arguments are passionate, and come across as far more emotional than professional.

Here are some examples;

"3. Empirical evidence is lacking. Talk is cheap and testimony is legion, including folks like the fellow quoted above by Whackamus, from Bob Stuart, and audiophile folk heroes like John Swenson. There seems to be this belief out there that digital filters somehow play a huge role in the sound and that somehow it needs to be specially tuned by the "gurus". I suppose promoting this point of view allows manufacturers to differentiate themselves with their version of digital filtering and allows talk of fancy terminology like an FPGA programmed to perform the signal processing. Furthermore, these claims seem to be gobbled up by the mainstream audiophile media as some kind of massive step forward in digital audio design!

"these claims seem to be gobbled up by the mainstream audiophile media"

(Mainstream media are nothing short of simple folk -not scientists, mathematicians, researchers or even sensible/critical listeners).

The lack of scope and references in your arguments renders it both baseless and pointless -to be fancifully endured by those lacking academic discipline (or even respect) themselves.

Please, re-read your comments;

"Guess what, as a group, there was no evidence in the blind test results that the 45 audiophiles who tried this test actually had a significant subjective preference for one or the other filter setting. You would think that the linear phase filter with the long pre-echo would be less desirable if the effects were all that big. (See the results beginning here: The Linear vs. Minimum Phase Upsampling Filters Test [Part I]: RESULTS.)"

( The 45 "audiophiles" who took the test ! Seriously ? )

[Please folks, let's not bring up Meridian's AES 2014 paper: The Audibility of Typical Digital Audio Filters in a High-Fidelity Playback System which confounds all kinds of things like sub-optimal dithering and as far as I can tell, didn't convincingly prove what the title claims.]

"and as far as I/you can tell" (How reassuring).

Your prolonged blathering could have been summarized (i.e. edited) in a few sentences -at most.

Emotional banter is no substitute for valued research and is indeed the bane of rational discussion and intelligent research/analysis -progression.

As a final thought, have you ever considered (is it Dr.?) Robert Stuart's impressive credentials -the least of which would be his academic excellence?
Perhaps you should.

As a final thought, once the Mesmerizing Quantum Audio dust has settled (for you), perhaps then you may wish to revisit your past arguments, very little of which would pass for rational argument let alone rigorous science.

peter jasz

allhifi, 30 October 2017 at 09:39
Archimago: I forgot to "reference" more of your statement -the purpose of copying/pasting as I did in my previous reply.

Namely:

"There seems to be this belief out there that digital filters somehow play a huge role in the sound and that somehow it needs to be specially tuned by the "gurus"..."

I'm not sure who you are referring to, but digital filters DO contribute significantly -so much so that some listeners don't even like the particular sound of their DAC without a specific filter applied.

Why do you think that a growing number of DAC designers provide at least 3 (sometimes 6) filters to choose from? And, in doing so, the discussion of digital filters ensues ...

" ...I suppose promoting this point of view allows manufacturers to differentiate themselves with their version of digital filtering and allows talk of fancy terminology like an FPGA programmed to perform the signal processing."

YES. It does allow sensible discussion and contributing factors to be shared.

" .... and allows talk of fancy terminology like an FPGA programmed to perform the signal processing."

CORRECT. It does "allow" this kind of talk. And why should manufacturers not discuss the latest thinking/programming unique to FPGA capabilities?

Start fighting the good fight. If you believe all is good with previous (digital audio) efforts, just say so.

If I'm not mistaken, you (and others) have correctly pointed out the desirability of a 50 kHz audio passband -as does MQA.

What they do with legacy signals (within MQA) may be argued. Or indeed MQA processing of native 192/384 (20-24 bit) material.

In the end, any disagreements (such as yours) should have been argued during the Patent Application process, and not postured-upon on public hifi forums. Bitching afterwards (patent approval) is too late -and ultimately too weak an argument.

Rest assured, any Patent application of this magnitude is often peer-reviewed -vigorously. Any and all claims by MQA, if not their own research, were referenced to the appropriate published papers. Quite simply, it was scrutinized in sensible detail.

pj
