Friday, 10 July 2015

The Linear vs. Minimum Phase Upsampling Filters Test [Part II]: ANALYSIS & CONCLUSIONS


This post is a continuation of RESULTS: The Linear vs. Minimum Phase Upsampling Filters Test (Part I) where I had already summarized the rationale, procedure, and description of the 45 test respondents including basic demographics, equipment, and raw results.

IV. Analysis

In this segment, I'll put a few questions to the data and see whether any answers can be teased out about the significance of the findings...



A. Are these preferences for minimum vs. linear phase filters significant?

For all the individuals who responded as "both sounded the same", suppose we split them up 50:50 because they randomly selected sample A or B in a forced choice. We would come out with the following result with p-values calculated using binomial probabilities (0.5 chance, one-tailed test):

The yellow bars are the ones preferred by the respondents. As you can see, the calculated p-values are not impressive; none of them reaches the conventional p<0.05 threshold for statistical significance. The best we can say is that the "Mandolin" sample produced a trend towards a preference for the minimum phase setting. "Give It Up" showed an even weaker trend towards the minimum phase setting. Interestingly, it is the "GrandPiano" sample that showed the strongest preference (though still p>0.05), and that preference was towards the linear phase filter setting.
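
For anyone who wants to redo this kind of calculation, here is a minimal Python sketch of the one-tailed binomial test described above, with the "both sounded the same" votes split 50:50. The function names and the vote counts in the example are hypothetical and only there for illustration; they are not the survey's actual tallies.

```python
from math import comb


def one_tailed_binomial_p(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more votes by luck alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def preferred_count_with_ties_split(prefer_a: int, prefer_b: int, same: int) -> tuple[int, int]:
    """Split 'same' votes 50:50 and return (votes for the more popular file, total).

    Assumption: an odd number of 'same' votes is rounded down for the
    popular side, which is the conservative choice.
    """
    half = same // 2
    top = max(prefer_a, prefer_b) + half
    return top, prefer_a + prefer_b + same


# Hypothetical counts, NOT the real survey data.
k, n = preferred_count_with_ties_split(prefer_a=15, prefer_b=20, same=10)
p_value = one_tailed_binomial_p(k, n)
print(f"{k} of {n} preferred one file, one-tailed p = {p_value:.3f}")
```

A p-value near 0.5 means the split is about what coin-flipping would produce; only values below 0.05 would count as significant by the usual convention.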

B. Did the people with higher confidence that they heard a difference show a preference of linear vs. minimum phase setting?

Okay, let's look at the sub-sample of respondents who felt they had either "moderate" or "high" confidence that they heard a difference between the two samples. Realize that this significantly reduces the sample size; for "Mandolin" this means a sub-sample of 30 total, "GrandPiano" 26, and "Give It Up" 24. As a result, statistical power is significantly reduced:

As you can see, the trends here are the same as in the larger group of 45. There was a very small improvement in the p-value for the "Mandolin" minimum phase filtered version despite the loss in statistical power... Perhaps there is a hint that this sub-group did show a stronger preference towards the minimum phase setting.

C. Was there a difference between those who use speakers vs. headphones?

Okay, there was almost a 50:50 split between speaker listeners and headphone listeners. Let's see if there's any difference between those groups.



What we're seeing here is interesting! If we look at the respondents using speakers to evaluate, you see primarily a skew towards the minimum phase filter setting with both "Mandolin" and "Give It Up". "GrandPiano" still has a very small preference towards the linear phase filtering.

However, the response from headphone listeners is actually quite different. For some reason, they very much preferred the linear phase setting for "GrandPiano", to the point where it reached statistical significance with p<0.05. Also, for the "Give It Up" sample, there was no preference for either the minimum or linear phase setting, which represents a shift towards linear phase upsampling compared to the speakers group (which preferred minimum phase).

D. Did the musicians and those involved in music production show any unique characteristics?

My feeling is that there just was not enough data to give a good answer. However, the thing that struck me was that both these groups tended to rate lower confidence for their preferences, the majority picking "low confidence (doubt I'd pass a formal ABX)". Only 1 of the 7 respondents who are musicians and/or involved in music production voted "high confidence" for having heard a difference in any of the samples.

E. How many people preferred all 3 minimum phase or linear phase samples?

Those preferring all linear phase settings ("Mandolin" A, "GrandPiano" B, "Give It Up" B): 5 respondents

Those preferring all minimum phase settings ("Mandolin" B, "GrandPiano" A, "Give It Up" A): 5 respondents

As you can see, exactly the same number of respondents consistently selected each type of filter. Given a total of 45 responses, if each person simply selected A or B at random, we would expect about 5.6 respondents (45 × 1/8) to "guess" any one particular combination by chance. There is therefore no evidence of any special preference.
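
The chance expectation quoted above is simple arithmetic, sketched here in Python. It assumes a pure forced choice between A and B on each of the 3 samples (ignoring the "both sounded the same" option), so any one specific pattern has a 1-in-8 chance. The variable names are mine.

```python
n_respondents = 45   # total survey responses
n_samples = 3        # "Mandolin", "GrandPiano", "Give It Up"

# Chance that a pure guesser lands on one particular A/B pattern
p_one_combo = 0.5 ** n_samples
expected_by_chance = n_respondents * p_one_combo

print(p_one_combo)         # 0.125
print(expected_by_chance)  # 5.625
```

With 5 respondents in each of the two "consistent" groups, both counts sit right around what random guessing would produce.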

V. Discussion / Conclusions

In summary, this was a "blind" survey distributed over the internet using high quality musical samples processed with an extremely steep linear or minimum phase up-sampling filter. With 45 "audiophile" respondents utilizing higher quality equipment and submitting results over 2 months, we see the following:

1. There was a trend in 2 out of 3 musical samples towards preference of the minimum phase filter setting. In 1 of 3 samples, the trend was towards the linear phase setting. It is of course possible that there were not enough respondents and if we had a larger sample, statistical significance could be achieved in the overall result as it relates to the observed trends. In any case, I did not see consistent preference in the overall group for linear or minimum phase setting.

2. There was a general tendency towards the minimum phase setting with those listening with speakers whereas headphone users seemed to skew more towards the linear phase setting. This brings up interesting questions about the differences between the sound presented through speakers (especially soundstage and imaging qualities) versus how sound is perceived through headphones (free of room interactions, lower channel crosstalk, mental integration of the stereo image). Perhaps the digital filter settings should be tailored to the type of listening.

3. The "GrandPiano" sample was the only one that had a result which reached the p<0.05 statistical level of significance with one of the subanalyses. And this was a preference towards the linear phase setting with headphone users. Why this is is unclear to me. Perhaps it has to do with the fact that this sample was simpler and contained much less high-frequency content to excite ringing?

With little high frequency ringing (little content close to Nyquist) in the "GrandPiano" sample, perhaps it is the phase shift that's more of an issue with the minimum phase setting; perhaps more easily detected with headphones and felt to degrade sound quality? Obviously this is just a hypothesis that needs further testing.

4. There was no evidence for a special group of "golden ear" respondents consistently preferring all 3 linear or minimum phase samples. Furthermore, I did not see any special preference towards linear or minimum phase settings with the 7 respondents involved in music production or performance.

------------------

As you can see, preferences for one setting or another depended on the sample being tested and way of listening (speakers vs. headphones). Do not forget that the filter setting being used here is unusual in that it's an extremely steep filter with 99% (-3dB point at 21.83kHz) bandwidth. This accentuates the duration of ringing (very long pre-ringing in the linear phase filter if one believes this is "bad"), and in the minimum phase filter, will also accentuate the phase distortion in higher frequencies as I showed in the graphs for the original test invitation.

The fact that I could not find much of a consistent preference despite this extremely steep "brick-wall" type setting does bring into question whether there is anything to be concerned about with more typical (less steep) filters. For example, even SoX by default utilizes a 95% filter (-3dB point at 20.95kHz) with much less ringing as shown below:

Spectral Frequency display - 99% bandwidth used in test vs. 95% SoX default. Notice difference in duration of ringing.
Difference in frequency response using 44kHz full bandwidth white noise upsampled.

If we can't find clear preferences with the amount of pre-ringing in the 99% steep filter as compared to minimum phase with no pre-ringing, then what are the chances that the comparatively small amount of pre-ringing with the default 95% SoX linear filter is of any concern? I believe it's therefore reasonable to question the importance of the various digital filter settings and not be too impressed by the impulse response waveforms. Obviously all kinds of follow-up tests can be conducted - 95% linear vs. minimum phase, 95% vs. 99% audibility, intermediate vs. linear vs. minimum phase, steep vs. gentle roll-off...
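
As a quick sanity check on the bandwidth figures quoted above, the -3dB points follow from treating the percentage as a fraction of the 22.05kHz Nyquist frequency of the 44.1kHz source material. That reading is an assumption on my part, though it matches both quoted numbers; the helper name below is purely illustrative.

```python
SOURCE_RATE_HZ = 44_100
NYQUIST_HZ = SOURCE_RATE_HZ / 2  # 22050.0


def passband_edge_khz(bandwidth_percent: float) -> float:
    """-3dB point in kHz, assuming the percentage is relative to Nyquist."""
    return bandwidth_percent / 100 * NYQUIST_HZ / 1000


for pct in (99, 95):
    print(f"{pct}% -> {passband_edge_khz(pct):.2f} kHz")
# 99% -> 21.83 kHz
# 95% -> 20.95 kHz
```

The 99% setting leaves only about 0.2kHz between the -3dB point and Nyquist, versus about 1.1kHz for the 95% default, which is why the steeper filter rings for so much longer.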

As usual, since this test, although randomized and blinded, was conducted in a public forum over the Internet, there are many uncontrolled variables. Did the respondents have the speakers set up right? Are the music player settings optimal (e.g. bit-perfect)? Did anyone forget to turn off the EQ/DSP? How good is the hearing acuity of the tester? Etc... Nonetheless, this is a "naturalistic" sample of audiophiles and the kinds of good quality equipment being used in the wild.

Another limitation is that some listeners may have looked at the test samples in an audio editor before trying the test. Remember, I purposely adjusted a single sample in some of the test material to see if people picked up on it (see the Procedure section). Indeed, I received 3 E-mails or private messages about this over the 2 months from folks who did this before listening claiming the algorithm is "flawed" or that my samples are "corrupt". Remember, these are single sample changes in files with a sample rate of 176.4kHz; absolutely inaudible even though visible in an audio editor. I guess 3 comments on this isn't a large amount.

I would therefore encourage everyone to continue experimenting as they see fit. Personally, I like the idea of DACs having the option to try different settings for 44/48kHz playback (it really doesn't matter with 88+kHz material since the ringing would be way out of audible frequency range). Maybe a standard SoX-like 95% relatively steep linear filter, a gentle roll-off linear filter, 95% minimum phase, and slow roll-off minimum phase would be an adequate selection to satisfy all needs. Given the results here, listeners may in fact prefer one setting over another depending on the situation like speaker vs. headphone listening.

To end off, I can say that I have tried this test myself and must admit that it's not easy! Thanks to everyone for taking the time to download these big 24/176.4 FLAC files, cue up the files on their high-res audio systems, and spend the time to evaluate. Based on some of the subjective comments (which I'll publish later), clearly many people spent a good amount of time listening and writing notes on what they heard. Furthermore, I want to thank Juergen for file selection, suggestions, and allowing me to use his own recordings verified to be clean of artificial processing. Thanks also to Ingemar of PrivateBits for hosting the files over the months.

If anyone knows of good papers/links relevant to the topics discussed in this test, please let me know in the comments! I'd certainly be curious as to whether formal academic papers have demonstrated actual non-sighted audibility and listener preferences. For example, this Meridian-sponsored paper "The Audibility of Typical Digital Audio Filters in a High-Fidelity Playback System" (AES, 2014) seems to be relevant at least in title but it's $20 to buy as a non-AES member. Furthermore, the abstract appears to be discussing dithering results for some reason.

---------------------

Summer is here folks! And it looks like it'll be a scorcher here on the West Coast this year... Time to enjoy some BBQs, lazy summer days, camping, and time with the family :-).

Of course... Enjoy the music...

10 comments:

  1. I love these types of well thought out tests :)

    If I recall correctly, I did hear a difference on the mandolin music, but not so much on the piano or vocals. As I tend to like music with sharp percussive "attacks" (like mandolin, harpsichord, etc.), sounds like minimum phase with as shallow a filter as I can stand would suit me well - and that's what I settled on long ago :)

  2. Thank you! Another one bites the dust....

  3. Be careful of doing lots of analyses of a given data set. The more you do, the more likely you are to get a p-value < 0.05.

  5. Though I often agree with jhwalker, in this case I'll gently and respectfully disagree. The crux of my disagreement can be summed up by saying I question this sentence from Part I: "Even though playback at 176.4kHz may still be further upsampled internally in the DAC (converted to Delta-Sigma modulation, etc...), the digital filtering effect would have already been 'imprinted' in the test samples to evaluate for sonic differences."

    I think that's way too simple a statement to adequately account for all the different DACs, with their various internal filtering arrangements, on which this test was run. Combine that with the effects of the ADCs with which each recording was made, and it seems to me you've got a huge amount of potential variability in results. Who in the testing owns a DAC that uses apodizing filters, which are specifically made to substitute their response characteristics for those that preceded them? And unless apodizing filters were used for conversion, any ringing in the original recordings would be present in the filtered files. In the absence of apodizing filters, how certain are you that the resampling characteristics for the first 4x conversion would *not* be changed in various unpredictable ways by both the internal (or software, for those using it) doubling to 352.8kHz, and the subsequent sigma-delta modulation to DSD rates (or in the case of DACs with ESS chips, proprietary modulation to rates in the ~40-44.1MHz range)?

    So it's a nice idea, and certainly something all the participants could enjoy, but to run tests of statistical significance on such a potentially variable base of reported data might not tell us much, or at least much we can be very sure of.

    A couple of other thoughts about these topics:

    - In blind tests I've conducted at home, using offline conversion to DSD128 and playing it through my DAC, I have been able to identify minimum phase and linear phase reliably. I have a bit of speculation as to why this is so (which identifies yet another variability in the conditions of your test). My speakers are Vandersteens, which are "time aligned," i.e., the drivers for the various frequency ranges are positioned and the crossover is designed so that all frequencies should arrive at the listening position at roughly the same time. Minimum phase filters are "dispersive," i.e., time through the filter varies by frequency. My guess is this messes up (technical term) the time alignment, making minimum phase filters fairly easy to pick out *in my specific system.* This would of course not work for headphones or in systems where the speakers were not time aligned, or in systems where filtering (DSP) was used for frequency adjustment and incidentally affected timing.

    - This leads me to a further bit of guessing about what might possibly account for the linear phase preference with the piano track, particularly with headphones. (Of course everything I said about potential variability in the data still goes, so evaluate my guesswork accordingly.) Perhaps the dispersive nature of the minimum phase filter on a full range instrument like the piano caused more or less subtle "This can't be real" timing cues in the result. The effect, if one pictures our sonic "view" of the instrument as it was recorded to be facing the keyboard, would be to move one end of the keyboard closer and the opposite end further away, i.e., to turn it a little "sideways." If the "view" was sideways to the keyboard, then the effect, depending on which end was toward us, would be either to "stretch" or to "squash" the instrument a little. I think this might feel especially artificial with headphones, given this shifting, stretching or squashing would sound like it was taking place inside one's head.

  6. Apodizing: Word and technically.
    Judmarc, thank you for your comments, but I have to expand just a bit on the technical term "apodizing". When you look at the two different digital upsampling filters in this test, you will see that both the minimum phase and the linear phase filter have infinite suppression at FS/2. This means that any ringing that would be in the 44k1 source file (which always occurs at FS/2) is totally removed, no matter what AD converter was used for a 44k1 recording, and the same goes for the digital downsampling filter that I used to create the two 44k1 sample files out of the 88k2 recording. So to summarize: in this test, every possible ringing of the source files is removed and only the ringing of the upsampling filter is the major part. And as this SoX filter setting is the most extreme setting, with the most possible post ringing (for MP) and the most possible pre and post ringing (for LP), it totally overrides any 4FS digital filter that is used for playback, because the 4FS playback filter, no matter if it is MP or LP or apodizing, modifies the 88k2 ringing and not the heavy 22k05 ringing of the SoX upsampling filter used here.
    Juergen

  7. Thank you for the explanation, Juergen, I appreciate the opportunity to learn more.

    "And as this SoX filter setting it the most extreme setting, so the most possible post ringing (for MP) and the most possible pre and post ringing (for LP) filter, with this extreme setting, this totally override any 4FS digital filter that is used for playback"

    I stupidly created a problem for myself with my system just today. After it is fixed (soon, I hope), I wonder if you would have time and interest for the following: Making a couple of 24/352.8 and/or DSD128 files with the best sounding min phase and linear phase filters you know how to make, and sending them to me for testing. If you were to do this, I would prefer simple acoustic music, like acoustic guitar and vocals, or piano and vocals.

  8. Judmarc

    I was actually in Hong Kong for one week for the High End Show and am right now still in Taiwan following the High End Show here. After I am back home again I will take some holiday, so I will have no chance to create any files for you in the near future.

    But with the "extreme" settings that were used in this test, DXD and DSD128 will make no difference for the 22.05 kHz ringing that occurs with that up-sampling process, no matter if I am doing "only 4FS" or, as you wish, 8FS.

    With less "extreme" settings, we could have made "better sounding" 4FS files, but the main task was to use the "worst" sounding = "extreme" settings (take this with a grain of salt), to demonstrate the "extreme" differences.

    Enjoy listening to music. And btw, I personally would be happier if the industry came back to less dynamically compressed (loudness war) music, instead of pushing the "stupid" higher sample rate way.

    Juergen

    Replies
    1. Thanks, Juergen. My brother lived in Taiwan for a few years, though I unfortunately never got a chance to visit him there.

      I understand about the ringing. I was just curious whether I would notice any difference with non-"pathological" filters in some other aspect, e.g., dispersion/group delay.

      I wish you great enjoyment in your music listening as well. :-)

  9. I think the reason why you can't get a statistically significant result is that:

    A) With linear phase the fundamental pre-echo is beyond most people's hearing frequency limit (especially those in this test)

    B) With minimum phase the fundamental ringing is above most people's hearing frequency limit (especially those in this test)

    C) With minimum phase there may be very slight phase warp inside the listener's frequency window, but absolute phase of this magnitude is near impossible to detect without a reference (e.g. the same audio without phase shift)

    D) The amplitude of content that hits the ring/echo frequencies is already so low that the intermod components are virtually nil

    /2c
