This post is a continuation of RESULTS: The Linear vs. Minimum Phase Upsampling Filters Test (Part I) where I had already summarized the rationale, procedure, and description of the 45 test respondents including basic demographics, equipment, and raw results.
IV. Analysis
In this segment, I think the best way to interrogate the data is to ask a few questions and see whether an answer can be teased out, along with how significant the findings themselves are...
A. Are these preferences for minimum vs. linear phase filters significant?
For all the individuals who responded as "both sounded the same", suppose we split them up 50:50 because they randomly selected sample A or B in a forced choice. We would come out with the following result with p-values calculated using binomial probabilities (0.5 chance, one-tailed test):
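As a rough illustration of the calculation described above, here is a minimal Python sketch of a one-tailed binomial test with the "both sounded the same" votes split 50:50. The function names and the counts are hypothetical placeholders, not the survey's data, and how an odd number of tie votes gets rounded is an assumption of this sketch.

```python
from math import comb


def one_tailed_binomial_p(n, k, p=0.5):
    """Probability of k or more 'successes' in n trials: P(X >= k)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def preference_p_value(prefer_x, prefer_y, no_difference):
    """One-tailed p-value for the more popular option after splitting ties 50:50.

    Assumption: with an odd number of ties, the leftover vote goes to the
    less popular option, which makes the test slightly conservative.
    """
    n = prefer_x + prefer_y + no_difference
    leader = max(prefer_x, prefer_y) + no_difference // 2
    return one_tailed_binomial_p(n, leader)


# Hypothetical counts for illustration only (NOT the survey results).
p = preference_p_value(prefer_x=20, prefer_y=15, no_difference=10)
print(f"one-tailed p = {p:.3f}")
```

A p-value below 0.05 would count as significant at the conventional threshold used in this post.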
B. Did the people with higher confidence that they heard a difference show a preference for the linear or the minimum phase setting?
Okay, let's look at the sub-sample of respondents who felt they had either "moderate" or "high" confidence that they heard a difference between the two samples. Realize that this significantly reduces the sample size; for "Mandolin" this means a sub-sample of 30 total, "GrandPiano" 26, and "Give It Up" 24. As a result, statistical power is significantly reduced:
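To get a feel for what the smaller sub-samples do to statistical power, the sketch below finds the smallest number of votes one option would need to reach p<0.05 (one-tailed, 0.5 chance) at each sample size quoted above. This is only an illustration under the same binomial assumption as before; the helper names are made up for this example.

```python
from math import comb


def upper_tail(n, k):
    """P(X >= k) for a fair coin flipped n times."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


def min_count_for_significance(n, alpha=0.05):
    """Smallest k such that k-or-more votes out of n gives p < alpha."""
    for k in range(n + 1):
        if upper_tail(n, k) < alpha:
            return k
    return None  # not reachable for very small n


# Sample sizes taken from the text: full group and the three sub-samples.
sizes = [("All respondents", 45), ("Mandolin", 30), ("GrandPiano", 26), ("Give It Up", 24)]
for label, n in sizes:
    k = min_count_for_significance(n)
    print(f"{label}: n={n}, need at least {k} votes ({k / n:.0%}) for p<0.05")
```

The required share of votes grows as the sample shrinks, which is the loss of power in practical terms.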
C. Was there a difference between those who use speakers vs. headphones?
Okay, there was almost a 50:50 split between speaker listeners and headphone listeners. Let's see if there's any difference between those groups.
What we're seeing here is interesting! If we look at the respondents using speakers to evaluate, you see primarily a skew towards the minimum phase filter setting with both "Mandolin" and "Give It Up". "GrandPiano" still has a very small preference towards the linear phase filtering.
However, the response from headphone listeners is actually quite different. For some reason, they very much preferred the linear phase setting for "GrandPiano", to the point where it reached statistical significance with p<0.05. Also, for the "Give It Up" sample, there was no preference for either the minimum or linear phase setting, which represents a shift in preference towards the linear phase upsampling compared to the speakers group (which preferred minimum phase).
D. Did the musicians and those involved in music production show any unique characteristics?
My feeling is that there just was not enough data to give a good answer. However, the thing that struck me was that both these groups tended to rate lower confidence for their preferences, the majority picking "low confidence (doubt I'd pass a formal ABX)". Only 1 of the 7 respondents who are musicians and/or involved in music production voted "high confidence" for having heard a difference for any of the samples.
E. How many people preferred all 3 minimum phase or linear phase samples?
Those preferring all linear phase settings: "Mandolin" A, "GrandPiano" B, "Give It Up" B - 5
Those preferring all minimum phase settings: "Mandolin" B, "GrandPiano" A, "Give It Up" A - 5
As you can see, the exact same number of respondents consistently selected the same type of filter. Given a total of 45 responses, if each person just randomly selects A or B for every sample, any particular combination of 3 choices has a 1 in 8 chance, so we would expect about 5.6 respondents to "guess" a given combination. There is therefore no evidence of any special preference.
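The chance expectation quoted above is simple arithmetic; a minimal sketch, assuming every respondent picks A or B independently with equal probability on each of the 3 samples:

```python
N_RESPONDENTS = 45  # total responses, from the text
N_SAMPLES = 3       # "Mandolin", "GrandPiano", "Give It Up"

# Probability that pure guessing lands on one specific A/B combination.
p_specific_combination = 0.5 ** N_SAMPLES

# Expected number of respondents hitting that combination by chance alone.
expected_by_chance = N_RESPONDENTS * p_specific_combination

print(f"chance of one specific combination: {p_specific_combination}")
print(f"expected respondents by chance: {expected_by_chance:.3f}")
```

The 5 respondents observed for each of the two "consistent" combinations sit right around this chance level.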
V. Discussion / Conclusions
In summary, based on the results of a "blind" survey distributed over the internet using high quality musical samples processed with extremely steep linear or minimum phase up-sampling filters, with 45 "audiophile" respondents utilizing higher quality equipment submitting results over 2 months, we see the following:
1. There was a trend in 2 out of 3 musical samples towards preference of the minimum phase filter setting. In 1 of 3 samples, the trend was towards the linear phase setting. It is of course possible that there were not enough respondents and if we had a larger sample, statistical significance could be achieved in the overall result as it relates to the observed trends. In any case, I did not see consistent preference in the overall group for linear or minimum phase setting.
2. There was a general tendency towards the minimum phase setting with those listening with speakers whereas headphone users seemed to skew more towards the linear phase setting. This brings up interesting questions about the differences between the sound presented through speakers (especially soundstage and imaging qualities) versus how sound is perceived through headphones (free of room interactions, lower channel crosstalk, mental integration of the stereo image). Perhaps the digital filter settings should be tailored to the type of listening.
3. The "GrandPiano" sample was the only one that had a result which reached the p<0.05 statistical level of significance with one of the subanalyses. And this was a preference towards the linear phase setting with headphone users. Why this is so is unclear to me. Perhaps it has to do with the fact that this sample was simpler and contained much less high-frequency content to excite ringing?
With little high frequency ringing (little content close to Nyquist) in the "GrandPiano" sample, perhaps it is the phase shift that's more of an issue with the minimum phase setting; perhaps more easily detected with headphones and felt to degrade sound quality? Obviously this is just a hypothesis that needs further testing.
4. There was no evidence for a special group of "golden ear" respondents consistently preferring all 3 linear or minimum phase samples. Furthermore, I did not see any special preference towards linear or minimum phase settings with the 7 respondents involved in music production or performance.
As you can see, preferences for one setting or another depended on the sample being tested and the way of listening (speakers vs. headphones). Do not forget that the filter setting being used here is unusual in that it's an extremely steep filter with 99% (-3dB point at 21.83kHz) bandwidth. This accentuates the duration of ringing (very long pre-ringing in the linear phase filter if one believes this is "bad"), and in the minimum phase filter, will also accentuate the phase distortion in higher frequencies as I showed in the graphs for the original test invitation.
The fact that I could not find much of a consistent preference despite this extremely steep "brick-wall" type setting does bring into question whether there is anything to be concerned about with more typical (less steep) filters. For example, even SoX by default utilizes a 95% filter (-3dB point at 20.95kHz) with much less ringing as shown below:
[Figure: Spectral Frequency display - 99% bandwidth used in test vs. 95% SoX default. Notice difference in duration of ringing.]
[Figure: Difference in frequency response using 44kHz full bandwidth white noise upsampled.]
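For reference, the -3dB points quoted above follow from treating the bandwidth percentage as a fraction of the Nyquist frequency of the 44.1kHz source material. A small sketch of that arithmetic (the function name is just for illustration):

```python
SOURCE_RATE_HZ = 44_100            # CD-rate source material
NYQUIST_HZ = SOURCE_RATE_HZ / 2    # 22,050 Hz


def minus_3db_point_khz(bandwidth_percent):
    """-3dB frequency in kHz for a bandwidth given as a percent of Nyquist."""
    return NYQUIST_HZ * bandwidth_percent / 100 / 1000


for label, percent in [("test filter", 99), ("SoX default", 95)]:
    print(f"{label}: {percent}% -> {minus_3db_point_khz(percent):.2f} kHz")
```

The two settings differ by less than 1kHz at the band edge, but the steeper filter has a much narrower transition band, hence the longer ringing.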
As usual, since this test, although randomized and blinded, was conducted in a public forum through the Internet, there are many uncontrolled variables. Did the respondents have the speakers set right? Are the music player settings optimal (e.g. bit-perfect)? Did anyone forget to turn off the EQ/DSP? How is the hearing acuity of the tester? etc... Nonetheless, this is a "naturalistic" sample of audiophiles and the kinds of good quality equipment being used in the wild.
Another limitation is that some listeners may have looked at the test samples in an audio editor before trying the test. Remember, I purposely adjusted a single sample in some of the test material to see if people picked up on it (see the Procedure section). Indeed, I received 3 E-mails or private messages about this over the 2 months from folks who did this before listening claiming the algorithm is "flawed" or that my samples are "corrupt". Remember, these are single sample changes in files with a sample rate of 176.4kHz; absolutely inaudible even though visible in an audio editor. I guess 3 comments on this isn't a large amount.
I would therefore encourage everyone to continue experimenting as they see fit. Personally, I like the idea of DACs having the option to try different settings for 44/48kHz playback (it really doesn't matter with 88+kHz material since the ringing would be way out of audible frequency range). Maybe a standard SoX-like 95% relatively steep linear filter, a gentle roll-off linear filter, 95% minimum phase, and slow roll-off minimum phase would be an adequate selection to satisfy all needs. Given the results here, listeners may in fact prefer one setting over another depending on the situation like speaker vs. headphone listening.
To end off, I can say that I have tried this test myself and must admit that it's not easy! Thanks to everyone for taking the time to download these big 24/176.4 FLAC files, cue up the files on their high-res audio system, and spend the time to evaluate. Based on some of the subjective comments (which I'll publish later), clearly many people spent a good amount of time listening and writing notes on what they heard. Furthermore, I want to thank Juergen for file selection, suggestions, and allowing me to use his own recordings verified to be clean of artificial processing. Thanks also to Ingemar of PrivateBits for hosting the files over the months.
If anyone knows of good papers/links relevant to the topics discussed in this test, please let me know in the comments! I'd certainly be curious as to whether formal academic papers have demonstrated actual non-sighted audibility and listener preferences. For example, this Meridian-sponsored paper "The Audibility of Typical Digital Audio Filters in a High-Fidelity Playback System" (AES, 2014) seems to be relevant at least in title but it's $20 to buy as a non-AES member. Furthermore, the abstract appears to be discussing dithering results for some reason.
Summer is here folks! And it looks like it'll be a scorcher here on the West Coast this year... Time to enjoy some BBQs, lazy summer days, camping, and time with the family :-).
Of course... Enjoy the music...