I. Background

Well, the time has arrived to open up the covers and see what the data reveals!
As a recap, I direct you to the post "INTERNET BLIND TEST: Linear vs. Minimum Phase Upsampling Filters" where the test was introduced and invitations sent out for participants. In preparation for some of the discussions here, I invite you to read an excellent "primer" on digital signal processing by Kieran Coghlan ("Up-sampling, Aliasing, Filtering, and Ringing: A Clarification of Terminology") published on Secrets of Home Theater and High Fidelity in May. Note that digital processing affects both audio and visual technology, hence the discussion applies to 4K video as much as it does to hi-fi sound. In it, he talks about "Fourier pairs": a function's shape in the frequency domain is inseparably linked to its behavior in the time domain. Simply put for us in audio as it relates to this test, the steeper the DSP function in the frequency domain (e.g. a steep "brick wall" filter), the greater the effect in the time domain (i.e. ringing). Here's a chapter in the book The Scientist and Engineer's Guide to Digital Signal Processing for those who want to go into even more of the mathematics.
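To make that trade-off concrete, here is a small numpy sketch (my own illustration; the cutoff and tap counts are arbitrary and not the SoX values): narrowing the transition band of a windowed-sinc lowpass requires a longer filter, and the impulse response rings for correspondingly longer.

```python
import numpy as np

def windowed_sinc(cutoff, num_taps):
    """Linear-phase lowpass; cutoff as a fraction of the sample rate."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.blackman(num_taps)
    return h / h.sum()                      # unity gain at DC

def ringing_samples(h, floor_db=-60):
    """Span of samples whose level stays above floor_db of the peak."""
    mag = np.abs(h) / np.abs(h).max()
    above = np.nonzero(mag > 10 ** (floor_db / 20))[0]
    return int(above[-1] - above[0] + 1)

gentle = windowed_sinc(0.25, 31)    # wide transition band: short impulse response
steep = windowed_sinc(0.25, 511)    # narrow transition band: long, ringing response
print(ringing_samples(gentle), ringing_samples(steep))
```

Same cutoff frequency in both cases; only the steepness differs, yet the "ringing length" grows by an order of magnitude.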
The reason I want to explore this in a blind test is simply because time domain plots of discontinuous signals as produced by DAC upsampling antialiasing filters (generally presented as "impulse" / Dirac delta function plots) are often used to portray the reputed benefits of various digital filters in the audio world. Furthermore, there are those who write about and suggest that differences in upsampling digital filter parameters affect the sound in very substantial ways. The idea is that decreasing ringing, especially the pre-ringing before the main impulse, could lead to significant improvements in sonic quality, and that it is therefore desirable to aim for minimum-phase filters (and by extension, perhaps it would be good for the audiophile to purchase a DAC with this feature). Post-ringing is said to be less problematic since auditory masking reduces its audibility.
Therefore, to investigate this question, the SoX (Sound eXchange) open-source audio processing program was used to create two versions of test music by upsampling with either a typical linear phase or minimum phase algorithm from 44.1kHz to 176.4kHz. This is integer upsampling (4x) and mimics what might happen internally in an upsampling DAC. Even though playback at 176.4kHz may still be further upsampled internally in the DAC (converted to Delta-Sigma modulation, etc.), the digital filtering effect would already have been "imprinted" in the test samples to evaluate for sonic differences.
The SoX (current version 14.4.2) settings were...
sox xxx.wav -b 24 xxx_out.wav rate -v -s 176400      (linear phase, the default)
sox xxx.wav -b 24 xxx_out.wav rate -v -s -M 176400   (minimum phase, via the -M switch)
Resultant impulse response from these settings (Waveform Display and Frequency Spectral Display):
The time scale on the X-axis is 25 ms. This gives you an idea of just how long the ringing persists with these SoX filter settings! As you may expect given the discussion above about "Fourier pairs", this amount of ringing is due to SoX's steep filter (-s switch), which extends the resampling bandpass from 95% to 99% (DC to 21.83kHz for a 44.1kHz samplerate, based on the -3dB point) for both linear and minimum phase settings. Other than the lack of pre-ringing, notice the phase distortion with the minimum phase upsampling algorithm, which results in the higher frequency components being delayed. By the way, the -v switch is for "very high" quality processing - anyone know if this is 32 bits or 64 bits internally?
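The absence of pre-ringing in the minimum phase version can be reproduced with the standard homomorphic (real-cepstrum) construction, which keeps a filter's magnitude response but pushes all the energy to the front of the impulse response. This is a generic numpy sketch, not SoX's actual algorithm; the FFT size and log clamp are arbitrary choices:

```python
import numpy as np

def windowed_sinc(cutoff, num_taps):
    """Linear-phase lowpass; cutoff as a fraction of the sample rate."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.blackman(num_taps)
    return h / h.sum()

def minimum_phase(h, nfft=4096):
    """Homomorphic method: fold the real cepstrum onto the causal side."""
    H = np.fft.fft(h, nfft)
    logmag = np.log(np.maximum(np.abs(H), 1e-8))   # clamp to avoid log(0)
    cep = np.fft.ifft(logmag).real                 # real cepstrum
    fold = np.zeros(nfft)
    fold[0] = cep[0]
    fold[1:nfft // 2] = 2 * cep[1:nfft // 2]       # double the causal part
    fold[nfft // 2] = cep[nfft // 2]
    Hmin = np.exp(np.fft.fft(fold))                # same magnitude, min phase
    return np.fft.ifft(Hmin).real[:len(h)]

lin = windowed_sinc(0.25, 101)     # symmetric: peak in the middle, pre-ringing
minp = minimum_phase(lin)          # peak near t=0, all ringing comes after
print(np.argmax(np.abs(lin)), np.argmax(np.abs(minp)))
```

The magnitude response is (nearly) unchanged; only the phase, and hence the distribution of ringing in time, differs.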
The research question is this: given what are clearly extremely steep filter settings that induce very significant amounts of ringing in the time domain (thus potentially audible), do test subjects (a naturalistic public sample of music lovers/audiophiles) show a significant preference for either the linear or minimum phase setting?
II. Procedure

As usual, the selection of test samples is essential! In this regard I would like to thank Juergen for the excellent suggestions and samples. Two of the three recordings were sourced from 24/88 captures done with the equivalence stereophony technique, with no dynamic range compression and of course no DSP applied. These segments ("Mandolin" and "GrandPiano") are purely acoustic. The last recording is a commercial blues track from AudioQuest, Mighty Sam McClain's "Give It Up To Love" from the album of the same title (used based on the principle of fair use for research purposes and academic discussion). I believe these recordings are of impressive quality and can be used in concert with excellent high-fidelity hardware to evaluate for time-domain differences as required in this test.
Here is the Dynamic Range Meter result for the files, showing slight variation in the peak and average amplitudes. The average RMS amplitude is matched to within 0.01dB for each track. The average of DR13 across all test samples signifies good dynamic range, as expected for more natural sounding recordings.
Each of the samples (24/44 "Mandolin", 24/44 "GrandPiano", and 16/44 "Give It Up To Love") was trimmed to 60 seconds, and the original audio data was upsampled using SoX as per the command lines above to create two versions (linear & minimum phase) of 24/176.4 material. These files were randomized and renamed to sample A or B:
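For those curious what 4x integer upsampling actually does, here is a minimal numpy sketch of the textbook approach (zero-stuffing followed by an anti-imaging lowpass). The tap count and Hamming window are illustrative only; SoX's filters are far steeper than this:

```python
import numpy as np

fs_in, L = 44100, 4                      # mimic 44.1 kHz -> 176.4 kHz
fs_out = fs_in * L

# Linear-phase anti-imaging lowpass, cutoff at the input Nyquist
# (22.05 kHz = fs_out/8). 257 taps with a Hamming window: gentle by
# SoX standards, but enough to show the principle.
taps = 257
n = np.arange(taps) - (taps - 1) / 2
h = 2 * 0.125 * np.sinc(2 * 0.125 * n) * np.hamming(taps)
h /= h.sum()                             # unity gain at DC

def upsample4(x):
    y = np.zeros(len(x) * L)
    y[::L] = x                           # zero-stuff: 3 zeros between samples
    return L * np.convolve(y, h, mode="same")  # lowpass removes the images

# A 1 kHz tone should pass through intact, with its spectral images
# around 44.1 kHz suppressed by the filter.
x = np.sin(2 * np.pi * 1000 * np.arange(4096) / fs_in)
y = upsample4(x)
```

Whether the lowpass here is linear or minimum phase is exactly the variable under test; the magnitude response (and hence the image suppression) can be identical either way.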
Because this is supposed to be a "blind" test, I was curious whether people would use software like AudioDiffMaker and audio editors to look at the audio data. In order to embed a "test" of sorts for this, I purposely increased the value of a single sample in a couple of the tracks. This would not affect audibility but would stick out like a "sore thumb" for those who look using the spectral frequency display, appearing as if there were content beyond the 22.05kHz limit of a 44.1kHz samplerate source. I monitored some of the forums and received E-mails from folks who discovered this.
Example of a purposely placed spectral peak in "Give It Up To Love B".
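Why a single-sample bump is so visible in a spectral display: it is effectively an added impulse, and an impulse has a flat spectrum, so energy appears far above the 22.05kHz ceiling of any true 44.1kHz source. A numpy illustration of the idea (the amplitudes and sample position are hypothetical, not the values used in the actual test files):

```python
import numpy as np

fs = 176400
t = np.arange(fs) / fs                        # 1 second of audio
clean = 0.5 * np.sin(2 * np.pi * 1000 * t)    # band-limited "music" stand-in
marked = clean.copy()
marked[50000] += 0.05                         # nudge one sample upward

def hf_energy(x, floor_hz=30000):
    """Mean spectral magnitude above floor_hz."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return spec[freqs > floor_hz].mean()

# The marked file shows a broadband floor well above 22.05 kHz,
# while the clean file has essentially nothing up there.
print(hf_energy(clean), hf_energy(marked))
```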
Data was collected anonymously through an account on freeonlinesurveys.com with IP filtering turned on so only a single response could be sent from each IP. I wanted to know general demographics like age and location, whether respondents had a preference for sample A or B, the subjective sense of ease or difficulty in differentiating the samples, whether speakers or headphones were the primary transducer, and whether they were a musician, had experience in music production, or wrote for audio publications. The collection window was 2 months from the time the test was posted to when the survey was stopped. This should have provided adequate time for all who wanted to submit their results.
The test was "advertised" on numerous "audiophile" forums in the hopes of attracting those with equipment capable of performing the test - access to computer audio, DACs capable of high-resolution 24/176.4 playback, and those likely to spend time listening more intently for differences. Forums where the invitations were sent included:
III. Results

Over the 2 months of data collection, I received a total of 45 responses. I actually received 47 responses but had to delete 2 of them due to the use of equipment clearly incapable of 24/176.4 playback, like headphones plugged directly into a Microsoft Surface Pro. Demographic characteristics are as follows (ignore the "Standard Deviation" calculation as this is mostly meaningless for the data):
44 men to 1 woman! Such is high-fidelity audio I suspect; at least it's a reflection of those in our hobby obsessive enough to try a test such as this!
The age distribution is somewhat different from the earlier 24-bit vs. 16-bit trial. This time, there was a peak in the 51-60 years group and an interesting steady balance across the decades from 21-50.
The geographic distribution is similar to the previous post. Most respondents were from North America, with Europe coming next, followed by Asia and Australia/New Zealand. Again, not surprising and consistent with the readership I've seen for this blog.
I asked respondents to describe their audio system so I could get a sense of what kind of equipment was used... As usual, there were some very respectable systems represented:
DACs: LH Labs Geek Pulse Fi, Cambridge DACMagic Plus, Denon DRA-F109, Oppo BDP-103, TEAC UD-301, Emotiva DC-1, Chordette Qute EX, iFi Nano, Chord QBD76HDSD, Oppo BDP-105, Lynx Hilo, TacT, Musical Fidelity V90, Schiit Modi 2, Meridian Explorer2, Ayre QB-9 DSD, NAD M51, Onkyo DAC 1000, Rega DAC-R, Hugo, Oppo HA-2, Tascam UH-7000, LH Labs Geek Pulse XFi, MBL DAC, Audio-GD NFB 11.32, iFi iDSD Micro, ESI Dr. DAC Prime, CEntrance DACmini CX, Benchmark DAC1 HDR (note, the ASRC resamples everything to 110kHz so some alteration to the signal with this DAC), Schiit Yggdrasil
Speakers: EVE Audio SC208, KEF 201, MBL, Davis Acoustics MV7, PMC Twenty.23, Reimer Tetons GS, PMC TB2i, Wharfdale W90, Zen Adagio, Polk TSi400, KEF LS50, Canton Ergo 690, B&W 802D, Jamo X870, Dynaudio C1 + Sunfire subs, Thiel 2.2, M-Audio monitor
Headphones: AKG K7XX Massdrop, Sennheiser 265, Beyerdynamic DT990, modded Fostex T50RP, Sennheiser HD800, AKG K701, Beyerdynamic DT880, Mr. Speakers Alpha Prime, NAD Viso HP 50, Philips X1, PSB M4U-1, Sennheiser HD650, Hifiman HE-400i, Beyerdynamic DT770 Pro, Audeze LCD-3, Sennheiser HD590
As you can see, the respondents were generous in their descriptions of the gear used. From the list above, it would be fair to say that the target audience has been "tapped", and the equipment used for evaluation, although fairly broad, represents an excellent sampling of quality gear. In the interest of space, I of course did not include other associated gear on that list, which included some high-end computer server systems, streamers like the Auralic Aries, and excellent quality pre-amps and fancy amps used with those speakers.
There was approximately a 50:50 split between those who used headphones compared to speakers:
Of the respondents, 11% (5/45) were musicians (pro or regular performer), 9% (4/45) were involved in music production (audio engineer, recording, mixing, mastering), and 0/45 admitted to writing for an audio publication (online or print).
As a start, let me just show you the results of the test - no analysis for now, just the "raw" data of preference and subjective confidence for each musical sample identified with whether it was the linear phase version or minimum phase:
A. "Mandolin" Sample:
As you can see, 55.6% preferred the minimum phase upsampled version of "Mandolin". We also see that many (66.7%) felt confident about their choice, rating the audibility as either "moderate" or "high" confidence...
B. "GrandPiano" Sample:
C. "Give It Up To Love" Sample:
So far, we see that out of the 3 samples, the respondents showed a preference for the minimum phase upsampling in 2 of the 3 ("Mandolin" and "Give It Up To Love"). Interestingly, for "GrandPiano" the linear phase sample was preferred.
In order not to make this post too long, I'm going to proceed with Part II: Analysis and Conclusions to have a closer look into the data and see if we can come up with some more defined conclusions...