Saturday 4 May 2019

BLIND TEST Results Part 1: "Do digital audio players sound different playing 16/44.1 music?" - Devices Unblinded! (Plus unusual exuberance & bias in the media?)


Thanks everyone for taking the time and effort to perform the blind test we started back in late January!

I officially closed the survey to submissions on May 1st (I had promised April 30th, but gave stragglers in other time zones a few extra hours). I trust the 3 months gave everyone who wanted to perform the test plenty of time to do so. I'll leave the blind test samples available for download for now and will take the files down in the near future.

You can read about the reason I ran this blind test in the previous post, but in a nutshell it's because of this poll result which I think reflects general audiophile perceptions on the question:


I updated the graphic above on the day the poll closed (April 28, 2019). As you can see, out of a final tally of 407 votes on that question, almost 90% of respondents believed that CD players have unique "sonic signatures". I suppose that's not a surprising number, as it does seem highly intuitive! After all, with all the different CD players and 16/44.1 DACs out there, surely mass-market brands like Yamaha, Denon, or Sony must sound quite different from more expensive Hegel or Marantz gear, to say nothing of the upper echelons of audio such as Burmester, or playback combinations like a Chord transport plus DAC. Sound quality "must" vary in much the same way as the remarkable cost differences!

Having said this, I trust that most audiophiles recognize that real life isn't always as simple as the "received wisdom" we often hear. Intuitions and outcomes don't always align. And just because each device might have a unique "sonic signature" doesn't mean the difference must be significant. This is of course why certain ideas must be put to the test using "honesty controls" to understand the empirical level of truth.

As described, this blind test is relatively simple. It involves 4 digital music samples at 16-bit/44.1kHz bit-depth and sample rate. Each sample ran between 90 and 120 seconds - long enough, I think, for A/B comparisons to be made. I used 4 different devices (blinded as Devices A to D) for playing back the music and recorded each device's output with the RME ADI-2 Pro FS analogue-to-digital converter at 24-bits and 96kHz.

The RME ADC is a 2-channel professional-level converter used in music production. By recording at 24/96, I'm capturing the unbalanced RCA or balanced XLR output from each CD player / DAC at a greater resolution than the original source recording itself. As demonstrated here, this ADC is capable of true high-resolution performance that easily surpasses 16/44.1 quality. Operating at 24/96, it can capture a flat frequency response beyond 40kHz with a noise floor equivalent to 20+ bits of resolution.
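
For a quick back-of-the-envelope sense of why a 24/96 capture comfortably contains a 16/44.1 source, here's a minimal Python sketch using the textbook ideal-quantizer formula (theoretical format limits only, not measurements of the RME itself):

```python
# Theoretical PCM format limits - why 24/96 can fully contain a 16/44.1 signal.
# Ideal N-bit quantizer SNR: 6.02*N + 1.76 dB (full-scale sine, no dither/noise shaping).

def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of an ideal N-bit quantizer in dB."""
    return 6.02 * bits + 1.76

def nyquist_khz(sample_rate_hz: int) -> float:
    """Highest representable frequency (Nyquist) in kHz."""
    return sample_rate_hz / 2 / 1000

print(f"16/44.1: {dynamic_range_db(16):.1f} dB over {nyquist_khz(44100):.2f} kHz bandwidth")
print(f"24/96:   {dynamic_range_db(24):.1f} dB over {nyquist_khz(96000):.1f} kHz bandwidth")
```

In practice the analogue noise floor of the recording chain kicks in well before the 24-bit theoretical figure, consistent with the 20+ bit performance noted above.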

I have seen people criticize the use of the ADC, believing that the RME itself has a greater effect on the sound than the CD players / DACs being tested. I disagree. Given the resolution of the RME, even if it subtly "colored" the sound in some way (there is no evidence of this at all), that coloration would have been consistent across every sample, and significant variations from the individual DACs - frequency anomalies, noise floor differences, ultrasonic content up to 48kHz - would still have been captured. Surely if the differences between DACs are of an obvious magnitude, they will show up! By the way, some audiophiles feel that software, cables, and even file formats result in significant differences. If so, take note when I describe the devices and playback set-up below.

Remember that I also previously commented on why I believe 96kHz is adequate for capturing the effect of each device's filter setting. What I cannot control, of course, is whether respondents played the audio back on high quality equipment. Nor can I determine the auditory acuity of those testing... These are the limits we always face with "naturalistic" testing rather than bringing people into a lab individually.

For variety, the 4 music tracks were selected from different genres and emphasize different sonic characteristics - some have more bass, others are vocal tracks, and others have more dynamic range... But each track was of reasonable quality: 3 of the 4 have good dynamic range (DR13+) and one, at DR9, represents typical "modern" mastering. I purposely avoided DR5 or DR6 recordings, which unfortunately are rather ubiquitous these days but too dynamically compressed to be useful for testing.
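
To illustrate what those DR numbers gesture at: a crude proxy for dynamic range is the crest factor (peak-to-RMS ratio). This is only a sketch of the concept - the quoted DR13/DR9 figures come from the TT Dynamic Range meter, whose block-based algorithm is more involved:

```python
import math

def crest_factor_db(samples):
    """Peak-to-RMS ratio ("crest factor") in dB - a rough proxy for how
    dynamically compressed a track is. Note: the actual TT DR meter works
    on blocks and focuses on the loudest passages; this is just illustrative."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

# A pure sine has a crest factor of ~3.01 dB...
sine = [math.sin(2 * math.pi * 100 * n / 44100) for n in range(44100)]
# ...and hard-clipping it (crude "loudness war" limiting) squashes the ratio further.
clipped = [max(-0.5, min(0.5, s)) for s in sine]

print(f"sine: {crest_factor_db(sine):.2f} dB, clipped: {crest_factor_db(clipped):.2f} dB")
```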

Of course, blind testing requires that we normalize the average sample volume across the 4 devices. This was performed in Adobe Audition CS6 at 32-bit resolution, then saved back out as 24-bit. At most, I had to boost the softest recording by +1.16dB and attenuate the loudest by -1.25dB. This left all the recordings within +/-0.05dB of each other, which should be well below the threshold of human ability to consciously or subconsciously discern. Remember, listeners tend to show a preference for louder recordings. Exactly where the "just noticeable difference" threshold lies varies with the research and can be debated (like here) depending on signal type. Typically, we can't detect volume differences as easily with actual music as with simple test signals like pink noise. The lowest figure I've seen suggested for controlled tests like ABX is 0.1dB, so I believe accepting a residual variation of up to +/-0.05dB between the samples is reasonable.
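
As a minimal sketch of what the volume-matching step does (plain RMS matching on a toy signal here - Audition's normalization and my exact workflow differ in the details):

```python
import math

def rms_db(samples):
    """Average level in dBFS (RMS relative to full scale = 1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

def match_gain_db(sample, reference):
    """Gain in dB to apply to `sample` so its RMS level matches `reference`."""
    return rms_db(reference) - rms_db(sample)

def apply_gain_db(samples, gain_db):
    g = 10 ** (gain_db / 20)
    return [s * g for s in samples]

# Toy example: a "quiet" copy of the reference, 1.16 dB down.
reference = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
quiet = apply_gain_db(reference, -1.16)

gain = match_gain_db(quiet, reference)          # +1.16 dB by construction
matched = apply_gain_db(quiet, gain)
residual = abs(rms_db(matched) - rms_db(reference))
print(f"gain applied: {gain:+.2f} dB, residual mismatch: {residual:.6f} dB")  # well under 0.05 dB
```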

Well, ladies and gents, I hope you kept track of your listening results! Let's proceed to unblind the 4 devices I used in this test. I flipped coins to randomly assign the order of these devices as A/B/C/D...

Device A = Computer Motherboard - ASRock Z77 Extreme4 (2012), phono output


This motherboard was purchased in 2012 and for years served as my main workstation with the LGA1155 Intel i7-3770K CPU inside! In fact, many of the articles on this blog from 2013-2016 were written on this computer.

Nowadays, the machine also has an 8GB nVidia GTX 1080 graphics card inside for gaming and VR. I have 16GB of DDR3 RAM on 2 DIMMs, plus both a SATA SSD and a FireCuda HDD, all running off an old SeaSonic SS-400FL2 400W switching power supply. This machine is electrically noisy enough to create anomalies in the noise floor of one of my preamp inputs, as reported last time.

Using a stereo 6' phono-to-RCA cable, I recorded the music playback (Foobar2000, WASAPI driver) through the rear audio output from the motherboard's Realtek ALC898 Codec. I applied -0.44dB for volume matching in Adobe Audition.

Device B = Apple iPhone 6 (circa 2014), phono output


Ahhh yes, one of the last iPhone models with a 3.5mm headphone jack. Apple in their infinite wisdom removed the analogue output starting with the iPhone 7, providing a Lightning-to-phono DAC adaptor instead.

I believe the DAC inside is probably some type of Wolfson chip of the kind Apple typically uses. Remember from the measurements a while back that Apple uses steep minimum phase filters in the iPhone, which may or may not be of any significance.

For this device, the source 16/44.1 material was converted to ALAC files (instead of FLAC) and copied to the phone for playback using iTunes. Audio output volume was pushed to 100%. The recording required a +1.16dB adjustment to volume match with the others.

Device C = Oppo UDP-205 (released 2017) streaming through ethernet as Roon endpoint, XLR output

At a cost of ~US$1300 retail when I purchased it last year, this is the most expensive of the devices in this blind test. The DAC chip inside is the ESS ES9038Pro which is the current "flagship" DAC chipset made by ESS. Music was sent to this Roon-Ready device through gigabit ethernet from my Windows Server 2016 computer running Roon in an adjacent room in my house. Of course no DSP, volume adjustment, or upsampling applied in Roon. The music was recorded through the XLR output of the device to the RME ADC.

In order not to compromise the sound in any way, I used the output level of the Oppo as the "standard" to which the others were all matched! Therefore, no volume adjustment was applied to the Oppo's recording at all. The filter setting was the default "Minimum Phase Fast".

Device D = Sony SCD-CE775 (circa 2001) 5-disc changer playing burned CD-R, RCA output


Finally, this is our disc spinner - the only device here with mechanical parts for extracting the audio data (the Oppo is also a UHD Blu-ray player, but its audio was streamed and the disc mechanism went unused).

It's an "old" Sony SACD player from back at the turn of the century, when SACD and DVD-A were fighting for market share during the "first coming" of hi-res audio. The lossless 16/44.1 audio was burned to a Memorex CD-R at 16X using an LG CD/DVD/Blu-ray burner and played in this 5-disc changer.

The 24/96 recorded audio from RCA output to the RME ADC was attenuated by -1.25dB to match the volume of the Oppo UDP-205.

--------------------------

There you go - the 4 devices used in this blind test. Imagine your expectations had I listed the 4 devices before you listened! I imagine most audiophiles would have laughed at the idea of quality sound coming out of a computer motherboard or iPhone, right?

Remember that the purpose of this little study is primarily to determine 2 things:

1. Did listeners find that they could hear a significant difference between the devices?

I had 2 questions asking raters to comment on this: basically, whether the "best" and "worst" devices in their opinion sounded very different, and whether the "best" and "second best" devices differed significantly. Would respondents' answers to these 2 questions mirror the Steve Hoffman Forum poll, where close to 90% affirmed that "sonic signatures" differ, given how markedly varied the devices above are?

2. Were listeners able to rank the devices in a way that suggested they heard superiority in what we could consider as likely the "best" and "worst" devices?

Although each person has different tastes in sound, looking at the list, I think one would expect the computer motherboard to sound the worst. And perhaps the Oppo DAC - with balanced XLR output and no digital volume adjustment - "should" produce the cleanest, "best" sound.
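
On question 2, it may help to frame what "better than chance" would even look like. If the devices were truly indistinguishable, each would have a 1-in-4 chance of being voted "best". Here's a simple binomial sketch of that idea (an illustration only, not necessarily the analysis I'll use when unpacking the results):

```python
from math import comb

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of at least k 'hits' by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 101        # number of respondents
p_best = 1/4   # chance any one device gets voted "best" if they all sound identical

# How surprising would k "best" votes for a single device be under pure guessing?
for k in (26, 35, 45):
    print(f"{k}/{n} votes for one device: p = {binom_tail(n, k, p_best):.4g}")
```

With ~101 respondents and pure guessing, each device would expect roughly 25 "best" votes; vote counts far above that would start to suggest listeners genuinely heard something.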

So, now that you know the devices I used... Did you rank the sound of these recordings in the fashion expected? Was it easier or harder to "hear" differences? Did the differences heard tell you anything about whether you should spend more or less money on your digital audio gear when playing back 16/44.1 music?

Other than advertising the test on a few websites (Steve Hoffman Forum, Audiophile Style, Squeezebox Forum, Audio Asylum,...), I basically sat back and watched, responding to questions as I thought appropriate... I did not look at the test results so as not to bias my own perceptions or inadvertently give something away over the last 3 months. Obviously, since I knew which devices I had used, I did not submit my own listening impressions.

In total, the survey received 101 results! Nice, I was looking for about 100 and we did it. My thanks and hats off to these 101 audiophiles willing to put their ears and gear on the line. :-)

After having a cursory look, here's what I am sure will be the most significant result:


Audiophilia appears to be a strongly male-dominated hobby - 100 men : 1 woman... Thanks to the 1 woman who participated - you're truly special! :-)

Next time, we'll have a comparative look at the objective performance of the devices above and form a hypothesis as to what we could expect when we unpack those listening results. We'll get to see the relative frequency responses, distortion results, jitter, and impulse response characteristics for each.

----------------------------------

I noticed there was a bit of a kerfuffle last week over what was basically just an announcement / advertisement for a Wilson Audio and Peter McGrath demo in California. As you can see, the Stereophile readership got a bit irritated at Jason Victor Serinus (JVS) and his comments (at least before the comments went off the rails!). Notice that he presumably wasn't even the author of this web item but decided to jump in - the damage was self-inflicted!

Look guys, let's state the obvious... Magazine writers and reviewers get paid for what they do (likely not much, especially compared to the inflating costs of the toys reviewed!), and they likely do it to a significant extent for the perks that come from associating with the Industry. Sure, it's cool to talk to various "luminaries" in the field, and to name-drop that one met and talked to so-and-so. Even cooler to get access to their products, the latest and greatest. And probably the ultimate coolness is having an actual relationship with such individuals.

Notice that Mr. Serinus dodged the pointed question in the comments about whether he received perks from Wilson, since he "owns" the Wilson Alexia Series 2 in his sound room - speakers listed at US$57,900. He did say in this post that he had the first Alexia for 2.5 years - did he "own" that too?

It's important to remember this when certain brands appear to have a disproportionate "presence" in certain magazines and among reviewers despite what often turns out to be highly suspect objective performance... For years I wondered why Synergistic Research was all over the audiophile magazines with their strange cables and unusual accessories; certain reviewers always seemed to find benefit from yet another of their questionable products, and writers seemed to make a point of commenting on the brand at trade shows (so JA, ever get to the bottom of what The Atmospheres were doing?). The same goes for DeVore speakers, where Mr. DeVore himself seemed to be "hanging out" with these people (here's a ghastly video for your consideration) - check out the recent DeVore measurements here. Elsewhere, we see companies like Zu, where the exuberance of certain reporters contrasts with more technical "tear downs" like this or actual measurements of products.

In my opinion, JVS went a bit too far in displaying his personal and emotional connection with Wilson Audio. The trick is to curb one's enthusiasm so as to at least appear professional - and he failed to do that. Instead of an independent, impartial reviewer, his comments were obviously taken by readers as those of a spokesperson for the product. There is a big difference between the two, even if we suspect in the back of our minds that the "Chinese Wall" has always been a bit porous - perhaps more so for certain writers. Good to see the readers "watching the watchers", as it should be.

Carry on...

Happy May! I hope you're all enjoying sunshine and "May flowers" wherever you are... Enjoy the music.

ADDENDUM: May 9, 2019
As requested by some previously, here are the original "source digital" 16/44.1 files that were played by each of the test devices and recorded on the RME ADI-2 Pro FS. Please see the README.

Archimago - Do All CD Players Sound The Same - Source Digital (FLAC, 2019).zip

** Part II: Objective performance posted **

10 comments:

  1. This is funny. C was the one I subjectively, without thinking too hard about it, picked as best/standard after first listening to all the tracks. So then I started out trying to compare C against D, which I felt maybe was next best. After a while I decided I couldn't really tell a difference, so then started comparing C and A. After a time I decided, again, I couldn't tell a difference. I briefly compared C & B, A & B, A & D, before going back to C and ultimately deciding there was no audible difference among them after all.

    Also, I was as rigorous as an amateur can be in deepest post-holidays winter. I spent multiple evenings over multiple weeks listening to each one carefully and comparing them using different strategies. All together, it was probably close to 10 hours, which for 90 second clips seems like a ridiculous amount of effort.

    I'm going to go out on a limb and predict that there was not a statistically significant favorite. :)

    OTOH, to really put this to bed, which admittedly flies in the face of my prediction, I ought to have someone re-randomize the tracks then give them a listen without trying too hard to overthink it. It would be interesting to see if I reached the same loose order, or at least picked the same best or worst.

    In any case, thanks for putting these together Arch. It was a great idea.

    1. Hey Allan,
      Thanks for the rigor man!

      Without any spoilers, let me hint that things might not be as insignificant as you might think :-). All in good time... I did gather quite a number of variables, so I will try to sift through various combinations and see whether I can find patterns and trends.

  2. I would like to see some detailed measurements of the ASRock mainboard as it uses ALC898, a former Realtek flagship and it should score better than a mainstream one like ALC892.

    1. Hi Bennett,
      Yes, that will be coming right up next week. The computer that was convenient for testing just happened to have that ALC898 in there which as you noted, for a computer motherboard at least, is supposed to be one of the better ones.

      As we can imagine, though the specs might be better, it doesn't necessarily ensure great performance with all that extra stuff in the machine!

  3. That is interesting. I did not submit results because the only outcome was a marginal preference for C2! Of the other tracks (I excluded track 1 from my audition because I could not bear to listen to it a second time), I could hear no difference. But, and I suspect this may well be significant to the overall results, I did the test in my normal listening environment. My speakers are Linkwitz LX521.4 and the listening environment was very quiet at the time (35dBA ambient noise). Also, I listen almost exclusively to what is known as "classical music" so it's likely that I "knew" what to listen for! The outcome might have been different if I had used headphones (I have a pair of Sennheiser HD650s), but the only time those are used is when I am watching a DVD of an opera on my PC, with the audio output a USB stream to a Cambridge Audio DacMagic Plus.

    So my hypothesis is that the results may be greatly subject to the environment in which the test took place and the musical preference of the listener.

    Steven

    1. Hey Steven,
      Darn it man, you should have submitted the results especially if you had any preferences! The Linkwitz is an awesome speaker and your entry would have been good to capture.

      As I dig through the results, there are certainly patterns and there are subgroup differences which I'll talk about ahead.

      Of course as you noted, listening for differences was not going to be easy... If it were a walk in the park (as some people might assume considering that one of the devices was a motherboard!), then where's the challenge in that? :-)

      Rather, with the power of 100 responses, the "fun" is in teasing that data apart and looking for patterns and potential statistical significance.

    2. Hi Arch,

      My apologies. You are correct: I should have posted. One of the reasons I did not do so was that I doubted the results could be replicated. A group of us did Mark Waldrep's test for hi-res (using the Linkwitz LXmini+2) and the results were essentially random.

      So, my result is all devices sounded the same except for the single preference I stated in my previous post.

      The LX521.4 is, as you say, awesome. And, if you find yourself in Singapore again you can come and have a listen! I'm not sure how to place a value on my system since the LX521.4s were self-built but, based on what the speakers would have to be priced at if offered as a commercial product, I suppose I would have to place the value around US$15,000.

      Steven

    3. Thanks for the note Steven.

      To be honest, I'm not surprised by the hi-res vs. 16/44 listening test results. Remember the 16-bit vs. 24-bit blind test on this blog, where the differences were insignificant.

      [For those who have not seen this, here are the results: http://archimago.blogspot.com/2014/06/24-bit-vs-16-bit-audio-test-part-ii.html]

      But as you see in Part 2 I posted this week, the objective differences between the machines in this test are way beyond 16-bit vs. 24-bit quantization differences or whether you can hear >22.05kHz :-).

      Thanks for the description of your speakers. Nice. I appreciate the invite! I do have family working in Singapore so I travel there once in a while. Might get a chance to meet in the future!

      All the best...


    4. Paging Steven Hill...

      Hi Steven, if you're reading this, could you fire a quick E-mail to me?
      archimagosmusings@outlook.com

  4. IMO, it would have been better if for example the track with Cecile McLorin had been played on the 4 different media.

    What I mean is, all the tracks are new to me, and there are obviously big differences in the level of fidelity between them. The mentioned track sounds to me as it is by far better recorded than any of the other tracks.
