As you know, Part I - Procedure is now published. In it, I laid out in detail the test, how it was constructed, and how the data was collected. Today, we'll begin exploring the data itself. While I will try to conclude with some general points by the end of this post, I haven't had time to analyze everything quite yet. I'm anticipating at least another couple of posts to fully flesh out the data set, including posting some of the subjective comments made by listeners. I feel this is the only way to properly thank those who took the time, and to provide as much information as possible to answer any lingering questions.
Let's start... Today, let's focus on the "core" or "headline" results I think most of us are interested in. Who are the people who tested and submitted results? What overall were their preferences? What was the result for each specific track? How confident were the respondents about their choice? And how significant were these findings ultimately?
I. Who were the respondents?

Audiophiles of course :-). As I mentioned last time, the "advertisements" for this test were placed in a broad range of audio forums on the Internet. Some of these catered to more "subjectivist" listeners (such as AudioAsylum, maybe ComputerAudiophile), others were more "objectivist" (Hydrogen Audio, Squeezebox "Audiophile" subforum), and others I would call more balanced (Steve Hoffman).
As you can see, this is a worldwide test (for these graphs, in general, ignore the "standard deviation" stat - it's meaningless here and was just automatically generated by the survey site):
The clear "winners" were Europeans, who contributed a full 53% of the results! They were followed by N. Americans with 30%, then Aussies 8%, Asians 6%, and S. Americans 2%. Too bad there was no representation from Africa, and I'm not surprised at the lack of Antarcticans :-). That last point is at least internally consistent with expectations.
So how about gender of the respondents:
Hmmm, looks like there aren't many women out there in the audiophile world interested in blind tests and/or MQA :-). Only one of the respondents was female - thank you for doing this!
How about age groups?
This I think gives us a good look at the age distribution of computer audiophiles - at least those most likely interested in hi-res digital audio and playback. As you can see, the peak of those who were able to give this test a try with the pre-requisite gear was clearly within the 51-70 age groups, which constituted 54% of all respondents.
Overall, I see that 60% of respondents used speakers and 46% used headphones, which means 6% used both to evaluate the test tracks.
Not surprisingly these days, USB DACs dominate (58%) when it comes to analogue conversion, with network streaming (35%) next, followed by surprisingly few S/PDIF users (12%). Again this is quite reasonable, as I think most audiophiles appreciate that asynchronous USB offers better jitter suppression (whether that's audible is another matter, of course).
Looking at operating systems, Windows gets 35%, which is about three times Mac OS (12%). Interesting that 16% used some Linux variant including Android-based machines (not surprising, since audio streamer machines often run Linux software underneath, such as the Raspberry Pi, Sonore xRendu, etc...).
So how about the cost of the audio systems used to evaluate?
Remember, I asked people to exclude the "peripherals" like power conditioners, cables, and the like. It looks like the "sweet spot" for a good 28% of listeners is in the US$2000-$5000 range, with a second bump in the US$10,000-$20,000 range. If I look at the system descriptions offered by the testers, I see some really good stuff (in no particular order)...
Source: iMac, Raspberry Pi 3, Intel NUCs, Linn Majik DSM/2, Intel i7 PC, various Windows laptops, various Linux desktops & laptops, MacBook (Pro), Naim UnitiQute, Sonore microRendu, Bluesound Node 2, Aria music server, ODROID XU4, Auralic Aries Femto
DAC: Cambridge Audio DACMagic Plus, GD-Audio DAC, Jadis JS2 Mk IV, TEAC UD-501, Oppo BDP-105(D), DIY AKM AK4497 DAC, Rega DAC-R, Accuphase DP720, Oppo HA-1, Fiio X3 II, Burson Conductor V1 & V2, Auralic Vega, Mytek Brooklyn, TEAC UD-301, Schiit Yggdrasil, ASUS Xonar Essence STX, Meridian Explorer, iFi iDSD Nano, Sony NWA-30, Audiolab Q, Chord Mojo, Schiit Modi 2, Emotiva DC-1, Cambridge Audio Azur 851D, Denafrips Aries, T+A DAC 8 DSD, Gryphon DAC One, Resonessence Audio Herus, HiFiBerry DAC+ Pro, Chord QuteHD, dCS Vivaldi "stack"
Amplifiers: Linn Majik 6100, Musical Concepts Hafler mod, mbl 9007 monoblocks, Spectral DMA-150, Bryston 4B SST2, Electrocompaniet, ATC SIA2-150, various Emotiva models, Rogue Audio integrated, Parasound Halo A21, Yamaha A-S500, Benchmark AHB2, BOW Technologies Walrus, Olive Naim 72/Hicap/250
Speakers: Paradigm Studio 100v3, GoldenEar Triton One, Acapella High Violon mk VI, Linn Kaber, KEF LS50, JPW Sonata, Infinity Renaissance 90, Quad ESL 2805, Naim Allae, Thiel CS2.4SE, PMC IB2i, Genelec 8030B, Focal Alpha 80, Magnepan 3.6/R, Celestion A2, Monitor Audio Bronze 2, ATC SCM19 v.2, Definitive Technology StudioMonitor 45, Harbeth speakers, Polk TSi 400, Thiel CS3.7, Sonus Faber Guarneri Evolution
Headphones: Sennheiser HD650, Sennheiser HD600, Beyerdynamic DT-250, AKG K240, AKG K601, Audeze LCD-XC, Sennheiser HD800, Sennheiser Momentum, AKG HSC271, Grado SR 80, HiFiMan HE-500, Beyerdynamic DT 770, Audioquest Nighthawk
Whew, quite the list, and it's not totally complete (I think everyone gets the idea of the breadth and depth of the gear used)... Many testers listed many more details! Some even posted their own objective equipment test results confirming low noise and high resolution capability :-). I didn't even list some of the other equipment used here, like high-end NAS storage, pre-amps and headphone amps, plus details around all kinds of power conditioning, cabling, etc. Impressive systems, folks.
From the software side, numerous programs were used from foobar to JRiver to Roon to Media Monkey to Audacity to Audirvana for computers and with streamers everything from Volumio, piCorePlayer, RuneAudio to customized streaming solutions.
We also had a few testers with other musical / production experience as well as those who write audio hardware and music reviews:
17 musicians (20% of all respondents), and 13 (16% of all respondents) working behind the scenes in recording, mixing and production.
II. What sample - MQA Core decode or standard Hi-Res PCM - did people prefer and with what confidence?

We now get to the moment of truth!
TRACK 1: ARNESEN - MAGNIFICAT
As you can see, the majority preferred Sample B, which is the Hi-Res PCM version of the song - a 58% preference for the PCM vs. 42% for the MQA Core decode. Statistically, the one-tailed binomial probability of at least 48 out of 83 respondents picking the same sample, assuming a 0.5 probability of success on each trial, works out to p=0.094 (z=1.32). What threshold one puts on "significance" can be debated, but we can at least say there seemed to be a bias towards the PCM sample.
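For those who want to check the arithmetic, here's a minimal sketch of the calculation using only the 48-of-83 split reported above: an exact one-tailed binomial tail probability under a fair 50:50 "guessing" assumption, plus the continuity-corrected z-score. The function names are mine, not part of any survey tooling.

```python
import math

def exact_one_tailed_p(successes, trials):
    # Exact one-tailed binomial tail P(X >= successes),
    # assuming a fair 50:50 chance of picking either sample per trial
    return sum(math.comb(trials, k) for k in range(successes, trials + 1)) / 2**trials

def z_score(successes, trials):
    # Normal approximation with continuity correction
    mu = trials * 0.5
    sd = math.sqrt(trials * 0.25)
    return (successes - 0.5 - mu) / sd

p = exact_one_tailed_p(48, 83)   # close to the p=0.094 quoted in the post
z = z_score(48, 83)              # ~1.32
print(f"p = {p:.3f}, z = {z:.2f}")
```

The same two functions can be applied to any of the per-track splits; only the success count and trial count change.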
There are more nuances we need to consider... Like whether this preference still holds true if we consider the level of "confidence" reported by the listeners. Here are the confidence levels rated from essentially no audible difference to "easy to tell the tracks apart":
One way we can calculate a weighted composite result is to score by awarding 1 point for "No real difference", 2 points for "Slight difference", 3 points for "Moderate difference", and 4 points for "Clear difference". Doing this, we get a weighted composite score of 72 (40%) for MQA and 110 (60%) for standard Hi-Res PCM audio. Percentage-wise, that's a slight gain for Hi-Res audio, which means that with confidence factored in, most listeners still preferred the standard PCM.
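The weighting scheme itself is simple enough to express in a few lines. Note that the per-confidence-level counts below are purely hypothetical illustrations (the post reports only the weighted totals, not the level-by-level breakdown for each sample):

```python
# Weights per the confidence scoring scheme described above
WEIGHTS = {
    "No real difference": 1,
    "Slight difference": 2,
    "Moderate difference": 3,
    "Clear difference": 4,
}

def weighted_composite(counts):
    """counts maps a confidence label to the number of respondents
    who picked this sample at that confidence level."""
    return sum(WEIGHTS[label] * n for label, n in counts.items())

# Hypothetical per-level counts for illustration only -
# NOT the actual survey breakdown:
mqa = weighted_composite({"No real difference": 10, "Slight difference": 15,
                          "Moderate difference": 7, "Clear difference": 3})
pcm = weighted_composite({"No real difference": 8, "Slight difference": 18,
                          "Moderate difference": 14, "Clear difference": 8})
total = mqa + pcm
print(f"MQA {mqa} ({mqa/total:.0%}) vs. PCM {pcm} ({pcm/total:.0%})")
```

The effect of the scheme is that a pick made with "Clear difference" confidence counts four times as much as one made with "No real difference" - so a sample can win the weighted tally even on a near-even split of raw picks.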
TRACK 2: GJEILO - NORTH COUNTRY II
Okay, moving on to track number 2.
Wow. You really can't get any closer to 50:50 than this! The only difference comes from the fact that we have an odd number of total responses.
Check out the confidence ratings:
As you can see, the respondents thought the difference in sound between the MQA and standard PCM with this track was even more subtle than with Track 1 above, with a full 73% describing the sound as either no different or only subtly different!
And what happens when we apply the weighting as above? We get a weighted composite score for MQA of 76 (47.2%) and for the Hi-Res PCM of 85 (52.8%). What this means is that even though essentially the same number of people picked each of the samples, those who picked B (Hi-Res PCM) preferred it to a stronger degree.
TRACK 3: MOZART - VIOLIN CONCERTO IN D MAJOR
Finally track number 3...
Ahh, in this one we see a preference for the MQA Core decode of 55% versus 45% for the standard Hi-Res PCM. With binomial stats, again assuming a 0.5 chance of picking the MQA on each trial, p=0.190 (z=0.878); interesting but not particularly significant.
As above, let's run it through the weighting based on confidence levels as shown here:
This pattern is similar to the confidence ratings for track 2, the Gjeilo North Country II: the majority of people - 67% with this sample - thought that at best they were hearing subtle differences and that the tracks were "hard to tell apart". With these calculations, the weighted composite score becomes MQA 106 (62%) and PCM 64 (38%). It looks like more of the people who felt confident about their choice decided to go with the MQA - a strong inversion of what we saw with the first track (the Arnesen).
III. What is the final composite score and which of the two techniques did people prefer?

With 3 tracks per person and 83 respondents, we can pool the whole data set together and determine, out of a total of 249 individual comparisons, what the final tally looks like:
And combined with weighted totals using confidence ratings (remember - 1 point for "No real difference", 2 points for "Slight difference", 3 points for "Moderate difference", and 4 points for "Clear difference"):
Hmmmm... Statistically we're just flipping a coin here!
IV. Impression up to this point...

Well dear readers, in the big picture, it's pretty clear what we're looking at here. There's just no consistent evidence of a significant difference. And even when I include the confidence scores in the calculations, there's no overall conviction in the results to suggest a real preference for the MQA Core decoded version as somehow superior to the standard Hi-Res PCM when both are subjected to the same upsampling filter operation (representative of how MQA typically processes playback).
So, let's enumerate the findings into a few general impressions so far:
1. In 2 of 3 tracks, there was a slight preference towards the PCM track when taking into account the weighting system (Track 2 was essentially exactly 50:50, but the standard PCM won out in the confidence-weighted adjustment). This finding was certainly not strong, nor were the respondents confident about having heard a difference. Remember that this is a "forced choice" situation and would at least potentially capture any "subconscious" decision making.
2. With the data from all the tracks put together, whether unweighted or weighted with confidence data, it's pretty much a 50:50 "guess". The small overall preference towards Hi-Res PCM, and the greater preference for the MQA decode in track 3, could very well be a combination of fatigue and response bias towards whatever was playing in selection "B" (positional / order bias). In this test, the standard PCM samples were labeled "B" in 2 of the 3 tracks, and MQA was in the "B" position for the 3rd sample. I can imagine people having difficulty hearing a difference and tending to just pick "B" more.
3. From a confidence perspective, on the whole, the majority thought that at best, whatever audible difference was "subtle". Specifically, the final confidence tally looked like this:
If we look only at people who describe hearing "moderate" or "clear" differences, here is how they selected (no weightings applied):
An exact 50:50 coin toss even within the group of listeners who thought they heard differences to a moderate or obvious degree. Again, there is no preference towards the MQA Core decode or standard hi-res PCM playback.
There's more to say in subsequent posts. For the time being, these results will likely be disappointing for those who were expecting the MQA CODEC to sound significantly better (or even just "different"). It calls into question whether the encoding & decoding of the data in MQA has really contributed to better sound through ostensible mechanisms like "deblurring". There is no evidence for this using these demo 2L files when we subject both the decoded MQA and the Hi-Res PCM to the same upsampling filter. At least we can say the MQA version achieves the same sound as a 24-bit Hi-Res PCM version in a data-compressed package. Depending on the track, I'm seeing that the MQA file is 60-70% the size of the original 24/88 or 24/96 downsample (remember that MQA Core only unfolds to these 2X samplerates; the rest is upsampling to the "original" samplerate). Remember also that the "origami" system reduces the ability of lossless compression algorithms like FLAC to squeeze the lowest 8 bits even further, so it's not reasonable to expect fully half the file size. 60-70% of original size is very good compression for streaming systems, but for one's own library this might not be particularly meaningful given low storage costs.
Considering the length of this post, let me stop here... Although I think the results here are already quite useful in understanding what will likely be the final outcome, there's actually more we can look at. In the next week, I'll run some analyses drilling into the subgroups in the data - were musicians and audio engineers more likely to prefer MQA? What about those with more expensive audio systems? Were there "golden ears" selecting all 3 MQA Decode samples or all 3 Hi-Res PCM samples? Stay tuned.
Have a great week everyone! Enjoy the music...
NEXT: MQA Core vs. Hi-Res Part III: Subgroup Analysis