Archimago's Musings: MQA Core vs. Hi-Res Blind Test Part III: Subgroup Analysis

Thursday, 28 September 2017


This is Part III of the summary and results of the MQA Core decode vs. standard PCM Hi-Res blind test following from the "core" results last week.

At this point, we know that if we look at the "big picture", evidence suggests that there are minimal audible differences. So if we dig into the data set a bit further, let's see if we can ask questions based on the subgroups identified. I'm sure many of you have been curious about some of these questions for a while, but without access to a large enough database such as this, it's difficult to obtain answers.

Let's then ask a few questions and interrogate the database for hints...

I. Did musicians, audio engineers, self-identified audio hardware and music reviewers show any strong preferences?

There were a total of 17 musicians, 13 audio engineers, 11 hardware reviewers, and 8 music/album reviewers in this database. Note that there is some overlap but for the sake of simplicity, let's not worry too much about this here. Let's do what we've done before and calculate the weighted totals so we combine as much data as possible (preference + rater confidence). Remember the weightings: 1 point = "essentially no difference", 2 points = "slight difference", 3 points = "moderate difference", 4 points = "clear difference".
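For anyone wanting to try the weighting on their own tallies, here is a minimal sketch of how such weighted totals could be computed. It assumes each vote simply adds its confidence points to whichever version was preferred; the function name, data layout, and example votes are made up for illustration and are not the actual survey data or spreadsheet.

```python
# Confidence weights as described in the post.
WEIGHTS = {
    "essentially no difference": 1,
    "slight difference": 2,
    "moderate difference": 3,
    "clear difference": 4,
}


def weighted_totals(votes):
    """Sum confidence points per preferred version.

    `votes` is an iterable of (choice, confidence) pairs, where choice is
    "PCM" or "MQA". Returns the points per choice and each choice's
    percentage share of all points. (Assumed aggregation, see lead-in.)
    """
    points = {"PCM": 0, "MQA": 0}
    for choice, confidence in votes:
        points[choice] += WEIGHTS[confidence]
    total = sum(points.values())
    shares = {k: 100.0 * v / total for k, v in points.items()}
    return points, shares


# Made-up votes for illustration only; not taken from the survey.
example_votes = [
    ("PCM", "slight difference"),
    ("MQA", "moderate difference"),
    ("PCM", "essentially no difference"),
    ("MQA", "slight difference"),
]
points, shares = weighted_totals(example_votes)
print(points, shares)
```

Note how two votes each way becomes a lopsided weighted share once confidence is included, which is why small subgroups swing more widely under this scheme.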

Interesting. In the weighted results, we see that musicians seemed to have a preference towards the PCM samples whereas there were varying amounts of preference towards MQA Core with the other smaller subgroups. Folks who self-identified as writing audio hardware reviews were quite balanced and essentially 50:50. The most skewed results were the audio recording / mixing / engineering folks who preferred the MQA version more (without applying the weighting, with 13 "engineers" and 3 tracks each, there was a total of 18 votes for PCM compared to 21 MQA; interesting skew but not particularly large).

Realize that these are small sample sizes and with the weighting system applied, wider swings in values are to be expected.

II. Did price of the playback system result in a difference in preference?

This one should be interesting. Remember that there often seems to be a desire to correlate price with sound quality, especially in some of the industry-affiliated writings out there. So, within the dataset, is there any evidence that those with more expensive systems (US$5000+ gear) selected results much different from the average?

In total, there were 34 submissions from people using >US$5000 systems. Lumped together, with 3 tracks each (102 comparisons) and no weightings applied, there was a preference of 56 PCM (55%) to 46 MQA (45%). Using a one-tailed binomial test of significance assuming a 0.5 probability of outcome (i.e. flip of a coin), the results are z = 0.89, p = 0.186: not particularly significant, since an outcome at least this lopsided would turn up in about 18.6% of random coin-toss runs.
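For those who want to check the arithmetic, the quoted z and p values appear to match a normal approximation to the binomial with a continuity correction. The calculator actually used is not stated, so treat the sketch below as one plausible reconstruction; the function name is made up for illustration.

```python
import math


def one_tailed_binomial_z(successes, trials, p0=0.5):
    """Normal approximation to a one-tailed binomial test.

    Uses a continuity correction of 0.5 (an assumption about the method).
    Returns (z, p_value) for observing `successes` or more.
    """
    mean = trials * p0
    sd = math.sqrt(trials * p0 * (1.0 - p0))
    z = (successes - 0.5 - mean) / sd
    # Upper-tail area of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# 56 PCM picks out of 102 comparisons from the >US$5000 group.
z, p = one_tailed_binomial_z(56, 102)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Without the continuity correction the z value would come out a little higher (about 0.99), so the correction seems to be what reproduces the figures in the text.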

With weighting applied, this is what we're seeing for the scores of preference:

Overall then the folks with more expensive systems showed a propensity towards the standard hi-res PCM sample with a weighted score of 54% compared to 50.5% when we include listeners testing with <US$5000 systems (as shown last time). Though not necessarily significant, it at least suggests a trend towards preferring the hi-res PCM if you have a more expensive (and presumably "better") audio system which was maintained when weighted with the confidence data. Clearly this is a different narrative than the claims of "hi-end" writers.

III. Was there a difference between using speakers vs. headphones only?

From a time-domain perspective, this is useful to know because headphones are typically simpler and allow the listener to experience a more direct sound from the drivers without the effects of the room which can be very significant!

Here is the weighted score for the listeners who used speakers only (n = 50):

And here is the weighted score for listeners using headphones only (n = 26):

The 7 listeners who used both speakers and headphones were excluded from this analysis. It appears that MQA is preferred by the 50 speaker-only listeners but conversely the headphone users preferred the PCM samples! Note though that there were fewer "headphone only" listeners so results would be more easily skewed especially with the weightings applied. When we take away the weightings for the headphone group, it was only 40 PCM : 38 MQA. Basically what this means is that when some headphone listeners declared that they heard a bigger difference between the two, it was in the direction of a hi-res PCM preference.

IV. What did the younger group hear?

Segregating respondents into age groups can be a bit tricky. But let's face it, given the choice I would want to have the same hearing acuity as when I was 20 or 30!

Remember this graph from last time:

We have a group of 20 "younger" respondents from ages 21 to 40. I wonder how those guys managed with this test in terms of their weighted preference?

Well, with 20 respondents, 3 tracks each for a total of 60 comparisons, without weightings applied, they preferred PCM by 33 to the MQA Core 27 (not significant, p = 0.259, z = 0.645). And when the weighted scores are plotted:
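With only 60 comparisons, the tail probability can also be computed exactly rather than through a normal approximation. The sketch below is a cross-check under the same coin-flip assumption; the function name is made up, and the result should land close to the p = 0.259 quoted above.

```python
from math import comb


def exact_upper_tail(successes, trials):
    """P(X >= successes) for X ~ Binomial(trials, 0.5), computed exactly."""
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2 ** trials


# 33 PCM picks out of 60 comparisons from the 21-40 age group.
p_exact = exact_upper_tail(33, 60)
print(f"exact one-tailed p = {p_exact:.3f}")
```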

As you can see, overall they had a stronger preference for the PCM samples when combined with the confidence weightings. The interesting thing I found with this subgroup is that the preference was towards the hi-res PCM samples for all 3 tracks, plus the weighted scores pushed it towards the PCM samples even more. I had not seen this stronger tendency with every track with the other subgroups. With an n of 20 only, it would not be wise to make too much of it other than to earmark the result perhaps for further study.

V. What did the "Golden Ears" prefer?!

Alright guys, let's now look at the "golden ears" group of listeners :-). These are the guys (alas the one lady respondent did not make it into this hallowed cohort) who were able to select all 3 MQA Core decode or all 3 Hi-Res PCM samples "correctly".

Remember that even with a flip of a coin, out of 83 listeners we expect 1/8 of them (about 10.4) to "guess" all 3 MQA samples, and another 1/8 to "guess" all 3 PCM samples.
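The chance expectation is simple arithmetic, sketched here under the assumption that every pick is an independent 50:50 guess (variable names are just for illustration):

```python
listeners = 83
tracks = 3

# Chance that all three picks land on one particular version, e.g. all MQA.
p_all_one_version = 0.5 ** tracks

# Expected number of listeners picking all of one particular version.
expected_per_version = listeners * p_all_one_version

# Expected number picking either all MQA or all PCM.
expected_either = 2 * expected_per_version

print(expected_per_version, expected_either)
```

So pure guessing would already produce roughly 10 "all MQA" and 10 "all PCM" listeners, which is the yardstick for the counts reported next.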

In total, I found 9 listeners selecting all MQA and 12 listeners selecting all hi-res PCM. Interesting that more "golden ears" went for the standard PCM. What was interesting however was that one of the respondents left this message for me in the comments:

For what it's worth, I participated in your 2014 survey for 24 vs 16 bit and got all 3 right (probability 1/27 or 0.037...)

Doesn't mean much, but if I my choices here are the MQA ones, maybe there is something to the de-blurring claim, because this is mainly what I found for my choices. Then maybe it's the other way around and MQA sounds worse than straight hi-res... ;-)

Well friends, I think we've found our "Platinum Ear"; the proverbial "needle in the haystack"... Indeed, this individual selected all 3 MQA choices! Good job man, and achieving it with total equipment cost in the "$2000 - $5000" range using a pair of Grado SR 80 headphones! Apparently no need for a megabuck audio system. Please PM me on Computer Audiophile, I want to learn your Jedi listening secrets. Very impressive.

As a group, I was curious as to what level of confidence these "golden ears" rated the samples:

That's interesting because on the whole, this "golden ears" group actually showed quite a bit more confidence in their answers compared to the overall data set of all respondents (again, see the post last time for a comparable graph)! Nonetheless we see that 57% of comparisons still only suggested slight audible differences and although the "moderate" group is quite large, those who offered that they heard a "clear" difference in these golden ears were only 8% of comparisons, no different really than the 7.6% of the total respondents from last week.

If you're wondering, digging deeper in the "golden ears club" were:

2 musicians preferring MQA and 4 musicians for hi-res PCM

2 audio engineers for MQA and also 2 for standard PCM

2 audio hardware reviewers for MQA and 2 for standard PCM

1 music / album reviewer for MQA and 1 for standard PCM

Pretty evenly matched except for musicians and their higher Hi-Res PCM preference.

VI. A few thoughts to end...

Well, there you have it. Some subgroup analysis of the overall data. On the whole, I think it's clear again that I'm certainly not seeing an "obvious" preference for the MQA Core decoding compared to the same recording at 24/96 Hi-Res or downsampled to 24/88 from DXD using the 2L samples. As discussed in the procedures post, remember that I have standardized all tracks to the same MQA-like upsampling filter to remove this potential variable.

However, there were a few unexpected or interesting findings. Remember that this is not necessarily statistically significant so there's no need to make a big deal out of these findings. Subgroup analysis isolates out smaller sample sets and hence the "power" of the analysis weakens. Furthermore, at no point in the analyses did I find a preference for either PCM or MQA to be >60% of the selections regardless of sample size, whether unweighted or weighted...

1. The "audio recording / mixing / engineering" group of listeners did seem to show a larger preference towards MQA. However, the slightly larger group of "musicians" did not show this same preference.

2. There was an overall preference for MQA with those using speakers only, but headphone folks seemed to have a stronger preference to PCM. So what does this mean? Probably nothing... But I suppose MQA proponents could say this is the result of better imaging or soundstaging when listening with speakers. But then again, headphones can convey better resolution. Maybe what's needed are comparisons with MQA vs. PCM binaural headphone listening for both high resolution and to look for subtle time-domain effects?

3. Folks with more expensive systems (>US$5000) overall had a preference for the PCM samples. Again, this seems counter to the claims of audiophile writers who almost universally prefer MQA despite mostly using expensive gear. If MQA truly improves resolution at the level of the time domain, shouldn't folks with more expensive equipment tend to hear the benefits? Lots of possibilities here of course, one of which must be that price and sound quality do not necessarily correlate. Another possibility is that in fact the ones with expensive gear are getting benefits but they're not liking the partially-lossy nature of MQA or that it's affecting the noise floor!

4. The younger age group (21-40) preferred the standard Hi-Res PCM to a greater extent. Again, rather interesting I think considering that younger folks should have better hearing acuity. (One could say there is consistency with points 2 and 3 above where the folks with headphones and more expensive gear and potentially more resolution to listen into also preferred standard hi-res PCM!)

5. The "golden eared" cohort (either selecting all MQA tracks or all hi-res PCM) showed more confidence about their preference. But actual preference is difficult to gauge. There were overall more "PCM-preferring golden ears" than "MQA-preferring golden ears", yet one fellow who also did well on the 24-bit blind test a while back rated a "moderate" preference towards the MQA :-). In any event, even the "golden ears" thought that the difference heard was at best "subtle" more than 50% of the time.
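To put the earlier caution about subgroup "power" into rough numbers, here is a sketch of the approximate 95% margin of error around a 50:50 split for the unweighted comparison counts mentioned in this post. It treats every comparison as independent, which is an assumption (each listener contributed 3 tracks), and the function name is made up for illustration.

```python
import math


def half_width_95(n, p=0.5):
    """Approximate 95% margin of error for a proportion from n comparisons."""
    return 1.96 * math.sqrt(p * (1.0 - p) / n)


# Unweighted comparison counts from the post (listeners x 3 tracks).
groups = {
    "audio engineers": 13 * 3,
    "ages 21-40": 20 * 3,
    "headphones only": 26 * 3,
    ">US$5000 systems": 34 * 3,
    "all respondents": 83 * 3,
}
for name, n in groups.items():
    margin = 100 * half_width_95(n)
    print(f"{name:>18}: n = {n:3d}, 50% +/- {margin:.1f} points")
```

The smaller the subgroup, the wider the band around 50% that chance alone can explain, which is why none of the splits seen here should be over-interpreted.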

As suspected in Part II, we were not likely going to see massive divergences between the PCM and MQA Core results although the findings above I think are interesting and certainly can provide some direction if one wanted to take a next step and run controlled testing (in a lab). For example, let's find 20 audio engineers / trained listeners and run them through a blind listening test to see whether the slight preference towards MQA remains with good equipment. Who knows, maybe the MQA folks have already done these kinds of rigorous studies... It would certainly be interesting if such results were to be published and replicated.

As you can see, the subgroup analysis does provide interesting results for both MQA and standard Hi-Res PCM proponents; this is to be expected when we're exploring subtle differences at the margins of audibility. Overall though, I think the dataset still points to preference for standard hi-res PCM over the MQA Core decode when we take into account subgroups who should be better able to discern higher resolution.

Statistically, the database seems to suggest that younger musicians (<40), using more expensive gear (>US$5000), listening with headphones would more likely prefer the hi-res PCM. Conversely, older audio engineers, using less expensive equipment with speakers seem to like MQA Core decoding more. You decide which group you're closer to :-). Either way, nobody should be impressed with any great audible differences!

Okay then, next time, let's finish off the blind-test write-up with subjective comments from the listeners and how they would describe what they heard.

Enjoy the music everyone!

NEXT: MQA Core vs. Hi-Res Part IV: Subjective Results & The Wrap Up...

34 comments:

Mark's Blog, 30 September 2017 at 18:53
I am one of the "golden ears" who picked the PCM file as better in all three cases. While I have to admit I really had a hard time discerning the differences, it is also important that each of the five times I sat down to listen (far enough apart that I actually forgot my earlier scoring, thanks to a long test period) I had the same results. During the test period I purchased an MQA-capable DAC (full unfold to 192/24) and downloaded the original 2L files you used (in 44/16, 96/24, and MQA). I had similar difficulty telling them apart, even with full "unfolding". There are very slight differences from 44/16 to 96/24 (as you might expect?), but between 96/24 and full unfold, not so much. At least with my $2500 headphone system.
SUBIT, 1 October 2017 at 00:41
Archimago, have you taken into consideration people who made audio analysis beforehand and "cheated" by running the files through spectrum analysis, phase, or null-testing?
StevenS, 1 October 2017 at 01:01
"Remember that this is not necessarily statistically significant so there's no need to make a big deal out of these findings." Might I suggest this be rewritten in all caps, and placed at the top of the article? Because none of this even establishes a compelling case for audible difference, in which case the 'preferences' are imagination-based too.
Gordon, 1 October 2017 at 07:43
Hi Archimago,

Thanks for doing this test and to those, like myself, who contributed. It is my experience that MQA does not offer any significant improvement over PCM. Also the results show that 2L's quality does not need MQA and sort of prove my comment a few weeks ago that when the quality already exists, MQA does not have anything more to offer.

I will of course stand corrected over much older material that may benefit from the de-blurring process, as equipment in the past may have been insufficient and therefore correctable today with new processes and equipment.

In any case, I still feel that MQA is not the sort of thing the industry should be doing in terms of quality. Older recordings have been mastered time and again, many with beneficial results (Albert King is my favourite example) but these recordings remain steadfastly non hi-res. It is the music being made today that should be given proper recording equipment and work processes, so that re-mastering is not required and the public get high quality straight out of the studio.

I recently acquired a Goldfrapp (latest) album @ 24/48 and it sounds amazing, no distortion when turned up and exquisite at normal listening levels.

Similarly the Kraftwerk box set recently released, again in 24/48 sounds utterly amazing.

These 2 examples are proof to me that given the right equipment in the first place recording today can sound brilliant and do so when played back on "affordable" equipment (~ £500 to £1500), also likely, completely amazing on more expensive kit.

Just seems to me that MQA is trying to say that music reproduction had been wrong all along, both these recent recordings I refer to and your test confirm that MQA is just another way of doing the same thing.

Ultimately for all of us it is a matter of choice using your ears and whether someone chooses MQA or not doesn't really matter, just so long as what you hear pleases you and drives you to enjoy listening. However I for one will not be worrying about MQA, I am satisfied that I am not missing out.

I don't know about you but I think the discussion over MQA has run its course and people will choose with their wallets.

Thanks

Gordon

Balduin, 1 October 2017 at 07:45
Hi, I find it interesting that Grado SR80 headphones were used to identify the MQA most precisely. Grado's sound signature is far from what is considered natural/accurate on headphones, as of today's research [ https://seanolive.blogspot.com/ ]. Maybe this imperfection with its highlights of some frequency ranges made the impact of MQA conversion more audible? :D

Measurements of a Grado SR80: http://en.goldenears.net/3817 which looks similar to the generic emulation of another Grado, on a calibrated headphones plugin [ https://picload.org/view/dglagpgr/grado_1000.png.html ]. Please note, that these graphs would need translation to what is perceived via loudspeakers; since the ear canal has a huge impact especially in the high frequency regions (measurements with dummy heads are usually made at an artificial eardrum-position).

Enjoy the music!
GillesP, 1 October 2017 at 08:15
Hi Archimago,

As the recently nominated "Platinum Ears" reader, I did send you a PM to your Computer Audiophile address as asked.

In it I mention that maybe the characteristics of the Grado SR80 headphones with their very close and detailed character helped, but I recently bought much more neutral AKG K702 and get the same impression. I'm no advocate of MQA of course, I just found more realistic details and imaging in the MQA files, whatever the reason...
StevenS, 1 October 2017 at 20:51
How does MQA 'deblurring' compare to Plangent processing, which operates earlier in the A/D chain, and acts on analog wow and flutter?
Yan, 2 October 2017 at 06:57
Thanks again for your work.

Recently I've done ABX tests at home and trying to find differences got me tired real fast and it was not "enjoying music moments".

So who can hear those subtle differences while listening to music for the sake of listening to music?

Is there a link with a trained ear, like: once you're able to hear a particular thing you'll always notice it? Does it mean that the more you train your ear the less you'll enjoy day to day music?
Unknown, 2 October 2017 at 08:49
Archimago

Good work. I am concerned that most manufacturers are simply content to use the filters that come with the DA chips they buy off the shelf. Few if any devote effort to eliminating the audibility of filter effects.

If you are ever in Calgary then don't hesitate to contact me for a system demo.

Jeremy
Unknown, 3 October 2017 at 13:22
"5. The "golden eared" cohort (either selecting all MQA tracks or all hi-res PCM) showed more confidence about their preference. ... There were overall more "PCM-preferring golden ears" than "MQA-preferring golden ears", ..."

This sub-group analysis would prove, MQA is worse than PCM...? Even, with "golden eared" audiophiles...?

Isn't that a knock-out to MQA?