Saturday 25 May 2024

"High-End" DAC Blind Listening Results - PART II: Results & Analyses

I imagine that the banner graphic above must seem bizarre to some audiophiles! How is it even possible that the audio output from something as cheap as the Apple USB-C headphone dongle could be mistaken for that of Linn digital streamers costing orders of magnitude more?!

Well, of course it's possible! And if we are to honestly appreciate the difference in sound output between the very cheap and very expensive (we can buy a lot of other stuff, services, and experiences for US$20,000!), IMO, as audiophiles, we must open our minds to such comparisons. The hi-fi audiophile pursuit is not a cult; everything is up for empirical examination regardless of company, price tag, or which heroic personality is attached to said product.

Last week in Part I, I unveiled the identities of the DAC/streamer devices and discussed them. Let's proceed today with the data from the recent 2024 "High-End" DAC Blind Listening Survey, which collected listener impressions for 6 weeks; plenty of time, I trust, for those motivated to download, listen, and offer their subjective opinions.

As usual, let's go through the data broadly and then let's see if the results can provide some answers for specific questions around audibility, preferences, and listener subgroups. I'll group these questions and evaluations into a number of Sections.

Grab your favorite beverage, have a seat, this is a pretty long one... 🙂

I. Listener Demographics:

To start, as with any test result, we must make sure to contextualize the source of the information. In this case, anonymous Internet responses were collected from audiophiles around the world. As in previous listening tests, most of the respondents are from North America and Europe as shown on this map:


Thanks especially to the American audiophiles who showed up for the test! As you can see from the small bar graph bottom left, we also got a good number from the UK, Germany, France, the Netherlands, and Italy. We got only a few from Asia; this is common, likely because there are fewer native English speakers and because blogs like this one are inaccessible behind the Great Firewall of China.

It's no surprise that, as in the General Audiophile Survey in 2023, the vast majority of us audiophiles who bother to test out hardware stuff are men - 97%; thanks to the women who gave this test a try!



And when it comes to age, we're also seeing a similar picture as last year with peak participation in the 51-60 years old group followed closely by those 61-70 years old. This correlates to the demographic we often see at places like audio shows:



Later on, we can analyze the preferences of the "younger" (≤50, 34% of respondents) compared to the "older" (>50, 65%) age groups to see if there could be differences.

II. What equipment did the listeners use?

I always find it interesting to know what gear was used for testing because this is the stuff you guys own. I suspect that those of you who would try a test like this probably represent a cohort of the more curious audiophiles - not just those who would take what's written in the press at face value or buy expensive goods just for luxury.

For this test, I see that most listeners used headphones:



That definitely makes sense for detailed listening tests. Headphones can be particularly revealing of low-level noise, fine details and nuances; there are no room effects, and isolation from ambient noise can give a better experience of the dynamics (IEMs and closed headphones isolate more than open ones). However, loudspeaker listening provides a different perspective on characteristics like the soundstage, which is presented very differently than with headphones.

Nice to see 11% of you used both headphones and loudspeakers!

Next, let's consider the overall system price: digital player, (pre)amp, speakers & headphones only:



44% of you used systems between $2k and $10k, and 16% used systems over $10k. So the median price would be around the $5,000 mark, which I think is a very reasonable amount.

As we would expect, the price of headphone systems would be less on average compared to full loudspeaker systems - notice the skew:


It would be hard to find a reasonable-sounding loudspeaker system for <$500, but this is very much doable with headphones and IEMs.

Of those who listed their gear in the comments section, let's have a look at the kinds of hardware used. Where several of you had the same device (the RME ADI-2 DAC and Topping D90SE, for example), I listed it only once. General computers like Mac Minis, MacBook Pros and Windows PCs are not listed; let's just focus on the specific audio hardware.

Streamers / DAC:
iFi Zen DAC v.2, Topping E30 II, RME ADI-2 DAC FS, Topping E30 Lite DAC, Cambridge DACMagic 100, Gustard DAC X18, SMSL M500 Mk I, THX Onyx DAC, M-Audio Air 192|4, Rega DAC-R, EverSolo Z6, SMSL SU-10, Audio-GD R1, Violectric V800, Topping D90SE, Topping DM7, Chord Mojo 2, Lumin U1 Mini, Topping D20, Chord Hugo TT2+MScaler, Aurender N200, Orchard Audio Pecan Pi+, Topping NX4, Metrum Acoustics ONYX, iFi Zen Stream, Innuos Zenith MK2, Auralic Vega DAC, Ferrum stack (Hypsos, Wandla, OOR), E1DA 9039S, SMSL DL200, SMSL SU-8, Schiit Bifrost 2/64, Wiim Pro Plus, Aune X8 XVIII Magic DAC, Marantz SA-10, Denafrips Ares II, SMSL DO100, Chord Mojo + Poly, TASCAM CD-200SB, Gustard X16, Topping DX1, Oppo UDP-205, NuPrime DAC-10, FiiO K7, Exasound e22, RME ADI-2 Pro FS Black Edition

(Pre)Amplifier:
Audiolab 6000A, Yamaha A-S3200, NAD 350, Cambridge Topaz AM10, NuForce IA7 V3, Violectric DHA V226, Rega Brio, BV Audio PA300SSE + BV Audio Pre1, Marantz PM-11, Schiit Mjolnir 3, Topping A90 Discrete, TubeCube | 7, NAD C298, McIntosh MC275, Roksan K3, Fosi V3, Simaudio Moon Neo 340i, Electrocompaniet ECI 5 MK II, Topping PA5, Hypex NCore, TacT 2150, ZMF Homage, Decware Taboo MkIII, SMSL SH-9, Musical Fidelity KW550, Schiit Freya + Aegir, Topping EHA5, Marantz SR7012, Devialet Expert Pro 140, Topping L30 II, Gustard H16, JDS Atom 2, ZeroZone IRS 2092 monoblocks, Chord SPM 600

Headphones:

Speakers:
JBL L100 Classic, Monitor Audio Bronze 100, Genelec 8351b (with GLM room correction), Altec 14, Wharfedale Dovedale, APS Klasik Studio Monitor, Triangle Altea ESW, Dynavoice DF-5, KEF R900, Acoustic Energy AE1 Active, KEF 105.2 Reference, ProAc Response DT8, Harbeth M30.2 + REL T/9x sub, DIY SEAS-based speakers, Driade Premium 2S, Canton Reference 3.2 DC, Linkwitz LX521.4, Neumann KH310A, Eminent Tech LFT-8b, Dali Grand Coupe, Dynaudio S 1.3 SE, Dali Menuet SE, B&W 802-D3, B&W 683 + SVS SB-1000 sub, ATC SCM40A + Arendal 1723-1S sub, KEF R3 Meta, Spatial Audio Lab M5 Sapphire, Paradigm Premiere 800F + SVS PB-2000 sub, DeVore O Baby, Yamaha C40, Neumann K+H O300, Unity Audio Boulder MKI + JLAudio 12" subs, Magneplanar MG1.7, Martin Logan Ascent

Nice cross-section of hardware in use out in the real world spanning a wide price range.

Clearly many of you spent significant time listening and this shows in the detailed responses including how you set up the gear, and even descriptions of the room and various treatments and specialty cables (not listed).

I noticed 2 resubmissions where the respondent added extra information as a separate entry, which I consolidated; otherwise I did not see any irregularities in the submissions suggesting that people "cheated", nor any data that looked bot-generated.

III. Did listeners get the impression of audible differences between the Test Samples (DACs)?


18% thought that there were no audible differences between the samples and another 25% felt there was "very little difference" thus not worth spending any money to upgrade the DACs.

Remember, we're talking about streamers/DACs varying from US$10 to US$20,000 with the sonic output captured in high-resolution and reproduced using the hardware listed above with the majority of DACs these days capable of better-than-CD quality. Almost half - 43% - of the listeners thought it simply wasn't worth spending more money given the lack of change in the hi-res audio signature captured.

That still means that 57% thought there was a difference in the sound and they would consider upgrading based on the difference. 20% of respondents went beyond and felt they "definitely need(ed) to upgrade" or it was "essential" to upgrade based on what they heard.


IV. Of the listeners who heard a difference, in what order did they prefer the devices?

Now we start getting into the "meat" of the matter...

Since those who said "No Difference" left the DACs unsorted (A-B-C), let's filter out that group, leaving a total of 86 (82%) responses with preference ranking:


Listeners ranked by #1 being "best" and #3 "worst". Taking an average of all 86 responses, we are able to see that in fact the more expensive and objectively better performing DACs were both able to beat the Apple USB-C headphone dongle! I think this is a nice demonstration that with enough audiophile listeners, we can actually achieve the "power" to detect differences using blind testing.

Surprised?

Notice from the length of the bars that the difference isn't by any means massive. If there is absolutely no difference, all the devices would average out to "2". The question then becomes, is this statistically significant? The "null hypothesis" being that these deviations are just a result of random chance. Let's run this through a 3x3 (3 devices x 3 categories of preferences) χ2 test using the raw data to create the multivariate contingency table.

Calculated Pearson's χ2 = 6.35, p-value = .175, therefore not significant even if we relax the threshold to p < .10.

So while the overall pattern is not statistically significant by the usual definition, I would say that we're seeing a "trend" worth examining deeper. I know typically we talk about 5% (p < .05) as a threshold, but this is audio so I think it's fine to even relax things a bit.
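For readers who want to try this kind of test on their own numbers, here is a minimal sketch in Python of Pearson's χ2 on a 3x3 table of rank counts. The counts are hypothetical (they are not the survey's raw data; they were chosen only so that every row and column sums to 86), and the function names are made up for illustration. With 3 devices and 3 rank categories, the test is assumed to have (3-1) x (3-1) = 4 degrees of freedom, for which the standard critical values are 7.779 (p < .10) and 9.488 (p < .05).

```python
# Sketch: Pearson's chi-square on a 3x3 table of rank counts.
# The counts are HYPOTHETICAL, not the survey's raw data.

def pearson_chi_square(table):
    """Return the Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if device and rank were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def mean_rank(counts):
    """Average rank for one device, given counts of [best, middle, worst]."""
    return sum(rank * n for rank, n in enumerate(counts, start=1)) / sum(counts)


devices = ["Apple USB-C dongle", "Linn Majik DS", "Linn Klimax DSM/2"]
# Rows follow the device order above; columns are best / middle / worst.
hypothetical_counts = [
    [24, 28, 34],
    [29, 30, 27],
    [33, 28, 25],
]

# Standard chi-square critical values for 4 degrees of freedom.
CRITICAL_P10_DF4 = 7.779
CRITICAL_P05_DF4 = 9.488

chi2 = pearson_chi_square(hypothetical_counts)
for name, counts in zip(devices, hypothetical_counts):
    print(f"{name}: mean rank {mean_rank(counts):.2f}")
print(f"chi-square = {chi2:.2f}")
print("significant at p < .10:", chi2 > CRITICAL_P10_DF4)
print("significant at p < .05:", chi2 > CRITICAL_P05_DF4)
```

If every cell held the same count, the statistic would be 0 and every mean rank exactly 2, which is the "no difference" baseline described above.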

Let's keep looking and asking questions...

V. Okay then, what if we look only at listeners who thought they heard a bigger difference and felt that the DAC was worth upgrading?

This will filter out the 26 respondents who thought they heard a difference but felt it was "not worth spending money to upgrade", leaving us with the 60 who felt more confident in their choice:



That pattern of preference for the Linn Klimax DSM/2 > Linn Majik DS > Apple USB-C dongle persists with now an even higher ranking for the Linn Klimax DSM/2. So what does the statistical analysis suggest?

χ2 = 12, p-value = .0174 therefore is likely significant with p < .05.

Way to go audiophiles!

I ran various analyses beyond this but since the number of respondents drops precipitously, statistical analysis did not show any further significance for those who expressed an even higher confidence that the DACs they chose were worth upgrading to.

Here's something interesting though - for those 6 who were the most confident that it was "essential" to upgrade the DAC because they heard a major difference, they liked the Linn Majik DS/1 + Dynamik the "best":


With only 6 samples, not unexpectedly, this was not calculated to be a significant pattern. It is however a good reminder that expressing confidence that one heard very audible differences does not mean the preference of the person was towards what we might expect to be the best or most expensive DAC.

VI. Does age play a role in discriminating differences?

As you can see in the Demographics (I) section above, there were 36 participants aged ≤50 years old. Of this group, 27 (75%) felt they heard a difference. In the >50 years old group, there were 69 participants, of whom 59 (86%) thought they could hear a difference.
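One way to check whether that 75% vs. 86% gap could simply be chance is a 2x2 χ2 test on the counts quoted above. This is only a sketch and not one of the analyses run on the survey data: the helper names are made up, no continuity correction is applied, and the p-value uses the closed form for 1 degree of freedom.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Counts quoted in the text: how many in each age group heard a difference.
younger_heard, younger_total = 27, 36
older_heard, older_total = 59, 69

chi2 = chi_square_2x2(
    younger_heard, younger_total - younger_heard,
    older_heard, older_total - older_heard,
)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

On these counts the statistic comes out at about 1.76 with p around .18, so the gap in who reported hearing a difference is well within what chance alone could produce.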

Let's have a look at the preferences of the "younger" ≤50 years vs. the "older" >50 years groups who heard a difference:



Interesting, it looks like the younger folks, as a group, were better able to differentiate the Linn devices from the Apple USB-C dongle. The result is not statistically significant, however, and a small edge for younger ears would probably not be surprising.

VII. Was there a difference between headphone listeners vs. speaker listeners?

Check this out - for the listeners who thought they could hear a difference, let's separate them by whether they used headphones, speakers, or both:


Excellent! That pattern among the headphone listeners is highly significant with χ2 = 18.9, p-value = .00084; clearly much better than the typical p < .05 significance threshold!

To better illustrate the significance of this finding among headphone listeners, let me graph the results for each device based on the distribution of preferences:


Notice how many thought the Apple USB-C was the worst sounding of the 3 devices! In comparison, notice the relative preference for the DSM/2 as either "best" or "middle" subjective quality with few thinking it sounded the "worst". Beautiful!

So, headphone listeners as a cohort were the ones able to best differentiate the Linn DACs as sounding better than the Apple dongle. Notice that the loudspeaker listeners ranked the Linn Majik DS + Dynamik as sounding best to them and the expensive Klimax DSM/2 as being the worst! While the number is small at only 9, the Headphone + Speakers group did think the Klimax DSM/2 sounded best, but still ranked the Apple dongle as second best.

My respect to the headphone listeners as a group!

VIII. How about those with higher priced sound systems?

Already I think from what we see above achieved by the headphone listeners, the price of our hardware might not need to be too expensive to appreciate the difference between DACs. So, how did those who believe they heard a difference using <US$2,000 gear perform compared to those using >US$2,000?



Indeed, it's actually those listeners who overall spent less money that performed better picking the more expensive Linn streamers! Obviously $2,000 isn't a lot of money, but in comparison, while those who spent >US$2,000 were still able to separate out the Linns from the Apple dongle, they thought that the less expensive Majik DS edged out the Klimax DSM/2 slightly.

For the subgroup who noted that they heard a difference (even if very small), using systems costing <US$2,000, with an n=35, we get χ2 = 8.06, p-value = .09. I'll leave you to decide if this is significant enough! At this lower price range, clearly we're again likely just seeing the ability of headphone listeners to discern a difference.
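The p-values quoted so far can be cross-checked from the χ2 values alone, assuming each 3x3 test has 4 degrees of freedom (the degrees of freedom are not stated in the text, so that is an assumption here). For 4 degrees of freedom the upper-tail probability has the closed form exp(-x/2) * (1 + x/2), so no statistics library is needed. A minimal sketch, with a made-up function name:

```python
import math


def chi2_p_value_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom.

    Uses the closed form exp(-x/2) * (1 + x/2), valid only for df = 4.
    """
    return math.exp(-x / 2) * (1 + x / 2)


# (label, reported chi-square, reported p-value), as quoted in Sections IV to VIII.
reported = [
    ("All who heard a difference (IV)", 6.35, 0.175),
    ("Felt it was worth upgrading (V)", 12.0, 0.0174),
    ("Headphone listeners (VII)", 18.9, 0.00084),
    ("Systems under US$2,000 (VIII)", 8.06, 0.09),
]

recomputed = []
for label, chi2, quoted_p in reported:
    p = chi2_p_value_df4(chi2)
    recomputed.append(p)
    print(f"{label}: chi-square = {chi2}, quoted p = {quoted_p}, recomputed p = {p:.5f}")
```

The recomputed values land within a few percent of the quoted ones; small gaps are expected because the χ2 values in the text are rounded.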

How did those with some of the most expensive systems do? Let's look at the 13 who had systems >US$10k:


Yikes. The listeners with more expensive systems ranked the $10 Apple USB-C dongle above the $20k Klimax DSM/2. As expected, the vast majority of these listeners are older (90% between 51-70 years old), 100% males, 92% using loudspeakers.

IX. How did the Musicians and Audio Engineering subgroups do?


While the numbers are small, we can see that the Apple USB-C dongle did really well even among these subgroups who in principle have more ear-training experience and musical knowledge. The audio engineers did show a good preference towards the Linn Klimax DSM/2.


Portrait of The Golden Ear 🙂

X. Any Golden Ears out there!?

As with my other blind listening tests/surveys, I like to look for those individuals who were able to accurately select the "best" components with confidence. For this test, I'd like to award the "Golden Ear" title to the 3 European listeners from the UK, Sweden, and Russia who selected the C-B-A order of preference (Klimax > Majik > Apple dongle), declared that the DAC upgrade was "definitely" worth it (though not "essential"), and achieved this while using gear <US$2,000!

What hardware did these guys use?
Macbook Pro M3 (audio at 96kHz) → TRUTHEAR x Crinacle Zero IEM 
E1DA 9039S / SMSL DL200 → Beyerdynamic DT770 Pro and Topping PA5 → speakers (not identified)
Nice work boys!


XI. Summary

Well done and thanks for all the time spent on the listening, audiophiles!

Although there are other correlations in the data I could look for, I think I've answered the questions I was most curious about based on the 105 of you who sent in your listening impressions from around the world. Let's then summarize the most important points:

1. Audibility of DACs, even when devices are as disparate as a US$10 Apple USB-C headphone dongle and US$20k Linn Klimax DSM/2 (based on 2020 MSRP), is not high. (Section III) 18% of respondents felt they could not discern a difference, and another 25% felt that whatever differences they thought they heard were so small, that it wasn't worth spending money on. So in total 43% simply did not think there is a valuable "upgrade" path. On the other end, 20% did feel they heard a difference and thought they "definitely" or even "essentially" needed to upgrade to the better sounding device (even if what they thought was better sounding might not have been the most expensive Linn).

With the actual DACs connected to the same system, it's certainly possible that the effect would be stronger than what was heard by respondents playing back the 24/96 files. However, linnrd and I did some volume-controlled listening on the evening of the test sample recording and to be honest, even with his excellent audio system, the Majik and Klimax sounded basically identical with instantaneous A/B switching while Roon streamed the same music simultaneously to both. Even if we were able to show the Klimax beating the Majik, it would be far from a slam dunk! We can talk more about this next time when discussing subjective impressions.

2. There was a trend towards preferring the objectively higher performing and more expensive devices: $20k Linn Klimax DSM/2 > $3k Linn Majik DS + Dynamik Power Supply > $10 Apple dongle among all 86 listeners who felt they heard a difference. (Section IV)

That trend towards preferring the Linn streamers did seem to hit significance when we consider those who had more confidence in their ranking and thought it might be at least "worth spending money" to upgrade.

This finding I think is also a nod to the idea that in a blind test, audiophiles do overall prefer higher-fidelity gear which correlates to better objective performance (like the 1kHz 0dBFS THD+N), at least in this test.

3. So what subgroup of listeners was best able to rank in the expected order based on price and highest-to-lowest performing devices? Clearly the headphone listeners. (Section VII)

This subgroup of 42 respondents was the "engine" that drove audibility to the point of demonstrating a significant result! They were able to demonstrate the superiority of the more expensive Linn streamers, particularly the Klimax DSM/2, over the Apple USB-C dongle.

4. Was there a correlation towards preference for the more expensive (and higher objective performance) Linn devices with higher-price sound systems? NO! (Section VIII)

Headphone users clearly discerned the differences better, and their systems are typically priced lower than those of listeners with full loudspeaker setups (Section II). To give a concrete number, those who spent <US$2,000 on their audio system were more accurately able to identify the Linn Klimax DSM/2 as the best sounding device, with 74% of listeners in that group using headphones only and another 9% using both headphones and speakers to evaluate.

This is important because some audiophiles claim that in order to hear differences with more expensive DACs, one must in a way "match" the price of the system and the component. Clearly the "sound" of the $20k Linn Klimax DSM/2 did not require that listeners have >$20k sound systems to significantly rank as subjectively sounding "better"! A nice headphone system at, say, around $1-2k already performs well these days, and I suspect it will allow the audiophile to perceive differences down to whatever resolution their hearing permits. I saw no evidence among the small group of listeners with much more expensive systems that they preferred the more expensive and higher-resolution Linn Klimax DSM/2.

This suggests that when listening with loudspeakers in typical rooms, there is a lower chance of benefitting from the highest performing DACs (those with highest dynamic range, lowest distortion). This makes sense when thinking about the typical ambient noise, limitations of speaker resolution, and effect from room reflections. As usual, to maximize quality, I would highly recommend that loudspeaker listeners focus on the listening room before getting too excited about very expensive DACs, or possibly even speakers.

5. I don't see good evidence that age itself played a big part in the results obtained. The differences in Section VI were likely the result of older adults listening to loudspeakers and younger listeners using headphones. Among the 35 who used loudspeakers, 88% were 51 years old or above, compared to only 55% of the headphone respondents (n=42).

6. You don't need to be a musician or audio engineer. (Section IX) While the audio engineer group preferred the Linn Klimax DSM/2 as sounding the best, neither of these groups in aggregate ranked Linn Klimax DSM/2 > Linn Majik DS > Apple USB-C dongle as one might expect based on objective performance and price.

--------------------

TL;DR: Yes, audiophiles can tell the difference between higher performing DACs like the Linn streamers compared to the Apple USB-C dongle with preference for the Linns. Headphone users were able to differentiate best. And one does not need to spend a lot of money to resolve the differences between the hi-res 24/96, FLAC lossless test files.

Obvious Next Step: Can listeners differentiate the $20k Linn Klimax DSM/2 from a $120 Topping D10s (or $150 D10 Balanced), both devices with THD+N better than -100dB?! Anyone up for a follow-up blind listening survey?! 🤔 The hypothesis being that the objective performance is what's important and correlates with "best" sounding, not the "high-end" price or brand name (counter to Jim Austin's claim as discussed last week).

While this isn't research done in a formal audio lab with trained listeners by researchers with grants, I think what I'm offering here is a realistic evaluation based on the perception of more than 100 audiophiles around the world who donated their time. I hope we as audiophiles are not afraid to ask the questions that need to be asked and make sure that we collect the "crowd sourced" data based on controlled procedures. Employing safeguards against strong biases (like using blind listening) rather than accepting all opinions at face value is essential if we are to find truth (J Gordon Holt was right). Beyond objective measurements which we can collect and compare, this is how as hobbyists, we too can try contributing in a practical way to "science".

As per my usual sequence, next time in Part III let me publish some of the individual comments I received and the more subjective observations from respondents.

--------------------


Checking out Twenty One Pilots' latest album Clancy (2024, DR5 2-channel which is atrocious, DR12 multichannel/Atmos much better) this weekend for some indie pop / alt. rock flavors...

As discussed not long ago, the multichannel downmix is often a much different and IMO superior experience from the 2-channel mix! In comparison to the relatively small difference between DACs, this is way more important.

As usual, albums like this do not benefit from 24-bits! Waste of storage space.

I hope you're all enjoying the music, dear audiophiles!

Addendum - May 27, 2024:
Added graph showing distribution of "best"/"mid"/"worst" by subjective preference for the headphone listeners who reported hearing a difference, showing the nature of that statistically significant finding. 

42 comments:

  1. "Obvious Next Step: Can listeners differentiate the $20k Linn Klimax DSM/2 from a $120 Topping D10s (or $150 D10 Balanced), both devices with THD+N better than -100dB?!...."

    This has got to be the more interesting test! I already have a Topping D10 so if you just send me the $20k Linn I'll let you know how it goes, thanks.

    1. Nice one Julian... :-)
      I suspect the cost of shipping the Linn with insurance would be way more than the cost of the Topping D10.

  2. OR!
    Between a $4.5K Chord TT2 (for example) and a $150 Topping.
    Not the most expensive from either.
    Thank you Archie, for the stats.

    1. Two distinct DACs, with differing technologies, hyped, etc.

    2. Good suggestion Ken!
      I do know someone locally with a Chord DAVE so that might be an opportunity...

      Another option might be a fancy/expensive R2R DAC vs. something like the Topping D10s sigma-delta DAC.

      Many combinations and permutations!

  3. My AirPod result not included? 😢

    1. Hey there ST. Yes, actually your response was included in the analysis where appropriate although I didn't list it in the hardware section but included it now...

      Initially I was thinking of adding a "wireless" category but there were only 2 of you, the other person using the Sony WH-1000XM5. Not surprisingly, neither of you thought there was an audible difference so that category skipped my mind and was not created.

      The AirPods are only able to handle up to AAC 256kbps, and any extra processing like ANC or Spatial would more than likely have made differences between the test tracks harder to hear.

    2. I am not sure which results you referred. I didn’t know AirPod is only capable of 256! That probably explained why no diff with spatial on and off. Anyway, what I indicated was C, B,A and wrote this under the comment section in the survey. “ Hi Archi, ST here. First listening, somehow I was leaning towards C. Honestly, I don't think it made any difference that I can randomly tell one is better than the other.
      Even with the same DAC, you tend to prefer one over the other subjectively.
      Used AirPod Pro Gen2 with IPhone 15 Pro Max. Tried with Spatial on and Off.
      ST
      p.s. Thank you for making an effort for such comparisons. I hope you get many to participate.
      Cheers!“

    3. Yes, you ranked C-B-A ST but you indicated "No difference" on the audibility question so I take it that this was more of a "hunch" than meaningful ranking hence would not have included this in many of the analyses.

      If I did, this would have made the case of headphone listeners able to differentiate the devices even stronger!

      Yeah, the AirPods are only Bluetooth AAC 256kbps lossy devices... All Apple wireless headphones at this point are based on AAC lossy codec.

  4. A suggestion, Archimago:
    I think many people would be very happy if you could ask the people that you considered the "best" listeners if they would do an ABX test, although I've already shown that it's at least possible to do it between two of the audio samples, but maybe we can fix that somehow or exclude that one sample.
    Anybody else who would like to participate would of course be welcome.

    1. Interesting idea Anders,
      Without E-mail address to contact the respondents since this was all collected anonymously, it would be difficult to get everyone.

      For sure, an ABX would be great for all who responded C-B-A! Ideally bringing them into an actual hi-res playback "lab" where we can control the ABX software, and listening hardware would be awesome.

    2. Arch, this is such important work and you are doing this entire industry a huge service! I second the request for ABX tests as part of these surveys. It would tell a lot if the people who supposedly heard a huge difference could not pass an ABX test.

    3. Hey Taylor,
      Yeah, will need to strategize on how to get this done!

    4. I have an idea of how it might work out. And we can talk more later if you want! But my basic idea is that, if we are comparing DAC A and DAC B in a future study, in addition to having us download samples from each DAC and giving our comments on them like we usually do, you could also have us download ten pairings of sound files. Each pairing would have a "Sample 1" and a "Sample 2," and which DAC is assigned to which sample would be randomly assigned individually for each pairing. Then, in the survey, you would ask which DAC was which for each pairing. I know many people have opinions on how you should carry out these studies, and it's so easy for us all to have lots of opinions on the matter when we're not doing the work to design and organize them! But let me know if I can help in any way. I do have PhD-level study design and statistics training, but I might have forgotten most of it by now!

    5. Excellent Taylor!

      Hey E-mail me... Let's chat! Always fun working on a collab effort on these things. :-)

  5. Probably a little too long for a Blog Comment, I posted a comprehensive commentary on your "Results & Analysis" here: https://audiophilestyle.com/forums/topic/69643-invite-blind-listening-survey-high-end-dacs/?do=findComment&comment=1281171.
    Thank you for this opportunity.

    1. Thanks Iving, I'll give the thread a look!

  6. Looking forward to your suggested follow up test.

  7. Looking at the DACs that the participants use, I see a wide variety with most of them modestly priced. It's interesting that the folks that discerned a difference between the recordings were using gear costing a lot less than both of the Linn DACs. Not sure what to make of that.

    1. Hey there Doug,
      Yeah, I think a reasonable perspective is that the playback DACs for a test like this just need to be accurate enough to pass along the sound quality to the headphones/speakers so that listeners can judge which sounds "better" to them.

      So long as the device is "transparent" enough, it'll be fine... And transparency these days doesn't need to be expensive.

  8. Notwithstanding the limits of the ADC ( played directly some of the source files and compared via HQP>Holo Red>Holo May), once again, the interpretation and manipulation of data proves to be an exercise of bias.
    On my expensive system, to my ears and those of my younger girlfriend, B ruled, A and C were crap. And I confirm unblinded. Not sure I/we even bothered ranking A and C, they were just unbearable pit... ears (I coin pitears if not existing yet).
    I use (very) expensive speakers with excellent in room response. There's something speakers do that I have not experienced with headphones : realistic recreation of instruments in space. B did it. C was crap by homogenization, flattening of perspectives (when applicable), an example of digital = sanitisation. I guess those who chose A were seduced by the loose flattering low end and not troubled by its counterpart, harshness, on the other side of the spectrum.
    So, rather than drawing conclusions about people and their hearings, I suggest you simply conclude that a company like Linn (already much much overrated in LP times IMO) has the marketing clout to take more money from a musically inferior product, provided it reaches its target, made for headphones listeners and/or people who favour more hifi sound, just like there is mastered for Beats headphones or iTunes. So I concur with the suggestion to compare the Klimax to other designs for nice reading specs : guess the point will be made C is really crap for the money. This being, in my personal experience, with deep clean low end system, a May is very much worth it compared to a Spring though reviewers claimed the contrary and buyers like to think so : a bit more money can be well spent in digital, but certainly not on a Klimax.

    Replies
    1. It's legitimate to say that the validity or correctness of rankings C>B>A is superficially dependent on retail price. Weightier validity comes from the Linn factory itself. To argue that B is better than C is to override the evolved listening skills and credentials of a miscellaneous host of Linn engineers and other product development professionals over a significant period of time. I'm not a great fan of Linn speakers (cf. electronics) myself - and I consider the LP12 now over-priced. But the aggregate experience of Linn staff and Linn aficionados is not at all likely to fly in the face of reality as starkly as you suggest. Not to mention all the attendant press reviews and forum chatter from the horse's mouth on forums such as The Wam (now defunct). Your own listening remarks reek of subjectivity, no?

      The drift of Archimago's findings is that three DUTs (even if they include two same-brand streamers, and are not DACs strictly - as such) can be discriminated meaningfully in blind tests. To the extent that this is true, some people perform better than others against the cost criterion but also against the cumulative external validity I have tried to depict. What's more, a conglomeration of participant factors, identified by Archimago, reasonably suggests that some people are more discerning than others. A second significant outcome of this survey - also requiring further work invested in carefully curated listening surveys - is the potential for discriminating the rational hobbyist from the "audiophool" too ready to part with substantial cash hinged four square on excessive confidence in his own listening ability. The present survey is not perfect, but it's a rare and creditworthy effort from within the hobby to address a couple of the most thorny issues of our day.

      Footnote: It's fortunate for Archimago that there were effects approaching, and in some micro-tests reaching, statistical significance. If the preferences/rankings had been more random, we would have had to focus on the importance of not reading too much into a null result. Specifically - scattergun listener rankings (indeed all unaccounted variance in the actual data presented) would have been attributed properly to the possibility that the additional ADC/DAC "lens" obfuscated audibility - or that the power of suggestion and other types of Expectation Bias produced listener assertions of both no and actual difference between DUTs. That statistical error should not be attached, except by way of speculation, to the assertion that participants were just as likely to prefer a cheap DAC as an expensive one.

    2. "Reek" is a very nasty, rude word, man... I see you refer to audiophilestyle above, the place to be for neurotics who compare the length of their... filters. I assume/claim shamelessly and without false modesty that I have an exceptionally trained brain and an outstanding system of a level that very few people will experience for long calm periods (outside of shows/demos I mean). So, if B sounds better to me (and many high end owners it seems) I assume B is better, period. There is one knowledgeable person on AS whom you could ask about the chips, Miska. Maybe there's a technical explanation but I don't buy the "aggregate experience of Linn staff and Linn aficionados" and that C is better because it's more expensive and newer. I wish Topping aficionados will soon be made happy by a blind test comparing one of their DACs to the Klimax.

    3. The whole point of studies such as this is Objectivity. The whole point of Objectivity is that we stand a fighting chance of agreement.

      People are calling for ABX of the two Linn streamers, and Archimago is saying the "next obvious step" is comparison of the Klimax DSM/2 with Topping D10. imo there is too much emphasis on price - as if our only mission were to expose industry charlatans. It pleases me to read Archimago say, "The hypothesis being that the objective performance is what's important and correlates with "best" sounding, not the "high-end" price or brand name". I am far more interested in establishing common ground.

      btw I don't doubt Miska's technical chops - but HQPlayer was a total waste of money for me and I don't use it. Not so much to do with SQ. Everything to do with ergonomics.

    4. btw I don't think DSM/2 vs Topping is the way to go - on the grounds that DSM/2 is too multi-functional, not just a DAC. We need two or more DACs - where the units germane are functionally just DACs. Of course it makes it interesting if they vary hugely in retail price.

    5. Thank you for showing and analyzing all the data you got! Interesting, but I don't care a single bit about the actual results per se...

      Also thank you for the reminder about the modern compression obsession in released music, and the variations between different formats. Really frustrating. I'm happy that I have all my vinyl and CDs from the 1980s-90s left!

    6. Hey guys, interesting discussions! Despite disagreements, I appreciate that here on the blog over the years, we can be respectful and have good in-depth debates.

      Hi Oui Oui, I don't doubt that you have a great sounding system and that you and your girlfriend preferred Sample B. That's great; looks like you preferred the Linn FPGA processing (whatever they did, maybe their filtering) + the Wolfson DAC they used + their analogue output. However, regardless of the price, the Klimax DSM/2 Katalyst is their higher grade, upgraded "next generation" DAC architecture so presumably their engineers thought this was a qualitatively better product since nobody (I hope) willingly releases newer generations with lower quality and higher prices!

      Also, interestingly even though you disagree with the sound of the Klimax DSM/2 being better, on the whole, the listeners in this survey actually preferred the DSM/2 which again can be seen as confirmatory for Linn enthusiasts that this newer device "sounds better". Plus objectively, we can see that a simple THD+N does suggest that technically the DSM/2 is also the better streamer/DAC.

      Greetings again Iving. Yeah man, personally, I don't like dwelling too much on the $$$ other than over the years reminding audiophiles that money doesn't necessarily buy "better" and there is still the concept of "diminishing returns" on price-performance that certain salesmen (like Herb Reichert in his videos) reject for no good reason. Money might buy amazingly luxurious materials and workmanship, but I think we must dissociate "sound quality" from the luxury physical product. "High-end" in my mind has always been synonymous with "high priced", not necessarily "better sound".

      I see your point about the DSM/2 being more than just a DAC vs. a Topping D10s (or D10 Balanced). linnrd and his friends have a number of Linn streamers and I borrowed a Klimax DS/2 (Exakt) connected to my system currently to listen and test out. An older device than the DSM/2 but still costs a number of thousands MSRP, functioning as just an ethernet DAC streamer. At some point I will probably publish the measurements of these Linn devices in more detail although given that we don't have many chances at blind testing, would love to use more "extreme" devices based on price or audiophile sentiment to demonstrate audibility (or not).

      Just to be clear for those wondering... I have no animosity towards Linn; these Linn streamers just happened to be available to test thanks to linnrd and friends. It's a free world and any company can sell whatever they want. Unless a company is silly and releases worthless products (like MQA! or snake oil), I have no misgivings. :-) Even if I think the price:performance might be excessive, I am glad that the blind listeners were able to show preference towards the newer and objectively higher performing Klimax DSM/2 in this blind test!

    7. Archi, you write: "the listeners in this survey actually preferred the DSM/2". Well, from your data, younger people, people listening through loudspeakers, people listening through the most expensive systems, people reacting most strongly to the differences (for whom "upgrading" -- actually downgrading price-wise -- is essential) DID NOT.

    8. I find it extremely interesting that a subgroup within the study clearly preferred B. Maybe different ways of perceiving music exist and that's why there are endless discussions about audio quality. The second interesting finding is that whatever it is that triggers the like and dislike survived the AD process. This would suggest that magazines should have AMPT like recordings along with their reviews.

    9. Hi Oui Oui. Just to give you more of what I'm seeing...
      - For the "younger" folks <50 years old, n=27, it was very close between Linn DS (1.89) and the DSM/2 (1.93) vs. Apple dongle (2.19).
      - There were 16 people in the >$10k system group and 3 of them thought there was no difference, plus 1 thought not worth upgrading.
      - In those who heard the most difference - let's say everyone who thought the upgrade was "definitely" or "essentially" worth it, there were a total of 21 and it went C (1.86) > B (2.05) > A (2.1). In my graph above, I only showed the 6 that said the difference was worth an "essential" upgrade.

      Sure, we can look at the data and make direct observations. But since the raw data is noisy, IMO, it's more useful to make conclusions based on statistically significant findings.
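
      To make the "average rank" numbers above a bit more concrete, here is a minimal Python sketch of how per-sample mean ranks and a Friedman rank test could be computed for a subgroup. The rankings below are made-up illustration data (not the survey responses), the function names are hypothetical, and the Friedman test is just one reasonable choice for ranked data; it is not necessarily the test used for the results in this post. Rank 1 means "best", so a lower mean rank indicates a more preferred sample.

```python
from math import exp


def mean_ranks(rankings):
    """Average rank per sample; rankings is a list of {label: rank} dicts."""
    labels = sorted(rankings[0])
    n = len(rankings)
    return {lab: sum(r[lab] for r in rankings) / n for lab in labels}


def friedman_test(rankings):
    """Friedman chi-square statistic and p-value for untied ranks of 3 samples."""
    labels = sorted(rankings[0])
    n, k = len(rankings), len(labels)
    if k != 3:
        # The closed-form p-value below holds only for 2 degrees of freedom.
        raise ValueError("this sketch only handles exactly three samples")
    rank_sums = [sum(r[lab] for r in rankings) for lab in labels]
    q = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    # Chi-square survival function with df = 2 is exp(-q / 2).
    p = exp(-q / 2.0)
    return q, p


# Hypothetical subgroup of 6 listeners (illustration only, NOT survey data).
example = (
    [{"A": 3, "B": 2, "C": 1}] * 4
    + [{"A": 1, "B": 2, "C": 3}]
    + [{"A": 2, "B": 1, "C": 3}]
)

if __name__ == "__main__":
    print(mean_ranks(example))
    print(friedman_test(example))
```

      With small subgroups like this the chi-square approximation is rough, which is one more reason to be cautious about reading much into differences of a few hundredths in mean rank.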

    10. Hi Jogi,
      Yes, I believe the AMPT is useful to demonstrate audible differences between DACs. It can show noise differences and resolution issues...

    11. The only sound conclusion is that everyone has to find out for her/himself and their own system. That people with headphones and cheaper systems account for the majority does not make them right, nor does it confirm that C is better, just better suited for that majority. And quite marginally for the majority. So your test's main contribution could be that everyone has to find out for her/himself and their own system with a very open mind regarding prices. Yet, with HQP on a dedicated powerful machine, Holo Red and May (and the rest of the system capable of clean low end and soundstaging), I don't personally subscribe to the view that money is wasted on digital.

    12. Yes I can agree with that. Group listening helps us understand trends and broad preferences. Whether those trends apply to any one person is never guaranteed. We see this broadly across other findings in audio such as the target curve for headphones...

      As you know, I've published on HQP over the years. Yes, it's definitely capable of some excellent results and I have no doubt the HoloAudio DACs can sound great. Purchasing good products with excellent build and workmanship is certainly not wasting money... As with sound quality, each of us will need to have our own definition of value.

  9. We are again in the realm of mountainous striving producing: a mouse. Small effects of questionable significance.

    The place to go after this would be, *what*, if anything, are the 'golden ears' hearing? Measurements to the rescue!

  10. And it should be a caution to purveyors (which Archimago is not!) of the usual 'night and day/veils lifted/music bloomed/even my spouse could' claims that infest audiophilia high and low.
    That's really the assertion these results should be compared to.

    Replies
    1. Yes, agree Steven.

      These results, while showing a difference, even to the point of significance especially with headphone listeners, are reflective of the observational power we can achieve if we collectively recruit the ears & brains of at least tens of audiophiles.

      While among the group there could be some "Golden Ears", it would be a fallacy to assume that there are many. Descriptions of "veils lifted", "night & day" differences, "obvious", "hard to miss" changes would not be phrases I would use to describe the audible effect without specific reference to individuals, other evidence, or with adequate context.

    2. The 'significant' results are not highly powered; and though I know you wouldn't make extravagant claims of audible DAC difference, certainly many others in this hobby (if not many in your subject pool -- I can't say) very commonly do. But assuming the significant results here mean there indeed was an audible difference -- isn't the next step to determine what caused it?

    3. Hi Steven,
      Indeed it would be nice to take the next step and grab a few of the Golden Ears to explore in detail what characteristic(s) specifically they are responding to as sounding "better". It'll be a hard job I think because the margins of audibility are small and the work needs to be done in a lab with controlled conditions. Good luck even getting 100 listeners to participate in a high-powered blind test! (For example, even the blinded listening test at McGill for MQA previously summarized employed 30 individuals.)

      Trained listeners, younger folks, etc. could help but then if we did just that, it could be hard to generalize the listening test to the audiophile hobbyists at whom these "high-end" products are targeted. I would not be surprised if the results here are all we see for the next number of years when it comes to blind listening; there's just not much motivation out there to explore such things even if companies make all kinds of claims of audibility with $$$$ gear!

      I doubt that many in the hobby will use this test to make extraordinary claims. The folks who make extravagant claims are usually Industry salesmen like the magazine reviewers or "infomercial" guys on YouTube to push products. If they tried and actually referenced these results back here, they'd be opening up their readers/viewers to this blog and all that this blog stands for which at its heart is an assault on the myriad snake oil, "high-end" products of questionable value, and I believe a desire for companies to keep audiophiles ignorant of objective measurements and skepticism towards the cultishness of some of these companies.

    4. I was thinking more of measurement of the DACs to see if there is a measured difference that could plausibly be an audible one.

  11. Great stuff Arch! Thanks so much for all your effort. My favourite result is the one suggesting that the more expensive systems did not help discern more reliably between the DACs. I've spent plenty on my system, but I care most about how I enjoy music on it; I'm not too fussed about whether I can tell one DAC from another. But I've always known it was Golden Ear B.S. that you "have to have a super resolving system" to hear all the wonders they purport to hear in the tweak stuff - cables, cable lifters, footers, AC cables etc. It's nonsense because the level of differences they claim - obviously better dynamics, more extended bass, everything but the kitchen sink - the type of differences they describe is something more akin to a re-mastering of a track, and the differences would be easily audible on even modest equipment. In fact I like to point out: if you go on Amazon in the audio cables section, you'll find countless reviews touting obvious sonic differences among those with relatively cheap gear. The implication for the Golden Ear is either: 1. You don't REALLY need an ultra resolving system to hear cable differences OR...2. people easily imagine sonic differences. Either one undercuts their claims.

    Replies
    1. Thanks for the note Vaal,
      I was honestly expecting a completely negative result since I saw a number of people posting on forums that they could not hear any difference over the weeks of data collection. The numbers clearly did not look random at all once I dug under the surface and started enumerating what the headphone listeners were saying!

      Yes, the finding that expensive sound systems did not help differentiate the devices (in terms of objective performance or even price) I think is an important one to contradict the claim of certain "high-end" guys who confound ultra-high-prices with high quality sound. It's really just another way to remind audiophiles that there is such a thing as diminishing returns.

      Heck with some components there might even be a Gaussian (bell-shaped) pattern where some of the most expensive products (like maybe exotic tube gear?) might not be rated "best" sounding when put to a blind audition compared to something priced more reasonably.

      As for tweak stuff like the cables, lifters, footers, etc... There never has been a good blind test demonstrating benefits for those things in general. Alas, I believe much of the "received wisdom" among audiophiles has clearly been built on exaggerations, misperceptions, and sometimes just plain lies.

  12. If you repeat or do another round of this test and are committed to a ranked preference analysis rather than (say) sorting, you might consider having duplicates as a control. The stats are not strong, so you want some QC check as validation.
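
    One possible way to use duplicates as a control, sketched in Python under stated assumptions: suppose a hypothetical future test has four samples where "A" and "D" are secretly the same device. A listener ranking on what they actually hear should tend to place the two duplicates next to each other, while for purely random rankings of four items the duplicates land on adjacent ranks exactly half the time. The labels, the data, and the function names are all invented for illustration; this is only one interpretation of the suggestion above.

```python
from math import comb


def duplicates_adjacent(ranking, dup=("A", "D")):
    """True if the two duplicate samples received neighbouring ranks."""
    return abs(ranking[dup[0]] - ranking[dup[1]]) == 1


def binom_upper_tail(k, n, p=0.5):
    """Exact one-sided binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical responses from 8 listeners (illustration only, NOT survey data).
consistent = {"A": 1, "B": 3, "C": 4, "D": 2}    # duplicates ranked 1 and 2
inconsistent = {"A": 1, "B": 2, "C": 3, "D": 4}  # duplicates ranked 1 and 4
responses = [consistent] * 7 + [inconsistent]

hits = sum(duplicates_adjacent(r) for r in responses)
# Under random ranking of 4 samples, P(duplicates adjacent) = 2 * 3! / 4! = 0.5
p_value = binom_upper_tail(hits, len(responses), p=0.5)

if __name__ == "__main__":
    print(hits, len(responses), p_value)
```

    A listener-level version of the same check could also be used as a filter, for example analysing only respondents who kept the duplicates adjacent, at the cost of a smaller sample.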
