Saturday 25 May 2019

BLIND TEST Results Part 4: "Do digital audio players sound different playing 16/44.1 music?" - Subjective Comments. Final thoughts on blind testing and critiques.


As we close off discussions and posts around the Internet Blind Test of devices playing 16/44.1 music, I want to publish some of the subjective comments from respondents who undertook this test... Impressions in the respondents' own words about the test when they submitted their results to me.

Remember that these are subjective. Human perceptions, especially when differences are at the margins of what we can hear, are of course tough to describe. And when we compound that with the limited utility of words for describing ephemeral experiences (even with attempts years back at codifying the terminology), it's no surprise that meaning can often only be conveyed as impressions. It's great to see the respondents trying their best and, in many instances, I certainly appreciate the impressive use of language to express the experiences. Let's have some fun with these!

I think the best way to do this is to present the comments organized from those who thought the difference was "huge" or "big", down to those who ultimately thought there was no audible difference. Remember that while objectively we can say something about which device measured better and I'm happy to say "well done" to those who appear to have "golden ears", there is ultimately no absolute "right" or "wrong" to the preferences we make. I'll focus on the longer comments, especially ones that described the perception of specific devices, and add some of my thoughts where applicable.

I. Those who thought the difference was "huge" or "big":

"If I was to listen to these devices on their own, without the ability to make blind comparisons, I’d be happy to own D or C. I’d pick up the problems with A and B, even in isolation.
General comments: The language in my subjective descriptions below must be taken as relative, not absolute! However, I think your recordings or methodology may have issues. Comparison with Original 16/44 sources: I sourced original CD copies of these tracks, but I did not listen to them until after evaluating your 24/96 tracks. I turned R128 loudness compensation on (after evaluating your tracks) because the originals were between 3-5dB louder. I heard specific improvements in the original Handel and Maxi Priest tracks versus your recordings: 
Handel: better stereo spread, much better individual voice definition, more solid continuo. 
Wild World: better mono/center focus on vocals (lead and backing), bass, drums. The other two tracks just seemed clearer overall. I didn’t scrutinize them very hard. Since the only thing common to the 16 recordings is your ADC, this would indicate that the crosstalk and noise in the ADC, and perhaps frequency imbalance between channels, hurt the recordings. Anyways, I’ll leave this as an open issue until you release the actual tracks you used. It’s possible the ones I sourced are different masterings. 
Subjective Differences:  
Salvant: There is a very low level single squeak at 0:47 just before a note is struck on the piano. This is audible in D only. There is what appears to be noise from the seat of a chair around 0:20-0:30 that is clearly audible on C/D, but only barely on A/B. Starting at 0:50, the singing of “le mal de vivre” is muted on all but D. This line is repeated starting at 0:57, where it shows clipping distortion on A/B but not on C/D. A seems quieter and less weighty, perhaps with a response dip in the low mids. Vocal sibilants are very sloppy, turning "s" into “shhh” sounds, very prominent at 1:38-. B is the noisiest, same sibilance problem, and with distorted vocal transients too. The sibilance on C is very clean but is somewhat separated from the lower vocal harmonics. I'm pretty sensitive to this common phenomenon. D has the longest reverb tail in the first phrase of the track, seeming like one extra reverberant bounce is audible versus the other devices. C is very clean, but D is more realistic and gets my vote for best. During the lead-in, D seems to have the highest noise floor, but this is not apparent elsewhere on the track. For better or worse, my initial ranking, derived from this track, didn't change as I listened to the other tracks which spotlighted other differences. 
Handel: D has the best focus on the voices and violins, and the strongest continuo. C is close, A/B have very hashy highs in the voices. (Note that the original 16/44 has much better delineation of individual voices and groups in the choir. To be confirmed...)
Wild World: After the intro (after 0:26): C has best, solid bass line, most depth to the vocal, but it is somewhat hashy or grainy. D has the best vocal, with good depth but excellent clarity. A’s lead and backup vocals buzz! 
Satriani: No difference in the main components of the music: the kick drum and the lead guitar. The clapping on the backbeat, once the guitar comes in, sounds like clapping on B/C/D, and sounds like noise on A. The crowd chanting is clean on C/D, not so on A/B. 
Thanks!" 
Thank you! That's an amazing response with very detailed notes for folks to check out and listen to. Regarding the ADC, I actually would not blame it for any coloration heard, given that in my testing the RME is superior in transparency to the other ADCs I have used (Creative, Tascam, Focusrite). The "coloration" question will require more work showing comparisons which I don't have time for this week. However, with the aid of the basic Dynamic Range Meter, I can show that there is no channel imbalance issue with the RME ADC and the settings I used, analyzing the "Handel Messiah" track for example (the most dynamic and IMO complex of the pieces, preferred by the musician respondents):

Original 16/44.1 CD music:

Device C (Oppo UDP-205):

Device A (ASRock Motherboard):
Notice that the difference between peak and average amplitude for the original CD rip and for the highly accurate recording of the Oppo UDP-205 shows an exact left-right match! This is a nice demonstration of just how accurate the Oppo is as a DAC and of the precision of the RME ADC. In comparison, the "least accurate" device in this blind test, the ASRock, was not able to maintain the same amplitude relationship on playback: there is a 0.12-0.18dB channel imbalance, with the right channel slightly louder than the left. Even though peak and average levels match exactly between the original and the Oppo playback, remember that the 24/96 recording is obviously not going to be "bitperfect" since it will capture slight differences such as the result of digital filtering. Some variation will therefore be found in the DR value ("crest factor") between devices; specifically, the Oppo's use of minimum phase filtering will have an effect here.

What this means is that if you as a listener/tester hear a significant channel imbalance, it is a result of either the playback device (like the ASRock motherboard) or an imbalance in the playback system you're listening with, not a fault of the RME ADC.
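
If you'd like to double-check the left/right balance of the downloadable recordings yourself, here is a minimal Python sketch of that kind of analysis. It assumes the numpy and soundfile packages and a placeholder filename, and reports a simple peak-to-RMS "crest factor" rather than the official DR Meter algorithm:

# Simple per-channel level check: peak, RMS and peak-to-RMS "crest factor",
# plus the left/right RMS imbalance. Not the TT DR Meter algorithm, just a
# quick sanity check. The filename below is a placeholder.
import numpy as np
import soundfile as sf

def channel_stats(path):
    data, rate = sf.read(path, always_2d=True)   # float samples, shape (frames, channels)
    for ch, name in zip(data.T, ("Left", "Right")):
        peak_db = 20 * np.log10(np.max(np.abs(ch)))
        rms_db = 20 * np.log10(np.sqrt(np.mean(ch ** 2)))
        print(f"{name}: peak {peak_db:6.2f} dBFS, RMS {rms_db:6.2f} dBFS, crest {peak_db - rms_db:5.2f} dB")
    rms = np.sqrt(np.mean(data ** 2, axis=0))
    print(f"L/R RMS imbalance: {20 * np.log10(rms[0] / rms[1]):+.2f} dB")

channel_stats("A - For Unto Us A Child Is Born.flac")   # placeholder filename

Run against the Device A capture, the imbalance line should land somewhere around the 0.12-0.18dB right-channel bias described above; the Oppo capture should show essentially none.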

Another thing to keep in mind is that playback at 24/96 of the ADC recordings might sound different from a straight 16/44 playback on one's DAC due to your playback machine's characteristics. The 24/96 recordings will have captured much of the digital filter characteristics of the test devices (up to 48kHz or so) whereas a 16/44 playback of the source would be using your DAC's digital filter (or even no digital filtering if you're using a NOS DAC).

----------
"D - sharp, detailed. I liked it more, but I put it below."
----------
"Device A sounds plastic and not clear. Devices B and D give a feeling of constant tinnitus in the background. In addition D sounds rough. Device C is cleanest and doesn't put immediate annoyances to front." 
Interesting descriptions.  Well done with selecting Oppo > iPhone > Sony SACD > ASRock motherboard in your response.
----------
"Device A (#4)
Prominent treble and bass (i.e., V-shaped)
Strikes me as very “bright”
Would be uncomfortable to listen to over time

Device B (#3)
More laid back (i.e., less treble) than A
“Sweet” / warm sound
Better balance overall than A
Loud parts got a bit splatty, though

Device C (#2)
Less sweet than B (i.e., more neutral) - balance between treble / mid / bass about the same as Device B
Less “splatty” than Device B on loud parts
B and C very similar otherwise

Device D (#1)
Immediately noticed more “spacious” sounding than any of the others
Very precise positioning of various voices / instruments
Neutral balance
Loud parts well controlled
Overall the most detailed and pleasant to listen to"
Nice descriptions... Great to have these impression notes for others to see if they can corroborate!
----------
"First of all, I believe there is a problem with Device B - something is wrong with imaging - is the only track where music comes far outside left/right from the speakers, + too much left image.   Phase error?
Device A has difficulties with placing of the instruments, most in the center and difficult to place the voices - mostly noticeable on 'For unto us a child is born'
Device C is very controlled but sounds the most digital (sometimes slightly harsh). Also limited in soundstage depth.
Device D fullest sound, nice depth, especially more palpable voice from Cécile McLorin Salvant.

The difference between (Device) D & C was bigger with source 2 (Roon without filter, i.e. DAC filter is active) , than with Source 1 (DSDIn HQplayer, 256 HQplayer poly-sinc-xtr-lp).
 
For me the difference between the filters is far more noticeable than the upsampling rate, ...
As I am not capable of hearing anymore >10kHz , there cannot be a direct link what a filter is doing in the frequency area of 20kHz and up (maybe an indirect impact)
So I still do not quite understand how filters really work (keeping in mind that I normally use the same dithering as well)
So, would be interesting to see a test/blog on this subject.
(with HQplayer, I am convinced that with music I know well, I can pass a blind test between poly-sinc-xtr, poly-sinc-ext2 and ClosedForm filters)"
Interesting comment... Not sure if I hear that much difference with Device B. Remember that the iPhone 6, like all Apple products I have tested recently, uses relatively steep minimum phase filters. Not sure if that has anything to do with it though. (Remember that back in 2015, we tested minimum vs. linear phase filters here with the same steepness and I could not find a significant preference... Might be a highly individual thing.)

----------
"Between the best and worst device I could hear a difference in a variety of things such as overall sound, detail of the voice and background, timbre of the singer, the dynamic range- it felt more powerful in the best device, more spacious but also more realistic- felt as if the singer was in front of me. 
For the best and 2nd best there were small but obvious differences: the best device felt more 'real', as if I was hearing it live - all the detail of the recording was faithfully reproduced but felt less so in the 2nd best device. Another major difference was the sound felt far more powerful in the best device whereas the 2nd best had slightly less power to it."

Thanks man. Looks like you selected the Sony SACD player as your favourite.

----------
"Best was full bodied last place was not liked at all. 2nd place had crisp sound but not as full as the best. This is my partners system."
Thank you! You were the only woman in the blind test and you liked the iPhone > Oppo > Sony > motherboard.

----------
"I came to this test with the strong bias that I probably would not be able to differentiate and yet... C was consistently the one I preferred. 
The difference is not dramatic, but everything sounded cleaner/more defined on C. I would definitely pick C over A and D at all times.
Personal experience includes 9 pairs of speakers in the house, 4-5 amplification combos and probably a dozen of sources (from Pi + Khadas Tone Board to Linn Akurate DSM). Not a high-end fetishist, some combos definitely sound better than others when mixing and matching, but price doesn't always win by far. If I have that much gear it is because I am a compulsive buyer, not a pilgrim in search of audio nirvana. And not a golden ear either, can't reliably (if at all) detect hi-res from cd quality. 
For all I know, C could be cleaned up/processed and less "authentic", but that is the one I prefer ;) 
thx for the blog!"
Well, surprise surprise! You, sir, obviously picked the Oppo and I suspect you're more of a "golden ear" than you thought :-). Be careful with compulsive buying though!

----------
"Device A sounded weird, for lack of a better word, on my mostly Martin Logan system. Like there was a halo around the performers. Not as obvious on the Squeezebox monitor system, but still there. I thought Device D was better, but it was kind of "soft" sounding to me, for lack of a better word. Not quite as detailed as B or C. I thought B&C were pretty close to each other. Definitely a step above A and D. If I had to pick one of those two it would be C. It just seemed to have the detail of B, but without a very slight brightness that seemed to show up sometimes in B."
You clearly did well. I see you selected Oppo > iPhone > Sony > ASRock motherboard. Nicely done.

----------
"I have compared to the original 16/44 tracks that you used for the comparison and I must say that all four sound better than your converted samples! Therefore it was not as easy as with a good source to evaluate a ranking."
Right, remember we have to be careful about comparing with the original source recordings as significant level changes have been made. The comments above to the first respondent apply here. Plus of course I'm recording off a motherboard in one of them :-).

Just to give you an idea of how much change: to equalize volume to the -18LUFS target using EBU R128, the original CD rip of "Handel Messiah" would have needed a -0.92dB reduction, while the RME ADC recorded version of the Oppo needed +2.3dB of gain. That's a pretty significant volume difference of >3dB which, if not properly compensated, will typically bias listeners against the ADC recorded 24/96 version since it is of lower amplitude.
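
For those curious, here is roughly what that loudness-matching step looks like in code - just a sketch using the pyloudnorm and soundfile Python packages with placeholder filenames (the actual adjustments for this test were done with Adobe Audition, not this script):

# Measure integrated loudness (EBU R128 / ITU-R BS.1770) and report the gain
# needed to hit a -18 LUFS target. Filenames are placeholders.
import soundfile as sf
import pyloudnorm as pyln

TARGET_LUFS = -18.0

def gain_to_target(path):
    data, rate = sf.read(path)
    meter = pyln.Meter(rate)                    # BS.1770 loudness meter
    loudness = meter.integrated_loudness(data)  # LUFS
    gain_db = TARGET_LUFS - loudness
    print(f"{path}: {loudness:.2f} LUFS, apply {gain_db:+.2f} dB to reach {TARGET_LUFS} LUFS")
    return gain_db

gain_to_target("Handel Messiah (original CD rip).flac")    # roughly -0.9 dB per the numbers above
gain_to_target("C - For Unto Us A Child Is Born.flac")     # roughly +2.3 dB per the numbers above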

----------
"Wild World - A sounded like was playing in a cave, very small sound stage; D had some weird sibilance, noticable with the bell; B/C sounded similar, good separation with C being the best. Le Mal de Vivre - A sounded a bit nasal; D again had some weird sibilance; B/C sounded very close, with C sounding "more present / life like". For Unto Us A Child Is Born - track content was a too busy to distinguish between A-D.Crowd Chant - I did not like this track so did not use it."
Great job with the listening! Oppo > iPhone > Sony > ASRock motherboard.

II. Those who thought the difference was SMALL, but worth spending some money to upgrade the sound.

"I noticed very SLIGHT differences in tone and dynamics. Overall the differences were extremely small and if one of the players cost say $100 at the low end and a few thousand at the high end the tiny performance jump wouldn't be worth that sort of a price increase now if the "cheap" one is $100 and the "expensive" one is say $250 or so then it may be worth it. But honestly it wouldn't surprise me if my "favorite" pick was the least expensive option. Anyway great poll and test had a lot of fun doing it."
Not bad. You picked the iPhone and Oppo as #1 and #2 and were able to list the motherboard as "worst". I like that you've incorporated the very important concept of dollar value into the listening impressions.

----------
"From these tester songs and past experience, I think a higher sample rate mainly benefits highs, but the clarity overall, to me, seems noticeable. Also thank you for introducing me to La Mal de Vivre, beautiful. Please validate my purchases."
Hmmm, not sure if the results validated your purchases!? But I see you did pick the Oppo as best and the motherboard in 3rd place :-).

----------
"The bottom end on Maxi Priest was the best differentiator, followed by the sibilance on Cecile."
I see you preferred the Sony CD player > iPhone > Oppo > Motherboard based on listening to the bass and sibilance impressions.
----------
"I started with listening for less than 5 seconds to the first track of each device. That first impression told me I could not hear any differences and it would be very hard to hear any differences. All 4 devices sounded good. 
After listening to all songs on all devices, I noticed that with device B I was sometimes distracted and least involved with the music.
With devices A and D I listened to music. The difference between A and D is zero and I had to choose which one came first.
Device C sounded the best. But the differences are very subtle. Small details were easier to hear and to follow. Also details were more alive and had more speed/presence. Device C got me the most involved with the music.

It is always a pleasure to read blogs and I learned a lot, Thank you,
Arjan V******g The Netherlands"

Great work from The Netherlands, Arjan! Clearly you liked C (Oppo) best.

----------
"Soundstage varied slightly, instruments seems placed slightly varied positions,  some voices and instruments weren't well separated in D, very similar in ABC."
----------
"Strongly disliked A. Sounded very shouty and thick. Loved B, which was liquid and deep. C and D were closer. C seemed less resolving than B, and at first I was sure D was better, more lifelike. Then on another listen C didn't seem that deficient. That's when I quit as trying to rank the two of them wasn't much fun."
Yeah, like I said... This ain't easy :-). The "strong dislike" of A in this comment and others is what ultimately made the results significant in ranking the ASRock motherboard as the "worst" sounding.

----------
"C seemed to me to be the most natural and well articulated, especially on vocals (solo and choral).  It seemed to define the sound stage a bit more fully,  e.g. the tenor & bass (male) voices in Child seemed more clearly behind the sopranos (females) and the oboe line was both more distinctly separate from the voices and perhaps a hair more realistically "reedy" without being hash. 
I consistently found C to be more realistic and rich than A (and, to a slightly lesser extent, B). The lower fundamentals from the piano in Le Mal de Vivre seemed clearer, a bit more distinct, and less "bundled" within the harmonic structure, which was also the case for D vs A.  My proven impression on initial listening was that the highs were less distinct and maybe even less extended on A than the others, with B also below C and D in this regard."
I think ESS Technology and Oppo would very much be pleased by this listener response :-). I see you selected Oppo > Sony > iPhone > motherboard. Good job.

----------
"I want to note that while I hear (or think I hear) small differences between the samples, I cannot point which one is the "correct" sound. Ask you raised the question, I can order them in subjective preference. However, if I had a reference (e.g. original file) I would prefer the sound that's closest to it - no need to sugar-coat audio. We're chasing "high fidelity" after all, right? 
I also want to add that I did not attempt to ABX the samples to confirm that I am hearing the difference, so take this with a grain of delicious, subjective salt. 
Impressions (I'm overstating the differences for the sake of comparison. I found the overall differences fairly small):
A - I consistently found this to sound the "harshest" and to lose clarity in the dynamic parts, compared to the other samples. It's my least preferred sound of the four.
B - I found this to sound very close to D. Compared to A and C I found it more "spacious" and "reverby" - this could mean more of the ambience in the recordings came through, or that the transients were not as clean - not sure. I found the imaging and dynamics better than A.
C - I found this to sound the "cleanest", maybe what one could call "dry". Bass seemed the tightest, transients the clearest of the bunch. My gut feeling is this file added the least amount of coloration to the original sound - although I can't be sure of it without a reference.
D - Again, sounds very close to B, maybe having the tiniest bit more definition.
Additional comments:
- I heard the least amount of difference on "Le Mal de Vivre".
- I had the chance to listen to Joe Satriani live last year. Definitely prefer the sound quality of the recordings compared to the live sound.
- I was only aware of Mr Big's version of "Wild World". This one has a great sound as well!"
Another wonderful comment and excellent observations! Yes, it's hard to know what is "correct", isn't it? Yet in reality we make adjudications all the time despite none of us really knowing what the "actual" sound was like in the studio or live venue (unless we attended the concert, but even then, it all depends on where the mics were placed). We "guess" at what is "correct" every time we go into a dealership and listen to a device we might be interested in. Since most audiophiles do not run objective tests, they would not have the benefit of instrumentation to confirm whether their devices actually perform with low noise and low distortion. For this test, Device C (Oppo) is objectively the "most correct" based on measured fidelity. If you pull the 24/96 recording up in an audio editor, you'll be able to confirm that it has the cleanest ultrasonic profile on account of the high quality digital filter. And if you compare the FFT with Device A (ASRock motherboard), you'll also see that in quiet passages Device A has the 60Hz hum seeping through as well as elevated low-frequency noise.

Despite the uncertainty, it looks like you picked the Oppo still as the "best". Oppo > Sony > iPhone > Motherboard. Well done.
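
If you want to verify those noise observations for yourself, a simple spectrum overlay will do it. Here's a small sketch assuming numpy, scipy, matplotlib and soundfile, with placeholder filenames and an arbitrary quiet passage; look for the 60Hz hum and low-frequency noise with Device A, and the different ultrasonic behaviour of the devices' digital filters in the 24/96 captures:

# Overlay Welch power spectra of the same passage from two captures. Look for
# 60Hz hum (and harmonics) plus elevated low-frequency noise in Device A, and
# for differences above ~22kHz from the devices' digital filters.
import numpy as np
import soundfile as sf
from scipy.signal import welch
import matplotlib.pyplot as plt

def plot_spectrum(path, label, start_s=0.0, dur_s=5.0):
    data, rate = sf.read(path, always_2d=True)
    seg = data[int(start_s * rate):int((start_s + dur_s) * rate), 0]  # left channel
    freqs, pxx = welch(seg, fs=rate, nperseg=32768)
    plt.semilogx(freqs, 10 * np.log10(pxx + 1e-20), label=label)

plot_spectrum("A - Le Mal de Vivre.flac", "Device A (ASRock motherboard)")   # placeholder filenames
plot_spectrum("C - Le Mal de Vivre.flac", "Device C (Oppo UDP-205)")
plt.xlabel("Frequency (Hz)")
plt.ylabel("Power (dB)")
plt.legend()
plt.show()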

----------
"A - had a small treble spike or something that made it sound exciting, but tiring.  Great imaging on "Wild World" though B - The vocals seemed a little thin C - Best vocals, did not seem fatiguing D - Seemed too smooth sometimes, very mellow and neutral
All in all, the differences were very small, and perhaps imaginary in some cases. Device C would probably be my favorite, though I would avoid that Crowd Chant thing. ;)"
Another vote for ESS Tech and Oppo. I swear I'm not picking out the Oppo comments! These are just the folks leaving their detailed impressions...

----------
"The only noticeable difference I hear, is in the high frequency details of the / 'S'-s, 'SH'-s /, otherwise not much of the difference between the 4 devices."
 Oppo > Sony > Motherboard > iPhone.

----------
"In 'For Unto Us A Child Is Born' there seems to be a better location and separation of the different voices and words are more clearly pronounced with device B than with the others. The differences between C and D are very small, but the bass seems more precise on C than on D. Device A seems a little flat or unengaging? 'Le Mal de Vivre' is a beautiful recording and all 4 devices sound good when you enjoy the music. Only the piano and 'space' might be better with device B. There are small differences regarding the voice character, but hard to say what is most correct. 
In general differences between devices seem small. If the test should be an even better help to choose between devices I would prefer more examples of well recorded classical music and acoustic jazz. 
All together this test is very interesting and I'm looking forward to see how much I've been fooled. Thank you for this initiative and your other great work! 
Best regards Peter, Copenhagen Area, Denmark"
Thanks for the comments and feedback, Peter of Denmark! You selected iPhone > Oppo > Sony > Motherboard.

----------
"'A' seemed more clearly differentiated from the rest than they were from each other.  "A" was a bit veiled overall, with less clear separation among instruments & voices. "C" may have been a hair more rich and lifelike than "B" or "D", but "A" was more different from the rest than any of these distinctions."
Nicely done. Another example of why A (motherboard) scored poorly overall. Oppo > iPhone > Sony > motherboard.


III. Those who thought the difference was very small and not worth spending money for an upgrade...

"I can hear very very faint hints of differences but I can not decide which is better as I have not heard the music live...
"A" feels everything is on the same level...
"B" is similar but has slight details but lower authority...
"C" has everything on a different level...
"D" less fatiguing for me...
For me D ~= A > B ~= C"
An interesting example where the listener put the Sony and Oppo on different ends of the spectrum. What's notable here is that B and C (iPhone and Oppo) are minimum phase filter devices (the Oppo defaults to minimum phase but can be easily switched to linear phase), and it seems this listener has a preference for the two linear phase devices.

Respondent, you might want to check out the Minimum Phase vs. Linear Phase blind test from 2015 and confirm if you might be sensitive to the digital filter phase setting.

----------
"B Device is faster, better timing, better pace, base tighter, more air C Device nearly same pace as B but less air A Device is same as B but slower D Device is less musical of them but pace is same as B Device

Keep up the good work, Archimago. The industry needs more people like you to debunk the myths. Wish you good health."
Thanks man! Appreciate the feedback and perceptions. I definitely think the press needs more writers who can be critical and debunk much of the silliness out there. Otherwise it's really up to us audiophiles, the "grassroots" participants in the hobby, to find truth and balance for ourselves. Unless things change (I hope they do!), I think this is the likely default trajectory for the foreseeable future.

----------
"I set up a play list with all 16 tracks so I could easily skip around via Roon remote.  I listened on 3-4 occasions also employing my 12yr old daughter and wife.  At times I felt I could hear a subtle difference yet it was not consistent and not something I could articulate.  My daughter and wife thought they all sounded the same (but I can't say they were totally invested in the experiment).  For all intents and purposes they were identical.  If my life depended on it I would say device B seemed to be the most different but I have about 5% confidence in that.   Certainly my equipment is not fancy but I think it gives pretty good bang for the buck and almost certainly outperforms any mass market consumer brand systems."
Thanks man... Based on writers in magazine articles, I always thought that the wives had golden ears and could casually just hear cable changes from the kitchen! And the kids probably can tell hi-res from CD quality upstairs in their bedroom playing Minecraft!

You've confirmed for me that wives and kids really are not much better than the "man of the house" :-).

----------
"Here are a couple of ABX tests along with some of my comments. Bottom line while I could tell some differences between the devices, but without concentration and over speakers, it is unlikely I could tell any difference with casual listening over speakers and probably headphones without ABX'ing the tracks. My sorted preference above is just a guess :-)  While I like all of the tracks, I went with the Satarini track as it is the music I am most familiar with, has high frequency transients over a continuous repetitive sound, which is a good cue when switching ABX tracks. I wish I had more time as I would have done all the ABX permutations and then identified the devices for which ones were a bit brighter or the stereo image was a bit different.  In the end though, they are so close in sound that all 4 devices sound virtually identical to my ears, especially if I took the ABX tests out of the loop, it is unlikely that I could tell the difference between any of them. 
foobar2000 v1.4.2
2019-02-02 22:25:47
File A: A - Crowd Chant.flac
SHA1: f885cd36956113871f7deda8c237d7181e7e6a3c
File B: B - Crowd Chant.flac
SHA1: 244c56e96d415ea2c6dc16e276df27dc8ec80429
Output:
ASIO : ASIO Lynx Hilo USB
Crossfading: NO
22:25:47 : Test started.
22:28:16 : 01/01
22:28:58 : 02/02
22:30:01 : 03/03
22:30:15 : 04/04
22:30:32 : 04/05
22:31:41 : 04/06
22:32:05 : 04/07
22:32:45 : 05/08
22:33:03 : 06/09
22:33:43 : 07/10
22:34:06 : 08/11
22:34:36 : 08/12
22:35:13 : 09/13
22:35:31 : 09/14
22:35:55 : 09/15
22:36:29 : 10/16
22:36:29 : Test finished.
 ----------
Total: 10/16
Probability that you were guessing: 22.7%
 -- signature --
43d9aebf5c55fd2591e173d578e09e4d46d34491
 
My notes: I could not tell you which one, but the hi-hat seems louder or a slightly different tone than on the other. Also, one had a better phantom center and the other had a wider presentation or a different phantom center. Filtering differences I presume. It is the slightest difference too. I listened at my regular volume, which is not that loud, and it took me over 10 minutes. On some it took a lot of switching back and forth. On others, sometimes I caught it on the very first switch.

foobar2000 v1.4.2
2019-02-02 22:40:33
File A: C - Crowd Chant.flac
SHA1: 4857ee039a99868dc33a1132cc9129dcde52c264
File B: D - Crowd Chant.flac
SHA1: 043bbceefd3e583b6b4067b3dac9b887a575e403
Output:
ASIO : ASIO Lynx Hilo USB
Crossfading: NO
22:40:33 : Test started.
22:41:11 : 01/01
22:41:50 : 01/02
22:42:25 : 01/03
22:42:49 : 02/04
22:43:07 : 03/05
22:43:36 : 04/06
22:44:04 : 05/07
22:44:24 : 06/08
22:44:51 : 07/09
22:45:02 : 07/10
22:45:22 : 08/11
22:45:42 : 09/12
22:46:06 : 10/13
22:46:19 : 10/14
22:46:49 : 11/15
22:47:03 : 12/16
22:47:03 : Test finished.
 ----------
Total: 12/16
Probability that you were guessing: 3.8%
 -- signature --
093e324f9fc5901884fbdeecb9eada9f7f675d7d
 
My notes: Again, one sounded a bit brighter than the other, or had a wider sound stage, or both. Could not tell you which one, but there is an audible difference to my ears and this took 7 minutes compared to the other one. My ears got tired towards the end as it was after the A versus B test. I suspect that any differences are due to the different filters' frequency and phase response. Ever so slight, and you need to know what to listen for to hear a just noticeable difference. Like I say, take away my ABX testing and I would not be able to tell the difference.
Nicely done!
Cheers, Mitch"
Thanks Mitch, now that's dedication; I like how you roll. ABX testing to statistically show that there was only a low probability of "guessing" - now that's serious :-). Like you noted, this is with a good amount of concentration on the sound (rather than just enjoying the music). In "normal" listening, I agree that the sonic difference is far from a "slam dunk" in audibility.
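
By the way, for those wondering where foobar2000's "probability that you were guessing" figure comes from: it is the one-sided binomial tail, i.e. the chance of getting at least that many trials correct by flipping a coin. A couple of lines of Python (scipy used here purely as an illustration) reproduce Mitch's numbers:

# P(X >= correct) for n coin-flip trials; matches the 22.7% and 3.8%
# figures in the ABX logs above.
from scipy.stats import binom

def p_guessing(correct, trials):
    return binom.sf(correct - 1, trials, 0.5)   # one-sided binomial tail

print(f"10/16: {p_guessing(10, 16):.1%}")   # ~22.7%
print(f"12/16: {p_guessing(12, 16):.1%}")   # ~3.8%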

----------
"D - sounded most detailed and open A -  produced the most "warm" sound - great midrange B - was as warm sounding as A but otherwise highs and lows were boring C - it was the most boring and a bit unfocused but as I stated earlier the differences were minor" 
----------
"I recognise that at my age I should expect a deterioration in my hearing. However after a hearing test 6 months ago I was shown that my hearing range was far higher for my age - which was very pleasing. Last year I introduced the high resolution Benchmark Line Amplifier into my system and this made a big difference in subjectively hearing a difference between the USB and SPDIF feeds. Maybe it's a question of timing versus noise anyway the Auralic Aries (WiFi input) sounds better (more drive, pace, finer detail) than the Linn (LAN input). 
In your blind equipment test I really struggled to discern any real difference after some considerable time of evaluating."
Congrats on the good results of the hearing test 6 months ago! Thanks for taking the time and giving this a spin :-).


----------
"I only really used 2 tracks, Wild World primarily and a cross check with Le Mal de Vivre. None were really my taste. Would have loved something electronic! 
Anyway I was torn between saying all the same or subtle, but I did feel there were subtleties but again the tracks were unfamiliar and repeating - playing in different orders (abcd, dcba and random) I couldn't really be sure. Ultimately I needed longer than the hour I had, but certainly I was trying to find something rather than it being obvious. 
My main notes jotted down were: b and c very similar but both better than a which seemed slightly 'flatter'; d seemed similar to b and c but with more bass detail. 
Again, that was on one round, in later ordering I was less sure other than a sounding slightly worse. (Less defined voices)
great test - it’s a lot of effort for you to setup so thanks - and fiddly to listen to with a 2 1/2 year old, but looking forward to results."
Nice work, you were able to consistently pick out "A" (motherboard) being worse... Have fun with the 2.5 year old :-). Kids grow up fast! Make sure to take lots of photos and videos.

----------
"Overall extremely similar sounding. Device C seemed to have less resolution but I kept wondering whether it was just a touch softer. In some cases I thought I heard more detail from 1-2 devices but it sounded a little harsh, so depending on the music I preferred one device over another. I mostly listened in blind "shootout" mode to keep me honest and found the test very difficult."
Yeah, not easy. Here's a comment that ESS Tech and Oppo should not use as an endorsement. :-)

----------
"Device A definitely sounded slightly flatter, less involving than the others. Better separation of instruments, vocals on B, C and D - but very little difference between those three. Definite feeling of more 'air' around voice in 'Le Mal de Vivre' in these three devices as well, possibly device D being the best in this respect. Soundstaging very similar across all devices. Have not had the chance to listen on my main system yet - Roon/Mac Mini via Schiit Eitr into Naim Supernait driving Leema Xone speakers."

Sony > iPhone > Oppo > Motherboard.
----------
"'B' was the only one that I think I could hear a difference. A, C and D very similar. Compared to the others, I found B had a more open, airy sound with deeper bass. At times A, C and D all sounded (very slightly) compressed, constrained and distorted."
Interesting vote for the iPhone 6.

----------
"I have listened to all devices/tracks through speakers and headphones as well. I could not hear significant differences between the devices by my 60 years old ears. I can not say definitely, one of four devices sounds better/worse than the others.  :-(                                                     
Tip for testing (for you) - there are specialized  racks (some of them pretty expensive) for hifi equipment. Do you think, they may have any serious influence on the sound quality, specially from digital sources?"
Thanks for the input. I have not read any good articles to suggest that the expensive, special hi-fi racks will do anything for digital equipment.

We obviously want good looking stands and sturdy racks, but not necessarily because this would affect the electrical properties of things like DACs... Unless of course one has vacuum tubes in the digital device which could be microphonic and "vibration control" could have an effect. In this regard, I can also appreciate the importance of vibration isolation for things like turntables (as you can see, I put mine on the floor!).

----------
"Device B sounded musically more realistic and alive, but the difference was just barely noticeable.  My wife was of the same opinion.  I am not sure the distinction would survive a double blind test, but we did not try that."
More iPhone 6 lovers :-). Again, looks like the wife's ears also did not show any magical abilities unlike reports elsewhere!


IV. Those who thought there was no audible difference between device recordings.

"Listened to all tracks with my wife, and we could not hear any audible difference."
Yet again! The wife has let the audiophile down. :-(

----------
"More umff in C but its probably wrong."
Well well well... Looks like you were perhaps "right" in picking out the Oppo! :-)

----------
"My opinion....mmmm ...Didn't attempt to listen to my main system with SoundLab speakers as I no longer believe that difference will be audible once you match the output. Subjectively, I liked B but only got 8/16 on a quick blindtest. Maybe because it slightly brighter or louder."
The iPhone 6 strikes again!

----------
"I HAVE BEEN INVOLVED IN MANY DAC AND CD COMPARISONS OVER THE YEARS AND IT ALWAYS AMAZES ME HOW SMALL, IF ANY, THE DIFFERENCES ACTUALLY ARE. IT IS CRITICALLY IMPORTANT, OF COURSE, TO MATCH VOLUME LEVELS AND I BELIEVE YOU HAVE DONE AN EXCELLENT JOB IN THIS. I ONLY USED THE MESSIAH AND CECILE FOR EVALUATION AS I HAVE THIS MUSIC IN MY COLLECTION AND KNOW THEM WELL. INCIDENTALLY, THE SOUND QUALITY OF THESE TRACKS PLAYED FROM MY NAS THROUGH JRIVER MC IS FAR SUPERIOR TO THE SOUND FROM YOUR DOWNLOAD. THERE DOES APPEAR TO BE A SIGNIFICANT LOSS OF RESOLUTION IN YOUR DOWNLOAD. ANYWAY, THANKS FOR TAKING THE TIME TO DO THIS; IT HAS BEEN FUN."
Thanks for the note! I've included the original digital files for anyone to download and listen to in Part 1. It's also possible that there are different masterings of the music... You're right about the importance of volume leveling (see note above). Remember that there should be a loss of resolution with some of these devices, especially the Device A recordings from the computer motherboard! (Keep in mind that the respondents had no idea just how "poor" some of the devices I was recording from were.)

----------
"Sorry, but I could find no noticeable difference, not with the ATMOS setup nor with a direct analog stereo from my DAC on the 1st setup. Sure not with expensive or mid-priced headphones and not on my entry price level 2nd setup. Did you really present different recordings or did you trick us to find differences where are no difference at all?
I am able and could prove that I can tell the difference of a 256 kbps MP3 from 320 kbps MP3 in 9 out of 10 cases. I am able to [hear] the difference between a 320 kbps OggVorbis (Spotify) from a MQA on Tidal in 8 out of 10 cases. I am not able to find differences between 320 kbps OggVorbis and CD or Tidal Hifi in a statistical relevant manner."
As you can see, no trickery employed in this test! :-)

There is a case to be made that the psychoacoustic bitrate reduction of MP3 results in even more distortion than these hi-res device recordings. Also, the psychoacoustic algorithm, while good, might not be transparent to everyone, with some listeners being more sensitive...

----------
"Interesting test! Don't think my hearing is bad at all, but I've always said that the difference between DACs are so small nowadays that it doesn't matter. Hope this blind test can prove this once and for all ;)"
I think you'd be happy with the results. Obviously the test was not easy and other than the relatively poor output quality from the motherboard, the others (iPhone 6, Oppo UDP-205, Sony SACD/CD player) were certainly difficult to differentiate! IMO, there is a "threshold" of accuracy in reproduction beyond which noticeable differences are hard to tease out.

----------
"Quantitatively and qualitatively I found negligible difference across the test tracks. I thought I could hear differences but these went as soon as they came as I continues to switched between tracks using the ABX software - results copied in below for reference. I will be interesting the see the results, for pride and price.  p.s. "le mal de vivre" sounded wonderful on the LX system.

Acer Aspire 5738 output to Behringer UCA222 using Lacinato ABX shootout
 
Etymotic HF5 custom fit
1 time (20.00%) C - Crowd Chant.flac)
0 times (0.00%) B - Crowd Chant.flac)
3 times (60.00%)D - Crowd Chant.flac)
1 time (20.00%)A - Crowd Chant.flac)
------------
1 time (20.00%)D - For Unto Us A Child Is Born.flac)
2 times (40.00%)B - For Unto Us A Child Is Born.flac)
2 times (40.00%)A - For Unto Us A Child Is Born.flac)
0 times (0.00%) C - For Unto Us A Child Is Born.flac)
-----------------
1 time (20.00%)D - Le Mal de Vivre.flac)
3 times (60.00%)C - Le Mal de Vivre.flac)
0 times (0.00%) A - Le Mal de Vivre.flac)
1 time (20.00%)B - Le Mal de Vivre.flac)
-------------
1 time (20.00%)A - Wild World.flac)
0 times (0.00%) B - Wild World.flac)
2 times (40.00%)C - Wild World.flac)
2 times (40.00%)D - Wild World.flac)
 
Sennheiser HD380pro
2 times (40.00%)A - Crowd Chant.flac)
2 times (40.00%)D - Crowd Chant.flac)
1 time (20.00%)C - Crowd Chant.flac)
0 times (0.00%)B - Crowd Chant.flac)
----------
2 times (40.00%)D - For Unto Us A Child Is Born.flac)
1 time (20.00%)C - For Unto Us A Child Is Born.flac)
1 time (20.00%)B - For Unto Us A Child Is Born.flac)
1 time (20.00%)A - For Unto Us A Child Is Born.flac)
------------
2 times (40.00%)B - Le Mal de Vivre.flac)
2 times (40.00%)A - Le Mal de Vivre.flac)
0 times (0.00%) D - Le Mal de Vivre.flac)
1 time (20.00%)C - Le Mal de Vivre.flac)
------------
0 times (0.00%) A - Wild World.flac)
1 time (20.00%)D - Wild World.flac)
2 times (40.00%)C - Wild World.flac)
2 times (40.00%)B - Wild World.flac)
 
line in to Hypex DLCP x6 UcD180 driving LXmini+ 2
1 time (20.00%)B - Wild World.flac)
2 times (40.00%)C - Wild World.flac)
2 times (40.00%)D - Wild World.flac)
0 times (0.00%) A - Wild World.flac)
------------
1 time (25.00%)A - Crowd Chant.flac)
1 time (25.00%)D - Crowd Chant.flac)
2 times (50.00%)C - Crowd Chant.flac)
0 times (0.00%)B - Crowd Chant.flac)
--------------
1 time (20.00%)A - For Unto Us A Child Is Born.flac)
0 times (0.00%) B - For Unto Us A Child Is Born.flac)
2 times (40.00%)C - For Unto Us A Child Is Born.flac)
2 times (40.00%)D - For Unto Us A Child Is Born.flac)
-----------------
0 times (0.00%) A - Le Mal de Vivre.flac)
2 times (50.00%)B - Le Mal de Vivre.flac)
1 time (25.00%)C - Le Mal de Vivre.flac)
1 time (25.00%)D - Le Mal de Vivre.flac)"
Wow. Very cool testing procedure and nice selection of gear used! Even though you could not hear a significant difference, I certainly appreciate the time spent.


----------
"For hearing test we often use this: https://tech.ebu.ch/news/ebu-cds-now-online-31oct08
Thank you"
Cool, someone familiar with the EBU sound testing method. Thanks for the link to the test material!

----------
"I did think I noticed differences (unfocused female voice in B for example) but I was simply hearing more detail in the recordings. Every time when I tried another device the same effect was there.  
I did hear more quiet non repeatable clicks than I expected in the X1. 
I find inexpensive audio electronics to be very acceptable nowadays, but less so speakers and indeed unchecked rips from dodgy computer drives... 
I value and enjoy your articles.
Many Thanks and Kind Regards
Chris C********"
Thanks Chris! You bring out an important phenomenon.

When we concentrate and listen intently, we often pick up those little subtle sounds that we thought we heard for the first time. However, when we go back to listen again to the other devices, we realize that they were there all along! Without the ability to do quick A/B testing, the listener typically will not have a chance to go back and confirm...

This creates a bias problem and comes up, for example, with the cable demos that manufacturers put together at audio shows. They'll play a "poor" generic cable, prime listeners with expectations of how good the expensive one will be, then play the same song with the $5,000 cable and ask: "Did you hear the difference? Notice the fantastic natural reverb!" Not exactly fair if the listeners can't quickly go back to the "poor" cable and double check on that reverb, is it?

And so are born those classic (if not cliché) reviewer comments like: "I heard things I never heard before thanks to this new DAC/preamp/amplifier/speaker/cable!" :-)



V. Parting comments on blind testing and critiques...

I hope you've enjoyed reading the subjective comments above. It was actually lots of fun digging through the results, going through these impressions from the respondents, and answering what questions I could along the way.

Of course, I hope the blind test participants had "fun". Like I said last week, I believe this was not an easy test; I hope ultimately the experience was of value, and that you can now say you've taken part in a type of blind test with the results out in cyberspace for all to see. I doubt most of us want to perform blind tests regularly (rather masochistic, I think, to want this, and there's only so much time in life to enjoy good music!). However, I do hope that in one's lifetime as a "hardware audiophile" seeking high quality audio reproduction, one does have the opportunity to try a few of these once in a while. Remember, I recently quoted J. Gordon Holt's reference to the importance of "basic honesty controls" in the audiophile hobby. Measurements and blind tests have important roles to play by providing reality testing and keeping tabs on boastful claims.

By all means, hang out at one's audio dealership occasionally, perhaps make a pilgrimage to an audio show to hear the "latest and greatest", and go over to a fellow audiophile's home to check out his/her gear while listening to music. Just keep in mind to try an "honesty test" every once in a while! Doing this is probably good for maintaining equilibrium in an audiophile's soul.

I believe "rational audiophiles" (nay, just simply rational human beings), appreciate that there is a world of "objective truth" out there that we should be tapping into and that not everything in this world is about one's own preferences or beliefs even if often these are the most important factors in decision-making. As we've seen in the news recently, there have been sad examples of solipsistic thinking leading to poor outcomes. Consider the unsubstantiated and irrational fears around immunization and measles outbreaks that should never be happening especially in the developed world in 2019. Though obviously as audiophiles we don't generally have to worry about disastrous outcomes (I hope nobody drains the bank account too much, feels too ripped off by "snake oil", or experiences marital discord on behalf of poor "acceptance factor" purchases!), it is important to keep an eye on achieving reasonable goals around "hi-fi" rather than going extreme into what J. G. Holt called and Brent Butterworth recently wrote about - "My Fi".

Something I have noticed over the years when conducting blind tests here is that some individuals are very much opposed to the exercise (especially the act of blinding). These are individuals who will come out and say that the "ADC isn't good enough", or the "90 second sample isn't long enough", or that the test will likely yield no significant results because of some characteristic they don't like, without actually referencing any evidence. While I appreciate the use of critical thinking skills, I wonder: do these individuals apply the same critical thought process to their own subjective impressions before expressing them online? Do they critically consider the subjective opinions of those they read in the media, the comments expressed on YouTube audiophile channels (like this and this), or the testimony of those in the Industry with financial interests? IMO, the fear of testing is a sign of insecurity much of the time.

"Pure subjectivist" audiophiles often have no qualms about writing in reviews that they heard "obvious" changes, how "thick veils" were lifted, how a cable or component rejuvenated their system, how much the bass improved, or the "sweetness" of the treble was elevated. But the moment one wants to suggest a test to prove the veracity of these claims in a controlled, blinded fashion, they either disengage, disparage, or just whine. In some of instances, they'll bring up a failed test from ages back and suggest that somehow this is representative of controlled, blind testing for all time (ahem... JA and his 1978 Quad experience). Despite blind testing being widely recognized as an essential part of most serious research (let's not split hairs at this time about whether single or double blind, ABX, etc...), many audiophiles prefer to find faults and make unreasonable excuses (like this).

Seriously guys, if some commonly-held subjective claims like changing USB cables resulted in massive differences (as recently claimed by Paul McGowan in his video ~5:30 where "every head" turned, asking "what just happened?"), then how hard would this be to prove with a blind test? Shouldn't that be even easier than picking out Device A as a computer motherboard in this test, based on the "strength" of that story? Why not use basic "honesty control" techniques to elevate countless anecdotes such as this into the realm of evidence that can be replicated and hopefully substantiated?

I read with interest Steven Stone's recent article ("An Audio Test That May or May Not Prove Something") on our test here. I appreciate Mr. Stone letting his readers know about the test; I suspect the article added a few more respondents to the database of 101. As you can see, though, the article starts not by recognizing the importance of independent blind testing but rather, through implication and the language used, seems to want to cast doubt in readers' minds as to what it means "if no discernible difference was found". What if "no discernible difference" simply means exactly that? There was indeed nothing to find! Sure, we have to be critical about research design, proper data collection, reasonable analysis, the respondents, etc., but isn't it OK for results to simply be negative? (In fact, in the research world there are valid concerns about "positive publication bias".)

When it comes to high fidelity audio, with the evolution of digital technology getting better, faster, cheaper, and the human auditory system obviously not evolving any quicker in response, should there not come a point where the technology is more than "good enough" and testing at the margins of audibility will show no further difference? By the way, reading that first paragraph by Mr. Stone, it sounded like even before any results were known or published, he expected our results here to be negative - O ye of little faith!

Since Mr. Stone's article summarized nicely some criticisms of this blind test I've also read expressed elsewhere, let's address them briefly, but I hope satisfactorily.

One criticism is around the ADC/DAC steps involved... Folks, have a listen to this 8th generation AD/DA comparison done by esldude on Audiophile Style if you think modern AD/DA conversions are so terrible that they result in massive changes/coloration in sound. For the blind test here, I'm only doing a single generation audio capture in 24/96 of 16/44.1 playback and the listeners will have heard 2 DA steps (once from the Device playing while digitized by the ADC, and the other from their own DAC in 24/96 high resolution). Again, I refer the reader to the comparison above where I showed the exact left-right balance for peak and average amplitudes when we compare the original CD rip with the Oppo's ADC recording. IMO, there is no big problem here.

As for the other critiques from Mr. Stone, such as "the device or App used for the actual volume adjustments was not noted" - well, why should I need to spell out every little detail at the outset? If this was such a big deal - the worry that I was going to use some poor software or "app" - anyone could have just posted a comment on the Test Invitation page and I would have answered. I trust there's overall satisfaction with the 32-bit volume adjustments using Adobe Audition CS6 which I explained in Part 1 of this series.

Also, "For me the main issue in this test was the use of 16/44 files". Huh? Why is this the main issue? As I explained, these days, are we not still collecting and playing 16/44 digital audio primarily? And why would "whether the original source file was a digital file or CD" matter? Although I cannot be sure, but it seems to me like he is hinting at the insecurity of the "bits are not bits" crowd who still seem to think that a bitperfect rip from a CD somehow "sounds different" from an otherwise exact digital file as if things like jitter somehow can travel with the data or maybe FLAC conversion affects the sound even on modern hardware (looked into awhile back). I hope this is not what he's getting at because it obviously flies against how digital audio works.

Remember that digitizing the analogue output from 16/44.1 playback to 24/96 is not the same as taking a digital signal and upsampling in a non-integer fashion (not that these days that's even a big problem!). There is no issue - remember, digital audio playback is not "stair-stepped" as some ridiculous ads might show or misinformed commentators might insinuate. The analogue output is smooth and the 24/96 digitization just sampled that.
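
To put a rough number on this, here is a little illustrative sketch (assuming numpy and scipy; this is not part of the test methodology) that resamples a band-limited 1kHz tone from 44.1kHz to 96kHz - a non-integer ratio of 320:147 - and measures how far the result strays from the ideal tone:

# Non-integer rate conversion demo: 44.1kHz -> 96kHz via polyphase resampling,
# compared against the mathematically ideal tone at the new rate.
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out, f0 = 44100, 96000, 1000.0
t_in = np.arange(fs_in) / fs_in                  # 1 second of audio
x = 0.5 * np.sin(2 * np.pi * f0 * t_in)

y = resample_poly(x, 320, 147)                   # 44100 * 320/147 = 96000
t_out = np.arange(len(y)) / fs_out
ideal = 0.5 * np.sin(2 * np.pi * f0 * t_out)

# Ignore the resampling filter's transients at the start/end of the buffer
err = y[4800:-4800] - ideal[4800:-4800]
print(f"Worst-case deviation from the ideal tone: {20 * np.log10(np.max(np.abs(err)) / 0.5):.1f} dB")

The residual error should come out far below the level of the tone itself, consistent with the point that properly implemented rate conversion is not where audible problems come from.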

Nonetheless, it's good to see that Mr. Stone was able to conclude that "this one has at least a chance of arriving at something..." (despite "My initial response to this on-line test was decidedly negative" - why!?). As you read last week, there are indeed findings worth thinking about. I wonder if Mr. Stone himself might have been surprised by these results. But why is it that so many in the seemingly "subjective" camp are so eager to find faults with more objective methods when really, IMO, they should be putting more thought into their own "blind spots", biases, and perceptual and cognitive limitations? Do some listeners think they have no such limitations? What preconceived notions do some hold about the nature of sound and human perception? Do they honestly think that the best engineers in this world who conceived the devices we listen to will turn a blind eye to objective verification of their designs?

I can only assume that some actually believe they are immune to psychological biases -  for example, consider this "professional scientist"... For those curious, here's a little review to consider (even if you're not a medical student) and a more scholarly paper on confirmation bias.

I know that Mr. Stone also writes for The Absolute Sound. Considering that TAS is quite widely circulated here in North America, it is unfortunately also a place where objective exploration of technological products is nowhere to be found. Or when it is, it's honestly kinda weird.

As much as I enjoy running these kinds of tests and exploring the results with you, I've always believed that it should be the "professional" media's role to educate and to try to find truth in the claims the Industry makes. Instead, sadly these days, the media often act as nothing more than perpetuators of many questionable claims and as advertisers for such products. IMO, they should be the ones engaging with audiophile hobbyists to show what is important and what isn't. Would it not be noble, as actual journalists, to sniff out the snake oil, correct misconceptions, and squash myths, in effect presenting a balanced picture for consumers? Is the audiophile press (both in print and online) doing this at least to some extent? If not, then I think audiophiles should be asking whose interests the media serves.

I recognize of course that the audiophile media is but a tiny grain of sand considering all the places where these same questions can (and should) be asked.

That's all for now. Hope you enjoyed the blind test and reviewing the results... A final thanks and congrats to all the 101 participants for a job well done.

Until next time, enjoy the music!

29 comments:

  1. I did not participate in the test (came late to it) but enjoyed reading the results. IMO, the hostility to these tests from the audio review community is based on a perceived threat to their credibility and livelihood. And I would say that their fears are justified.

    Replies
    1. Yeah, I suppose that is the fear of the salesman who prides himself on being specially attuned, believing that non-technical "experience" is able to build credibility for the sales job.

      Unlike say jewelry or maybe handbags where the materials and workmanship *is* the thing one tries to sell, for audiophiles, as much as these factors are important, it is the "sound quality" that the reviewers and audiophiles use to differentiate devices. After all, what "guy" wants to be seen as putting money down on a more expensive DAC with full recognition that the "sound quality" itself is not where the money went, but rather the trimmings?

      I think there's something to be said about chasing after luxury as being "OK" as well. I mean, who doesn't like the "finer things in life"?

      At least acknowledging that would be honest in many cases in the world of the "high end"!

  2. Great hearing acuity and analysis in some of those findings!
    If I may venture an a posteriori comment, for what it’s worth:

    As someone who did not find significant differences between the four devices, I was pleased at first to see no statistically valid ordering appearing - except for the « worst » device, and that exception got me worried.

    At first, thinking that the low-frequency limitations of my AKG K702 headphones were masking any 60 Hz hum, I routed the files to my main system with its 15-inch Velodyne subwoofer (through a Chromecast Audio) but heard no hum. Not surprising: -104 dBFS versus -114 dBFS should not be audible except maybe at cataclysmic volume settings…
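
    A quick back-of-the-envelope check supports this. The sketch below is my own illustration; the ~100 dB SPL peak playback level is just an assumed (and already quite loud) number, not something measured in the test:

# Back-of-envelope check (assumed playback level; not measured in the test):
# if 0 dBFS peaks play back at ~100 dB SPL, hum components scale directly with it.
# At 60 Hz the hearing threshold is roughly 40-50 dB SPL, so both figures land
# far below audibility at sane listening levels.
peak_playback_spl = 100.0                      # assumed peak SPL for 0 dBFS
for hum_dbfs in (-104.0, -114.0):
    hum_spl = peak_playback_spl + hum_dbfs
    print(f"{hum_dbfs:.0f} dBFS hum -> about {hum_spl:.0f} dB SPL at the ear")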

    This puzzled me because, even going back again to compare device A and C with the two pieces I used (the Handel and the singer/piano) I still found no difference. But then I forced myself to compare the two other pieces and NOW I heard the difference! For device C, perceived stereo separation on « Wild world » is much better and the « voice » of the singer is cleaner. On the « Crowd chant », the beginning drum beat heard as an almost pulsed white noise shows more high frequency in device C and the crowd has more individual voices to my ears.

    It seems that, for me, there needs to be a very dense sound to be able to notice things like this. With highly dynamic and not-so-complex music (my two choices) I didn’t hear those differences. Maybe the two pop examples, with multitrack mixes where a lot of similar-volume tracks are panned left to right and only slightly separated, tax the clarity of a DAC more than a more or less « live » performance does. I also think that what downrates device A must be the noisy electronics inside the computer muddying the amplification circuits rather than the DAC chip itself.

    My bad! I guess I should not have stuck with music I liked for this test… If a dense piece of classical music like a Mahler symphony extract had been used, I might have done better.

    And now I’m stuck with « Wild world » as an earworm for a while… :-)

    Replies
    1. LOL, thanks Gilles for the "extended" testing! Time to take out the karaoke machine for "Wild World" :-).

      Yeah, there's something to be said about the nature of the music used. I know a guy who normally doesn't like hard rock at all but prefers to use AC/DC and Metallica when doing A/B testing.

      There are certainly many technical differences between the devices. Noise is the one I point my finger at because it's the easiest place to show a difference, especially given the levels from the motherboard; but there are also phase differences from the digital filters, subtle frequency response variations, and clock differences between the devices.

      The human mind of course integrates all of these variations into perception, and maybe the sum of everything together pushes perception past the threshold into reportable audibility.

  3. Interesting reading. Thank you for reminding me of my scoring - it appears my scores show I heard more clearly than I reported. Taking the time to tally up: A 11, B 11, C 18, D 18. So a reasonably clear preference for the Sony and Oppo, and more so as the better kit was used.
    Sennheiser HD380 = A 5, B 5, C 5, D 5.
    Etymotic HF5 = A 4, B 3, C 6, D 7.
    Hypex/ LXmini+2 = A 2, B 3, C 7, D 6.
    Lesson learned: review the data as well as the perception. For pride and price it’s a bit of a relief.
    Thank you for taking the time to run these tests.

    Replies
    1. Nice Giraffe!

      There is certainly more to perception than what we might consciously realize. I have on occasion run into a situation where I thought my ABX testing was "surely" going to be negative, but the result in fact suggested there was some perceptible difference after all.

      Only with repeated runs and data like this can we put it all together and see potential patterns. Nonetheless, given that you still consciously didn't think you heard much difference, it's important to recognize the subtlety of the sound differences between these very disparate DACs/players. And one has to think about the monetary value and how much of a role the DAC plays in actual music enjoyment.

    2. Indeed so - don’t forget to enjoy the music.

      Also, just in case you didn’t realise, the correct spelling is Linkwitz.

    3. Woops...

      Fixed the spelling. Somehow had an Eastern European spelling in mind :-).

  4. Thanks so much for posting my comment as the first in the "slight differences" section - I'm truly honored! It's funny to me how much the audiophile press doesn't take price-to-performance ratio into account. They talk about uber-expensive equipment like it's totally reasonable and worth it to spend $20,000 on an optical disc player/transport. That's more than I've spent on most cars I've owned! I actually think I am definitely guilty of some "confirmation bias" though. I just replaced my amplifier: I had an entry-level Pioneer AVR that I had been using for several years and swapped it for an entry-level Yamaha integrated amp. With the Yamaha I "hear" more detail, dynamics, separation, imaging, clarity, etc. Now, the "objective differences" are that the Yamaha is Class A/B whereas the Pioneer is Class D, and the Yamaha is more "honestly rated" watts-wise. I've confirmed this specifically by using a digital SPL meter: I had to turn the volume knob on the Pioneer to 75%-85% of max to get to 85 dB depending on the recording and source, whereas with the Yamaha I can easily get there at well under 50% (same speakers of course); if I pass 50% on the Yamaha it's too damn loud to listen to. I can't actually confirm if one is better than the other as I have no way to easily A/B the devices. But I think I'm in the sweet spot for price-to-performance ratio with my current setup. Nah, who am I kidding? I clearly need a $250,000 amplifier.

    Replies
    1. Hey BMN,
      Yes. You need that $250k amplifier :-).

      But seriously, congrats on that Yamaha amp upgrade and confirming with the SPL meter that indeed there was a clear difference (which you heard already of course).

      DACs were the obvious choice for a widely distributed blind test like this on account of them being relatively easy to measure and capture. As we proceed down the audio chain towards the preamp, amp and ultimately the transducers, things get more difficult, and eventually impossible, to "blind test" like this. But I do believe that the cost-benefit equation remains important!

  5. I read that quite a number of people said noise isn't the main problem with the ASRock board. I keep an archive of some of my old measurements and found a 1999 Sound Blaster Live (CT4830 with 18-bit AC97 codec) recorded by a 2005 X-Fi XtremeMusic (SB0460, ADC: WM8775), in a Pentium 4 system with a GPU card installed (GeForce FX5500).

    https://forums.dearhoney.idv.tw/viewtopic.php?f=1&t=54766
    see the attached CT4830_RMAA.rar

    No big 50/60 Hz spikes, at least on the output, but a non-flat frequency response, channel imbalance and a high noise floor. People reading FFT plots tend to underestimate noise level and dynamic range since, as long as there are no spurious tones, the apparent noise floor can always be pushed down visually by using larger FFT sizes (see the sketch below). However, in the case of this CT4830, the noise was clearly audible to me.
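
    To make that FFT-size point concrete, here is a minimal sketch (my own illustration, not part of the original measurements): the same -90 dBFS white noise looks roughly 9 dB "quieter" per bin when the FFT length goes from 8k to 64k points, even though its RMS level has not changed at all.

# Minimal sketch: the apparent per-bin noise floor in an FFT display drops as the
# FFT size grows (about 3 dB per doubling), while the actual noise level is unchanged.
import numpy as np

rng = np.random.default_rng(0)
fs = 44100
noise = rng.normal(scale=10**(-90 / 20), size=fs * 10)   # white noise, -90 dBFS RMS

print("Noise RMS: %.1f dBFS" % (20 * np.log10(np.sqrt(np.mean(noise**2)))))

for n_fft in (8192, 65536):
    # Average the magnitude spectra of consecutive blocks (rectangular window)
    blocks = noise[: len(noise) // n_fft * n_fft].reshape(-1, n_fft)
    spectrum = np.abs(np.fft.rfft(blocks, axis=1)) / (n_fft / 2)
    print("FFT size %5d: average per-bin level ~ %.1f dBFS"
          % (n_fft, 20 * np.log10(spectrum.mean())))

    (The same total noise energy is simply being split across eight times as many bins.)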

    Anyway, I can offer some other speculation about what people heard with the ASRock board. Apart from the Oppo, both the Sony SACD player and the iPhone also have a bit of intersample headroom. Obviously Crowd Chant clipped a lot, but the ASRock even clipped a bit at 1:15 - 1:16 of Wild World and after 1:10 of For Unto Us A Child Is Born.

    As for the comment of ABXing lossy vs lossless formats and worries about software volume control, I have some recommended links:

    Quality of software volume control:
    https://www.audiosciencereview.com/forum/index.php?threads/dac-attenuation-before-power-amp.5844/post-131431
    https://www.audiosciencereview.com/forum/index.php?threads/does-dsd-sound-better-than-pcm.5700/post-135477

    Importance of software volume control/management when ABXing lossy formats:
    https://forum.cockos.com/showpost.php?p=2001665&postcount=30

    Guess what: it is not the use of software volume controls that degrades quality; it is the opposite - people who don't use software volume controls can actually pick up additional distortion (see the sketch below).
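
    As a minimal illustration of a "well-behaved" software volume control (my own sketch, not taken from the linked posts): attenuate in floating point, then dither back down to 16 bits. The only cost is noise near the 16-bit dither floor, far below the music.

# Sketch of software volume done in floating point followed by TPDF dither to 16 bits.
# The "damage" added by the volume control is benign noise around -96 dBFS.
import numpy as np

rng = np.random.default_rng(1)
fs = 44100
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)       # -6 dBFS, 1 kHz test tone

gain_db = -20.0                               # the volume control setting
y = x * 10**(gain_db / 20)                    # attenuate in 64-bit float

lsb = 1 / 32768                               # 16-bit LSB
dither = (rng.random(len(y)) - rng.random(len(y))) * lsb   # +/- 1 LSB TPDF dither
y16 = np.round((y + dither) / lsb) * lsb      # requantize to the 16-bit grid

err = y16 - y                                 # everything the volume control "added"
print("Error RMS: %.1f dBFS" % (20 * np.log10(np.sqrt(np.mean(err**2)))))

    In contrast, leaving a hot signal at full scale gives the DAC's interpolation filter no headroom for intersample peaks, which appears to be the "additional distortion" referred to above.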

    Replies
    1. Thanks Bennet for all the work!

      That's an awesome post on the importance of keeping in mind the internal conversions between integer and floating point. I think you've also got a great point about the clipping characteristics of the different DACs and how that is yet another variable adding to audibility.

  6. Pretty epic Arch! Interesting results and comments to be sure. Thanks for including my ABX results and comments. For a 60-year-old with years in the recording studio and live sound engineering, it seems I can still hear reasonably well :-) Mind you, I did protect my hearing as best I could by being the hand controlling the volume (with an SPL meter), using ear plugs, or simply walking out of concerts with (wildly) excessive SPL, but I feel a large part of this, for me and for others, is training one's ears to know what to listen for.

    I can give an analogy that most guitar players can relate to - transcribing notes by listening to a song. There is a process involved in training one's ear to hear the notes. It takes time and repetition to uncover not only the notes, writing them down in tab or notation, but how each note was played: long, short, etc. It is a real process that sometimes seems to take forever (ever tab a Jimi tune!?!). It is a process by which the more you listen, the more notes you hear. It is an ear-awakening experience when you discover notes that you had not heard before, even though you have already listened to the song 30 times or more. It is a humbling experience as one thinks, "I should be able to hear that right off," but that is not how we as humans work. We are not binary like digital audio, and therein lies the rub.

    What your blind test shows is that, for most rational folks, digital audio transparency is achieved not only with ubiquitous DAC devices like the iPhone, but also with the 16/44.1 digital audio format. What does that say about mega-dollar DACs? Or hi-res digital audio files? I totally get luxury audio, but one must be clear that the dollars spent are on the luxury part and not on additional audio performance. The issue I have is with audio products claiming superior audio performance without any rational or engineering basis for such a claim. No measurements, no blind listening tests, nothing other than marketing speak. As a result, we get, as an example, sleight-of-hand pseudo-engineering/marketing tricks like this because most consumers don’t understand how digital audio works: https://www.audiosciencereview.com/forum/index.php?threads/link-by-stack-audio-and-signal-detoxing.7578/#post-177129

    Again, epic job Arch. I am hopeful folks understand that maybe their hard-earned dollars are better spent on items that do make real audible differences, like speakers. Or simply on purchasing more music to enjoy!

    Keep up the great work man!

    Cheers,
    Mitch

    Replies
    1. Thanks Mitch! Amen bro, testify.

      Nice to have input from a guy who worked behind the console for years and has experience with those pesky eye diagrams :-).

      Great comment about guitar transcriptions and right on the money with the discussion about "money" as in the cost of luxury DACs these days.

      Cheers!

  7. Respect. Archimago is the hardest workin' audiophile on the internets.

    Replies
    1. LOL. The things we do for passion / love / neurosis...

    2. *pushes Bran out of tower*

    3. Don't remind me of GOT :-(.

      Still disappointed about that last season...

      Of course, when it comes to neurosis, at least I'm not as severe as some audiophiles. At least "bits are essentially bits" and "not everything matters" IMO :-).

  8. Again, thanks for this test - very interesting.

    I was reading through the comments and saw one set that looked very insightful and well-written - then realized it must be mine ;) I clearly heard the inferiority of the motherboard (as did almost everyone else) but also preferred the Sony to the Oppo.

    I'm a value shopper ;) but would certainly pay to upgrade over that motherboard!

    Replies
    1. LOL. Yeah...

      Value shopper. And maybe with a preference for the linear phase filter setting!

      Thanks for the feedback!

  9. Thanks Archimago, I really enjoyed reading the conclusions to this test! It's remarkable that you were able to get enough resolving power to show significant results re: the motherboard with N=101; I'd have thought that, with audio, even that sample might have been too small.

    I'm also very pleased with your application of experimental design & statistics. All too often, I find "objectivists" railing on about their ABX results while falling blindly into statistical pitfalls.

    It's a shame that we don't see the knowledge & courage to perform tests at this level of rigor more broadly-- I'd love to see a "mainstream" audience (i.e. stereophile/head-fi levels of reach) contribute to N>1000 statistics on this matter (of course, the nature of that audience is prohibitive in itself...). I've had this thought many a time, mostly on superbestaudiofriends, where we see anecdotal (N<3) blind testing on DACs to mixed (but often informative nevertheless) result.

    If you have any interest in following up this magnificent series, one thing I'm certainly curious about is the audibility of different DAC technologies. In the past, you've addressed filters & NOS from a measurement perspective, but I'm curious what a test like this would do with different fundamental conversion technologies at a similar price point-- perhaps your ESS powered Oppo vs. the AKM powered RME ADI-2 vs. a Schiit multibit IC vs. a discrete resistor ladder DAC (vs. perhaps a pure DSD DAC).

    Replies
    1. Hi dhruvfire,
      Thanks for the comment. As the technology gets better and better, I think the only way to realistically tease these things out is with tests like this, where large numbers of people all listen and contribute in order to statistically demonstrate differences...

      I agree. Simple "n<3" anecdotes (or n=1, i.e. the lone reviewer) are really quite limited in what they can contribute to actual knowledge of how these things sound to the public.

      Great idea with the DAC tests. Objectively, the hypothesis probably would be that the discrete resistor-ladder DACs would have higher noise and poorer linearity, and thus could potentially be differentiated, especially if fed unfiltered 44.1 kHz material.

      The trick with these tests is really about having a reasonable methodology in place and of course coordination for gathering results while maintaining the blind condition. Could be hard in many cases especially over the internet!

      Will certainly think about future tests like this!

  10. Thanks a lot for this great test. Fun tidbit: I spent a few evenings in intensive (and tiring) listening sessions, taking notes on paper and then tallying and summarizing them, again on paper. I kept my final selection on a flashcard, waiting for the disclosure. When it finally came, you guessed it, the flashcard was nowhere to be found (lesson: never, ever clean your office) and I could not remember which one I picked. Now I know, and feel a bit relieved ;)

    And, as Mitch said above, if my choice was not pure luck, the credit goes not to my ear but to the main test speakers. All my cables are plain, standard cheap stuff; I can't tell the difference between a DAC like the KTB and a more expensive one; and while I may have used a relatively expensive amp for the test (I don't remember), cheap Hypex monoblocks would have done the same job. Speakers, on the other hand...

    And one last word: I believe Mitch above is the person who wrote the e-book I am currently seeing advertised on this page. It is really a great book for hi-fi lovers and I wholeheartedly recommend it (as a customer myself) even if you don't need to use DSP on your system.

    Thanks again.

    Replies
    1. Hi Pierre,
      Yup the same Mitch.

      Good speakers are essential, and fine-tuning the sound with DSP makes a big difference!

  11. Hi Archimago! (Sorry for my English, I am not a native speaker.) I have been following your blog for many years, but this will be my first post here. I have some ideas about this particular test. First of all, the music material used: the track Crowd Chant has a true peak level of +0.2 dB according to JRiver and +0.5 dB according to musictester.com. True peak levels above 0 dB cause intersample overload in digital filters, which means that any DAC without headroom in its digital filter will produce a distorted signal. The best example from your test is device A. Load your FLAC file into a spectrum analyzer and you will see it. Remember the JRaudio method of measuring it. Since a lot of music nowadays is mastered at levels above 0 dBTP, listening through devices like A will produce distorted sound. If people don't hear it, then either their audio systems don't have enough resolution or their listening experience is limited. But everyone can be happy with what he has and I am not going to blame them. From this point of view, device C is the best.
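
    For anyone who wants to check this on their own files, a rough way to estimate the true (intersample) peak is to oversample the clip and look at the new maximum. A minimal sketch of the idea, assuming scipy and soundfile are installed, and with a purely hypothetical file name:

# Rough true-peak estimate via 4x polyphase oversampling. A hot master can show
# a true peak above 0 dBFS even when no individual sample clips - this is what
# stresses a DAC's interpolation filter when it has no digital headroom.
import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

data, fs = sf.read("crowd_chant_clip.flac")           # hypothetical 16/44.1 clip
data = data[:, None] if data.ndim == 1 else data      # ensure 2-D: samples x channels

sample_peak = np.max(np.abs(data))
true_peak = max(np.max(np.abs(resample_poly(data[:, ch], 4, 1)))
                for ch in range(data.shape[1]))

print("Sample peak: %+.2f dBFS" % (20 * np.log10(sample_peak)))
print("True peak  : %+.2f dBFS (approx., 4x oversampling)" % (20 * np.log10(true_peak)))

    A result above 0 dBFS would be consistent with the JRiver/musictester numbers quoted above.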

    Another idea I wanted to share is about the music used for tests: try music with deep bass and you can immediately hear that devices like B can't reproduce it, because by design they are made for good standby and talk time, and deep bass drains batteries. Once I wanted to show a friend how this kind of device plays streamed music, and we tested the same track from CD and from a streaming service. The difference was in the bass area (we had connected the device to a tube amplifier, so the load impedance was 1 Mohm). And again, if people are happy with portable devices, I don't mind.

    Final words: DBT is good for forming an unbiased opinion; however, I don't trust pure listening tests or pure measurement tests alone. Both need to be combined, and we need a broader view of the subject, asking why we can't hear what is measurable and why we can't measure what we can hear.

    And I always like your last message – Wishing you all happy listening!

    Replies
    1. http://musictester.com/demo/ tells me that the various versions of Crowd Chant have a True Peak in the range -2 to -3 dB, which accords with what Adobe Audition reports.

    2. I mean the original 16/44.1, 90-120 second clips used in the "Do digital audio players sound different playing 16/44.1 music?" blind test.

  12. This comment has been removed by the author.
