See Part I: PROCEDURE for details around the test samples used and how this study was conducted.
In this installment, let's have a look at the results from the 24-bit vs. 16-bit listening test among respondents.
First I need to remind everyone that the test procedure was not easy. As demonstrated in Part I, the sonic difference between the original 24-bit track and the 16-bit dithered version is down below -90dB. This makes the test much more difficult than the previous high bit-rate MP3 test from last year... Whether you were able to detect the 24-bit version or not, I applaud your efforts and input.
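To get a feel for just how small that difference is, here is a minimal Python sketch (my own illustration, not part of the actual test procedure; the 997Hz test sine and sample rate are assumptions) that quantizes a synthetic signal to 16 bits with TPDF dither and measures the level of the resulting error signal:

```python
import math
import random

random.seed(42)  # for reproducibility

STEP = 2 / 65536  # 16-bit quantization step for a signal spanning [-1.0, +1.0)

def quantize16_tpdf(x):
    """Quantize one sample to 16 bits with TPDF (triangular) dither."""
    dither = (random.random() + random.random() - 1.0) * STEP  # +/-1 LSB triangular
    return round((x + dither) / STEP) * STEP

# Synthetic full-scale 997 Hz sine at 48 kHz (a stand-in for real music)
fs = 48000
signal = [math.sin(2 * math.pi * 997 * n / fs) for n in range(fs)]

# The "difference track": 16-bit dithered version minus the original
error = [quantize16_tpdf(s) - s for s in signal]
rms = math.sqrt(sum(e * e for e in error) / len(error))
print(f"RMS error: {20 * math.log10(rms):.1f} dBFS")
```

With TPDF dither the error behaves like a constant noise floor of roughly -96dBFS, consistent with the "below -90dB" figure noted above.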
As I noted previously, there were 140 total respondents, and the transfer statistics from my FTP server show the test was downloaded at least 350 times. Based on FTP transfers alone, the response rate was therefore about 40% of those who downloaded. The actual response rate is likely significantly lower since there were other download sites.
First, let us consider the characteristics of the respondents taking this blind test. Given that this was an internet test, involved downloading 200MB worth of high-resolution audio data in FLAC, and was advertised on audiophile forums, it is reasonable to conclude that many if not most respondents are tech-savvy audiophiles rather than "average" music listeners.
Not surprisingly, the vast majority (98%) were men, which is expected (just have a look around audio clubs, audio shows, etc.) - thanks to the 2 ladies who responded!:
The survey also asked whether respondents belonged to specific categories such as musicians and those with audio engineering experience. This could be useful in sub-analysis to see if there were more "golden ears" in these groups:
By self-report, more than 20% were musicians and a similar proportion audio "engineers". Of course, these 2 groups were not mutually exclusive, and 17/31 musicians also identified themselves as doing audio recording/mixing/editing.
As for the hardware utilized by the respondents, here is the general layout of the type of gear being used to evaluate:
In terms of operating systems, of the 3 main OSs - Windows, Mac, Linux - it's clear that Windows predominated. 129 respondents used one of these 3 OSs: Windows accounted for 60%, followed by Mac at 23% and Linux at 17%. Among streaming devices, the Squeezebox was tops. Most respondents used an external USB/Firewire DAC for the evaluation; not surprisingly, in the computer audio world SPDIF interfaces are no longer as common, and a few used the HDMI interface (surround receiver devices).
There was an essentially even split between respondents using speakers (bookshelf + tower) at 74 and headphones at 72 (a few used both).
Here's how the audio system "cost structure" looked (US$):
Weighted average using the median price in each category yields a system price of around $8160 on account of the number of expensive 5-figure systems reported (22% had systems >$10,000). The median audio system price is in the $1000-$3000 range. This is very reasonable and again speaks to the demographic who would download and try a test like this. Objective >16-bit resolution is easily achieved in a $1000-3000 system as demonstrated with even relatively inexpensive DACs measured here over the last year and by having a look at the Stereophile objective results.
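For anyone wanting to reproduce the weighted-average arithmetic, here is a sketch; note that the bin medians and respondent counts below are hypothetical placeholders loosely shaped like the reported distribution (the survey's actual bins are in the chart above), so the output will not exactly match the $8160 figure:

```python
# (assumed bin median price, hypothetical respondent count) - illustrative only
price_bins = {
    "< $1,000":       (500,   32),
    "$1,000-$3,000":  (2000,  45),
    "$3,000-$6,000":  (4500,  19),
    "$6,000-$10,000": (8000,  13),
    "> $10,000":      (25000, 31),
}

total_n = sum(n for _, n in price_bins.values())
weighted_avg = sum(price * n for price, n in price_bins.values()) / total_n
print(f"${weighted_avg:,.0f}")
```

The point of the exercise: a minority of five-figure systems pulls the mean well above the median bin.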
Many respondents went into detail describing their systems in the survey. The first 25 responses included full Meridian active speakers, Sennheiser HD800 headphones with upgraded cabling, custom amplifiers, tube amplifiers, custom ESS9023 DAC, NAD amp, Lyngdorf TDAI 2170 digital amps into Intonation Terzian speakers, Overdrive SE USB DAC, Parasound Halo JC-1 monoblocks, custom ribbon speakers, Cambridge Azure 840E, Focal 1028BE speakers, Sonus Faber Cremona Auditor M speakers, Sony MDR-7509HD headphones, Grado SR325 headphones, Audiolab M-DAC, Chord Hugo DAC, AKG Q701 headphones, Squeezebox Transporter, PS Audio 4.6 preamp, Pass Aleph 5 amplifier, Devialet 170 integrated DAC/amp, Martin Logan Montis speakers, Geek Out 720. Clearly, many respondents used very high quality equipment for this test.
As a reflection of the technological savvy of the respondents, many utilized ABX testing such as the Foobar ABX tool:
20% utilized listening tools to evaluate (ignore that 3rd bar above since it just reflects how many left a description; 29/140 used an ABX tool). Other than Foobar ABX, Mac ABXTester was common, and others described their own scripts.
II. Were the 24-bit audio files distinguishable from the same files dithered down to 16-bits (and fed into the DAC in the 24-bit container) by the respondents as a whole?
In total, the final result looked like this:
As you can see, in aggregate there is no evidence that the 140 respondents were able to identify the 24-bit sample. In fact, it was an exact 50/50 split for the Vivaldi and Goldberg! As for the Bozza sample, more respondents actually thought the dithered 16-bit version was the "better" sounding 24-bit file (statistically non-significant however; p-value 0.28).
Looking at the individual responses, a total of 20 respondents correctly identified the B-A-A selection of 24-bit samples, and 21 selected the exact opposite, A-B-B. This too is in line with chance: with 8 possible answer patterns, one would expect 17.5 of the 140 respondents to pick any given pattern.
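For readers wanting to check the statistics themselves, here is a small Python sketch (stdlib only; the function name is my own, not from any analysis package I used) of an exact two-sided binomial test for this kind of fair-coin comparison, plus the chance expectation for the answer patterns:

```python
from math import comb

def binom_two_sided(n, k):
    """Exact two-sided p-value: how surprising are k successes
    out of n trials if each trial is a fair coin flip?"""
    tail_hi = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    tail_lo = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * min(tail_hi, tail_lo))

# An exact 50/50 split (70 of 140) is as unsurprising as it gets:
print(binom_two_sided(140, 70))  # 1.0 (clamped)

# Chance expectation for any one of the 8 possible A/B answer patterns:
print(140 / 2**3)  # 17.5 respondents per pattern
```

The same function can be pointed at any of the subgroup splits discussed below to check significance.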
III. How certain were the respondents that they answered correctly (ie. able to identify the 24-bit sample)?
24-32% of respondents felt they were unable to hear a difference (1 star = "Guessing"). If we consider that those who chose "2 Stars = more than a guess" also represent a very low level of certainty, then we can see that 45-52% of respondents really had quite low confidence that they were able to tell the difference.
Fewer respondents were "certain" about the solo piano piece (Goldberg), and in general more seemed confident about the Bozza piece. Listening fatigue could account for this result if respondents progressed through the samples in the Bozza-Vivaldi-Goldberg sequence.
IV. Were the respondents who felt more certain about their answer more likely able to identify the 24-bit audio?
Let us have a look at the results reported by those who rated their confidence level as 4 or 5 ("very confident" to "certain" - 25-30% of all the responses):
"Correct" responses are the ones that successfully identified the 24-bit sample. As can be seen, there is no evidence to suggest that even those respondents with a strong sense of confidence were able to identify the 24-bit sample (as sounding better). In fact, for the Goldberg sample, only 44% of those who were quite "certain" selected the 24-bit version correctly.
V. Were the subgroups (musicians, sound engineers, hardware reviewers) able to identify the 24-bit audio better?
Because respondents admitting to "guessing" tended to answer A-A-A, which would severely skew a small sample, I decided not to count the "guesses" in these smaller subgroups and see if there was any pattern of higher accuracy compared to all respondents.
As a subgroup (31 respondents in total), the self-identified respondents with a "good amount" of musical background did not do well. In fact, this group consistently scored worse than the combined result. Curiously, the musician group seemed to select the 16-bit dithered Vivaldi as the "better" sounding version (p-value 0.047).
Sound "Engineers" (those with experience recording, mixing, editing):
As a group, the "engineers" fared better than the musicians in terms of accurately identifying the 24-bit tracks, marginally surpassing the accuracy of the combined respondents. Again, the number of individuals was small (34), and there was overlap between the "musician" and "engineer" groups, with 17 individuals identifying themselves as both.
This was an optional survey item that could be interesting to look at since audiophiles who provide hardware review opinions can have significant influence on sentiment and purchasing decisions.
With only 8 respondents, it would be difficult to draw any firm conclusion other than there is no evidence to suggest this subgroup was any more able to identify the 24-bit from dithered 16-bit audio.
VI. Were those with more expensive hardware able to identify the 24-bit audio better?
In total, 44 respondents (31.4%) used $6000+ equipment to perform this test; let us see if they were more accurate than the group average in identifying the 24-bit sample:
As you can see, the ~30% of respondents utilizing equipment costing >$6000 were not able to accurately identify the 24-bit audio track any better than the group average. The Vivaldi track was exactly at 50% accuracy.
VII. Did Headphone Use Improve Accuracy?
72 respondents used headphones in their evaluation. Since headphones can be potentially more accurate (no room acoustics, better noise isolation) at a lower overall cost, it would be interesting to see if accuracy in determining which was the 24-bit sample was any better.
As you can see, headphone use did not result in any appreciable improvement.
VIII. Did age have any effect on the accuracy?
There were 44 respondents aged 51+. As a group, this is how they did compared to the overall result:
No evidence again of any significant change in accuracy in identifying the 24-bit audio.
This survey was targeted to audiophile enthusiasts who in general reported using equipment beyond typical consumer electronics. The majority (77%) were using audio systems reported in excess of US$1,000 and 22% were listening with systems in excess of $10,000. Furthermore, 20% used an ABX utility in the evaluation process suggesting good effort in trying to discern sonic differences. There were no surprises in terms of demographics with the vast majority being males, with an age distribution centred around 41-50 years old.
Subgroup analysis of "musicians" and those who work with the technical aspects of recording, editing and mixing ("engineers") did not demonstrate evidence of special abilities at discerning the 24-bit audio. The "engineers" group did perform slightly better overall. The small group of individuals who identified themselves as writing hardware reviews did not show an increase in accuracy.
About 50% of respondents admitted that they had low confidence in their ability to discern differences. Conversely, 25-30% (depending on which musical sample) of respondents reported a strong sense of "certainty" that they were correct in identifying the 24-bit sample. Nonetheless, analysis was not able to demonstrate improved accuracy despite claims of increased subjective confidence by the respondents.
Furthermore, analysis of those utilizing more expensive audio systems ($6,000+) did not show any evidence of the respondents being able to identify the 24-bit audio. Those using headphones likewise did not show any stronger preference for the higher bit-depth sample. No difference was noted in the "older" (51+ years) age group data (not surprising if there is no discernible difference even with potential age-related hearing acuity changes).
Limitations of the study include the fact that this was an open test distributed via the Internet in an uncontrolled fashion. This allowed test subjects the opportunity to analyze the audio files objectively rather than through pure listening. However, this is also the delivery mechanism for high-resolution downloads, and the test participants would likely be using the same equipment to listen. The benefit, of course, is that the results may reflect realistic feedback from potential consumers (if not the target audience) of high-resolution audio. Respondents were able to listen in their own homes using their own equipment rather than in an artificially controlled environment. The absence of a time limit (other than a 2-month window to gather survey submissions) should have made for a less stressful experience for the testers.
140 participants is not a particularly large number of data points but it was adequate to demonstrate an even 50/50 split in preference across the 3 musical samples; a level of consistency which adds to the idea that listeners were unable to differentiate 24-bit audio from the dithered 16-bit counterpart. Replication of the results is of course advised.
As expressed previously in "High-Resolution Expectations" (See "Good Enough Room?" section), there is no good rationale for a dynamic range of greater than 16-bit digital audio in the home environment. The results of this survey appear to support the notion that high bit-depth music (24-bits) does not provide audible benefits despite the fact that objectively measurable DACs capable of >16-bit resolution are readily available at very reasonable cost these days.
If 24-bit audio imparts no audible benefit when listening to music compared to the same data dithered down to 16-bits, how certain can the audiophile consumer be that higher sampling rates (eg. 88/96/176/192kHz) would make much of any audible difference? This perhaps should be the target for another blind test. Methodologically, it would be extremely difficult to maintain the blind testing condition over the internet since it would be trivial to run the audio files through a spectrum analyser with no easy mechanism to conceal the bandwidth limitation of lower sampling rates (eg. 22kHz frequency headroom for 44kHz sampling). The reader is encouraged therefore to explore the effect of higher sample rates for him/herself.
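To illustrate why blinding a sample-rate test over the internet is so hard, here is a small Python sketch (stdlib only; the Goertzel single-bin DFT stands in for a full spectrum analyser, and the synthetic tones are my own assumptions) showing how trivially ultrasonic content betrays a true high-sample-rate file:

```python
import math

def goertzel_power(samples, fs, freq):
    """Power at one frequency (Goertzel algorithm) - a one-bin spectrum analyser."""
    w = 2 * math.pi * freq / fs
    coeff = 2 * math.cos(w)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

fs = 96000  # "hi-res" sampling rate
n = 9600
# True 96kHz source: audible 1kHz tone plus faint ultrasonic 30kHz content
hires = [math.sin(2 * math.pi * 1000 * t / fs)
         + 0.1 * math.sin(2 * math.pi * 30000 * t / fs) for t in range(n)]
# A 44.1kHz-sourced file upsampled to 96kHz carries nothing above ~22kHz
upsampled = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]

print(goertzel_power(hires, fs, 30000))      # large - ultrasonic energy present
print(goertzel_power(upsampled, fs, 30000))  # essentially zero
```

A test subject running any such analysis could unmask the samples in seconds, which is exactly why the blind condition cannot be maintained remotely.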
One final comment in closing. Notice that the Goldberg track was soft, with a peak amplitude of -10.35dB as demonstrated by the DR Meter (see PROCEDURE post). This means the full potential dynamic range was not being utilized, and for the 16-bit dithered sample the dynamic range can be encapsulated in <15 bits. Even with this limitation, there was no evidence that respondents were able to identify a difference in aggregate or within subgroups.
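The arithmetic behind the <15-bit figure: each bit of a linear PCM word is worth roughly 6.02dB of dynamic range, so a track peaking at -10.35dB leaves the top bits unused. A quick sketch:

```python
peak_db = -10.35   # peak amplitude of the Goldberg track (per the DR Meter)
db_per_bit = 6.02  # approximate dynamic range contributed by each bit
effective_bits = 16 + peak_db / db_per_bit
print(f"{effective_bits:.1f} bits")  # ~14.3 of the 16 bits actually exercised
```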
As usual, I encourage others to do their own testing. Feel free to drop a link especially if there are other controlled, preferably blind tests showing a significant audible difference between 24-bit and 16-bit audio.
I will put up a Part III over the next week as well documenting the subjective comments made by respondents and final observations... Stay tuned.