Friday, 27 June 2014

24-Bit vs. 16-Bit Audio Test - Part II: RESULTS & CONCLUSIONS


See Part I: PROCEDURE for details around the test samples used and how this study was conducted.

In this installment, let's have a look at the results from the 24-bit vs. 16-bit listening test among respondents.

First I need to remind everyone that the test procedure was not easy. As demonstrated in Part I, the sonic difference between the original 24-bit track and the 16-bit dithered version is down below -90dB. This makes the test much more difficult than the previous high bit-rate MP3 test from last year... Whether you were able to detect the 24-bit version or not, I applaud your efforts and input.

As I noted previously, there were 140 respondents in total, and looking at the transfer statistics from my FTP server, I know the test was downloaded at least 350 times. Based on the FTP transfers alone, the response rate was therefore about 40% of those who downloaded. The actual response rate is likely significantly lower since there were other download sites.

Results

I. Demographics:


First, let us consider the characteristics of the respondents taking this blind test. Given that this is an internet test that involves downloading 200MB of high-resolution FLAC audio, and given the target audiophile forums where the test was advertised, it is reasonable to conclude that many if not most respondents are tech-savvy audiophiles rather than "average" music listeners.

Not surprisingly, the vast majority (98%) were men (just have a look around audio clubs, audio shows, etc.) - thanks to the 2 ladies who responded!:
The age distribution likewise isn't a surprise. Audiophiles tend to be a bit older overall; estimating with the median age of each range, the average age comes out to about 44 years old. The distribution looks like this:
Nice to see some teenagers and respondents in their early 20s, with the majority in the 41-50 age category. If one were a computer audio manufacturer, the 40-50 age group would be the one to target for maximal effect in 2014.

The survey also asked whether respondents belonged to specific categories such as musicians and those with audio engineering experience. This could be useful in the sub-analysis to see if there were more "golden ears" in these groups:
 
By self-report, more than 20% were musicians and more than 20% were audio "engineers". Of course, these 2 groups were not mutually exclusive: 17 of the 31 musicians also identified themselves as doing audio recording/mixing/editing.

As for the hardware utilized by the respondents, here is the general breakdown of the type of gear used for the evaluation:
In terms of operating systems, of the 3 main OSs - Windows, Mac, Linux - Windows clearly predominated: 129 respondents used one of these 3, split 60% Windows, 23% Mac, and 17% Linux. Among streaming devices, the Squeezebox was tops. Most respondents used an external USB/FireWire DAC to conduct the evaluation; not surprisingly, in the computer audio world S/PDIF interfaces are no longer as common, and a few used the HDMI interface (surround receiver devices).

Respondents were almost evenly split between speakers (bookshelf + tower, 74) and headphones (72); a few used both.

Here's how the audio system "cost structure" looked (US$):
Weighted average using the median price in each category yields a system price of around $8160, pulled up by the number of expensive 5-figure systems reported (22% had systems >$10,000). The median audio system price falls in the $1000-$3000 range. This is very reasonable and again speaks to the demographic who would download and try a test like this. Objective >16-bit resolution is easily achieved in a $1000-$3000 system, as demonstrated by even the relatively inexpensive DACs measured here over the last year and by a look at Stereophile's objective results.
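For those curious how such binned estimates are calculated, here is a minimal Python sketch using the midpoint-weighting approach (the same idea was used for the ~44-year average age above). The bin values and counts below are illustrative - chosen to be consistent with the percentages quoted in this post, but they are not the actual survey tallies, so the output will not reproduce the $8160 figure exactly.

def weighted_average(bins):
    """bins: list of (representative_price, respondent_count) pairs."""
    total = sum(count for _, count in bins)
    return sum(value * count for value, count in bins) / total

# HYPOTHETICAL bins for illustration only (representative US$, respondent count)
cost_bins = [
    (500, 32),     # < $1,000
    (2000, 40),    # $1,000 - $3,000
    (4500, 24),    # $3,000 - $6,000
    (8000, 13),    # $6,000 - $10,000
    (25000, 31),   # > $10,000 (representative value is a rough guess)
]

print(f"Estimated average system cost: ${weighted_average(cost_bins):,.0f}")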

Many respondents went into detail describing their systems in the survey.  The first 25 responses included full Meridian active speakers, Sennheiser HD800 headphones with upgraded cabling, custom amplifiers, tube amplifiers, custom ESS9023 DAC, NAD amp, Lyngdorf TDAI 2170 digital amps into Intonation Terzian speakers, Overdrive SE USB DAC, Parasound Halo JC-1 monoblocks, custom ribbon speakers, Cambridge Azure 840E, Focal 1028BE speakers, Sonus Faber Cremona Auditor M speakers, Sony MDR-7509HD headphones, Grado SR325 headphones, Audiolab M-DAC, Chord Hugo DAC, AKG Q701 headphones, Squeezebox Transporter, PS Audio 4.6 preamp, Pass Aleph 5 amplifier, Devialet 170 integrated DAC/amp, Martin Logan Montis speakers, Geek Out 720. Clearly, many respondents used very high quality equipment for this test.

As a reflection of the technological savvy of the respondents, many utilized ABX testing such as the Foobar ABX tool:
About 20% utilized blind comparison tools in their evaluation (ignore the 3rd bar above; it just reflects how many left a description - 29/140 used an ABX tool). Other than Foobar ABX, the Mac ABXTester was common, and a few others described their own scripts.

II. Were the 24-bit audio files distinguishable from the same files dithered down to 16-bits (and fed into the DAC in the 24-bit container) by the respondents as a whole?

In total, the final result looked like this:




As you can see, in aggregate there is no evidence that the 140 respondents were able to identify the 24-bit sample. In fact, the split was exactly 50/50 for the Vivaldi and Goldberg! As for the Bozza sample, more respondents actually picked the dithered 16-bit version as the "better" sounding, supposedly 24-bit file (statistically non-significant, however; p-value 0.28).

Looking at the individual responses, a total of 20 respondents correctly identified the full B-A-A sequence of 24-bit samples, and 21 selected the exact opposite, A-B-B. This too is in line with chance: one would expect 17.5 respondents to pick each of these patterns by luck alone.
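For readers who want to reproduce these sanity checks, here is a minimal Python/SciPy sketch. The Bozza split used below (77 vs. 63) is a hypothetical illustration chosen only to roughly match the quoted p-value of 0.28; the raw counts are not listed here.

from scipy.stats import binomtest

# Two-sided binomial test against chance (p = 0.5) for a single track.
# 77 vs. 63 is a HYPOTHETICAL split used for illustration.
result = binomtest(k=77, n=140, p=0.5, alternative='two-sided')
print(f"p-value: {result.pvalue:.2f}")

# Expected number of respondents picking any one specific 3-answer pattern
# (e.g. B-A-A or A-B-B) purely by chance: 140 / 2^3 = 17.5
expected_per_pattern = 140 / 2**3
print(f"Expected per pattern by chance: {expected_per_pattern}")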

III. How certain were the respondents that they answered correctly (ie. able to identify the 24-bit sample)?

Across the three samples, 24-32% of respondents felt they were unable to hear a difference (1 star = "Guessing"). If we consider that those who chose "2 Stars = more than a guess" also represent a very low level of certainty, then 45-52% of respondents had quite low confidence that they were able to tell the difference.

Fewer respondents were "certain" about the solo piano piece (Goldberg), and in general more seemed confident about the Bozza piece. Listening fatigue could account for this if respondents worked through the samples in sequence (Bozza-Vivaldi-Goldberg).

IV. Were the respondents who felt more certain about their answer more likely able to identify the 24-bit audio?

Let us have a look at the results reported by those who rated their confidence level as 4 or 5 ("very confident" to "certain" - 25-30% of all the responses):

"Correct" responses being the ones who were successful in identifying the 24-bit sample. As can be seen, there is no evidence to suggest that even in those respondents with a strong sense of confidence were able to identify the 24-bit sample (as sounding better). In fact, for the Goldberg sample, only 44% of those who were quite "certain" selected the 24-bit version correctly.

V. Were the subgroups (musicians, sound engineers, hardware reviewers) able to identify the 24-bit audio better?

Because respondents who admitted to "guessing" tended to answer A-A-A, and this would severely skew a small sample, I decided not to count the "guesses" in these smaller subgroups when looking for any pattern of higher accuracy compared to all respondents.

Musicians:


As a subgroup (31 respondents in total), the self-identified respondents with a "good amount" of musical background did not do well. In fact, this group consistently scored worse than the combined result. Curiously, the musician group seemed to select the 16-bit dithered Vivaldi as the "better" sounding version (p-value 0.047).

Sound "Engineers" (those with experience recording, mixing, editing):


As a group the "engineers" faired better than the musicians in terms of accurately identifying the 24-bit tracks. This subgroup surpassed the accuracy of the combined respondents marginally. Again, the number of individuals was small (34). There was an overlap between the "musician" and "engineer" group with 17 individuals identifying themselves as both.

Hardware Reviewers:

This was an optional survey item that could be interesting to look at since audiophiles who provide hardware review opinions can have significant influence on sentiment and purchasing decisions.


With only 8 respondents, it would be difficult to draw any firm conclusion other than there is no evidence to suggest this subgroup was any more able to identify the 24-bit from dithered 16-bit audio.

VI. Were those with more expensive hardware able to identify the 24-bit audio better?

In total, 44 respondents (31.4%) used $6000+ equipment to perform this test. Let us see whether they were more accurate than the group average in identifying the 24-bit sample:


As you can see, the ~30% of respondents utilizing equipment costing >$6000 were not able to accurately identify the 24-bit audio track any better than the group average. The Vivaldi track was exactly at 50% accuracy.

VII. Did Headphone Use Improve Accuracy?

72 respondents used headphones in their evaluation. Since headphones can potentially be more accurate (no room acoustics, better noise isolation) at a lower overall cost, it would be interesting to see whether accuracy in determining the 24-bit sample was any better.

As you can see, headphone use did not result in any appreciable improvement.

VIII. Did age have any effect on the accuracy?

There were 44 respondents aged 51+. As a group, this is how they did compared to the overall result:


No evidence again of any significant change in accuracy in identifying the 24-bit audio.

Conclusions:


In a naturalistic survey of 140 respondents using high quality musical samples sourced from high-resolution 24/96 digital audio collected over 2 months, there was no evidence that 24-bit audio could be appreciably differentiated from the same music dithered down to 16-bits using a basic algorithm (Adobe Audition 3, flat triangular dither, 0.5 bits).

This survey was targeted to audiophile enthusiasts who in general reported using equipment beyond typical consumer electronics. The majority (77%) were using audio systems reported in excess of US$1,000 and 22% were listening with systems in excess of $10,000. Furthermore, 20% used an ABX utility in the evaluation process suggesting good effort in trying to discern sonic differences. There were no surprises in terms of demographics with the vast majority being males, with an age distribution centred around 41-50 years old.

Subgroup analysis of "musicians" and those who work with the technical aspects of recording, editing and mixing ("engineers") did not demonstrate evidence of special abilities at discerning the 24-bit audio. The "engineers" group did perform slightly better overall. The small group of individuals who identified themselves as writing hardware reviews did not show an increase in accuracy.

About 50% of respondents admitted that they had low confidence in their ability to discern differences. Conversely, 25-30% (depending on which musical sample) of respondents reported a strong sense of "certainty" that they were correct in identifying the 24-bit sample. Nonetheless, analysis was not able to demonstrate improved accuracy despite claims of increased subjective confidence by the respondents.

Furthermore, analysis of those utilizing more expensive audio systems ($6,000+) did not show any evidence of the respondents being able to identify the 24-bit audio. Those using headphones likewise did not show any stronger preference for the higher bit-depth sample. No difference was noted in the "older" (51+ years) age group data (not surprising if there is no discernible difference even with potential age-related hearing acuity changes).

Limitations of the study include the fact that this was an open test distributed via the Internet in an uncontrolled fashion. This allowed test subjects the opportunity to analyze the audio files objectively rather than by pure listening. However, this is also the mechanism by which high-resolution downloads are actually delivered, and the test participants would likely be listening on the same equipment. The benefit, of course, is that the results may reflect realistic feedback from potential consumers (if not the target audience) of high-resolution audio. Respondents were able to listen in their own homes using their own equipment rather than in an artificially controlled environment. The lack of a time limit (other than the 2-month window to gather survey submissions) should also have made this a less stressful experience for the testers.

140 participants is not a particularly large number of data points, but it was adequate to demonstrate an even 50/50 split in preference across the 3 musical samples - a level of consistency that supports the idea that listeners were unable to differentiate the 24-bit audio from its dithered 16-bit counterpart. Replication of the results is of course advised.

As expressed previously in "High-Resolution Expectations" (See "Good Enough Room?" section), there is no good rationale for a dynamic range of greater than 16-bit digital audio in the home environment. The results of this survey appear to support the notion that high bit-depth music (24-bits) does not provide audible benefits despite the fact that objectively measurable DACs capable of >16-bit resolution are readily available at very reasonable cost these days.

If 24-bit audio imparts no audible benefit when listening to music compared to the same data dithered down to 16-bits, how certain can the audiophile consumer be that higher sampling rates (eg. 88/96/176/192kHz) would make much of any audible difference? This perhaps should be the target for another blind test. Methodologically, it would be extremely difficult to maintain the blind condition over the internet since it would be trivial to run the audio files through a spectrum analyser, and there is no easy way to conceal the bandwidth limitation of lower sampling rates (eg. ~22kHz of frequency headroom for 44.1kHz sampling). The reader is encouraged therefore to explore the effect of higher sample rates for him/herself.

One final comment in closing. Notice that the Goldberg track was soft, with a peak amplitude of -10.35dB as demonstrated by the DR Meter (see PROCEDURE post). This means that the full potential dynamic range was not being utilized, and for the 16-bit dithered sample the dynamic range can be encapsulated in fewer than 15 bits. Even with this limitation, there was no evidence that respondents were significantly able to identify a difference in aggregate or within subgroups.
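As a quick back-of-envelope check of that "<15 bits" remark (my own arithmetic): each bit corresponds to roughly 6.02 dB, so a -10.35 dBFS peak leaves about 1.7 of the 16 bits unexercised.

peak_dbfs = -10.35
bits_unused = abs(peak_dbfs) / 6.02       # ~1.72 bits of headroom never exercised
bits_used = 16 - bits_unused              # ~14.3 bits actually spanned by the music
print(f"Bits of headroom unused: {bits_unused:.2f}")
print(f"Bits effectively exercised: {bits_used:.2f}")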
 

-------------

As usual, I encourage others to do their own testing. Feel free to drop a link especially if there are other controlled, preferably blind tests showing a significant audible difference between 24-bit and 16-bit audio.

I will put up a Part III over the next week as well documenting the subjective comments made by respondents and final observations... Stay tuned.



Saturday, 21 June 2014

24-Bit vs. 16-Bit Audio Test - Part I: PROCEDURE

Disclosure: Just in case anyone is wondering, I want to make it clear that I have no affiliation with any audio company. I do not derive any financial benefit of significance from conducting this survey (a few dollars from the ad revenue I suppose). I enjoy the audio hobby and wanted to do some "reality testing".

Over the course of 2 months (April 19 to June 20, 2014), an invitation was extended from this blog (archimago.blogspot.ca) to various "audiophile" forums on the Internet for participants to submit responses to an anonymous survey, to see whether they could identify which sample of music was the original 24-bit source versus the same piece of music (exact same mastering) dithered down to 16-bits.

Although the following may seem pedantic, I want to lay out the procedure transparently and in detail so as to be clear about the nature of this test and what was done to collect the data.

The musical samples were taken from freely available sources on the internet; 2 classical pieces from the Norwegian studio 2L recorded in high resolution digital and 1 from the Open Goldberg Variations. For the purposes of this test, the "high resolution" 24/96 file samples were utilized directly from those sources (ie. I did not want to do any manipulation of the data like resample to 48kHz).

Musical samples from 2L (available here):
1. Eugène Bozza - la Voie Triomphale (performed by The Staff Band of the Norwegian Armed Forces): A well recorded orchestral track originally recorded in DXD (32/352.8).

2. Vivaldi - Recitative and Aria from Cantata RV 679, "Che giova il sospirar, povero core" (performed by Tone Wik & Barokkanerne) - String orchestra with female vocals. Also DXD-recorded originally based on the description from the website.

The third sample is taken from the excellent recent recording off the Open Goldberg Variations. Again, I am using the 24/96 high-resolution download as a starting point:

3. Bach: Goldberg Variations BWV 988 - Aria (performed by Kimiko Ishizaka). The recording was done at Teldex Studio in Berlin using the Bösendorfer 290 Imperial CEUS concert grand piano. Some audiophiles consider the piano an extremely difficult instrument to reproduce well. It's also a much slower piece, which provides an opportunity to listen to the quality of note decay. Low-level spatial room acoustics are also easily heard on this recording.

Due to the size of high-resolution downloads, each sample was limited to 1.5-2 minutes (the 2L samples were 2 minutes long, 1.5 minutes for the Bach). Some of the more interesting or dynamic portions of the musical samples were selected. Only fade-ins and fade-outs of <2 seconds were added to the beginning and/or end of the tracks so the edits would not be too abrupt. FLAC compression was used to decrease file size.

The dithering process was basic. Using an older version of Adobe Audition (version 3.0.1), a flat triangular dither of 0.5 bits was utilized with settings as shown:
The sample rate was kept at 96kHz. These are very conservative settings; no advanced options such as noise shaping were used, as featured in some of the "better" dithering algorithms like iZotope's MBIT+ or Weiss' POW-R. Adobe Audition was again used to convert the dithered 16-bit data back into a 24-bit container.
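For those who want to experiment at home, here is a minimal numpy sketch of flat triangular (TPDF) dither down to 16 bits with no noise shaping. To be clear, this is not Adobe Audition's exact implementation (its "0.5 bits" depth setting may scale the noise differently) - just the textbook approach; the resulting samples can then be stored in a 24-bit or floating-point container.

import numpy as np

def dither_to_16bit(x_float, seed=0):
    """x_float: audio samples in [-1.0, 1.0]. Returns samples carrying only
    16-bit resolution, suitable for storage in a 24-bit (or float) container."""
    rng = np.random.default_rng(seed)
    lsb = 1.0 / 32768.0                                    # one 16-bit LSB
    tpdf = (rng.uniform(-0.5, 0.5, x_float.shape) +
            rng.uniform(-0.5, 0.5, x_float.shape)) * lsb   # triangular-PDF noise
    quantized = np.round((x_float + tpdf) / lsb) * lsb     # snap to the 16-bit grid
    return np.clip(quantized, -1.0, 32767.0 / 32768.0)

# Example: a quiet 1 kHz tone at 96 kHz
fs = 96000
t = np.arange(fs) / fs
tone = 0.1 * np.sin(2 * np.pi * 1000 * t)
dithered = dither_to_16bit(tone)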

The 24-bit and (effective) 16-bit versions were randomly assigned as Sample A or B and files were enumerated 1 to 6 in the final package downloaded by the respondents.

Because this is an "open" test released on the Internet (rather than a listening test in a lab situation where variables could be easily controlled), some measures were implemented to prevent easy differentiation of the 24-bit and 16-bit versions by means other than listening. (Thanks to Wombat for giving me some ideas.)

1. Files 2, 4 and 6 (Sample B of each track) had 1 ms cut off from the start, and files 1, 3, and 5 (Sample A) had 1 ms truncated from the end. This maintained the exact duration of Samples A and B but shifted them temporally, confounding simple null tests that did not take the slight timing offset into consideration.

2. A very low level of white noise (-140dB average RMS power) was mixed into the 16-bit dithered samples (remember, they were placed in 24-bit containers) to exercise the least significant bits, so that a simple program that just checks bit depth (by looking for "0" in the least significant bits) would think it is an actual 24-bit resolution file. This small amount of white noise is inaudible, well below the dithered 16-bit noise floor (and below the objective noise floor of actual DACs). A rough sketch of measures 1 and 2 appears below.

3. FLAC was consistently LESS EFFICIENT at compressing the dithered (effectively 16-bit) files, resulting in larger file sizes. As a result, one of the 24-bit files was purposely compressed at FLAC level 2 (versus level 8) to make its file size slightly larger than that of the respective dithered version.

[Of note: the beta-testers wanted me to implement even more than the above to hide the identity of the 16-bit dithered files! I suppose I had more faith in human nature.]
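Here is a rough numpy illustration of what measures 1 and 2 amount to in practice. This is an assumed sketch of the kind of processing described, not the actual tool chain used:

import numpy as np

def trim_start(x, fs, ms=1.0):
    """Drop the first `ms` milliseconds (as done for the Sample B files)."""
    return x[int(fs * ms / 1000.0):]

def trim_end(x, fs, ms=1.0):
    """Drop the last `ms` milliseconds (as done for the Sample A files)."""
    return x[:-int(fs * ms / 1000.0)]

def add_low_level_noise(x, rms_db=-140.0, seed=1):
    """Mix in white noise at the given RMS level (dBFS) to keep the lowest
    bits of the 24-bit container 'live'."""
    rng = np.random.default_rng(seed)
    target_rms = 10.0 ** (rms_db / 20.0)            # -140 dB -> 1e-7
    noise = rng.standard_normal(x.shape)
    noise *= target_rms / np.sqrt(np.mean(noise ** 2))
    return x + noise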

Knowing the above, if one aligns the files and cuts 2 seconds off the front and end (to account for any slight variation in the fades), a null test yields the following amplitude results:
Bozza - La Voie Triomphale
Vivaldi - Recitative & Aria
Bach - Goldberg Aria
As you can see, the null test demonstrates a peak amplitude difference down at the -90dB level (and an average RMS difference down at -98dB) as a result of dithering from 24 to 16 bits. Also, for those who had a peek, you can see the higher noise floor during quiet portions, such as this fade-in in the Bach Goldberg (0.501 seconds in):
24-bit
Dithered to 16-bits
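For the curious, a minimal Python sketch of such a null test (the file names and the use of the soundfile library are my assumptions for illustration):

import numpy as np
import soundfile as sf

a, fs = sf.read("01-Sample A - Bozza.flac")   # hypothetical file names
b, _  = sf.read("02-Sample B - Bozza.flac")

# Undo the deliberate 1 ms offset: Sample A lost 1 ms at its end and
# Sample B lost 1 ms at its start, so drop A's first 1 ms and line them up.
shift = int(round(fs * 0.001))
a = a[shift:]
b = b[:len(a)]

# Ignore 2 s at each end to account for any slight variation in the fades
guard = 2 * fs
diff = a[guard:-guard] - b[guard:-guard]

peak_db = 20 * np.log10(np.max(np.abs(diff)))
rms_db = 20 * np.log10(np.sqrt(np.mean(diff ** 2)))
print(f"Null residual: peak {peak_db:.1f} dBFS, RMS {rms_db:.1f} dBFS")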
The resulting samples were also run through the DR Meter (version 1.1.1) in foobar to ensure that the volume levels were equivalent:

DR         Peak            RMS           Duration Track
--------------------------------------------------------------------------------
DR12     -10.35 dB   -26.90 dB      1:30 05-Sample A - Goldberg Aria
DR12     -10.35 dB   -26.90 dB      1:30 06-Sample B - Goldberg Aria
DR13      -0.17 dB   -17.36 dB      2:00 01-Sample A - Bozza: La Voie Triomphale
DR13      -0.17 dB   -17.36 dB      2:00 02-Sample B - Bozza: La Voie Triomphale
DR14      -4.13 dB   -21.41 dB      2:00 03-Sample A - Vivaldi: Recitative & Aria
DR14      -4.13 dB   -21.41 dB      2:00 04-Sample B - Vivaldi: Recitative & Aria

This also demonstrates that the samples were of good dynamic range - DR12 to 14. No major dynamic range compression, clipping or peak limiting in any of the source material as shown below:
Bozza - La Voie Triomphale
Vivaldi - Recitative & Aria
Bach - Goldberg Aria
These "audiophile" samples should therefore provide a good chance to experience dynamic nuances between 16-bit and 24-bit audio. (Much better than the typical compressed, limited audio of modern rock/pop recordings sold as "high resolution" routinely with <DR10.)

The samples were ZIPped together and distributed in a single file (~200MB in size). My FTP server was the primary download source with secondary download sites at privatebits.net (thanks again Ingemar), Uploaded.net, and FilePost.com.

Here then is the randomization used:

01 - Sample A - Bozza - La Voie Triomphale --- 16-bit
02 - Sample B - Bozza - La Voie Triomphale --- 24-bit
03 - Sample A - Vivaldi - Recitative & Aria --- 24-bit
04 - Sample B - Vivaldi - Recitative & Aria --- 16-bit
05 - Sample A - Goldberg --- 24-bit
06 - Sample B - Goldberg --- 16-bit

The 24-bit original audio files for the test samples are therefore B-A-A.

"Advertising" for this test was done through forum invitations extended to:
A few other smaller forums also received invitations. Invitations included a request for participants NOT to share their findings so as not to influence others, and a warning that this is a 24-bit test, so participants should try to ensure that their equipment (at least the DAC) is capable of >16-bit resolution. In general, participants were dissuaded from just using a direct computer motherboard/laptop output. I visited the invitation threads on occasion and reminded participants of the closing date of June 20, 2014. "Golden eared" audiophiles and those with high-end audio equipment were encouraged to participate. Given the 2-month window, participants were asked not to rush the listening evaluation.

Participant results were collected through an active, paid account on: http://freeonlinesurveys.com/. Cookies were used to prevent double entries from the same computer. Participants were asked to:
1. Identify what they believe to be the 24-bit sample. (Presumably the "better sounding" track.)
2. Identify their level of certainty for each test track, graded on a 5-point scale (1 = "guess", 5 = "certain").
3. Tell me whether an ABX tool or other instantaneous comparison tool was utilized.
4. Provide demographics: gender, age, "musician" background, audio engineering/editing background, audio hardware reviewer status.
5. Describe evaluation hardware: components, cost of equipment.
6. Provide their subjective input: details on the hardware, any surprises in terms of difficulty, and a description of the audible difference (if any).

As suggested by the nature of this test and the data collected, I wished to answer the following questions (as expressed on April 30th on this thread in the Squeezebox forum):

Primary objectives:
1. How "easy" was it for people to detect (or report) a difference?
2. How accurate were the respondents in detecting the 24-bit sample?

It'll be interesting also to have a look at:
1. Which musical piece was it easier to hear a difference in.
2. Whether more expensive gear resulted in more accurate detection.
3. Whether age was a factor (might be hard to generalize unless I can normalize the gear quality).
4. Whether those who felt confident that they got it right actually did. Perhaps a measure of human ability to self-evaluate.
5. Whether there were more successful results from headphones vs. speakers.

Thank you to all the "beta testers" involved before the survey went public! Also, thank you again to all the participants who took the time.


24-bit vs. 16-bit Blind Listening Test Closed...

The day has arrived...



The survey for the blind test ended today! Thank you to everyone who had the patience to take the time to listen to the 3 samples and submit results. A few people admitted to only listening "a few times" but it certainly looks like the majority took the time to seriously listen, and I certainly appreciate the detailed responses provided.

In total, I received 140 responses over the 2 months. Here's the map of the countries with submissions:



As you can see, not unexpectedly we have 3 main "clusters" of input from audiophiles - N. America, Europe, and the Pacific region (Asia + Australia + New Zealand). Then there's the single South African submission :-). The breakdown looks like this:

North America: 36 USA + 12 Canada = 48

Europe: 14 UK + 1 Spain + 6 France + 2 Belgium + 8 Netherlands + 12 Germany + 1 Denmark + 4 Sweden + 4 Norway + 1 Finland + 1 Estonia + 2 Austria + 3 Italy + 7 Croatia + 4 Hungary + 1 Bulgaria + 1 Turkey + 1 Cyprus + 1 Israel = 74

Asia & Oceania: 1 India + 1 China + 1 Taiwan + 2 Malaysia + 3 Australia + 1 New Zealand = 9

Africa: 1 South Africa

Unknown: 8 (for some reason the IP could not be traced to a country; I've seen this with Russian IP addresses)

I didn't work out the per-capita numbers, but 7 from Croatia caught my eye! Nice.

As in the MP3 Blind Test, I'm going to be posting the results over the next week or two in parts. Coming up in the next 24 hours will be a description of the procedure. This will include the "answers" as to which samples were the 24-bit audio. I'll speak about how the files were created as well as the dithering algorithm used. Following this will be the results and then a discussion of the implications of the findings...

Stay tuned!

Part I: PROCEDURE

Thursday, 12 June 2014

REMINDER: 1 Week Left (24-bit vs. 16-bit blind test)

¡Hola amigos!
Greetings from here:
Swimming with the turtles and stingrays off the coast of the Mayan Riviera...
Thought I'd just put up a little reminder that I'll be closing the blind test on June 20th - approximately 1 week from now. At this point, we're up to 120 responses on the survey (muchas gracias).

Although there are always limits to test methodology, and I certainly do not pretend that all variables have been controlled for (indeed, that is impossible in cases like this where testing is done "remotely" over the internet!), I do believe this is a valuable test for the audiophile community. It's an opportunity to expose one's expectations (yes - 24 bits provide ~16 million "levels" vs. the paltry 65,536 "levels" of 16 bits) to reality testing in the comfort of one's home, away from Industry biases, suggestions from audio gurus, and the group expectations that can be set up in a show room or at trade shows. This is about what audio lovers around the world actually hear in the real world...
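For reference, the simple arithmetic behind that parenthetical (my own illustration, using roughly 6 dB of theoretical dynamic range per bit):

for bits in (16, 24):
    levels = 2 ** bits
    dynamic_range_db = 6.02 * bits          # theoretical quantization dynamic range
    print(f"{bits}-bit: {levels:,} levels, ~{dynamic_range_db:.0f} dB")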

Please put in your own response and suggest it to audiophile friends who may want to give this a try before the closing date. Feel free to also put it up on audio(phile) forums you may frequent. Just remember - you better have a system that has >16-bit capability.

Golden ears and those with 5+ figure audio systems - I would really love to have your continued survey response! I would also love to get musicians, sound engineers, and reviewers of audio hardware involved.

As usual, test details including procedure and files can be found here:
http://archimago.blogspot.com/2014/04/internet-test-24-bit-vs-16-bit-audio.html

Talk to you all later - likely after the test end date...