Over two months (April 19 to June 20, 2014), an invitation was extended from this blog (archimago.blogspot.ca) to various "audiophile" forums on the Internet, asking participants to complete an anonymous survey to see whether they could identify which sample of music was the original 24-bit source and which was the same piece of music (exact same mastering) dithered down to 16 bits.
Although the following may seem pedantic, I want to lay out the procedure transparently and in detail so that the nature of this test and the way the data was collected are clear.
The musical samples were taken from freely available sources on the Internet: 2 classical pieces from the Norwegian studio 2L, recorded in high-resolution digital, and 1 from the Open Goldberg Variations. For the purposes of this test, the "high resolution" 24/96 file samples were used directly from those sources (i.e. I did not want to do any manipulation of the data, such as resampling to 48kHz).
Musical samples from 2L (available here):
1. Eugène Bozza - La Voie Triomphale (performed by The Staff Band of the Norwegian Armed Forces): A well-recorded orchestral track originally recorded in DXD (32/352.8).
2. Vivaldi - Recitative and Aria from Cantata RV 679, "Che giova il sospirar, povero core" (performed by Tone Wik & Barokkanerne) - String orchestra with female vocals. Also DXD-recorded originally based on the description from the website.
The third sample is taken from the excellent recent recording off the Open Goldberg Variations. Again, I am using the 24/96 high-resolution download as a starting point:
Due to the size of high-resolution downloads, each sample was limited to 1.5-2 minutes (2 minutes for the 2L samples, 1.5 minutes for the Bach). Some of the more interesting or dynamic portions of each piece were selected. The only processing applied was a fade-in and/or fade-out of <2 seconds at the beginning and/or end of each track so that the excerpts would not start or stop too abruptly. FLAC compression was used to decrease file size.
The dithering process was basic. Using an older version of Adobe Audition (version 3.0.1), a flat triangular dither of 0.5 bits was utilized with settings as shown:
The sample rate was kept at 96kHz. These are very conservative settings, and no advanced options like noise shaping were used, as featured in some of the "better" dithering algorithms like iZotope's MBIT+ or Weiss' POWr, etc. Adobe Audition was then used again to convert the dithered 16-bit data back into a 24-bit container.
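For readers who want to see the idea in code, here is a minimal sketch of flat triangular (TPDF) dithering from 24 to 16 bits, followed by placing the result back in a 24-bit container. It is not what Adobe Audition does internally: the function name `dither_to_16_in_24`, the use of NumPy, and the dither amplitude (two uniform noises of +/- 0.5 of a 16-bit LSB summed together) are my assumptions, and the sketch does not try to reproduce how Audition maps its "0.5 bits" setting onto a noise amplitude.

```python
import numpy as np

def dither_to_16_in_24(samples_24, rng=None):
    """Reduce signed 24-bit integer samples to 16-bit precision with TPDF dither,
    then return them in a 24-bit container (low 8 bits all zero)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(samples_24, dtype=np.int64)
    step = 256  # one 16-bit LSB expressed in 24-bit units (2**8)
    # Triangular PDF noise: sum of two independent uniform noises,
    # each spanning +/- 0.5 of a 16-bit LSB (assumed amplitude).
    tpdf = (rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)) * step
    q = np.round((x + tpdf) / step).astype(np.int64)  # 16-bit sample values
    q = np.clip(q, -32768, 32767)                     # stay inside the 16-bit range
    return q * step                                   # back into the 24-bit container
```

No noise shaping is applied here, which matches the "flat" description above: the added noise is spread evenly across the spectrum.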
The 24-bit and (effective) 16-bit versions were randomly assigned as Sample A or B and files were enumerated 1 to 6 in the final package downloaded by the respondents.
Because this was an "open" test released on the Internet (rather than a listening test in a lab situation where variables could be easily controlled), some measures were implemented to prevent easy differentiation of 24 vs. 16 bit-depth by means other than listening. (Thanks to Wombat for giving me some ideas.)
1. Files 2, 4 and 6 (Sample B of each track) had 1 ms cut off from the start, and files 1, 3 and 5 (Sample A) had 1 ms truncated from the end. This kept the durations of Sample A and B identical but shifted them relative to each other in time (1 ms is 96 samples at 96kHz). Doing this confounded simple null tests that did not take the slight timing offset into consideration.
2. Very low level white noise at -140dB (average RMS power) was mixed into the 16-bit dithered samples (remember, they were placed in 24-bit containers) to affect the LSB, so that a simple program that just checked the bit-depth (by looking for "0" in the least significant bits) would conclude that this was an actual 24-bit resolution file. This small amount of white noise would be inaudible and well below the dithered 16-bit audio noise floor (and below the objective noise floor of actual DACs).
3. FLAC was consistently LESS EFFICIENT at compressing the dithered (effective 16-bit) files resulting in larger file sizes. As a result, one of the 24-bit files was purposely compressed at FLAC level 2 (versus level 8) to make the file size slightly larger than the respective dithered version.
[Of note: the beta-testers wanted me to implement even more than the above to hide the identity of the 16-bit dithered files! I suppose I had more faith in human nature.]
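To put numbers on the second measure: -140dB relative to full scale is 10^(-140/20) = 1e-7 of full scale, while one 24-bit LSB is 2^-23 (about 1.19e-7), so the added noise has an RMS of roughly 0.84 of a 24-bit LSB. One 16-bit LSB, by comparison, is 2^-15 of full scale, or about -90.3dB. The sketch below shows how a naive bit-depth checker of the kind described is fooled by such noise. The function names are hypothetical, and the Gaussian distribution of the noise is my assumption (the text only says "white noise").

```python
import numpy as np

FULL_SCALE = 2**23  # signed 24-bit full scale, in integer units

def effective_bits(samples_24):
    """Naive check: 24 minus the number of low bits that are zero in every sample."""
    x = np.asarray(samples_24, dtype=np.int64)
    combined = int(np.bitwise_or.reduce(np.abs(x)))
    if combined == 0:
        return 0  # pure digital silence
    trailing_zeros = (combined & -combined).bit_length() - 1
    return 24 - trailing_zeros

def add_noise(samples_24, rms_dbfs=-140.0, rng=None):
    """Mix in white noise (assumed Gaussian) at the given RMS level and re-round."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(samples_24, dtype=np.int64)
    sigma = FULL_SCALE * 10 ** (rms_dbfs / 20)  # about 0.84 of a 24-bit LSB
    noisy = np.round(x + rng.normal(0.0, sigma, x.shape))
    return np.clip(noisy, -FULL_SCALE, FULL_SCALE - 1).astype(np.int64)
```

On 16-bit data sitting in a 24-bit container, `effective_bits` reports 16 before the noise is added and 24 afterwards, even though the samples only move by a few 24-bit LSBs.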
Knowing the above, if one were to align the files, cut off 2 seconds from the front and end (to account for any slight variation in the fades), we could run the files through a null test and obtain the following amplitude results:
|Bozza - La Voie Triomphale|
|Vivaldi - Recitative & Aria|
|Bach - Goldberg Aria|
As you can see, the null test demonstrates a peak amplitude difference down at the -90dB level (and an average RMS difference down at -98dB) as a result of dithering from 24 to 16 bits. Also, for those who had a peek, you can see the higher noise floor during quiet portions such as this fade-in portion in the Bach Goldberg (0.501 seconds in):
|Dithered to 16-bits|
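The aligned null test described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used for the plots: the samples are assumed to be already loaded as mono floating-point arrays scaled to +/- 1.0 (file loading is not shown), Sample B is assumed to start 1 ms later in the music than Sample A as per the timing measure above, and `null_test_db` is a hypothetical helper name.

```python
import numpy as np

def null_test_db(a, b, sample_rate=96_000, offset_ms=1.0, trim_s=2.0):
    """Align b (which starts offset_ms later in the music than a), trim both ends,
    and return (peak_db, rms_db) of the difference relative to full scale 1.0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    shift = round(sample_rate * offset_ms / 1000)  # 96 samples at 96 kHz
    a_al = a[shift:]            # drop the 1 ms that b is missing at the start
    b_al = b[:len(b) - shift]   # drop the 1 ms that a is missing at the end
    n = min(len(a_al), len(b_al))
    trim = int(sample_rate * trim_s)  # skip the fades at both ends
    diff = a_al[trim:n - trim] - b_al[trim:n - trim]
    peak = np.max(np.abs(diff))
    rms = np.sqrt(np.mean(diff ** 2))

    def to_db(v):
        return 20 * np.log10(v) if v > 0 else float("-inf")

    return to_db(peak), to_db(rms)
```

Calling the same function with `offset_ms=0.0` mimics a naive null test that ignores the timing offset; the residual is then dominated by the music itself rather than by the dither noise.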
The resulting samples were also run through the DR Meter (version 1.1.1) in foobar to ensure that the volume levels were equivalent:
DR      Peak        RMS         Duration  Track
DR12    -10.35 dB   -26.90 dB   1:30      05-Sample A - Goldberg Aria
DR12    -10.35 dB   -26.90 dB   1:30      06-Sample B - Goldberg Aria
DR13    -0.17 dB    -17.36 dB   2:00      01-Sample A - Bozza: La Voie Triomphale
DR13    -0.17 dB    -17.36 dB   2:00      02-Sample B - Bozza: La Voie Triomphale
DR14    -4.13 dB    -21.41 dB   2:00      03-Sample A - Vivaldi: Recitative & Aria
DR14    -4.13 dB    -21.41 dB   2:00      04-Sample B - Vivaldi: Recitative & Aria
This also demonstrates that the samples were of good dynamic range - DR12 to 14. No major dynamic range compression, clipping or peak limiting in any of the source material as shown below:
|Bozza - La Voie Triomphale|
|Vivaldi - Recitative & Aria|
|Bach - Goldberg Aria|
These "audiophile" samples should therefore provide a good chance to experience dynamic nuances between 16-bit and 24-bit audio. (Much better than the typical compressed, limited audio of modern rock/pop recordings sold as "high resolution" routinely with <DR10.)
The samples were ZIPped together and distributed in a single file (~200MB in size). My FTP server was the primary download source with secondary download sites at privatebits.net (thanks again Ingemar), Uploaded.net, and FilePost.com.
Here then is the randomization used:
01 - Sample A - Bozza - La Voie Triomphale --- 16-bit
02 - Sample B - Bozza - La Voie Triomphale --- 24-bit
03 - Sample A - Vivaldi - Recitative & Aria --- 24-bit
04 - Sample B - Vivaldi - Recitative & Aria --- 16-bit
05 - Sample A - Goldberg --- 24-bit
06 - Sample B - Goldberg --- 16-bit
The 24-bit original audio files for the test samples are therefore B-A-A.
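As a small illustration of how a response can be scored against this key, and of how likely a given score is by guessing alone, here is a sketch. The helper names `score` and `p_at_least` are hypothetical, and the 50% guessing probability per track is an assumption that follows from each track being a two-way choice.

```python
from math import comb

ANSWER_KEY = "BAA"  # the 24-bit original for Bozza, Vivaldi, Goldberg

def score(response, key=ANSWER_KEY):
    """Number of tracks where the respondent picked the 24-bit original."""
    return sum(r == k for r, k in zip(response.upper(), key))

def p_at_least(k, n, p=0.5):
    """Probability of k or more correct out of n by guessing alone (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
```

For a single respondent, getting all 3 tracks right by pure guessing has a probability of `p_at_least(3, 3)`, which is 1 in 8, so individual perfect scores say little on their own; the pooled results across respondents are what matter.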
"Advertising" for this test was done through forum invitations extended to:
A few other smaller forums had invitations advertised as well. Invitations included a request for participants NOT to share their findings so as not to influence others, and a warning that this is a 24-bit test, so participants should try to ensure that their equipment (at least the DAC) is capable of >16-bit resolution. In general, participants were dissuaded from just using a direct computer motherboard/laptop output. I visited the advertisement threads on occasion and also reminded readers of the closure date of June 20, 2014. "Golden eared" audiophiles and those with high-end audio equipment were encouraged to participate. Given the 2-month window, participants were asked not to rush the listening evaluation.
Participant results were collected through an active, paid account on: http://freeonlinesurveys.com/. Cookies were used to prevent double entries from the same computer. Participants were asked to:
1. Identify what they believe to be the 24-bit sample. (Presumably the "better sounding" track.)
2. Rate their level of certainty for each test track on a 5-point scale (1 = "guess", 5 = "certain").
3. Tell me whether an ABX tool or other instantaneous comparison tool was utilized.
4. Provide demographics: gender, age, "musician" background, audio engineering/editing background, audio hardware reviewer status.
5. Describe evaluation hardware: components, cost of equipment.
6. Provide their subjective input: details on the hardware, any surprises in terms of difficulty, and a description of the audible difference (if any).
As suggested by the nature of this test and the data collected, I wished to answer the following questions (as expressed on April 30th on this thread in the Squeezebox forum):
1. How "easy" was it for people to detect (or report) a difference?
2. How accurate were the respondents in detecting the 24-bit sample?
It'll be interesting also to have a look at:
1. Which musical piece made it easiest to hear a difference.
2. Whether more expensive gear resulted in more accurate detection.
3. Whether age was a factor (might be hard to generalize unless I can normalize the gear quality).
4. Whether those who felt confident that they got it right actually did. Perhaps a measure of human ability to self-evaluate.
5. Whether there were more successful results from headphones vs. speakers.
Thank you to all the "beta testers" involved before the survey went public! Also, thank you again to all the participants who took the time.