Monday, 20 May 2013

PROTOCOL: [UPDATED] The DiffMaker Audio Composite (DMAC) Test.

Up to now, I have primarily used a combination of RightMark and the Dunn J-Test for my audio measurements. IMO, the standard procedure I've used thus far isn't bad and can already detect many anomalies in the hardware tested so far within the limits of the test system (i.e. using my E-MU 0404USB as the ADC).

In the days ahead, I am going to start doing some Audio DiffMaker tests where appropriate; it is another freely available tool for the audiophile tester to find out what works, what doesn't, and to identify the difference. If you have not already guessed, some of my motivation in doing these tests is not only to feed my own curiosity, but also to encourage others to understand the tests and technology - and hopefully, in time, to elevate the knowledge base rather than perpetuate unquestioned acceptance of the many senseless audiophile myths out there.

If you peruse the DiffMaker site, it's quite obvious what this program does. It basically takes two recordings of the audio (presumably under 2 conditions or with different hardware), inverts one of them, and sums it with the other to see if the signals "null" each other out. The "magic" of course is in the algorithm used to align the samples in time (including compensating for sample rate drift) and in amplitude. If the recordings are identical, there should be a complete null where the result is silence. The program will create the "null" WAV file to review (very useful) and spit out a number, expressed in dB, reflecting how much "audio energy" is left in the resulting null'ed audio file. The program calls this the "Correlated Null Depth". The higher this value, the deeper the null and the more correlated the 2 samples are (i.e. the "closer" they sound).
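For readers who want to see the basic idea in code, here is a minimal sketch of a null test in Python with NumPy. To be clear, this is not DiffMaker's algorithm: it assumes the two signals are already time-aligned, applies only a single least-squares gain match, and the function name `null_depth_db` and the synthetic test signals are my own illustration. The dB figure is simply the ratio of reference energy to residual energy, which may not be exactly how DiffMaker computes its "Correlated Null Depth".

```python
import numpy as np

def null_depth_db(reference, comparison):
    """Return (residual, depth_db) for two already time-aligned signals.

    Illustrative only: no time alignment or drift compensation is done.
    """
    reference = np.asarray(reference, dtype=np.float64)
    comparison = np.asarray(comparison, dtype=np.float64)
    n = min(len(reference), len(comparison))
    reference, comparison = reference[:n], comparison[:n]

    # Least-squares gain that best matches the comparison to the reference
    gain = np.dot(reference, comparison) / np.dot(comparison, comparison)

    # "Invert and add": subtracting is the same as adding the inverted signal
    residual = reference - gain * comparison

    ref_energy = np.sum(reference ** 2)
    res_energy = np.sum(residual ** 2)
    if res_energy == 0.0:
        return residual, float("inf")  # perfect null: nothing left over
    return residual, 10.0 * np.log10(ref_energy / res_energy)


# Synthetic 1 kHz tone standing in for a real recording (assumption)
rate = 44100
t = np.arange(rate) / rate
original = 0.5 * np.sin(2 * np.pi * 1000 * t)

# A quieter copy with a little added noise, as a stand-in for a second capture
rng = np.random.default_rng(0)
noisy_copy = 0.9 * original + 1e-4 * rng.standard_normal(rate)

_, depth_identical = null_depth_db(original, original)
_, depth_noisy = null_depth_db(original, noisy_copy)
print(depth_identical, round(depth_noisy, 1))
```

Identical inputs null completely, while the noisy copy leaves a residual whose depth is set by the added noise rather than by the level difference, since the gain match removes the latter.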

The beauty of this method is that one is free to use any audio input signal - freed from the need to remain bound to the synthetic test tones which I have been using thus far. The main limitation I have seen with this software so far is memory: it fails on long audio segments, and it also takes a fair bit of computation to get the results. With my 6GB Windows 8 x64 laptop and DiffMaker 3.22 (September 2008), once I go beyond ~35 seconds of 24/96 audio, the program runs into an error condition - presumably a memory issue. Fair enough, I think 35 seconds is adequate to allow a decent comparison.

After a bit of consideration, I decided to create a "composite" audio test signal that I hope represents a reasonable survey of real music that is also challenging enough for a high-end audio system to reproduce. For fun, I've called this audio track the "DiffMaker Audio Composite" (DMAC) Test which I think would be a reasonable test to apply to future evaluations I post on the blog. The DMAC consists of the following 4 tracks - all converted to 24/44kHz. Why, you may ask? Simply because most digital music exists as 44kHz so it's important that this sampling rate be done right, and it is believed by many that 24-bit depth is the major factor lending improvement to hi-res audio quality. The tracks:

Rebecca Pidgeon - "Spanish Harlem" 3:02-3:11 (The Raven, 1994) - 9 seconds taken from the 2009 Bob Katz 15th Anniversary Edition at 24/88. Well known to most audiophiles as a vocal test track... Shakers in the background and such... Good evaluation of the mids.

The Prodigy - "Smack My Bitch Up" 2:13-2:22 (Fat Of The Land, 1997) - 9 seconds of loud and clipped techno/electronica. I applied -2dB to the track to allow extra headroom for the ADC without clipping. Low dynamic range, but intense bass. An example of "modern" mastering efforts. Taken from the CD 16/44.

Rachel Podger & Brecon Baroque - "Concerto In G Minor, BWV 1056: Presto" 00:02-0:10 (J.S. Bach: Violin Concertos, 2010, Channel Classics SACD to 24/88) - 8 seconds of lovely string classical work - good mid-range to highs, nice "microdynamics".

Pink Floyd - "Time" 00:06-00:10 (Dark Side Of The Moon, 1973) - 4 seconds of bells & chimes taken from the start of this track. Quite a lot of high-frequency content, detail in the sound, and channel separation. I used the 2011 24/96 Immersion Box Set remaster.

Between each pair of tracks is a dual burst of 1kHz tone at -4dBFS: two 0.1s bursts separated by 0.1s of silence. This serves as a "beacon" for DiffMaker's alignment algorithm. The trickiest part of this test is temporal alignment, and adding these beacons has significantly improved the consistency of the results for me.
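As a rough illustration, a beacon like this could be generated as follows. This is a sketch under assumptions, not the exact procedure used to build the DMAC file: the burst-silence-burst layout, the reading of -4dBFS as a peak level, and the names `make_beacon`, `RATE` and friends are all mine.

```python
import numpy as np

RATE = 44100        # sample rate of the DMAC track in Hz
TONE_HZ = 1000.0    # beacon tone frequency
LEVEL_DBFS = -4.0   # assumed to be the peak level relative to full scale
SEGMENT_S = 0.1     # length of each burst and of the gap, in seconds

def make_beacon(rate=RATE):
    """Build one beacon: tone burst, silence, tone burst (assumed layout)."""
    n = int(round(SEGMENT_S * rate))            # samples per 0.1 s segment
    t = np.arange(n) / rate
    amplitude = 10.0 ** (LEVEL_DBFS / 20.0)     # dBFS to linear, about 0.631
    burst = amplitude * np.sin(2 * np.pi * TONE_HZ * t)
    silence = np.zeros(n)
    return np.concatenate([burst, silence, burst])

beacon = make_beacon()
print(len(beacon), float(np.max(np.abs(beacon))))
```

The resulting array can be concatenated between the music excerpts before the composite is written out as a WAV file.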

DMAC Waveform:


Vital stats for the 35 second test track:
DR9 (thanks in a large part to the loud compressed Prodigy track). Peak volume: -1.37 / -1.46 dB. Average RMS Power: -27.1 / -26.66 dB.

As with any proposed test, first thing to do is some form of validation.

I. Reliability

Setup: MacBook Pro Decibel --> shielded USB --> TEAC UD-501 (SHARP filter) --> shielded RCA --> E-MU 0404USB --> shielded USB --> Win8 laptop

Although the DMAC track is 24/44, it was recorded back at 24/96 where the E-MU 0404USB functions optimally. I also turned ON compensation for sample rate drift. The rest of the settings are as per default.

Here are 15 runs with the DMAC track played back through my TEAC UD-501, looking at the "correlated null depth" reported by the program as an objective measure. I also had a look at the null waveforms to ensure there were no obvious technical issues. The runs were spaced out over 24 hours to get a sense of the error range by capturing changes in conditions over the course of the day: temperature variation, electrical conditions, and how long the DAC and ADC had been turned on. Interestingly, from what I can tell, the result seemed to vary with ambient temperature. Trials 4-8 were done mid-day, with temperatures going up to ~30 degrees Celsius where I did the tests. Of course, other factors like electrical noise and powerline quality may also have a hand in the variation during that time of day. In general, since I do most of my testing in the evenings, those lower results serve as a reasonable lower extreme for this test. (BTW: I turned the WiFi off on the computers if anyone thinks that makes a difference.)



As you see, there is a range of results (mean = 80.74/79.66, standard dev = 3.88 / 3.89). Remember that because we are measuring the analogue output from the DAC, there will be some noise in the signal - this is an inevitable property of analogue signals especially since I'm re-digitizing it back with the ADC to measure.
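One simple way to use these reference statistics is to flag any later reading that falls well outside the spread of the reference runs. The sketch below uses the first of the two reported means and standard deviations (80.74 and 3.88); the two-standard-deviation threshold, the function name, and the two sample readings are my own assumptions for illustration, not measurements.

```python
def outside_reference_range(value_db, mean_db, sd_db, k=2.0):
    """True if value_db lies more than k standard deviations from the mean."""
    return abs(value_db - mean_db) > k * sd_db

# Reference statistics reported above (first value of each pair)
REF_MEAN_DB = 80.74
REF_SD_DB = 3.88

# Mean +/- 2 SD band (the factor of 2 is an assumed threshold)
lower = REF_MEAN_DB - 2.0 * REF_SD_DB   # 72.98
upper = REF_MEAN_DB + 2.0 * REF_SD_DB   # 88.50

# Hypothetical readings, chosen only to show both outcomes
flag_50 = outside_reference_range(50.0, REF_MEAN_DB, REF_SD_DB)
flag_78 = outside_reference_range(78.0, REF_MEAN_DB, REF_SD_DB)
print(lower, upper, flag_50, flag_78)
```

With only a handful of runs per condition, a band like this is a rough screen rather than a formal statistical test.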

II. Validity

Given the error range above, is it good enough to detect very small changes?

Let's try to measure the following conditions:

1. Adobe Audition 3 Graphic EQ boost of +0.3dB at 16kHz with another EQ boost of +0.3dB at 5kHz. The 16kHz change should be inaudible, and the 5kHz adjustment likewise should be inaudible except maybe to the best young golden ears. I was unable to ABX this EQ change using the Sennheiser HD800 + TEAC UD-501.

2. TEAC UD-501 digital filter set to SLOW. This involves a high frequency roll-off starting north of 15kHz. May be detectable to those with excellent high-frequency hearing but I think for the vast majority of us, this difference is unlikely to pass an ABX test.

3. TEAC UD-501 digital filter set to OFF. This is of course the "NOS" mode for the TEAC. I can quite readily hear the difference in an A-B test. Should not be a problem for the DMAC protocol.

Reminder of the TEAC filter frequency response curves:

Result of test conditions 1-3:

Not bad. Note that I only did 5 runs of each test condition (vs. 15 runs for the DMAC Reference). The Graphic EQ test and especially the "NOS" mode demonstrated significant variance from the Reference results. Setting the digital filter to SLOW hinted at lower correlation depth but remained within the range for the Reference tests suggesting that the DMAC protocol was unable to differentiate this condition (not surprising by the way since musical content drops off significantly up at 15+kHz where the SLOW roll-off operates).

4. Changes due to MP3 encoding. We know lossy encoding changes the bit-perfect nature of the signal. We know ~320kbps is audibly very subtle (as per the test that kicked off this blog). We know that lower bit rates will result in more sonic degradation. Can the DMAC test differentiate MP3 from the lossless and further discriminate different bit rates using LAME 3.99.5 (3 runs each condition, CBR="Constant Bit Rate")?


Nice, it looks like indeed we can! Good correlation between decrease in "correlated null depth" (increasing variance) and lower bitrate for MP3 encoding. The machine isn't fooled by MP3 algorithms :-).

Of course there are other things I can do to demonstrate the validity of this test to show variance... I've done a few other things like varying degrees of EQ changes to demonstrate the correlation which I won't bore you with here.

Summary:

As you can see, it looks like the DMAC Test is quite reliable and can be shown to discriminate differences in audio even down to levels that are very unlikely to be heard by human listeners with the E-MU 0404USB as a measurement device.

A word about tests like this and audibility. Remember that humans listen with a powerful psychoacoustic "filter". The ear has significant physiological limitations. For example, we are especially sensitive to the 1-5kHz audio spectrum and quickly lose sensitivity to frequencies higher up - have a look at the Fletcher-Munson curves. Secondly, psychoacoustic effects like simultaneous and temporal masking render certain details inaudible. This is part of the "magic" of lossy encoding algorithms - allowing software to throw out quite a lot of data/details while maintaining excellent audio quality. (Interestingly, the DiffMaker program does have an "ARM-468 weighted energy" setting which may be closer to human perception but I have not tried it yet.)

The results of tests like this one I believe can be used for correlation of the sonic output to demonstrate variance between signals (which is of course the intent of the software developers). However, because the machine does not have the psychoacoustic mechanism of humans, the results can never directly correlate with what is being heard subjectively. A good example is the similar score between the digital filter OFF (NOS) condition and MP3 192kbps. They both score around 50dB in "correlated null depth", but I would argue the MP3 encoding changes the sound significantly less than removing the digital filter (ie. the effect from a NOS DAC). In an AB test, I can detect a "dulling" of the high frequencies on tracks like the Prodigy sample with the digital filter turned off whereas the MP3 sounds less 'colored'.

One more thing about using the "Correlated Null Depth" value. What I'm showing here is all based on the measurements off my equipment using the E-MU 0404USB, TEAC UD-501 DAC, and procedure/settings I'm using. This means it's only useful for my test purposes and cannot be generalized otherwise. The measured value itself of course will fluctuate and, from time to time, I'm going to need to readjust the reference score based on hardware changes.

I look forward to incorporating this test with the others in the days ahead...

------
Addendum: Curious to see the difference between Reference null and what happens without a digital filter (ie. "NOS mode" on the TEAC)?

The following is what a high quality null WAV output looks like (~85dB) - "Spectral Frequency View" where the X-axis is time and Y-axis is frequency with the color representing amplitude at that specific frequency (blue/dark = low amount, red/bright = high):

Here's the TEAC UD-501 in "NOS mode" with digital filter turned off:

Impressive amount of variance. Also note the amount of high frequency content being recorded above 20kHz without the filter in place!

UPDATE: June 4, 2013
As requested, I've posted the DMAC test file as described above for download. Remember, this is like the "snips" of audio I posted months ago for the MP3 test. Although copyrighted material has been used to construct this test file, I believe it is being utilized as "fair use" for the purpose of education / demonstration...

 http://filepost.com/files/6mfe4c74/Archimago_-_DMAC_(24-44).zip

9 comments:

  1. I am not in the least surprised you found temperature dependency which is virtually unavoidable with electronics, unless good compensation circuits are present that is.
    Most likely the differences seen are not frequency (or BW) related but only amplitude related. Possibly caused by the reference voltage(s) in the DAC and/or ADC.

    Even though I know this temperature dependency pretty well (I do tests for temperature dependency of certain analog electronics almost daily) I never linked that to audio.
    AFAIK you are the first one to show this in plots.

    I like this test idea.
    Perhaps a higher quality ADC and faster PC are to be considered.

    1. Doubt a PC would make a difference other than speed up the calculations on the DiffMaker program :-).

      The heat making a difference I think you're right. During the day that I was testing, it got quite warm here in Vancouver - with the gear sitting beside a large frosted glass window, temperatures probably ranged from low 20's to possibly low 30C thru the day... Also can't tell which piece of gear was most affected by the heat - DAC or ADC.

  2. I have used Diffmaker myself. You're wise to include the short bursts to help Diffmaker sync up. Still, I find it sometimes doesn't fully time align without getting somewhat corrupted by level differences.

    I found that when running AD and DA testing like yours, even with locked clocks, the timing between the AD and DA drifted enough to show up. Also the levels of the AD and DA would undergo very small short term variations as well. Testing one run vs another with the same equipment should give only the noise floor of your equipment. That it doesn't means something else is varying. That something being levels and timing. Diffmaker isn't totally successful fixing either though it does a good job.

    A worthwhile short section to add to your testing composite would allow you to see when timing is the primary reason for residuals in the difference signal. Make a test signal with two sine waves one octave apart; I like to use 3 kHz and 6 kHz. Look at the FFT of the difference signal between two runs. If the residuals are the result of timing differences, the residual levels of the two tones will differ by 6 dB with the higher frequency tone having the higher level. If the residuals are from level differences only, the two tones will have residuals at the same level. Once the contributions of timing and level get within 10 dB of each other you will see something between a zero and a 6 dB difference.

    I also have found using Audacity to normalize the two files being compared will often result in Diffmaker giving a correlated null depth several decibels deeper. It works best if you normalize the two signals to a very small change in their existing average level. I believe by pre-normalizing the signal it lets Diffmaker do a cleaner job with time alignment or delay compensation.

    I don't know if you are locking the clocks of your AD and DA unit. If you do, and were say comparing two analog cables of different length, that length can affect the timing, which sometimes Diffmaker successfully fixes and sometimes doesn't. 100 picoseconds is about the time the signal takes to travel around an inch and an eighth. Obviously a meter vs two meter cable will alter the timing of the waveform sampling at the AD end by quite a bit.

    Good work here, glad to see someone blogging about these techniques.

    1. Thanks for the suggestions Dennis. Unfortunately I don't have the ability to lock the AD/DA clocks with this gear... Hence the "beacon beeps" to hopefully get them to sync up every 10 seconds or so as you noted.

      I'll have to look into that idea with dual sine waves! Great! ;-)

  3. I mentioned this in the comments of one of your other blogs. For anyone who looks to using Diffmaker, sawtooth waves seem to be the waveform it can time align the easiest and most reliably. So my suggestion would be to replace the 1 kHz beacon beeps with 1 kHz sawtooth beeps.
