I received an invitation from Archimago to write something about volume control. While I am of the opinion that digital and analog volume controls can coexist to achieve an ideal gain stage, this article is mainly about PCM digital volume control.
The basic conclusion about digital volume control is this: as long as the playback device has a higher bit depth than the source file, it is possible to reduce the volume of the file losslessly until the playback device's bit-depth limit is reached. For example, with an ideal 24-bit device, it is possible to play back a 16-bit file 48dB lower without losing quality, because each bit is worth about 6dB of dynamic range (the exact formula relating bit depth n to the theoretical dynamic range of a full-scale sine wave is 6.02n + 1.76 dB).
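The bit-depth arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and it assumes the textbook full-scale-sine formula quoted above. It also works in the other direction and gives the number of bits needed for a given dynamic range.

```python
import math

def dynamic_range_db(bits):
    """Theoretical SNR of a full-scale sine wave quantized to `bits` bits."""
    return 6.02 * bits + 1.76

def lossless_attenuation_db(device_bits, source_bits):
    """Attenuation available before the source's LSB falls below the device's LSB."""
    return 20 * math.log10(2 ** (device_bits - source_bits))

def bits_needed(range_db):
    """Smallest bit depth whose theoretical dynamic range reaches `range_db`."""
    return math.ceil((range_db - 1.76) / 6.02)

print(f"16-bit dynamic range: {dynamic_range_db(16):.2f} dB")
print(f"16-bit file on an ideal 24-bit device: {lossless_attenuation_db(24, 16):.2f} dB of headroom")
print(f"bits needed for 70 dB: {bits_needed(70)}")
```

Note that the exact 8-bit headroom is 20*log10(256) ≈ 48.16dB, which is slightly more than the rounded "48dB" figure.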
No DAC is ideal, however. Remember that state-of-the-art DACs these days actually "only" have about 21 bits of dynamic range at the analog output. Moreover, even within the digital domain, applying 48dB of attenuation to a 16-bit signal and sending it directly to a 24-bit output will introduce some error. For example, here are the results for an original 16-bit 1kHz sine wave (green) with -48dB of gain applied in foobar2000, Adobe Audition and Reaper, converted directly to 24-bit output:
|Comparison of -48dB undithered gain (ie. volume reduction) using foobar, Adobe Audition, and Reaper compared to 16-bit dithered version.|
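One source of this error is easy to show. Moving a 16-bit sample into a 24-bit container adds exactly 8 bits (a factor of 256, about 48.16dB). Only a gain that is an exact power of two therefore lands every sample on the 24-bit grid. The Python sketch below compares an exact 8-bit shift with a -48dB gain. The helper name is hypothetical, and this is not how any of the three programs is implemented. The -48dB gain leaves a fractional remainder that has to be rounded (or truncated) away, and without dither that remainder follows the signal, so it shows up as distortion rather than noise.

```python
def to_24bit_lsbs(sample16, gain):
    """Apply `gain` to a 16-bit integer sample and express it in 24-bit LSBs."""
    return sample16 * 256 * gain   # 256 = 2**(24 - 16)

exact_shift = 1 / 256              # -48.16dB: a pure 8-bit shift
minus_48db = 10 ** (-48 / 20)      # -48.00dB: not a power of two

for s in (1, 1000, 32767):
    shifted = to_24bit_lsbs(s, exact_shift)   # always a whole number: nothing to round
    scaled = to_24bit_lsbs(s, minus_48db)     # fractional: must be rounded to fit 24 bits
    remainder = scaled - round(scaled)        # the part that rounding throws away
    print(s, shifted, round(scaled, 4), round(remainder, 4))
```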
Of course, such artifacts can be avoided by using dither. foobar2000 does not support dithering at 24 bits; this is understandable, as it is an audio player rather than a production tool. As for DAW software, Audition, Reaper and even Audacity can internally convert the 16-bit signal to a higher bit depth (such as 32 bits) and then produce dithered 24-bit output.
|-48dB attenuation to 16-bit signal. Output from Audition and Reaper at 24-bits with dither.|
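The dithered results follow the textbook approach, which can be sketched as TPDF (triangular) dither: two independent uniform random values of up to ±0.5 LSB each are summed and added just before rounding to 24 bits. This Python example is a generic illustration, not the specific dither algorithm used by Audition or Reaper. The function name and the test value are assumptions. It feeds in a 1-LSB 16-bit sample attenuated by 48dB, which is about 1.019 LSB at 24 bits. Plain rounding always returns 1 and loses the fraction, while the dithered output averages out close to the true value.

```python
import random

FULL_SCALE_24 = 2 ** 23  # 24-bit signed full scale, in LSBs

def quantize_24bit(x, rng, dither=True):
    """Quantize a float sample (1.0 = 0dBFS) to a signed 24-bit integer code.

    With dither=True, TPDF dither (the sum of two independent uniform values
    between -0.5 and +0.5 LSB) is added before rounding, so the rounding error
    becomes signal-independent noise instead of distortion.
    """
    v = x * FULL_SCALE_24
    if dither:
        v += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return max(-FULL_SCALE_24, min(FULL_SCALE_24 - 1, round(v)))

rng = random.Random(1)                 # fixed seed for a repeatable demo
x = (1 / 32768) * 10 ** (-48 / 20)     # 1-LSB 16-bit sample after -48dB gain
n = 100_000
plain = sum(quantize_24bit(x, rng, dither=False) for _ in range(n)) / n
dithered = sum(quantize_24bit(x, rng) for _ in range(n)) / n
print(plain, dithered)
```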
So bennetng, do you mean digital volume controls can never achieve perfection and should never be used?
No, in fact it is the opposite. People who refuse to use digital volume control can actually be experiencing lower fidelity and higher distortion, not only on their systems, but such a mindset can even affect how recorded music is produced and distributed.
Archimago talked about intersample overload last year with a "malicious" test signal at +8dBTP (dB True Peak). iZotope has published a "proof" that intersample peaks can be arbitrarily high. However, these are extreme test signals, so why bother?
Want to have a look at the "true peaks" of your own audio files? While other software can do this, let's do it for free with foobar2000. It has two bundled resamplers: SSRC and PPHS. SSRC has a long history that can be traced back to the pre-2000 Winamp era. PPHS first appeared in 2004. However, my favourite resampler is the SoX-based foo_dsp_resampler because it is very fast. Below are screenshots showing how to enable true peak scanning.
|foobar Advanced settings.|
|Turn on true peak scanning.|
|foobar ReplayGain playlist and scan.|
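The Python sketch below shows what a "true peak" scan measures. It is not how foobar2000 or SoX implement it. The code reconstructs the band-limited waveform between samples with a plain truncated sinc sum at 4x oversampling and reports the largest absolute value. The test signal is the classic fs/4 sine with a 45° phase offset, scaled so that every sample sits at ±1.0 (0dBFS) while the waveform between the samples peaks about 3dB higher. The function names, block length and edge margin are illustrative assumptions. Real meters use short oversampling FIR filters instead of this slow sum.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def true_peak_estimate(samples, oversample=4, margin=128):
    """Estimate the inter-sample ("true") peak of a block of samples.

    Evaluates the ideal reconstruction sum(x[n] * sinc(t - n)) on a grid
    `oversample` times finer than the original sample rate. The sinc is
    truncated to the block, so `margin` samples at each edge are skipped.
    """
    peak = 0.0
    for i in range(margin * oversample, (len(samples) - margin) * oversample):
        t = i / oversample
        y = sum(x * sinc(t - n) for n, x in enumerate(samples))
        peak = max(peak, abs(y))
    return peak

# fs/4 sine with a 45-degree phase offset, normalized so the samples hit +/-1.0
samples = [math.sin(math.pi / 2 * n + math.pi / 4) / math.sin(math.pi / 4)
           for n in range(512)]

sample_peak_db = 20 * math.log10(max(abs(s) for s in samples))
true_peak_db = 20 * math.log10(true_peak_estimate(samples))
print(f"sample peak: {sample_peak_db:+.2f} dBFS, true peak: {true_peak_db:+.2f} dBTP")
```

The sample peak should come out at about 0dBFS and the true peak at about +3dBTP. A file with a sample peak of 0dBFS can therefore still overload a DAC's reconstruction filter.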
2x upsampling:
SoX normal: 0:31
SoX best: 0:43
PPHS ultra: 9:36
Intuitively, some may suspect that a super-fast resampler like SoX "must" be of lower quality and unreliable.
We can have a look at Infinite Wave, a well-known (but not frequently updated) SRC benchmarking website, which uses single-tone sweep tests for comparison. If you select the SoX graphs, you will see that its performance is in fact excellent compared to many other resamplers.
RightMark Audio Analyser's IMD swept frequency test is even more complex and can be used as an excellent indicator of SRC quality. The underlying signal looks like this:
Compared to the single tone sweep of Infinite Wave, we can expect "double trouble" if a resampler's quality is bad!
So, here is an experiment. Let's take a 32-bit float 44.1kHz RMAA test signal, upsample it to 76543Hz (a prime number, so the conversion ratio cannot be reduced to a convenient integer ratio) and downsample it to 44.1kHz again. (SSRC is disqualified here since it does not support 76543Hz.)
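A quick way to see why 76543Hz is a harsh target: a rational (polyphase) resampler converting from 44.1kHz needs integer up/down factors L/M with output rate = input rate × L/M. The Python sketch below reduces that ratio with `fractions.Fraction`. The helper name is hypothetical, and the sketch does not describe how SoX or PPHS work internally. For 96kHz the ratio collapses to a modest 320/147. For a prime like 76543 nothing cancels, so any resampler has to treat it as a practically arbitrary ratio.

```python
from fractions import Fraction
from math import gcd

def resampling_ratio(src_hz, dst_hz):
    """Smallest integer factors (L, M) such that dst_hz = src_hz * L / M."""
    ratio = Fraction(dst_hz, src_hz)
    return ratio.numerator, ratio.denominator

print(resampling_ratio(44100, 96000))  # 44.1k -> 96k
print(resampling_ratio(44100, 76543))  # 44.1k -> 76.543k
print(gcd(76543, 44100))               # no common factor at all
```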
Check out the results comparing the original signal with PPHS and SoX resampling:
|An atypical upsampling experiment - conversion to 76.543kHz in foobar...|
As we can see, the original signal and SoX (best setting) basically overlap in the IMD sweep graph, resulting in the same numerical amount of distortion (0.00001%). This means the distortion levels are below RMAA's measurable limit. SoX (normal) is not only much faster, it also has much lower distortion than PPHS. That said, these differences would be impossible to detect in analog loopback tests.
Let's give SSRC a chance with a 44.1k --> 96k --> 44.1kHz test. Apart from some tiny differences above 19.5kHz due to the 95% bandwidth setting in SoX, the differences between SSRC and SoX (best) are below RMAA's measurable limit. Again, the great benefit is that SoX (best) remains much faster than SSRC: ~4.5x the speed! A nice example of the importance of software implementation.
|Comparison of SSRC with SoX (44.1 --> 96 --> 44.1kHz resampling).|
|ReplayGain results - RHCP's Californication & Metallica's Death Magnetic.|
Now take a look at this Gradius Ultimate Collection album:
|Example Gradius soundtrack True Peak.|
It is necessary to understand that the intersample headroom in a DAC can also be considered a form of digital volume control. Aftermarket DACs and digital audio devices with built-in SRC (sample rate conversion) differ in their filter characteristics and in the thresholds at which these intersample peaks clip (remember that there are also "special" DACs that may be NOS, i.e. non-oversampling, may implement atypical filters like MQA, or advertise unique characteristics like Chord's million-tap filter). Some may oversample with a fixed amount of attenuation to provide some intersample headroom [Ed: such as the -4dB I added to the "Goldilocks" setting], but this could unnecessarily reduce dynamic range and would be unfair to music genres like chamber and new age, which are less likely to ever exhibit intersample clipping.
Unless we are using a standalone device like a disc player, in this era of easily accessible computing power we usually have other means to adjust digital volume and avoid clipping before audio data reaches the DAC. For example, many software players offer DSP features like upsampling and EQ in the floating-point domain, which can generate real (not intersample) sample values above 0dBFS that then have to be handled by clip protection. Realize that even if we don't use any DSP features, some forms of clipping may still show up despite using digital volume control and/or having intersample headroom in the DAC. Let's take a look at one possible example...
Take a look at this track:
|"Hey, Guys!" from A Little Snow Fairy Sugar soundtrack.|
However, apart from the original FLAC, all the lossy files still have positive peak values (above 0dBFS)! What does that mean? The peaks are no longer "inter" sample; they are real sample data, not estimates. I converted the lossy files to 32-bit float and examined them in Adobe Audition. As we can see, the sample points are indeed beyond 0dBFS.
Modern lossy codecs are based on complex algorithms involving floating-point math. However, everything outside of the playback software uses fixed-point (integer) formats, either 24 or 32-bit. Unless we have specialized playback software + a specialized device driver + a specialized hardware controller + a specialized DAC chip, every peak above 0dBFS will be clipped or dynamically compressed at some point after leaving the playback software. To avoid this, use a volume management system within the playback software such as ReplayGain, and/or the playback software's built-in volume control, with protocols like WASAPI exclusive or ASIO to prevent the Windows mixer from hijacking the playback software's volume control. ReplayGain is always applied internally and cannot be hijacked. Even MPC-HC, which is not audio-specialized, encourages internal volume attenuation. In the era of commercial streaming music services, there is really no excuse for audio software and apps not to implement a floating-point-compliant processing pipeline, both when preparing masters and during playback.
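The difference between clipping at the fixed-point boundary and attenuating first in floating point can be sketched in Python. The decoded sample values and the -1.5dB gain are hypothetical numbers chosen for illustration, and `float_to_int24` is a made-up helper that mimics a hard-clipping conversion to 24-bit integer. It does not model any particular driver or DAC.

```python
FULL_SCALE_24 = 2 ** 23  # 8388608: magnitude of the most negative 24-bit code

def float_to_int24(x):
    """Convert a float sample (1.0 = 0dBFS) to a signed 24-bit integer code.

    Anything outside the integer range is hard-clipped, which is what happens
    once over-0dBFS float data reaches a fixed-point stage.
    """
    q = round(x * FULL_SCALE_24)
    return max(-FULL_SCALE_24, min(FULL_SCALE_24 - 1, q))

# Hypothetical lossy-decoder output: two samples exceed 0dBFS (|x| > 1.0).
decoded = [0.5, 1.12, -1.05, 0.98]

# Option 1: hand the floats straight to a 24-bit stage -> the overs are clipped.
clipped = [float_to_int24(x) for x in decoded]

# Option 2: attenuate in float first (a hypothetical -1.5dB, ReplayGain-style),
# then convert -> every sample fits and the relative levels are kept.
gain = 10 ** (-1.5 / 20)
attenuated = [float_to_int24(x * gain) for x in decoded]

print(clipped)
print(attenuated)
```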
[Ed: This is an important point and worth considering when thinking about the sound quality of streaming services... Even if lossy bitrate is the same, one can wonder about the quality of the encoder used, and whether the playback app handles peaks properly as Bennet suggested above.]
"it is often recommended to keep your limiter ceiling at -0.5 or -1 dB to prevent clipping from a potential .mp3 or AAC conversion later"
"Mastering against -3dBFS"
"If your master is louder than -14 dB integrated LUFS, make sure it stays below -2 dB TP (True Peak) max to avoid extra distortion."
To see what will happen, I am going to use Spotify's suggestion and this song as an example:
A loudness of -18 - (-6.63) = -11.37 LUFS (the -18 LUFS ReplayGain reference minus a track gain of -6.63dB) can hardly be called a loudness-war master compared with Californication and Death Magnetic. However, it is louder than -14 LUFS, so it should be reduced to -2dBTP. We can immediately notice that the -2dBTP version has a lower bitrate despite using the same FLAC codec and compression settings. Of course, FLAC is an integer format, so reducing volume without padding to a higher bit depth means throwing away data, even when dither is applied. In this case, the file is attenuated by 3.93dB to reach -2dBTP, and 32768*10^(-3.93/20) ≈ 20843.
That means the available 16-bit values (-32768 to 32767) in the "lossless" master are reduced to +/-20843 just to accommodate people who don't use software volume control/management. Would it then make sense to release a 24-bit version to preserve the lower bits and sell it at a higher price with a bloated file size?! A profitable plan, perhaps!
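Here is the same arithmetic as a short Python sketch, for anyone who wants to redo it. The only input is the 3.93dB attenuation quoted above. The sketch also expresses the loss in bits, assuming the usual ~6.02dB-per-bit relationship.

```python
import math

attenuation_db = 3.93                 # gain reduction used to reach -2dBTP
scale = 10 ** (-attenuation_db / 20)  # linear gain, about 0.636

new_peak = 32768 * scale              # largest magnitude left in the 16-bit master
bits_lost = attenuation_db / (20 * math.log10(2))  # 20*log10(2) ~ 6.02 dB per bit

print(f"peak code ~ +/-{new_peak:.0f}, about {bits_lost:.2f} bits of range unused")
```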
To see whether anything else I own is mastered in a similar way, here are some real examples from my collection:
Whether or not this phenomenon is a direct consequence of "fear of clipping", turning potential clipping into actual clipping, or permanently wasting integer headroom to lower the file level without improving perceived dynamic range, is never a good thing.
To end this lengthy rant in a happier way, let's appreciate how elegant a floating-point data format is. Here is a result of the volume fade from -150dBFS to -infinity, stored as 32-bit float using WavPack:
|32-bit floating point fade from -150dBFS.|
|32-bit integer fade from -150dBFS with dither.|
|32-bit integer fade from -150dBFS without dither.|
Yes, with the formula 10^(-150/20)*2^31, there are "only" +/-68 quantization steps in 32-bit integer files at -150dBFS. All good modern software players need to adjust volume internally in at least 32-bit float, if not 64-bit, and convert to 24 or 32-bit integer only at the end of the processing pipeline, when they need to communicate with the device driver.
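The Python sketch below reproduces the 68-step figure and contrasts it with 32-bit float. IEEE 754 single precision keeps a 24-bit significand at every exponent, so its relative precision stays the same however quiet the signal gets. The test sample value is arbitrary, and `to_float32` is a made-up helper that uses the standard `struct` module to round-trip a value through 32-bit float.

```python
import struct

def to_float32(x):
    """Round-trip a Python float through IEEE 754 single precision (32-bit float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

level = 10 ** (-150 / 20)        # -150dBFS as a linear amplitude (1.0 = 0dBFS)
int32_steps = level * 2 ** 31    # int32 quantization steps spanned by that amplitude

x = 0.123456789 * level          # an arbitrary sample value about 18dB below -150dBFS
err_float32 = abs(to_float32(x) - x) / x               # relative error stored as float32
err_int32 = abs(round(x * 2 ** 31) / 2 ** 31 - x) / x  # relative error stored as int32

print(f"int32 steps at -150dBFS: {int32_steps:.1f}")
print(f"relative error: float32 {err_float32:.1e}, int32 {err_int32:.1e}")
```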
Thanks Bennet! I was not expecting that :-).
Quite the whirlwind of experiments in volume control, trials of resamplers, discussions on bit-depth, multi-format lossy encoding differences, intersample overloading / "true peaks" detection and the differences between fixed-point/integer and floating point math and how this affects signal resolution! My sense is that if you, dear readers, can follow all of what Bennet ran us through above, you're way ahead of the vast majority of audiophiles out there (and I bet also ahead in the understanding of how digital audio works compared to the majority of audiophile reviewers :-).
Over the years, I have been asked by a number of readers and on forums about digital volume control. I know some audiophiles maintain a bias against digital volume manipulation (just like some complain about any kind of DSP being applied!). I agree with Bennet... Months ago, I spoke of the benefits of 24-bit high-resolution DACs and mentioned one of them being the ability to use digital volume control in a perceptually "lossless" fashion which is reiterated here in even greater detail with other nuances to keep in mind like the benefits of floating point processing pipelines in playback software.
Remember that these days, good high-resolution DACs are plentiful. We can quite easily measure excellent resolution with a noise floor well beyond 16 bits. And when we look at modern computer software players and the high DSP precision available (JRiver 64-bits, Roon 64-bits, Audirvana licences 64-bit iZotope algorithms, HQPlayer 64/80-bits...), with all the calculations done internally before being reduced to 24/32-bit integer to feed the DAC, unless there's a software bug or a driver issue, one can be quite confident that there's really nothing to worry about with reputable programs like these. Note that I'm not trivializing this by saying that any software that claims to process with 32 or 64 bits "must" be good. There are many technical details that need to be considered (as discussed in the ProTools 48-bit Mixer white paper Bennet linked to above). Much still rests on the implementation of the software, just like the differences we saw above when resampling with SoX compared to the others.
One final and essential point. I assume we are all human beings reading this article :-). As "mere mortals", just how many dBs of dynamic range do we think we "need" anyway to experience the pleasures of "high fidelity" audio? Mitch Barnett (aka mitchco on Computer Audiophile / Audiophile Style) published an experiment on himself back in 2013 where he demonstrated his audibility threshold at around -70dB using some dynamic rock tracks. I tried a similar test a while back and I scored about the same. I highly recommend all "golden ears" try the experiment themselves. You might be humbled at just how many "bits" of resolution you really can appreciate. In fact, peripheral evidence such as the fact that so many audiophiles find the limitations of vinyl playback to not be an issue adds to this realization that maybe we don't need "that many bits" after all. Remember that only some of the very best LPs can even surpass 70dB worth of dynamic range (70dB can be encoded in "only" 12 bits digitally).
Suppose we hooked a device like a 9-year-old Squeezebox Touch to an amp that's set to play extremely loudly and we had to use the Touch's digital volume control to get the sound to a comfortable level. Let's give ourselves some margin and say 80dB is all the dynamic range we really need. This means that even a device like the old Squeezebox Touch, measured to be capable of ~105dB dynamic range, can accommodate 25dB of digital attenuation before I suspect many of us would notice a substantial quality loss. (Recently, I received a PM asking about the SB Touch in fact...) A higher-resolution DAC with an even lower noise floor will allow one to attenuate even further while maintaining excellent quality. I see Resonessence published a similar discussion and comparison with analogue volume control a while back.
Of course, I'm not recommending that audiophiles should compromise on sound quality; by all means, try to maintain >100dB potential resolution through the whole hi-fi system! I'm just suggesting that we remain realistic about what we "need" without going extremely overboard with claims. As I discussed before, technical "transparency" is what I seek as an audiophile aiming for high-fidelity from my audio system(s). As humans, we might be surprised that given our inherent biological limitations, transparency can be achieved with DACs and even amplifiers without great difficulty these days.
As per Bennet's article, be aware of the limitations of 24-bit integer vs. 32+-bit floating point. Digital volume processing (like ReplayGain) can be a very good thing in the playback pipeline. Know that there are nuances like "true peaks" and "intersample overs" in the world of digital production and reproduction that could worsen playback quality if unaccounted for. But remember that there are things even more important like the distortions of our transducers (speakers and headphones), comparatively lower-than-hi-res-DAC resolution from our amplifiers, limitations of room acoustics, and of course, a reasonable level of humility around possessing "golden ears"...
Enough for this week. Thanks again Bennet!
Go enjoy the music!
BTW: For a nice discussion of theoretical limits and bitdepth resolution, check out John Siau's post this week at Mark Waldrep's blog. Remember, consider the discussion items in context and think about how "real world" (or simply idealistic!) some of the examples are...