Saturday 8 June 2019

GUEST POST: Why We Should Use Software Volume Control / Management by Bennet Ng. (Plus discussions on resampling options, true peaks, etc...)


I received an invitation from Archimago to write something about volume control. While I am of the opinion that digital and analog volume controls can coexist to achieve an ideal gain stage, this article is mainly about PCM digital volume control.

The basic conclusion regarding digital volume control is that as long as the playback device has a higher bit-depth than the source file, it is possible to losslessly reduce the volume of a file until the playback device's bit-depth limit is reached. For example, with an ideal 24-bit device, it is possible to play back a 16-bit file 48dB lower without losing quality, because one bit has about 6dB of dynamic range (the exact formula relating bit-depth n to dynamic range in dB is 6.02*n + 1.76).
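
As a rough sketch of the arithmetic above (the function and variable names are illustrative only, not taken from any tool mentioned in this article), the ideal dynamic range and the resulting attenuation headroom can be computed like this:

```python
import math

def ideal_dynamic_range_db(bits):
    """Ideal dynamic range of an n-bit quantizer, using 6.02*n + 1.76."""
    return 6.02 * bits + 1.76

# Each extra bit doubles the number of levels: 20*log10(2) is about 6.02 dB.
db_per_bit = 20 * math.log10(2)

# Headroom for attenuating a 16-bit file on an ideal 24-bit device.
headroom_db = ideal_dynamic_range_db(24) - ideal_dynamic_range_db(16)

print(f"{db_per_bit:.2f} dB per bit")
print(f"16-bit: {ideal_dynamic_range_db(16):.2f} dB")
print(f"24-bit: {ideal_dynamic_range_db(24):.2f} dB")
print(f"headroom: {headroom_db:.2f} dB")
```

The 8 extra bits give roughly 48dB of room, which is where the figure in the example comes from.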

No DAC is ideal, however. Remember that state of the art DACs these days actually "only" have about 21 bits of dynamic range at the analog output. Moreover, even within the digital domain, applying 48dB of attenuation to a 16-bit signal and converting it directly to 24-bit output will result in some error. For example, here are some results of an original 1kHz sine wave, 16-bit (green), with -48dB gain using foobar2000, Adobe Audition and Reaper, directly converted to 24-bit output:

Comparison of -48dB undithered gain (ie. volume reduction) using foobar, Adobe Audition, and Reaper compared to 16-bit dithered version.
As you can see, all files with -48dB applied exhibit artifacts after volume reduction. While artifacts at these levels will be masked by analog noise of DACs, and inaudible anyway unless we filter the main 1kHz tone and amplify the residue noise by 50dB or so, such an operation cannot be called lossless.
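
One likely contributor to these artifacts can be sketched in a few lines. This is a minimal illustration assuming plain undithered rounding onto the 24-bit grid; the three programs may well differ internally, and the helper name is hypothetical. A gain of exactly 2^-8 (about -48.16dB) lands every 16-bit value on a 24-bit integer, while -48dB does not:

```python
import math

def attenuate_16_to_24(sample_16, gain):
    """Put a 16-bit sample on the 24-bit scale, apply gain, round to integer.

    Returns (quantized_value, rounding_error_in_24bit_LSBs).
    Assumption: simple round-to-nearest, no dither.
    """
    promoted = sample_16 * 256            # same level expressed on a 24-bit scale
    exact = promoted * gain               # ideal, generally non-integer result
    quantized = math.floor(exact + 0.5)   # undithered rounding
    return quantized, exact - quantized

gain_shift = 2.0 ** -8                    # exactly 8 bits, about -48.16 dB
gain_48db = 10 ** (-48 / 20)              # -48 dB, not a power of two

samples = range(-32768, 32768)
max_err_shift = max(abs(attenuate_16_to_24(s, gain_shift)[1]) for s in samples)
max_err_48db = max(abs(attenuate_16_to_24(s, gain_48db)[1]) for s in samples)

print(f"max rounding error at 2^-8 gain: {max_err_shift} LSB")
print(f"max rounding error at -48 dB gain: {max_err_48db:.3f} LSB")
```

The rounding error in the -48dB case depends on the sample value, so it is correlated with the signal rather than behaving like benign noise.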

Of course, such artifacts can be avoided by using dither. foobar2000 does not support dithering at 24-bits, which is understandable as it is an audio player rather than a production tool. DAW software such as Audition and Reaper, and even Audacity, can internally convert the 16-bit signal to a higher bit-depth (like 32-bits) and then produce 24-bit dithered output.

-48dB attenuation to 16-bit signal. Output from Audition and Reaper at 24-bits with dither.
If we are really picky, we can see that the yellow noise level is slightly higher than the red one suggesting that the amount of dithering added to the signal is slightly higher in one program over the other; this small difference of course does not necessarily indicate better or worse quality. Dithering is a process that adds randomization and how this is done varies depending on the software. By definition, digital volume control is not "bit-perfect" once we add dithering which results in a cleaner signal as seen above. Some of you probably have seen ESS's presentation on digital volume control in 2011. Here is also a white paper from an old version of Pro Tools in 2005 which is much more detailed than ESS's presentation.
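
To make the idea of dither concrete, here is a minimal sketch using triangular (TPDF) dither, which is one common choice and not necessarily what Audition or Reaper use. A constant level of 0.3 LSB is simply lost by plain rounding, but survives on average once dither is added before rounding:

```python
import math
import random

def quantize(value_lsb):
    """Round a value expressed in LSBs to the nearest integer step."""
    return math.floor(value_lsb + 0.5)

def tpdf_dither(rng):
    """Triangular (TPDF) dither spanning +/-1 LSB: sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

rng = random.Random(2019)   # fixed seed so the run is repeatable
level = 0.3                 # hypothetical signal level, smaller than one step
n = 20000

plain = [quantize(level) for _ in range(n)]
dithered = [quantize(level + tpdf_dither(rng)) for _ in range(n)]

mean_plain = sum(plain) / n
mean_dithered = sum(dithered) / n
print(f"undithered mean: {mean_plain:.3f} LSB")
print(f"dithered mean:   {mean_dithered:.3f} LSB")
```

The price is a slightly higher noise floor, which matches the small rise in noise level seen in the dithered plots.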

So bennetng, do you mean digital volume controls can never achieve perfection and should never be used?

No, in fact it is the opposite. People who refuse to use digital volume control can actually be experiencing lower fidelity and higher distortion, not only on their systems, but such a mindset can even affect how recorded music is produced and distributed.

Archimago talked about intersample overload last year with a "malicious" test signal at +8dBTP (dB True Peak). iZotope has published a "proof" that intersample peaks can be arbitrarily high. However, these are extreme test signals, so why bother?
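
A classic small-scale illustration of an intersample peak: a sine at one quarter of the sample rate, sampled 45 degrees off its crests, has every sample at exactly full scale while the underlying waveform peaks about 3dB higher. The sketch below estimates the true peak with a Hann-windowed sinc interpolator at 4x oversampling. That interpolator is my own simplification for illustration; it is not the filter used by foobar2000, SoX, or any DAC.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def true_peak_db(samples, oversample=4, half_width=32):
    """Estimate true peak in dB via Hann-windowed sinc interpolation.

    Assumption: a simple illustrative interpolator, not a standardized meter.
    """
    peak = 0.0
    n = len(samples)
    # Skip the edges, where the interpolation kernel would be truncated.
    for i in range(half_width, n - half_width):
        for k in range(oversample):
            t = i + k / oversample
            acc = 0.0
            for j in range(i - half_width + 1, i + half_width + 1):
                d = t - j
                window = 0.5 * (1 + math.cos(math.pi * d / half_width))
                acc += samples[j] * sinc(d) * window
            peak = max(peak, abs(acc))
    return 20 * math.log10(peak)

# fs/4 sine with a 45 degree phase offset, normalized so samples hit full scale.
scale = math.sin(math.pi / 4)
signal = [math.sin(math.pi * n / 2 + math.pi / 4) / scale for n in range(256)]

sample_peak_db = 20 * math.log10(max(abs(s) for s in signal))
estimated_tp_db = true_peak_db(signal)
print(f"sample peak: {sample_peak_db:+.2f} dBFS")
print(f"true peak:   {estimated_tp_db:+.2f} dBTP")
```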

Want to have a look at the "true peaks" of your own audio files? While there is other software capable of this task, let's do it for free with foobar2000. It has two bundled resamplers: SSRC and PPHS. SSRC has a long history and can be traced back to the pre-2000 Winamp era. PPHS first appeared in 2004. However, my favourite resampler is the SoX-based foo_dsp_resampler because it is very fast. Below are screenshots showing how to enable true peak scanning.

foobar Advanced settings.
Turn on true peak scanning.
foobar ReplayGain playlist and scan.
Here are some benchmarks of ReplayGain scanning of about 10 hours' worth of music:
No upsampling:
     0:17 (17 seconds)
2x upsampling:
     SoX normal 0:31
     SoX best 0:43
     SSRC 3:15
     PPHS 1:17
     PPHS ultra 9:36
Intuitively, some may suspect a super fast resampler like SoX "must" be of lower quality and unreliable.

We can have a look at Infinite Wave, a well-known (but not frequently updated) SRC benchmarking website, which uses single tone sweep tests for comparison. If you select the SoX graphs, you will see that the performance is in fact excellent compared to many other resamplers.

RightMark Audio Analyser's IMD swept frequency test is even more complex and can be used as an excellent indicator of SRC quality. The underlying signal looks like this:


Compared to the single tone sweep of Infinite Wave, we can expect "double trouble" if a resampler's quality is bad!

So, here is an experiment. Let's take a 32-bit float 44.1kHz RMAA test signal, upsample it to 76543Hz (prime number, ensuring the impossibility of integer resampling) and downsample it to 44.1kHz again. (SSRC is disqualified here since it does not support 76543Hz.)
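
To see why 76543Hz is an awkward target, it helps to reduce the sample rate ratio to lowest terms. A small sketch (the helper name is illustrative):

```python
from fractions import Fraction
from math import gcd

def resampling_ratio(source_hz, target_hz):
    """Return the exact target/source ratio in lowest terms."""
    return Fraction(target_hz, source_hz)

for target in (88200, 96000, 76543):
    ratio = resampling_ratio(44100, target)
    print(f"44100 -> {target}: ratio {ratio.numerator}/{ratio.denominator}, "
          f"gcd = {gcd(44100, target)}")
```

For 88200Hz the ratio is a plain 2/1 and for 96000Hz it reduces to 320/147, but for 76543Hz nothing cancels, so the resampler cannot fall back on a small integer up/down factor.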

An atypical upsampling experiment - conversion to 76.543kHz in foobar...
Check out the results comparing the original signal with PPHS and SoX resampling:


As we can see, the original signal and SoX (best setting) are basically overlapped in the IMD sweep graph, resulting in the same numerical amount of distortion (0.00001%). It means the distortion levels are below RMAA's measurable limit. SoX (normal) is not only much faster, it also has much lower distortion than PPHS. These results are impossible to differentiate in analog loopback tests.

Let's give SSRC a chance with a 44.1k --> 96k --> 44.1kHz test. Apart from some tiny differences above 19.5kHz due to the 95% bandwidth setting in SoX, the differences between SSRC and SoX (best) are below RMAA's measurable limit. Again, the great benefit is that SoX (best) remains much faster than SSRC, at ~4.5x the speed! A nice example of the importance of software implementation.
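
The speed ratio can be checked against the ReplayGain scan timings listed earlier. This sketch assumes those timings are in minutes:seconds and that the ~4.5x figure refers to them rather than to a separate measurement:

```python
def to_seconds(stamp):
    """Convert a 'minutes:seconds' string such as '3:15' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)

# Scan times quoted earlier for about 10 hours of music.
timings = {
    "No upsampling": "0:17",
    "SoX normal": "0:31",
    "SoX best": "0:43",
    "PPHS": "1:17",
    "SSRC": "3:15",
    "PPHS ultra": "9:36",
}

ssrc_vs_sox_best = to_seconds(timings["SSRC"]) / to_seconds(timings["SoX best"])
print(f"SSRC / SoX best: {ssrc_vs_sox_best:.2f}x")
```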

Comparison of SSRC with SoX (44.1 --> 96 --> 44.1kHz resampling).
Another advantage of the SoX plugin is flexibility. While SSRC's quality is high, it has a very steep (over 99%) fixed bandwidth which tends to over-read intersample peak values for typical DACs, if the audio files have such extreme frequency contents.

ReplayGain results - RHCP's Californication & Metallica's Death Magnetic.
Here are some results of two "loudness war" albums. While the dBTP values are not particularly high (+1.53dBTP in Californication and +2.29dBTP in Death Magnetic), the suggested Album Gain values are -12.49dB and -13.86dB. foobar2000 uses -18LUFS as reference, which means these two albums have integrated loudness of -18 + 12.49 = -5.51LUFS and -18 + 13.86 = -4.14LUFS respectively.
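
The relationship between a ReplayGain value and integrated loudness is simple enough to sketch. The gain is whatever brings the material to the reference level, so loudness = reference - gain (the function name here is just for illustration):

```python
REFERENCE_LUFS = -18.0   # foobar2000 ReplayGain reference, as stated in the text

def integrated_loudness(gain_db, reference=REFERENCE_LUFS):
    """Loudness implied by a ReplayGain value: gain = reference - loudness."""
    return reference - gain_db

albums = {"Californication": -12.49, "Death Magnetic": -13.86}
for title, gain in albums.items():
    print(f"{title}: {integrated_loudness(gain):.2f} LUFS")
```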

Now take a look at this Gradius Ultimate Collection album:


Example Gradius soundtrack True Peak.
While the suggested Album Gain is much lower, a lot of tracks have more than +3.5dBTP, with the highest one at +5.08dBTP! I checked the file with Adobe Audition and found the interpolated waveform indeed has a similar peak value. I also used CUETools to make sure the tracks are accurately ripped. Also, none of the tracks in this album have high amounts of baked-in, consecutively clipped samples and I cannot hear any inherent distortion in this album. That means intersample overs are not necessarily related to the loudness war and poor mastering. Of course, I cannot hear any "ringing" either; I am not a child or teenager.

It is necessary to understand that the intersample headroom in a DAC can also be considered as a form of digital volume control. Aftermarket DACs and digital audio devices with built-in SRCs (samplerate conversion) exhibit differences in filter characteristics and clipping thresholds of these intersample peaks (remember that there are also "special" DACs that can be NOS, may implement atypical filters like MQA, or advertise unique characteristics like Chord's million tap filter). Some may oversample with a fixed amount of attenuation to provide some intersample overhead [Ed: such as the -4dB I added to the "Goldilocks" setting], but this could unnecessarily reduce dynamic range and would be unfair to music genres like chamber and new age as they are less likely to ever exhibit intersample clipping.

Unless we are using a standalone device like a disc player, in this era with computing power easily accessible, we usually have other means to adjust digital volume to avoid clipping before audio data reaches the DAC. For example, many software players offer DSP features like upsampling and EQ in the floating point domain and will generate real, non-intersample data above 0dBFS which are addressed with clip protection. Realize that even if we don't use any DSP features, some forms of clipping may still show up despite using digital volume control and/or having intersample headroom in the DAC. Let's take a look at one possible example...

Take a look at this track:

"Hey, Guys!" from A Little Snow Fairy Sugar soundtrack.
The algorithm thinks it is not as loud as Californication and Death Magnetic (track gain only -10.6dB), but it has a slightly higher true peak of +2.51dBTP. I then ran the file through some lossy codecs. As we can see, there is almost no difference in track gain but there are some deviations in dBTP. In the case of Opus, it resamples to 48kHz internally. To make sure the built-in SRC is not the sole reason for the higher dBTP, I also ran the file through SoX to bypass the built-in SRC. With true peak scan disabled (no upsampling), the dBTP label is misleading as it is now dBFS.

However, apart from the original FLAC, all lossy files still have positive peak values! What does it mean? The peaks are no longer "inter" sample, they are real sample data, not estimated. I converted the lossy files to 32-bit float and examined them with Adobe Audition. As we can see, the sample points are indeed beyond 0dBFS.


Modern lossy codecs are based on complex algorithms involving floating point math. However, everything outside of the playback software uses fixed-point (integer) formats, either in 24 or 32-bit. Unless we have specialized playback software + a specialized device driver + a specialized hardware controller + a specialized DAC chip, every peak with a positive value will be clipped or dynamically compressed at some point after leaving the playback software. To avoid this, use volume management systems within the playback software like ReplayGain, and/or use the built-in volume control of the playback software, with protocols like WASAPI exclusive or ASIO to prevent the Windows mixer from hijacking the playback software's volume control. ReplayGain is always internal and cannot be hijacked. Even the non-audio specialized MPC-HC encourages internal volume attenuation. In the era of commercial streaming music services, there is really no excuse for audio software/apps to not implement a floating point compliant processing pipeline when preparing masters and during playback.
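
Here is a minimal sketch of why attenuating in the floating point domain matters. The decoded sample values are hypothetical (chosen so that the largest is roughly +2.5dB over full scale), the -10.6dB gain is the track gain quoted above, and the conversion routine is a simplification of what a real player or driver does:

```python
FULL_SCALE = 2 ** 23   # 24-bit signed integer scale, 1.0 float = 0dBFS

def db_to_gain(db):
    return 10 ** (db / 20)

def to_int24(sample):
    """Convert a float sample to 24-bit integer with hard clipping.

    Returns (integer_value, was_clipped).
    """
    scaled = round(sample * FULL_SCALE)
    clipped = max(-FULL_SCALE, min(FULL_SCALE - 1, scaled))
    return clipped, clipped != scaled

# Hypothetical float output of a lossy decoder; some samples exceed 0dBFS.
decoded = [0.5, 0.9, 1.335, -1.2]

# Case 1: hand the samples to the integer output untouched.
clipped_direct = [to_int24(s)[1] for s in decoded]

# Case 2: apply the ReplayGain track gain in floating point first.
gain = db_to_gain(-10.6)
clipped_with_gain = [to_int24(s * gain)[1] for s in decoded]

print("clipped without gain:", clipped_direct)
print("clipped with -10.6 dB applied first:", clipped_with_gain)
```

Once the samples have been clipped at the integer boundary, turning the volume down further along the chain cannot bring the lost peaks back.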

[Ed: This is an important point and worth considering when thinking about the sound quality of streaming services... Even if lossy bitrate is the same, one can wonder about the quality of the encoder used, and whether the playback app handles peaks properly as Bennet suggested above.]

MPC-HC warning...
Yes, I know what you are thinking. "I don't use lossy formats and don't care about it, right?!" Take a look at these links:
iZotope RX - SRC and peak levels 
Scroll to bottom:
"it is often recommended to keep your limiter ceiling at -0.5 or -1 dB to prevent clipping from a potential .mp3 or AAC conversion later"
AES Paper: 0dBFS+ Levels in Digital Mastering 
Page 11:
"Mastering against -3dBFS"
Spotify for Artists FAQ  
"If your master is louder than -14 dB integrated LUFS, make sure it stays below -2 dB TP (True Peak) max to avoid extra distortion."

To see what will happen I am going to use Spotify's suggestion and this song as an example:

Katamari Mambo.


-18 + 6.63 = -11.37LUFS can hardly be called loudness war if we compare it with Californication and Death Magnetic. However, it is louder than -14LUFS, so it should be reduced to -2dBTP. We can immediately notice the -2dBTP version has a lower bitrate despite using the same FLAC codec and compression settings. Of course, FLAC is an integer format, so reducing volume without padding to a higher bit-depth means throwing away data, even when dithering is applied. In this case, the file is reduced by 3.93dB to achieve -2dBTP. Let's copy and paste the following line into Google:

10^(-3.93/20)*2^15

That means the available 16-bit values (-32768 to 32767) in the "lossless" master are reduced to +/-20843 to make it compatible with people who don't use software volume control/management. Would it then make sense to release this as a 24-bit version to preserve the lower bits and sell it at a higher price with a bloated file size?! Profitable plan perhaps!
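
The same calculation in code, together with how much of the 16-bit range goes unused as a result. The function name is illustrative:

```python
import math

def scaled_peak(reduction_db, bits=16):
    """Largest sample magnitude left after attenuating a full-scale integer file."""
    return 10 ** (-reduction_db / 20) * 2 ** (bits - 1)

peak = scaled_peak(3.93)
bits_unused = math.log2(2 ** 15 / peak)

print(f"peak magnitude: about {peak:.1f}")
print(f"unused range: about {bits_unused:.2f} bits")
```

A bit under two-thirds of a bit is given up across the whole album just to stay under -2dBTP.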


To see if I have something else mastered in a similar way, here are some real examples in my collection:


Whether this phenomenon is a direct consequence of "fear of clipping" or not, translating potential clipping to absolute clipping, or wasting integer headroom permanently to accommodate the file level without improving perceived dynamic range is never a good thing.

To end this lengthy rant in a happier way, let's appreciate how elegant a floating-point data format is. Here is a result of the volume fade from -150dBFS to -infinity, stored as 32-bit float using WavPack:

32-bit floating point fade from -150dBFS.
Here is the same thing stored as 32-bit fixed integer, with dither:

32-bit integer fade from -150dBFS with dither.
...and without dithering:

32-bit integer fade from -150dBFS without dither.

Yes, with the formula 10^(-150/20)*2^31, there are "only" +/-68 quantization steps in 32-bit integer files from -150dBFS. All good modern software players need to adjust volume internally at least in 32-bit float, if not 64, and translate to 24 or 32-bit integer at the end of the processing pipeline, when it needs to communicate with the device driver.
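
A small sketch comparing the two formats at -150dBFS. It uses Python's struct module to round a value to IEEE 754 single precision; the variable names are illustrative:

```python
import struct

def as_float32(value):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

amplitude = 10 ** (-150 / 20)      # -150dBFS relative to 1.0 full scale

# 32-bit integer: how many quantization steps the peak spans.
int_steps = amplitude * 2 ** 31
int_relative_step = 1 / int_steps  # one step as a fraction of that peak

# 32-bit float: relative error after storing the same amplitude.
float_relative_error = abs(as_float32(amplitude) - amplitude) / amplitude

print(f"32-bit integer: about {int_steps:.1f} steps, "
      f"each {int_relative_step:.2%} of the peak")
print(f"32-bit float relative error: {float_relative_error:.2e}")
```

The float keeps its 24-bit mantissa precision regardless of level because the exponent absorbs the attenuation, whereas the integer format has a fixed step size.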

-----------------------

Archimago's Postscript:

Thanks Bennet! I was not expecting that :-).

Quite the whirlwind of experiments in volume control, trials of resamplers, discussions on bit-depth, multi-format lossy encoding differences, intersample overloading / "true peaks" detection and the differences between fixed-point/integer and floating point math and how this affects signal resolution! My sense is that if you, dear readers, can follow all of what Bennet ran us through above, you're way ahead of the vast majority of audiophiles out there (and I bet also ahead in the understanding of  how digital audio works compared to the majority of audiophile reviewers :-).

Over the years, I have been asked by a number of readers and on forums about digital volume control. I know some audiophiles maintain a bias against digital volume manipulation (just like some complain about any kind of DSP being applied!). I agree with Bennet... Months ago, I spoke of the benefits of 24-bit high-resolution DACs and mentioned one of them being the ability to use digital volume control in a perceptually "lossless" fashion which is reiterated here in even greater detail with other nuances to keep in mind like the benefits of floating point processing pipelines in playback software.

Remember that these days, good high-resolution DACs are plentiful. We can measure excellent resolution with low noise floor beyond 16-bits quite easily. And when we look at modern computer software players and the high DSP precision available (JRiver 64-bits, Roon 64-bits, Audirvana licences 64-bit iZotope algorithms, HQPlayer 64/80-bits...), for all the calculations done internally before being reduced to 24/32-bits integer to feed the DAC, unless there's some kind of software bug or driver issues, one can be quite confident that there's really nothing to be worried about with reputable programs like these. Note that I'm not trivializing this by saying that any software that claims to process with 32-bits or 64-bits "must" be good. There are many technical details that need to be considered (as discussed in the ProTools 48-bit Mixer white paper Bennet linked to above). Much still rests with the implementation of the software just like the differences we saw above with resampling using SoX compared to the others.

One final and essential point. I assume we are all human beings reading this article :-). As a "mere mortal", just how many dB's of dynamic range do we think we "need" anyway to experience the pleasures of "high fidelity" audio? Mitch Barnett (aka mitchco on Computer Audiophile / Audiophile Style) published an experiment on himself back in 2013 where he demonstrated his audibility threshold at around -70dB using some dynamic rock tracks. I tried a similar test awhile back and I scored about the same. I highly recommend all "golden ears" to try the experiment themselves. You might be humbled at just how many "bits" of resolution you really can appreciate. In fact, peripheral evidence such as the fact that so many audiophiles find the limitations of vinyl playback to not be an issue adds to this realization that maybe we don't need "that many bits" after all. Remember that only some of the very best LP's can even surpass 70dB worth of dynamic range (70dB can be encoded in "only" 12-bits digitally).

Suppose we hooked a device like a 9 year old Squeezebox Touch to an amp that's set to play extremely loudly and we had to use the Touch's digital volume control to get the sound to a comfortable level. Let's give ourselves some margin and say 80dB is all the dynamic range we really need. This means that even a device like the old Squeezebox Touch, measured to be capable of ~105dB dynamic range can accommodate 25dB of digital attenuation before I suspect many of us would notice a substantial quality loss. (Recently, I received a PM asking about the SB Touch in fact...) A higher resolution DAC with even lower noise floor will allow one to attenuate even further while maintaining excellent quality. I see Resonessence published a similar discussion and comparison with analogue volume control awhile back.
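
The back-of-envelope numbers in the last two paragraphs can be sketched as follows, reusing the ideal 6.02*n + 1.76 formula quoted in Bennet's article. The function names are illustrative:

```python
import math

def attenuation_budget_db(dac_dynamic_range_db, needed_dynamic_range_db):
    """Digital attenuation available before dropping below the needed range."""
    return dac_dynamic_range_db - needed_dynamic_range_db

def bits_for_dynamic_range(db):
    """Smallest bit-depth whose ideal dynamic range (6.02*n + 1.76) covers db."""
    return math.ceil((db - 1.76) / 6.02)

print(f"Squeezebox Touch budget: {attenuation_budget_db(105, 80)} dB")
print(f"bits needed for 70 dB: {bits_for_dynamic_range(70)}")
```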

Of course, I'm not recommending that audiophiles should compromise on sound quality; by all means, try to maintain >100dB potential resolution through the whole hi-fi system! I'm just suggesting that we remain realistic about what we "need" without going extremely overboard with claims. As I discussed before, technical "transparency" is what I seek as an audiophile aiming for high-fidelity from my audio system(s). As humans, we might be surprised that given our inherent biological limitations, transparency can be achieved with DACs and even amplifiers without great difficulty these days.

As per Bennet's article, be aware of the limitations of 24-bit integer vs. 32+-bit floating point. Digital volume processing (like ReplayGain) can be a very good thing in the playback pipeline. Know that there are nuances like "true peaks" and "intersample overs" in the world of digital production and reproduction that could worsen playback quality if unaccounted for. But remember that there are things even more important like the distortions of our transducers (speakers and headphones), comparatively lower-than-hi-res-DAC resolution from our amplifiers, limitations of room acoustics, and of course, a reasonable level of humility around possessing "golden ears"...

Enough for this week. Thanks again Bennet for the excellent work!

Go enjoy the music!

BTW: For a nice discussion of theoretical limits and bitdepth resolution, check out John Siau's post this week at Mark Waldrep's blog. Remember, consider the discussion items in context and think about how "real world" (or simply idealistic!) some of the examples are...

16 comments:

  1. Hi Archimago,

    Thanks for your invitation and proofreading. In fact, it is the longest English article I ever wrote. As a software volume control/management user for 13 years I never clip a single file when doing serious listening tests.

    Maybe Bob Stuart should publish a study about audibility of (a)typical filters in Gradius music :-)

    1. Thanks again Bennet!

      You did a marvelous job for the longest English article you've ever written :-). It is great to see the experiments and thought that went into this and just the fund of knowledge "out there" among the audiophiles!

      Take care and looking forward to even more discussions and your thoughts ahead...

  2. Great article although I didn't understand everything at first read. :-)

    One of my doubts about modern DACs is digital volume control. I am leaning toward the ADI-2 DAC as my future DAC purchase but I have doubts about its volume control. As far as I know it has a special hybrid volume control with hardware reference levels, but my main source is CD. I don't want to lose too many bits from 16. RME's user manual says nothing about upconverting the 16-bit source, so I don't know if its volume control manipulates the original 16-bit signal or upconverts to a higher bit-depth first. I would prefer the second approach. Folks on the RME forum sometimes use a passive attenuator at the DAC's output if the output level is too low. But it also depends heavily on the power amp's input sensitivity.

    1. Hi and thanks Sipi,

      The specs say this:

      https://www.rme-audio.de/en/products/adi_2-dac.php#9
      Output level switchable +19 dBu, +13 dBu, +7 dBu, +1 dBu @ 0 dBFS

      IMO these 4 levels are cleverly selected and will be compatible with most amps without using too much digital attenuation.

      As for the worries about losing bits... RME is not that dumb, their hardware volume control is safe to use for playing 16-bit lossless CD content. Just make sure the playback software is correctly configured, for example, use their native ASIO driver as output.

    2. Hi Sipi,
      Yup, Bennet is right. IMO, the hybrid approach RME uses in my experience with the ADI-2 Pro is excellent. It was designed to maintain as high a dynamic range as possible for high resolution for a wide output level. 16-bit playback would be trivial with this system and you will not lose any resolution. Not sure why anyone would use a passive attenuator rather than drop the output level further.

      BTW, this hybrid system is fantastic for measurements on the ADC side as it maintains high SNR whether the input level is +4dBu (~1.2Vrms) or +24dBu (~12.3Vrms). By using the analogue setting, one can maintain a relatively uniform SNR and "look deeper" into the signal at the lower settings when the input signal is soft.

  3. This sort of thinking enters "angels dancing on the head of a pin" territory fairly quickly.

    Between here and Audio Science Review, it is plain that a lot of "audiophiles" spend way too much time straining to discover differences at the cost of listening to the damned music! I applaud your work in this regard, because it assists the understanding that listening to gear is not listening to music. Which was kinda the point to begin with.

    1. Thanks jsrtheta,

      You're right. Listen to some good music, stop worrying about this, worrying about that!

    2. Well said jsrtheta,
      The resolution we have these days allows us to talk about things which are very much esoteric and of interest really only to the "hard core" audiophiles and audio DIY guys who want to squeeze the most out of the soft/hardware playback systems.

      It speaks to the maturity of the technology and I do hope that in discussing this, there is even less "mystery" that some audiophiles feel needs to be "solved".

  4. Thanks for a very interesting and enlightening post. I admit to adjusting volume control the traditional way because I understood that to do so digitally was 'wrong'. I guess the only thing wrong is if it adversely affects what you hear.

    As a thought, for me fidelity is a function of volume. I would never listen to music at 48dB less than a level that gave me a desired level of enjoyment. For a start, I'm pretty sure that transducers (e.g. speakers) don't respond linearly across input levels, and I'm also pretty sure that rooms don't either. Or to put it another way, if I'm listening at a level of dB a lot less than I normally would, I'm not listening for fidelity, it's background music at that point.

    Maybe I've raised a mute point.

    1. Well, this article is aimed at encouraging people to use software volume control and explain what is happening behind the scene. Readers themselves will make decisions so talking about right or wrong here is unimportant.

      The -48dB demo is just an illustration and proof of concept. If I don't use a high amount of digital attenuation, artifacts will be minimal on that screenshot.

    2. Hey Unknown,
      It's certainly true that low SPL playback will not give us full appreciation of the nuances in the music... In the real world, our ambient noise levels at home will be something like 30dBA if we have a reasonably quiet sound room. Closed/noise reduction headphones will give us better isolation.

      Even with a good "reference" volume, of say 85dB peaks, that's "only" 55dB over ambient noise of 30dB SPL. This I think puts into perspective again just how much potential dynamic range we have at our disposal whether 16-bits or 24-bits digital (this in itself says something about whether in the real world 24-bit hi-res is likely or unlikely to be audible).

      Yeah, it's not about "right" or "wrong" when the margins are so wide! So long as one doesn't hear a problem... There is no problem :-).

      Nonetheless, it's best to be aware of and maintain optimal technique and settings when we can. Hence the suggestions and thoughts Bennet shared.

  5. As I had said previously I'm playing (using both English meanings of the verb) with DSD.
    I couldn't believe that the volume of my Audiolab MDAC+ could work when the DAC is converting DSD (obtained on the fly by JRiver Media Center).
    But it does work! and I don't understand how.
    ESS doesn't provide any information on that. I have found a seminar held by them where they explained how the noise level obtained digitally can be better than the noise level obtained by most potentiometers.
    Anyway: it works! when I control the volume by the MDAC+ music has more contrast, more dynamics.

    1. Hi, Teodoro,

      You may see how John Siau describes an ESS DAC:

      https://www.audiosciencereview.com/forum/index.php?threads/review-and-measurements-of-benchmark-dac3.3545/post-90540

      So it has a lot of bits to enable DSD volume control perhaps?

  6. This is very interesting!

    Somewhere far back in the murkiness of my mind, I have always felt that splitting the volume level between the source and the amplifier might be a better choice. That might be due to the old saying, that amplifiers generally sound their best with the volume knob at around 12 o'clock. That seems to not be the only reason this sounds better.

    I am no tech geek, but I just now adjusted my Oppo settings from Fixed output level to Variable, and then turned down the volume from 100 to 55, all the while turning up the amplifier volume accordingly. The CD sounds much better this way!

    Thank you :)

  7. This comment has been removed by the author.

  8. https://blog.szynalski.com/2009/11/an-audiophiles-look-at-the-audio-stack-in-windows-vista-and-7/

    Win7 has started to use 32-bit

    https://archimago.blogspot.com/2019/06/guest-post-why-we-should-use-software.html?m=1

    foobar2000 is even just 16 bit?
