Back in 2019 I wrote the article "Why We Should Use Software Volume Control / Management", which addressed not only intersample overs but also floating-point-induced clipping, which cannot be solved by having intersample headroom in the DAC. (For a quick listening test, try the files in this thread on ASR.) However, it seems that some people are still not happy with the proposed solutions (e.g. ReplayGain) and resort to other methods. For example, in some of the recent blog comments someone mentioned BitShiftGain.
Here is a snippet from the BitShiftGain website:
Digital audio is like some crystalline structure: it’s fragile, brittle, and suffers tiny fractures at the tiniest alterations. There’s almost nothing you can do in digital audio that’s not going to cause some damage. But as long as you stick to 6 dB steps and rigidly control the implementation (BitShiftGain doesn’t even store the audio in a temporary variable!), you can chip away at that least significant bit, and the whole minutes-or-hours-long crystalline structure of digital bits can remain perfectly intact above it.
This suggests that digital gain adjustments can only be performed in 6dB steps or quantization error will occur. That is incorrect: the volume steps can in fact be much smaller. I have prepared some tools to prove it; please download the archive below:
https://1drv.ms/u/c/3eddeece58efc1fc/EdDtxFZWNcZKvz0VvB6FcxABkuHexDUFO-kX4K_26XoSEw
The archive contains three files: 6b8b.cmd, 16b24b.cmd and RG-snap.html.
The proof also involves the SoX command line tool. Most search engines will show the very dated version on SourceForge, but here is an ongoing discussion with some recent builds:
https://hydrogenaudio.org/index.php/topic,128033.0.html
Put 6b8b.cmd in the same location as sox.exe and double-click it to run. The commands will generate some noise-like test files. Here are brief explanations of the commands:
sox -n -c2 -b8 "test.flac" synth 4 sin 0.125 remix 1v0.7 1i vol 0.95 dither -p6
Generates a reference test file, which is a 6-bit signal in an 8-bit container. Note that because this command involves dither, the resulting reference file will be different every time it is generated.
sox "test.flac" "76percent.flac" -D vol 0.76
Sets the volume to 76% and saves as 8-bit flac, "-D" disables automatic dither.
sox "test.flac" "75percent.flac" -D vol 0.75
Same as above but using 75% volume.
Now listen to test.flac and compare it with 76percent.flac and 75percent.flac. The distortion is clearly audible in 76percent.flac but not in 75percent.flac.
We can see the distortion as an irregular-sized step in the low bit-depth file. For example, here's a portion cropped between 0.25-0.9s where we can see the distortion in "76percent.flac" subtly highlighted:
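The irregular step can also be reproduced numerically. The following is a minimal sketch, not SoX's actual code: it assumes that with dither disabled the scaled value is simply rounded to the nearest integer, and the function names `scaledLevels` and `stepSizes` are made up for this illustration. It scales every 6-bit level (stored in the 8-bit container) and lists the distinct distances between neighbouring output levels.

```javascript
// Apply a gain of num/den to every 6-bit level stored in an 8-bit container
// (the 6-bit value multiplied by 4), then round to the nearest integer.
// Assumption: the converter rounds to nearest when dither is disabled.
function scaledLevels(num, den) {
  const out = [];
  for (let x = -32; x <= 31; x++) {
    out.push(Math.round((x * 4 * num) / den));
  }
  return out;
}

// Collect the distinct distances between neighbouring output levels.
function stepSizes(levels) {
  const sizes = new Set();
  for (let i = 1; i < levels.length; i++) {
    sizes.add(levels[i] - levels[i - 1]);
  }
  return [...sizes].sort((p, q) => p - q);
}

console.log(stepSizes(scaledLevels(75, 100))); // one step size only
console.log(stepSizes(scaledLevels(76, 100))); // two different step sizes
```

At 75% every step has the same size, so the staircase keeps its shape. At 76% a few steps come out one unit larger than the rest, which is the irregular-sized step described above.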
To understand why, open RG-snap.html and set the bit-depth to 2.
Web browsers offer a tool called Developer Console where users can perform some math operations using JavaScript. For example in Firefox it can be accessed by the F12 key on the keyboard.
Attenuation bit-depth of 2 means there are 4 levels of attenuation - 25%, 50%, 75%, 100%.
[a] A 6-bit signal can be represented in a range of -32 to 31, and an 8-bit signal in a range of -128 to 127. Therefore, after a 2-bit shift (* 4) and a multiplier of 0.75, the result (31 * 4 * 0.75 = 93) is an exact 8-bit integer. In [b], on the other hand, the result with a multiplier of 0.76 (31 * 4 * 0.76 = 94.24) is not an integer, and this explains the distortion at 76%.
[c] In fact, the math is even simpler if we do it programmatically instead of using SoX, just multiply 31 by 3.
[d] If we look at the item index (192), 3/4 is a simplified form of 192/256, so gain reduction of about 2.5dB only requires 2 bits.
[e] To undo the volume change programmatically, just divide it by 3.
[f] For SoX's "vol" command, undo the 2-bit shift by dividing the value by 4, then apply the undo multiplier. The rounding error looks disturbing, but because the file format is integer, the final result will be rounded to the nearest integer, which is the original value (31). The command is in the fourth line of 6b8b.cmd:
sox "75percent.flac" "undo.flac" -D vol 1.3333333
If we use foobar2000's Bit-compare tool to compare test.flac and undo.flac, the audio data are identical. It is recommended to use foobar2000 x64 instead of the x86 version, because the x64 version supports up to 64-bit float precision in audio processing and bit-comparison, while the x86 version only supports up to 32-bit float. The x86 version's Bit-compare tool cannot reliably compare 32-bit integer (aka fixed-point) files. As a side note, FLAC has supported 32-bit integer since version 1.4 in 2022, but much older software, and especially hardware, still does not support 32-bit FLAC files.
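For readers who want to retype the console arithmetic, steps [a] to [f] can be written roughly as below. This is a sketch based on the descriptions above rather than a copy of the original console session. It uses 31, the largest positive 6-bit value, as the example sample, and the variable names simply follow the step labels.

```javascript
const sample = 31;                    // largest positive 6-bit value
const shifted = sample * 4;           // 2-bit shift into the 8-bit container: 124

const a = shifted * 0.75;             // [a] 93, an exact 8-bit integer
const b = shifted * 0.76;             // [b] about 94.24, not an integer
const c = sample * 3;                 // [c] 93 again, using integers only
const d = 20 * Math.log10(192 / 256); // [d] about -2.5 dB for index 192 of 256
const e = c / 3;                      // [e] 31, the original sample
const f = (a / 4) * 1.3333333;        // [f] slightly below 31 before rounding

console.log(a, b, c, d.toFixed(2), e, f, Math.round(f));
```

The value in [f] only becomes 31 once it is rounded, which is what storing it in an integer file format does.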
Now let's set RG-snap's bit-depth to 8 and focus on indexes 080 and 081:
Index 081 is the entry closest to 10dB attenuation that does not introduce quantization error (Ed. 1V attenuated by 10dB is 0.316V, or about 81/256 with 8-bit attenuation), and it requires a shift of 8 bits. The neighbouring index 080 only requires 4 bits and produces smaller FLAC files.
Put 16b24b.cmd in the same location as sox.exe, together with a 16-bit FLAC file (anything from a test signal to music or speech, with a duration of a few minutes) renamed to "original.flac", then double-click 16b24b.cmd to get the converted files. You can also modify the commands to create a drag-and-drop script, but that is outside the scope of this article.
The first line reduces "original.flac" by 10dB and saves it as a 24-bit file.
sox "original.flac" -b24 "sox10dB.flac" -D vol -10dB
The second line undoes the volume change and reverts to 16-bit.
sox "sox10dB.flac" -b16 "sox10dB-undo.flac" -D vol 10dB
The remaining lines correspond to the adjustments in indexes 080 and 081:
sox "original.flac" -b24 "sox080.flac" -D vol 0.3125
sox "sox080.flac" -b16 "sox080-undo.flac" -D vol 3.2
sox "original.flac" -b24 "sox081.flac" -D vol 0.31640625
sox "sox081.flac" -b16 "sox081-undo.flac" -D vol 3.1604938
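The multipliers in these commands follow directly from the index. The sketch below shows one way to derive them; it is an illustration of the idea and not RG-snap's actual source, and the function name `snapEntry` is hypothetical. For index k out of 2^bits steps the gain is k / 2^bits, and cancelling the common factors of 2 in that fraction gives the number of extra bits the output needs.

```javascript
// For index k out of 2^bits steps, the gain is k / 2^bits. Cancelling common
// factors of 2 gives the number of extra bits the scaled integer really needs.
function snapEntry(k, bits) {
  const steps = 2 ** bits;
  let num = k;
  let needed = bits;
  while (needed > 0 && num % 2 === 0) {
    num /= 2;      // drop one common factor of 2 from the fraction
    needed -= 1;
  }
  return {
    gain: k / steps,                // multiplier for the attenuation
    undo: steps / k,                // multiplier that reverses it
    dB: 20 * Math.log10(k / steps), // the same gain in decibels
    extraBits: needed,              // bits to add to the source bit-depth
  };
}

console.log(snapEntry(80, 8));  // gain 0.3125, 4 extra bits
console.log(snapEntry(81, 8));  // gain 0.31640625, 8 extra bits
console.log(snapEntry(192, 8)); // gain 0.75, 2 extra bits
```

Index 081 needs all 8 extra bits because 81 is odd, so a 16-bit source fits exactly into 24 bits. Index 080 reduces to 5/16 and needs only 4.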
[g] is an example of a 16-bit value with index 081 applied, while [h] is the same thing without involving floating point numbers. Both operations result in the same value, which can be stored exactly as a 24-bit integer.
Again, in [i], with SoX's "vol" command, the undo operation will produce some error, but the final result will still be the original value (-32323) because the file format itself cannot store non-integer values.
[j] calculates the multiplier of 10dB attenuation using 64-bit float precision.
[k] 10dB attenuation, however, does not produce a value that can be stored exactly as a 24-bit integer, so it will produce distortion in the converted 24-bit file.
[l] However, if we take the rounded (distorted) 24-bit value (-2616686) and convert it to 16-bit again with 10dB gain, the result will still be the original value after rounding (-32323). This is important because I've seen some people use the undo experiment to claim that dB-aligned adjustments do not introduce quantization error. That claim is incorrect, because the errors are in the higher bit-depth, volume-reduced audio stream, and this stream is what we send to the DAC or other external hardware. Please refer to [a] and [b] above, where I used a much lower bit-depth (6/8-bit) experiment to demonstrate the distortion.
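Steps [g] to [l] can be retyped in the Developer Console along these lines. Again this is a sketch reconstructed from the descriptions above, with variable names following the step labels. `k24` is an extra name introduced here for the value a 24-bit integer file would actually store.

```javascript
const original = -32323;                     // example 16-bit sample
const shifted24 = original * 256;            // 8-bit shift into a 24-bit container

const g = shifted24 * 0.31640625;            // [g] index 081 as a float multiplier
const h = original * 81;                     // [h] the same result with integers only
const i = Math.round((g / 256) * 3.1604938); // [i] undo, rounded by the integer format
const j = 10 ** (-10 / 20);                  // [j] multiplier for 10 dB attenuation
const k = shifted24 * j;                     // [k] not an integer
const k24 = Math.round(k);                   // value actually stored in the 24-bit file
const l = Math.round((k24 / 256) * 10 ** (10 / 20)); // [l] undo of the rounded value

console.log(g, h, i, j, k, k24, l);
```

Both undo operations return -32323, but only the index 081 path was exact in the 24-bit file. The difference between `k` and `k24` is the error that reaches the DAC.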
The values in RG-snap can also be applied using foobar2000's "Gain/Scale" effect:
https://www.foobar2000.org/components/view/foo_dsp_utility
Even though the input text box appears to be truncating and rounding the input strings, according to my own tests the resulting audio data are identical to what SoX produced except when using dB-aligned adjustments.
At this point it is pretty clear that foobar2000's Bit-compare tool cannot be used to compare the original 16-bit file against the gain-reduced 24-bit file, so another comparison tool is needed. DeltaWave can do this. For existing DeltaWave users, make sure it is using the default settings by choosing Edit > Reset all settings, as these are the settings used in this article.
These are the results using dB-aligned adjustments:
These are the results using RG-snap's near -10dB settings:
And last, a classic 2-bit shift (about -12dB):
For those who wonder about the meaning of terms like "Correlated Null Depth", I posted a question in ASR regarding some anomalies I found when comparing 24 and 32-bit file pairs, please scroll down and read the latest replies:
https://www.audiosciencereview.com/forum/index.php?threads/beta-test-deltawave-null-comparison-software.6633/post-2350684
Also, for those who want to try some 24/32-bit tests, the 32-bit files must be saved in integer format.
The proof ends here. Hopefully it is easy enough to understand. Of course, for operations involving continuous gain adjustments like fades and pans, as well as operations requiring dB-aligned adjustments, dithering is still relevant.
Final thoughts:
The "6dB step" claim has been made many times and I've heard about this for more than a decade. Also, the purpose of this article is not to brag about the method used in RG-snap sounding "better". I myself don't care about 24-bit truncation during playback as I hear no difference. On the other hand I often find some marketing materials interesting if not overly dramatic! Some say they have a better hardware volume control while some others advertise intersample headroom.
For example, in Stereophile I found two DAC measurements. Both products use ESS ES9028Pro DAC chips, while one of them claims to have higher intersample headroom. When I look at the measurements, the one advertised as having more intersample headroom looks like this with a -90dBFS test signal:
...but the other product looks like this:
I am not going to reveal the names of the products here. However, since both products use the same DAC chip, and the one that does not advertise intersample headroom does not exhibit such low-level distortion, it is reasonable to suspect that the distortion is caused by a forced hardware attenuation used to create the extra headroom. As mentioned, I cannot hear 24-bit truncation, but because the product in question sells for a four-digit USD price, it is reasonable to expect a better result.
---------------------
Hey there Bennet, thanks again for your exposition on digital volume controls using the test files, and the meticulous examples showing the underlying mathematics that go into the attenuation process to demonstrate distortion situations. Ultimately, the idea is that with modern digital attenuation using, say, 32-bit and 64-bit precision calculations, there's really nothing for the audiophile to worry about. And no need to be pedantic or fear anything other than the "6dB step"!
It's amazing all the advertising for products out there that likely will not add any special quality to audio playback. High quality audio volume control being an example of not-rocket-science certainly in the 2020's! Having said this, can anyone tell me what's so special about LEEDH Processing which is touted by LUMIN and also by Audirvana? 🤔
I asked Bennet to speak a little more about the RG-snap calculations, here's an extra bit from him that some readers, especially those into gaming, will find interesting:
The method described is by no means unique and definitely not limited to digital audio. For example, many old video game consoles like PlayStation 1/2, as well as Nintendo SNES, 64 and GameCube only supported very low resolutions like 256*224, 320*240 and 640*480. However many game console emulators are able to take advantage of modern hardware (faster CPU/GPU, more RAM...) and output higher resolution videos.
Now take the SNES (256*224) as an example. If we output the video directly on a 1920*1080 screen it will look very small. If we multiply the output by 4 (1024*896) the size will be bigger, albeit still not filling the whole screen. However, it doesn't mean the scale factor has to snap to a power of 2 (2, 4, 8...). For example, if we have a WUXGA (1920*1200) screen we can use 5x output (1280*1120) for a better fit. Take these images as an example:
Many fans of these vintage games demand pixel-accurate rendering, and they don't like interpolating pixels or doing some fancy AI scaling. Using integer multipliers ensures the sprites look as sharp as possible, at the expense of not filling up the screen completely. Actually, the SNES (released in 1990) was intended for use with 4:3 CRT TVs with an aspect ratio of about 1.333, but 256/224 is about 1.143, so we can even use 6x on the horizontal but 5x on the vertical resolution, as 1536/1120 is about 1.371, which is closer to 4:3.
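The scaling arithmetic above can be checked with a few lines of JavaScript. This is only an illustrative sketch, not code from any emulator, and `integerScale` is a made-up helper name. It picks the largest integer factor that still fits the screen in both directions.

```javascript
// Largest integer scale factor that still fits the screen in both directions.
function integerScale(srcW, srcH, screenW, screenH) {
  const factor = Math.floor(Math.min(screenW / srcW, screenH / srcH));
  return { factor, width: srcW * factor, height: srcH * factor };
}

console.log(integerScale(256, 224, 1920, 1080)); // factor 4: 1024 x 896
console.log(integerScale(256, 224, 1920, 1200)); // factor 5: 1280 x 1120

// Separate horizontal and vertical factors, as in the 6x by 5x example.
const aspect = (256 * 6) / (224 * 5);
console.log(aspect.toFixed(3)); // closer to 4:3 than 256/224
```

As with the volume steps, the factor only has to be an integer, not a power of 2.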
Happy August everyone. I see Bennet ended off with comments about DACs with different intersample headroom. Let's talk more about that next week and consider how much we "need" with actual music!
Enjoy the music, dear audiophiles...