Saturday 10 October 2020

Further Explorations into "Intersample Overs" - Resampling/Downsampling & De-Clip by Charles King


Greetings everyone. It's been great to interact with some of you over the years around content I've posted on this blog. As you may know, I recently talked about resampling hi-res audio files in my article on "Post-Hi-Res", with the idea that the vast majority of albums we download as supposedly high-resolution content simply do not warrant the file size or bitrate. As such... I routinely just bring them back to 16/48 or 16/44.1.

Here's an interesting comment by Charles King on this and his explorations of the topic:

---------------------

Hi Archimago,

I was a bit taken aback on reading your 25 July post, in which you talked about the need to guard against intersample overs when downsampling hi-res files. I've collected quite a few albums in hi-res over the years, often to check if I could hear any difference (I can't, and have given up on that) or to see if they provided better mastering (occasionally true, though in some notable cases the mastering is audibly worse). Since I don't want to litter my long-term storage with gigabytes of useless data, I end up downsampling these to 16/48 in Adobe Audition (which is rated as having one of the better resamplers) and then compressing to variable-rate AAC (which is transparent to me).

I'd always assumed the whole issue with true peaks and intersample overs was just a problem of interpolated samples produced by upsampling, and that I didn't need to worry about it, so your post made me worried. If you look at the artificial signals commonly used to demonstrate intersample overs, these all seem fine when downsampled. For instance, the one from bennetng here produces a sample peak of +4.61dB when upsampled to 88200Hz, but only -7.70dB on downsampling to 22050Hz. Likewise, the corresponding peaks from the example Rescator posted here are +12.22dB and -6.14dB. What I hadn't considered is that both these signals are produced using phase-shifted high-frequency tones which are filtered out when downsampling, thus removing the problem. When I tried the same using high-intensity white noise, I found that downsampling did indeed generate sample peaks over 0dB. It appeared that the culprit was the filtering process itself, as I was able to recreate the sample overloading by using any form of reasonably steep low-pass filter.
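
[Ed: To make the two cases concrete, here's a minimal Python sketch (numpy and scipy assumed; the parameters are mine, and hard-clipped noise stands in for Charles' "high-intensity white noise") of a phase-shifted tone whose intersample over vanishes on downsampling, versus clipped noise that overloads when the anti-alias filter is applied:]

```python
# Hedged sketch: a fs/4 tone with 45-degree phase offset hides a +3 dB true
# peak between samples; hard-clipped noise overshoots when low-pass filtered.
import numpy as np
from scipy.signal import resample_poly

fs = 44100
n = np.arange(fs)  # one second of samples

def peak_db(x):
    return 20 * np.log10(np.max(np.abs(x)))

# Case 1: tone at fs/4 with a 45-degree phase shift. All samples sit at
# -3 dBFS, so normalising the *samples* to 0 dBFS hides a +3 dB true peak.
tone = np.sin(2 * np.pi * (fs / 4) * n / fs + np.pi / 4)
tone /= np.max(np.abs(tone))
print("tone, upsampled 2x:    %+.2f dB" % peak_db(resample_poly(tone, 2, 1)))
print("tone, downsampled 2x:  %+.2f dB" % peak_db(resample_poly(tone, 1, 2)))
# The upsample reveals ~+3 dB; the downsample's anti-alias filter strongly
# attenuates the tone (it sits at the new Nyquist), so no overload.

# Case 2: heavily clipped noise (a crude stand-in for a loudness-war master).
# The anti-alias low-pass rings around the clipped flats and exceeds 0 dBFS.
rng = np.random.default_rng(0)
noise = np.clip(3 * rng.standard_normal(fs), -1, 1)
print("noise, downsampled 2x: %+.2f dB" % peak_db(resample_poly(noise, 1, 2)))
```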

So I decided to do some tests using real music. Luckily my collection has been analysed in JRiver, so it was fairly easy to export the data as XML and search for the highest true peak value (though the dBTP values reported by JRiver seem to be inflated). I found one track with a true peak of +2.86dB, the David Aude remix of Coldplay's "Charlie Brown" (I'll insert a pause here in case you want to insert any jokes about Coldplay not really being music ...) [Ed: no problem man! I consider musical choice absolutely a subjective matter and your affair :-]. This is a 44100Hz file, and downsampling it to 22050Hz in iZotope RX7 using the filter settings you showed results in a sample peak of +2.16dB. I was only able to get a reasonable result with a prior gain reduction of -2.3dB. RX7 reports a true peak of +2.86dB for this file, but obviously you want to use the smallest gain reduction possible. Zeroing in on the optimal gain took a bit of trial-and-error, which would be a PITA if you had to do it on a routine basis. Then I realised that the RX7 resampler has a built-in limiter which is specifically designed to prevent this problem, so I decided to do some comparisons. Since I was looking for a workflow that didn't require intervention I also tested using RX7's DeClip module (DeClip with threshold at 0dB and no make-up gain, then normalise to 0dB, then resample).
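
[Ed: For what it's worth, a starting point for that gain calibration can be estimated rather than found purely by trial and error. Here's a hedged Python sketch (numpy, scipy and soundfile assumed; the file name is hypothetical) of the usual 4x-oversampling true-peak estimate, along the lines of ITU-R BS.1770 - this is an illustration, not iZotope's algorithm:]

```python
# Estimate the true peak by 4x oversampling, then derive a candidate gain
# reduction as a starting point for fine-tuning.
import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

x, fs = sf.read("track.flac")                  # hypothetical input file
x = np.atleast_2d(np.asarray(x).T)             # shape: (channels, samples)

oversampled = resample_poly(x, 4, 1, axis=-1)  # 4x oversampling per channel
true_peak_db = 20 * np.log10(np.max(np.abs(oversampled)))
print("estimated true peak: %+.2f dBTP" % true_peak_db)

# Candidate pre-resampling attenuation. As Charles found with -2.3 dB against
# a +2.86 dBTP reading, the gain you actually need can be somewhat less.
print("starting gain guess: %.1f dB" % min(0.0, -true_peak_db))
```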

Here are the amplitude statistics for: the original file; the file resampled in Audition (quality 100%, with pre/post filtering); resampled in RX7 with no limiter; with the post limiter checked; with a gain of -2.3dB applied first (and no limiter); and with the DeClip steps described above. I measured these in both RX7 and Audition as a check, but I think the measures from RX7 are more reliable (Audition's true peak measures are sometimes a bit suspect). I didn't bother using crude measures like DR.


Measured in RX7:

              Loudness  Loudness    True Peak  Sample Peak  Max Momentary  Max Short-term
              (LUFS)    Range (LU)  (dBFS)     (dBFS)       (LUFS)         (LUFS)
Original      -5.5      9.4          2.86       0           -2.4           -3.2
Audition      -5.6      9.6          2.45       2.23        -2.7           -3.4
No Limiter    -5.6      9.6          2.43       2.16        -2.7           -3.4
Post Limiter  -6.0      9.3         -0.01      -0.01        -3.4           -4.0
Gain -2.3dB   -7.9      9.6          0.13      -0.14        -5.0           -5.7
DeClip        -11.6     9.4          0.03       0           -8.6           -9.3

Measured in Audition (Audition does not report the loudness range or the momentary/short-term maxima, but adds its own DR measures):

              Loudness  True Peak  Sample Peak  Dynamic Range  Dynamic Range Used
              (LUFS)    (dBFS)     (dBFS)       (dB)           (dB)
Original      -5.60      2.78       0           84.29          39.3
Audition      -5.70      2.23       2.23        84.28          39.3
No Limiter    -5.70      2.16       2.16        84.25          39.2
Post Limiter  -6.09      0.46      -0.01        89.51          38.85
Gain -2.3dB   -8.00     -0.14      -0.14        84.25          39.2
DeClip        -11.72     0          0           71.96          39.00

Interestingly, there's a slight increase in the loudness range when resampled without a limiter, though the [Max Momentary] - [Integrated Loudness] value does show a small reduction (3.1 LU in the original vs 2.9 LU for no limiting and 2.6 LU when limited). RX7 does report a small true peak overload for the Gain -2.3dB file, though Audition doesn't.

Since I'm a fan of sample-peeking (I find you can often identify problems just by looking at the waveform) I constructed some comparison images showing a couple of sections of the track with the samples aligned. Original waveform (44.1kHz) is grey, resampled waveform (22.05kHz) is blue:

Section 1:
Section 1: Adobe Audition

Section 1: iZotope RX 7 No Limiter

Section 1: iZotope RX 7 with Post Limiter

Section 1: iZotope RX 7 with Gain -2.3dB

Section 1: iZotope RX 7 with DeClip


Section 2:
Section 2: Adobe Audition

Section 2: iZotope RX 7 No Limiter

Section 2: iZotope RX 7 with Post Limiter

Section 2: iZotope RX 7 with Gain -2.3dB

Section 2: iZotope RX 7 with DeClip
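
[Ed: For anyone wanting to reproduce this kind of aligned-waveform "sample-peeking", here's a rough matplotlib sketch - the excerpt file names are hypothetical:]

```python
# Overlay an original and a resampled excerpt on a shared time axis so the
# samples line up, as in the comparison images above.
import numpy as np
import soundfile as sf
import matplotlib.pyplot as plt

a, fs_a = sf.read("section1_44k.wav")   # hypothetical excerpt file names
b, fs_b = sf.read("section1_22k.wav")
if a.ndim > 1: a = a[:, 0]              # compare one channel
if b.ndim > 1: b = b[:, 0]

plt.step(np.arange(len(a)) / fs_a, a, where="mid", color="grey",
         label="44.1 kHz original")
plt.step(np.arange(len(b)) / fs_b, b, where="mid", color="tab:blue",
         label="22.05 kHz resampled")
plt.xlabel("time (s)"); plt.ylabel("amplitude"); plt.legend(); plt.show()
```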

While I had high hopes for the workflow using DeClipping, it looks like it's a bust, yielding aberrant peaks that lead to excessive gain reduction. Overall, I think the post limiter built into the RX7 resampler does very well. While you might get a more precise result by applying gain reduction, this requires careful calibration that's frankly more trouble than it's worth: too little is useless as you'll still get clipping, and too much will lose resolution on converting to 16 bits. So, while I understand a reluctance concerning limiters (since reckless limiting is what causes this problem in the first place), I think it's a useful tool here and I'll be leaving it on in future.


Charles King

--------------------

Addendum:

There’s an important point that I only realized after writing the email, though. If you use volume levelling (through ReplayGain or some other mechanism) that operates in the digital domain then the problem of intersample overs causing distortion in the DAC goes away automatically. Highly compressed tracks that are more likely to have a true peak over 0dB will also have a high loudness and thus will be subject to negative gain in your playback system (and this will happen without resolution loss as a decent player does its DSP at a high bit depth). So by the time the signal hits your DAC it’s already been scaled down sufficiently that intersample overs won’t be a problem. The best state-of-the-art DACs manage 21 bits of DR, so you can drop a 16-bit signal by 30dB ((21-16)*6) before you run into the noise floor.
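
[Ed: To make the arithmetic explicit, here's a toy Python illustration using the numbers from the text - a sketch, not a player implementation:]

```python
# Volume levelling at 64-bit float precision scales intersample overs down
# along with everything else before the signal ever reaches the DAC.
import numpy as np

def apply_levelling(x, gain_db):
    """Levelling gain applied at 64-bit float precision (no truncation)."""
    return np.asarray(x, dtype=np.float64) * 10 ** (gain_db / 20)

# Headroom arithmetic from the text: 21 usable DAC bits minus 16 source bits
# leaves (21 - 16) * ~6.02 ~= 30 dB of attenuation before the noise floor.
print("headroom below 16-bit noise floor: %.0f dB" % ((21 - 16) * 6.02))

# A loud track at -5.5 LUFS levelled to a -18 LUFS target gets -12.5 dB of
# gain; even a +2.86 dBTP intersample peak ends up well under full scale.
gain = -18.0 - (-5.5)
print("levelling gain: %.1f dB" % gain)
print("resulting true peak: %+.2f dBTP" % (2.86 + gain))
print("full-scale sample after gain: %.4f" % apply_levelling([1.0], gain)[0])
```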

Just in case anyone's interested in why the DeClip workflow caused so much gain reduction, I prepared another image showing one of the errant peaks it produced. Original audio waveform is dark green, DeClipped, normalised and resampled waveform is lighter blue:

iZotope RX 7 DeClip errant peak.

The RX7 DeClipper was voted the best in a blind comparison test last year, but clearly there are problems, which are probably inherent to all such algorithms.

Charles

--------------------

Thanks Charles for all the work! IMO, the time you've spent investigating and confirming the potential for overloading while downsampling definitely deserves to be published for others who might be wondering about the same thing.

Your work indicates that the Post Limiter feature in iZotope RX 7 functions well with no real practical concern, and I will likely start turning the setting on in my own resampling as well. Also interesting to see the effect of the DeClip function; I've limited my use of it to the few albums I have that are clearly, severely clipped, so even if errant peaks are created, it would be in the context of already "impaired"-sounding albums.

I know... For many readers, this article and that "Overload!!!" top graphic with Marty McFly being blown away by Doc Brown's speakers are more than a bit dramatic (classic 1985 Back To The Future scene). What we're talking about here belongs to the OCD "perfectionist audio" subculture of technical audiophiles ;-). We're talking about the potential clipping of probably a handful of samples here and there on loud tracks when, even for standard CD, there are 44,100 samples being converted per second.

However, there is also a message here worth considering: digital audio (including hi-res) is just DATA. Data can be manipulated by digital processing, whether through DSP "correction" filters, volume leveling, or resampling... There's nothing magical about any of this. When one understands the process involved, one can anticipate what these changes might mean audibly and can explore the implications of such changes in the frequency and time domains.

These days on audiophile forums, I continue to see individuals expressing concerns about lossless compression causing sonic differences (highly unlikely even years ago!) or that even the mere copying of bit-perfect files will result in different-sounding versions (here's a rather depressing article from Cookie Marenco from as recently as 2017 that deserves an official retraction)! Seriously folks, if you know what you're doing, and you're using modern high-resolution DACs, there is simply no difference beyond what's encoded in those bits.

BTW, for those interested in even more technical discussion of dynamic range calculations and compression, with a good background in stats, check out this paper by blog reader Pietro Coretto from Universita di Salerno in Italy - "Nonparametric Estimation of the Dynamic Range of Music Signals" (available as a PDF preprint on arXiv). Section 6 is very interesting, comparing their MeSDR (Median Stochastic DR) calculation with the typical TT-DR, and in Section 7 they look at different masterings of The Wall's "In The Flesh?", comparing the MFSL and EMI releases. Great to see academic work behind some of these things we talk about as audiophile consumers...

When Charles previewed this article, he also suggested another link to check out: 'Dynamic Range' & The Loudness War. An excellent article discussing the history, the nature of dynamics (peak levels, crest factor...), and actual examples of what has happened over time. Well worth a read.

As usual, I wish you all safety and enjoyment of the music... Despite the curtailment of social circles this year, I'd like to wish a happy Thanksgiving 2020 to fellow Canadians this weekend.

17 comments:

  1. Cookie Marenco claims that her conclusions result from double blind listening tests, and they seem to be consistent over time and with multiple releases. Comment?

    ReplyDelete
    Replies
    1. Sure Unknown,
      Let's talk about Cookie Marenco and her claims:

      1. Nothing I say is about the *quality* of the recordings she achieves. I've heard some demos, and a while back I got her Blue Coast Collection SACD which sounds great. Good for her for achieving some excellent results!

      2. Having said (1), I can also say that when we analyze the DATA on the Blue Coast Collection SACD, the noise level is higher than the limits of 16-bit audio. And the high-frequency roll-off does not suggest any "need" for sample rates >48kHz. Furthermore, in every track I looked at (probably 3 or 4 of them on the demo SACD), there were noise anomalies as I showed in the "Looking for a Home" FFT on my "Post Hi-Res" post. Combined with the technique she described, there is simply no need for digital "hi-res" encoding since using the larger "bit bucket" will not improve sound quality. (For those who want to check this kind of thing themselves, see the sketch at the end of this comment.)

      3. As for her claims about sonic degradation from simply copying digital data: it seems highly unlikely in the context of how digital works, right? Digital encoding is based on engineering principles and the devices we use are the results of human design. Where else in the world of digital audio or video (except in some purely subjectivist audiophile corners) do we persistently see claims that bit-perfect copies result in unexpected variation directly coming out of the replication process? Does a copy of a Word document print differently from the original? Does a PowerPoint deck look worse after it has been emailed? When Netflix streams movie "copies", do they appear increasingly distorted?

      To say she conducted "double blind listening tests" is rather meaningless as a statement of fact if there's no documentation of such a thing, nor has anyone ever replicated and documented the experiment. A Flat-Earther could gather a bunch of friends and claim some experiment was performed and everyone agreed that "Yup, there is no curvature to the earth!" Obviously, people say all kinds of false things all the time; the wise must engage their ability to discern the truth.

      If Marenco did perform such a blind listening test: Who participated? With what music? What hardware? How was the data transmitted/copied/manipulated? What controls were used? How was it blinded? How did the data collection take place to ensure there was no bias?

      Only with that information can we develop some "faith" that she knows what she's talking about in the face of a claim that counters basic engineering principles about the nature of digital data.
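
      Regarding point (2) above, for anyone who wants to try this kind of analysis, here's a rough Python sketch (scipy and soundfile assumed; the file name is hypothetical, and this is only a crude version of the FFT inspection I described):

```python
# Crude spectral sanity check of a "hi-res" file: estimate the noise floor
# and how much energy actually lives above 24 kHz (16/48's Nyquist).
import numpy as np
import soundfile as sf
from scipy.signal import welch

x, fs = sf.read("hires_track.flac")      # hypothetical 24/96 download
if x.ndim > 1:
    x = x[:, 0]                          # one channel is enough here

f, psd = welch(x, fs=fs, nperseg=65536)
print("median PSD (rough noise floor): %.1f dB/Hz" %
      (10 * np.log10(np.median(psd))))

hf = f > 24000                           # content beyond 16/48's reach
print("fraction of energy above 24 kHz: %.1f dB" %
      (10 * np.log10(psd[hf].sum() / psd.sum())))
```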

      Delete
    2. BTW: Before someone picks this up ;-).

      When it comes to Netflix streaming, obviously it's lossy so each stream could look different... Let's suppose there's no data rate limitation. Would the same error-free stream look different in Hong Kong as it would in New York City given the likelihood of the data servers located in different places (but with the exact same copy of the movie) around the world and how the data would have been routed across the globe?

      On a related note. Remember the old "Intercontinental Internet Audio Streaming Test" from years ago: :-)
      http://archimago.blogspot.com/2015/02/measurements-intercontinental-internet.html

      Delete
  2. Hi Charles,

    This article is a great example of "when in doubt, try it yourself". No expensive hardware or software is required for these experiments (free resamplers, players and plugins are widely available).

    As for the comment about ReplayGain and the DNR of good DACs these days: some audiophiles worry about "bit-perfectness" or losing DNR due to the use of software volume adjustments. This kind of thinking actually encourages the so-called "loudness war". Without loudness normalization, consumers need to adjust the volume manually when playing different tracks. Not very practical, and it can even be dangerous in some listening scenarios (e.g. driving). Practitioners of the loudness war take advantage of this and apply severe compression and limiting to "win" the competition. Prevention of intersample overload for these inherently distorted tracks has little meaning. Of course, the wimpy output of some portable devices, and the way music is listened to (in noisy public places), may also be causes of these kinds of mastering practices.

    With a reference target playback loudness of, for example, -18 LUFS, the worst offenders of the loudness war may be played with an attenuation of about 12-15 dB, and this won't degrade even a not-so-state-of-the-art DAC. Also, for highly dynamic music, with appropriate settings, ReplayGain will look at the peak level of audio files to avoid clipping. In case you disagree with the loudness algorithm, you can also manually edit the stored gain level of individual tracks.
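
    [Ed: A sketch of that peak-aware logic - a toy function of mine with illustrative numbers, not the actual ReplayGain code:]

```python
# Levelling gain = target minus measured loudness; for dynamic material a
# positive gain is capped so the stored peak never gets pushed past 0 dBFS.
def levelling_gain(track_lufs, peak_dbfs, target_lufs=-18.0):
    gain = target_lufs - track_lufs   # loudness-matching gain in dB
    return min(gain, -peak_dbfs)      # clamp so the peak stays "legal"

print(levelling_gain(-5.5, 2.86))     # loudness-war track: -12.5 dB
print(levelling_gain(-28.0, -8.0))    # quiet track: +10 dB wanted, capped at +8
```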

    ReplyDelete
    Replies
    1. The DAC I use only has a wimpy 20 bits of dynamic range and linearity, but then it only cost £33 (the TempoTec Sonata HD Pro). So yeah, you aren’t losing much if you don’t want to fork out for a really top-range DAC ;). I suppose this is the real benefit of the ‘HiRes’ spec-war that happened over the past couple of decades: we can now buy devices that have enough headroom to apply significant amounts of DSP on the user’s end without sacrificing any meaningful dynamic range.

      My personal view on the loudness war is that the problem isn't compression per se, it's _bad_ compression. Compression is an essential element of the 'studio as an instrument' revolution that began back in the '60s. It changes the texture of the music, and if an artist wants to use large amounts of it to get the sound they want, we should respect that. The problem comes with sloppy mastering engineers who brickwall the track so it clips (and then typically lower the gain by 0.3dB so they can pretend it's not clipped and will pass iTunes' automated checks). You don't need golden ears to hear the spray of distortion products that this causes even on a short peak. There are many ways to maximise loudness without causing clipping, from simply increasing the attack time (using look-ahead) to more sophisticated methods of band-limited compression and EQ which maximise the spectral density while keeping the overall envelope within the legal range. Of course, this means you have to know what you're doing...
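
      [Ed: A toy sketch of the look-ahead idea Charles mentions - my own Python illustration (~5ms of look-ahead at 44.1kHz), not anyone's production code; a real limiter would smooth the gain changes far more gracefully:]

```python
# Toy look-ahead peak limiter: the gain computer sees `lookahead` samples
# into the future, so gain comes down before a peak arrives rather than the
# peak being clipped. Instant attack, slow exponential release.
import numpy as np

def lookahead_limit(x, threshold=0.98, lookahead=220, release=0.9995):
    pad = np.concatenate([np.abs(x), np.zeros(lookahead)])
    # gain needed so the loudest sample in the upcoming window stays legal
    need = np.array([min(1.0, threshold / max(pad[i:i + lookahead].max(), 1e-12))
                     for i in range(len(x))])
    g, out = 1.0, np.empty_like(x, dtype=np.float64)
    for i in range(len(x)):
        g = min(need[i], g / release, 1.0)   # drop fast, recover slowly
        out[i] = x[i] * g
    return out
```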

      Charles

      Delete
  3. Although I find this article by Charles highly interesting to read, I have to admit that my interest in hi-res files, such as SACD, DVD-A etc. is fast declining. That goes for upsampling too. I have tried to upsample some of the music I really like, but in the rare cases there's an audible benefit, it is so tiny that it isn't even worth the time doing it. An example of a bad hi-res file is a MOFI edition of Santana's "Caravanserai", an album I have loved since the day I bought it 45 years ago. There's no clarity, details are drowned, there's no slam, hardly any bass or treble and so on. That's a real shame IMO, because the album is a masterpiece music-wise, and the redbook is just as good/bad as any remasters I've ever heard. Perhaps the LP is a bit better than any digital file, I am not sure.

    As such, I have stopped entirely worrying about resolution, because when it comes down to it, all that matters is the skills of the sound engineers. In my collection, there's more than one example of a plain 16/44 CD which sounds clearly better, cleaner and more detailed than 99% of my hi-res files/discs, BluRays included.

    I am not letting it spoil my day, but sometimes I do feel deeply depressed that so many great recordings have been utterly destroyed by some half-deaf nitwit behind the mixer console. Most recordings aren't that bad, but still not as good as they could have been.

    I am too lazy to downsample the relatively few hi-res files I have, and HDD space is cheap, but if there were an audible benefit to it, I might try it.

    Thx :)

    ReplyDelete
    Replies
    1. Hey Duck,
      I actually think the fact that "interest in hi-res files, such as SACD, DVD-A etc. is fast declining" is a good thing, because I hope it also means that audiophiles can refocus on what is really important rather than the minutiae that companies sometimes prefer we be distracted by (typically these distractions, like Hi-Res and MQA, exist for self-serving financial gain, of course).

      No Hi-Res/SACD/DVD-A/Blu-Ray would ever fix that MoFi Caravanserai you speak of if the mastering was not good in the first place. It's become clear over time as we've all experienced "hi-res" for ourselves that in fact, it brings nothing to the table quality-wise unless specifically produced with hi-res techniques from start to finish - and even then we are unlikely to hear a difference when downsampled (as per Mark Waldrep's recent blind test and AES presentation).

      Yeah, no need to downsample given the cheapness of HD space especially if there are not many files. I've done quite a lot already and have plenty of space left over now :-).

      I agree, speaking as enthusiastic audiophiles, it is indeed sad to think that so much music has been produced and "remastered" over the years in ways that have limited its potential fidelity and, I believe, emotional impact. Alas, this has been the case for a long time. Hopefully we will see brighter days ahead... It sure would be nice to see more pressure on artists and the production side to place "more natural" sound quality higher up the list of priorities.

      Delete
  4. Hi Archimago
    I am interested in your opinion on freeing up further HDD space by converting to AAC, given that it is lossy digital audio compression.

    ReplyDelete
    Replies
    1. Hi Vlad,
      I'm pretty agnostic about lossy encoding as well ;-).

      Remember that this blog in 2013 started with a blind test of high bitrate ~320kbps MP3 vs. Lossless - results here:
      http://archimago.blogspot.com/2013/02/high-bitrate-mp3-internet-blind-test_3422.html

      Basically, the data suggested that people did not perceive the sound of MP3 to be inferior; in fact, there were suggestions that some even preferred/liked it more.

      Since AAC can encode equivalent sound quality with lower bitrates like 192kbps - some evidence here in one of my old experiments:
      http://archimago.blogspot.com/2013/05/measurements-do-lossless-compressed.html

      I anticipate that the music library would sound great at 256-320kbps AAC.

      As a "perfectionistic" audiophile, I'm happy to keep my content as lossless compressed FLAC... But I certainly would not disparage anyone who prefers to keep their music as high bitrate AAC!

      Delete
    2. HD space may be cheap, but 30+ years of buying music means my collection is rather large. Having spent a lot of time organising my collection and getting it all tagged correctly I’d be really annoyed if I had to recreate it from the sources. If I stored my library losslessly it would amount to over a terabyte, which would be really inconvenient to upload to an off-site backup given the asymmetric nature of home high-speed broadband. So most of my music is compressed using high-rate VBR AAC. I’ve done numerous ABX tests and can’t tell the difference (especially now with my 55yr-old ears). I can’t see any rational reason to worry about it, and it makes managing my library a lot easier.

      The one issue is that compression will worsen any problem with intersample overs in tracks that aren’t strictly legal. But this just means you end up having to apply a little more gain reduction (through RPG) to bring them back in line, and these days you can get cheap DACs that provide more than enough headroom to accommodate this. Tracks that are properly recorded will have no problems with compression at all.

      Charles

      Delete
  5. I have used Sony Sound Forge from version 10(?) and am now up to 14, now that they are owned by Magix. I have always found it easy to use, but I temper my expectations of what it can do given the price - generally $59, even with upgrades. I have not tried the Pro versions yet, but I may this year.

    My issue is that I have no idea how accurate the metering in the software is, so I always ensure that I am at least -3dB down, as I don't want to turn on the last bit. I always record at the highest resolution I can, which is 24/192 or at least 24/96, but never on my computers - I use Tascam SDHC card recorders. I only trust computers for mastering and editing. I lost an entire concert once, so never again.

    I have never messed with upsampling since I took a redbook file, recorded it into my Tascam DR680 MK2 at both 24/96 and 24/192, and could not hear an improvement. If I were archiving some music I would save it at 24/192, but not for just listening. I would leave the redbook files alone and worry about getting a better playback DAC.

    I am not sure if the normalizing process is destructive or transparent, but I never normalize above -1dB as, again, metering accuracy might be an issue. I do wish the program would normalize each channel independently, as I often have to go back and deal with the overall loudness of each channel individually. Channel imbalances do bother me and can affect the stereo perspective.

    Most of what I do are WAV files, but most of the downloads I do buy are FLAC and sound excellent to me as I buy most of those in 24/96. I have not experienced any overs in any of my downloads. I am not a Mac person so no comments on their file format.

    ReplyDelete
    Replies
    1. Hi Jim,
      Can't speak to Sound Forge, but certainly it's good to maintain as much of the resolution as possible although of course for recording, we just need to make sure not to clip.

      When we're doing audio editing at 24/32/even 64-bit resolution, the normalizing process would be completely transparent. Absolutely no reason to be concerned... If someone thinks otherwise, they need to be subjected to a blind test :-).

      Likewise, I can't imagine anyone claiming that 24/192 is going to sound different from 24/96 these days! I know... Folks like Neil Young tried, and maybe some hi-res promoters still want us to be excited about 24/192 streaming and stuff like that.

      Take care and stay safe.

      Delete
  6. Hi Archi.
    Adobe Audition: "which is rated as having one of the better resamplers"
    Do you have any sources for this?
    Better over which one?

    Cheers mate!

    ReplyDelete
    Replies
    1. Hi Unknown,
      Perhaps Charles can weigh in as well since this was part of his text.

      What I can say is that if you go into SRC Comparisons:
      https://src.infinitewave.ca/

      If you have a look at the Audition test results, they definitely look excellent, performing very close to the "ideal filter".

      Delete
    2. According to https://src.infinitewave.ca/, dBPoweramp resampler looks great.

      If this is what is built into Foobar (in Foobar it's called the "dBPoweramp/SSRC resampler"), then the free Foobar player is all you need to perfectly upsample/downsample audio.

      Delete
  7. I modified LMS so that volume control is done by SoX. The full workflow is
    - Bit shift down (= ~6dB)
    - Re-sample (in my case, to 110.6kHz, which is the natural rate of my DAC)
    - Volume adjustment, taking into account the bit shift already done
    - Dither added at 24 bits (in my case, the bit depth of my DAC chipset)

    By doing the bit shift before re-sampling I can avoid intersample overs, and obviously a bit shift (unlike an arbitrary gain change) maintains bit perfection.

    Reducing volume by a whole ~6dB is not wasteful in my case, because I'm using digital volume control anyway.

    But this still might be of general interest. I use 'gain -6.02059991327962390427' in SoX for a bit shift. I've tested it by shifting up and down and comparing checksums - it worked.
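
    [Ed: That number is just 20*log10(2), i.e. exactly one bit. A quick numpy check - a sketch, not the commenter's LMS/SoX code - of why the shift is losslessly reversible:]

```python
# Multiplying integer PCM by 0.5 (a power of two) is exact in float64, so a
# ~6.02 dB shift down and back up restores the original samples bit-perfectly.
import numpy as np

rng = np.random.default_rng(1)
pcm = rng.integers(-2**23, 2**23, size=1_000_000)    # 24-bit sample values
restored = ((pcm * 0.5) * 2.0).astype(np.int64)
print("bit-perfect:", np.array_equal(pcm, restored))  # True
```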

    ReplyDelete
    Replies
    1. Instead of that crazy number in decibels, use "vol 0.5" directly. But if the volume adjustment is followed by resampling and dither, then there is nothing to be gained by shifting an exact number of bits.

      Delete