Saturday 10 October 2020

Further Explorations into "Intersample Overs" - Resampling/Downsampling & De-Clip by Charles King


Greetings everyone. It's been great to interact with some of you over the years around content I've posted on this blog. As you may know, I recently talked about resampling hi-res audio files in my article on "Post-Hi-Res", with the idea that the vast majority of albums we download as supposedly high-resolution content simply do not warrant the file size or bitrate. As such... I routinely just bring them back to 16/48 or 16/44.1.

Here's an interesting comment by Charles King on this and his explorations of the topic:

---------------------

Hi Archimago,

I was a bit taken aback on reading your 25 July post, in which you talked about the need to guard against intersample overs when downsampling hi-res files. I've collected quite a few albums in hi-res over the years, often to check if I could hear any difference (I can't, and have given up on that) or to see if they provided better mastering (occasionally true, though in some notable cases the mastering is audibly worse). Since I don't want to litter my long-term storage with gigabytes of useless data, I end up downsampling these to 16/48 in Adobe Audition (which is rated as having one of the better resamplers) and then compressing to variable-rate AAC (which is transparent to me).

I'd always assumed the whole issue with true peaks and intersample overs was just a problem of interpolated samples produced by upsampling, and that I didn't need to worry about it, so your post made me worried. If you look at the artificial signals commonly used to demonstrate intersample overs, these all seem fine when downsampled. For instance, the one from bennetng here produces a sample peak of +4.61dB when upsampled to 88200Hz, but only -7.70dB on downsampling to 22050Hz. Likewise, the corresponding peaks from the example Rescator posted here are +12.22dB and -6.14dB. What I hadn't considered is that both these signals are produced using phase-shifted high-frequency tones which are filtered out when downsampling, thus removing the problem. When I tried the same using high-intensity white noise, I found that downsampling did indeed generate sample peaks over 0dB. It appeared that the culprit was the filtering process itself, as I was able to recreate the sample overloading by using any form of reasonably steep low-pass filter.
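
[Ed: To make the two cases concrete, here's a minimal Python sketch (numpy and scipy assumed; the parameters are mine, and hard-clipped noise stands in for Charles' "high-intensity white noise") of a phase-shifted tone whose intersample over vanishes on downsampling, versus clipped noise that overloads when the anti-alias filter is applied:]

```python
# Hedged sketch: a fs/4 tone with 45-degree phase offset hides a +3 dB true
# peak between samples; hard-clipped noise overshoots when low-pass filtered.
import numpy as np
from scipy.signal import resample_poly

fs = 44100
n = np.arange(fs)  # one second of samples

def peak_db(x):
    return 20 * np.log10(np.max(np.abs(x)))

# Case 1: tone at fs/4 with a 45-degree phase shift. All samples sit at
# -3 dBFS, so normalising the *samples* to 0 dBFS hides a +3 dB true peak.
tone = np.sin(2 * np.pi * (fs / 4) * n / fs + np.pi / 4)
tone /= np.max(np.abs(tone))
print("tone, upsampled 2x:    %+.2f dB" % peak_db(resample_poly(tone, 2, 1)))
print("tone, downsampled 2x:  %+.2f dB" % peak_db(resample_poly(tone, 1, 2)))
# The upsample reveals ~+3 dB; the downsample's anti-alias filter strongly
# attenuates the tone (it sits at the new Nyquist), so no overload.

# Case 2: heavily clipped noise (a crude stand-in for a loudness-war master).
# The anti-alias low-pass rings around the clipped flats and exceeds 0 dBFS.
rng = np.random.default_rng(0)
noise = np.clip(3 * rng.standard_normal(fs), -1, 1)
print("noise, downsampled 2x: %+.2f dB" % peak_db(resample_poly(noise, 1, 2)))
```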

So I decided to do some tests using real music. Luckily my collection has been analysed in JRiver, so it was fairly easy to export the data as XML and search for the highest true peak value (though the dBTP values reported by JRiver seem to be inflated). I found one track with a true peak of +2.86dB, the David Aude remix of Coldplay's "Charlie Brown" (I'll insert a pause here in case you want to insert any jokes about Coldplay not really being music ...) [Ed: no problem man! I consider musical choice absolutely a subjective matter and your affair :-]. This is a 44100Hz file, and downsampling it to 22050Hz in iZotope RX7 using the filter settings you showed results in a sample peak of +2.16dB. I was only able to get a reasonable result with a prior gain reduction of -2.3dB. RX7 reports a true peak of +2.86dB for this file, but obviously you want to use the smallest gain reduction possible. Zeroing in on the optimal gain took a bit of trial-and-error, which would be a PITA if you had to do it on a routine basis. Then I realised that the RX7 resampler has a built-in limiter which is specifically designed to prevent this problem, so I decided to do some comparisons. Since I was looking for a workflow that didn't require intervention I also tested using RX7's DeClip module (DeClip with threshold at 0dB and no make-up gain, then normalise to 0dB, then resample).
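
[Ed: For what it's worth, a starting point for that gain calibration can be estimated rather than found purely by trial and error. Here's a hedged Python sketch (numpy, scipy and soundfile assumed; the file name is hypothetical) of the usual 4x-oversampling true-peak estimate, along the lines of ITU-R BS.1770 - this is an illustration, not iZotope's algorithm:]

```python
# Estimate the true peak by 4x oversampling, then derive a candidate gain
# reduction as a starting point for fine-tuning.
import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

x, fs = sf.read("track.flac")                  # hypothetical input file
x = np.atleast_2d(np.asarray(x).T)             # shape: (channels, samples)

oversampled = resample_poly(x, 4, 1, axis=-1)  # 4x oversampling per channel
true_peak_db = 20 * np.log10(np.max(np.abs(oversampled)))
print("estimated true peak: %+.2f dBTP" % true_peak_db)

# Candidate pre-resampling attenuation. As Charles found with -2.3 dB against
# a +2.86 dBTP reading, the gain you actually need can be somewhat less.
print("starting gain guess: %.1f dB" % min(0.0, -true_peak_db))
```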

Here are the amplitude statistics for: the original file; the file resampled in Audition (quality 100%, with pre/post filtering); resampled in RX7 with no limiter; with the post limiter checked; with a gain of -2.3dB applied first (and no limiter); and with the DeClip steps described above. I measured these in both RX7 and Audition as a check, but I think the measures from RX7 are more reliable (Audition's true peak measures are sometimes a bit suspect). I didn't bother using crude measures like DR.


Measured in RX7:

              Loudness  Loudness    True Peak  Sample Peak  Max Momentary  Max Short-term
              (LUFS)    Range (LU)  (dBFS)     (dBFS)       (LUFS)         (LUFS)
Original      -5.5      9.4          2.86       0           -2.4           -3.2
Audition      -5.6      9.6          2.45       2.23        -2.7           -3.4
No Limiter    -5.6      9.6          2.43       2.16        -2.7           -3.4
Post Limiter  -6.0      9.3         -0.01      -0.01        -3.4           -4.0
Gain -2.3dB   -7.9      9.6          0.13      -0.14        -5.0           -5.7
DeClip        -11.6     9.4          0.03       0           -8.6           -9.3

Measured in Audition (Audition does not report the loudness range or the momentary/short-term maxima, but adds its own DR measures):

              Loudness  True Peak  Sample Peak  Dynamic Range  Dynamic Range Used
              (LUFS)    (dBFS)     (dBFS)       (dB)           (dB)
Original      -5.60      2.78       0           84.29          39.3
Audition      -5.70      2.23       2.23        84.28          39.3
No Limiter    -5.70      2.16       2.16        84.25          39.2
Post Limiter  -6.09      0.46      -0.01        89.51          38.85
Gain -2.3dB   -8.00     -0.14      -0.14        84.25          39.2
DeClip        -11.72     0          0           71.96          39.00

Interestingly, there's a slight increase in the loudness range when resampled without a limiter, though the [Max Momentary] - [Integrated Loudness] value does show a small reduction (3.1 LU in the original vs 2.9 LU for no limiting and 2.6 LU when limited). RX7 does report a small true peak overload for the Gain -2.3dB file, though Audition doesn't.

Since I'm a fan of sample-peeking (I find you can often identify problems just by looking at the waveform) I constructed some comparison images showing a couple of sections of the track with the samples aligned. Original waveform (44.1kHz) is grey, resampled waveform (22.05kHz) is blue:

Section 1:
Section 1: Adobe Audition

Section 1: iZotope RX 7 No Limiter

Section 1: iZotope RX 7 with Post Limiter

Section 1: iZotope RX 7 with Gain -2.3dB

Section 1: iZotope RX 7 with DeClip


Section 2:
Section 2: Adobe Audition

Section 2: iZotope RX 7 No Limiter

Section 2: iZotope RX 7 with Post Limiter

Section 2: iZotope RX 7 with Gain -2.3dB

Section 2: iZotope RX 7 with DeClip
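
[Ed: For anyone wanting to reproduce this kind of aligned-waveform "sample-peeking", here's a rough matplotlib sketch - the excerpt file names are hypothetical:]

```python
# Overlay an original and a resampled excerpt on a shared time axis so the
# samples line up, as in the comparison images above.
import numpy as np
import soundfile as sf
import matplotlib.pyplot as plt

a, fs_a = sf.read("section1_44k.wav")   # hypothetical excerpt file names
b, fs_b = sf.read("section1_22k.wav")
if a.ndim > 1: a = a[:, 0]              # compare one channel
if b.ndim > 1: b = b[:, 0]

plt.step(np.arange(len(a)) / fs_a, a, where="mid", color="grey",
         label="44.1 kHz original")
plt.step(np.arange(len(b)) / fs_b, b, where="mid", color="tab:blue",
         label="22.05 kHz resampled")
plt.xlabel("time (s)"); plt.ylabel("amplitude"); plt.legend(); plt.show()
```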

While I had high hopes for the workflow using DeClipping, it looks like it's a bust, yielding aberrant peaks that lead to excessive gain reduction. Overall, I think the post limiter built into the RX7 resampler does very well. While you might get a more precise result by applying gain reduction, this requires careful calibration that's frankly more trouble than it's worth: too little is useless as you'll still get clipping, and too much will lose resolution on converting to 16 bits. So, while I understand a reluctance concerning limiters (since reckless limiting is what causes this problem in the first place), I think it's a useful tool here and I'll be leaving it on in future.


Charles King

--------------------

Addendum:

There’s an important point that I only realized after writing the email, though. If you use volume levelling (through ReplayGain or some other mechanism) that operates in the digital domain then the problem of intersample overs causing distortion in the DAC goes away automatically. Highly compressed tracks that are more likely to have a true peak over 0dB will also have a high loudness and thus will be subject to negative gain in your playback system (and this will happen without resolution loss as a decent player does its DSP at a high bit depth). So by the time the signal hits your DAC it’s already been scaled down sufficiently that intersample overs won’t be a problem. The best state-of-the-art DACs manage 21 bits of DR, so you can drop a 16-bit signal by 30dB ((21-16)*6) before you run into the noise floor.
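
[Ed: To make the arithmetic explicit, here's a toy Python illustration using the numbers from the text - a sketch, not a player implementation:]

```python
# Volume levelling at 64-bit float precision scales intersample overs down
# along with everything else before the signal ever reaches the DAC.
import numpy as np

def apply_levelling(x, gain_db):
    """Levelling gain applied at 64-bit float precision (no truncation)."""
    return np.asarray(x, dtype=np.float64) * 10 ** (gain_db / 20)

# Headroom arithmetic from the text: 21 usable DAC bits minus 16 source bits
# leaves (21 - 16) * ~6.02 ~= 30 dB of attenuation before the noise floor.
print("headroom below 16-bit noise floor: %.0f dB" % ((21 - 16) * 6.02))

# A loud track at -5.5 LUFS levelled to a -18 LUFS target gets -12.5 dB of
# gain; even a +2.86 dBTP intersample peak ends up well under full scale.
gain = -18.0 - (-5.5)
print("levelling gain: %.1f dB" % gain)
print("resulting true peak: %+.2f dBTP" % (2.86 + gain))
print("full-scale sample after gain: %.4f" % apply_levelling([1.0], gain)[0])
```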

Just in case anyone's interested in why the DeClip workflow caused so much gain reduction, I prepared another image showing one of the errant peaks it produced. Original audio waveform is dark green, DeClipped, normalised and resampled waveform is lighter blue:

iZotope RX 7 DeClip errant peak.

The RX7 DeClipper was voted the best in a blind comparison test last year, but clearly there are problems, which are probably inherent to all such algorithms.

Charles

--------------------

Thanks Charles for all the work! IMO, the time you've spent investigating and confirming the potential for overloading while downsampling definitely deserves to be published for others who might be wondering about the same thing.

Your work indicates that the Post Limiter feature in iZotope RX 7 functions well with no real practical concern, and I will likely start turning the setting on in my own resampling as well. Also interesting to see the effect of the DeClip function; I've limited my use of it to the few albums I have that are clearly, severely clipped, so even if errant peaks are created, it would be in the context of already "impaired"-sounding albums.

I know... For many readers, this article and that "Overload!!!" top graphic with Marty McFly being blown away by Doc Brown's speakers are more than a bit dramatic (classic 1985 Back To The Future scene). What we're talking about here belongs to the OCD "perfectionist audio" subculture of technical audiophiles ;-). We're talking about the potential clipping of probably a handful of samples here and there on loud tracks when, even for standard CD, there are 44,100 samples being converted per second.

However, there is also a message here worth considering: digital audio (including hi-res) is just DATA. Data can be manipulated by digital processing, whether through DSP "correction" filters, volume leveling, or resampling... There's nothing magical about any of this. When one understands the process involved, one can anticipate what these changes might mean audibly and can explore the implications of such changes in the frequency and time domains.

These days on audiophile forums, I continue to see individuals expressing concerns about lossless compression causing sonic differences (highly unlikely even years ago!) or that even the mere copying of bit-perfect files will result in different-sounding versions (here's a rather depressing article from Cookie Marenco from as recently as 2017 that deserves an official retraction)! Seriously folks, if you know what you're doing, and you're using modern high-resolution DACs, there is simply no difference beyond what's encoded in those bits.

BTW, for those interested in even more technical discussion of dynamic range calculations and compression, with a good background in stats, check out this paper by blog reader Pietro Coretto from Universita di Salerno in Italy - "Nonparametric Estimation of the Dynamic Range of Music Signals" (available as a PDF preprint on arXiv). Section 6 is very interesting, comparing their MeSDR (Median Stochastic DR) calculation with the typical TT-DR, and in Section 7 they look at different masterings of The Wall's "In The Flesh?", comparing the MFSL and EMI releases. Great to see academic work behind some of these things we talk about as audiophile consumers...

When Charles previewed this article, he also suggested another link to check out: 'Dynamic Range' & The Loudness War. An excellent article discussing the history, the nature of dynamics (peak levels, crest factor...), and actual examples of what has happened over time. Well worth a read.

As usual, I wish you all safety and enjoyment of the music... Despite the curtailment of social circles this year, I'd like to wish a happy Thanksgiving 2020 to fellow Canadians this weekend.

17 comments:

  1. Cookie Marenco claims that her conclusions result from double blind listening tests, and they seem to be consistent over time and with multiple releases. Comment?

    ReplyDelete
    Replies
    1. Sure Unknown,
      Let's talk about Cookie Marenco and her claims:

      1. Nothing I say is about the *quality* of the recordings she achieves. I've heard some demos, and a while back I got her Blue Coast Collection SACD which sounds great. Good for her for achieving some excellent results!

      2. Having said (1), I can also say that when we analyze the DATA on the Blue Coast Collection SACD, the noise level is higher than the limits of 16-bit audio. And the high-frequency roll-off does not suggest any "need" for sample rates >48kHz. Furthermore, in every track I looked at (probably 3 or 4 of them on the demo SACD), there were noise anomalies as I showed in the "Looking for a Home" FFT on my "Post Hi-Res" post. Combined with the technique she described, there is simply no need for digital "hi-res" encoding since using the larger "bit bucket" will not improve sound quality. (For those who want to check this kind of thing themselves, see the sketch at the end of this comment.)

      3. As for her claims about sonic degradation from simply copying digital data: it seems highly unlikely in the context of how digital works, right? Digital encoding is based on engineering principles and the devices we use are the results of human design. Where else in the world of digital audio or video (except in some purely subjectivist audiophile corners) do we persistently see claims that bit-perfect copies result in unexpected variation directly coming out of the replication process? Does a copy of a Word document print differently from the original? Does a PowerPoint deck look worse after it has been emailed? When Netflix streams movie "copies", do they appear increasingly distorted?

      To say she conducted "double blind listening tests" is rather meaningless as a statement of fact if there's no documentation of such a thing, nor has anyone ever replicated and documented the experiment. A Flat-Earther could gather a bunch of friends and claim some experiment was performed and everyone agreed that "Yup, there is no curvature to the earth!" Obviously, people say all kinds of false things all the time; the wise must engage their ability to discern the truth.

      If Marenco did perform such a blind listening test: Who participated? With what music? What hardware? How was the data transmitted/copied/manipulated? What controls were used? How was it blinded? How did the data collection take place to ensure there was no bias?

      Only with that information can we develop some "faith" that she knows what she's talking about in the face of a claim that counters basic engineering principles about the nature of digital data.
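
      Regarding point (2) above, for anyone who wants to try this kind of analysis, here's a rough Python sketch (scipy and soundfile assumed; the file name is hypothetical, and this is only a crude version of the FFT inspection I described):

```python
# Crude spectral sanity check of a "hi-res" file: estimate the noise floor
# and how much energy actually lives above 24 kHz (16/48's Nyquist).
import numpy as np
import soundfile as sf
from scipy.signal import welch

x, fs = sf.read("hires_track.flac")      # hypothetical 24/96 download
if x.ndim > 1:
    x = x[:, 0]                          # one channel is enough here

f, psd = welch(x, fs=fs, nperseg=65536)
print("median PSD (rough noise floor): %.1f dB/Hz" %
      (10 * np.log10(np.median(psd))))

hf = f > 24000                           # content beyond 16/48's reach
print("fraction of energy above 24 kHz: %.1f dB" %
      (10 * np.log10(psd[hf].sum() / psd.sum())))
```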

      Delete
    2. BTW: Before someone picks this up ;-).

      When it comes to Netflix streaming, obviously it's lossy so each stream could look different... Let's suppose there's no data rate limitation. Would the same error-free stream look different in Hong Kong as it would in New York City given the likelihood of the data servers located in different places (but with the exact same copy of the movie) around the world and how the data would have been routed across the globe?

      On a related note. Remember the old "Intercontinental Internet Audio Streaming Test" from years ago: :-)
      http://archimago.blogspot.com/2015/02/measurements-intercontinental-internet.html

      Delete
  2. Hi Charles,

    This article is a great example of "when in doubt, try it yourself". No expensive hardware or software is required for these experiments (free resamplers, players and plugins are widely available).

    As for the comment about ReplayGain and the DNR of good DACs these days: some audiophiles worry about "bit-perfectness" or losing DNR due to the use of software volume adjustments. This kind of thinking actually encourages the so-called "loudness war". Without loudness normalization, consumers need to adjust the volume manually when playing different tracks. Not very practical, and it can even be dangerous in some listening scenarios (e.g. driving). Practitioners of the loudness war take advantage of this and apply severe compression and limiting to "win" the competition. Prevention of intersample overload for these inherently distorted tracks has little meaning. Of course, the wimpy output of some portable devices, and the way music is listened to (in noisy public places), may also be causes of these kinds of mastering practices.

    With a reference target playback loudness of, for example, -18 LUFS, the worst offenders of the loudness war may be played with an attenuation of about 12-15 dB, and this won't degrade even a not-so-state-of-the-art DAC. Also, for highly dynamic music, with appropriate settings, ReplayGain will look at the peak level of audio files to avoid clipping. In case you disagree with the loudness algorithm, you can also manually edit the stored gain level of individual tracks.
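
    [Ed: A sketch of that peak-aware logic - a toy function of mine with illustrative numbers, not the actual ReplayGain code:]

```python
# Levelling gain = target minus measured loudness; for dynamic material a
# positive gain is capped so the stored peak never gets pushed past 0 dBFS.
def levelling_gain(track_lufs, peak_dbfs, target_lufs=-18.0):
    gain = target_lufs - track_lufs   # loudness-matching gain in dB
    return min(gain, -peak_dbfs)      # clamp so the peak stays "legal"

print(levelling_gain(-5.5, 2.86))     # loudness-war track: -12.5 dB
print(levelling_gain(-28.0, -8.0))    # quiet track: +10 dB wanted, capped at +8
```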

    ReplyDelete
    Replies
    1. The DAC I use only has a wimpy 20 bits of dynamic range and linearity, but then it only cost £33 (the TempoTec Sonata HD Pro). So yeah, you aren’t losing much if you don’t want to fork out for a really top-range DAC ;). I suppose this is the real benefit of the ‘HiRes’ spec-war that happened over the past couple of decades: we can now buy devices that have enough headroom to apply significant amounts of DSP on the user’s end without sacrificing any meaningful dynamic range.

      My personal view on the loudness war is that the problem isn't compression per se, it's _bad_ compression. Compression is an essential element of the 'studio as an instrument' revolution that began back in the '60s. It changes the texture of the music, and if an artist wants to use large amounts of it to get the sound they want, we should respect that. The problem comes with sloppy mastering engineers who brickwall the track so it clips (and then typically lower the gain by 0.3dB so they can pretend it's not clipped and will pass iTunes' automated checks). You don't need golden ears to hear the spray of distortion products that this causes even on a short peak. There are many ways to maximise loudness without causing clipping, from simply increasing the attack time (using look-ahead) to more sophisticated methods of band-limited compression and EQ which maximise the spectral density while keeping the overall envelope within the legal range. Of course, this means you have to know what you're doing...
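
      [Ed: A toy sketch of the look-ahead idea Charles mentions - my own Python illustration (~5ms of look-ahead at 44.1kHz), not anyone's production code; a real limiter would smooth the gain changes far more gracefully:]

```python
# Toy look-ahead peak limiter: the gain computer sees `lookahead` samples
# into the future, so gain comes down before a peak arrives rather than the
# peak being clipped. Instant attack, slow exponential release.
import numpy as np

def lookahead_limit(x, threshold=0.98, lookahead=220, release=0.9995):
    pad = np.concatenate([np.abs(x), np.zeros(lookahead)])
    # gain needed so the loudest sample in the upcoming window stays legal
    need = np.array([min(1.0, threshold / max(pad[i:i + lookahead].max(), 1e-12))
                     for i in range(len(x))])
    g, out = 1.0, np.empty_like(x, dtype=np.float64)
    for i in range(len(x)):
        g = min(need[i], g / release, 1.0)   # drop fast, recover slowly
        out[i] = x[i] * g
    return out
```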

      Charles

      Delete
  3. Although I find this article by Charles highly interesting to read, I have to admit that my interest in hi-res files, such as SACD, DVD-A etc. is fast declining. That goes for upsampling too. I have tried to upsample some of the music I really like, but in the rare cases there's an audible benefit, it is so tiny that it isn't even worth the time doing it. An example of a bad hi-res file is a MOFI edition of Santana's "Caravanserai", an album I have loved since the day I bought it 45 years ago. There's no clarity, details are drowned, there's no slam, hardly any bass or treble and so on. That's a real shame IMO, because the album is a masterpiece music-wise, and the redbook is just as good/bad as any remasters I've ever heard. Perhaps the LP is a bit better than any digital file, I am not sure.

    As such, I have stopped entirely worrying about resolution, because when it comes down to it, all that matters is the skills of the sound engineers. In my collection, there's more than one example of a plain 16/44 CD which sounds clearly better, cleaner and more detailed than 99% of my hi-res files/discs, BluRays included.

    I am not letting it spoil my day, but sometimes I do feel deeply depressed that so many great recordings have been utterly destroyed by some half-deaf nitwit behind the mixer console. Most recordings aren't that bad, but still not as good as they could have been.

    I am too lazy to downsample the relatively few hi-res files I have, and HDD space is cheap, but if there were an audible benefit to it, I might try it.

    Thx :)

    ReplyDelete
    Replies
    1. Hey Duck,
      I actually think the fact that "interest in hi-res files, such as SACD, DVD-A etc. is fast declining" is a good thing, because I hope it also means that audiophiles can refocus on what is really important rather than the minutiae that companies sometimes prefer we be distracted by (typically these distractions, like Hi-Res and MQA, exist for self-serving financial gain, of course).

      No Hi-Res/SACD/DVD-A/Blu-Ray would ever fix that MoFi Caravanserai you speak of if the mastering was not good in the first place. It's become clear over time as we've all experienced "hi-res" for ourselves that in fact, it brings nothing to the table quality-wise unless specifically produced with hi-res techniques from start to finish - and even then we are unlikely to hear a difference when downsampled (as per Mark Waldrep's recent blind test and AES presentation).

      Yeah, no need to downsample given the cheapness of HD space especially if there are not many files. I've done quite a lot already and have plenty of space left over now :-).

      I agree, speaking as enthusiastic audiophiles, it is indeed sad to think that so much music has been produced and "remastered" over the years in ways that have limited its potential fidelity and, I believe, emotional impact. Alas, this has been the case for a long time. Hopefully we will see brighter days ahead... It sure would be nice to see more pressure on artists and the production side to place "more natural" sound quality higher up the list of priorities.

      Delete
  4. Hi Archimago
    I am interested in your opinion on freeing up further HDD space by converting to AAC, given that it is lossy digital audio compression.

    ReplyDelete
    Replies
    1. Hi Vlad,
      I'm pretty agnostic about lossy encoding as well ;-).

      Remember that this blog in 2013 started with a blind test of high bitrate ~320kbps MP3 vs. Lossless - results here:
      http://archimago.blogspot.com/2013/02/high-bitrate-mp3-internet-blind-test_3422.html

      Basically, the data suggested that people did not perceive the sound of MP3 to be inferior; in fact, there were suggestions that some even preferred/liked it more.

      Since AAC can encode equivalent sound quality with lower bitrates like 192kbps - some evidence here in one of my old experiments:
      http://archimago.blogspot.com/2013/05/measurements-do-lossless-compressed.html

      I anticipate that the music library would sound great at 256-320kbps AAC.

      As a "perfectionistic" audiophile, I'm happy to keep my content as lossless compressed FLAC... But I certainly would not disparage anyone who prefers to keep their music as high bitrate AAC!

      Delete
    2. HD space may be cheap, but 30+ years of buying music means my collection is rather large. Having spent a lot of time organising my collection and getting it all tagged correctly I’d be really annoyed if I had to recreate it from the sources. If I stored my library losslessly it would amount to over a terabyte, which would be really inconvenient to upload to an off-site backup given the asymmetric nature of home high-speed broadband. So most of my music is compressed using high-rate VBR AAC. I’ve done numerous ABX tests and can’t tell the difference (especially now with my 55yr-old ears). I can’t see any rational reason to worry about it, and it makes managing my library a lot easier.

      The one issue is that compression will worsen any problem with intersample overs in tracks that aren’t strictly legal. But this just means you end up having to apply a little more gain reduction (through RPG) to bring them back in line, and these days you can get cheap DACs that provide more than enough headroom to accommodate this. Tracks that are properly recorded will have no problems with compression at all.

      Charles

      Delete
  5. I have used Sony Sound Forge from version 10(?) and am now up to 14, now that they are owned by Magix. I have always found it easy to use, but I temper my expectations of what it can do given the price - generally $59, even with upgrades. I have not tried the Pro versions yet, but I may this year.

    My issue is that I have no idea how accurate the metering in the software is, so I always ensure that I am at least -3dB down, as I don't want to turn on the last bit. I always record at the highest resolution I can, which is 24/192 or at least 24/96, but never on my computers - I use Tascam SDHC card recorders. I only trust computers for mastering and editing. I lost an entire concert once, so never again.

    I have never messed with upsampling since I took a redbook file, recorded it into my Tascam DR680 MK2 at both 24/96 and 24/192, and could not hear an improvement. If I were archiving some music I would save it at 24/192, but not for just listening. I would leave the redbook files alone and worry about getting a better playback DAC.

    I am not sure if the normalizing process is destructive or transparent, but I never normalize above -1dB as, again, metering accuracy might be an issue. I do wish the program would normalize each channel independently, as I often have to go back and deal with the overall loudness of each channel individually. Channel imbalances do bother me and can affect the stereo perspective.

    Most of what I do are WAV files, but most of the downloads I do buy are FLAC and sound excellent to me as I buy most of those in 24/96. I have not experienced any overs in any of my downloads. I am not a Mac person so no comments on their file format.

    ReplyDelete
    Replies
    1. Hi Jim,
      Can't speak to Sound Forge, but certainly it's good to maintain as much of the resolution as possible although of course for recording, we just need to make sure not to clip.

      When we're doing audio editing at 24/32/even 64-bit resolution, the normalizing process would be completely transparent. Absolutely no reason to be concerned... If someone thinks otherwise, they need to be subjected to a blind test :-).

      Likewise, I can't imagine anyone claiming that 24/192 is going to sound different from 24/96 these days! I know... Folks like Neil Young tried, and maybe some hi-res promoters still want us to be excited about 24/192 streaming and stuff like that.

      Take care and stay safe.

      Delete
  6. Hi Archi.
    Adobe Audition: "which is rated as having one of the better resamplers"
    Do you have any sources for this?
    Better over which one?

    Cheers mate!

    ReplyDelete
    Replies
    1. Hi Unknown,
      Perhaps Charles can weigh in as well since this was part of his text.

      What I can say is that if you go into SRC Comparisons:
      https://src.infinitewave.ca/

      If you have a look at the Audition test results, they definitely look excellent, performing very close to the "ideal filter".

      Delete
    2. According to https://src.infinitewave.ca/, dBPoweramp resampler looks great.

      If this is what is built into Foobar (in Foobar it's called the "dBPoweramp/SSRC resampler"), then the free Foobar player is all you need to perfectly upsample/downsample audio.

      Delete
  7. I modified LMS so that volume control is done by SoX. The full workflow is
    - Bit shift down (= ~6dB)
    - Re-sample (in my case, to 110.6kHz, which is the natural rate of my DAC)
    - Volume adjustment, taking into account the bit shift already done
    - Dither added at 24 bits (in my case, the bit depth of my DAC chipset)

    By doing the bit shift before re-sampling I can avoid intersample overs, and obviously a bit shift (unlike an arbitrary gain change) maintains bit perfection.

    Reducing volume by a whole ~6dB is not wasteful in my case, because I'm using digital volume control anyway.

    But this still might be of general interest. I use 'gain -6.02059991327962390427' in SoX for a bit shift. I've tested it by shifting up and down and comparing checksums - it worked.
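
    [Ed: That number is just 20*log10(2), i.e. exactly one bit. A quick numpy check - a sketch, not the commenter's LMS/SoX code - of why the shift is losslessly reversible:]

```python
# Multiplying integer PCM by 0.5 (a power of two) is exact in float64, so a
# ~6.02 dB shift down and back up restores the original samples bit-perfectly.
import numpy as np

rng = np.random.default_rng(1)
pcm = rng.integers(-2**23, 2**23, size=1_000_000)    # 24-bit sample values
restored = ((pcm * 0.5) * 2.0).astype(np.int64)
print("bit-perfect:", np.array_equal(pcm, restored))  # True
```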

    ReplyDelete
    Replies
    1. Instead of that crazy number in decibels, use "vol 0.5" directly. But if the volume adjustment is followed by resampling and dither, then there is nothing to be gained by shifting an exact number of bits.

      Delete