This article came about after I received an E-mail from an audiophile friend who saw this Audiophile Style thread in praise of "math and magic". It links to a piece of software from a site called remastero, and the program itself is called "PGGB" (Pan Galactic Gargle Blaster), obviously referring to The Hitchhiker's Guide To The Galaxy, with the main author going by the name Zaphod Beeblebrox (who in the book is also the ex-president of the Galaxy). Cute, and of course the number "42" features prominently here and there.
In the past, we have talked about "audiophile" software that supposedly affects sound quality. Years ago, we talked about bit-perfect players (Windows, Mac) and how "bit-perfect" is simply "bit-perfect" regardless of what software is used. We discussed questionable programs like JPLAY. Then there are the OS tweaks like Fidelizer. Neither JPLAY nor Fidelizer made any difference in my testing or listening.
That is not to say software doesn't make a difference at all. With the computing power we have these days, we can certainly perform highly precise filtering and DSD-PCM transcoding - like with HQPlayer.
The idea with PGGB is that the software takes (in batch) various tracks you have and converts them to upsampled versions like 24/384 or 32/705.6 or even higher, applying very strong filtering in the process (eg. on the order of a >200M-tap sinc filter for some of the tests we'll run here; very impressive big number, right?). Furthermore, the website states that the software can apply settings for various levels of "transparency", apply HF noise filtering, use noise shaping, adjust gain while monitoring for intersample overs, deal with convolution filters, and offer an apodizing setting. That's a fair bit of stuff, so I won't promise that we'll hit on all of these here. My intent is to at least have a good look at the foundation of the upsampling effect and the EQ function.
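PGGB's actual filter design isn't published, but the underlying technique it describes (windowed-sinc upsampling with a linear-phase low-pass) can be sketched at toy scale. The function name and tiny tap count below are mine for illustration; PGGB reportedly uses filters millions of taps long:

```python
import numpy as np

def sinc_upsample(x, ratio=16, taps=2001):
    """Upsample by an integer ratio using a windowed-sinc, linear-phase
    low-pass filter. A toy illustration of the general technique only."""
    # Zero-stuff: insert (ratio - 1) zeros between original samples
    up = np.zeros(len(x) * ratio)
    up[::ratio] = x
    # Linear-phase windowed sinc, cutoff at the original Nyquist frequency
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(n / ratio) * np.blackman(taps)
    h *= ratio / h.sum()          # normalize so passband gain is unity
    return np.convolve(up, h, mode='same')

# 0.1 s of a full-scale 1 kHz sine at 44.1 kHz, upsampled 16x to 705.6 kHz
x = np.sin(2 * np.pi * 1000 * np.arange(4410) / 44100)
y = sinc_upsample(x, ratio=16, taps=2001)
```

The longer the sinc filter, the closer this gets to ideal Whittaker-Shannon reconstruction; the question the rest of this article asks is whether going from thousands of taps to hundreds of millions makes any practical difference.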
So with that E-mail, my friend told me he downloaded the software (he started with the "Whittaker-Shannon Edition", but more recently the "Equalizer Edition", v2.0.42 as shown above) and has been running it on a 1-month trial license converting some of his favourite tracks. He seemed to enjoy the software and noticed some differences in sound so wondered if I would have a look at this and/or suggest some testing.
Well, since we live a bit of a distance apart, I thought about this for a bit and sent him some of my test signals and music tracks to see if he could run them through his machine and upload the data for me to have a peek. We E-mailed and chatted back and forth over a week, sharing files and ideas on what to test. This article is the culmination of those test results and ideas.
I. Let's talk "audiophile" audio processing...
To start, I think it's important for us to consider as "audiophiles" just what kind of processing we can do that would be beneficial.
Are we talking about processing that would add certain "euphonic" benefits to the audio? For example, a vinyl DSP plugin like iZotope Vinyl might be enjoyable by some but I don't think those of us interested in "high fidelity" would consider doing something like this (or even vinyl playback itself) as beneficial.
For certain situations like with headphones, we might want to process audio through a cross-feed DSP to get the sound "outside the head". A little while back, someone suggested I try 112dB Redline Monitor for example. (BTW, here's Linkwitz hardware if you want to build one.) Over the years, I've been an advocate of room correction DSP; the difference this makes is also obvious.
If we look at the website, PGGB purports to stay "true to Nyquist-Shannon sampling theorem". It claims to use long filters and states "the longer the filter, the higher the reconstruction accuracy and the more transparent the sound". Sure, that can be true. And linking this with subjective experience, it claims "what this means to you is better depth and layering, improved resolution, a cleaner leading edge, and more accurate timbre". Again, sure, accuracy and resolution of the sound correlate but at some point, that link breaks down as to how much this matters "to you". What I'm implying is the concept of diminishing returns. For example, the audible improvement in resolution going from 8-bits to 16-bits is obvious. But 16-bits to 24-bits is typically marginal at best (assuming you can even hear a difference). Yet both are +8-bit increments.
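To put numbers on the diminishing-returns point, the standard approximation for an ideal N-bit quantizer's signal-to-noise ratio (full-scale sine) is SNR ≈ 6.02·N + 1.76 dB:

```python
def ideal_snr_db(bits):
    """Approximate SNR of an ideal N-bit quantizer with a full-scale sine."""
    return 6.02 * bits + 1.76

# 8-bit -> ~49.9 dB, 16-bit -> ~98.1 dB, 24-bit -> ~146.2 dB
for bits in (8, 16, 24):
    print(f"{bits}-bit: {ideal_snr_db(bits):.1f} dB")
```

The same +8-bit step moves the noise floor from a clearly audible ~50 dB down to ~98 dB (already below the ambient noise of most listening rooms), and then from ~98 dB to a ~146 dB figure that no playback chain or pair of ears resolves.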
So then, if the goal of PGGB is to maintain "accuracy", this automatically locks us into an implied type of sound. Given the excellent resolution of today's DACs, logically, the effect of this algorithm is unlikely to change the sound much if we're already starting with good resolution content (i.e. good ol' 16/44.1 is already great resolution). We should not experience significant frequency response changes, for example. Likewise, we should not hear changes to the dynamic range of the music unless we actually think the software expands dynamic range somehow (which would not be faithful to the source, right?).
So, the bottom line is that unless we're saying the DAC has very poor digital filtering to begin with, logically, it would be wise to caution against unrealistic expectations of massive improvements using PGGB. In other words, it's probably best to avoid the word "magic". Some might call this "closed minded", but I see it as simply being realistic before we examine anything, especially if we're thinking of spending money. Open minded enough to give it a chance with testing and listening, but not so open minded as to have our brains spill over to embrace fantasy (as per this).
Let's start our exploration rationally with examination of the facts then!
II. Objective Measurements: Digital
While I agree that ultimately, it's about "how something sounds", I have always felt that subjective descriptions are not that useful most of the time unless I'm sitting in the room with a friend and we can both "compare notes" having heard the same sound. This is even more important for evaluation of something like this in that the software touts all kinds of technical claims like mega-tap filtering, huge levels of precision, "accuracy", etc. but interestingly, I don't see any details on the website like even simple measurements or diagrams of the filters implemented.
I started by downloading the software but did not activate it just to see what it offered and how to proceed (screenshots are of activated trial from my friend's computer). We can see that it's a batch processing program where you select directories ("Input Folders") for it to process, and a target directory ("Output Folder") for the "upsampled" / "remastered" data like this:
Notice in that picture, the Yosi Horikawa track is being processed using 234M taps - that's a big number!
My friend is using a 12-core Intel i9-10920X with 32GB of RAM and claims the conversion did not take too long, something like 5 minutes to convert around 20 minutes of audio by his estimate. I'm sure this fluctuates depending on the complexity of the filters the program chooses to use. That's still likely a faster CPU than what most people have though!
The program provides some settings to supposedly fine-tune the sound. For our testing, we decided to try 2 settings which we'll call "DEFAULT", and "ALT" to see what kinds of differences this makes:
DEFAULT is basically what the program started with and ALT is with the changes made as you can see. I'm not sure how "Natural" differs from "Front Row", and I can't say what "Transparent" is or what a "Dense" sound means if both are aiming for "accurate". The guide doesn't provide clear technical explanations for these vague subjective terms either.
As you can see, the input audio signal is being upsampled to "705.6/768" kHz, 32-bits, and automatic gain has been applied for both settings. You can also see the amount of physical memory, logical cores, and max threads available on the machine (not sure why the number of threads is limited).
Since this is about upsampling and filtering, let's start with the impulse response shall we?
As described on the website, indeed very long tap-length sinc filtering has been applied. The converted impulse response file says 11M-taps specifically for the above signal. Notice morphologically that it's a typical linear phase filter for both settings with the expected long symmetrical pre- and post-impulse ringing.
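That symmetric ringing is exactly what linear phase implies: the impulse response mirrors about its centre tap, so every post-ring sample has an equal pre-ring partner, and the group delay is a constant (taps − 1)/2 samples. A toy-scale check (the tap count here is mine, nowhere near 11M):

```python
import numpy as np

taps = 4001                                  # toy scale vs. PGGB's millions
n = np.arange(taps) - (taps - 1) // 2
h = np.sinc(n / 16) * np.hamming(taps)       # windowed-sinc low-pass prototype
# Linear phase <=> the impulse response is symmetric about its centre,
# producing the characteristic equal pre- and post-impulse ringing.
print(np.allclose(h, h[::-1]))               # True: perfectly symmetric
```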
So how well do these filters function in the frequency domain? Let's bring out the "Digital Filter Composite" (DFC) graph I've been using for years. Note that the difference here is that I will derive these in Adobe Audition using a 65k-point FFT and there's no "Digital Silence" plot which I normally show in my measurements since this is done digitally and silence would be "infinitely" quiet:
We see the result of that very high tap-length digital filter. I tried to keep the scaling the same for both graphs. Notice that the "ALT" setting shows lower dynamic range from signal peak to noise floor. I'm not sure which of the settings caused this and didn't try to specifically chase this down ("Dense" presentation maybe?).
I noticed a couple of other interesting findings on the DFC. Let's zoom into the "brick wall" sharp corner up around 20kHz:
Apologies for the axes not exactly lining up. Basically, what we see is that the filter cuts off right around 21kHz. Despite the different options used, I was a little surprised that the "corner frequency" did not change at all; I would have thought that maybe words like "Transparent", "Dense", changing "apodizing" might correlate in some way. Given that this is originally a 44.1kHz signal, we're actually losing the content from 21-22.05kHz. I agree that this is not audible, but just the same, we are missing something in this "maximal transparency" upsampling process.
The other thing I noticed was that the difference between the 0dBFS and -4dBFS noise signals is certainly not the intended 4dB. The program has obviously applied an automatic gain to the signal; in this case the -4dBFS signal has been boosted by +3dB relative to the 0dBFS one. To show you what the level difference should look like unprocessed, here's an overlay with the original wideband white noise and 19 & 20kHz sines:
As you can see, the PGGB "remastered" version is missing the remainder of the audio signal >21kHz. Also, the levels of the wideband noise signals have been automatically shifted so they're closer than the intended original 4dB delta.
BTW, don't worry about the 19 & 20kHz peaks being narrower for the "Original" signal. This is simply because the FFT remained at 65k-points for both 44.1kHz (original) and 705.6kHz (PGGB processed).
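The bin-width arithmetic makes this clear: an N-point FFT resolves fs/N per bin, so the same 65,536-point FFT is 16 times coarser at 705.6kHz than at 44.1kHz:

```python
fft_points = 65536
for fs in (44100, 705600):
    print(f"{fs:>6} Hz sample rate: bin width = {fs / fft_points:.2f} Hz")
# 44.1 kHz bins are ~0.67 Hz wide; 705.6 kHz bins are ~10.77 Hz wide,
# so the same pure tones plot as visibly broader peaks at the higher rate.
```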
Okay, so the PGGB filter does have some effect, but does this matter to the sound quality of music?
There is one way to find out that's quite easy to do objectively... Let's use Paul K's DeltaWave Null Comparator currently at version 1.0.70b!
Years ago, I created a short test file consisting of a few seconds of Pink Floyd, Rebecca Pidgeon, The Prodigy, and Rachel Podger for use with the old Audio Diffmaker program which I called the DiffMaker Audio Composite (DMAC). Since some of the material originated as 24-bits, I created the file as 24/44.1, in total about 30 seconds of music.
So let's take that 24/44.1 file and upsample it with something common like Adobe Audition 2021 (version 14), then compare that output with the same file processed by PGGB. Does PGGB upsampling, with all its mega-taps and other "magic" processing, differ much from Adobe Audition?
For completeness, here are the settings - notice that we used the "DEFAULT" PGGB preferences but left Gain set to 0 as there's no point fooling around with volume control since I know the test track will not clip. As for Adobe Audition, let's use the highest 100% "Quality" upsampling to an integer 32-bit, 192kHz WAV file. Speedwise, Adobe Audition upsampling to 192kHz took <2 seconds on my AMD Ryzen 9 3900X, my friend reported 30 seconds using PGGB upsampling to 705.6kHz with his Intel i9-10920X:
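DeltaWave handles the hard parts for real files (sample-accurate alignment, drift and level matching), but the core idea of a null test is just subtraction. Here's a bare-bones sketch where two different scipy polyphase resamplers stand in for the two upsamplers being compared; the specific Kaiser windows are my stand-ins, not Audition's or PGGB's actual filters:

```python
import numpy as np
from scipy.signal import resample_poly

fs = 44100
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)   # stand-in for the DMAC music clip

# Upsample 4x with two differently-designed anti-imaging filters
a = resample_poly(x, 4, 1)                        # default ('kaiser', 5.0)
b = resample_poly(x, 4, 1, window=('kaiser', 14.0))

# Null the two results against each other; a deep null means the
# resamplers are effectively indistinguishable
null = a - b
rms_db = 20 * np.log10(np.sqrt(np.mean(null**2)) / np.sqrt(np.mean(a**2)))
print(f"null depth: {rms_db:.1f} dB relative to signal")
```

If two upsamplers null out far below the noise floor of any DAC, claims of an audible difference between them need extraordinary evidence.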
III. Objective Measurements: Analogue from DAC
|Frequency response, linear scale. With DEFAULT setting, corner at 30kHz, ALT setting up at ~45.7kHz.|
IV. Subjective Listening
"Hey Arch, fun gettin' this stuff to ya.
Okay, here are my thoughts on this program.
Like we discussed, it takes a lot of computer power and memory to run. When running PGGB, I see all cores being used at different times in the process and the machine dips into the VM [virtual memory], requiring more than 32GB RAM especially with longer tracks. So far, I've resampled about 5 albums. The install also downloaded the MATLAB runtime which took up >3GB disk space. I don't love the UI, which can be a bit slow even on a fast machine.
It sounds good to me. I notice in A/B listening that the PGGB files do seem to be louder than the original sometimes and this will affect what I think of the sound. I have seen some of the comments on the AS thread but I'm not noticing as much as some of those guys say. Using my Chord Hugo TT2 with upsampled music to 32/768, it's good but I really don't think I hear a huge difference in dynamics or transient resolution. Chord already talks about using a long '98,304-taps' filter for the TT2. I'll keep listening but I'm just not sure these huge files are worth keeping even if they sound a little better!"
Notice the difference between low and high DR files. After conversion, Tracks 1-3 have their peaks pinned to 0dBFS with reduction in the RMS level to account for the intersample overs that are almost bound to happen with these dynamically compressed, low-DR, "loud" tracks.
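That RMS reduction makes sense: heavily limited masters with samples pinned at 0dBFS routinely reconstruct above full scale *between* the samples. The textbook worst case is an fs/4 tone sampled 45° off its peaks, which hides a ~+3dB intersample over:

```python
import numpy as np
from scipy.signal import resample_poly

fs = 44100
n = np.arange(1024)
# fs/4 tone sampled 45 degrees off-peak: every sample lands at +/-0.7071
x = np.sin(2 * np.pi * (fs / 4) * n / fs + np.pi / 4)
x /= np.max(np.abs(x))        # sample peaks "pinned" at 0 dBFS, like a loud master

# 8x oversampling approximates the true reconstructed waveform
y = resample_poly(x, 8, 1)
over_db = 20 * np.log10(np.max(np.abs(y[500:-500])))
print(f"true peak: about +{over_db:.2f} dBFS above full scale")
```

A converter (or upsampler) without gain headroom will clip these reconstructed peaks, which is presumably why PGGB pulls the RMS level down on low-DR tracks.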
V. PGGB-EQ Handling of Room Correction Filters
|One of my room filters, "optimized" for PGGB, "imported and on".|
|My 2-channel room correction filter applied to a REW sweep. As we can see, PGGB-EQ does apply frequency equalization to the channels independently. 10Hz start and 20kHz end frequencies as per settings. And confirmation that phase correction is off.|
I'll leave you to decide if this removal of phase correction is a "good" thing. Personally, I'm with Bernt and I know Uli Brueggemann of Acourate also believes this is a bad idea. In my experience, time-domain correction improves the sense of expansiveness in the sound stage as well as more precise positioning of voices and instruments. So by doing this, PGGB-EQ seems to be discounting these potential benefits, making the room correction a purely frequency-domain affair.