Despite all the examinations of what often amounts to subtle differences in sound when we compare different hardware devices, I think as audiophiles we too often neglect the very significant differences that mixing and mastering make.
Recently, I received my copy of Tears for Fears' The Tipping Point on Blu-Ray, which is only available for order online here. As you can see in the image above (screenshot of the menu), the disc includes both lossless DTS-HD Master Audio 5.1 and Dolby TrueHD-Atmos mixes done by Steven Wilson.
I believe you can already stream the Atmos mix over Apple Music "Spatial Audio" but that would be in lossy Dolby EAC3+Atmos. As you can imagine, the multichannel mix sounds quite different from the 2-channel CD/lossless stream!
To appreciate the differences, I ripped the DTS-HD MA version (no access to an Atmos digital decoder) to my computer and converted it down to 2 channels with dBPowerAmp. You can see a comparison of the overall waveform below in Adobe Audition:
Compared to the Steven Wilson multichannel mixdown, we can clearly see that the CD version is highly dynamically compressed! Notice that on the DTS-HD MA version, the track gets louder near the end of the song. Much of this dynamic potential has been completely squandered in the CD version.
As you can see, the DR value from the CD is dramatically reduced (halved!) compared to the Wilson 5.1 downmix. If you've ever done your own listening with highly compressed audio vs. a less compressed version, no doubt you'll have an idea of what low-dynamic-range music sounds like. Noise level tends to be accentuated, and at times some recordings are not just peak-limited but also grossly clipped, resulting in harsh noise. Thankfully, when zoomed in, I'm seeing some "soft clipping" from the use of a peak limiter rather than "hard clipping" at 0dBFS.
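For readers who want to put rough numbers on this kind of comparison, here is a minimal sketch of two simple checks: the crest factor (peak level minus RMS level, in dB) and the longest run of consecutive samples sitting at full scale. This is not the algorithm behind the published DR value; the function names, the float sample format (values between -1.0 and 1.0), and the 0.999 threshold are all assumptions made for illustration.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS for float samples in the range -1.0..1.0."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """RMS level in dBFS for float samples in the range -1.0..1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB; heavier compression tends to lower it."""
    return peak_dbfs(samples) - rms_dbfs(samples)

def longest_full_scale_run(samples, threshold=0.999):
    """Length of the longest run of consecutive samples at or above threshold.

    Long runs suggest hard clipping; a limiter's "soft clipping" usually
    leaves peaks near full scale without long flat runs.
    The threshold is a hypothetical value, not a standard.
    """
    longest = current = 0
    for s in samples:
        if abs(s) >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

As a sanity check, a pure sine wave has a crest factor of about 3 dB and a square wave 0 dB, so a heavily limited master will drift toward the low end while a more dynamic mix reads noticeably higher.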
For my own music server library, I have a copy of the downmixed 5.1 version which IMO sounds better than the CD mix.
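The fold-down from 5.1 to 2 channels mentioned above can be sketched as follows. This is only an illustration: the actual coefficients dBPowerAmp applied are not stated here, so the -3 dB (about 0.707) gains for center and surrounds, the dropped LFE channel, and the L, R, C, LFE, Ls, Rs channel order are assumptions based on a common ITU-style downmix.

```python
import math

def downmix_51_to_stereo(frame,
                         center_gain=math.sqrt(0.5),
                         surround_gain=math.sqrt(0.5)):
    """Fold one 5.1 frame (L, R, C, LFE, Ls, Rs) down to (left, right)."""
    l, r, c, lfe, ls, rs = frame
    # The LFE channel is discarded here: a common choice, but an assumption.
    left = l + center_gain * c + surround_gain * ls
    right = r + center_gain * c + surround_gain * rs
    return left, right

def downmix_track(frames):
    """Downmix a list of 5.1 frames, scaling down only if the sum would clip."""
    stereo = [downmix_51_to_stereo(f) for f in frames]
    peak = max((max(abs(l), abs(r)) for l, r in stereo), default=0.0)
    # Summing three channels can exceed full scale, so scale the whole track.
    scale = 1.0 / peak if peak > 1.0 else 1.0
    return [(l * scale, r * scale) for l, r in stereo]
```

Scaling the whole track by one factor, rather than limiting individual peaks, keeps the relative dynamics of the multichannel mix intact, which is the point of preferring the downmix in the first place.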
For a demo of how the more dynamic version sounds compared to the CD, I selected a 1-minute clip from 1:45 to 2:45 of "No Small Thing" for you to listen to. The 48kHz Blu-Ray rip was downsampled to 44.1kHz to be consistent with the CD. I normalized the Blu-Ray rip to 100% peaks, and decreased the average volume of the loud CD version to match the Blu-Ray Wilson Downmix using Adobe Audition 2021.
Here's the result:
|Average RMS amplitude as calculated in Adobe Audition, equalized for both tracks. These days we can also use other algorithms to measure loudness like ITU-R BS.1770-3, resulting in -13.80 LUFS for the CD and -13.53 LUFS for the 5.1 mixdown. Regardless, we're looking at <0.5dB average amplitude difference. (See this article/video for more info about various music level measurements.)|
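The level matching described above (peak-normalize the more dynamic rip, then pull the louder CD clip down until the average levels agree) can be sketched in a few lines. This is a minimal sketch using plain RMS over float samples; it is not how Adobe Audition computes its statistics, and it does not implement the K-weighted, gated ITU-R BS.1770 loudness measure. All function names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_peak(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

def match_rms(samples, reference):
    """Scale samples so their RMS equals the reference clip's RMS.

    Returns the scaled samples and the applied gain in dB.
    If either clip is silent, the samples are returned unchanged.
    """
    current, target = rms(samples), rms(reference)
    if current == 0 or target == 0:
        return list(samples), 0.0
    gain = target / current
    return [s * gain for s in samples], 20 * math.log10(gain)
```

In use, the dynamic clip would go through `normalize_peak` first and then serve as the `reference` for `match_rms` on the compressed clip, so the louder master is attenuated rather than the quieter one being boosted into clipping.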
What do you hear?
There are obviously differences. For example, to me the CD version sounds like it has a bit more reverb compared to the "drier" sound of the 5.1 mixdown. I wonder if that impression of more reverb in the CD version might just be the high compression pushing up what should be subtle reverb trails, making them overly significant. The 5.1 mixdown as a result sounds more focused with vocal and instrumental parts better separated. The voice and acoustic guitar are better situated in the center of the soundstage (the multichannel mix makes good use of the center channel). Later on around 30 seconds, the 5.1 downmix has the electric guitar situated clearly to the right, easily isolated with good contrast from the other instrumental layers, while with the CD version the guitar appears more embedded within the sound of everything else.
As has been expressed by Mitch Barnett in his article "Dynamic Range: No Quiet = No Loud" (2017), the decades of dynamically compressed sound like on the CD version have resulted in "wimpy loud sound" across all popular genres. Given the prevalence of strong compression, this is sadly an overarching legacy of 21st Century popular music sound quality (thus far).
I think as audiophiles, while we may obsess and claim that "everything is important", it's wiser to focus on the things that make the most difference. For most of us, I think the Big Three are: Room, Speakers, and Recording Quality.
As I had expressed previously in the "Good Enough" article a few years back, if we are to spend money proportionate to what contributes most to high-fidelity sound, we should clearly be devoting resources to improving the room acoustics (could be very expensive - like buying a house with an adequate sound room!), and excellent speakers instead of focusing too much on the little hardware things like cables, power conditioners, or related tweaks.
Of the Big Three, "Recording Quality" is the one we have least control over, which is why when it comes to building a music library, it's good to keep an eye out for the best sounding version of favourite albums. Sometimes we might need to seek out "first press" releases with better dynamic range, or perhaps look for audiophile editions from MoFi or Audio Fidelity (sadly now defunct), instead of the latest mainstream remaster. We might even need to seek out vinyl versions that have been mastered without as much compression. And yes, even though the Steven Wilson 5.1 Mixdown isn't the "official" 2-channel version, this is clearly my preferred way of listening to The Tipping Point when not played back in actual 5.1 multichannel. (The multichannel mix would be my "definitive" version of this album.)
IMO, the absolute connoisseur of audiophile reproduction is not just the guy who picks out the "best", often very expensive, DAC, amplifier, speakers, audiophile cables, and all the right tweaks; but the one who also knows which is the best release of their favourite music!
By the way, while I used the lossless DTS-HD Master Audio version of The Tipping Point in this demo, the lossless TrueHD-Atmos version sounds great on my system. Tracks like "The Tipping Point", "My Demons" and "End of Night" incorporate some really nice surround with height channel content. At times, I'm tempted to look over my shoulders just to double check if there is somebody making sounds back there! The last track "Stay" is a slower tune which makes good use of the height channels and creates a feeling like I'm floating amidst the vocals not just with perception of width and depth, but also the vertical dimension.
As a synthpop/rock album, the sound is obviously not "natural" so Wilson has leeway to construct the mix creatively (as opposed to trying to re-create some kind of surround "live" sound). As such, it's absolutely appropriate that the surround channels be used for all kinds of esthetic effect so long as they are not overly aggressive for most listeners (I don't like when some mixes are too "rear heavy" for example), or gimmicky sounding (e.g. excess "ping-pong" effects isolated to single speakers). Steven Wilson has done a great job with his mixes over the years and this one is no exception! Here's hoping that many more albums can be given this kind of treatment as streaming services like Apple Music continue to host "Spatial Audio" content.
|A look at the complex 5.1 multichannel original Steven Wilson mix from which the 2.0 fold-down was derived.|
I want to continue to encourage audiophiles to expand their listening experience into multichannel audio. There are a lot of nuances in those 5.1 channels shown in the image above that get lost when folded down to 2.0. Yes, there are costs and sacrifices to having even more speakers in the room; obviously the need for extra real estate. Yes, it costs money for those speakers, amps, processors/receivers. However, just like 2-channel audio, there's no need to be extreme in order to achieve fantastic quality; a few good speakers and a nice receiver should not need to be too expensive. Yes, one might obsess and spend even more time with speaker placements and tweaks. I guess that's what audiophiles do. ;-)
A few weeks back, I integrated 2-channel and multichannel content (typically 5.1 FLAC rips) into my Roon library. This is already very nice although unfortunately Atmos content is not supported through Roon. A "bitstream-to-HDMI" feature would be fantastic and I suspect RAAT is advanced enough to handle this. Ideally, the ability to decode Atmos, apply volume leveling and DSP would be "the promised land"! One method to decode Atmos appears to be using the Dolby Media Encoder which costs US$400/yr as discussed here. I don't think that's a reasonable cost just for playback, especially given the paucity of lossless Atmos music content currently. Given the proprietary nature of Atmos (and DTS:X), it's unlikely that there will be a free/open-source reverse engineered object-based decoder any time soon (for example, incorporated into software like FFmpeg). Then again, considering the tenacity of coders out there, you never know...
Multichannel has the potential to create a more realistic sound field compared to 2-channel playback (see research like this in the literature). While wide soundstage and good depth can be achieved with 2-channel stereo sitting in the "sweet spot" facing forward with well-placed speakers, it's hard to render detailed, stable sounds far in the corners or behind the listener across wide audible frequencies if one were trying to reproduce a real acoustic space. Obviously the imaging also fails if we're off the sweet spot or have the head turned too much.
[BTW: A review here with some history on "immersive audio" from before Y2K; much of the computing limits from back in those days have been overcome. And here's one from 2021 for comparison demonstrating progress over the last 2 decades.]
While we can trick the mind into hearing the impression of a voice over the shoulder or incidental dog barking behind us (as in the QSound-processed Amused To Death) with 2 channels, it's hard if that "spatialized" sound is complex, like say a choir with multiple voices we might want to pick out from behind us, or the sound of an audience with multiple individuals in specific locations clapping or cheering. Technologies like QSound (developed back in the 1980s) can be impressive although the effect doesn't sound completely natural to me. This is not to say we absolutely need accurate surround reproduction in order to enjoy music! For some music lovers, mono is good enough. Rather, technology and art have gone hand-in-hand over the millennia to allow for expression of human creativity. It's good to have the surround technology available so that artists, audio engineers, and producers can utilize a wider spatial palette with which to expand the auditory canvas, and potentially further enhance the listener's emotional response.
Over the last while I've been listening to some live concerts in multichannel (without the video). On the Eagles' Farewell I Tour: Live in Melbourne (2005, DR14), if you have a multichannel system, among others, make sure to listen to the last track "Desperado". The 2.0 version gives a valiant effort but simply cannot reproduce the same enveloping soundstage, stability of vocals, instruments (simple solo piano to start, building with synth, percussion, guitar) and detailed audience sounds recorded in the Rod Laver Arena that night. Also, Phil Collins' Live at Montreux 2004 (DR11) brings up good memories of some music I grew up with.
With the death of Glenn Frey in 2016 (it was good to have attended the Long Road Out of Eden stop here in Vancouver in 2010) and Collins recently saying goodbye to live performances, it is great that we will always have these multichannel recordings to simulate artists entertaining the crowds. (Oh yes, don't forget Leonard Cohen or George Michael...)
Hope you're also able to enjoy the results of artists releasing multichannel content thus far!