Sunday 1 March 2015

MUSINGS: Audio Quality, The Various Formats, and Diminishing Returns - In Pictures!

Let me be the first to say that graphs and charts where audio formats are plotted out in terms of unidimensional sound quality ratings are ridiculously oversimplified and can be very misleading! However, they can be fun to look at and could be used as bite-sized "memes" for discussion when meeting up with audio friends or for illustration when people ask about audio quality.

Since they're out there already, let's spend some time this week looking at these visual analogies as a way to "think" about what the authors of these works want us to consider/believe. I'm going to screen-capture a couple of these images, without permission, to explore them. As usual, I do this out of a desire to discuss, critique, and hopefully educate, which I consider "fair use" of copyrighted material; as a reminder to readers, other than a tiny bit of ad revenue on this blog (hey, why not?), I do not expect any other gain from writing a post like this.

First, here's PONO's "Underwater Listening" diagram released around the time of the 2014 SXSW (March 2014):
PONO: Underwater Listening

Others have already commented on this of course (here also). I don't know what ad "genius" came up with this diagram, but it is cute, I suppose. I remember being taken aback by this picture initially; it's so far out of "left field" (creative?) that I felt disoriented when I first saw it...

How audio formats would evoke a desire to compare underwater depths remains a mystery to me. Obviously, there's a desire to impress upon the viewer two main messages: a direct correlation between sample rate (from CD up) and quality, and the deprecation of the MP3 format as much as possible (1000 ft?!). On both counts, this image gets it so wrong it's almost comedic. Clearly, one cannot directly correlate sample rate and bitrate with audio quality because the relationship isn't linear. Why would CD quality be "200 ft" and 96kHz "20 ft"? Surely nobody in their right mind would say that 96kHz is 10 times perceptually "better". Sure, there is a correlation such that a low bitrate file like a 64kbps MP3 will sound quite compromised with poor resolution, but without any qualification around this important bitrate parameter, how can anyone honestly say that all MP3s sound bad? I might as well say that Neil Young's a poor-sounding recording artist because the Le Noise (2010) and A Letter Home (2014) albums are low fidelity.



I suspect that the PONO camp must be a bit ashamed of this diagram since I don't see it around anymore and I don't find it on their website (might have missed it). I don't think the "underwater" diagram made many friends nor sold many machines in any case...

Here's a more recent chart from Meridian circa late 2014:
Meridian: History of audio quality & convenience?!
From this, we "learn" that "downloads" have poorer quality than CDs (always?!). Also, I "learn" that LPs sound significantly better than "DVD-A/SACD" (and by extension high-resolution audio). But the most important point is that current streaming audio sounds worse than cassette tapes. Does that make sense to anyone? Is this saying that Spotify, Tidal, Qobuz, etc. streaming customers are so hung up on convenience that they're willing to accept sound quality worse than an '80s Walkman?

Of course this is the myth that they primarily want to perpetuate because guess what... Buy this "revolutionary" Meridian MQA and that'll make streaming sound awesome!

Sure, in some cases a very poorly encoded 192kbps MP3 download (like something done in 1999 with XING MP3) could sound significantly worse than CD, and a 64kbps stream can be worse than an excellent cassette copy. But just like the PONO "artwork" above, there are some truly horrible gross generalizations here! Many LPs sound poor due to low-quality pressings, many downloads are qualitatively superb, and clearly any reasonable music streaming service sounds better than a cassette tape - who's kidding whom?! Furthermore, a high-resolution digital master (as with high-res downloads or encoded on DVD-A/SACD) is capable of being more accurate than reel-to-reel tape, though of course, subjectively, analogue tape can add its own unique signature/color/distortion that some prefer... (Being able to mix in the digital domain without the generational fidelity loss of analogue tape is obviously a big plus.)

Of course, it's easy for me to just criticize without putting something forward... Therefore, please allow me to add for your consideration my submission to the "overgeneralized sound quality vs. audio format graph":

It's a graph of the law of diminishing returns as applied to audio technology and sound quality. I think it's important to take into account that hearing ability is obviously NOT infinite. Due to biological phenotypic variation, there's probably a bell-shaped curve to hearing ability, as well as moment-to-moment fluctuations in acuity, which is represented by the "Zone of max. auditory acuity" gradient [See comments: probably more of an asymmetrical negatively skewed distribution]. Depending on a person's maximum hearing ability, the 100% point will shift up or down relative to someone else's, but let's keep this graph simple and say that each individual can only hear up to his or her own 100%, based on how we're endowed. Day to day, our hearing acuity changes: everything from current stress level affecting the ability to attend to the sound, to ear wax, allergies, sinus/ear infections, noise-induced hearing loss, tinnitus, and age will lower maximum acuity (some of this sadly irreversible). Obviously, mental training can help improve how well we attend to and pick up subtle cues.
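One concrete, measurable facet of "hearing is not infinite" is the absolute threshold of hearing: the quietest level at which a pure tone can be detected in silence. Here's a minimal Python sketch using Terhardt's widely cited curve-fit approximation for a young listener with normal hearing. The function names are my own, and real individual thresholds vary and rise with age, so treat the numbers as a population sketch, not anyone's actual ears. It shows how steeply sensitivity falls away at the top of the audible band; thresholds like this are also one ingredient of the psychoacoustic models lossy encoders rely on.

```python
import math


def absolute_threshold_db_spl(freq_hz: float) -> float:
    """Terhardt's approximation of the threshold of hearing in quiet (dB SPL).

    A curve fit for a young, normal-hearing listener; individual thresholds differ.
    """
    f_khz = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


def is_audible_in_quiet(freq_hz: float, level_db_spl: float) -> bool:
    """True if a lone pure tone at this level would exceed the threshold in quiet."""
    return level_db_spl > absolute_threshold_db_spl(freq_hz)


for freq in (100, 1_000, 3_300, 10_000, 15_000, 18_000):
    print(f"{freq:>6} Hz: threshold ~{absolute_threshold_db_spl(freq):6.1f} dB SPL")
```

The curve dips below 0 dB SPL around 3-4kHz, where the ear is most sensitive, and climbs past 50 dB SPL by 15kHz, which is one reason the "100%" ceiling on the graph is real.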

The Y-axis therefore represents "Perceived Fidelity" up to 100%. Exactly how fidelity is measured is not important in this simple diagram, but it would obviously include frequency response, dynamic resolution, low noise floor, and low distortion (including timing anomalies), assuming the same superb-quality recording and mastering for all formats. On the X-axis, we have "Effective Uncompressed PCM Bitrate" as an approximate measure of how much data is used to encode the audio. This is a proxy for how much "effort" is needed to achieve a given level of fidelity. Note that the scale is logarithmic, not linear, to correspond to our logarithmic perception of frequency and dynamic range. More data, more storage, more "effort" is needed to achieve any improvement in perceived quality as we approach the plateau on the right side of the graph.
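The X-axis quantity is plain arithmetic: sample rate × bit depth × channels. This minimal Python sketch (the helper names are mine, purely illustrative) computes it for the PCM formats on the graph and converts each to a position on a log2 axis, where every step of 1.0 means doubling the data. Stereo 16/44.1 works out to exactly 1,411,200 bits per second.

```python
import math


def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> int:
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * bit_depth * channels


def log2_axis_position(bitrate_bps: float, reference_bps: float) -> float:
    """Position on a log2 axis relative to a reference; +1.0 means twice the data."""
    return math.log2(bitrate_bps / reference_bps)


cd_bps = pcm_bitrate(44_100, 16)
for label, rate, bits in (("16/44.1", 44_100, 16), ("24/96", 96_000, 24), ("24/192", 192_000, 24)):
    bps = pcm_bitrate(rate, bits)
    print(f"{label:>8}: {bps / 1e6:.3f} Mbps, "
          f"{log2_axis_position(bps, cd_bps):+.2f} doublings vs CD")
```

With CD as the reference, 24/96 lands about 1.7 doublings to the right and 24/192 about 2.7, even though 24/192 carries roughly 6.5 times the data.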

As you can see, the curve plateaus since we obviously cannot hear beyond around "100%". At some point, it really does not matter how much data we use to encode the sound; there just will not be any significant perceivable difference, and all we've done is waste storage. The big question, of course, is where along this curve the capabilities of the various audio formats should be placed.
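To make the shape of "diminishing returns" concrete, here's a toy Python model, and I stress toy: a made-up saturating function of bitrate with an arbitrary, hypothetical parameter, not fitted to any listening data. The numbers mean nothing; only the shape matters. On a log axis, each doubling of data past the knee buys a smaller gain than the one before.

```python
def toy_perceived_fidelity(bitrate_bps: float, half_point_bps: float = 128_000.0) -> float:
    """HYPOTHETICAL saturating curve, in percent of an individual's maximum.

    b / (b + c) is a logistic function of log(b), so it traces an S-shape on a
    log axis and plateaus at 100%. half_point_bps (where the curve hits 50%) is an
    arbitrary illustrative value, not derived from any listening test.
    """
    return 100.0 * bitrate_bps / (bitrate_bps + half_point_bps)


def gains_per_doubling(start_bps: float, doublings: int) -> list:
    """Percentage-point gain from each successive doubling of the bitrate."""
    gains = []
    bitrate = start_bps
    for _ in range(doublings):
        gains.append(toy_perceived_fidelity(2 * bitrate) - toy_perceived_fidelity(bitrate))
        bitrate *= 2
    return gains


for i, gain in enumerate(gains_per_doubling(1_411_200, 4), start=1):
    print(f"doubling #{i} beyond CD-rate data: +{gain:.2f} points")
```

Far above the knee, each doubling roughly halves the previous gain, which is exactly why the right side of the graph flattens out.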

Starting with good old CD, we know that scientific research has shown little evidence in controlled trials that higher resolution sounds much better (see discussion here). Therefore, I think it's reasonable to put CD at point (1), which is quite far along the curve already; this corresponds to the 16/44 stereo PCM bitrate of ~1.4Mbps. It's very close to the 100% point. I don't think it's unreasonable to say around 95%, which leaves a possibility for some improvement. Exactly where it lies is not that important (it could be 90%, for example); the main idea is that qualitative gains beyond the CD format are not going to be massive. As we go higher to 24/96 (~4.6Mbps, point 3) and 24/192 (~9.2Mbps, point 5), we achieve essentially 100% perceived quality, and for all the effort in terms of bitrate/file size, relatively little is gained. Although mathematically these high-resolution formats can capture higher frequencies and greater dynamic range, the actual auditory benefits are limited.
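The "mathematically" part is easy to quantify. This Python sketch uses two textbook relations: the Nyquist limit (half the sample rate) and the roughly 6.02dB-per-bit rule for an ideal quantizer driven by a full-scale sine. The function names are just illustrative. These are theoretical ceilings; real 24-bit converters fall well short of the full figure, and neither number says anything about audibility by itself.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a PCM stream at this sample rate can represent."""
    return sample_rate_hz / 2


def ideal_pcm_snr_db(bit_depth: int) -> float:
    """Textbook SNR of an ideal N-bit quantizer for a full-scale sine: 6.02*N + 1.76 dB."""
    return 6.02 * bit_depth + 1.76


for label, rate, bits in (("16/44.1", 44_100, 16), ("24/96", 96_000, 24), ("24/192", 192_000, 24)):
    print(f"{label:>8}: bandwidth up to {nyquist_hz(rate) / 1000:.2f} kHz, "
          f"ideal SNR ~{ideal_pcm_snr_db(bits):.1f} dB")
```

CD already provides about 98dB and a 22.05kHz bandwidth; everything extra that 24/96 and 24/192 can capture lies beyond those two limits.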

Where does DSD sit in this? Bear in mind that 1-bit DSD isn't as efficient as PCM (one description I've seen calls each bit of DSD an "additive" refinement to the sound, versus a "geometric" refinement with multibit PCM). Furthermore, noise shaping shifts quantization noise into the higher frequencies, resulting in non-uniform dynamic range across the spectrum; this is generally not a problem because hearing sensitivity also drops at higher frequencies. From what I have heard and from examining DSD rips, I think DSD64 is better (more accurate) than CD but not by much (I personally think 21-bit/50kHz PCM, about ~2Mbps, is good enough for DSD64 conversion and avoids encoding all that excess noise), whereas DSD128 falls just short of 24/96 but is very close. Note that this inefficiency in DSD encoding screams for compression, which, as I argued a couple of years back, really should be implemented in DSD file formats.
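The inefficiency is easy to put numbers on. This Python sketch (illustrative helper names) compares raw stereo bitrates for the rough equivalences suggested above: DSD64 against 21-bit/50kHz PCM, and DSD128 against 24/96. DSD rates are multiples of 44.1kHz with 1 bit per sample, so DSD64 runs at 64 × 44.1kHz = 2.8224MHz.

```python
BASE_RATE_HZ = 44_100  # DSD sample rates are multiples of the CD sample rate


def dsd_bitrate(multiple: int, channels: int = 2) -> int:
    """1-bit DSD bitrate in bits/s (multiple=64 for DSD64, 128 for DSD128)."""
    return multiple * BASE_RATE_HZ * 1 * channels


def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> int:
    """Uncompressed PCM bitrate in bits/s."""
    return sample_rate_hz * bit_depth * channels


# Pairings follow the rough quality equivalences given in the text.
comparisons = (
    ("DSD64  vs 21/50 PCM", dsd_bitrate(64), pcm_bitrate(50_000, 21)),
    ("DSD128 vs 24/96 PCM", dsd_bitrate(128), pcm_bitrate(96_000, 24)),
)
for label, dsd_bps, pcm_bps in comparisons:
    print(f"{label}: {dsd_bps / 1e6:.3f} vs {pcm_bps / 1e6:.3f} Mbps "
          f"-> DSD uses {dsd_bps / pcm_bps:.2f}x the data")
```

DSD64 ends up using about 2.7 times the data of 21/50 PCM, and DSD128 exactly 2.45 times that of 24/96, for roughly comparable accuracy.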

So what about lossy compression in terms of perceived fidelity? Considering that there has been no good data demonstrating that many people can differentiate high-bitrate MP3 from lossless PCM, I have no issue placing it just shy of CD quality. To keep the graph clutter-free, I used a single line to denote MP3 320kbps quality, even though I recognize there could be a wide range of fidelity depending on the quality of the encoder and the demands of the music. There are special cases, usually containing high-frequency content, that can demonstrate limitations of high-bitrate MP3, but these are rare and generally will not be evident in actual music. You might ask "why is 320kbps MP3 equivalent to ~1.4Mbps uncompressed PCM!?" The answer lies in the psychoacoustic techniques employed. Sure, there is significant data reduction, and yes, taken out of the context of the rest of the audio, you can hear the difference (as in "The Ghost in the MP3" project). However, the data removal was done with sophisticated algorithms informed by models of human hearing. As encoding algorithms have improved, so too has the sonic quality of the resulting MP3s over the years. This is a good example of how you cannot compare bitrates directly; the way the data is encoded is obviously very different! And sadly, PONO advertising doesn't seem to understand this when they keep using diagrams like this:

Just because a lot of data is used doesn't mean there's much benefit, even if the recording were done in true high resolution. By the time we get to 24/192, we're way into the zone of diminishing returns and may in fact, as some have suggested, have entered territory where sound quality suffers because of potential intermodulation distortion from ultrasonic frequencies, and some DACs may no longer be functioning optimally. The fact that we can technologically get this far along the curve also reflects the maturity of audio science. Personally, I remain partial to 24/96 as a "container" for the highest resolution humans will ever need; one that is already standard on both recording and playback equipment.
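To see why inaudible ultrasonic content can matter at all, consider what a nonlinearity does to two tones at f1 and f2: it creates products at |m·f1 ± n·f2|. This Python sketch (hypothetical helper names) only lists where those products land, up to third order. It says nothing about their level, which depends entirely on how nonlinear a given amplifier, speaker, or DAC actually is.

```python
def intermod_products(f1_hz: int, f2_hz: int, max_order: int = 3) -> list:
    """All |m*f1 +/- n*f2| products with 1 <= m+n <= max_order, tagged by order.

    Order 1 entries are simply the original tones.
    """
    products = set()
    for m in range(max_order + 1):
        for n in range(max_order + 1 - m):
            if m + n == 0:
                continue
            for sign in (1, -1):
                freq = abs(m * f1_hz + sign * n * f2_hz)
                if freq > 0:
                    products.add((freq, m + n))
    return sorted(products)


def audible_products(f1_hz: int, f2_hz: int, max_order: int = 3,
                     band: tuple = (20, 20_000)) -> list:
    """Subset of intermodulation products that fall inside the nominal audible band."""
    lo, hi = band
    return [(f, order) for f, order in intermod_products(f1_hz, f2_hz, max_order)
            if lo <= f <= hi]


print(audible_products(24_000, 25_000))
```

For two purely ultrasonic tones at 24kHz and 25kHz, the only product inside 20Hz to 20kHz is the 1kHz second-order difference tone: inaudible input, potentially audible output.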

Finally, as I indicated in a previous post, vinyl has limitations. Yes, it can of course sound great, but there are limits to accuracy (including differences between outer and inner grooves), higher overall distortion, and material imperfections. As a result, there will be a wide range in the sound of LP playback, as shown in the graph. Perceived fidelity compared to the original source would be lower, but remember that, just like the reel-to-reel tape discussion above, some of the distortion and coloration could be "euphonic" as well, and hence preferred by some (many?).
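One reason inner grooves fare worse is simple geometry: at a constant 33⅓ rpm, the groove passes the stylus more slowly near the label, so the same audio frequency is cut into a shorter physical wavelength that is harder to trace accurately. This Python sketch assumes round-number radii for a 12-inch LP (about 146mm at the start of the recorded area and 60mm at the innermost groove); those radii are my approximations for illustration, not figures from the post or a spec.

```python
import math

RPM = 100 / 3  # 33 1/3 revolutions per minute

OUTER_RADIUS_M = 0.146  # ASSUMED: approximate start of the recorded area on a 12" LP
INNER_RADIUS_M = 0.060  # ASSUMED: approximate innermost recorded groove


def groove_speed_m_per_s(radius_m: float, rpm: float = RPM) -> float:
    """Linear speed of the groove under the stylus at a given radius."""
    return 2 * math.pi * radius_m * rpm / 60


def recorded_wavelength_um(freq_hz: float, radius_m: float) -> float:
    """Physical length of one cycle of a tone cut at this radius, in micrometres."""
    return groove_speed_m_per_s(radius_m) / freq_hz * 1e6


for name, radius in (("outer", OUTER_RADIUS_M), ("inner", INNER_RADIUS_M)):
    print(f"{name}: {groove_speed_m_per_s(radius):.3f} m/s, "
          f"10 kHz wavelength ~{recorded_wavelength_um(10_000, radius):.1f} um")
```

With those assumed radii, a 10kHz tone occupies roughly 51µm of groove at the outside but only about 21µm near the label, a ratio of about 2.4.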

I'm sure a graphics artist could produce a much more pleasing image than what I kludged together above :-). Like the PONO and Meridian pictures, it's simplistic but I think compared to the others, a more realistic representation.

Notice that the Meridian graph above tries to suggest that there has been a deterioration of potential sound quality over time (especially when they suggest streaming quality is like cassette tape!). I've seen a number of people parrot this same idea in magazines and forums. I think this is nonsense. Consider that even free Spotify streams Ogg Vorbis at 160kbps on the desktop (still very good!). With a premium account, you get 320kbps. And sites like Tidal already do lossless 16/44 FLAC. We're looking at quality either reasonably close or identical to CD quality. Here's my version of the chart:


As you can see, I don't believe there has really been any inverse correlation between sound quality and convenience over time. Note the drop in convenience from CD to DVD-A/SACD, which I don't think is a big deal since many DVD-As play in regular DVD players and are easy to rip now (a dead format anyway), plus SACDs are often hybrid discs that play in standard CD players (and can also be ripped these days, with some inconvenience). The shift from physical media to "virtual" digital data storage has been tremendously convenient, although it brings with it a new skill set: file management, proper tagging, and of course managing backups. Now the shift towards streaming has made things even more convenient and "mobile" through wireless data networks (though there's limited ability to customize and tag one's collection, and a diminished sense of "ownership" of the music, a problem if one is a "collector"). As far as I'm concerned, the only real qualitative decline was from LP to cassette tape, where convenience in terms of portability improved (you can listen in cars and on Walkmen, with less need for cleaning, but there's no random-access song selection, which is why I gave LP a 50 and cassette only an increase to 60 overall). I believe streaming just needs a little more bandwidth; if we can reliably get 24/48 FLAC streaming, we will achieve a quality and convenience beyond what most music lovers and audiophiles would feel they "need" (we'll see if MQA really offers much more). Of course, there's always the desire to have physical artwork and booklets to thumb through while listening to the music; vinyl remains the "king" of album art in that regard.
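The bandwidth side of that claim is easy to check. This Python sketch compares the Spotify tiers mentioned earlier with uncompressed 16/44.1 and 24/48 PCM, both as a multiple of CD's data rate and as data per hour. The PCM rows are upper bounds: FLAC shrinks them by an amount that depends on the music, so I haven't guessed at a figure.

```python
def mb_per_hour(bitrate_kbps: float) -> float:
    """Megabytes (10^6 bytes) transferred per hour at a constant bitrate."""
    return bitrate_kbps * 1000 * 3600 / 8 / 1e6


CD_PCM_KBPS = 44.1 * 16 * 2     # 1411.2 kbps, uncompressed stereo
PCM_24_48_KBPS = 48 * 24 * 2    # 2304 kbps, uncompressed stereo

tiers = {
    "Vorbis 160 (free tier)": 160,
    "Vorbis 320 (premium tier)": 320,
    "16/44.1 PCM (pre-FLAC)": CD_PCM_KBPS,
    "24/48 PCM (pre-FLAC)": PCM_24_48_KBPS,
}
for name, kbps in tiers.items():
    print(f"{name:<26} {kbps:7.1f} kbps  {kbps / CD_PCM_KBPS:4.2f}x CD  "
          f"{mb_per_hour(kbps):6.1f} MB/hour")
```

Uncompressed 24/48 needs about 1.63 times the bandwidth of CD (roughly 1.04GB per hour versus 635MB), which is the "little more bandwidth" in question; FLAC brings both figures down.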

One final comment for those who feel that, because folks like myself do not believe high-bitrate MP3 sounds substantially different from lossless 16/44, I'm somehow "advocating" lossy audio. That's not exactly true, since I don't think anyone would deny that lossless formats are superior for the best accuracy/fidelity. I still prefer FLAC as my archive format because I can then convert to whatever other format I want without multigenerational lossy degradation. However, I do believe MP3 is the way to go in cars and for portable audio, even if those devices support lossless and high resolution. High-bitrate MP3s are quick to transfer, take up less space, and there's just no way I will be able to hear a difference in my car or walking down the street. I personally find high-resolution lossless files (or, God forbid, uncompressed DSD) on a phone or portable device extremely wasteful even if storage size were not an issue. MP3 (and similar formats like AAC, WMA, Vorbis...) has its place as a tool for high-quality compression, and there are many applications where it's all one ever needs to get the job done. Plus, MP3s are universally supported.

Bottom Line: Remember the principle of diminishing returns as we're dealing with mature audio technology and limitations of the hearing apparatus. It's important to keep this in mind when assessing the promise of "new technology" and manufacturer claims such as the diagrams above.

(Did anyone see any critical comments from the audiophile press about PONO's or Meridian's ad material above? How about Sony's recent 64GB "Premium Sound" SD card? There sadly seems to be a lack of critical thinking in much of the audiophile reporting these days, which only serves to isolate this hobby and reinforce the pejorative "audiophool".)

----------

Regretfully, I missed a live performance by Cécile McLorin Salvant here in Vancouver last weekend. A friend went and thought the performance was amazing! She seems to be channeling a young Ella...

Check out her albums Cecile (2010) and the Grammy-nominated Womanchild (2013) if you like jazz vocals.

Enjoy the music...

17 comments:

  1. This comment has been removed by the author.

    1. It may depend on what people mean by 'stream'.
      If they mean YouTube, then I agree the SQ is below that of a cassette tape and the convenience is high.
      At least, I find this to be true for the majority of YouTube material.

    2. I suppose that holds if YouTube, or maybe some free Internet radio stations, is what Meridian is pointing a finger at. I don't think they should in any case, because MQA is targeted at services like Tidal with higher-bitrate streaming (they need a 24/48 container after all). But even YouTube audio these days isn't so bad! I've saved a few audio clips recently and it's certainly quite usable...

      Here's a site that claims YouTube is streaming out at VBR ~126kbps AAC - better than old 128kbps MP3 in any case (which I always thought was better than typical consumer cassettes played on a Walkman).

      http://www.h3xed.com/web-and-internet/youtube-audio-quality-bitrate-240p-360p-480p-720p-1080p

  2. I enjoy your blog immensely, so apologies if my comment today is a bit picky.

    1] If human hearing ability were a bell-shaped function, one would be accepting near-infinitely high levels of hearing ability for very few people. This is IMO not the case. I don't know the correct name, but I think the curve has a longer-but-not-very-long left tail that reaches complete deafness while still sitting well above zero population (as there is a significant population of fully deaf people), and a short right tail, since no one can hear above a frequency not far above 20 kHz or below an SPL threshold that is easily measurable. Therefore the curve is far from a bell, more tilted, with the left side never getting close to zero population and the right side hitting zero quickly with a small standard deviation, if that is the right word in this case.

    2] Your text description of X and Y axes is back to front. :)

    1. Thanks for the feedback Grant. Oops... Fixed the X/Y mixup :-). Was attending a "function" that evening when I wrote much of this, apparently my proof-reading skills were down... You'll see what function next week!

      Just as a reminder to everyone, we could conceptually plot the curve a number of ways... I saw it as a curve for any *one* individual. On the X, as we go to the left and hit 0 kbps, the sound quality would be 0 for "no sound" at all which would apply for everyone. As we go to the right, I have it as plateauing towards 100% to indicate simply that we all have a maximum 100% rate... At some point to the right it of course really doesn't matter because it's buried DEEP in the orange haze of the "zone of maximum acuity". Even if the sound quality could be an improvement objectively/absolutely, and even if the cochlea could be sending some "improved" signal to the brain, the mind would not capture the significance.

      I think this is why people typically cannot easily tell the difference between lossless and high bitrate MP3... The lossless *is* better but by the time we hit say 192kbps, psychoacoustic limitations already put us within striking distance of the "haze" and depending on the day and time of the day, that haze could be greater or smaller (eg. tired from the night before... audio acuity goes down and less able to detect difference). I may actually be too conservative and 320kbps MP3 could actually be more to the right even because I'm quite sure that the quality is well within that zone for essentially everyone.

      Where the "bell shaped curve" comes in is in the population as a whole... This I'm actually not representing on the graph. The bell shape I think applies to where the "100% point" sits for any one individual. If I had average hearing ability, you could be hearing at "105%" compared to me (because say you're younger and can hear more high frequencies), but for the purpose of this graph, we're both only able to perceive up to "100%". I should have made myself clearer... (** This is an important point as well because as we get older, we'll "migrate" down this bell curve. Important to keep that in mind as we read subjective reviews and consider factors like the age of the reviewer.)

      Grant, I think you're right that it's NOT really a "bell shaped" symmetrical curve. It's more of an asymmetrical negatively skewed distribution in biostats for those with "normal" hearing:
      http://www.usmlebiostatistics.com/Biostatistics-for-USMLE-for-IMG-and-medical-student/Skewed-curves-and-asymmetrical-curves-and/index.html

      There are more people "to the left" to account for all the myriad of ways to have less-than-perfect hearing and very few folks with superior "golden ears". As you noted, there's a bimodal population of deaf and severely impaired (eg. bone conduction only, or severely sensorineural damaged folks) individuals which makes it more complicated on the extreme left side (near 0 hearing) as well...

    2. Ah, negatively skewed, that is the term I was looking for. And thank you for the extensive reply.

    3. Poisson is the distribution you are looking for. And I think you may be right, there is a very long tail on one end (young people and those who have taken care of their hearing) and then there are the rest of us who ignored our mothers' entreaties to "turn it down"!

  3. I think one of the major problems here is that technology brought so many new members seeking "better sound" which unfortunately got tied in with old "robin-hood" style audio sales. For years cables were sold within the law of diminishing returns, but it didn't matter because to the people who bought them, the investment was chump change.

  4. It can be argued that the curve reaches a peak somewhere between CD and 24/96.
    The higher frequencies can even be a liability in some systems. A lot of amps and speakers don't like ultrasonics, so they can intermodulate and create actual problems in the range where we can hear. And studio equipment like microphones etc. is not made for good response there, and as no one can hear it, it's not monitored (who knows what's there?). And then there's the ultrasonic cruft DSD leaves...
    I'm referring to consumer playback; there are all kinds of good reasons to record at higher resolution. These subjects get routinely confused; it's not the same topic.

    What I think: please give us well-mastered, good-sounding recordings in any format; that's the hidden variable everyone misses. If it sounds bad, it will do so even in glorious 24/356.

    http://xiph.org/~xiphmont/demo/neil-young.html

  5. I think your diagram is very much along the right lines.

    However, if I had to bet, it would be that CD is audibly indistinguishable from all the high res formats. That's what the theory says (the only difference being an inaudible amount of noise and some inaudible higher frequencies), and as far as I am aware there is no reliable evidence at all to the contrary. If so, CD is transparent and that is that.

    However, I am not convinced that 320 kbps MP3 is transparent with all signals, so I would draw a clearer separation between it and CD.

    I am open to Mnyb's argument that ultrasonic s**t can make its way into the audible range through intermodulation distortion, so it's best to avoid recording it or producing it within the encoding method itself. I had not thought about the issue of studio gear not being designed for ultrasonics anyway, so "who knows what's there?" - that is an excellent point. This might suggest that CD should be placed *higher* than some of its high res rivals.

    (That also set me off thinking about the Oohashi effect and similar. If there really is anything to it at all, could the experiments be repeated with synthesised ultrasonic 'excitation' loosely derived from CD quality recordings? Or even just some constant ultrasonic 'wash' sprayed out over the listeners from a super-tweeter to see if it puts them in a good mood? Might the best of all worlds be CD quality recordings supplemented with a super tweeter spraying out synthesised good mood juice?)

    1. Archimago didn't say 320 kbps is transparent with all signals. But it *generally* is -- for most tracks, to most listeners. Which is what he wrote.

  6. I'm pretty much in agreement with this, but surprised just how far it goes.

    I recently purchased a famous "audiophile" recording ("Cantate Domino") in double DSD resolution - a whopping 4GB of glorious DSD download! And it sounds great, don't get me wrong. But, as a test, I decided to resample it to 256k AAC (using Apple's "Master for iTunes" droplet). The resulting (100MB) output is indistinguishable (to me, anyway) from the 4GB double DSD download!

    And this is through a nice quality DAC with native DSD support (iFi Micro iDSD) and good headphones (HiFiMan HE-500s) :/

    Not sure I'm going to stop buying "hi-res" downloads (after all, the remastering may very well be of some benefit), but surely opened my eyes a bit.

    1. Thanks for the data point jhwalker.

      It is amazing what we realize once we try a few tests! And you're doing this with gear that costs likely at least $1000 before taxes (unless you got a heck of a deal!) with support for native DSD.

      I'm blown away when I hear people claim they can hear remarkable differences between high-res/DSD with CD-quality (or even high bitrate lossy as in your test). Even more blown away when they claim their spouse can hear the difference "from the kitchen" :-).

  7. With respect to ultrasonic energy, a quick way to evaluate one's system and ears is to try the IM tests that Archimago and Mnyb linked to above, specifically: Intermod Tests. Download the test files and give them a listen :-)

    My own experiment comparing file formats of the same recording, one that contains significant ultrasonic energy (16/44 vs 24/192 Experiment), found that I could not hear the difference, even with the MP3 version. Included in the experiment are downloadable versions of the difference files, which illustrate sonically what the actual differences are; they're educational to listen to.

    As a former recording/mixing engineer of 10 years, I am disappointed in the hi-res industry for (mostly) reselling recordings, mixes, and masters from analog tape and calling them hi-res. Worse yet, analog tapes that are actually "safety copies" (duplicate copies with at least one generation of loss) are typically what gets sent to be mastered, as someone forgot to send the originals... Finally, some of the so-called hi-res remasters are actually worse than the original masters, as they have succumbed to the loudness war.

    While there are specialty hi-res recording outfits, these are dwarfed by what I call studios in a box. Any musician armed with a computer, some software, and a handful of mics can record, mix, and master their own recordings, and put them out independently without a record label involved, in many cases doing better than what the label can offer. Point being, given the mass commoditization of recording gear in a computer box, it is unlikely the recording quality is going to go up, so the need for ever-increasing sample rates is a red herring at best.

    Enjoy the music!

    1. Thanks for the info Mitch! Your experience in the "field" is always much appreciated... Hope you're keeping well these days.

      It's amazing how little the mainstream audiophile press emphasizes (or even brings up) the nature and quality of the recordings themselves... There's no way to interpret this other than as a reluctance to upset the apple cart (ie. lose potential advertising revenue including from places like HDTracks or PONO). That's bad enough, but as you indicated, even worse when the remasters end up being both inferior AND put into a massive high resolution file "container"... Then subsequently sold off at a premium.

      The Bob Dylan 24-bit release from a few weeks back was a really sad eye-opener for me when I got the tip from a reader. Here's a *new* recording, supposedly done in 24/192 based on some reports. Yet it ends up being just a (conveniently) minimal volume-boosted version of the 16/44! Surely, how can anyone be so incompetent!? Would it even be too bold to call this some kind of fraud? Boggles the mind...

  8. So, your ostensibly more-accurate graph presents 16/44.1 CD as better than reel-to-reel. This is not the sort of absurdity I would have expected from you, but it seems there is no middle ground in this objectivist-subjectivist stance. Michael Lavorgna is rightly ridiculed for thinking that a listening test could assess the relative merits of two different SD cards, but you are also to be pilloried for your statements regarding your silly graphs. Michael L listens to stuff but unfortunately hears only his own confirmation bias, whereas you measure stuff and proclaim superiority thereby. The differences between cables etc., even should they exist, are only likely to be an artifact of minute degrees of impedance mismatch, and barely audible, if at all. Yet the difference between analogue vinyl replay and CD is night-and-day, and readily heard by even the most un-Golden of ears. This analogue vs. low-bit digital difference is even heard with cassette, where the treble of cymbals on analogue cassette is considerably more realistic and real-sounding than the papery pastiche of 16-bit digital. Realism, resolution, treble extension -- if you can't hear the difference between A and D here, Archimago, then, like Mark Waldrep, you really have no business writing on audio matters. Like our jackets, not all our Ears are cut from the same Cloth...
