Hey everyone. It really gets dark in January in Vancouver; just the kind of weather to cozy up in bed with a nice book or maybe some spirits in front of the hi-fi system :-).
So, I thought I'd offer up a few miscellaneous thoughts this week as I looked over recent CES reports.
I figured it'd be good to spend a few moments on the new encoding techniques that came out or were announced in 2014. I think the biggest advancement in audio encoding (at least as it pertains to the home) this past year was Dolby's push for Atmos into the consumer space. Remember that Atmos has already been in movie theaters for a couple of years, since the 2012 release of Pixar's Brave. The consumer push began with announcements in June 2014, and the first Atmos-encoded Blu-ray, Transformers: Age of Extinction, came out on September 30. Not exactly a movie that will win Academy Awards (maybe in some technical categories), but appropriate for showing off sound effects I suppose.
With Atmos (and other techniques like Auro-3D and the upcoming DTS UHD/MDA), digital processing takes another step up... We all know about the role DSPs have played in home audio, including bass management, then room correction (like Audyssey MultEQ), and now we have dynamic, object-oriented sonic rendering in 3D space adapted to one's speaker configuration on top of the typical multichannel (5.1/7.1) mixing. Cool.
I'm unclear whether this will have much relevance for audiophiles (unlikely I would think) given the relatively small numbers of "multichannel audiophiles", but it does represent a true technological step forward. The question is whether many folks will be able to have a home theater setup capable of experiencing a significant difference given the "need" to increase the number of speakers. Even with a dedicated sound room (the solution to the WAF issue of course!), I'm really not keen on cutting holes in my ceiling to place speakers up there and running even more wires to the audio rack for height channels. Dolby's push for ceiling-bounce speakers (like the Definitive Technology A60s [see review]) is an interesting though compromised solution when it comes to fidelity. From a physics perspective, there's only so much that can be done with what would be small up-firing speakers, room interactions, and the timing/phase issues that need to be accounted for. Even if there were compelling high-fidelity music to be enjoyed, I doubt many (if any) of these up-firing solutions would be acceptable to audiophiles used to very low distortion in their existing speaker systems. I see that Kalman Rubinson in January's Stereophile suggested the potential for 3D speaker arrays to be used for even more effective room correction. Hmmm, not sure this would be worth it if those extra channels are actually of inferior sound quality... One could be creating more problems than it solves.
Time will tell, but this is certainly an interesting technical development to keep an eye on if one has a multichannel setup, now that the "800lb gorillas" (Dolby & DTS) seem to be seriously getting involved with their usual hardware partners (like Denon, Pioneer, Yamaha...).
The other "advancement" in encoding technique in 2014 comes from Meridian and their MQA. This certainly got lots of air time in the audiophile press late last year. Smart move throwing a fancy party (oooooohhhh... 69th floor! The audio must have been orgasmic!) and inviting your media friends of course - everyone loves a good party! Let us spend a bit more time on this one.
I don't know about you guys, but "Master Quality Authenticated" just sounds a bit pretentious to me and I'm feeling a bit confused about this declarative form of branding (how do others feel about this?). Perhaps I'm just too cynical, but when the website is "www.musicischanging.com" and the word "revolutionary" keeps popping up, you really have to wonder how much is driven by advertising and how much is really technical achievement... Are we dealing with another "revolutionary" modulation technique (not just PCM, or 1-bit DSD?), a new "revolutionary" algorithmic representation of music, something truly mindblowing?! Alas, no... Doesn't look like it as far as I can tell.
It looks like we have HDCD-on-steroids (HDCD 2.0? Fat HDCD?). Thinking back, I was quite impressed by the ideas that were incorporated into HDCD back in the late 1990's. Recognizing that music would generally not need the full 16 bits of dynamic range in a CD's 16/44 digital stream, the bright folks at Pacific Microsonics decided to "borrow" the 16th (least significant) bit and encode instructions for their hardware decoder. Instructions for filter changes, peak extension, low-level gain control, etc. allowed the system to encode what would amount to ~20-bit dynamic range (in a "lossy" fashion of course, since there aren't enough bits to be fully accurate if starting from a 20-bit source). Ingenious! Of course, Microsoft bought the HDCD technology in 2000, and HDCD has long been superseded by true 24-bit PCM, DSD64+, and now high-resolution files. BTW, I spent some cash in my college days to buy an HDCD-enabled Harman Kardon CD player which sadly broke down after <2 years - the worst piece of hardware I've ever owned!
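To make the LSB-borrowing idea concrete, here's a toy Python sketch of the general mechanism - hiding a low-rate decoder command stream in the least-significant bit of 16-bit PCM samples. This is purely illustrative; Pacific Microsonics' actual command protocol and its filter/gain semantics were proprietary and far more involved:

```python
# Toy sketch of HDCD's core idea: hide control data in the LSB of 16-bit
# PCM. Illustrative only -- not the real Pacific Microsonics protocol.

def embed_lsb(samples_16bit, control_bits):
    """Replace the LSB of each sample with one bit of control data."""
    out = []
    for s, b in zip(samples_16bit, control_bits):
        out.append((s & ~1) | b)   # clear the LSB, then set the control bit
    return out

def extract_lsb(samples_16bit):
    """A decoder recovers the control stream by reading each sample's LSB."""
    return [s & 1 for s in samples_16bit]

pcm = [1000, -2311, 4097, 12, -500, 777]   # pretend 16-bit sample values
ctrl = [1, 0, 1, 1, 0, 0]                  # pretend decoder commands
encoded = embed_lsb(pcm, ctrl)
assert extract_lsb(encoded) == ctrl        # the decoder sees the commands
# To a non-HDCD DAC, the altered LSB is just noise down around -96 dBFS.
```

The elegance is that the carrier is the audio itself: a legacy player sees ordinary 16-bit PCM with slightly randomized LSBs, while an HDCD decoder reads out the hidden instructions.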
As far as I can tell from the audiophile press (I haven't seen articles from other mainstream sources yet), it looks like this concept is being reused for the next "generation" of streaming audio (or if there are storage limitations and you wanted to keep smaller files). Other than a few who have attended demos, articles here, here, and the comments here give us a glimpse at what Meridian is up to (a couple of patents linked in the comments as well - here and here). Although I admit I have not heard this CODEC in action yet (I'm sure it sounds great), I think it's worth thinking about what is likely being done, and to consider potential pitfalls.
Based on the best technical description so far - the Stereophile article by John Atkinson - it looks like instead of taking the LSB as the encoding carrier (as in HDCD), Meridian's technique will probably be harnessing the lowest 8 bits of the 24-bit audio data. I'm guessing about the 8-bit part because the "Backward Compatibility" section of the article talks about non-MQA DACs playing the file as the equivalent of 16/48 (it's of course seen as a 24/48 file, but the lower bits will essentially act as low-level noise); this is also suggested here by Dr. AIX. So the files will likely be 24/48 (2.3Mbps uncompressed bitrate), which can then be compressed down using your favourite lossless encoder like FLAC (maybe down to ~50% for a non-Loudness War classical track). A comment like "average bitrate of just 1Mbps" (Robert Harley - probably just parroting what he's fed) is, I believe, overly optimistic about the lossless compression ratio. Stereophile's statement that "the data rate was not much more than the CD's 1.5Mbps" seems more realistic.
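For those who want to check my skepticism, the arithmetic is simple (these are my back-of-envelope numbers, not Meridian's published figures):

```python
# Back-of-envelope bitrate check for a 24/48 stereo MQA container.
bits, rate_hz, channels = 24, 48_000, 2
uncompressed = bits * rate_hz * channels   # bits per second
print(uncompressed)                        # 2_304_000 -> ~2.3 Mbps

# Even an optimistic ~50% lossless (FLAC) ratio lands above 1 Mbps,
# and the noise-like lower 8 bits will resist compression, so a more
# typical ~65% ratio is plausible:
print(uncompressed * 0.50 / 1e6)           # ~1.15 Mbps
print(uncompressed * 0.65 / 1e6)           # ~1.50 Mbps
```

Note that the encoded lower bits look like noise to a lossless compressor, which is exactly why the "1Mbps average" claim looks too rosy to me.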
Like HDCD, the MQA decoder will examine the bitstream, take the "encapsulated" data from those lower 8 bits and reconstitute an upsampled version of the music using the lossless 16/48 base, "filling in" the ultrasonic components to create what probably would be 24/192 data sent to the DAC, and maybe even up to 24/384 if the DAC can handle this level of upsampling (see picture in this article). While we can say that this system is "lossless" up to 16/48, I do not see how the overall algorithm can possibly be truly lossless for frequencies >24kHz. [See Addendum 2 below; the patents give us some hints different from what I'm considering here...] The "stuff" in the ultrasonic range >24kHz will, I would think, have to be lossy-encoded (or "accuracy-reduced") in some fashion to fit into the limited bits available, using their "psychoacoustic" model - although how one would psychoacoustically determine what is important/perceptible is questionable! I suspect they will use that orange triangle in Figure 3 of the Stereophile article, filtering out essentially everything by ~60kHz as the diagram suggests and keeping as much of what's in there as accurately as possible. Not unreasonable. As to how this is accomplished, I would guess one way would be similar to how lossy encoders like MP3 or AAC do it: sub-band analysis and allocation of bits to those frequencies thought to be more "important" (in this case, more bits to the frequencies just above 24kHz and diminishing as we get to the ~60kHz apex of the triangle). We're looking at a 768kbps encoding rate available in the lower 8 bits of a 48kHz stereo signal to create that ultrasonic facsimile. I would not be surprised if the encoding/decoding algorithm is simpler than MP3, since the psychoacoustic model could be less complex; after all, we're dealing with ultrasonic data of questionable audibility anyhow, so the reconstituted waveforms above 24kHz do not need to be highly accurate.
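The 16+8 split I'm speculating about can be sketched in a few lines. To be clear, the split point and the payload interpretation here are my assumptions, not anything Meridian has published:

```python
# Speculative sketch: upper 16 bits of each 24-bit sample = backward-
# compatible baseband audio; lower 8 bits = carrier for encoded
# ultrasonic data. Samples treated as unsigned for simplicity.

def split_24bit(sample):
    base16  = sample >> 8      # what a legacy (non-MQA) DAC effectively hears
    payload = sample & 0xFF    # hidden carrier for the >24kHz reconstruction
    return base16, payload

def join_24bit(base16, payload):
    return (base16 << 8) | payload

s = 0x12AB34
b, p = split_24bit(s)
assert (b, p) == (0x12AB, 0x34)
assert join_24bit(b, p) == s   # the split is exactly reversible

# Payload capacity available for the ultrasonic facsimile:
print(8 * 48_000 * 2)          # 768_000 bits/s -- the 768kbps cited above
```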
You would, however, need extra processor cycles because we're also upsampling. I also suspect the algorithm cannot be computationally intensive in order to accommodate low-power and small portable devices. This kind of decoding scheme should be easy to implement as software to feed standard USB DACs capable of high sampling rates from a computer of reasonable speed (I'd be curious how "open" Meridian would be in allowing third parties to write their own decoding software).
The final piece which I can see Meridian making a fuss about is of course their "Apodizing filter": basically a minimum-phase upsampling filter, with perhaps some high-frequency roll-off if there's a need to suppress the duration of the ringing. I had a look at this back in 2013. It's interesting that over the last few years, talk of pre-ringing and the use of minimum-phase filters seems to have died down. I would not be surprised, therefore, if Meridian now resurrects this talk with MQA, since the encoding method intrinsically employs upsampling and they can again show pretty pictures of impulse responses based on 192 or even 384kHz sampling rates (would the algorithm even bother to encode a single impulse when fed at 24/384, or would this be filtered out?). In fact, minimization of the impulse response is explicitly mentioned in this patent application, which vaguely characterizes a long impulse response as "perceptually harmful" - wording I suspect was purposely ambiguous. For those wondering about the significance of this, I would encourage everyone to play with minimum-phase upsampling in SoX and see if you can tell the difference compared to standard playback or linear-phase upsampling.
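If you'd like to see numerically what the pre-ringing fuss is about, here's a small numpy demonstration (my own illustration, not Meridian's filter): a symmetric linear-phase lowpass FIR rings *before* its impulse peak, while the minimum-phase version of the same magnitude response pushes all the ringing after it.

```python
# Linear-phase vs minimum-phase impulse response, using the standard
# real-cepstrum conversion. Illustrative only -- not any commercial filter.
import numpy as np

def linear_phase_lowpass(taps=101, cutoff=0.25):
    """Hamming-windowed sinc lowpass; symmetric, hence linear phase."""
    n = np.arange(taps) - (taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n)
    return h * np.hamming(taps)

def minimum_phase(h, nfft=4096):
    """Convert an FIR to minimum phase via the real-cepstrum method."""
    mag = np.maximum(np.abs(np.fft.fft(h, nfft)), 1e-8)  # avoid log(0)
    cep = np.fft.ifft(np.log(mag)).real
    fold = np.zeros(nfft)                 # fold the cepstrum causally
    fold[0] = cep[0]
    fold[1:nfft // 2] = 2 * cep[1:nfft // 2]
    fold[nfft // 2] = cep[nfft // 2]
    return np.fft.ifft(np.exp(np.fft.fft(fold))).real[:len(h)]

h_lin = linear_phase_lowpass()
h_min = minimum_phase(h_lin)
peak_lin = int(np.argmax(np.abs(h_lin)))  # peak sits mid-filter: pre-ringing
peak_min = int(np.argmax(np.abs(h_min)))  # peak sits near tap 0: no pre-ring
print(peak_lin, peak_min)
```

Whether the pre-ringing of a well-designed linear-phase filter is actually audible is, of course, the contested question; the plots look dramatic but the blind-test evidence is thin.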
One somewhat interesting piece to the patents is around the potential for encryption (this patent). I wonder if this is in fact what could stimulate acceptance of MQA streaming. In effect, these files can employ a form of DRM (Digital Rights Management). Here's a possible scenario - without an MQA-capable decoding DAC/player, you'd be hearing essentially a 16/48 file, but if you want "better" sounding high-resolution audio, you would need a device that can decode it, and only with some kind of authentication if the content provider deems it. This would be like in the old days when DVD-A players would only output 48kHz digital off the S/PDIF rather than the full 96kHz data to prevent digital ripping (for full 24/96+ transport, you would need HDCP encryption through HDMI or another proprietary method). In an era of ubiquitous digital piracy, content providers would be pleased with this kind of control in place. Another example - you can advertise "high resolution streaming" exclusivity of MQA-encoded music before a high-resolution FLAC of the album can be purchased off HDTracks. Depending on how strictly the authentication mechanism is enforced, an MQA-enabled device could conceivably not even play a file unless some kind of compatible key were provided and "Authenticated" (strong enforcement of DRM - essentially making the MQA data a form of inaudible watermarking). Also, I suspect some mechanism could be developed to tag the file - "This MQA data was streamed from Tidal on February 13, 2016 between 3:00-4:00PM PST" - so piracy can perhaps be traced back to where "leaks" originated. I wouldn't even be surprised if the tagging could embed user information like an IP address, or unique tokens of which the server also keeps a copy to prove origination (I've purchased PDFs from vendors already doing this for text documents). I'm speculating of course, as I don't think anyone has discussed this, but it's worth keeping this potential motivation in mind.
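To make the watermark/tagging scenario concrete, here's a sketch of classic LSB steganography embedding an origin tag into sample data. I must stress this is entirely my speculation - nothing in Meridian's patents describes this code, and a real watermark would need to survive transcoding, which a naive LSB scheme does not:

```python
# Speculative sketch: embed an ASCII origin tag in the LSBs of PCM
# samples (inaudible at ~-96 dBFS). Purely hypothetical, not MQA's design.

def embed_tag(samples, tag: bytes):
    bits = [(byte >> i) & 1 for byte in tag for i in range(8)]
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit     # one tag bit per sample LSB
    return out

def read_tag(samples, length: int):
    bits = [s & 1 for s in samples[:length * 8]]
    return bytes(sum(bits[i * 8 + j] << j for j in range(8))
                 for i in range(length))

pcm = list(range(1000, 1200))            # stand-in sample values
tagged = embed_tag(pcm, b"streamed 2016-02-13")
assert read_tag(tagged, 19) == b"streamed 2016-02-13"
```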
(If I were in business like Meridian, expecting significant financial benefits from licensing agreements, I'd be considering this to gather support from the content providers. And hyping this format big time to get as much market share as possible ASAP!)
I'm sure I'll be revisiting these encoding techniques, especially MQA, in the days ahead. Reading over what I just wrote, I must say that I remain irked by the MQA acronym... They could have just called it HRS (High-Resolution Streaming vis-à-vis HRA for High-Resolution Audio) to signify that this is a format mainly for better streaming quality; that would actually mean something. Rather, this whole "Master Quality Authenticated" acronym just sounds grandiose to put it mildly (and here's a corresponding interview). Also, I hope I'm wrong about 16/48 being the actual base "lossless" component of the encoding technique. Although the blind testing last year did not show a preference for 24-bit audio among the audiophiles who participated, I think it would be nice to have true bit-depths down to at least 18 bits to feed our high-resolution DACs - at least this will ensure the "core" lossless data has potential for better-than-CD dynamic range (if I'm right about the DRM piece, I can also imagine content providers not wanting to openly go beyond CD-level quality). The lower 6 bits can then be used for whatever restoration of frequencies above 24kHz Meridian thinks is necessary; there's still >550kbps bitrate available...
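The numbers behind my 18/48 counter-proposal are easy to verify (again, my arithmetic, not any published spec):

```python
# Dynamic range scales at roughly 6.02 dB per bit of depth:
for bits in (16, 18, 24):
    print(bits, "bits ->", round(bits * 6.02, 1), "dB")
# 16 -> ~96.3 dB (CD), 18 -> ~108.4 dB, 24 -> ~144.5 dB

# A 6-bit payload in a 48kHz stereo stream still leaves a healthy rate
# for a lossy ultrasonic facsimile:
payload_rate = 6 * 48_000 * 2
print(payload_rate)            # 576_000 bits/s, i.e. the >550kbps figure
```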
Conceptually, it's interesting to note that this system stresses frequency range extension to that of 192+kHz sampling rate (there's also some vague discussion about dissociating the concepts of "frequency" and "timing" - we'll see about that!) as opposed to dynamic range extension with HDCD aiming for 20-bits resolution. I guess Meridian doesn't think we need >16-bits of dynamic range for high-fidelity playback! This seems like a step backwards...
Basically, other than some compression (the promise of 16?/~192kHz sound in a 24/48 container, assuming you think that's better than a fully lossless 24/48) and Meridian employing their brand of upsampling DSP algorithm, there's probably not much else here for the consumer as far as I can put my finger on at this time. Certainly more "evolutionary" than "revolutionary" I think (that's if you even consider this a step forward at all)! There's likely no sonic improvement for those of us already listening to our favourite tunes in standard/flat/non-encapsulated 24/96+ FLAC. I can also see how this encoding technique could get in the way for audiophiles who are not streaming and don't face storage space limitations. Imagine if your favourite album came out only as an MQA "high-resolution" download (assuming you have an MQA-capable DAC/player) and then later you have to consider re-buying when the record company finally issues a true lossless 24/96+ file up on HDTracks or equivalent. Other than convenience for online streaming, I'm also wondering why I would want to buy any hardware for MQA decoding, knowing that MQA doesn't even seem to utilize >16-bit dynamic range, if I (and probably most readers) already have a capable 24-bit DAC. It's perhaps a feature to keep in mind if I consumed most of my music through online streaming, but I'm not sure it's a "must have" otherwise.
I look forward to reports when this system gets out into the hands of actual reviewers beyond these early company demos - any decent recording played on >$30,000 Meridian DSP7200 speakers better sound good. I'd be curious how close my speculations are to the final product :-).
Remember, we need to be cautious about what we hope for and consider the likelihood that what we already have is indeed better (i.e. true 24/96+ FLAC "Studio Master"). When we're invoking the use of proprietary compression schemes with likely quality-reduced components (i.e. the accuracy of the >24kHz parts of the audio spectrum), we need to consider compromises in accuracy (whether audible or not) and in freedom of use due to the proprietary nature. I hope things will clear up over the next few months.
Addendum 1: Some more "explanations" about MQA here.
Notice again the insistence on "revolutionary" and "lossless", and the lack of meaningful specifics... There's also this piece about the DAC "lighting up" to indicate that the file is "Authenticated". Oooo... Cool... Remember how HDCD also had an indicator light?
Addendum 2: Thanks to the folks on the Squeezebox forum, I was directed to a more detailed patent description. Indeed, the method described is one of lossless encapsulation, but done with some limitations, which allowed "97.6%" of the 970 16/96 musical samples analyzed to fit in the 24/48 data space. That's pretty ingenious actually! So I guess we can call it "almost always lossless"? Presumably, if there's a lot of detail and dynamic range above 24kHz in the original 16/96 file, then this technique will fail to be truly lossless.
However, I am concerned that this technique is actually maintaining only 13 bits of the original signal in a "lossless" fashion without an appropriate decoder (see the diagram)! So... Assuming this is the same technique being used in MQA, this means that without the decoder it's more compromised than I had thought in terms of the "lossless core" - not even 16-bit lossless when playing these files through a standard DAC. Basically, for audiophiles, buying an MQA decoder is essential if you have a collection of these files.
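A quick raw-rate sanity check shows why the encapsulation can't always succeed, and what the 13-bit legacy core costs (my arithmetic based on the patent's figures, not a statement of Meridian's internals):

```python
# A 16/96 stereo stream carries more raw bits than a 24/48 container,
# so a truly lossless fit depends on the >24kHz band coding very
# efficiently -- hence "97.6%" of samples fitting, not 100%.
rate_16_96 = 16 * 96_000 * 2      # 3_072_000 bits/s to be packed
rate_24_48 = 24 * 48_000 * 2      # 2_304_000 bits/s of container space
print(rate_16_96 - rate_24_48)    # 768_000 bits/s shortfall before coding

# And the legacy-playback concern: a 13-bit lossless core gives only
print(round(13 * 6.02, 1), "dB")  # ~78 dB dynamic range vs ~96 dB for 16-bit
```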
One more thing from the patent:
"We can conclude that a 16-bit 96kHz channel with appropriate noise shaping is entirely adequate as a distribution format, meeting audiophile requirements with some margin to spare."
Given the universal approval of MQA so far in the audiophile press I have seen, I guess we're all in agreement that 16/96 is all we need then? :-)
I still like my idea of an 18/48 lossless core, using the lower 6 bits to encode a lossy ultrasonic facsimile with a >550kbps data rate. Of course, with Meridian's insistence on the importance of temporal resolution, this would be anathema in their eyes (due to temporal smearing from the block lengths typically used in lossy encoding), even though I suspect it could sound better than their patented scheme; it would certainly be more accurate with standard DACs and achieve >16-bit dynamic range. Hmmm, why would the audiophile world advocate going through this Rube Goldberg of a system?! And why should we consider this "revolutionary"? ("Biggest thing to happen to audio quality in decades!" - please think about what you're saying, What Hi-Fi?)
I want to encourage everyone to read the recent "As We See It" editorial Audiophilia Nervosa in the February 2015 issue of Stereophile by Robert Schryer! Good article and nice to see this perspective articulated well. See, I'm not always critical of the mainstream audiophile press. :-)
... Just watch out for Synergistic Research bull droppings by page 28 among other testimonies of "magic"!
Have a great week everyone!