For today's MUSINGS, let us take a few moments to think about our sense of hearing. Though the auditory system is not as complex as the visual architecture, its extensive interconnections and the association areas it intersects with of course provide us with an altogether unique experience integrating emotions and memories. The "drug of choice" for us audiophiles is in the auditory domain. The music we hear adds to the quality of life, and in so doing, contributes at least to some extent a sense of meaning to our existence. Much of this is a mystery; perhaps in time, with the development of neuroscience, its secrets may be unlocked... But even if these remain impenetrable mysteries, there is of course no denying the value and truth of human subjectivity. That joy and meaning are wholly ours alone; they are ours by right as sentient beings to own and cherish.
As humans, we also recognize that we are limited in every conceivable dimension and sensory modality. We cannot smell as well as dogs, we cannot see as well as cats, we cannot hear as acutely as bats, our proprioceptive abilities pale in comparison to even more primitive primates swinging in the trees... Over the years, I have posted on the importance of being aware of these limitations so as to be insightful about our abilities especially when making claims about what is heard or not heard. For example, I think it's useful to try something like the Philips Golden Ear Challenge as a way to evaluate our own hearing acuity. Having an appreciation of dynamic limits also helps us appreciate the importance of sound levels and silence in the listening environment. Knowing the limits as a human being helps us to understand the importance (or unimportance!) of developments like high-resolution audio and what we should expect.
Part I: Physiological Mechanism of Hearing
Our brain is mapping the world. Often that map is distorted, but it's a map with constant immediate sensory input.
---- E. O. Wilson
As a review, remember that the hearing mechanism is analogue (note that this is not absolutely true in that neuronal action potentials do operate based on thresholds). While analogue allows, in an ideal world, infinite "levels" to represent a signal like a sound wave, unlike the quantization steps in digital, in real life there are limits to our amplitude discrimination. These limits are a result of noise which "blurs" details and lowers resolving ability. Though our ears can implement "dynamic contrast" adjustments through shifting activity in the tensor tympani and stapedius, our hearing spans a range of approximately 130dB from the absolute softest sound level to the limit of pain. Remember though that we actually do not need to be able to encode all of this because in truth nobody (in a healthy state of mind!) should be listening to music regularly at extreme levels as one would risk instantaneous hearing loss above 120dB - assuming for a moment one even has a powerful amp and robust speakers! Furthermore, remember that dynamic resolution varies through the audible frequency spectrum; this is the message of the classic Fletcher-Munson contours:
Notice that our hearing is optimal between around 200Hz and about 5kHz. This is the frequency range where high-fidelity audio really must "get it right". Beyond that optimal range, our ability to appreciate frequency nuances drops off significantly on either side. Notice also that the shape of the curves shifts depending on loudness level; this is an important consideration in music production and also significant in terms of "reference" volume. These are just some of the many considerations we all have to keep in mind when doing listening comparisons, to make sure the differences we're hearing are not just physiological factors at work.
For the tiny sum of $15,000USD, I have seen claims of 100dB dynamic range from phono cartridges (let's just say I'm a little more than skeptical). A typical LP is unlikely to achieve better than 60dB, maybe slightly over 70dB max. Remember that 16-bit audio is already very reasonable, with an effective dynamic range up to 110dB or so with dithering (this varies with the different algorithms, and ones like noise-shaped dithering take advantage of our lower high-frequency acuity). High-resolution 24-bit PCM can encode down to the thermal noise floor. Absolutely overkill, but great for studio work to maintain precision of course.
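For reference, here is a minimal sketch of the textbook dynamic range of an ideal N-bit quantizer with a full-scale sine (roughly 6.02N + 1.76 dB). It assumes plain quantization only, so it does not model the perceptual gain from noise-shaped dither that gets 16-bit audio toward the ~110dB figure mentioned above. The function name is just illustrative.

```python
import math

def pcm_dynamic_range_db(bits: int) -> float:
    """Theoretical SNR (dB) of an ideal N-bit quantizer for a full-scale sine.

    20*log10(2**bits) is the ~6.02 dB-per-bit term; 10*log10(1.5) is the
    ~1.76 dB correction for a sine wave against uniform quantization noise.
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(f"{bits}-bit PCM: about {pcm_dynamic_range_db(bits):.1f} dB")
```

Compared against the 60-70dB quoted for a typical LP, even the undithered 16-bit figure leaves a wide margin.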
We also must be aware of the limits to hearing as we age. Occupational noise-induced hearing loss is an issue for some, but even without this, realize sadly that as most audiophiles are men, on average, we will deteriorate more than women:
Graph of average loss of frequency acuity with age. It is unfortunate that we do not have more women doing reviews! See Roger Russell - Hearing and Listening.
[If you're interested in the nitty-gritty physics of music and hearing, have a look at these Physics 406 "Acoustical Physics of Music" lecture notes from the University of Illinois.]
Probably the most difficult dimension to get a handle on is the time domain. Just what is the threshold in time for our ability to detect changes? I briefly alluded to this in my assessment of DSD decoding. There have been articles like this one from Kunchur (2007) and a related study by the same author in 2008 using different methodologies. In the latter article's introduction to the topic we see thresholds ranging from 2ms for "gap detection" in noise, to 200μs down to "2-16μs" as per Kunchur himself. Most studies seem to suggest a theoretical 10μs threshold value. If you look at the paper, experiments to determine this threshold are of various paradigms; dual tones with same spectra, different spectra, varying amplitudes, click and click-pairs. Both papers focused on a ~7kHz stimulus waveform either subjected to subtle lowpass filtering changing rise time or slight speaker misalignment. Specialized equipment was used to create and verify the very precise stimuli of course.
"We don't listen to test signals!" is a common argument I have often heard against measurements and objective testing. Recognize that these research experiments embody this criticism. Specially calibrated and constructed equipment is used to create test tones in the lab for listeners in blind testing analyzed for statistical significance. This is obviously not music.
[EDIT: I'll leave the original text above but with "strikethrough" for the sake of full disclosure. As Måns noted below in comments and as I was corrected on the message forums, time domain performance for a signal below Nyquist is actually a function of bitdepth, not, as I had originally written above, a reflection of the need for even higher sample rates. For CD 16/44, we're looking at 1/(44,100 × 2^16 × 2π) = ~60 picosecond resolution for signals well within the audible spectrum - thanks Adamdea. More than enough for the Kunchur estimates above... Nonetheless, the ideas and claims from the MQA articles below are interesting.]
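As a quick check of the arithmetic in that expression, here is a small sketch that simply evaluates 1/(sample rate × 2^bits × 2π). The function name is made up for illustration, and the formula is taken as given from the note above rather than derived here.

```python
import math

def timing_resolution_s(sample_rate_hz: int, bits: int) -> float:
    """Evaluate 1 / (fs * 2**bits * 2*pi), the expression quoted in the EDIT note."""
    return 1.0 / (sample_rate_hz * (2 ** bits) * 2 * math.pi)

cd = timing_resolution_s(44_100, 16)
print(f"16/44.1: about {cd * 1e12:.0f} ps")

# Kunchur's smallest quoted threshold is on the order of a few microseconds.
print(f"5 us is about {5e-6 / cd:,.0f} times larger")
```

The result comes out near 55 picoseconds, which is the "~60 picosecond" figure after rounding, and is several orders of magnitude below the microsecond-scale thresholds discussed above.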
Despite the lack of clear demonstration that these experiments correlate with the need for higher time domain accuracy in digital audio, this has not stopped the audiophile world. Consider this series of articles about Meridian's MQA from the May/June 2015 The Absolute Sound. In them, Meridian (through the aid of Robert Harley) claims that "temporal blur" has now become the benchmark:
"If Meridian were forced to characterize the quality of a digital audio system with a single metric, it would be how much temporal blur the system adds, measured in microseconds or milliseconds."
As you can see in the text and Figure 1, 10μs has become the target for Meridian. Considering that few have actually heard an A/B comparison using MQA and, as far as I know, blind testing results have not been released, I watch with curiosity the outcome in the days ahead now that MQA-enabled DACs like the Meridian Explorer2 have been released. It seems they're behind schedule with music roll-out since we're into the start of the fourth quarter now (the article was looking at 2nd quarter and the only MQA-related news at IFA recently seems to be the Pioneer firmware announcement).
I anticipate that time domain resolution is going to be the big "push" in the days ahead thus the focus lately on digital filter measurements, and the blind test a few months back. Related to this as well have been the discussions on DSD which by virtue of very high samplerates have an edge over PCM in time domain performance.
Part II: Cognition and Listening
That is why I use these parables,
For they look, but they don’t really see.
They hear, but they don’t really listen or understand.
---- Matthew 13:13 (NLT)
"Hey... I told you to do that last week! Weren't you listening?!"
---- My wife the other day
Whether a reflection of wisdom from two thousand years ago or domestic demands of yesterday, we know intuitively that there is a difference between what our neural mechanisms hear, and whether we actually are listening to it - the stuff that actually makes it into our memory, subconscious, and of course conscious awareness. This leads us into the broad, complex, and marvelous domain of cognition/psychology in hearing/listening. This is a topic which should really be at the forefront of audiophile discussions but is so often relegated to background chatter or even scoffed at when brought up as a potential explanation for certain subjective claims.
To start, remember that the auditory memory buffer, also known as echoic memory, is actually very brief. Echoic memory is where a detailed, unprocessed representation of what we just heard is temporarily stored and available for analysis and interpretation. Studies suggest this storage is in the primary auditory cortex itself; the duration of the "buffer" is about 4 seconds and it can linger around in memory for maybe up to 20 seconds without distraction. This is important because if we're doing blind testing, these limits suggest that snippets of audio should be brief, and we need to quickly switch between samples for best accuracy. Of course this does not mean we cannot listen to something clearly, process the impression, and then later compare based on the gestalt in longterm storage. This isn't difficult for clear or obvious differences, but subtle differences will not be so easily detected, remembered and recalled. This is important of course when we read reviewers talking about hardware comparisons of devices they used to own or have not heard in days/weeks/months/years.
As suggested above, distraction can be an issue. This leads us to how attention plays a major role in how we perceive and evaluate our world. Here's a video of an example which we should keep in mind as an analogy:
As per the title of the video, this is an example of selective attention. Of course this is in the visual domain but you can imagine a similar phenomenon when we evaluate audio for subtle changes. So often, we hear people commenting about how they "didn't notice" the presence of an instrument until some SUPER-USB-CLEANER tweak was inserted, or how ULTRA-SPECIAL-CABLES "made" the percussion seem like it was "30 feet behind the wall" or had "deeper bass". Realize that every time we listen analytically, we shift our attention (scan) to listen for changes and because there is no way to exactly recall the complexity of music (unless we're seriously doing a controlled test adhering to the limits of echoic memory to maximize detection of subtle sounds), it's no surprise that we report "noticeable" differences. Indeed it could be "true" that the person heard what they claimed... But the likelihood is that those sounds were always part of the playback; the only difference being whether the listener actually paid attention to them or not.
I would say that this kind of phenomenon is "utilized" quite a bit in audio shows especially with cable demonstrations. Inevitably, the demonstrator will ask the room whether participants heard a difference after some hardware change (which takes many seconds during which the sales rep probably talked about how much more expensive and theoretically better the new cable/device is) and someone's going to come out and say they heard some element clearer, or the bass seemed cleaner, or the soundstage seems wider, etc... Like I said, these impressions could be all "true" but it's not unreasonable to question whether the impression is only valid in the mind of the listener rather than an actual change in external reality. As much as some audiophiles make it a contentious issue, it is only when we account for variables in a more formalized fashion or repeat testing to verify that we can truly be sure...
Even more fascinating are multimodal perceptual interactions. Watch this:
As you can see in these examples, how we cognitively process sound can be highly influenced by other modalities such as what we see before us (and vice versa). Even when we know the "trick" such as the McGurk Effect or Shepard Tone, the mind subconsciously associates an interpretation which is extremely difficult (impossible?) to disentangle from. Remember, humans are primarily visual creatures. We dedicate way more neurons to visual processing and association than audio. Though perhaps not as easily tested or demonstrated, what happens when a reviewer is in front of an impressive looking sound system? What happens when he knows that it costs $200,000USD? What if I'm friends with the designer and he's personally showing off the gear in his room at the audio show? Do we not think that biases can be induced subtly if even the underlying physiology can be so plainly "fooled"? I know, I know, cognitive biases are things that happen to other people, right? Surely 2,000 audiophiles can't be wrong! (As I saw implied by a manufacturer recently.)
Part III: Wrapping Up...
Know Thyself.
---- Temple of Apollo at Delphi
I think it is fair to say that with our perceptual and cognitive limitations, insight into truth is never complete. It is refreshing to see articles such as this on Computer Audiophile describing one man's journey, discovery, and, perhaps not unreasonably, wisdom.
As humans, we have remarkable cognitive ability. I would never make light of this... It is this ability to integrate feelings and cognition that gives us our fantastic subjectivity; the gift of understanding, insight, joys & griefs, sense of purpose and value to experiences. No machine (currently) can hope to even appreciate this magnitude of beauty, art, creativity - in sum, sentience.
But because we can be biased in our perceptions, even down to the infrastructure of our physiology and cognitive ability, I think it is wise to be mindful of our limitations. For example, an audio recording/analysis machine would never be fooled by the McGurk "bar" vs. "far" in the video above. It cannot be affected by physiological illusions, or emotional biases. And no doubt a decent modern recording device could "archive" audio with precision beyond the recollection of any human being's echoic memory or longterm memory storage. This is why I believe objective measurements are essential when we want to know just how accurate a piece of equipment is. This might not be all we want to know in a review but I do believe it's important as someone who cares about "high fidelity". Even more important, and as a corollary, objective analysis will also allow us to figure out if a device/cable/tweak made any difference at all, and if so, the magnitude of the effect. I feel that this is an essential part of the evaluation of some devices and cables that have no reason for existence other than claims of being able to impart sonic change, claims based on no "evidence" beyond testimony.
On a related note, over the years, I have either heard or argued with folks who think that the whole purpose of high fidelity is "enjoyment" (therefore perhaps this discounts or reduces a need for objectivity). Sure, the hedonistic goal is important and I regularly sign off my posts with a wish that we all find enjoyment in our audio. However, I trust that the definition of "audiophile" is more than "music lover". We do not go to "audiophile shows" to chat with music salesmen nor do we typically read audiophile magazines to get the latest scoop on favourite artists and new albums of the week... No folks, an "audiophile" is more than a music lover, let's face it, "he" is a lover of the hardware and the technology. He is a practitioner of "high fidelity". While there are elements of art and design in hardware, we buy them for what they do. The technology and science inside the box is meant to produce "good" sound. While we may disagree what "good" sound is, I choose to use technical accuracy as my guide (the 'ideal') to what high fidelity means... Others may choose a more "euphonic" character and that's fine as well so long as we understand and can communicate goals. Art and science, subjective enjoyment and engineering virtuosity are complementary and together represent the fulfillment of this hobby (not to mention modern life!)...
Bottom line: As the saying goes "to err is human..." - therefore verify; especially ephemeral auditory impressions.
As usual... Happy listening everyone.
Oh yeah, one more thing. Realize that it is not only in audiophilia where reliability of the sensory system can be questioned once tests are held in controlled settings... Consider oenophilia. Cheers!
How I see the timing domain research done in 2008.
What was basically done was using a 1st order low pass filter to create a timing difference.
That filter could be 'shifted' in frequency so a certain delay was obtained.
A 7kHz squarewave was used.
Also a Grado RS1 headphone that is reported to have a frequency response of 12Hz to 30kHz.
When we look at plots taken from these headphones we can see that 12Hz is down -20dB opposite 1kHz.
As measurements of headphones usually stop at 22kHz it could be fair to assume 30kHz is down -20dB as well.
-20dB is often used in the headphone world when no cut-off points are specified so you can publish 'impressive' frequency response numbers while not lying.
measurements I have seen suggest 21kHz is down -6dB already.
The RS1 seems bandwidth limited well below 30kHz when we consider a real bandwidth +/-3dB .
Anyway, a first order filter was used which could be either in line or not during the test.
to create 10us delay (which proved to be quite distinguishable) you need a 6dB/oct (1st order) filter that has a -3dB point of 12kHz.
At 7kHz that filter is down just (but audible) 1.35dB
That same filter is down -3dB at 12kHz and -6dB at 20kHz.
Even using a 7kHz sinewave a level difference of 1.3 dB will be audible.
Might the perceived difference be amplitude related rather than timing in this test ?
I did not see any remarks about the amplitude difference also being 'corrected for'.
7kHz is -0.4dB.
Those that ever attempted to hear level differences (I did this for my own education) can tell you that when switched differences of 0.2dB can be detected so 0.4dB level difference can be heard.
such a filter is 6dB/oct and 23kHz (-3dB)
One would say... well 23kHz is well above the audible range BUT 10kHz is -0.7 dB already and 20kHz = -2.4dB
Not to mention the phase shift being present in the audible range I KNOW such a filter to give audible differences.
the audibility threshold in the test seemed to be closer to 4us:
7kHz = 0.2dB (what I found for my own hearing to be my limit as well when it comes to level differences)
10kHz = - 0.3dB, 20kHz = 1.3dB, the -3dB point of this filter is 33kHz (the headphone used in this test already is down over -20dB)
I would qualify such a filter to perhaps just barely be audible when A-B switching while using music (even with CD 44.1kHz)
The filters at 5us and certainly 10us WILL be audible when using music.
Just my POV and could have misinterpreted the test though.
Should they have played music using these filters it would most likely have been even more audibly evident and they might have come up with the same 'filter' as audible limit.
This largely disqualifies any 'slow roll-off' digital filters and non-filtered NOS DACs as good DACs that start to roll off in the audible range.
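The level figures quoted in this comment can be sanity-checked with the standard magnitude response of an ideal first-order (6dB/oct) RC low-pass, attenuation = 10·log10(1 + (f/fc)²). The sketch below is only that textbook model, not the exact filters from the paper; the function names are illustrative, and the time constant τ = 1/(2π·fc) is printed for reference without assuming it is how the "delay" values above were derived.

```python
import math

def rc_lowpass_attenuation_db(freq_hz: float, cutoff_hz: float) -> float:
    """Attenuation (as a positive dB value) of an ideal first-order low-pass."""
    return 10 * math.log10(1 + (freq_hz / cutoff_hz) ** 2)

def rc_time_constant_us(cutoff_hz: float) -> float:
    """Time constant tau = 1 / (2*pi*fc), expressed in microseconds."""
    return 1e6 / (2 * math.pi * cutoff_hz)

# Cutoff frequencies mentioned in the comment: 12kHz, 23kHz and 33kHz.
for cutoff in (12_000, 23_000, 33_000):
    levels = ", ".join(
        f"{f // 1000}kHz: -{rc_lowpass_attenuation_db(f, cutoff):.2f}dB"
        for f in (7_000, 10_000, 20_000)
    )
    print(f"fc={cutoff // 1000}kHz (tau={rc_time_constant_us(cutoff):.1f}us) -> {levels}")
```

The ideal-filter numbers land close to the rounded values quoted above (for example about 0.4dB at 7kHz for the 23kHz filter), which supports the point that each "timing" condition also carries a level change inside the audible band.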
here's something to think about.
If the smallest timing difference we can hear is around 4us then why worry about timing 'artefacts' smaller than 100ps ?
100ps = 0.1ns = 0.0001us
Yes ... timing differences some 40,000 x smaller than the ~4us found in these tests.
Then there is the question whether or not a single tone is easier to 'analyse' for a brain than a (complex) music signal and whether the brain is really fussed about the timing aspect in music (<10us).
In any case I don't subscribe to the view that we need > 200kHz sampling rates (to reach 100kHz frequencies).
I think a flat frequency response up to 20kHz (-0.1dB) is all we really need.
The 3mm distance between drivers they found in the earlier test is similar to moving your nose 1.5mm to the left while listening..
Maybe we should all listen with our heads clamped firmly to our listening chairs as to not have a few microseconds of timing differences to upset our 'imaging' abilities of stereo recordings ?
Nice Frans! Excellent dissection of the papers and perspective on the "meaning" of what they actually did in the context of real-life listening!
I agree, these tests suggest effects so *minuscule* that it'd be ridiculous to even really care... I'm still looking forward to seeing what becomes of this MQA story. Especially what the outcome sounds/looks like and how they convert those FFT frequency domain "triangle" images into actual waveforms and time domain claims in a "lossless" manner!? :-)
They are obviously abusing the term lossless to mean "psychoacoustically transparent." Since it's doubtful that anyone can hear those frequencies, they don't even need to store anything at all in order to meet that goal.
You talk about timing and relate it to sampling rate. It is a common misconception that timing accuracy is limited by the sampling rate. This is not the case. At a given sampling rate, the phase accuracy of a tone below the Nyquist frequency is limited only by the sample precision, i.e. bit depth.
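One way to see this for yourself is a small numerical experiment: delay a 1kHz tone by 5us (far less than the ~22.7us sample period at 44.1kHz), quantize it to 16-bit integers, and then estimate the delay back from the samples alone. This is only a sketch under simple assumptions (a single steady sine, no dither, an exact whole number of cycles in the analysis window); the constants and function names are chosen for illustration.

```python
import math

FS = 44_100            # CD sample rate (Hz)
F0 = 1_000             # test tone, well below Nyquist (Hz)
N = 4_410              # 100 ms of samples = exactly 100 cycles of F0
AMP = 0.5 * 32767      # half of 16-bit full scale
TRUE_DELAY = 5e-6      # 5 us, a fraction of one sample period

def quantized_tone(delay_s: float) -> list[int]:
    """Sample a delayed sine at FS and round each sample to an integer (16-bit range)."""
    w = 2 * math.pi * F0
    return [round(AMP * math.sin(w * (n / FS - delay_s))) for n in range(N)]

def estimate_delay(samples: list[int]) -> float:
    """Recover the delay from the tone's phase by correlating with sin and cos at F0."""
    w = 2 * math.pi * F0
    i = sum(x * math.sin(w * n / FS) for n, x in enumerate(samples))
    q = sum(x * math.cos(w * n / FS) for n, x in enumerate(samples))
    phase = math.atan2(-q, i)   # sin(wt - phase) = sin(wt)cos(phase) - cos(wt)sin(phase)
    return phase / w

estimated = estimate_delay(quantized_tone(TRUE_DELAY))
print(f"sample period: {1e6 / FS:.2f} us")
print(f"true delay: {TRUE_DELAY * 1e6:.4f} us, estimated: {estimated * 1e6:.4f} us")
```

Under these assumptions the estimate should land within a few nanoseconds of the true delay, with the remaining error coming from the rounding to integers rather than from the sample spacing.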
Thanks for the feedback Mans! Appreciate input on this detail I neglected... Been reading over on HydrogenAudio some of the discussions on this. Great stuff!
BTW: For those unaware, Mans here is the "man" who put DSD into SoX recently:
Since you're in the bowels of SoX... I have 5 more words to consider:
Open. DSD. Format. Taggable. Compressible.
* Pretty please * :-) ".dsf" and ".dff" SuX. We really need something new.
Maybe JRiver might be interested in a new feature like this for version 21?! It's time for a "FLAC for DSD" IMO. Maybe Matt would be interested...
Adding some form of compression to DSF would be fairly trivial. There's even a header field called "Format ID" where the only currently defined value is zero for "DSD raw."
"Only" limited by bit depth? Surely you mean "also", and more to the point: that the timing requirement can be satisfied _either_ by having sufficient bit depth _or_ by having sufficient bandwidth. Because it's limited by both, and you can choose to relax either one.
But this is beside the point, there's a bigger problem with this response to Kunchur: why are you assuming you always have the full bit depth at your disposal? Most transients in music actually _don't_ go all the way from a 0 sample to a 65536 sample, they use far less of the dynamic range. So what happens when you have a much smaller change in SPL, say +6 dB (1 bit) happening over 10 us? You won't be able to reproduce it from a 44.1k sampling rate, you will still need 100k, that's what happens. :) And since the later threshold discovered by Kunchur was 5 us, you might even need 200k sampling to cover everything that humans can hear.
Really the more relevant question here is how often these ultra-fast transients actually occur in music and are they masked by the rest of the content in any relatively busy track (or will you only be able to hear them in the absolutely most minimalist tracks, where one instrument is playing one note at a time with silence in between).
Let's take that example of a 1-bit 6dB change over 10us.
What frequency would that change be presenting as? If it's beyond the bandwidth of 22.05kHz, then it would not be encoded in the signal.
So indeed, if you believe that the subtle change is actually audible, and the only way to capture that change because it's part of a 90kHz transient component is by increasing sampling rate, then by all means use 192kHz.
The time domain accuracy of waveforms below Nyquist is still excellent and Kunchur doesn't have to worry :-).
If I believe? This is not a question of if I believe, it's first a question of does Kunchur's study prove that this subtle change is audible (and do I therefore believe it until some other study or solid argument contradicts it).
Here you seem to be simply agreeing with him: if humans can hear a 5-us transient as being faster than a 10-us or 23-us one, perfect fidelity requires ~192 kSps sampling rates and CD-res is just not enough.
But above you corrected that part of the article as if you weren't agreeing with him. :) And this based on a sloppy counter-argument that seems to assume all transients always span the whole bit depth (or arbitrarily large portions of it). At least if I'm understanding correctly why the supposed real limit is 1/(44100*65536). It's also possible we're mixing a discussion of timing ('when' the transient happened) with one of speed (how steep the slope was). While CD-res is more than capable of capturing as fine a timing as we could ever want, Kunchur's 5-us contention is one about speed.
Actually, no... The idea is not the same donjoe.
First of all, assuming that Kunchur's results are accurate (has anyone seen replication? I'm not asking for contradiction...) and there are instances when humans can hear at 5us temporal resolution, the argument is that one should not automatically equate that to samplerate. The argument is that already 16-bits and especially 24-bits are enough for all frequencies below Nyquist to resolve temporal resolution of 5us whether we're talking about "speed" or "timing" as per your definition.
I don't think there is any issue with not using up the full bitdepth. For 16-bit resolution, is there any evidence that a properly reconstructed 10kHz wave of 10-bits amplitude as opposed to the same wave using all 16-bits is temporally more ambiguous at the same samplerate?
To clear this up, and for the sake of relevance in audiophile discussions, I think there is a straight forward experiment that can be done. In a controlled, blind test environment with highly linear equipment, provide a 24/768 (even higher than 768kHz if you wish!) audio sample of actual instruments with these temporally refined "transients" you speak of. Show blinded listening results where at normal volume levels (of course noise floor different between 16 and 24-bits), an equivalent 16/48 downsample is inadequate to discern these "transients". It doesn't matter if the signal includes either "timing" or "speed" or both as per your comment above.
My suspicion is that when done properly, one would not be able to find significant differences even with highly skilled listeners.
"The argument is that already 16-bits and especially 24-bits are enough for all frequencies below Nyquist to resolve temporal resolution of 5us whether we're talking about "speed" or "timing" as per your definition."
This is absolutely wrong. While bit depth can increase timing accuracy (placement of the transients on the time axis), it can't make the transients faster, a.k.a. steeper in slope. Steep slopes require high frequencies, you can't get away from that. Nyquist-Shannon doesn't care about your bit depth, so to speak. :)
"To clear this up, and for the sake of relevance in audiophile discussions, I think there is a straight forward experiment that can be done."
That would be great, but what we have now are only the experiments performed and documented until now. I can't discuss what hasn't been done, nor organize such an experiment myself. (In the meantime, BTW, I've been pointed to different errors in Kunchur's work, and I may have to dismiss his results based on those - and with them the whole "ultra-fast transients" hypothesis supporting the use of "hi-res" sampling rates above 48k. But I'm still only dealing with his first paper. To be continued.)
So when you're talking about "faster", you're referring to a shorter rise time which of course would be correlated to samplerate. But that's also correlated to maximum frequency of the "sound" you're digitizing.
It goes back then to the question of what frequency one believes the hearing mechanism is able to detect or one "needs" in a recording! If we believe that the human ear/mind can appreciate very high frequencies and we need to capture that "steeper slope", then have at it. However, generally most accept that 20kHz is the upper limit of hearing so there's only so much of a "steep slope" we actually need to capture accurately and 44.1kHz would be adequate for this. To have direct evidence otherwise would require a listening test like the 768kHz one I mentioned above. Kunchur's methodology is certainly interesting in his use of specialized custom gear, verification for linearity, and use of electrical/physical displacement to have listeners detect differences. But it is a jump to claim that this necessitates higher sample rates or bit-depth beyond 16-bits as typically assumed.
On a related note, a friend who runs a studio about a year ago recorded in hi-res impulse responses like popping a balloon and dropping marbles on the ground to capture that "instantaneous" sound. Given that sounds are mechanical vibrations, real-life "impulses" do not look like the straight, ultra-steep rise of a square wave. Let me see if I can dig up those files to show the impulse waveforms and FFTs of the frequencies. If I can find them, I'll make sure to put up a post.
Anyhow... Would love to hear what information you've been pointed to about Kunchur's work. Thanks for the discussion.
"It goes back then to the question of what frequency one believes the hearing mechanism is able to detect or one "needs" in a recording!"
Quite right. At first I thought Kunchur was simply dealing with single/sporadic ultra-fast transients and avoiding the issue of continuous ultrasonic sine wave detection, but I read the paper more carefully yesterday and that's actually exactly what his experiment was doing: his subjects detected the difference 100% of the time between an unfiltered square wave and a low-pass filtered square wave with a first-order RC filter taking the signal down -3 dB at 28 kHz. So it appeared that subjects were able to detect the presence or absence of ultrasonic odd harmonics in a complex tone.
That is, until you look at how he dealt with the potential confounder of discrimination based on RMS loudness difference: his tones would differ by 0.25 dB RMS and he claimed based on a 1977 paper that the threshold for hearing such a difference at 7 kHz was 0.7 dB. This latter paper seems to be quite far from the truth - I at least was able to detect 0.4 dB at 7 kHz every time, in a quick test, without a particularly quiet setting and without headphones. So this seems to be his paper's fatal mistake. Nothing to do with time-alignment (not until his second paper from 2008 anyway).
"recorded in hi-res impulse responses like popping a balloon and dropping marbles on the ground to capture that "instantaneous" sound"
That would be fantastic to see in the context of this discussion, yes.
Indeed you cannot 'compress' a DSD stream, at least not in the way you can with PCM.
The reason for that is that in the DSD stream there are no digital 'words' that can be 'written' in a different way, which describes the same 'word/value', but takes up fewer bits.
DSD is a constant stream of '1's and '0's constantly 'changing' its value and there are hardly any 'continuous' high or low values that could be 'written' differently so that the number of stored bits can be made smaller (compressed in the number of bits).
Compression factors would thus be very small and not worth the trouble at all.
Chopping the stream up into small 'chunks', determining the average value of each chunk and then writing that away with a smaller number of bits would be like converting the stream to a sort-of PCM format and won't be lossless.
That stored value would then have to be reproduced by (about ?) the same 'bit pattern' again but the actual essence of DSD would already have been lost.
You'd be right if compression was as simplistic as you present it. Fortunately, it is not.
At its most basic, compressing a data stream amounts to splitting it into symbols of fixed or variable size, then encoding these symbols using variable-length codes chosen based on the statistical distribution of the symbols. Huffman coding is one commonly used method of mapping a symbol stream to codes. Arithmetic coding is another. Generally, this process is known as entropy coding.
To improve the efficiency, most compression algorithms perform some kind of reversible pre-processing of the input to create a more favourable distribution of symbols sent to the entropy coder. Gzip takes advantage of recurring patterns in the input by having each symbol represent a back-reference distance and length pair, all in units of bytes in the data stream. More sophisticated pre-processing as used in e.g. LZMA gives better compression ratio at the expense of computing time.
It is correct that if single-bit DSD samples were directly input to an entropy coder, it would do very poorly since on average there are an equal number of zeros and ones. DST solves this by using a linear predictor on previous samples to estimate the next one and encoding the difference. If the predictor is good, there will be more zeros than ones in the difference stream. On decoding, the same predictor is applied to already decoded samples and the decompressed difference stream is used to correct the prediction and obtain the original sample values.
Typical compression algorithms operate on byte streams rather than bit streams, but despite the mismatch, they can still be used to compress DSD data. Gzip with default settings achieves a compression ratio of about 0.7 (30% reduction), LZMA about 0.6. As expected, this is not as good as DST at its best.
If we want to devise a new storage format for DSD, the first thing to do would be running a few tests with different combinations of predictor and entropy coder and find a suitable tradeoff between efficiency and complexity.
One problem with DSD is that the (high-frequency) quantisation noise introduces what is effectively a large amount of randomness in the stream making compression difficult. There's not much to be done about that in the general case.
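For anyone who wants to try the general-purpose compressor experiment mentioned above, here is a minimal sketch using only the Python standard library. Note the assumptions: the bit stream comes from a toy first-order sigma-delta modulator fed with a 1 kHz sine at the DSD64 rate, not from a real DSD file, and the function names are made up for illustration. Real DSD uses higher-order noise shaping with far more random-looking output, so the ratios printed here will not match the figures quoted above.

```python
import lzma
import math
import zlib


def sigma_delta_bits(n_samples, tone_hz=1000.0, rate_hz=2_822_400.0, amplitude=0.5):
    """Toy first-order sigma-delta modulator; returns a list of 0/1 samples."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for i in range(n_samples):
        x = amplitude * math.sin(2.0 * math.pi * tone_hz * i / rate_hz)
        integrator += x - feedback          # accumulate the quantisation error
        bit = 1 if integrator >= 0.0 else 0  # 1-bit quantiser
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pack_bits(bits):
    """Pack 0/1 samples into bytes, MSB first; leftover bits are dropped."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def compression_ratio(data, compress):
    """Compressed size divided by original size (smaller is better)."""
    return len(compress(data)) / len(data)


if __name__ == "__main__":
    data = pack_bits(sigma_delta_bits(1 << 18))
    print("zlib ratio:", round(compression_ratio(data, zlib.compress), 3))
    print("lzma ratio:", round(compression_ratio(data, lzma.compress), 3))
```

To test real material, replace the generated bytes with the raw sample data read from an actual DSD file. Both compressors are lossless, so decompressing always gives back the identical bit stream regardless of how good or bad the ratio is.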
I have no knowledge about the actual math involved with various compression techniques.
The quick explanation is indeed rather simplistic but is essentially showing the problem.
The constantly changing, very high frequency and seemingly random 1's and 0's are an important part of the signal which cannot be left out, nor are they as compressible as PCM bytes can be, due to the nature of the signal (noise).
Certainly not to file sizes similar to a FLAC file with a 'comparable perceived SQ'.
DSF is probably as far as one can go but it still remains a large file size.
Too bad you cannot remove that quantisation noise from a DSD file before it is stored; it can only be removed in the analog plane afterwards.
Would different (less 'random') quantisation noise or very different compression techniques be able to yield much higher compression ratios ?
It's probably possible to improve compression by coupling the sigma-delta modulator with the compressor. However, that's obviously not going to help for compressing existing files. There may also be a tradeoff possible between compression efficiency and SNR.
Here is another great article worth reading: http://www.weiss.ch/assets/content/41/Can-You-Trust-Your-Ears.pdf
Nice Brent. A classic from 1997!
Dear Archimago, how can I contact you by mail? I can't find a way to ask for advice about a DAC that I want to buy. That's it... Jovan
You can try PM'ing me on Computer Audiophile or Squeezebox Forum. I'll check those places every once in a while :-).
I'm picking up on the part of your article about hearing loss with age. You suggested the following:
"This can change preference, for example some older individuals may like a slight high midrange (2-4kHz) boost, but this would be excessive to a 25 year-old"
I am going to suggest the exact opposite is true, and in doing so, I will offer a theory as to why so many audiophiles lust after analogue, or tube sound.
With hearing loss, the sounds that one lives with, every moment of every day, are those that have a 10 - 20 dB drop for any and all sounds in the kHz range. Looking at the graphs of hearing loss you posted, this means that for anyone over 40, normal hearing doesn't offer the sparkle or tizz that a 20 year old hears. It also means that older people have become habituated to a sound without those shiny sparkly bits. Think about going to a cinema on a bright afternoon; you come out into the late afternoon sunshine, and automatically seek shade, or sunglasses. Fortunately our sight adjusts, and 20 minutes later the bright afternoon sunshine becomes normal; unfortunately, as we get older, our hearing doesn't come back and that extra 3 or 6 dB boost simply grates, because it does not match what has become 'normal' to us.
No wonder that so many audiophiles (a demographic that includes a large number of people who bought Jimi Hendrix albums on their original release) enjoy analogue sound and tubes. The limited range and dynamics closely mimics what older people hear all the time. High Fidelity is absolutely not about accuracy, it is about faithfulness to the sound that you believe is normal; the sound that you hear and enjoy in every waking moment of every day.
Having said that (and nearly, but not quite being of the 'Jimi Hendrix first time around' age) I do believe that our hobby can be very enjoyable and rewarding, but it's also very personal, with damn few rights and wrongs. That's true irrespective of age, and we should respect each other's differences, not deride those whose preferences don't match ours.
Another thought, and on a recent topic you wrote on: on room EQ, the preferred response curves seem to be ones with a slight bass boost and a steady falling off as the frequency increases. For sure, the evidence suggests that not many people like flat. Perhaps (and not surprisingly) that's because people are inclined to prefer the sounds they have become habituated to.
My theory is that as long as the degradation of the frequency response of hearing goes very gradually you simply aren't aware it is occurring. Only an audiologist can tell (and often he doesn't measure above 8kHz) if your hearing has become less sensitive than a benchmark.
The way I see it the brain is (constantly) 'calibrating' itself to natural sounds we hear day in day out all around us. That becomes the 'reference' for our brain.
When a young and old person would stand in the exact same spot in front of a speaker, instrument, band, orchestra or whatever sound source, the exact same sound waves will reach their heads.
Both will 'hear' a real piano as a real piano, the live band as the live band, all sounds as the same sounds etc.
Yet, the 'signals' that are 'emitted' from both person's cochlea nerves will differ for many reasons.
The brain regards the 'incoming data' from (all) of our senses as 'this is reality' and the next time you hear sounds played back with the same spectral energy your brain will interpret this as 'correct'.
That is... providing the ears are not too damaged or too old.
Those that have ever had to have their ears rinsed out will have experienced that sudden hearing loss is very noticeable.
Also as soon as the ear canal is opened again you seem to hear 'louder' and much 'clearer/brighter' than you remember hearing before your ear got blocked.
Strangely enough after a few hours that seems to have returned to 'normal'.
So I think a 'boost' here and there, because of old age, is just as sound-degrading for old and young people alike because it doesn't jibe with our own 'built in reference'.
A boost (even in the upper treble) will still be perceived as a boost because our brain is referenced (used to) sound spectrums from reality.
That is IF the recording/playback is actually of high quality and 'personal preference for a certain sound' is left out of it.
Some people simply prefer a boost/dip here or there or have transducers which are anywhere but 'flat' and needs lifts or cuts here and there to come closer to 'reality' or their 'preferred reality'.
This, of course, depending on recording quality and even music genre as well.
I should better not openly speculate as to why most audiophiles are above 50 years old (in general) and say they prefer 24 bit over 16 bit or 192kHz or DSD256 over redbook and feel they need frequencies above 16kHz to 'enjoy' sound.
Yet, they should arguably have enough bandwidth available listening to old fashioned FM radio (bandwidth wise, disregarding the huge amounts of compression in that medium).
If I should speculate as to why more 'aged' people enjoy audiophile sound more I would guess it would have to do with 'experience' and 'appreciation' of differently paced music styles and recording quality when being exposed to 'higher end' sound reproduction over many more years.
This doesn't mean some younger people (with arguably better hearing) can't enjoy high-end either when they know what to 'look for'.
To me the preferred 'analog/tube sound' people seem to have has everything to do with it having 'pleasant coloration' and not with accuracy.
Excellent discussion and points Bob and Frans.
1. On subjectivity: Yup, at the end of the day, whether one enjoys something or not is one's alone. I don't argue with people who state a preference about liking mellow sounds, are OK with high noise floors (eg. CD's with tube stage come to mind), or certain EQ settings. That's fine, so long as we all understand what's being done and that these things deviate from technical accuracy... Just as much as there is no reason to criticize someone's preference of musical genre.
2. On EQ: No problem here either... As Bob mentioned, as per the discussion on "targeting" a preferred curve when it comes to speaker/room correction (in fact, I was running some measurements with Acourate last night using -6dB at 20kHz curve talked about by Mitch and Bob Katz). EQ'ing is part of life and tonality decisions are made all the time whether by artists or producers, mixers, etc... No reason why the home listener can't tune it for themselves.
3. On aging and frequency response: This is of course complex. Habituation happens and we get used to the physiological change in hearing. My main point is of course that the hearing mechanism isn't perfect so as an "instrument" of sonic evaluation, we have to be mindful of what is happening over the years. Over time, it doesn't just become an adjustment to not experiencing the same "sparkle or tizz" - it eventually becomes the ABSENCE of "sparkle and tizz". When subjective reviewers of age are talking about how "extended" the high end is, we have to be mindful of the fact that the "extreme" high end could be 8-10kHz based on their hearing ability.
4. On aging and tonality: The other day I was listening to Telarc's 1812 and comparing it to my memory of it. I hadn't heard it in years but did enjoy it >10 years ago when it was first released on SACD. I noticed that the sound of the triangle in parts wasn't as "sparkly" or obvious as I recalled. Should I raise the treble on my system based on my memory? Could this be reflection of high-frequency hearing loss? I don't know... Maybe it's just a reflection of the EQ in my "house curve", or that my speakers don't have a titanium tweeter in them like my main speakers 10 years ago. At the end of the day, it doesn't matter for me but one could imagine in some circumstances, if I were in the studio mixing, I might be inclined to boost the instrument in the mix so I hear it better. If I were a reviewer listening to this SACD reporting on certain speakers, I might mention something like "these speakers do not deliver the ultimate high frequency extension compared to the best I have heard". Which brings us to...
5. Why do audiophiles like rolled-off? I think there is a euphonic component to this not just physiologically but also to do with recording quality which Frans hints at. Physiologically some roll-off and moderation decreases sibilance in vocals, the "Gundry/BBC" dip helps decrease harshness. But I also wonder if there's another reason which has to do with modern music and how it's made. "Loudness war" music often is clipped which creates all kinds of high frequency and unnatural distortions. Plus we have to wonder to what "standard" much of the newer music and remasters we listen to were produced in mind. If the producers target boomboxes and car audio rather than a good sound system capable of full range audio, perhaps they might want to unnaturally boost the bass. Or if we're targeting Apple earbuds and Beats Solo headphones, by gosh I'd want to boost the treble to spice up listening quality through these lacking transducers. Even much of the modern female vocals (which audiophiles like), say Diana Krall's recent "Wallflower" IMO is too harsh and forward when listened to "flat"; certainly rolling off the top and smoothing out the presentation even if ultimate resolution takes a hit could help. NOS DACs and tube output stages in CDs can also help with this beyond amplifier and speaker selection. I have never published some measurements I did on the ModWright Oppo BDP-105 mod for example, and this is exactly what I see; some people like this effect of course.
1. Remember the ear/brain mechanism and the limitations in acuity, age, and perception. We're free to discuss our perceptions of course but we just have to be mindful of how much of what we say is accurately perceived ourselves and how words would be interpreted by others.
2. Taking a "more objective" approach allows us to understand and take control. IMO it's always better to take a technically accurate, neutral device and tune it to our subjective tastes. In doing so we not only get to understand more about the hobby of high fidelity, but also ourselves - our own biases and limitations... As opposed to getting excited about the testimony of others and not sure of what kind of technical standard one is actually getting beyond advertising claims much of the time.
Archimago (and Frans),
Many thanks for your insightful reply to my considered hunch.
Motivated by your post on room correction, and moving further on the journey that began with reading Mitch's outstanding contribution on Computer Audiophile some time ago, I took to trying out room (or is that speaker) equalisation. For me it has made a big difference; not as much as a new pair of speakers, but more than say, a new CD player, and for sure much more than any other tweaky things like cables or other 'enhancement devices'.
I tried 3 different software packages, and for me the best (which was also the cheapest) has come through using MathAudio. EQ is now in my system, and it's going to stay. Interestingly, and perhaps not surprisingly, the room measurement curves I got from all three packages were very consistent, though they each offered different levels of detail.
MathAudio offers a very neat way of drawing your own preferred target response curve, and that has allowed me to easily try different curves. For starters, each of the packages I tried measured an overblown bass that was easy to solve (and quite audible, but how to fix it without EQ?). For a target response curve I looked at varying degrees of slopes with mid - high frequencies, and found preference with a faster roll off above around about 3kHz. I found that, for me in my system, a 3-4 dB slope between 3 & 10kHz sounds good, and above 10kHz I can do just about anything, but I simply carried on the same slope. Dare I say the sound now sounds like analogue (but without pops, clicks, wow and other such blemishes). It was that experimentation, and a discussion with an audiologist friend, that led me to the hypothesis I offered.
I would concur entirely with you, that accuracy is a prerequisite in any good system, but it is not the be all and end all; merely the platform from which people can find their own preferences. I think that tunes with Frans's observations too. However, I am pretty sure that preferences can be much more easily navigated through EQ than through any other means.
As a further observation: those who refute ABX testing do so on the basis that living with a component for an extended period allows for a more meaningful evaluation. I'm pretty sure that 'habituation' plays a big part in that process; how much it distorts an honest evaluation, I'm not sure. I'm also pretty sure that listening to any reasonably well balanced system would bring countless hours of pleasure if only the act of reading the latest review or trend in HiFi didn't vandalise that enjoyment by causing the mind to question what one hitherto enjoyed listening to.
I started my HiFi journey in 197- something, and let the whole thing become nothing more than a background interest for 20 years. The last 10 years or so has seen my interest rekindle as computer audio took off. It’s time to let it become a background interest again. Room EQ has allowed me to find a pleasant listening experience, and I'm not sure I can afford to endlessly try the latest and greatest. Archimago's Musings will continue on my must reads though.
Hey Bob. Thanks for your well-reasoned input from decades of experience!
I've been spending some time lately with Acourate and tweaking the parameters for my system. I agree that the effect is impressive and personally, as you say it's "going to stay".
I like this:
"I'm also pretty sure that listening to any reasonably well balanced system would bring countless hours of pleasure if only the act of reading the latest review or trend in HiFi didn't vandalise that enjoyment by causing the mind to question what one hitherto enjoyed listening to."
Finally, interesting comments on how over the years audio has been in and out of being a background interest... This has been the same way with me. And like you, I think the advent of computer based audio with all that it brings is like the next step in "punctuated evolution" of sonic technology... Every few decades a very significant advance brings to the forefront rapid development in the technology and brings it out to the consumer at large. Although sonic accuracy is one of the benefits, it is of course not the only one even though much of my posts is about this... Convenience, speed of music distribution and availability, music production technology have just been phenomenal beneficiaries of this growth.
Really good to see this topic on your blog Archimago. As you say, "a topic which should really be on the forefront of audiophile discussions". In case you have not listened to these, there are several Perceptual Audio Demonstrations that one can listen to in order to tune into one's own auditory limits. Highly educational.
The section on masking, which I feel is the most overlooked human auditory limitation, both in the frequency and time domains, is very good. For an additional explanation and a demo where one can adjust the level to find one's masking threshold, see "Masking a Tone by Noise". Turn the volume down before clicking start.
Listening and learning determines at what decibel level one's masking threshold is. This is important as it puts into context what is audible and what is not, both in the frequency and time domains. For example, in the frequency domain using music as the test signal, my masking threshold is around -70 dBFS. This lets me know that any frequency or noise based artifacts that measure below -70 dBFS I am unlikely to hear in the presence of music at normal playback levels (i.e. 85 to 105 dB SPL), as the artifact is being masked by the music program level.
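To put the -70 dBFS figure into more familiar terms, here is a small sketch of the arithmetic. It assumes, purely for illustration, that digital full scale (0 dBFS) lines up with the stated playback level; in a real system that calibration depends on the volume setting, amp and speakers. The function names are hypothetical.

```python
def dbfs_to_spl(level_dbfs, full_scale_spl):
    """Acoustic level of a signal at level_dbfs, assuming 0 dBFS plays at
    full_scale_spl (an assumed calibration, not a measured one)."""
    return full_scale_spl + level_dbfs


def db_to_amplitude_ratio(level_db):
    """Convert a level in dB to a linear amplitude (voltage) ratio."""
    return 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    for full_scale in (85.0, 105.0):
        print(f"0 dBFS at {full_scale:.0f} dB SPL -> -70 dBFS artifact at "
              f"{dbfs_to_spl(-70.0, full_scale):.0f} dB SPL")
    print("amplitude ratio at -70 dB:", db_to_amplitude_ratio(-70.0))
```

Under that assumption an artifact at -70 dBFS sits somewhere between 15 and 35 dB SPL while the music itself is playing 70 dB louder, at an amplitude roughly 3000 times larger.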
What would be interesting is another one of your cool internet listening tests of both frequency and time domain masking thresholds using music as masking is frequency dependent. Not to make any work for you :-) Hope you write more on psychoacoustics.
Hmmmm... Sounds good Mitch... Maybe we can chat over PM about a test that we can run over the winter months through Christmas? I'm gonna be busy into mid-Nov so won't have time to work on it, but running some kind of "Masking Threshold Blind Test" could be fun for gathering data from audiophiles! It'll be an interesting glimpse mainly at the physiological threshold of the (most likely) computer audiophile cohort... Might be interesting to see also the difference between headphone users vs. speakers.
Anyhow, I'll PM you in a few weeks for some thoughts on how to make this work!
I agree with Solderdude - we are constantly learning & refining our internal auditory reference model - the main learning is when we are very young but I believe, like him that we adjust to our failing capabilities as we age. And don't for a moment think that this reference is a collection of frequencies, amplitude & timing that we compare to - no, it is a high level abstraction which contains some signature which we can compare what we hear to.
One thing that is important to bear in mind - our auditory system is not about accuracy (none of our senses are) - it's about deriving enough information from the nerve impulses to fit our needs & these needs can range from flight or fight, instant reaction to a sound to communication through speech, through to relaxed listening to a stream flowing or a piece of music. The senses have not developed based on how accurate a portrayal of reality they give us, otherwise we would have far better senses than we have - eagle eye vision, etc. See Donald Hoffman for his research into this concept.
So our focus on accuracy in our hifi systems seems misplaced - how can we hear differences in amplifiers when our speakers are two orders of magnitude more distorting than our amplifiers?
The illusion of stereo reproduction is not about accuracy, it's about how well what we are hearing conforms to our internal reference - only then do we give ourselves up to being fooled by this fragile illusion. We have to get a lot smarter about this than assuming accuracy is the important factor when it comes to the goal of what we want our audio systems to achieve - a realistic illusion.
As a result we have to be a lot smarter about measurements & test signals used, particularly when it comes trying to measure/characterise our auditory perception. Interestingly, modern research in this field has moved towards fMRI, MEG, & other electronic means of measurement of what we are perceiving, as they are also using far more sophisticated test signals & even music signals.
I see the Fletcher-Munson curves cited above but when it comes to our perception of noise we have a different set of curves, i.e. different & more sensitive perception of noise - look at the ITU-R 468 standard which has been around since the BBC research days - there is about 12dB more sensitivity to noise & around 6.3kHz is where our ears are most sensitive.
The idea of masking is again used in such a simplistic way - the concept known as comodulated masking release should be looked into
Thanks Dweeb. Interesting stuff! Thanks for the mention of the ITU-R 468.
Folks, here's an interesting demonstration of Comodulated Masking Release on-line:
I completely agree with you... Ultimately we want something better - beyond just what is afforded by our typical 2-channel accuracy! But to achieve this goal the whole music recording and delivery system has to change. As you probably notice, my perspective is that we're very much "already there" when it comes to accuracy. Accuracy is a start and the foundation for a good stereo sound system, and I want it because I want to have the opportunity to hear as much as possible with as little distortion as possible; but by no means do I expect a 2-channel recording bought in a store to sound "real" when played in even the best sound system and treated room. Of course in many ways this goal is impossible because so much of the music from a studio was never "real" other than decisions made through a mixer, DSP likely applied, and arbitrarily apportioned into 2 channels!
Of course I don't think we've reached the end of the audio road in terms of technology... I'm certainly a proponent of multichannel and look forward to other technologies in the future that can envelop the listener in a "realistic illusion" including formats like Atmos. Given what I've seen of the data from neuroimaging (fMRI, PET) in terms of resolution limitations inherent in these technologies (not for audio but in other areas of research), I won't be holding my breath awaiting something out of that domain to enter consumer space soon! :-)
I'd be curious if you have ideas about what we could see in the days ahead to keep an eye/ear open for?!
Here we go folks... Meridian and MQA talk on Stereophile starting with the time domain sales pitch:
"CD smears timing for 4 milliseconds" according to Stuart. Really? Anyone know a reference where this is coming from? And what kind of digital filter are we looking at?
Also: "MQA reduces ringing and smearing dramatically, over ten times more than on 24/192 recordings." Hmmm... Is that right?
Then there's this precious nugget: "Silverman unequivocally states that encoding and decoding his master file with MQA produced a recording that was clearer than the original master." Wow! Pray tell how we get something sounding even *better* than the original master? That's some magical encoder we got going!
Yet again... No A/B comparison with encoded vs. non-encoded music. Not only does this seem odd but somewhat disturbing. This is gonna get interesting I think!
"CD smears timing for 4 milliseconds"
" Anyone know a reference where this is coming from? And what kind of digital filter are we looking at? "
It's the time it takes for the ringing of most digital and steep analog filters after (or before and after) a Dirac pulse or squarewave edge to no longer be visible on the all too familiar scope shots.
One should realise that it rings on even longer than you can actually see on a (linear scale) oscilloscope shot.
When the vertical scale of those typical pictures would be in dB it would seem to ring on even further in time than what most people believe.
The energy (before) and after a 'dirac' pulse or any other signal that excites the filter ringing thus is 'smeared' over about 4ms
That is ..... IF you want to have a sharp cut-off which eliminates the energy above 22kHz.
Of course the nice filters that do not exhibit (much) ringing all suffer from mirrored signals that may become audible if the amplifier behind that DAC has high amounts of IMD above 20kHz.
DSD doesn't 'smear' either but is handily left out of the marketing BS.
Does that pre- and post-ringing matter to a real (bandwidth limited) music signal ?
Some believe it does, I don't belong to them.
"Does that pre- and post-ringing matter to a real (bandwidth limited) music signal ?
Some believe it does, I don't belong to them."
Nor do I belong to that camp Frans.
Thanks for the note on the 4ms quote. Agree. I suspect that's what Meridian would be referring to... A sharp digital filter; tight transition band with frequency extension all the way out near Nyquist. Considering that the DAC is the determinant on how the ringing would manifest, I'd really like to hear how Meridian handles this in the context of "CD smearing" and what device they're calling the standard for this spec!
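As a back-of-envelope check on how a sharp filter ends up ringing for milliseconds, here is a minimal sketch using Kaiser's well-known estimate for the order of a windowed FIR low-pass. The specific numbers (100 dB stopband attenuation, a transition band from 20 kHz to the 22.05 kHz Nyquist frequency) are assumptions chosen for illustration and are not Meridian's or any particular DAC's filter.

```python
import math


def kaiser_filter_order(stopband_db, transition_hz, sample_rate_hz):
    """Kaiser's estimate of FIR order: (A - 8) / (2.285 * delta_omega),
    where delta_omega is the transition width in radians per sample."""
    delta_omega = 2.0 * math.pi * transition_hz / sample_rate_hz
    return math.ceil((stopband_db - 8.0) / (2.285 * delta_omega))


def impulse_response_ms(order, sample_rate_hz):
    """Total length of the impulse response (pre- plus post-ringing), in ms."""
    return 1000.0 * order / sample_rate_hz


if __name__ == "__main__":
    # Assumed CD-style filter: pass 20 kHz, stop by 22.05 kHz, 100 dB down.
    order = kaiser_filter_order(100.0, 2050.0, 44100.0)
    print("estimated order:", order)
    print("impulse response length:",
          round(impulse_response_ms(order, 44100.0), 2), "ms")
```

With these assumed numbers the estimate comes out at an order of 138, or a little over 3 ms of total impulse response, which is the same ballpark as the 4 ms claim. Widening the transition band (which a higher sample rate allows) shortens the response in proportion, and that is really the whole trade-off in one line of arithmetic.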
Regarding the "clearer than the original master" claim, it seems some kind of apodizing is involved as hinted by J.A. in the comments section here:
When you have an hour or so of spare time I can highly recommend seeing this video.
Even MQA is discussed.
Talking about MQA.....
When excluding the promotional BS, the codec itself is quite a nifty way of 'compressing' a higher sample rate to a 48kHz sample rate while being backwards compatible with 16/48 DACs.
When properly decoded you end up with a 192/24 'quality' file in a 16/48 container.
I wonder if an MQA file can be 'FLAC'ed' and then decoded again and subsequently de-MQA'ed.
That may yield small files that contain 192/24 quality sound.
An MQA file can obviously be compressed with FLAC since FLAC is lossless.
Yes, that is what one would expect.
Would have thought that they would have applied other types of lossless data reduction as well, other than the inventive way they used a 24/48 container as it is now.
Obviously MQA is mostly about data reduction anyway.
I can definitely relate to your comparisons between seeing and hearing.
I work with at least 100 clients a month at my recording studio, http://cdmusicmastering.com.
I would say at least 20% don't have a great concept of sound. I always say, if I hold up a black card and a white card, if you can see, NO ONE will get that wrong. But if I asked a group of 10 people which song has more bass, which is brighter, even which is louder, 2-3 of the people would have completely opposite answers to the rest of the group. AND, they would feel they were right, like it's an opinion!
Hearing is not an opinion, just like seeing isn't! If one song is louder than the other, that's a fact. If you can't tell, it doesn't change this fact.