High-resolution music has been in the news over the past few days. Neil Young’s Pono, recently announced, is a new music player designed to play high-resolution music files. Pono will also have a music store; users will be able to buy high-resolution music files and sync them to the Pono Player, in a process that could be as seamless as using iTunes and an iPod.
High-resolution music files cost more than other digital downloads, and cost more than CDs as well. But are they worth the money? Can you hear the difference between a CD and a high-resolution music file?
The answer is most likely no. While there may be a small number of people who have the necessary audio equipment and good enough ears to hear this difference, those people are few and far between. Most people cannot even tell the difference between a high-bit-rate MP3 or AAC file and a CD, let alone a high-resolution file.
But digital music purveyors market high-resolution music in an attempt to make purchasers think that they are special, that they may, indeed, be one of the few people who can hear the difference between CDs and high-resolution audio files.
So what exactly is high-resolution music? Why couldn’t it sound better than CDs? And why doesn’t it? You can’t test the subjective experiences of listeners, so how much of that experience is just an expensive placebo effect?
For any discussion of high-resolution music, it’s important to clear up some terminology. When you see high-resolution music files, you may see them described as, for example, 24/96. This means the music in the files is 24-bit, and 96 kHz. While high-resolution music comes in a number of different levels of quality, I’m going to focus here on the most common high-resolution files, which are 24/96.
Let’s begin by explaining the specifications for audio CDs. The Red Book standard specifies not only how CDs are manufactured, but also how recorded music is formatted for them. Audio CDs contain two-channel linear PCM audio at 16-bit and 44,100 Hz; this is commonly abbreviated as 16/44.1. There are two elements here: the bit depth, which is 16-bit, and the sample rate, which is 44,100 Hz.
Bit depth affects the dynamic range of music as well as the signal-to-noise ratio. The dynamic range of music is the difference between the softest and loudest parts of the music. A good example of music with a very broad dynamic range is Mahler’s Third Symphony. Listen to the final movement, and you’ll hear some very soft sounds as well as some extremely loud ones. Or listen to Led Zeppelin’s Stairway to Heaven; it starts with a soft acoustic guitar and builds up to a fuzz-box crescendo.
The bit depth is essentially the number of variations a recording can choose from in a given slice of time. 16-bit audio allows for a range of 65,536 possible levels; 24-bit audio increases that to 16,777,216 levels. However, between the threshold of hearing and the threshold of pain, humans cannot distinguish enough of these volume differences for this to be noticeable.
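The level counts above are simply powers of two, and each extra bit adds roughly 6 dB of theoretical dynamic range. Here is a minimal sketch of that arithmetic; the function names are my own, and the decibel figure is the usual textbook ratio between full scale and a single quantization step, not a measurement of any real recording.

```python
import math

def quantization_levels(bit_depth: int) -> int:
    """Number of distinct amplitude values a linear PCM sample can take."""
    return 2 ** bit_depth

def dynamic_range_db(bit_depth: int) -> float:
    """Ratio between full scale and one quantization step, in decibels."""
    return 20 * math.log10(quantization_levels(bit_depth))

for bits in (16, 24):
    levels = quantization_levels(bits)
    print(f"{bits}-bit: {levels:,} levels, about {dynamic_range_db(bits):.1f} dB")
```

This gives about 96 dB for 16-bit audio and about 144 dB for 24-bit audio.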
The second number in our pair is the sample rate: this is the number of “slices” of audio that are made per second, and it is measured in Hz (Hertz). 44.1 kHz means that the music is sampled 44,100 times a second; 96 kHz means it is sampled 96,000 times a second. The sample rate primarily affects the range of frequencies that can be reproduced by a digital music file.
And the combination of the two determines the size of audio files. A CD can contain up to around 80 minutes of music, but if it were encoded at a higher bit depth or sample rate, it would hold less. A four-minute piece of music on a CD takes up 41.1 MB; at 256 kbps (AAC or MP3), it takes up 7.5 MB. But jump to a 24/96 file and it is around 138 MB, though lossless compression can shrink it by about one-third to one-half.
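If you want to check these sizes yourself, the arithmetic for uncompressed stereo PCM is just bit depth times sample rate times channels times duration. The sketch below uses helper names of my own and assumes decimal megabytes (1 MB = 1,000,000 bytes) and exactly four minutes of raw audio with no file headers, so its results land close to, but not exactly on, the figures quoted above, which presumably come from real files.

```python
MB = 1_000_000  # decimal megabyte (an assumption; some tools use 1,048,576)

def pcm_bytes(bit_depth, sample_rate_hz, seconds, channels=2):
    """Size of raw, uncompressed PCM audio in bytes."""
    return bit_depth // 8 * sample_rate_hz * channels * seconds

def pcm_kbps(bit_depth, sample_rate_hz, channels=2):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return bit_depth * sample_rate_hz * channels / 1000

def compressed_bytes(kbps, seconds):
    """Size of a constant-bit-rate file (AAC or MP3) in bytes."""
    return kbps * 1000 * seconds // 8

four_minutes = 4 * 60
print(f"CD 16/44.1: {pcm_bytes(16, 44_100, four_minutes) / MB:.1f} MB "
      f"at {pcm_kbps(16, 44_100):.1f} kbps")
print(f"24/96:      {pcm_bytes(24, 96_000, four_minutes) / MB:.1f} MB "
      f"at {pcm_kbps(24, 96_000):.1f} kbps")
print(f"256 kbps:   {compressed_bytes(256, four_minutes) / MB:.1f} MB")
```

The same calculation shows that CD audio runs at about 1,411 kbps, which is why a 256 kbps file is so much smaller.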
Is Bigger Better?
This is where the marketing comes in: bigger is always better. It could seem logical that higher numbers would result in better sounding music, but this isn’t the case. Let’s take the bit depth. 24-bit music, according to the marketing department, sounds better than 16-bit music. Yet 16 bits are more than enough to cover what human beings can hear. Too broad a dynamic range can be harmful; if you set the volume to hear the quiet parts of the music, the loud sections could burst your speakers, and hurt your ears.
And that sample rate? Interestingly, CDs use a sample rate, as we saw above, of 44,100 Hz; not a random number at all. This number was chosen because the highest frequency that humans can hear is around 20,000 Hz. According to the Nyquist theorem, the sample rate of music must be at least twice the maximum frequency that humans can hear. Since it’s best to leave a little bit of wiggle room, audio engineers took 20,000 Hz, multiplied it by two, and then added a bit of padding, just in case. Most of us don’t even hear up to 20,000 Hz, and, as we age, our hearing deteriorates. I can’t hear above around 12,000 Hz; you can test your hearing here.
Yet high-resolution audio files at 96 kHz can reproduce sounds up to around 48,000 Hz. Dogs can hear sounds that high; but not humans. In fact, it’s very likely that your stereo system cannot reproduce sounds at such levels. Most standard stereo equipment reproduces sounds from 20 to 20,000 Hz. So for ultrasonic sounds to be reproduced, every element of the audio chain needs to be able to reproduce these sounds. If your amplifier can go up to 40,000 Hz, but your speakers or headphones cannot, no amount of voodoo or magic can make high frequencies audible.
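The arithmetic in the last two paragraphs can be sketched in a few lines. The function names are my own. The aliasing helper is an addition for illustration: it shows what happens to a tone above half the sample rate if nothing filters it out before sampling, which is the reason the factor of two matters.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def min_sample_rate_hz(max_frequency_hz):
    """Lowest sample rate that can capture the given frequency."""
    return 2 * max_frequency_hz

def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which an unfiltered tone shows up after sampling."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_limit_hz(44_100))            # 22050.0
print(nyquist_limit_hz(96_000))            # 48000.0
print(44_100 - min_sample_rate_hz(20_000)) # 4100 Hz of padding
print(alias_frequency_hz(25_000, 44_100))  # 19100
```

In other words, a CD has about 2,000 Hz of headroom above the nominal limit of human hearing, and a 25,000 Hz tone that slipped past the filters would fold back down to 19,100 Hz.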
While it is certainly possible to have stereo equipment that can reproduce ultrasonic frequencies, you’ll never hear them. Worse, very high sample rate music files can actually cause distortion. As an article on xiph.org says, “If the same transducer reproduces ultrasonics along with audible content, any nonlinearity will shift some of the ultrasonic content down into the audible range as an uncontrolled spray of intermodulation distortion products covering the entire audible spectrum.” There are a lot of $10 words in that sentence, but what they mean is that very high sample rates (in this case, 24/192) can actually make music sound worse: when ultrasonic content passes through imperfect equipment, intermodulation distortion spills into the audible frequencies.
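To make the intermodulation point concrete, here is a toy sketch. It feeds two inaudible tones (30 kHz and 33 kHz, values I picked for illustration) through a hypothetical, slightly nonlinear “speaker” modeled as x + 0.1x², then measures how much signal appears at 3 kHz, the difference between the two tones. The nonlinearity and its 0.1 coefficient are assumptions, not a model of any real equipment.

```python
import math

SAMPLE_RATE_HZ = 192_000
NUM_SAMPLES = 1_920  # 10 ms, a whole number of cycles for every tone used here

def two_tone(f1_hz, f2_hz, amplitude=0.5):
    """Sum of two sine waves of equal amplitude."""
    return [amplitude * (math.sin(2 * math.pi * f1_hz * i / SAMPLE_RATE_HZ)
                         + math.sin(2 * math.pi * f2_hz * i / SAMPLE_RATE_HZ))
            for i in range(NUM_SAMPLES)]

def nonlinear(x, k2=0.1):
    """Hypothetical transducer with a small second-order nonlinearity."""
    return x + k2 * x * x

def tone_amplitude(samples, freq_hz):
    """Amplitude of one frequency component, found by correlation."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

ultrasonic = two_tone(30_000, 33_000)
clean_3k = tone_amplitude(ultrasonic, 3_000)
distorted_3k = tone_amplitude([nonlinear(x) for x in ultrasonic], 3_000)
print(f"3 kHz before: {clean_3k:.6f}, after: {distorted_3k:.6f}")
```

Before the nonlinearity there is nothing at 3 kHz; afterwards there is a clearly audible-range component, even though both input tones were ultrasonic.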
On top of that, hardly anyone can distinguish music at high sample rates from CDs. A number of blind studies have proven this, time and time again.
“Music as It Was Intended to Be Heard”
One of the biggest marketing arguments for high-resolution music files is that “this is how music was intended to be heard.” Pono Music says, “[Musicians] want their music heard and experienced the way they brought it to life with great care and commitment, in the studio.” This is how the music was recorded; this is how engineers heard it when they edited the music. Therefore, this must be better.
Two elements separate the recording studio – or, more correctly, the engineer’s control room – and home listening spaces. First, control rooms have high-quality monitors (speakers) which are neutral, and which are designed to provide the best possible audio fidelity. Second, control rooms are completely soundproof rooms with no parallel surfaces and completely absorbent walls. Again, they are designed to have no obstacles to reproducing the music as it was recorded. But you won’t have that at home, unless you have a very expensive listening room (and there are some people who go to this expense).
Some websites sell high-resolution files under the moniker “studio masters.” And, in fact, these files are studio masters; what engineers used in the studio. But that doesn’t mean that these are files that we should use when listening to music, and it certainly doesn’t mean that they’ll sound the same on home audio systems.
There is a very simple reason why engineers use high bit depths and sample rates when recording music. Digital music involves a lot of calculations; when you make changes to music, with equalization, speed changes, etc., you are multiplying and dividing numbers. When mixing and mastering an album, an engineer performs thousands of operations to alter sound. Each one of these calculations — to simplify — leads to numbers being rounded off. The bigger the numbers, the less of a chance there is for rounding errors to affect the music. But this doesn’t mean that we, as listeners, need the same types of files. We don’t manipulate these files; we may change volume, or even use some subtle EQ, but that’s it.
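Here is a toy sketch of that rounding effect. It runs a signal through 100 gain changes (each cut is followed by an equal boost, so an ideal chain would return the original untouched), rounding to a 16-bit or 24-bit grid after every step. The signal, the gain values, and the helper names are all made up for illustration, and real studio software is more sophisticated than this.

```python
import math

def quantize(x, bit_depth):
    """Round a sample in [-1.0, 1.0) to the nearest step of a signed PCM grid."""
    scale = 2 ** (bit_depth - 1)
    return round(x * scale) / scale

def process(samples, gains, bit_depth):
    """Apply each gain in turn, rounding to the given bit depth after every step."""
    out = list(samples)
    for gain in gains:
        out = [quantize(s * gain, bit_depth) for s in out]
    return out

def rms_error(processed, reference):
    """Root-mean-square difference between two equally long signals."""
    total = sum((p - r) ** 2 for p, r in zip(processed, reference))
    return math.sqrt(total / len(reference))

# Hypothetical signal and edit chain: 50 different cuts, each undone by a boost.
signal = [0.25 * math.sin(0.1 * i) for i in range(1000)]
gains = []
for k in range(50):
    g = 0.5 + 0.4 * ((k * 37) % 100) / 100
    gains += [g, 1 / g]

err_16 = rms_error(process(signal, gains, 16), signal)
err_24 = rms_error(process(signal, gains, 24), signal)
print(f"16-bit error: {err_16:.2e}, 24-bit error: {err_24:.2e}")
```

The accumulated error at 24 bits should come out a couple of hundred times smaller than at 16 bits, which is the headroom engineers want while they work.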
In some ways, suggesting that listeners need studio masters is akin to saying that instead of eating sausages, we should get all of the ingredients and put them together ourselves. Nevertheless, you will find that many vocal audiophiles offer a number of reasons why they need to listen to music files that contain sounds they simply cannot hear.
However, if someone really wants to provide “music as it was intended to be heard,” they’d do a lot better to look at the mastering process that’s been destroying music in recent decades. In what is colloquially known as “the loudness wars,” music producers, prodded by record labels, use dynamic compression to increase the overall volume of music, making it sound horrendous. Since, in general, the louder of two songs sounds better, or brighter, when you compare them, producers have been cranking up the volume to make their songs stand out. But string together an album’s worth of overly loud tracks, and it’s fatiguing. It’s a war of attrition, and our ears are the losers. No high-resolution files will make this music sound better, ever.
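To see what dynamic compression does to the numbers, here is a rough sketch. It takes a track with a quiet passage and a loud passage, squashes everything above a threshold, then turns the result back up to the original peak. The compressor is a deliberately crude static one, and the threshold, ratio, and test signal are all assumptions of mine; real mastering tools are far more elaborate.

```python
import math

def compress(x, threshold=0.2, ratio=4.0):
    """Crude static compressor: level above the threshold is divided by ratio."""
    level = abs(x)
    if level <= threshold:
        return x
    return math.copysign(threshold + (level - threshold) / ratio, x)

def peak(samples):
    return max(abs(s) for s in samples)

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def crest_factor_db(samples):
    """Peak-to-average ratio in decibels; lower means less dynamic range."""
    return 20 * math.log10(peak(samples) / rms(samples))

# Hypothetical track: a quiet passage followed by a loud one.
quiet = [0.1 * math.sin(2 * math.pi * i / 100) for i in range(1000)]
loud = [1.0 * math.sin(2 * math.pi * i / 100) for i in range(1000)]
original = quiet + loud

squashed = [compress(s) for s in original]
makeup_gain = peak(original) / peak(squashed)  # turn it back up to the same peak
mastered = [s * makeup_gain for s in squashed]

print(f"original: rms {rms(original):.3f}, crest {crest_factor_db(original):.1f} dB")
print(f"mastered: rms {rms(mastered):.3f}, crest {crest_factor_db(mastered):.1f} dB")
```

Both versions peak at the same level, but the “mastered” one has a higher average level and a smaller gap between loud and soft, which is exactly the trade the loudness wars make.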
Also, mastering is often done by someone other than the recording engineer, and someone who may not have been involved in the recording process. So is this music truly the way the artists and engineers intended you to hear it?
As I said in the title of this article: music, not sound. There is a small minority of music listeners who are obsessed by the idea of obtaining “perfect” sound. They go to great lengths, and great expense, to try and reproduce the sound that one hears in a concert hall. By focusing on sound quality alone, it can be easy to neglect the music. Such people may get frustrated if the music doesn’t sound good enough, and find it hard to become immersed in great music.
I’m a music fan. What I want most of all is good music. Some of my best listening experiences have come on tinny record players or booming car stereos. If the music is good, then the sound quality is less important. This said, without getting obsessive, there are a number of ways you can make your music sound better without maxing out your credit card.
For portable listening, start by getting rid of those white earbuds bundled with your iPod or iPhone. Get better earbuds, or get proper headphones. With headphones, you get what you pay for, up to a few hundred dollars. After that price point, it gets a bit iffy.
If you listen to music on your computer, get rid of those little desktop speakers and hook up a real stereo. I strongly recommend getting a good DAC, a digital-to-analog converter, because the sound card in your computer is probably not great. (Though no DAC will help if your amplifier and speakers are poor.) I have a DAC between my Mac and my amplifier; I find that it does make a difference, providing a more detailed soundstage.
And if you’re listening to digital music — you’re reading this article, so I assume you are — make sure it is at sufficiently high bit rates. Apple’s iTunes Store sells music at 256 kbps, which, for nearly everyone, is indistinguishable from uncompressed music. If you use MP3 files, go for 320 kbps; it should sound just as good as CDs as well.
But unless you’re willing to spend as much money on your stereo system as you do on your car, and set up an acoustically-controlled room, there is simply no way that high-resolution files will make any difference to the music you listen to. Lots of people try to convince you that there is a difference, but most of these people simply want to take your money. And you have to ask yourself: of the ones who aren’t asking for your money, how many are desperately seeking validation for the very large sums of money they’ve spent on something modern science tells us they cannot hear?
I consider high bitrates to be at least 256 kbps for AAC or 320 kbps (or VBR V–0) for MP3 files. Check whether you can hear the difference: http://www.mcelhearn.com/can-you-really-tell-the-difference-between-music-at-different-bit-rates/ ↩
The most common high-resolution music files are 24/48, 24/88.2, and 24/96. Pono will offer files up to 24/192, and some companies sell files up to 24/384. ↩
Linear pulse-code modulation: https://en.wikipedia.org/wiki/Pulse-code_modulation. ↩
One must not confuse bit depth and bit rate, which is used to describe how much data is in a music file per second. For example, 256 kbps means that there are 256,000 bits of data per second of music. ↩
See Is Bits Really Bits?. And, how about a test? Check whether you can hear the difference between music at 16 bits, and the same music downsampled to only 8 bits: The 16-bit v/s 8-bit Blind Listening Test. I got 7 out of 10 when I did the test; that’s better than random. ↩
There are also some other technical reasons why that specific sample rate was chosen. “Professional video recorders were originally used to prepare CD master tapes because they were the only recorders capable of handling the high bandwidth requirements of digital audio signals. Because 16-bit digital audio signals (and error correction) were encoded as a video signal, the sampling frequency had to relate to television standards’ line and field rate, storing a few samples per scan line. […] With three samples per line, 490 x 30 x 3 = 44.1 kHz, it is just right. […] Therefore, 44.1 kHz became the universal sampling frequency for CD master tapes. Because sampling-frequency conversion was difficult, and 44.1 kHz was appropriate, the same sampling frequency was used for finished disks.” Principles of Digital Audio, Sixth Edition, Ken C. Pohlmann. (Amazon.com, Amazon UK) ↩
See The Future of Music and, for a more technical explanation, ‘Dynamic Range’ & The Loudness War. And The Dynamic Range Database is a list of more than 50,000 albums, showing their relative loudness. ↩