Playing with High-Res Audio and the Players that Play Them

Recently, we’ve been hearing a lot of noise about high-resolution audio. Casual listeners might be wondering, what is high-resolution audio, anyway? Why does it sound better? Can we even determine that it sounds better? Answering these questions requires discussing digital formats such as PCM, MP3, and DSD—but don’t worry; this won’t be a dive into the technical weeds. To the extent that we’ll address science, it’ll only be in terms of practicalities.

But first, an anecdote: Before I was a professional recording/mixing engineer, I was a guy who wanted to record his band’s demos. I started out with Logic 6 and, with no real knowledge of file formats, I’d export my mixes to MP3 so I could play them on my iPod.

Quickly, I noticed that these recordings did not sound right. Everything sounded more cramped. Highs and lows felt truncated; instruments felt caked, shrouded, masked. On a hunch, I decided to bounce a song to a WAV and give it a listen, and sure enough, it sounded more like my multitrack mix. Feverishly, I compared the WAV and MP3 files, and lo! The result was certain. The WAV won, hands down.

Now, people had always told me that MP3 was a “lossy codec” and therefore sounded worse. But I had to hear the difference to believe it; moreover, I had to learn how to hear the difference. Many casual listeners might be in the same boat. After all, sound is sound, right? With few exceptions, we hear sound all the time. Why should the digital format of sound matter?

In the answer lie some technicalities, for sound moves in waves, in energy disturbances through the air. These waves have a continuous aspect to them, flowing naturally (or, one could say, “fluidly”) from one point to another. There are no rigid or harsh steps in a sound wave as it moves. The digital realm, however, is fundamentally different from our analog world in one vital respect: It’s binary.

Therein lies the rub. While a sound wave is continuous and fluid, there is no ambiguity between a one and a zero. The difference between one and zero is all or nothing. Gone are the fluid motions, replaced by severe steps.

Without getting into the Nyquist Theorem, aliasing, or the rest of the weeds, let's jump into the standard way we tend to store audio digitally: the pulse-code modulation (PCM) process, which manifests itself in files such as WAVs and AIFFs. This is how engineers typically capture audio. PCM reduces the continuous trajectory of a sound wave into a series of snapshots intelligible in binary. One can think of the PCM process as turning physical sound waves into a kind of flipbook, with the sample rate dictating how fast we flip the book.

Audio is typically sampled between 44.1 and 192 thousand times a second; this is called the sample rate. Its dynamic range is captured and represented in a resolution of either 16 or 24 bits. This is called the “bit depth.” With these two variables, we can effectively chart the sound wave on a two-axis graph, like this:

But zoom in on that graph, and you'll see that there are in fact steps here:

Whereas a soundwave is continuous, PCM provides a result that is inherently stepped, like what you see above. It might be stepped at high sample rates and large bit depths but, still, it’s stepped. It’s a verisimilitude, not reality.
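For the curious, here is a minimal Python sketch of that flipbook idea. It is only an illustration under assumed values: the 1 kHz test tone, the helper name pcm_samples, and the deliberately coarse 4-bit comparison are our own choices, not part of any real recording tool.

```python
import math

def pcm_samples(freq_hz, sample_rate, bit_depth, n_samples):
    """Take snapshots of a sine wave and round each one to an integer level."""
    max_level = 2 ** (bit_depth - 1) - 1  # largest positive value in signed PCM
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                          # time of the n-th snapshot
        value = math.sin(2 * math.pi * freq_hz * t)  # the "continuous" wave, -1..1
        samples.append(round(value * max_level))     # snap to the nearest step
    return samples

# Hypothetical 1 kHz tone, sampled at 44.1 kHz.
cd_quality = pcm_samples(1000, 44_100, 16, 8)  # 16-bit: very fine steps
coarse = pcm_samples(1000, 44_100, 4, 8)       # 4-bit: steps you could see

print(cd_quality)
print(coarse)
```

Both lists describe the same wave. The 16-bit version simply has far more rungs on its ladder, which is why the steps are hard to spot until you zoom in.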

This is the limit against which modern engineers strive, using psychoacoustic tactics to turn this verisimilitude into something more believable. As we'll see, DSD attempts to get closer to the reality of audio rather than a psychoacoustic representation. But before we get there, we need to talk about MP3s—the format with which you're probably the most familiar.

MP3s also use psychoacoustic principles in their encoding, but arguably to a detriment: Here the file size is king, with MP3s being around 1/10th of a WAV in size. To shrink the data, the sound goes through a filtering process in which predictive psychoacoustic algorithms eliminate ones and zeros deemed unnecessary. In these ones and zeros are snippets of audio the filter thinks you won’t miss.
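To see where a figure like "around 1/10th" can come from, here is a rough back-of-the-envelope calculation in Python. It assumes a CD-style stereo WAV (44.1 kHz, 16-bit) and a 128 kbps MP3; both are our assumptions for illustration, and other settings give other ratios.

```python
def pcm_bitrate_kbps(sample_rate, bit_depth, channels):
    """Uncompressed PCM data rate in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000

wav_kbps = pcm_bitrate_kbps(44_100, 16, 2)  # stereo, CD-style settings
mp3_kbps = 128                              # assumed MP3 bitrate

ratio = mp3_kbps / wav_kbps
print(f"WAV: {wav_kbps:.1f} kbps, MP3: {mp3_kbps} kbps, ratio: {ratio:.3f}")
```

At these assumed settings, the MP3 carries roughly one bit for every eleven in the WAV.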

But audiophiles miss them. Highs and lows are shaved off, and transients are manipulated to suit the code. Look at a spectrogram of an MP3 and you can see it.

There are plenty of YouTube demonstrations that let you hear the audio removed by the MP3 encoding process. Once you hear these demonstrations, you’ll start to notice the issues with MP3s. You’ll understand that what we gained in convenience, we lost in quality. You’ll comprehend why it’s a “lossy codec”: it loses something in the inexact approximation and discarding of ones and zeros.

Audiophiles attest to this regularly. They also attest to analog recordings sporting the best sound quality, even to this day—though more and more of them are getting into DSD.

So, what is DSD?

DSD stands for Direct Stream Digital, and like PCM, it utilizes sample rates and bit-depths, though the sample rates are much higher (in the millions instead of the thousands). Likewise, the bit-depth must be higher too, right?

Wrong: Whereas PCM is usually 16- or 24-bit, DSD usually gives you a whopping 1-bit resolution.

Wait, what gives here?

A lot, but you can think of it this way: In PCM, the bit-depth represents dynamic range with established boundaries. The rule of thumb is that your bit-depth, multiplied by 6, equals your approximate dynamic range in decibels. Thus, 16-bit gives you 96 dB of wiggle room from top to bottom, while 24-bit gives you 144 dB. In both bit-depths, you have set values of how loud a sound can be at any given moment, but any sound that doesn’t fit neatly into those values will be quantized, that is, rounded off to the nearest available value. This can create problems.
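The "multiply by 6" rule of thumb falls out of a little arithmetic: each extra bit doubles the number of available levels, and a doubling is worth about 6.02 dB. A short Python sketch (the function name dynamic_range_db is ours) shows how close the shortcut lands:

```python
import math

def dynamic_range_db(bit_depth):
    """Ratio between the largest and smallest step, expressed in decibels."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    exact = dynamic_range_db(bits)
    shortcut = bits * 6
    print(f"{bits}-bit: about {exact:.2f} dB (rule of thumb: {shortcut} dB)")
```

The exact figures come out slightly above 96 dB and 144 dB, so the shortcut is a fair approximation.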

In DSD, however, the bit takes on a more relational identity, as there is less focus on the boundaries. The value is either louder or softer, with the guiding principle that a soundwave is never stagnant.

Here we get to 1-bit, and we circle back to binary, the zero and the one: a one if the signal is louder, and a zero if the signal is quieter. All of this is made possible by the high sample rate. Sampled at (or above!) 2.8224 million times per second, audio with this 1-bit depth is said to lose little in dynamic range.
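If you'd like to watch a 1-bit stream carry a level, here is a toy Python sketch. Be warned that it is our own simplification: it uses the most basic (first-order) delta-sigma loop, which we won't unpack here, and the function name one_bit_stream and the test level of 0.5 are assumptions for illustration. Real DSD encoders are considerably more elaborate.

```python
def one_bit_stream(signal):
    """Toy first-order delta-sigma loop: input in -1..1, output bits (0 or 1)."""
    integrator = 0.0  # running total of the error between input and output
    feedback = 0.0    # the previous output, expressed as -1.0 or +1.0
    bits = []
    for x in signal:
        integrator += x - feedback         # how far the output lags the input
        bit = 1 if integrator >= 0 else 0  # one single yes/no decision per sample
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

DSD64_RATE = 64 * 44_100  # 2,822,400 samples per second, i.e. 2.8224 million

level = 0.5                       # a steady, fairly loud positive level
bits = one_bit_stream([level] * 64)
density = sum(bits) / len(bits)   # fraction of ones in the stream

print(bits[:16])
print(2 * density - 1)            # density mapped back to the -1..1 range
```

No single bit says how loud the signal is. The level shows up in how densely the ones are packed, which is why the sample rate has to be so high.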

Remember how we said that sound waves are inherently continuous and fluid? Audio captured in the DSD format is said to retain more of this fluidity than PCM formats because of the conditions related above. The result is an audio experience that, some hi-fi geeks say, more authentically replicates the analog experience.

We must mention the caveats, though. First, this has been an oversimplification in the extreme; we’ve not gone into the difference between pulse-code modulation (what PCM stands for) and pulse-density modulation (which DSD uses), nor have we compared the low-pass filtering commonly employed in PCM to DSD’s complex noise-shaping filters. We’ve not discussed the principles of delta-sigma modulation. We’ve also not covered how most commercially available DSD files have been converted to PCM at some point in the mixing/mastering process, nor have we delved into how many consumer-level DAC chips convert PCM to DSD before outputting analog. Then we have issues of patent retention (arguably a reason for DSD’s existence in the first place), and whether all of this is snake oil (some audiophiles say there’s no significant difference between high-quality PCM and DSD, while to others, the difference matters).

Why haven’t we covered this? For one, we don’t have the space and, for another, it muddles our central issue, which is this: Different file formats are going to handle the digital storage of sound in different ways. If you’re interested in hearing DSD for yourself, you deserve to have an introduction to the subject, as well as the caveats listed above.

Finally, if you’re interested in hearing DSD files, you’re going to want to procure a device to handle them natively (i.e., without conversion into PCM). At B&H, we sell many such devices, but one stands out, particularly in the category of portable audio players. That would be the line of Astell&Kern Portable High Resolution Audio Players, some of which allow you to store and play Native DSD files sampled up to 11.2 MHz (11.2 million times a second). If you want to judge for yourself whether DSD makes a difference—or if you’re simply looking to take DSD files with you wherever you roam—these are the devices for you.