A Common Misconception About Bit Depth In Digital Audio

Ken Theriot

You are wrong. Would it surprise you that what is commonly taught about digital audio is actually wrong? Almost anyone you talk to - even very smart and successful audio engineers - will explain digital audio as a representation of sound created when a device (an analog-to-digital converter, or just "converter" in everyday speech) takes many "pictures" of the audio very quickly. These pictures are called "samples." How often does a converter take these samples? The most common rate is 44,100 times per second (written as 44.1 kHz). This is called the sampling frequency.

Because audio waveforms are depicted as smooth curves - especially simple sine wave tones - it seems reasonable that discrete "pictures" (static representations of something dynamic, like still pictures compared to a movie), lined up sequentially next to each other, might get close but could never truly capture all of the real sound accurately. The tops of these "picture bars" would form a stair-step pattern that, when laid over the actual smooth sound wave, would leave little gaps between the flat tops and the curve. You could get closer by increasing the number of pictures, but you could never be fully accurate. Well, guess what? All of that is wrong! A converter does not play back the flat-topped bars; it reconstructs the one smooth curve that passes through all of the samples, and as long as the sound contains nothing above half the sampling frequency, that curve matches the original. In the VERY early days of digital audio, there was some truth to the stair-step idea. But the technology is now such that the whole stair-step thing is simply no longer true.
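If you want to see why the stair-step picture is misleading, here is a minimal sketch in Python. It assumes an idealized reconstruction (textbook sinc interpolation, also called Whittaker-Shannon interpolation) rather than the filter inside any particular converter, and every name in it (`tone`, `reconstruct`, the 1 kHz test tone) is made up for illustration. It samples a sine wave, then checks the value halfway between two samples, comparing the "hold the last picture" staircase with the smooth reconstruction.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (the CD rate mentioned above)
TONE_HZ = 1_000       # illustrative test tone, far below half the sample rate

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) defined as 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def tone(t):
    """The 'real' sound: a pure sine wave evaluated at time t (in seconds)."""
    return math.sin(2 * math.pi * TONE_HZ * t)

def reconstruct(samples, t):
    """Idealized reconstruction: one sinc pulse per sample, all summed."""
    return sum(s * sinc(t * SAMPLE_RATE - n) for n, s in enumerate(samples))

# Take 2001 "pictures" of the tone.
samples = [tone(n / SAMPLE_RATE) for n in range(2001)]

# Look at the instant exactly halfway between samples 1000 and 1001.
t_mid = 1000.5 / SAMPLE_RATE

# Staircase view: hold the previous sample's value until the next one arrives.
stair_error = abs(samples[1000] - tone(t_mid))

# Smooth view: rebuild the curve from all of the samples.
error = abs(reconstruct(samples, t_mid) - tone(t_mid))

print(f"staircase error:      {stair_error:.6f}")
print(f"reconstruction error: {error:.6f}")
```

The small leftover error in the reconstruction comes from using a finite number of samples in this sketch, not from gaps between the pictures. The staircase error should come out far larger.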

Besides, the above is talking about sampling frequency and not bit depth. We haven't even mentioned bits yet. Bit depth refers to how many bits are used in each sample (picture) to convey information about dynamic range - how much of the source audio, from loudest to quietest, can be accurately represented in each sample. I wrote a much better explanation of this in the post "16-Bit Audio Recording - What The Heck Does It Mean?" It even uses a champagne metaphor ;).

Anyway, there is a common explanation out there that bit depth is very much like "resolution" in video. It turns out that this is a bad comparison. We all know that 8-bit video is pretty sucky - all pixelated and stuff. But in audio, we're merely talking about dynamic range. If we only had 2 bits available per sample, we could not represent enough of the original audio accurately, so there would be a lot of noisy hiss. More bits give us less noise, in theory. But we only need enough bits to get the noise low enough that we humans can't hear it. Using any more bits to reduce noise we already can't hear seems silly, doesn't it? Why yes, yes it does. And it is! This always reminds me of the joke about two people running away from a hungry tiger. One of them says, "I don't have to be faster than the tiger. I only have to be faster than YOU!"
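To put rough numbers on "more bits, less noise," here is a small Python sketch. It is an illustration under simple assumptions, not how any real converter works: it rounds a full-scale sine wave to an evenly spaced grid with no dither, and the function names and the 997 Hz test tone are my own choices. It then measures how loud the leftover rounding error (the hiss) is compared with the signal.

```python
import math

def quantize(x, bits):
    """Round x (between -1 and 1) to a grid whose step is 2 / 2**bits."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def snr_db(bits, n=44_100, freq=997.0, rate=44_100):
    """Signal-to-noise ratio in dB of a full-scale sine rounded to `bits` bits."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * freq * i / rate)
        err = quantize(x, bits) - x  # the rounding error is the "hiss"
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

for bits in (2, 8, 16):
    print(f"{bits:2d} bits: signal is about {snr_db(bits):.1f} dB above the noise")
```

Once you are past a handful of bits, each extra bit should push the noise down by roughly 6 dB. The bits are not adding "detail" the way pixels do; they are only lowering the hiss.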

So 16 bits has been the standard for audio on CDs forever. With 16 bits, that low-level hiss is pushed all the way down to about -96 decibels. That is a lot more dynamic range than we really need, and it allows very quiet audio to be heard with no audible hiss. In fact, you could even go down to 8 bits (roughly 48 decibels of dynamic range) and the hiss would still be in the same ballpark as a cassette tape!
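Where does the 96 come from? A common rule of thumb is that the dynamic range in decibels is 20 times the base-10 logarithm of the number of values a sample can take, which works out to about 6 dB per bit. Here is a minimal Python sketch of that arithmetic; the function name `dynamic_range_db` is just something I made up for this example.

```python
import math

def dynamic_range_db(bits):
    """Ratio in dB between the full range of 2**bits levels and one level step."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits:2d} bits: about {dynamic_range_db(bits):.1f} dB")
```

For 16 bits this comes out to about 96.3 dB, which is where the CD figure comes from. For 8 bits it is about 48.2 dB.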

There is a fantastic article about all of this on Sonic Scoop, where you can delve more into the nitty and (not so) gritty (ha! A resolution joke - get it?) details.


