Make Small MP3s that Sound Great

UX & Accessibility • Wednesday, February 14, 2018

If you create and share audio on the web, you probably create MP3 files. MP3s enjoy wide support and can be encoded in many different ways, making them a great choice for everything from sound effects to music and podcasts.

However, that flexibility can raise a number of questions, most of which boil down to a single concern: How do you strike a good balance between audio quality and file size? The higher the quality, the better the listening experience, but if the audio takes too long to download there might not be a listening experience at all.

The quality and size of an MP3 are determined primarily by three factors:

The sample rate defines the number of samples per second, and determines the range of frequencies that can be encoded in the audio. This should almost always be set to 44.1 kHz (44100 Hz), which will allow your audio to encompass the effective range of human hearing (roughly 20 Hz to 20 kHz). Lower sample rates sound noticeably worse, and higher sample rates waste bandwidth and disk space.
The bitrate determines the amount of data used to store each second of audio. A 128kbps MP3 uses 128 kilobits (16 kilobytes) of data to encode a single second of audio, for example. The higher the bitrate, the more information is stored, the better the audio sounds, and the larger the file gets.
The choice between mono, stereo, and joint stereo determines how many audio channels the MP3 will contain and how those channels are encoded.
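
To make the first two factors concrete, here is a minimal sketch of the arithmetic behind them. The function names are made up for illustration, and the size estimate assumes a constant bitrate and ignores tags and other metadata, so treat the results as approximations.

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (half the sample rate)."""
    return sample_rate_hz / 2


def cbr_size_bytes(bitrate_kbps: int, duration_s: float) -> int:
    """Approximate audio size of a constant-bitrate MP3, ignoring tags and metadata."""
    bits = bitrate_kbps * 1000 * duration_s  # kbps means 1000 bits per second
    return int(bits / 8)                     # 8 bits per byte


print(nyquist_hz(44100))                     # 22050.0 Hz, just above the ~20 kHz limit of hearing
print(cbr_size_bytes(128, 1))                # 16000 bytes for one second at 128kbps
print(cbr_size_bytes(128, 180) / 1_000_000)  # 2.88 MB for a three-minute track
```

Doubling the bitrate doubles the size, which is why the bitrate is the setting with the most direct effect on download time.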

Joint Stereo

The joint stereo option warrants a bit of explanation, both because it’s usually the best option and because it does some neat stuff behind the scenes.

In most stereo audio, the left and right channels contain mostly the same information, with occasional differences. Joint stereo encodes this kind of audio more efficiently: everything that’s shared by the left and right channels is encoded once, and only the differences between the two channels are encoded separately.

Imagine a piece of music with the vocals and instruments distributed equally to both the left and right channels, with a short guitar solo in the middle that pans from left to right. A regular stereo MP3 will encode all of the audio twice, once for each channel, despite the fact that most of the audio (except the solo in the middle) is the same. Joint stereo, on the other hand, will only record the identical audio once, and then encode separate left and right channel information only when they differ. This results in smaller files (sometimes significantly smaller if most of the audio is the same across both channels).

To put it another way, if your source audio is mono (single channel) and you encode it as both a stereo MP3 and a joint stereo MP3, the joint stereo MP3 can reach the same quality at roughly half the bitrate, and therefore half the file size, of the stereo MP3 because it’s not encoding the same information twice.

It’s also worth pointing out that there is no file size difference between mono audio encoded as a mono MP3 or a joint stereo MP3.

Joint stereo also has an important impact on how the bitrate applies to the audio. A 128kbps stereo MP3 is actually two separate 64kbps channels of audio. On the other hand, a joint stereo MP3 encoded at 128kbps uses all 128kbps for both channels when both channels are identical, and only splits the bitrate between the channels when there are actual differences.
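
The "encode the shared audio once" idea can be sketched with mid/side coding, one of the techniques joint stereo relies on. This is a simplified illustration on raw sample values with made-up function names. Real encoders work on frequency-domain data, frame by frame, and decide for each frame whether the technique is worth using.

```python
def to_mid_side(left, right):
    """Split two channels into what they share (mid) and how they differ (side)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Rebuild the original left and right channels."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Identical channels: the side signal is pure silence and costs almost nothing to encode.
same = [0.5, -0.25, 0.75, 0.0]
mid, side = to_mid_side(same, same)
print(side)  # [0.0, 0.0, 0.0, 0.0]

# A sound panned toward the left: only the second sample differs between channels.
left, right = [0.5, 0.25], [0.5, -0.25]
mid, side = to_mid_side(left, right)
print(mid, side)                 # [0.5, 0.0] [0.0, 0.25]
print(from_mid_side(mid, side))  # ([0.5, 0.25], [0.5, -0.25])
```

When the side signal is mostly silence, nearly the whole bitrate can be spent on the mid signal, which is the behaviour described above for identical channels.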

My Recommended MP3 Settings

I always set the sample rate to 44.1 kHz (44100 Hz) for the reasons mentioned above. I also always use joint stereo unless I want to convert multi-channel audio into mono, in which case I’ll encode as mono to save myself a step.

As far as bitrates go, I use the following bitrates for these scenarios:

When I want a smaller file at the expense of some audio quality: 96kbps.
When I want to balance file size and quality: 128kbps.
When I want high quality audio at the expense of file size: 192kbps or 256kbps.

There is one exception to this. If you have audio that’s only people talking, with no music, 64kbps will do just fine, as pointed out by Marco Arment on Twitter. This applies to some podcasts, but if there’s music somewhere in there 96kbps or higher is probably best.
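
As one way to apply these settings, the sketch below builds a command line for the LAME encoder. This assumes you encode with the `lame` command-line tool, which the article does not specify. The preset names, the `lame_args` helper, and the file names are hypothetical, and the "high" preset picks 256kbps from the two high-quality options above.

```python
# Hypothetical preset names mapped to the bitrates recommended above.
BITRATE_KBPS = {
    "speech": 64,     # talking only, no music
    "small": 96,      # smaller file, some quality loss
    "balanced": 128,  # balance of size and quality
    "high": 256,      # high quality, larger file
}


def lame_args(src, dst, preset="balanced", mono=False):
    """Build (but do not run) a LAME command for a constant-bitrate MP3."""
    return [
        "lame",
        "--cbr",                          # constant bitrate
        "-b", str(BITRATE_KBPS[preset]),  # bitrate in kbps
        "--resample", "44.1",             # output sample rate in kHz
        "-m", "m" if mono else "j",       # m = mono, j = joint stereo
        src,
        dst,
    ]


print(" ".join(lame_args("episode.wav", "episode.mp3", preset="speech")))
# lame --cbr -b 64 --resample 44.1 -m j episode.wav episode.mp3
```

The returned list can be passed to `subprocess.run` if `lame` is installed. Other encoders expose the same three settings under different option names.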

What About Variable Bit Rate MP3s?

So far all of the bitrate information I’ve talked about pertains to constant bitrate MP3s, meaning they use the same amount of information to encode every second of the audio. The MP3 format also supports something called variable bitrate, or VBR, which uses a variable amount of information for every second of the audio. Less information is used for less complex audio, and more information is used to capture more detail when the audio becomes more complex.

While VBR MP3s do tend to be slightly higher quality at slightly lower file sizes, the tradeoff is compatibility and a compromised user experience. Many audio players, even today, do not fully support VBR MP3 files. One common issue is skipping forward or backward in a VBR MP3 and finding yourself in a different place than you expected to be.
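
The seeking problem can be illustrated with a small simulation. It assumes a player that guesses where to jump from the file's average bitrate, which is one plausible cause of the issue rather than a description of any particular player. VBR files can carry a seek table that well-behaved players use instead. The frame layout below is invented for the example.

```python
SAMPLE_RATE = 44100  # Hz


def frame_bytes(bitrate_kbps):
    """Size of one MPEG-1 Layer III frame in bytes, ignoring the optional padding byte."""
    return 144 * bitrate_kbps * 1000 // SAMPLE_RATE


def naive_seek_frame(frame_sizes, target_frame):
    """Return the frame a player lands on if it guesses the byte offset
    from the average frame size instead of the real frame positions."""
    average = sum(frame_sizes) / len(frame_sizes)
    guessed_offset = average * target_frame
    position = 0
    for index, size in enumerate(frame_sizes):
        if position + size > guessed_offset:
            return index
        position += size
    return len(frame_sizes) - 1


# Constant bitrate: every frame is the same size, so the guess is exact.
cbr = [frame_bytes(128)] * 1000
print(naive_seek_frame(cbr, 500))  # 500

# Variable bitrate: a quiet first half at 64kbps, a busy second half at 192kbps.
vbr = [frame_bytes(64)] * 500 + [frame_bytes(192)] * 500
print(naive_seek_frame(vbr, 250))  # 500
```

In the VBR case, asking for the point a quarter of the way through lands halfway through the audio, because the quiet first half takes up far fewer bytes than the average suggests.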

I do not encode any audio as variable bitrate MP3s, and I recommend you don’t either.