7 Tips for Mixing Stellar Podcast Vocals


Here’s what you need to know right off the bat: Mixing vocals for podcasts is quite different from mixing vocals for music, where the trend is to impart sheen, sparkle, luster, and other trebly adjectives. A simple A/B test between any Gimlet podcast and any Justin Bieber song will prove the difference: What constitutes a solid, pro sound in the podcast world is a meatier affair, and an altogether different target.

Yes, your podcast vocals do need to be present. But, quite often, vocals are not the main attraction in a podcast—they’re the only attraction. There’s no context for them to rub against, no spirally synth or sizzling cymbal. Thus, when vocals stand alone, you must take certain steps to secure the proper sound.

So, let’s get to it!

De-Ess, De-Ess, De-Ess!

De-essing is the art of making sure sibilance doesn’t tear your head off. I won’t dive deeply into de-essing here (I’ve already written a treatise on the subject), but keep in mind that the process is especially important for podcasts: People tend to listen to episodes not on hi-fi systems but on those little white earbuds, which often lend a harshness to the upper mids and high end. Therefore, apply the principles from that de-essing article as drastically as you can without calling attention to the process; in other words, keep it natural.

Pay Attention to Your Edits

Be ruthless in your editing, because the vocal will stand alone. You may choose to edit out “ums,” “ahs,” and other extraneous noises, but don’t do this to noticeable extremes. You must learn the pacing of human speech and never violate it.

Next, pay close attention to breath noises, mouth clicks, lip smacks, popping plosives, and the like. It’ll be easy to do so because they are exceedingly annoying.

Lip smacks are handled easily: As long as they don’t lurk in the middle of a word (or at the end of a breath), simply mute their regions. If they are enmeshed in a word, you’ve got choices. De-clicking software can do the trick. Or, if the smack is placed just right, you can opt to slice out the offending region and crossfade the results back together. Sometimes this works, sometimes it doesn’t.

Breath noises, especially those that have been chopped off in editing, are another point of concern. Plenty of plug-ins promise results, but you can often cut out breath noises by sight, isolating their regions, muting them, and using a fade on the preceding region.
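
To make “mute the breath and fade the preceding region” concrete, here is a minimal Python sketch of what that edit does to the samples. It assumes mono audio as a plain list of floats, a linear fade shape, and a hypothetical helper name (mute_breath); in practice your DAW does the same job with region mutes and fade handles.

```python
def mute_breath(samples, breath_start, breath_end, fade_len):
    """Silence samples[breath_start:breath_end] and fade out the audio just before it.

    samples: mono audio as a list of floats (an assumption for this sketch).
    fade_len: length of the linear fade-out on the preceding region, in samples.
    """
    out = list(samples)  # work on a copy, like a non-destructive edit
    fade_start = max(0, breath_start - fade_len)
    n = breath_start - fade_start
    for i in range(n):
        # Linear ramp that reaches 0.0 on the last sample before the breath.
        out[fade_start + i] *= 1.0 - (i + 1) / n
    for i in range(breath_start, breath_end):
        out[i] = 0.0  # the muted breath region
    return out


# Example: ten samples at full scale, breath at samples 6-7, three-sample fade.
print(mute_breath([1.0] * 10, 6, 8, 3))
```

Note that the muted stretch is now digital silence, which is exactly why the room-tone advice later in this article matters.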

The same goes for plosives: The waveform will clearly show where the plosive goes out of control. Slice out the offending part of the waveform, leaving the initial impact, and then join the remaining regions with a subtle crossfade. It takes a while to finesse, but it works more often than crossfading out lip smacks in the middle of words.

Fair warning: These processes will swallow your day if you don’t have software like iZotope RX 6, which has dedicated modules to handle all the above problems. Sonnox and Waves also make great software for the job. With these plugs, grunt work that used to eat up hours now only takes…fewer hours.

Learn to Use Fades, Crossfades, and Room Tone

When putting two regions together, make sure their levels match, and then massage your crossfades until the join is indistinguishable. It takes some practice, but eventually you’ll get a feel for crossfading two regions. Skip this step and your edits will sound obvious.
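
Most DAWs offer several crossfade shapes. As a hedged illustration, here is a minimal Python sketch of one common shape, the equal-power crossfade, which keeps perceived loudness steady across the join of two uncorrelated regions (for example, two different takes). The function name and the list-of-floats audio format are assumptions for this sketch, not any DAW’s API.

```python
import math


def equal_power_crossfade(a, b, overlap):
    """Join regions a and b, overlapping the last `overlap` samples of a
    with the first `overlap` samples of b using cosine/sine gain curves."""
    if overlap <= 0:
        return list(a) + list(b)  # a plain butt splice
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the regions")
    head = list(a[:len(a) - overlap])
    mixed = []
    for i in range(overlap):
        t = (i + 0.5) / overlap              # position through the fade, 0..1
        fade_out = math.cos(t * math.pi / 2)  # outgoing region falls
        fade_in = math.sin(t * math.pi / 2)   # incoming region rises
        mixed.append(a[len(a) - overlap + i] * fade_out + b[i] * fade_in)
    return head + mixed + list(b[overlap:])


# Example: fade a constant region into silence over four samples.
print(equal_power_crossfade([1.0] * 10, [0.0] * 10, 4))
```

Because cos² + sin² = 1, the combined power stays constant for uncorrelated material. For two nearly identical signals, a linear (equal-gain) fade usually sounds smoother, since equal-power curves can bump the level by about 3 dB in the middle of the fade.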

Keep some spare room tone on hand to loop underneath any moments of silence: Something must replace the deleted lip smack or neutralized breath, or else the listener will hear the edit.

Here’s what you do: On a spare track, place the room tone so that it sits during the moment of silence. You’ll need audio on either side for the fades, because you’ll want to apply one to the region’s beginning and another to its end. That way, the listener doesn’t notice the ambiance creeping in/dropping out.

If you’re recording your own podcast, be sure to record room tone for use here. This isn’t always possible, especially if you’re mixing someone else’s project. In that case, grab any dead moment you can find—any pause between a question and an answer, for instance—and use that. String a few along if you must; it’ll teach you how to use crossfades.
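
Here is a minimal Python sketch of that room-tone procedure: It builds a bed long enough to cover the gap plus extra audio on either side for the fades, loops a short room-tone sample if needed, and fades both ends. The helper name, the linear fades, and the list-of-floats audio format are assumptions for illustration.

```python
def room_tone_bed(room_tone, gap_len, fade_len):
    """Build a room-tone region of gap_len + 2 * fade_len samples
    (extra audio on each side for the fades), looping the room tone
    if it is shorter than that."""
    if not room_tone:
        raise ValueError("need at least some room tone")
    total = gap_len + 2 * fade_len
    # Naive loop: repeat the room tone sample-by-sample until the bed is full.
    bed = [room_tone[i % len(room_tone)] for i in range(total)]
    for i in range(fade_len):
        gain = (i + 1) / (fade_len + 1)  # rises linearly toward 1.0 across the fade
        bed[i] *= gain                   # fade in at the start
        bed[total - 1 - i] *= gain       # mirror-image fade out at the end
    return bed


# Example: a three-sample room tone stretched over a four-sample gap.
print(room_tone_bed([0.5, 0.5, 0.5], gap_len=4, fade_len=2))
```

Simple looping like this can click at each loop point if the room tone’s first and last samples don’t line up; stringing copies together with short crossfades, as described above, avoids that.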

Keep Your Plug-in Chain Simple, Yet Effective

In these situations, my chain is usually a de-esser, followed by a clean digital EQ, followed by a clean digital compressor. That’s all you need (discounting audio-restoration software and the postscript below). Any DAW worth its salt should have plug-ins up to the job.

Use an EQ with a frequency analyzer so you can visually identify the sounds that bother you. Then attenuate them. Don’t compress for character here; you’re just trying to control dynamics—and lightly! Too much compression and you increase the apparent clicks, pops, mouth noises, and sibilance. Not enough compression, and you’ll have to spend more time automating levels than you’d otherwise need.
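
To see why light compression matters, here is a minimal Python sketch of a compressor’s static gain curve (hard knee, with no attack or release behavior). The function name and the threshold and ratio defaults are placeholders for illustration, not settings the author recommends.

```python
def gain_reduction_db(level_db, threshold_db=-18.0, ratio=2.0):
    """Static hard-knee compressor curve: the gain change in dB for a
    given input level. Returns 0.0 below threshold, a negative value above."""
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1")
    if level_db <= threshold_db:
        return 0.0  # quiet material passes untouched
    # Above threshold, output rises only 1/ratio dB per input dB.
    compressed_db = threshold_db + (level_db - threshold_db) / ratio
    return compressed_db - level_db


# A peak 6 dB over the threshold, at a gentle and a heavy ratio.
print(gain_reduction_db(-12.0), gain_reduction_db(-12.0, ratio=8.0))
```

At 2:1, a peak 6 dB over the threshold comes down 3 dB; at 8:1, the same peak comes down 5.25 dB. The heavier the reduction, the more makeup gain you tend to add afterward, and that makeup gain lifts everything below the threshold, clicks, smacks, and breaths included.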

Postscript: I will go further if I think it’ll make an appreciable difference. When I want more polish, my next move is a character EQ, something that adds not just a frequency curve but a pleasing “analog” quality. A Maag or Pultec plug-in with a minimal amount of air will do just fine.

If this upsets the balance, I may take it out, or I may use a multiband compressor/dynamic EQ very subtly in one band. For tips on how to use a multiband compressor in this context, please refer back to the de-essing article.

Pick Up the Drop-Offs

When people put on their “broadcast voices,” they tend to decrease in volume as they reach the end of a sentence. Once you realize that drop-offs are inevitable, your editing work becomes much more manageable, because you know what a large part of it will be.

Clip-gaining the quiet sections up is the name of the game here, but be careful: It’s a balancing act. Don’t raise the clip gain so far that the ambiance becomes noticeably louder. Try a round of incremental processes, starting with clip gain, moving on to your plug-in chain (where compression should bring the louder passages down), and then progressing to volume automation.

If that doesn’t work, subtly employ a de-noiser on the drop-off region so that the level boost doesn’t raise the ambiance along with it. Don’t overdo this, though, because de-noising brings its own host of problems.
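
The clip-gain balancing act comes down to simple decibel arithmetic: A clip-gain change scales everything in the region, so the room tone under a drop-off rises by exactly as many dB as the voice. Here is a minimal Python sketch; the helper names and the example levels are assumptions for illustration.

```python
import math


def db_to_gain(db):
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def gain_to_db(gain):
    """Convert a linear amplitude multiplier back to decibels."""
    return 20 * math.log10(gain)


def apply_clip_gain(samples, db):
    """Scale every sample in a region, as a DAW's clip gain does."""
    g = db_to_gain(db)
    return [s * g for s in samples]


# Hypothetical drop-off: voice peaking at -30 dBFS over room tone at -60 dBFS.
voice_db, room_db, boost_db = -30.0, -60.0, 6.0
print(voice_db + boost_db, room_db + boost_db)  # prints -24.0 -54.0
```

The voice-to-room ratio inside the region is unchanged, but the room tone there now sits 6 dB above the room tone in the neighboring, unboosted audio, and that jump is what listeners notice. Spreading the correction across clip gain, compression, and automation keeps any single boost small.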

Back off the Noise Reduction

Without a doubt, noise reduction is the most abused tool in the podcasting game. It’s the biggest harbinger of an unprofessional sound, and it’s a deceptive process: At first, all sounds good, but in the context of additional processing (and with hindsight), you start to hear the sonic problems noise reduction introduces. These include artificial ringing, a marked increase in sibilance, unnatural emphasis on transients, and other obstreperous noises.

Whatever software you choose, I recommend dialing in aggressive settings, training it on a noise-only stretch (room tone is ideal) so that it learns the noise you want removed, and then backing off the settings until you no longer hear the artificial ringing. Are you finished? No. Back off some more.
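
Commercial de-noisers are far more sophisticated than this, and nothing here describes how any particular product works. Still, a minimal Python sketch of a basic spectral-subtraction gain rule shows what “backing off” means in principle: A learned noise profile decides how much each frequency bin is cut, while a strength control and a maximum-reduction floor limit that cut. The function and parameter names are hypothetical, and the sketch assumes you already have per-bin magnitudes from a short-time Fourier transform.

```python
def spectral_gains(frame_mags, noise_profile, strength=1.0, max_reduction_db=12.0):
    """Per-bin gains for a simple spectral-subtraction style de-noiser.

    frame_mags: magnitudes of one STFT frame (assumed already computed).
    noise_profile: average noise magnitude per bin, "learned" from room tone.
    strength: how much of the noise estimate to subtract (lower = gentler).
    max_reduction_db: no bin is ever cut by more than this many dB.
    """
    floor = 10 ** (-max_reduction_db / 20)  # smallest allowed gain
    gains = []
    for mag, noise in zip(frame_mags, noise_profile):
        if mag <= 0.0:
            gains.append(floor)
            continue
        g = 1.0 - strength * noise / mag  # subtract the noise share of this bin
        gains.append(min(1.0, max(floor, g)))
    return gains


# A loud bin is barely touched; a bin at the noise level hits the floor.
print(spectral_gains([1.0, 0.1], [0.1, 0.1]))
```

Bins that hover near the noise profile get cut hard in one frame and spared in the next, and that flickering is one well-known source of the watery, ringing artifacts often called musical noise. Lowering strength or max_reduction_db narrows the swing, which is the code-level version of backing off.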

Don’t fall into the trap of reducing the noise too much; it’s far better to utilize ambiance to paint a sonic picture than to delete it artificially. In a previous article, I went into this concept in detail.

Use the Meter and Close Your Eyes

Two tips that may seem at odds, I know, but they’re equally useful. Use a meter with a momentary LUFS readout; momentary loudness is measured over a 400 ms window, so it reacts more quickly than the other two options, short-term (3 seconds) and integrated (the whole program). Make sure the vocals aren’t swinging too much on the meter as the words pass you by. Try to stay within a range of about 4 LU, unless the material really calls for sudden quietude. If the levels are swinging too much, it’s back to clip gain and automation for ye!
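
For the curious, here is a minimal Python sketch of what a momentary reading measures and how you might check the 4 LU guideline programmatically. It follows the ITU-R BS.1770 loudness formula (-0.691 + 10·log10 of the mean square) over 400 ms windows, but it skips the K-weighting pre-filter, so treat its numbers as rough. It assumes mono audio as a list of floats; the function names and the -70 LUFS silence cutoff (borrowed from the BS.1770 absolute gate) are assumptions for illustration.

```python
import math


def momentary_lufs(samples, sample_rate, hop_s=0.1):
    """Approximate momentary loudness readings (400 ms sliding window), mono.

    NOTE: omits the BS.1770 K-weighting filter, so this is only a rough
    stand-in for a real loudness meter."""
    win = int(round(0.4 * sample_rate))
    hop = max(1, int(round(hop_s * sample_rate)))
    readings = []
    for start in range(0, len(samples) - win + 1, hop):
        block = samples[start:start + win]
        mean_sq = sum(s * s for s in block) / win
        if mean_sq == 0.0:
            readings.append(float("-inf"))  # digital silence
        else:
            readings.append(-0.691 + 10 * math.log10(mean_sq))
    return readings


def within_range(readings, max_swing_lu=4.0, floor_lufs=-70.0):
    """True if readings above the silence cutoff stay within max_swing_lu."""
    active = [r for r in readings if r > floor_lufs]
    if not active:
        return True
    return max(active) - min(active) <= max_swing_lu


# One second of a constant 0.1 signal at a toy 1 kHz sample rate.
readings = momentary_lufs([0.1] * 1000, 1000)
print(readings, within_range(readings))
```

Pauses that still contain room tone will read well below the speech and can trip this check, so in practice you would judge the swing across the spoken phrases, just as you would when watching the meter.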

However, sometimes your eyes will trick you; the meters or the waveforms may tell you that everything should sound fine when it doesn’t. Throughout the editing process, close your eyes often and listen. Does everything sound even? Then it is. Does something sound wrong? Then it is, no matter what your eyes tell you.


With any luck, we’ve done a decent job in getting you prepped for podcast vocal editing/mixing. Some concepts we’ve touched on in other articles, and some we’ll highlight in articles yet to come (I’m gunning for you, noise reduction!). Still, it is my distinct hope that there’s enough here to get you inspired, and get you started.

Do you have any tips for treating vocals in your podcast? If so, let us know down below, in the Comments section!