Take a deep dive into the world of acoustics and learn how to harness its principles to create exceptional voice recordings. From understanding sound waves to mastering room treatment, this guide equips you with the knowledge to transform your audio!
If you want to find the secrets of the universe, think in terms of energy, frequency, and vibration
Nikola Tesla
90-95% of the quality of your dry voice over tracks is far more dependent on the acoustical characteristics of the room you record in, as opposed to the cost or brand of the equipment you purchase. Get your room acoustics right and the rest becomes easy
This is a guide for any podcaster, narrator, voiceover or content creator who records their voice and wants a more pristine sound. It covers everything you need to know to understand what sound is, how it behaves and how to work with it to get a perfect sounding recording.
Ancient Greeks came up with the concept of acoustics, so you know it's complex:
The mathematics behind acoustics goes deep, we're talking Pythagoras of Samos deep, but the application of good acoustics is easy once you've understood just what sound really is, and how it works in your recording space.
This is my best effort at translating complex theories and equations into simple to understand layman's terms: the aim is to help content creators and voiceover artists know more about the science behind their art. Most people bung some foam on the walls and in the corners, but is it needed; if so why, and what does it do? I imagine most would answer 'it stops echoes', but would they be right? Kinda yeah, kinda no.
A Voice Over is a musician who plays the world's greatest musical instrument, even if it is just to say 'Hello and thank you for calling' or 'Like and subscribe as it really helps the channel'.
Having a little understanding of how notes, or frequencies, interact through harmonics and resonance will come in handy.
Harmonics are the mathematical relationships between frequencies
Pythagoras tied a string to a fixed point, held it tight and plucked it so a note was produced. He realised that halving the length of the string doubled the pitch of the note, and making it twice as long halved the pitch: he figured out music is mathematics. I'm guessing the guy had pretty good hearing, as in pitch perfect!
Pythagoras was the first to work out the link between string length and pitch, or wavelength and frequency as they're now commonly known.
Whilst Heinrich Hertz gets the credit for figuring out that sound exists as waveforms and is based on cycles per second, Pythagoras beat him to discovering the same relationship by two thousand years, but he's best known for tormenting kids with sums about triangles.
The lowest frequency anything vibrates at is called the Fundamental Frequency
When Pythagoras plucked the string at its longest, it vibrated at what is known as the first harmonic, that string's fundamental frequency, and by halving the distance it vibrated at the second harmonic which is also known as the first overtone.
When he halved the string length, he doubled the pitch of the note, and made it increase by an octave.
Harmonics are always based on whole-number multiples of the fundamental frequency. Using 55Hz as an example, it has harmonic relationships with every frequency that is a whole-number multiple of 55Hz, such as 110Hz, 165Hz and 220Hz.
The following table shows how a fundamental of 55Hz relates to A1 on the piano, along with its harmonics and the related octaves.
There are also undertones where the fundamental can be halved, quartered, ad infinitum: 27.5Hz and 13.75Hz are both undertones of 55Hz.
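As a rough sketch of the relationships described above, the snippet below lists the first few harmonics of a 55Hz fundamental, flags which of them sit whole octaves above it, and works out its first two undertones. The function names are made up for this illustration.

```python
def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics: whole-number multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def is_octave(harmonic_number):
    """True when the harmonic number is a power of two (1, 2, 4, 8...).

    Each doubling of frequency is one octave, so these harmonics are
    the fundamental itself and the octaves above it.
    """
    return harmonic_number > 0 and (harmonic_number & (harmonic_number - 1)) == 0


def undertones(fundamental_hz, count):
    """Return the first `count` undertones: the fundamental halved, quartered and so on."""
    return [fundamental_hz / 2 ** n for n in range(1, count + 1)]


if __name__ == "__main__":
    for number, freq in enumerate(harmonics(55, 8), start=1):
        if number == 1:
            label = "fundamental"
        elif is_octave(number):
            label = "octave"
        else:
            label = ""
        print(f"harmonic {number}: {freq}Hz {label}")
    print("undertones:", undertones(55, 2))
```

Harmonics 2, 4 and 8 come out as 110Hz, 220Hz and 440Hz, each one an octave above the last.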
You can play a single note on a musical instrument and the other notes with which it shares harmonics will sing along by themselves through resonance; they'll be quieter than the fundamental, but their choir will still be heard.
Using people as an example, you either get someone's vibe and resonate with them or you don't
Everything resonates and has a fundamental frequency: Earth and the human retina resonate at 7.83Hz and 19Hz respectively. The effects of resonance are best known through the example of a vocalist shattering glass. It's not as simple as singing any note at the glass: it must be the right note, that glass's resonant frequency.
A 'sympathetic vibration' is when a frequency vibrates in harmony with another
Put most simply, it's when things 'vibe' with each other
We're either sympathetic to someone's vibe because we're in tune, or we're on a different wavelength, and don't resonate with them.
A good example of sympathetic vibrations is when you place two matched tuning forks next to each other and strike the first one, the second one will vibrate in harmony with the first and produce the same note at a lower volume.
Both forks are tuned to the same frequency so when the soundwaves from one hit the other, they make it vibrate at the same pitch, but if you had several different tuning forks only those with notes of a shared harmonic would vibrate, the others don't resonate so can't join their choir.
Image © Kathy Hadley | Experimental Proof of Vibrations and Attraction
I've heard of voiceovers who have been rejected on audition after audition because of strange resonant hums in their booth: ornaments like vases, lamps and ornate bottles might look nice but if they're singing along when you're recording get them out of your space!
Echoes are distinct repetitions of a sound caused by reflection from a surface
We can all imagine the stereotypical image of someone shouting echo into a cavern, we all know what happens, happens, happens...
Different frequencies have different wavelengths, so how a frequency responds to your space varies with the length of its wave. For a reflection to be technically defined as an echo, the reflecting surface needs to be at least one wavelength of that sound away from the source.
At our lower hearing limit of 20Hz, waves echo from 17.15 meters, and at the higher end of 20kHz, the distance is 1.7cm.
There are two types of echoes, 'full echoes' and 'flutter echoes': full echoes are the type where you only hear the reflection once, and flutter echoes are heard multiple times.
Working out how far a frequency needs to travel to be an echo is the same formula for calculating the wavelength of any given frequency:
Speed of Sound in Meters per Second (343) / Frequency (Hz) = Wavelength
Wavelength = Echo Minimum Distance
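The formula above is simple enough to check for yourself. This is a minimal sketch that assumes the same 343 meters per second for the speed of sound; the function name is just an illustration.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # the figure used in the formula above


def wavelength_m(frequency_hz, speed_of_sound=SPEED_OF_SOUND_M_PER_S):
    """Wavelength in meters = speed of sound / frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be greater than zero")
    return speed_of_sound / frequency_hz


if __name__ == "__main__":
    # By the rule in this guide, the wavelength is also the minimum echo distance.
    for freq in (20, 85, 1000, 20000):
        print(f"{freq}Hz -> {wavelength_m(freq):.4f} m")
```

Swap in any frequency you like to see how quickly the wavelength shrinks as the pitch rises.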
Reverberation is when a soundwave hits multiple surfaces then reflects the energy back multiple times
it's also caused by a build up of acoustic energy from big wavelengths struggling to fit in a small space
Imagine someone shouting echo with a big plastic bucket on their head: that space is too small to produce an individual echo, the soundwaves will be bouncing back on themselves in such a manner they'll persist after they're produced, and can become an indistinguishable mess of noise.
Image © AEI Acoustics Limited
A lot of people confuse reverb with echoes which is easy considering they're both acoustic energy reacting with the space they're in.
Small waveforms in big spaces = Echo
Reflections from a single surface = Echo
Reflections from multiple surfaces = Reverb
Large waveforms in small spaces = Reverb
There are two indistinguishable events that happen simultaneously and make up the phenomenon we know as reverb.
When recording sound, it should only exist at one point in time and space
A 20Hz wave is 17.15 meters long, so it's physically impossible for it to fit inside a 1.2 meter voiceover booth: all that acoustic pressure would build up at the edges of the booth and create that boxy sound all voice artists dread. An easy to understand example is a single drop of Coke in a 250ml glass: it will roll around like an echo. Try pouring a 330ml can into the same glass and that's akin to reverb: it simply can't fit in that space, so it will fizz up and overflow.
Overflowing reverberant energy will build up along the walls and around the corners of your booth, potentially making it resonate!
How long a sound hangs around, after the original sound source has ceased, is called the RT60, or Reverb Time.
RT60: The time it takes a sound to reduce by 60dB
A 60dB reduction is one millionth of the original acoustic energy
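To see where the one millionth figure comes from, decibels can be converted back into plain ratios. This is a small sketch using the standard decibel definitions; the function names are invented for the example.

```python
def db_to_energy_ratio(change_db):
    """Convert a change in decibels to a ratio of acoustic energy (power)."""
    return 10 ** (change_db / 10)


def db_to_amplitude_ratio(change_db):
    """Convert a change in decibels to a ratio of sound pressure (amplitude)."""
    return 10 ** (change_db / 20)


if __name__ == "__main__":
    print("energy left after a 60dB drop:   ", db_to_energy_ratio(-60))
    print("amplitude left after a 60dB drop:", db_to_amplitude_ratio(-60))
```

A 60dB drop leaves one millionth of the energy, which is the same thing as one thousandth of the amplitude you'd see on a waveform display.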
Image © Alexander Sengpiel
If anyone says there are echoes in your recordings, what they most likely mean, technically speaking, is that your RT60 is too high and needs to be lower!
An ideal RT60 is between 0.2 and 0.7 seconds
The BBC have published a White Paper that suggests an RT60 of 0.2 seconds for talk studios and control rooms:
Anything less than 0.3 seconds is 'dead sounding' with anything over that being considered echoic.
Reverb is one of the biggest career killers for content creators! If your audio is an assault on the ears of the listener, why would anyone listen?
Content creators 'simply' need to appease the ears of the most ardent listeners, so having no echoes or reverberations is vital. Technically, we know to aim for an RT60 of around 0.2 seconds; at anything more than 0.7 seconds, the risk of our audio being rejected by the listener's ears rises.
Thankfully there's no need for dedicated kit as we can measure the RT60 of a room with a clap and some free software!
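For the curious, here is a minimal sketch of the sort of calculation that sits behind a clap test. It uses Schroeder backward integration, a common way of estimating decay time from an impulse-like recording, and assumes the clap has already been loaded into a list of sample values. It isn't taken from Audacity or any particular tool: the function names and the synthetic test signal are made up for illustration.

```python
import math


def schroeder_decay_db(samples):
    """Backward-integrate the squared samples and return the decay curve in dB."""
    energy = 0.0
    curve = []
    for sample in reversed(samples):
        energy += sample * sample
        curve.append(energy)
    curve.reverse()
    total = curve[0] if curve else 0.0
    if total <= 0.0:
        raise ValueError("recording contains no energy")
    # The curve never increases, so any zeros sit at the very end and
    # dropping them keeps the indexes aligned with the sample positions.
    return [10.0 * math.log10(e / total) for e in curve if e > 0.0]


def estimate_rt60(samples, sample_rate, start_db=-5.0, end_db=-25.0):
    """Fit a straight line to the decay between start_db and end_db, then
    extrapolate it to a 60dB drop."""
    decay = schroeder_decay_db(samples)
    points = [(i / sample_rate, db) for i, db in enumerate(decay)
              if end_db <= db <= start_db]
    if len(points) < 2:
        raise ValueError("not enough decay in the recording to fit a slope")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_db = sum(db for _, db in points) / len(points)
    slope = (sum((t - mean_t) * (db - mean_db) for t, db in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second
    return -60.0 / slope


def synthetic_decay(rt60, sample_rate, duration):
    """Made-up test signal: amplitude falls by 60dB every `rt60` seconds."""
    return [10 ** (-3.0 * (n / sample_rate) / rt60)
            for n in range(int(sample_rate * duration))]


if __name__ == "__main__":
    clap = synthetic_decay(rt60=0.5, sample_rate=8000, duration=1.5)
    print(f"estimated RT60: {estimate_rt60(clap, 8000):.2f} seconds")
```

A real clap is far messier than this tidy test signal, with background noise and uneven decay, so treat any figure from a sketch like this as a ballpark rather than a lab measurement.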
Audacity is audio recording software you can use to measure your RT60:
I only chose Audacity for this example as it's free, other options include:
Standing Waves make some frequencies louder, and some cancel themselves out
They're the result of similar frequencies and volumes moving in opposing directions
Standing waves are also known as stationary waves or room modes.
They cause a lot of confusion, and having multiple names doesn't help much.
They're multidimensional sympathetic vibrations and reverberations, they're the result of acoustic energy reacting to itself and the geometry of the recording space.
Standing waves can bring more than 20dB extra acoustic power to some frequencies, whilst others will cancel themselves out.
The best way to visualise how frequencies react with the geometry of the space they're in, and cause increased and decreased spots of acoustic pressure, is with a frequency generator, some salt and a vibrating plate known as a Chladni Plate:
Back in 1787 a German physicist named Ernst Chladni discovered that grains of sand on a vibrating plate create art that shows how frequencies interact with geometry.
The plate pushes the grains of sand away from where it vibrates; this represents a build up of acoustic pressure that can happen in your recording space.
Antinodes Vibrate (No Sand):
The pressured areas of the plate that vibrate and move the sand away
Nodes are Static (Sand):
Areas without vibrations that represent the standing aspect of 'standing waves'
Higher frequencies produce more standing waves, but they contain less energy as they're more evenly distributed through the recording environment.
Nigel Stanford produced a video called CYMATICS: Science Vs. Music which highlights the relationship between sound and matter:
Chladni plates are great for demonstrating how frequencies react with geometry on a two-dimensional plane, but what about recording spaces, which are three-dimensional?
© GIK Acoustics: What Are Room Modes?
See that article for the formula to calculate room modes
Acousticians also call Standing Waves Room Modes, of which there are three types (axial, tangential and oblique), each describing how harmonic resonances work in different dimensional planes.
This is why you don't touch the mic in a pro studio, the engineer will have placed it strategically to avoid acoustic pressure zones!
Read that last line again if you're going into a pro studio to record: don't touch the mic is the golden rule! The long-haired, highly caffeinated engineer dude, who probably stinks of skunk, will not thank you for moving their mic when it took them several sessions to get it in the perfect position.
To calculate the problem frequencies in a room you can either do some pretty stupid maths involving the speed of sound for each of the above, or you can find an online calculator to do it for you, the calculator is much easier:
In a booth that's 1.2 meters square, and 2 meters tall, a pretty common size, the problem frequencies are 86Hz, 143Hz, 172Hz, 258Hz and 287Hz.
Of those, 143Hz and 287Hz cause issues in more than one dimensional plane, because the booth's width and depth are the same.
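If you'd rather see the sums than trust a calculator, the simplest room modes (axial modes, where the wave bounces between one pair of parallel surfaces) fall at whole-number multiples of the speed of sound divided by twice the distance between those surfaces. This sketch is an assumption about how the figures above were reached: it uses 344 meters per second for the speed of sound rather than 343, as that's the value that reproduces them after rounding, and the function name is invented for this example.

```python
def axial_modes(dimensions_m, max_hz=300.0, speed_of_sound=344.0):
    """Return {rounded frequency in Hz: [dimension names]} for axial modes up to max_hz.

    dimensions_m maps a dimension name to its length in meters.
    speed_of_sound defaults to 344 m/s, an assumption chosen to match
    the booth figures quoted in this guide.
    """
    modes = {}
    for name, length in dimensions_m.items():
        n = 1
        while True:
            frequency = n * speed_of_sound / (2 * length)
            if frequency > max_hz:
                break
            modes.setdefault(round(frequency), []).append(name)
            n += 1
    return dict(sorted(modes.items()))


if __name__ == "__main__":
    booth = {"width": 1.2, "depth": 1.2, "height": 2.0}
    for frequency, planes in axial_modes(booth).items():
        print(f"{frequency}Hz: {', '.join(planes)}")
```

Any frequency listed against more than one dimension is being reinforced in more than one plane at once. Tangential and oblique modes involve more surfaces and need the fuller formula, so this only covers the simplest case.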
Generally speaking, we can just aim to manage frequencies below 300Hz
Their very size and shape make booths sound boomy, boxy and reverberant. Buying a booth might lead to a disappointing first recording session if it's not treated right on the insides. Now that we're well armed with knowledge about sound as our enemy, we need to know how it behaves so we can work with it, to our advantage: if we try to work against sound, we will lose, working with it is the only option.
According to the First Law of Thermodynamics, energy can not be created or destroyed, it can only ever be transmuted from one form to another. Whilst it would be nice to think there was a magic material that could destroy unwelcome sounds, we know that's not possible: we have to live with the fact that sound is energy.
We can't simply kill off or destroy a sound
The best we can do is to know what happens when sounds hit a surface
Then we can know how to work with it, and help it change for our benefit
© Reverb: Natural Wood Sonic Diffusers
When a soundwave hits a surface four things will happen:
Its energy will be transmitted, absorbed, reflected and diffused, in varying amounts
Some sound energy will pass through the barrier. The sound energy that isn't reflected, absorbed or diffused passes through the barrier and comes out the other side at a reduced power level.
All those heavy mass, sound reducing materials that go into your recording space are a friend for keeping unwanted sounds out, but they help create problems inside the booth by keeping sounds in! The more 'sound-proof' your space, the less energy can leak out which means you're going to have to manage the internal acoustics more.
Some sound energy will be transferred into heat: soundwaves are a form of kinetic energy, when they pass through something their vibrations react with it on a molecular level, get converted to heat and dissipate.
Watch out for 'sound-proofing' products that 'turn sound to heat'
It's not a unique sales point, it's physics and great marketing!
How absorptive a surface is to sound depends on how tightly packed it is on a molecular level, the frequency of the wave, and the angle at which it strikes. Proper absorption takes mass and space: more space than you have in most booths.
Some of the acoustic energy will be reflected in the direction it came from which can manifest in your recordings as echoes in a large enough space, or reverb and resonances in smaller ones.
A soundwave hitting a surface is called the incident wave, where it strikes is the point of incidence, and once a wave has struck a surface and rebounded, it's known as the reflected wave. The Law of Reflection states that an incident wave at X° will always rebound at X°.
Atari's Pong and ball games like Snooker perfectly illustrate the Law of Reflection.
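The Pong comparison can be sketched in a few lines. This is a simplified two-dimensional illustration that treats the soundwave as a single ray hitting a perfectly flat, hard surface; the function names are made up for the example.

```python
import math


def reflect(direction, surface_normal):
    """Reflect a 2D direction vector off a surface.

    surface_normal must be a unit-length vector pointing straight out of the surface.
    """
    dx, dy = direction
    nx, ny = surface_normal
    dot = dx * nx + dy * ny
    return (dx - 2 * dot * nx, dy - 2 * dot * ny)


def angle_from_normal_deg(direction, surface_normal):
    """Angle in degrees between a direction and the surface normal."""
    dx, dy = direction
    nx, ny = surface_normal
    cos_angle = abs(dx * nx + dy * ny) / math.hypot(dx, dy)
    return math.degrees(math.acos(min(1.0, cos_angle)))


if __name__ == "__main__":
    floor_normal = (0.0, 1.0)   # a flat floor, facing up
    incident = (1.0, -1.0)      # travelling right and down
    reflected = reflect(incident, floor_normal)
    print("reflected direction:", reflected)
    print("angle in: ", angle_from_normal_deg(incident, floor_normal))
    print("angle out:", angle_from_normal_deg(reflected, floor_normal))
```

The wave keeps its sideways motion and only the part heading into the surface is flipped, which is why the angle in always matches the angle out.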
The hardness of a surface dictates how reflective it is, and the shape of the surface it reflects from plays a part too: a flat or convex surface can be used to bounce sounds away from a microphone in a booth in one example; in another, concave surfaces can be used to direct sounds towards a single point, like in a satellite dish or military listening station's receiver.
Rather than reflecting from where it came, or being absorbed and turned to heat, some sound energy will be diffused through being bounced back in different directions. They're multidirectional reflections.
Image © The Royal Albert Hall
When the Royal Albert Hall in London first opened in 1871, they didn't give much thought to the acoustics of the building. As a result, the sound inside sucked: the dome-shaped roof sent soundwaves back down onto the audience.
It was so bad people joked about being able to hear everything twice.
It was a poor design for a concert hall, but the same can also be said about booths being a poor design for VO as they trap standing waves.
Decades later, acoustic tests were conducted on the space and giant fibreglass acoustic diffusers which look like huge mushrooms were fitted: the diffusers inverted the ceiling space, further decades passed and more tests led to them being further fine-tuned.
Back in 1999 when presenting a late-night album track show I had the pleasure of interviewing Jon Lord, the keyboard player from Deep Purple, they'd just released 'In Concert with The London Symphony Orchestra' and he was doing the radio promo thing.
I'll always remember feeling Jon's enthusiasm down the line, he said there was something magical, wonderful and spectacular about being on stage at the Royal Albert Hall with the giant mushrooms overhead.
His joy of recalling the experience was heartfelt, it was impossible to not resonate with it. The more 'from the heart' your read is, the more the audience will feel it!
I've included the video for Smoke on the Water at the RAH, it is epic but it's not the track Jon was most proud of, he told me that was Concerto for Group and Orchestra. He really was a lovely bloke to chat to.
Voice booths need around 70% of their inside walls covered with acoustic treatment
If we don't have enough treatment in our space it'll have echoes and reverberations, yet too much would take away the natural feel we're wanting to capture in our recordings. We want our room to sound dry, but not to the point we're dehydrating all life out of it.
Acoustic pressure builds up the most along walls and in corners, which is why many voice booths have strategically placed bass traps in the corners. These edgy or cuboid looking things help reduce reverberations, standing waves, and that dreaded boxy sound we all want to avoid.
The First Law of Thermodynamics tells us we can't trap energy and we know that sound is reflected, absorbed, transmitted and diffused when it reacts with surfaces. Calling something a 'Bass Trap' seems like an inaccurate name to me.
Imagine an 85Hz wave, which is 4.04 meters long, inside a 1.2m square booth: do you think a twelve-inch cube of foam with little mass is going to 'trap' that energy?
Trying to fit a wave that long in a space that's smaller is like trying to pour 1,000ml of Coke into a 250ml glass: it's going to overflow, spill and fizz everywhere.
Many sounds can easily pass through the foam most bass traps and acoustic tiles are made from; they'll lose some energy as it's dissipated into heat, but much of the energy will hit the wall, then the reflection will travel back through the foam losing more power, and the remaining energy will go back into the room, at both a different amplitude and direction.
They disrupt acoustic energy which would otherwise be a problem, they don't 'trap' it.
The 'Open Celled' nature of most acoustic foam gives it irregular channels for sound to travel along
Acoustic energy will always follow the path of least resistance which in the foam will be totally random
This means energy enters and leaves the foam at different places, reducing many problems
The same principles apply to 'Corner Traps' which are often seen running up and down the corners of a booth where two walls meet.
If you wanted to 'trap bass' in an absorber, it'd need to be as thick as at least a quarter of the wavelength you want to stop, that's enough for its power to dissipate: in the case of 85Hz that's over a meter, this is known as the quarter wavelength rule.
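The quarter wavelength rule is easy to put numbers on. This is a minimal sketch assuming a speed of sound of 343 meters per second; both function names are invented for the example.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed speed of sound in air


def quarter_wavelength_depth_m(frequency_hz, speed_of_sound=SPEED_OF_SOUND_M_PER_S):
    """Absorber depth in meters needed to reach a quarter of the wavelength."""
    return speed_of_sound / frequency_hz / 4


def lowest_frequency_for_depth_hz(depth_m, speed_of_sound=SPEED_OF_SOUND_M_PER_S):
    """Lowest frequency whose quarter wavelength fits inside an absorber of this depth."""
    return speed_of_sound / (4 * depth_m)


if __name__ == "__main__":
    print(f"85Hz needs {quarter_wavelength_depth_m(85):.2f} m of absorber")
    print(f"0.05 m of absorber reaches down to {lowest_frequency_for_depth_hz(0.05):.0f}Hz")
```

For 85Hz the first function gives a depth of just over a meter, matching the figure above.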
Taking note of the quarter wavelength rule here, we can't expect a 5cm deep foam tile to absorb much at all: 5cm is a quarter of a 20cm wavelength, which works out at around 1,700Hz. Frequencies above that will struggle to get through it, whereas anything with a wavelength longer than 20cm will pass through without issues.
The average 'soundproof tile' only starts working at around 1,700Hz
It does next to nothing for the problem frequencies below 300Hz!
Most foam tiles are cut in wedge or pyramid shapes, both designed to scatter the sounds they don't absorb. That doesn't happen so much on the way in, as they're not solid enough to reflect anything; rather, the energy is scattered back into the room in different directions as it leaves the tile, and again this is thanks to the open-celled nature of the foam.
They really just dampen down the reflective nature of the walls. Commercial products like acoustic blankets draped around your recording environment will do the same job, as would anything soft and deep enough to cushion those solid surfaces.
I can think of several professional, full time voiceover artists who use acoustic blankets and get a sound their clients love. As long as it sounds great it doesn't matter how you achieve it!
The common types of acoustic tile are Egg Crate, Pyramids and Wedges: they're all pretty much the same, the important thing being their depth rather than their style though I personally prefer the wedges for their look, plus I've buyer's remorse from egg crate and pyramid tiles in the past.
Not all foam is created equal. We know some people will put 'soundproofing' or 'acoustics' in a product's description to exploit the unenlightened, so Amazon and eBay are not the places to buy acoustic treatment unless you're getting it from a specialist with a store on that platform. Otherwise you run the risk of buying junk, or a fire hazard:
The spray glue used to stick foam is often carcinogenic
You do not want to be breathing those vapours in
Use an alternative to aerosol glue, it's safer!
Acoustic panels offer more customisation: canvas printing means you can upload an image, and have your panel bring that extra bit of branding to your space.
Sold with a more honest name, they're panels and they're acoustic treatment, no stupid claims of soundproofing here and they're better at absorbing energy than foam as they're usually made with a denser, heavier insulative material like Rockwool, Owens Corning, or Isover.
Acoustic panels aren't too hard to make yourself, and they'll require far fewer carpentry skills than a booth build. Making them just takes a wooden frame as deep as the insulation, a backing board and some fabric wrapping to keep the contents, and any dust or fine particles they might have, safely on the inside.
Mike Delgaudio, who YouTubes as Booth Junkie, shares his panel making process:
I've bought acoustic tiles, recorded with a blanket over my head and a mic in a foam box: never again
Once I got serious and bought proper acoustic panels I never looked back!
I picked up four acoustic panels from Blue Frog Audio, two large floorstanding ones that are taller than I am, which form a corner behind my microphones, and two slightly smaller, but still chonking huge ones, placed right behind me.