Room Design Considerations for Optimal Podcasting

Madeline Didier –

Jaffe Holden, 114-A Washington Street, Norwalk, CT, 06854, United States

Twitter: @JaffeHolden
Instagram: @jaffeholden

Popular version of 1aAA2-Podcast recording room design considerations and best practices, presented at the 183rd ASA Meeting.

Podcast popularity has been on the rise, with over two million active podcasts as of 2021. There are countless options when choosing a podcast to listen to, and unacceptable audio quality will cause a listener to quickly move on to another option. Poor acoustics in the space where a podcast was recorded are noticeable even to an untrained ear, and listeners may hear differences in room acoustics without ever seeing the space. Podcasters use a variety of setups to record episodes, ranging from closets to professional recording spaces. One trend is toward recording spaces that feel comfortable and look aesthetically pleasing, more like living rooms than radio stations.

Figure 1: Podcast studio with a living room aesthetic. Image courtesy of The Qube.

A high-quality podcast recording is one that does not capture sounds other than the podcaster’s voice. Unwanted sounds include noise from mechanical systems, vocal reflections, or ambient noise such as exterior traffic or people in a neighboring room. Listen to the examples below.

More ideal recording conditions:
Media courtesy of Home Cooking Podcast, Episode: Kohlrabi – Turnip for What

Less ideal recording conditions:
Media courtesy of The Birding Life Podcast, Episode 15: Roberts Bird Guide Second Edition

The first example is a higher quality recording where the voices can be clearly heard. In the second example, the podcast guest is not recording in an acoustically suitable room. The voice reflects off the wall surfaces and detracts from the overall quality and listener experience.

Every room design project comes with its own challenges and considerations related to budget, adjacent spaces, and expected quality. Each room may have different design needs, but best practice recommendations for designing a podcasting room remain the same.

Background noise: Mechanical noise should be controlled so that you cannot hear HVAC systems in a recording. Computers and audio interfaces should ideally be located remotely so that noises, such as computer fans, are not picked up on the recording.
Room shape: Square room proportions should be avoided because they reinforce room modes, a buildup of sound energy in certain spots of the room, creating an uneven acoustic environment.
Room finishes: Carpet is ideal for flooring, and an acoustically absorptive material should be attached to the wall(s) in the same plane as the podcaster’s voice. Wall materials should be 1-2” thick. Ceiling materials should be acoustically absorptive, and window glass should be angled upward to reduce resonance within the room.
Sound isolation: Strategies for improving sound separation may include sound rated doors or standard doors with full perimeter gaskets, sound isolation ceilings, and full height wall constructions with insulation and multiple layers of gypsum wallboard.
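The room-shape recommendation above can be illustrated with a short calculation. This is a minimal sketch, not part of the original presentation. It assumes an idealized rigid-walled rectangular room, for which the mode frequencies follow the standard formula f = (c/2)·sqrt((p/L)² + (q/W)² + (r/H)²). The room dimensions, the 1 Hz tolerance, and the function names are illustrative assumptions.

```python
import math
from itertools import product

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def room_modes(length, width, height, max_order=2):
    """Return sorted (frequency_hz, (p, q, r)) pairs for an idealized
    rigid-walled rectangular room with dimensions in metres."""
    modes = []
    for p, q, r in product(range(max_order + 1), repeat=3):
        if (p, q, r) == (0, 0, 0):
            continue  # (0, 0, 0) is not a mode
        frequency = (SPEED_OF_SOUND / 2.0) * math.sqrt(
            (p / length) ** 2 + (q / width) ** 2 + (r / height) ** 2
        )
        modes.append((frequency, (p, q, r)))
    return sorted(modes)


def count_coincident(modes, tolerance_hz=1.0):
    """Count neighbouring modes that fall within tolerance_hz of each other."""
    return sum(1 for a, b in zip(modes, modes[1:]) if b[0] - a[0] < tolerance_hz)


# Hypothetical rooms: equal dimensions versus unequal dimensions.
square_room = room_modes(3.0, 3.0, 3.0)
uneven_room = room_modes(4.1, 3.0, 2.6)

print("coincident mode pairs, equal dimensions:  ", count_coincident(square_room))
print("coincident mode pairs, unequal dimensions:", count_coincident(uneven_room))
```

With equal dimensions, many modes land on exactly the same frequency and reinforce one another. Unequal dimensions spread them out, which is one way to read the advice to avoid square proportions.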

In the example below, the podcast studio (circled) is strategically located at the back of a dedicated corridor for radio and podcasting. It is physically isolated from the main corridor, creating more acoustical separation. Absorptive ceiling tile (not shown) and 2” thick wall panels help limit vocal reflections, and background noise is controlled.

Figure 2: Podcast recording room within a radio and podcasting suite. Image courtesy of BWBR and RAMSA.

While the challenges for any podcast room may differ, the acoustical goals remain the same. With thoughtful consideration of background noise, room shape, finishes, and sound isolation, any room can support high-quality podcast recording.

Connecting industry to a more diverse student population

Felicia Doggett –

Instagram: @metropolitan_acoustics

Metropolitan Acoustics, 1628 JFK Blvd., Suite 1902, Philadelphia, PA, 19103, United States

Popular version of 4pED4-Internships in the acoustical disciplines: How can we attract a more diverse student population?, presented at the 183rd ASA Meeting.

Metropolitan Acoustics has employed 26 interns over a 27-year period. Of those 26, six pursued careers in the acoustics fields; of those six, only one was both a woman and a minority, and that person was a foreign-born student who came to the United States for school. Not one woman or minority from the United States who interned with us since 1995 entered the acoustics fields after graduation. This is a very telling microcosm of the Acoustical Society of America as a whole.

Within the acoustics fields, we need to ask ourselves how we are connecting to underrepresented student groups. The engineering disciplines are not very diverse, and the few women and minorities who enter the field often leave for a variety of reasons, which most often lead back to a lack of inclusion. It doesn’t have to be a mountain; it can simply be a molehill that sends someone off the track of a sustained and productive career in the science and engineering fields.

At Metropolitan Acoustics, a large majority of our interns have been 6-month co-ops rather than 3-month summer interns (23 versus 3). For the most part, the students were fairly productive, and we found that interest, enthusiasm, engagement, and work ethic are all factors in their success. Six of the 26 went into careers in acoustics, and one of them works for us currently. The gender and racial breakdown is as follows:

  • Gender diversity: 20 male, 6 female
  • Racial diversity: 20 Caucasian, 6 minority; of the 6 minorities, 4 male and 2 female
  • Out of the 6 interns that went into careers in acoustics, 5 are Caucasian males and 1 is a minority female who is not native to the US

As an organization, what are we doing to attract a more diverse pipeline of candidates to the acoustics fields? And perhaps a bigger question is how we plan to keep them in the field, which is all about inclusiveness. Dedicated student portals on organizational websites, populated with videos, student awards, lists of schools with acoustics programs, and other items, are a start. This information can be shared with underrepresented student organizations such as the National Society of Black Engineers, the Society of Women Engineers, the Society of Hispanic Professional Engineers, the Society of STEM Women of Color, and the American Indian Science and Engineering Society, among others, with the hope that it may light a spark in some to enter the field.

Presence of a drone and estimating its range simply from the drone audio emissions

Kaliappan Gopalan –

Purdue University Northwest, Hammond, IN, 46323, United States

Brett Y. Smolenski, North Point Defense, Rome, NY, USA
Darren Haddad, Information Exploitation Branch, Air Force Research Laboratory, Rome, NY, USA

Popular version of 1ASP8-Detection and Classification of Drones using Fourier-Bessel Series Representation of Acoustic Emissions, presented at the 183rd ASA Meeting.

With the proliferation of drones – from medical supply and hobbyist to surveillance, fire detection and illegal drug delivery, to name a few – of various sizes and capabilities flying day or night, it is imperative to detect their presence and estimate their range for security, safety and privacy reasons.

Our paper describes a technique for detecting the presence of a drone, as opposed to environmental noise such as birds and moving vehicles, simply from the audio emissions of the drone’s motors, propellers and mechanical vibrations. By applying a feature extraction technique that separates a drone’s distinct audio spectrum from that of atmospheric noise, and employing machine learning algorithms, we were able to assign drones flying outdoors to the correct one of three classes in over 78% of cases. Additionally, we estimated the range of a drone from the observation point correctly to within ±50 cm in over 85% of cases.

We evaluated unique features characterizing each type of drone using a mathematical technique known as the Fourier-Bessel series expansion. These features differentiated both the drone class and the drone range. Using them, we trained a deep learning network with ground truth values of either the drone type or its range, with range treated as a discrete variable at intervals of 50 cm. When the trained network was tested with new, previously unused features, it returned the correct type of drone (at a nonzero range) and a range class that was within ±50 cm of the actual range.
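For readers who want the mathematics, the summary above does not state which form of the expansion was used. As an assumption for illustration only, the commonly used zero-order Fourier-Bessel series of a signal frame x(t) on the interval 0 < t < a is:

```latex
x(t) = \sum_{m=1}^{M} C_m \, J_0\!\left(\frac{\lambda_m t}{a}\right),
\qquad
C_m = \frac{2}{a^{2}\,\left[J_1(\lambda_m)\right]^{2}}
      \int_{0}^{a} t \, x(t) \, J_0\!\left(\frac{\lambda_m t}{a}\right) dt ,
```

Here J_0 and J_1 are Bessel functions of the first kind, and λ_m is the m-th positive root of J_0(λ) = 0. In this form, the coefficients C_m (or quantities derived from them) would serve as the features passed to a classifier.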

Any point along the main diagonal indicates a correct range class, that is, within ±50 cm of the actual range, while off-diagonal values correspond to misclassifications.
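The diagonal reading described above is how a confusion matrix is normally scored. The following is a minimal sketch with made-up labels, not data from the study. The class indices and function names are hypothetical, with class 0 standing for "no drone".

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_class, predicted_class in zip(true_labels, predicted_labels):
        matrix[true_class][predicted_class] += 1
    return matrix


def diagonal_accuracy(matrix):
    """Fraction of test cases that fall on the main diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical range classes for eight test recordings (illustrative only).
true_classes = [0, 0, 1, 1, 2, 2, 3, 3]
predicted_classes = [0, 0, 1, 2, 2, 2, 3, 1]

matrix = confusion_matrix(true_classes, predicted_classes, n_classes=4)
accuracy = diagonal_accuracy(matrix)

for row in matrix:
    print(row)
print("fraction on the diagonal:", accuracy)
```

Each off-diagonal entry records a test case assigned to the wrong range class, and the row and column indices show which class was confused with which.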

To identify more than three types of drones, we tested seven different types: the DJI S1000, DJI M600, Phantom 4 Pro, Phantom 4 QP (with a quieter set of propellers), Mavic Pro Platinum, Mavic 2 Pro, and Mavic Pro. All were tethered in an anechoic chamber in an Air Force laboratory and controlled by an operator through a series of propeller maneuvers (idle, left roll, right roll, pitch forward, pitch backward, left yaw, right yaw, half throttle, and full throttle) to fully capture the array of sounds the craft emit. Our trained deep learning network correctly identified the drone type in 84% of our test cases. Figure 1 shows the results of range classification for each outdoor drone flying at line-of-sight ranges from 0 (no drone) to 935 m.

A moth’s ear inspires directional passive acoustic structures

Lara Díaz-García –
Twitter: @laradigar23
Instagram: @laradigar

Centre for Ultrasonic Engineering, University of Strathclyde, Glasgow, Lanarkshire, G1 1RD, United Kingdom

Popular version of 2aSA1-Directional passive acoustic structures inspired by the ear of Achroia grisella, presented at the 183rd ASA Meeting.

When most people think of microphones, they think of the ones singers use or the ones found in a karaoke machine, but they might not realize that much smaller microphones are all around us. Current smartphones contain about three or four small microphones, so miniaturizing microphones is an ongoing goal in technological development. These microphones are strategically placed to achieve directionality, meaning that the microphone not only detects and transmits the sound signal but also discards undesirable noise coming from directions other than the speaker’s. This functionality is also desirable for hearing implant users. Ideally, you want to be able to tell what direction a sound is coming from, as people with unimpaired hearing do.

But combining small size with directionality presents problems. People with unimpaired hearing can tell where a sound is coming from by comparing the input received by each ear. Our ears conveniently sit on opposite sides of our heads and therefore receive sounds at slightly different times and with different intensities. The brain can do the math and compute what direction the sound must be coming from. The problem is that, to use this trick, you need two microphones that are far enough apart that the differences in arrival time and intensity are not negligible, and that goes against microphone miniaturization. What to do if you want a small but directional microphone, then?
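A rough calculation shows how quickly the arrival-time difference shrinks with spacing. This sketch assumes a simple far-field model in which the time difference is spacing × sin(angle) / speed of sound, ignoring the way sound bends around a head or device. The 17 cm and 1 mm spacings are illustrative assumptions, not figures from the study.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def arrival_time_difference(spacing_m, angle_deg):
    """Time difference in seconds between two receivers spacing_m apart,
    for a distant source at angle_deg away from straight ahead."""
    return spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND


# Assumed spacings: roughly ear-to-ear on a human head, and two microphone
# ports 1 mm apart on a miniature device. Source directly to one side.
head_delay = arrival_time_difference(0.17, 90.0)
tiny_delay = arrival_time_difference(0.001, 90.0)

print(f"17 cm spacing: {head_delay * 1e6:.1f} microseconds")
print(f" 1 mm spacing: {tiny_delay * 1e6:.2f} microseconds")
```

Under these assumptions the delay scales in direct proportion to the spacing, so shrinking the device by a factor of 170 shrinks the timing cue by the same factor.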

When looking for inspiration for novel solutions, scientists often look to nature, where evolution favors energy efficiency and simple designs. Insects are one example: they face the challenge of directional hearing at small scales. The researchers chose to look at the lesser wax moth (fig 1), which was observed in the 1980s to have directional hearing. The males produce a mating call that the females can track even when one of their ears is pierced. This implies that, instead of using both ears as humans do, these moths achieve directional hearing with just one ear.

Figure 1: Lesser wax moth specimen with scale bar. Image courtesy of Birgit E. Rhode (CC BY 4.0).

The working hypothesis is that directionality must be achieved by the asymmetrical shape and characteristics of the moth’s ear itself. To test this hypothesis, the researchers designed a model that resembles the moth’s ear and checked how it behaved when exposed to sound. The model consists of a thin elliptical membrane with two halves of different thicknesses. To fabricate it, they used a readily available commercial 3D printer that allows the design to be customized and samples to be made in just a few hours. The samples were then placed on a turning surface, and the behavior of the membrane in response to sound coming from different directions was investigated (fig 2). The membrane was found to move more when sound comes from one direction than from any other (fig 3), meaning the structure is passively directional. This means it could inspire a single small directional microphone in the future.

Figure 2: Laboratory setup to turn the sample (in orange, center of the picture) and expose it to sound from the speaker (left of the picture). Researcher’s own picture.
Figure 3: Sounds coming from the 0° direction elicit a stronger movement in the membrane than sounds from other directions. Image adapted from Lara Díaz-García’s original paper.

3aSPb5 – Improving Headphone Spatialization: Fixing a problem you’ve learned to accept

Muhammad Haris Usmani –
Ramón Cepeda Jr. –
Thomas M. Sullivan –
Bhiksha Raj –
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213

Popular version of paper 3aSPb5, “Improving headphone spatialization for stereo music”
Presented Wednesday morning, May 20, 2015, 10:15 AM, Brigade room
169th ASA Meeting, Pittsburgh

The days of grabbing a drink, brushing dust from your favorite record and playing it in the listening room of the house are long gone. Today, with the portability technology has enabled, almost everybody listens to music on their headphones. However, most commercially produced stereo music is mixed and mastered for playback on loudspeakers; this presents a problem for the growing number of headphone listeners. When a legacy stereo mix is played on headphones, all instruments or voices in that piece get placed in between the listener’s ears, inside of their head. This is not only unnatural and fatiguing for the listener, but also detrimental to the original placement of the instruments in that musical piece. It disturbs the spatialization of the music and makes the sound image appear as three isolated lobes inside of the listener’s head [1] (see Figure 1).


Hard-panned instruments separate into the left and right lobes, while instruments placed at center stage are heard in the center of the head. However, as hearing is a dynamic process that adapts and settles with the perceived sound, we have come to accept that headphones sound this way [2].

In order to improve the spatialization of headphones, the listener’s ears must be deceived into thinking that they are listening to the music inside of a listening room. When playing music in a room, the sound travels through the air, reverberates inside the room, and interacts with the listener’s head and torso before reaching the ears [3]. These interactions add the necessary psychoacoustic cues for perception of an externalized stereo soundstage presented in front of the listener. If this listening room is a typical music studio, the soundstage perceived is close to what the artist intended. Our work tries to place the headphone listener into the sound engineer’s seat inside a music studio to improve the spatialization of music. For the sake of compatibility across different headphones, we try to make minimal changes to the mastering equalization curve of the music.

Since there is a compromise between sound quality and the spatialization that can be presented, we developed three different systems that present different levels of such compromise. We label these as Type-I, Type-II, and Type-0. Type-I focuses on improving spatialization at the cost of some sound quality, Type-II improves spatialization while ensuring that the sound quality is not degraded too much, and Type-0 focuses on refining conventional listening by making the sound image more homogeneous. Since sound quality is key in music, we will skip over Type-I and focus on the other two systems.

Type-II consists of a head-related transfer function (HRTF) model [4], room reverberation (synthesized reverb [5]), and a spectral correction block. HRTFs embody all the complex spatialization cues that exist due to the relative positions of the listener and the source [6]. In our case, a general HRTF model is used, configured to place the listener at the “sweet spot” in the studio (right and left speakers placed at an angle of 30° from the listener’s head). The spectral correction attempts to keep the original mastering equalization curve as intact as possible.
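The general idea of placing virtual loudspeakers in front of a headphone listener can be sketched as follows. This is not the authors' HRTF model: it omits the reverberation and spectral correction blocks, assumes a left-right symmetric setup, and uses toy placeholder impulse responses rather than measured data. All names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def render_virtual_speakers(left, right, hrir_near, hrir_far):
    """Each ear hears the same-side speaker through hrir_near and the
    opposite-side speaker through hrir_far (symmetry is an assumption)."""
    ear_left = [a + b for a, b in zip(convolve(left, hrir_near),
                                      convolve(right, hrir_far))]
    ear_right = [a + b for a, b in zip(convolve(right, hrir_near),
                                       convolve(left, hrir_far))]
    return ear_left, ear_right


# Toy placeholder responses, NOT measured HRTFs: the far ear receives the
# sound two samples later and at half the level.
hrir_near = [1.0, 0.0, 0.0]
hrir_far = [0.0, 0.0, 0.5]

# A single click that exists only in the left channel of the stereo mix.
left = [1.0, 0.0]
right = [0.0, 0.0]

ear_left, ear_right = render_virtual_speakers(left, right, hrir_near, hrir_far)
print(ear_left)
print(ear_right)
```

On plain headphones the click would reach only the left ear. Here it also reaches the right ear, later and quieter, which is the kind of cue that lets a sound appear to come from a loudspeaker outside the head.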

Type-0 is made up of a side-content crossfeed block and a spectral correction block. Some headphone amps allow crossfeed between the left and right channels to model the fact that, when listening to music through loudspeakers, each ear hears the music from both speakers, with a delay on the sound arriving from the speaker that is farther away. A shortcoming of conventional crossfeed is that the delay we can apply is limited (to avoid comb filtering) [7]. Side-content crossfeed resolves this by crossfeeding only the content unique to each channel, allowing us to use larger delays. In this system, the side content is extracted using a stereo-to-3 upmixer, which is implemented as a novel extension to Nikunen et al.’s upmixer [8].
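The difference between the two kinds of crossfeed can be seen in a toy example. This sketch does not implement the upmixer described above. As a loudly simplified stand-in, it treats the mid/side "side" signal (the part of each channel that differs from the average of the two) as the unique content. The gain, delay, and function names are illustrative assumptions.

```python
def delay(signal, samples):
    """Delay a signal by a whole number of samples, keeping its length."""
    return ([0.0] * samples + list(signal))[:len(signal)]


def conventional_crossfeed(left, right, gain, delay_samples):
    """Add a delayed, attenuated copy of the opposite channel to each side."""
    out_left = [l + gain * r
                for l, r in zip(left, delay(right, delay_samples))]
    out_right = [r + gain * l
                 for r, l in zip(right, delay(left, delay_samples))]
    return out_left, out_right


def side_only_crossfeed(left, right, gain, delay_samples):
    """Crossfeed only what differs between the channels. The mid/side split
    is a crude stand-in for the stereo-to-3 upmixer, not the real method."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    left_only = [l - m for l, m in zip(left, mid)]
    right_only = [r - m for r, m in zip(right, mid)]
    out_left = [l + gain * x
                for l, x in zip(left, delay(right_only, delay_samples))]
    out_right = [r + gain * x
                 for r, x in zip(right, delay(left_only, delay_samples))]
    return out_left, out_right


# A click panned dead center: identical in both channels.
center_left = [1.0, 0.0, 0.0, 0.0]
center_right = [1.0, 0.0, 0.0, 0.0]

conventional = conventional_crossfeed(center_left, center_right, 0.5, 2)
side_only = side_only_crossfeed(center_left, center_right, 0.5, 2)
print(conventional[0])
print(side_only[0])
```

With conventional crossfeed, the centered click picks up a delayed copy of itself, which is the source of comb filtering. When only the differing content is crossfed, centered material passes through untouched, so a longer delay no longer colors it. A real upmixer separates content far more carefully than this mid/side split does.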

These systems were put to the test in a subjective evaluation with 28 participants, all between 18 and 29 years of age. The participants were introduced to the metrics being measured at the beginning of the evaluation. Since the first part of the evaluation included specific spatial metrics that are somewhat difficult for untrained listeners to grasp, we used a collection of descriptions, diagrams, and/or music excerpts representing each metric to provide in-evaluation training for the listeners. The results of the first part of the evaluation suggest that this method worked well.
From the results, we concluded that Type-II externalized the sounds while performing at a level comparable to the original source in the other metrics, and that Type-0 improved sound quality and comfort at the expense of stereo width compared with the original source, which is what we expected. The results also showed strong content dependence, suggesting that different spatialization settings should be used for music that has been produced differently. Overall, two of the three proposed systems were preferred in equal or greater amounts to the legacy stereo mix.



[1] G-Sonique, “Monitor MSX5 – Headphone monitoring system,” G-Sonique, 2011. [Online]. Available:
[2] S. Mushendwa, “Enhancing Headphone Music Sound Quality,” Aalborg University – Institute of Media Technology and Engineering Science, 2009.
[3] C. J. C. H. K. K. Y. J. L. Yong Guk Kim, “An Integrated Approach of 3D Sound Rendering,” Springer-Verlag Berlin Heidelberg, vol. II, no. PCM 2010, pp. 682–693, 2010.
[4] D. Rocchesso, “3D with Headphones,” in DAFX: Digital Audio Effects, Chichester, John Wiley & Sons, 2002, pp. 154-157.
[5] P. E. Roos, “Samplicity’s Bricasti M7 Impulse Response Library v1.1,” Samplicity, [Online]. Available:
[6] R. O. Duda, “3-D Audio for HCI,” Department of Electrical Engineering, San Jose State University, 2000. [Online]. Available: [Accessed 15 4 2015].
[7] J. Meier, “A DIY Headphone Amplifier With Natural Crossfeed,” 2000. [Online]. Available:
[8] J. Nikunen, T. Virtanen and M. Vilermo, “Multichannel Audio Upmixing by Time-Frequency Filtering Using Non-Negative Tensor Factorization,” Journal of the AES, vol. 60, no. 10, pp. 794-806, October 2012.