Artificial intelligence in music production: controversy and opportunity

Joshua Reiss –
Twitter: @IntelSoundEng

Queen Mary University of London, Mile End Road, London, England, E1 4NS, United Kingdom

Popular version of 3aSP1-Artificial intelligence in music production: controversy and opportunity, presented at the 183rd ASA Meeting.

Music production
In music production, one typically has many sources. They each need to be heard simultaneously, yet they can all be created in different ways, in different environments and with different attributes. The mix should keep every source distinct while blending them into a clean, cohesive whole. Achieving this is labour intensive and requires a professional engineer. Modern production systems help, but they are incredibly complex and still demand manual manipulation. As the technology has grown, it has become more functional but not simpler for the user.

Intelligent music production
Intelligent systems could analyse all the incoming signals and determine how they should be modified and combined. This has the potential to revolutionise music production, in effect putting a robot sound engineer inside every recording device, mixing console or audio workstation. Could this be achieved? This question gets to the heart of what is art and what is science, what is the role of the music producer and why we prefer one mix over another.

Figure 1 Caption: The architecture of an automatic mixing system. [Image courtesy of the author]
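The figure describes the architecture only at a high level. As an illustrative sketch (not the author's actual system), one of the simplest stages such a system might contain is automatic level balancing, where each track's gain is set so all tracks reach a common loudness target:

```python
import numpy as np

def auto_balance(tracks, target_rms=0.1):
    """Set each track's gain so its RMS level hits a common target.

    A crude stand-in for perceptual loudness matching: a real system
    would use a loudness model rather than raw RMS.
    """
    gains, balanced = [], []
    for t in tracks:
        rms = np.sqrt(np.mean(t ** 2))
        g = target_rms / rms if rms > 0 else 1.0
        gains.append(g)
        balanced.append(t * g)
    return balanced, gains

# Two synthetic one-second "tracks" at very different levels
rng = np.random.default_rng(0)
quiet = 0.01 * rng.standard_normal(48000)
loud = 0.5 * rng.standard_normal(48000)
balanced, gains = auto_balance([quiet, loud])
```

After balancing, neither track dominates the rough mix; a full system would follow this with masking-aware equalisation, panning and dynamics processing.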

Perception of mixing
But there is little understanding of how we perceive audio mixes. Almost all studies have been restricted to lab conditions, such as measuring the perceived level of a tone in the presence of background noise. This tells us very little about real-world cases; it doesn't say how well one can hear the lead vocals over guitar, bass and drums.

Best practices
And we don't know why one production sounds dull while another makes you laugh and cry, even though both are mixes of the same piece of music by competent sound engineers. So we needed to establish what good production is, how to translate it into rules, and how to exploit those rules within algorithms. We needed to step back and explore more fundamental questions, filling gaps in our understanding of production and perception.

Knowledge engineering
We used an approach that incorporated one of the earliest machine learning methods, knowledge engineering. It's so old school that it's gone out of fashion. It assumes experts have already figured things out; they are experts, after all. So let's capture best practices as a set of rules and processes. But this is no easy task. Most sound engineers don't know exactly what they did. Ask a famous producer what he or she did on a hit song and you often get an answer like 'I turned the knob up to 11 to make it sound phat.' How do you turn that into a mathematical equation? Or worse, they say it was magic and can't be put into words.
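To make "capturing best practices as rules" concrete, here is a hypothetical example: a sketch of one oft-cited convention (keep low-frequency sources panned to the centre), with thresholds invented purely for illustration, not taken from the research described here.

```python
def pan_rule(spectral_centroid_hz):
    """Return a pan position in [-1, 1], where 0 is centre.

    Heuristic: bass-heavy sources stay centred; brighter sources
    may be spread wider. The thresholds are illustrative only.
    """
    if spectral_centroid_hz < 200:      # bass, kick drum
        return 0.0
    elif spectral_centroid_hz < 2000:   # mid-range sources
        return 0.3
    else:                               # bright sources, e.g. hi-hats
        return 0.6

print(pan_rule(80))    # bass guitar stays centred
print(pan_rule(6000))  # a hi-hat can sit off-centre
```

The research described here is precisely about testing whether such folklore rules survive systematic evaluation before encoding them.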

We systematically tested all the assumptions about best practices and supplemented them with listening tests that helped us understand how people perceive complex sound mixtures. We also curated multitrack audio, with detailed information about how it was recorded, multiple mixes and evaluations of those mixes.

This enabled us to develop intelligent systems that automate much of the music production process.

Video Caption: An automatic mixing system based on a technology we developed.

Transformational impact
I gave a talk about this once in a room with panel windows all around. These talks are usually half full. But this time it was packed, and I could see faces outside pressed up against the windows. They all wanted to find out about this idea of automatic mixing. It's a unique opportunity for academic research to have a transformational impact on an entire industry. It addresses the fact that music production technologies are often not fit for purpose. Intelligent systems open up new opportunities: amateur musicians can create high-quality mixes of their content, small venues can put on live events without needing a professional engineer, time and preparation for soundchecks could be drastically reduced, and large venues and broadcasters could significantly cut manpower costs.

Taking away creativity
It's controversial. We entered an automatic mix in a student recording competition as a sort of Turing test. Technically we cheated, because the mixes were supposed to be made by students, not by an 'artificial intelligence' created by a student. Afterwards I asked the judges what they thought of the mix. The first two were surprised and curious when I told them how it was done. The third judge offered useful comments when he thought it was a student mix. But when I told him it was an 'automatic mix', he suddenly switched and said it was rubbish and that he could tell all along.

Mixing is a creative process where stylistic decisions are made. Is this taking away creativity, is it taking away jobs? Such questions come up time and time again with new technologies, going back to 19th century protests by the Luddites, textile workers who feared that time spent on their skills and craft would be wasted as machines could replace their role in industry.

Not about replacing sound engineers
These are valid concerns, but it's important to see other perspectives. A tremendous amount of music production work is technical, and audio quality would be improved by addressing those technical problems. As the graffiti artist Banksy said, “All artists are willing to suffer for their work. But why are so few prepared to learn to draw?”

Creativity still requires technical skills. To achieve something wonderful when mixing music, you first have to achieve something pretty good and address issues with masking, microphone placement, level balancing and so on.

Video Caption: Time offset (comb filtering) correction, a technical problem in music production solved by an intelligent system.
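The video shows the intelligent system in action; the underlying idea can be sketched as follows. This is a minimal cross-correlation delay estimator assuming two microphones capturing the same source, not the authors' production code:

```python
import numpy as np

def estimate_offset(reference, delayed):
    """Estimate the delay (in samples) between two recordings of the
    same source. Summing the signals without compensating for this
    delay produces the spectral notches known as comb filtering."""
    corr = np.correlate(delayed, reference, mode="full")
    return int(np.argmax(corr)) - (len(reference) - 1)

# Synthetic example: the same signal captured 25 samples later
rng = np.random.default_rng(1)
sig = rng.standard_normal(1000)
late = np.concatenate([np.zeros(25), sig])[:1000]
offset = estimate_offset(sig, late)  # shift `late` back by this amount
```

Once the offset is known, delaying the earlier signal by the same amount aligns the two before they are summed, removing the comb filtering.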

The real benefit is not replacing sound engineers. It's dealing with all those situations when a talented engineer is not available: the band practicing in the garage, the small restaurant venue that does not provide any support, or game audio, where dozens of sounds need to be mixed and there is no miniature sound engineer living inside the games console.

Atom Tones – A periodic table of audible elements

Jill A. Linz –

Skidmore College, 815 N. Broadway, Saratoga Springs, NY, 12866, United States

Christian Howat
Skidmore College, Class of 2022
815 N. Broadway
Saratoga Springs, NY 12866

Popular version of 4aMU5-Atom Tones: investigating waveforms and spectra of atomic elements in an audible periodic chart using techniques found in music production, presented at the 183rd ASA Meeting.

Atom Tones is an audible periodic table that allows us to identify elements through sound and to investigate the atomic world with methods used by sound engineers. The periodic table of Atom Tones can be accessed on the Atom Tones website. The Atom Music project was introduced in 2019 and explained the background ideas for creating audible tones for each atom. Each tone is clearly unique and can be used to identify the element by its sound. Audible tones can also be used in conjunction with the visual interpretations of the sound’s waveform to possibly gain insight into the atom.

In the same way that sunlight can be decomposed into individual colors of the rainbow, light produced from different elements can be decomposed into rainbow-like patterns that are unique to that element. The rainbow colors of the element appear as a series of bright lines known as spectral lines, or atomic spectra. Figure 1 shows examples of several element patterns, along with the element’s signature tone. The pattern of lines is unique to each atom.

Figure 1: Spectral lines produced by three different elements. These lines are unique for each element and are used to identify the element itself. The tones can be heard by clicking on each image. Image courtesy of Linz original paper (Proceedings on Meetings in Acoustics)

Music and physics are so intertwined that translating the spectral lines into sound is relatively easy. Tedious perhaps, but not difficult. We can translate those colors into sounds of varying frequency, or pitch. These frequencies act like notes in a scale that can be played individually or combined, and it is with these notes that we created the sounds of the elements.
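The paper does not spell out the exact mapping Atom Tones uses, but a minimal sketch of the idea might look like the following, using hydrogen's visible (Balmer) lines and an arbitrary illustrative scale factor to bring optical frequencies down into the audible range:

```python
import numpy as np

C = 3.0e8  # speed of light, m/s

# Hydrogen's visible (Balmer) spectral lines, in nanometres
balmer_nm = [656.3, 486.1, 434.0, 410.2]

def lines_to_audible(wavelengths_nm, scale=1e-12):
    """Map each optical line frequency (c / wavelength) into the
    audible range. The scale factor is an illustrative choice, not
    necessarily the one the Atom Tones project uses."""
    return [C / (w * 1e-9) * scale for w in wavelengths_nm]

def synthesize(freqs_hz, duration=1.0, sr=44100):
    """Sum equal-amplitude sinusoids at the mapped frequencies,
    like playing the element's 'notes' together as a chord."""
    t = np.arange(int(sr * duration)) / sr
    tone = sum(np.sin(2 * np.pi * f * t) for f in freqs_hz)
    return tone / np.max(np.abs(tone))  # normalise to [-1, 1]

freqs = lines_to_audible(balmer_nm)  # roughly 457, 617, 691, 731 Hz
tone = synthesize(freqs)             # one second of hydrogen's "tone"
```

Because every element has a different pattern of lines, every element gets a different set of "notes", and hence a recognisably different tone.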

A sound engineer can easily identify specific types of musical instruments as well as the musical intervals and chords played by those instruments by observing the digital waveforms and spectra produced in a recording, in addition to simply listening by ear. Digital audio software adds an extra layer of insight to the sound. Figure 2 shows the different waveforms and spectral lines for a French Horn and Bassoon each playing the same note, D3.

Figure 2: waveform and spectra of a French Horn compared to a Bassoon. Image courtesy of Linz original paper (Proceedings on Meetings in Acoustics)

Using the techniques developed for audio recording and music synthesis, we can create an audible representation of each element. Possible ways to interpret the tones produced are being investigated. Figure 3 shows the waveforms and spectra for a few elements that exhibit wave patterns that repeat themselves. This is what a sound engineer would expect to see when the recording sounds harmonic, or musical.

Figure 3: These are a few atom tones whose waveforms exhibited similar patterns that repeat themselves. Image courtesy of Linz, Howat original paper (Proceedings on Meetings in Acoustics)

Other elements exhibit very different patterns. The software allows you to zoom in and observe each pattern from different perspectives. Not only are we hearing the atoms for the first time; perhaps we are also seeing them in a new light.

The Impact of Formal Musical Training on Speech Comprehension in Heavily Distracting Environments

Alexandra Bruder –

Vanderbilt University Medical Center, Department of Anesthesiology, 1211 21st Avenue South, Medical Arts Building, Suite 422, Nashville, TN, 37212, United States

Joseph Schlesinger –
Twitter: @DrJazz615

Vanderbilt University Medical Center
Nashville, TN 37205
United States

Clayton D Rothwell –
Infoscitex Corporation, a DCS Company
Dayton, OH, 45431
United States

Popular version of 1pMU4-The Impact of Formal Musical Training on Speech Intelligibility Performance – Implications for Music Pedagogy in High-Consequence Industries, presented at the 183rd ASA Meeting.

Imagine being a waiter… everyone in the restaurant is speaking, music is playing, and co-workers are trying to get your attention, causing you to miss the customer’s order. Communication is necessary but can be hindered due to distractions in many environments, especially in high-risk environments, such as aviation, nuclear power, and healthcare, where miscommunication is a frequent contributing factor to accidents and loss of life. In domains where multitasking is necessary and timely and accurate responses must be ensured, does formal music training help performance?

We used an audio-visual task to test whether formal music training can be useful in multitasking environments. Twenty-five students from Vanderbilt University participated in the study and were separated into groups based on their level of formal music training: none, 1-3 years, 3-5 years, and 5+ years. Participants were given three tasks to attend to: a speech comprehension task (modeling distracted communication), a complex visual distraction task (modeling a clinical patient monitor), and an easy visual distraction task (modeling alarm monitoring). These tasks were completed in the presence of combinations of alarms and/or background noise, with and without background music.

Image courtesy of Bruder et al. original paper. (Psychology of Music).

Our research focused on the audio comprehension task. The group with the most formal music training showed no change in response rate when background music was added, while all the other groups did; with enough music training, background music stops influencing whether participants respond. Additionally, how often participants responded to the audio task depended on their degree of formal music training: participants with no formal music training had the highest response rate, followed by the 1-3-year group, then the 3-5-year group, with the 5+ year group responding least often. However, all groups were similar in accuracy overall, and accuracy decreased for every group when background music was playing. Given the similar accuracy but less frequent responding with more formal music training, it appears that formal music training helps participants recognise when not to respond because they don't know the answer.

Image courtesy of Bruder et al. original paper (Psychology of Music).

Why does this matter? There are many situations in which responding and getting something wrong is more detrimental than not responding, especially under time pressure, when mistakes are costly to correct. Although accuracy was similar across all groups, the groups with some formal music training seemed to respond with overconfidence: they responded often but did not know enough to increase accuracy, a potentially dangerous combination. This contrasts with the 5+ year group, who showed no effect of background music on response rate, used their trained ears to judge the extent of their understanding, and were less eager to respond to a difficult task under distraction. It turns out those middle school band lessons paid off after all, that is, if you work in a distracting, multitasking environment.

1pMU4 – Flow Visualization and Aerosols in Performance

Abhishek Kumar –
Tehya Stockman –
Jean Hertzberg –

University of Colorado Boulder
1111 Engineering Drive
Boulder, CO 80309

Popular version of 1pMU4 – Flow visualization and aerosols in performance
Presented Monday afternoon, May 23, 2022
182nd ASA Meeting in Denver, Colorado
Click here to read the abstract

Outbreaks from choir performances, such as the Skagit Valley Choir, showed that singing carries a potential risk of COVID-19 infection. The risks of airborne infection from other musical performance, such as playing wind instruments or performing theater, are less well understood. It is also important to understand methods that can reduce infection risk. In this study, we used a variety of methods, including flow visualization and aerosol and CO2 measurements, to understand the different components that can lead to transmission risk from musical performance, and how to mitigate that risk. We tested eight musical instruments, both brass and woodwind, as well as singing, with and without a mask or bell cover.

We started with the flow visualization of exhalations (from singers and voice actors) and resultant jets (from musical instruments) using (a) the schlieren method, and, (b) imaging with a laser sheet in a room filled with stage fog. These visualization tools helped identify the spatial location with maximum airflow (i.e. velocities) for aerosol and CO2 measurements, and showed the structure of the flows.


Figure 1: Schlieren method – proof of concept, opera singer. Courtesy:

Figure 2: Laser sheet imaging – proof of concept, oboe. Courtesy:

Our flow visualization velocity estimates indicated that using a barrier, such as a mask or a bell cover, significantly reduced axial (exhaust-direction) velocities. Keep in mind that the jets observed using either method have the same composition as human exhalation, i.e. N2, O2, CO2, and trace gases.

Figure 3: Maximum measured axial velocities, with and without cover/mask Courtesy:

We measured exhaled/exhausted CO2 and aerosol particles from the musicians. Our results indicate that aerosol spikes can be expected when there is a spike in CO2 measurements.

Figure 4: Combined Aerosol and CO2 time series for singing. Courtesy: Tehya Stockman

Figure 5: Aerosol data for performance with and without a mask/cover. Courtesy: Tehya Stockman

These results show that masks on instruments and singers while performing significantly decrease the amount of aerosols measured, providing one effective way to reduce the risk of airborne viral transmission. Musicians reported small differences in how the instruments felt, but very little difference in how they sounded.



Sing On: Certain Facemasks Don’t Hinder Vocalists

Sing On: Certain Facemasks Don’t Hinder Vocalists

Masks designed for singers prevent COVID-19 transmission with minimal voice distortion

Media Contact:
Larry Frum
AIP Media

SEATTLE, December 1, 2021 – When singers generate beautiful notes, they can also release harmful particles like the coronavirus. Wearing a mask prevents virus transmission, but it also affects the sound.

Thomas Moore, from Rollins College, will discuss his observations of a professional soprano singing with and without six types of masks at the 181st Meeting of the Acoustical Society of America, which will be held Nov. 29 to Dec. 3. The session, “Aerosol propagation and acoustic effects while singing with a face mask,” will take place on Dec. 1 at 12:40 p.m. Eastern U.S. in Room 302 of the Hyatt Regency Seattle as part of a session on making music during a pandemic.

Moore found masks effectively block aerosols, forcing the breath to exit at the sides. From there, the aerosols rise with the upward flow of body heat from the singer. This dispersal of breath likely dilutes the virus and helps prevent the spread of COVID-19.

At low frequencies, masks reduced volume but did not have other effects on the singing. However, masks did reduce the power of higher frequencies, which made the enunciation of words less clear and altered the timbre. Masks had no effect on the pitch.

One of the masks tested, a singer’s mask, was designed specifically with singers in mind. All six masks blocked the forward flow of breath, but the singer’s mask did so with the least change in sound.

“A normal cloth mask can reduce the high frequencies by as much as 10 times, but a singer’s mask will reduce them by a factor of less than 2,” said Moore.

Diluting virus-carrying aerosols is key to reducing infection and the spread of COVID-19. Although Moore found the breath still escaped the sides of the masks, its rise into the air and subsequent dispersal lowers the risk compared to singing without a mask. He said this emphasizes how good airflow in a room is critical for reducing viral risk.

Main meeting website:
Technical program:
Press Room:

In the coming weeks, ASA’s Worldwide Press Room will be updated with additional tips on dozens of newsworthy stories and with lay language papers, which are 300 to 500 word summaries of presentations written by scientists for a general audience and accompanied by photos, audio and video. You can visit the site during the meeting at

We will grant free registration to credentialed journalists and professional freelance journalists. If you are a reporter and would like to attend, contact AIP Media Services at For urgent requests, staff at can also help with setting up interviews and obtaining images, sound clips, or background information.

The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See

4aMU8 – Neural Plasticity for Music Processing in Young Adults: the Effect of Transcranial Direct Current Stimulation (tDCS)

Eghosa Adodo, Cameron Patterson, Yan H. Yu
St. John’s University
8000 Utopia Parkway, Queens, New York, 11439

Popular version of 4aMU8 – Neural plasticity for music processing in young adults: The effect of transcranial direct current stimulation (tDCS)
Presented Thursday morning, December 2, 2021
181st ASA Meeting
Click here to read the abstract

Transcranial direct current stimulation (tDCS) is a non-invasive brain stimulation technique. It has increasingly been proposed and utilized as a unique approach to enhance various communicative, cognitive, and emotional functions. However, it is not clear whether, how, and to what extent, tDCS influences nonlinguistic processing such as music processing. The purpose of this study was to examine brain responses to music as a result of noninvasive brain stimulation.

Twenty healthy young adults participated in our study. They first sat in a sound-shielded booth and listened to classical Western piano music while watching a muted movie. The music stream used in this study consisted of six types of music pattern changes (rhythm, intensity, slide, location, pitch, and timbre) and lasted 14 minutes. Brain waves were recorded using a 65-electrode sensor cap. Each participant then received 10 minutes of tDCS at the frontal-central scalp regions, after which they listened to the music again while their brain waves were recorded once more.

Multi-feature music oddball paradigm. (Permission to use the stimuli and paradigm was obtained from the original creator, Peter Vuust).
S = same sounds; D1 = pitch change; D2 = timbre change; D3 = location change; D4 = intensity change; D5 = pitch slide change; D6 = rhythm change.

Image captions: electroencephalogram/event-related potential recording; transcranial direct current stimulation.

We hypothesized that 10 minutes of tDCS would enhance music processing.

Our results indicated that differences between pre- and post-tDCS brain waves were evident only in some conditions. Noninvasive brain stimulation, including tDCS, has the potential to be used as a clinical tool for enhancing auditory processing, but further studies need to examine how experimental parameters (dosage, duration, frequency, etc.) influence brain responses to auditory input.