Artificial intelligence in music production: controversy & opportunity

Joshua Reiss Reiss – joshua.reiss@qmul.ac.uk
Twitter: @IntelSoundEng

Queen Mary University of London, Mile End Road, London, England, E1 4NS, United Kingdom

Popular version of 3aSP1-Artificial intelligence in music production: controversy and opportunity, presented at the 183rd ASA Meeting.

Music production
In music production, one typically has many sources. They each need to be heard simultaneously, but can all be created in different ways, in different environments and with different attributes. The mix should have all sources sound distinct yet contribute to a nice clean blend of the sounds. To achieve this is labour intensive and requires a professional engineer. Modern production systems help, but they’re incredibly complex and all require manual manipulation. As technology has grown, it has become more functional but not simpler for the user.

Intelligent music production
Intelligent systems could analyse all the incoming signals and determine how they should be modified and combined. This has the potential to revolutionise music production, in effect putting a robot sound engineer inside every recording device, mixing console or audio workstation. Could this be achieved? This question gets to the heart of what is art and what is science, what is the role of the music producer and why we prefer one mix over another.

Artificial Intelligence Figure 1: The architecture of an automatic mixing system. [Image courtesy of the author] Figure 1 Caption: The architecture of an automatic mixing system. [Image courtesy of the author]

Perception of mixing
But there is little understanding of how we perceive audio mixes. Almost all studies have been restricted to lab conditions; like measuring the perceived level of a tone in the presence of background noise. This tells us very little about real world cases. It doesn’t say how well one can hear lead vocals when there are guitar, bass and drums.

Best practices
And we don’t know why one production will sound dull while another makes you laugh and cry, even though both are on the same piece of music, performed by competent sound engineers. So we needed to establish what is good production, how to translate it into rules and exploit it within algorithms. We needed to step back and explore more fundamental questions, filling gaps in our understanding of production and perception.

Knowledge engineering
We used an approach that incorporated one of the earliest machine learning methods, knowledge engineering. Its so old school that its gone out of fashion. It assumes experts have already figured things out, they are experts after all. So let’s capture best practices as a set of rules and processes. But this is no easy task. Most sound engineers don’t know what they did. Ask a famous producer what he or she did on a hit song and you often get an answer like ‘I turned the knob up to 11 to make it sound phat.” How do you turn that into a mathematical equation? Or worse, they say it was magic and can’t be put into words.

We systematically tested all the assumptions about best practices and supplemented them with listening tests that helped us understand how people perceive complex sound mixtures. We also curated multitrack audio, with detailed information about how it was recorded, multiple mixes and evaluations of those mixes.

This enabled us to develop intelligent systems that automate much of the music production process.

Video Caption: An automatic mixing system based on a technology we developed.

Transformational impact
I gave a talk about this once in a room that had panel windows all around. These talks are usually half full. But this time it was packed, and I could see faces outside pressed up against the windows. They all wanted to find out about this idea of automatic mixing. It’s  a unique opportunity for academic research to have transformational impact on an entire industry. It addresses the fact that music production technologies are often not fit for purpose. Intelligent systems open up new opportunities. Amateur musicians can create high quality mixes of their content, small venues can put on live events without needing a professional engineer, time and preparation for soundchecks could be drastically reduced, and large venues and broadcasters could significantly cut manpower costs.

Taking away creativity
Its controversial. We entered an automatic mix in a student recording competition as a sort of Turing Test. Technically we cheated, because the mixes were supposed to be made by students, not by an ‘artificial intelligence’ (AI) created by a student. Afterwards I asked the judges what they thought of the mix. The first two were surprised and curious when I told them how it was done. The third judge offered useful comments when he thought it was a student mix. But when I told him that it was an ‘automatic mix’, he suddenly switched and said it was rubbish and he could tell all along.

Mixing is a creative process where stylistic decisions are made. Is this taking away creativity, is it taking away jobs? Such questions come up time and time again with new technologies, going back to 19th century protests by the Luddites, textile workers who feared that time spent on their skills and craft would be wasted as machines could replace their role in industry.

Not about replacing sound engineers
These are valid concerns, but its important to see other perspectives. A tremendous amount of music production work is technical, and audio quality would be improved by addressing these problems. As the graffiti artist Banksy said “All artists are willing to suffer for their work. But why are so few prepared to learn to draw?”

Creativity still requires technical skills. To achieve something wonderful when mixing music, you first have to achieve something pretty good and address issues with masking, microphone placement, level balancing and so on.

Video Caption: Time offset (comb filtering) correction, a technical problem in music production solved by an intelligent system.

The real benefit is not replacing sound engineers. Its dealing with all those situations when a talented engineer is not available; the band practicing in the garage, the small restaurant venue that does not provide any support, or game audio, where dozens of sounds need to be mixed and there is no miniature sound engineer living inside the games console.

Atom Tones – A periodic table of audible elements

Jill A. Linz – jlinz@skidmore.edu

Skidmore College, 815 N. Broadway, Saratoga Springs, NY, 12866, United States

Christian Howat
Skidmore College, Class of 2022
815 N. Broadway
Saratoga Springs, NY 12866

Popular version of 4aMU5-Atom Tones: investigating waveforms and spectra of atomic elements in an audible periodic chart using techniques found in music production, presented at the 183rd ASA Meeting.

atom tonesAtom Tones is an audible periodic table that allows us to identify elements through sound and to investigate the atomic world with methods used by sound engineers. The periodic table of Atom Tones can be accessed on the Atom Tones website. The Atom Music project was introduced in 2019 and explained the background ideas for creating audible tones for each atom. Each tone is clearly unique and can be used to identify the element by its sound. Audible tones can also be used in conjunction with the visual interpretations of the sound’s waveform to possibly gain insight into the atom.

In the same way that sunlight can be decomposed into individual colors of the rainbow, light produced from different elements can be decomposed into rainbow-like patterns that are unique to that element. The rainbow colors of the element appear as a series of bright lines known as spectral lines, or atomic spectra. Figure 1 shows examples of several element patterns, along with the element’s signature tone. The pattern of lines is unique to each atom.

Spectral lines produced by carbon. Image courtesy of Linz original paper (Proceedings on Meetings in Acoustics)
Spectral lines produced by Nitrogen. Image courtesy of Linz original paper (Proceedings on Meetings in Acoustics)
Spectral lines produced by Oxygen. Image courtesy of Linz original paper (Proceedings on Meetings in Acoustics)
Figure 1: Spectral lines produced by three different elements. These lines are unique for each element and are used to identify the element itself. The tones can be heard by clicking on each image. Image courtesy of Linz original paper (Proceedings on Meetings in Acoustics)

The relationship between music and physics is so intertwined that translating the spectral lines into sound is a relatively easy thing to do. Tedious perhaps, but not difficult. We can translate those colors into sounds of varying frequency, or pitch. These frequencies act like notes in a scale that can be played individually or combined. It is with these notes that we created the sounds of the elements.

A sound engineer can easily identify specific types of musical instruments as well as the musical intervals and chords played by those instruments by observing the digital waveforms and spectra produced in a recording, in addition to simply listening by ear. Digital audio software adds an extra layer of insight to the sound. Figure 2 shows the different waveforms and spectral lines for a French Horn and Bassoon each playing the same note, D3.

waveform and spectra of a French Horn compared to a Bassoon. Image courtesy of Linz original paper (Proceedings on Meetings in Acoustics)Figure 2: waveform and spectra of a French Horn compared to a Bassoon. Image courtesy of Linz original paper (Proceedings on Meetings in Acoustics)

Using the techniques developed for audio recording and music synthesis, we can create an audible representation of each element. Possible ways to interpret the tones produced are being investigated. Figure 3 shows the waveforms and spectra for a few elements that exhibit wave patterns that repeat themselves. This is what a sound engineer would expect to see when the recording sounds harmonic, or musical.

These are a few atom tones whose waveforms exhibited similar patterns that repeat themselves. Image courtesy of Linz, Howat original paper (Proceedings on Meetings in Acoustics)Figure 3: These are a few atom tones whose waveforms exhibited similar patterns that repeat themselves. Image courtesy of Linz, Howat original paper (Proceedings on Meetings in Acoustics)

Other combinations of elements exhibit very different patterns. The software allows you to zoom in and observe the pattern from different perspectives. Not only are we hearing the atoms for the first time, perhaps we are also seeing them in a new light.

Presence of a drone and estimating its range simply from the drone audio emissions

Kaliappan Gopalan – kgopala@pnw.edu

Purdue University Northwest, Hammond, IN, 46323, United States

Brett Y. Smolenski, North Point Defense, Rome, NY, USA
Darren Haddad, Information Exploitation Branch, Air Force Research Laboratory, Rome, NY, USA

Popular version of 1ASP8-Detection and Classification of Drones using Fourier-Bessel Series Representation of Acoustic Emissions, presented at the 183rd ASA Meeting.

With the proliferation of drones – from medical supply and hobbyist to surveillance, fire detection and illegal drug delivery, to name a few – of various sizes and capabilities flying day or night, it is imperative to detect their presence and estimate their range for security, safety and privacy reasons.

Our paper describes a technique for detecting the presence of a drone, as opposed to environmental noise such as from birds and moving vehicles, simply from the audio emissions of the drone from its motors, propellers and mechanical vibrations. By applying a feature extraction technique that separates a drone’s distinct audio spectrum from that of atmospheric noise, and employing machine learning algorithms, we were able to identify drones from three different classes flying outdoors with correct class in over 78 % of cases. Additionally, we estimated the range of a drone from the observation point correctly to within ±50 cm in over 85 % of cases.

We evaluated unique features characterizing each type of drone using a mathematical technique known as the Fourier-Bessel series expansion. Using these features which not only differentiated the drone class but also differentiated the drone range, we applied machine learning algorithms to train a deep learning network with ground truth values of drone type, or its range as a discrete variable at intervals of 50 cm. When the trained learning network was tested with new, unused features, we obtained the correct type of drone – with a nonzero range – and a range class that was within the appropriate class, that is, within ±50 cm of the actual range.

Any point along the main diagonal line indicates correct range class, that is, within ±50 cm of actual range, while off-diagonal values correspond to false classification error.

For identifying more than three types of drones, we tested seven different types of drones, namely, DJI S1000, DJI M600, Phantom 4 Pro, Phantom 4 QP with a quieter set of propellers, Mavic Pro Platinum, Mavic 2 Pro, and Mavic Pro, all tethered in an anechoic chamber in an Air Force laboratory and controlled by an operator to go through a series of propeller maneuvers (idle, left roll, right roll, pitch forward, pitch backward, left yaw, right yaw, half throttle, and full throttle) to fully capture the array of sounds the craft emit. Our trained deep learning network correctly identified the drone type in 84 % of our test cases.  Figure 1 shows the results of range classification for each outdoor drone flying between a line-of-sight range of 0 (no-drone) to 935 m.

Noise Pollution in Hospitals and its Impacts on the Health Care Community and Patients

Olivia C Coiado – coiado@illinois.edu
Twitter: @oliviacoiado
Instagram: @oliviacoiado

Department of Biomedical and Translational Sciences, Carle Illinois College of Medicine, Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, 61801, United States

Erasmo F. Vergara
Laboratory of Vibration and Acoustics, Department of Mechanical Engineering, Federal University of Santa Catarina, Florianópolis, SC, Brazil.

Lizandra G. Lupi Vergara
Laboratory of Ergonomics, Department of Production and Systems Engineering, Federal University of Santa Catarina, Florianópolis, SC, Brazil.

Popular version of 3pNS4-Noise Pollution in Hospitals and its Impacts on the Health Care Community and Patients, presented at the 183rd ASA Meeting.

If you ever had to be hospitalized in your life, you probably know that spending a night in a hospital room and getting some sleep is almost an impossible mission! Why? Noise in hospitals is a common problem for patients, families and teams of professionals and employees. Most of a hospital’s environment is affected by the sounds of equipment and machines with high sound pressure levels (SPL) or “noise”.
What can we do?

Fig 1: Sound pressure meter positioned in front of the reception desk in Brazil.

We used a sound pressure meter (Fig. 1) to record noise of medical equipment such as machines, medical devices, tools, alarms used in the medical activities in hospitals in Brazil and in the United States. SPLs inside hospitals may have high average values, higher than 60 decibels (dB), with peak SPL values of 100 dB and may not meet the international requirements. The World Health Organization (WHO) suggests that the average SPL in hospitals should be around 35 dB during the day and 30 dB at night. SPLs above 65 dB can cause behavioral disorders and affect the quality of sleep and cause changes in the physiological responses to stress in hospitalized patients. High noise levels exceeding 55 dB can affect both patients and staff. The noise effects can cause memory lapses and mental exhaustion in performing tasks, exposing technical and support teams to risks, accidents and errors in the performance of their work. For instance, a plane taking off (Fig. 2) can reach up to 100 dB and a noisy hospital environment can reach up to 70 dB, more than double of the noise recommended by the WHO!

Figure 2: Image adapted from Bayo, Garcia and Garcia 1989.

Our research considered both quantitative aspects, through numerical and qualitative descriptors (subjective and psychological assessment of patients, medical staff, employees, etc.), to assess noise pollution in hospitals. Our model analyzed the relationship between the acoustic characteristics of the environment and people’s sound perception.
We interviewed 47 people in a Brazilian Hospital, the responses were collected from nurses, nursing assistants, doctors, and other staff members. 60% of the participants responded that they needed to speak louder and felt discomfort with the noise in the work environment, 57% said they felt discomfort with the noise coming from the medical equipment, 72% of the participants said the work environment is moderately or very noisy. The next phase of our research is to repeat the same measurements in a United Stated Hospital and compare the results. Then we can make a reflection, what can we do to reduce the effects of noise pollution in hospitals? How to reduce the noise coming from medical equipment? Our “dream” is to provide a more comfortable environment for patients and the health community. Hoping they can finally get a good night of sleep in Brazil in the U.S or any other hospital in the world.

The Impact of Formal Musical Training on Speech Comprehension in Heavily Distracting Environments

Alexandra Bruder – alexandra.l.bruder@vanderbilt.edu

Vanderbilt University Medical Center, Department of Anesthesiology, 1211 21st Avenue South, Medical Arts Building, Suite 422, Nashville, TN, 37212, United States

Joseph Schlesinger – joseph.j.schlesinger@vumc.org
Twitter: @DrJazz615

Vanderbilt University Medical Center
Nashville, TN 37205
United States

Clayton D Rothwell – crothwell@infoscitex.com<
Infoscitex Corporation, a DCS Company
Dayton, OH, 45431
United States

Popular version of 1pMU4-The Impact of Formal Musical Training on Speech Intelligibility Performance – Implications for Music Pedagogy in High-Consequence Industries, presented at the 183rd ASA Meeting.

Imagine being a waiter… everyone in the restaurant is speaking, music is playing, and co-workers are trying to get your attention, causing you to miss the customer’s order. Communication is necessary but can be hindered due to distractions in many environments, especially in high-risk environments, such as aviation, nuclear power, and healthcare, where miscommunication is a frequent contributing factor to accidents and loss of life. In domains where multitasking is necessary and timely and accurate responses must be ensured, does formal music training help performance?

We used an audio-visual task to test if formal music training can be useful in multitasking environments. Twenty-five students from Vanderbilt University participated in the study and were separated into groups based on their level of formal music training: no formal music training, 1-3 years, 3-5 years, and 5+ years of formal music training. Participants were given three tasks to attend to, a speech comprehension task (modeling distracted communication), a complex visual distraction task (modeling a clinical patient monitor), and an easy visual distraction task (modeling an alarm monitoring task). These tasks were completed in the presence of a combination of alarms and/or background noise and with/without background music.

formal musical training study Image courtesy of Bruder et al. original paper. (Psychology of Music).

Our research focused on results regarding the audio comprehension task and showed that the group with the most formal music training did not show changes in response rate with or without background music added, while all the other groups did. Meaning that with enough music training, background music is not a factor influencing participant response! Additionally, the number of times the participants responded to the audio task depended on the degree of formal music training. Participants with no formal music training had the highest response rate, followed by the 1-3-year group, then the 3–5-year group, with the 5+ year group having the lowest response rate. However, all participants were similar in accuracy overall, and accuracy decreased for all groups when background music was playing. Given the similar accuracy among groups, but less frequent responding with more formal music training, it appears that formal music training helps inform participants to not respond when they don’t know the answer.

Image courtesy of Bruder et al. original paper (Psychology of Music).

Why does this matter? There are many situations when responding and getting something wrong can be more detrimental than not responding, especially in time pressure situations where mistakes are costly to correct. Although the accuracy was similar between all groups, the groups with some formal music training seemed to respond with overconfidence, but did not know enough to increase accuracy, resulting in a potentially dangerous situation. This is contrasted with the 5+ formal music training group, who showed no effect of background music on response rate and who used their trained ears to better judge the extent of their understanding of the information and were less eager to respond to a difficult task under distraction. It turns out that those middle school band lessons paid off after all, that is, if you work in a distracting, multitasking environment.