Joshua Reiss – joshua.reiss@qmul.ac.uk
Twitter: @IntelSoundEng
Queen Mary University of London, Mile End Road, London, England, E1 4NS, United Kingdom
Popular version of 3aSP1-Artificial intelligence in music production: controversy and opportunity, presented at the 183rd ASA Meeting.
Music production
In music production, one typically has many sources. They each need to be heard simultaneously, but can all be created in different ways, in different environments and with different attributes. The mix should have all sources sound distinct yet contribute to a nice clean blend of the sounds. To achieve this is labour intensive and requires a professional engineer. Modern production systems help, but they’re incredibly complex and all require manual manipulation. As technology has grown, it has become more functional but not simpler for the user.
Intelligent music production
Intelligent systems could analyse all the incoming signals and determine how they should be modified and combined. This has the potential to revolutionise music production, in effect putting a robot sound engineer inside every recording device, mixing console or audio workstation. Could this be achieved? This question gets to the heart of what is art and what is science, what is the role of the music producer and why we prefer one mix over another.
Figure 1 Caption: The architecture of an automatic mixing system. [Image courtesy of the author]
Perception of mixing
But there is little understanding of how we perceive audio mixes. Almost all studies have been restricted to lab conditions, such as measuring the perceived level of a tone in the presence of background noise. This tells us very little about real-world cases. It doesn’t say how well one can hear lead vocals when there are guitar, bass and drums.
Best practices
And we don’t know why one production will sound dull while another makes you laugh and cry, even though both are on the same piece of music, performed by competent sound engineers. So we needed to establish what is good production, how to translate it into rules and exploit it within algorithms. We needed to step back and explore more fundamental questions, filling gaps in our understanding of production and perception.
Knowledge engineering
We used an approach that incorporated one of the earliest machine learning methods, knowledge engineering. It’s so old school that it’s gone out of fashion. It assumes experts have already figured things out; they are experts, after all. So let’s capture best practices as a set of rules and processes. But this is no easy task. Most sound engineers don’t know what they did. Ask a famous producer what he or she did on a hit song and you often get an answer like ‘I turned the knob up to 11 to make it sound phat.’ How do you turn that into a mathematical equation? Or worse, they say it was magic and can’t be put into words.
We systematically tested all the assumptions about best practices and supplemented them with listening tests that helped us understand how people perceive complex sound mixtures. We also curated multitrack audio, with detailed information about how it was recorded, multiple mixes and evaluations of those mixes.
This enabled us to develop intelligent systems that automate much of the music production process.
Video Caption: An automatic mixing system based on a technology we developed.
Transformational impact
I gave a talk about this once in a room that had panel windows all around. These talks are usually half full. But this time it was packed, and I could see faces outside pressed up against the windows. They all wanted to find out about this idea of automatic mixing. It’s a unique opportunity for academic research to have transformational impact on an entire industry. It addresses the fact that music production technologies are often not fit for purpose. Intelligent systems open up new opportunities. Amateur musicians can create high quality mixes of their content, small venues can put on live events without needing a professional engineer, time and preparation for soundchecks could be drastically reduced, and large venues and broadcasters could significantly cut manpower costs.
Taking away creativity
It’s controversial. We entered an automatic mix in a student recording competition as a sort of Turing Test. Technically we cheated, because the mixes were supposed to be made by students, not by an ‘artificial intelligence’ (AI) created by a student. Afterwards I asked the judges what they thought of the mix. The first two were surprised and curious when I told them how it was done. The third judge offered useful comments when he thought it was a student mix. But when I told him that it was an ‘automatic mix’, he suddenly switched and said it was rubbish and he could tell all along.
Mixing is a creative process where stylistic decisions are made. Is this taking away creativity? Is it taking away jobs? Such questions come up time and time again with new technologies, going back to 19th-century protests by the Luddites, textile workers who feared that time spent on their skills and craft would be wasted as machines could replace their role in industry.
Not about replacing sound engineers
These are valid concerns, but it’s important to see other perspectives. A tremendous amount of music production work is technical, and audio quality would be improved by addressing these problems. As the graffiti artist Banksy said, “All artists are willing to suffer for their work. But why are so few prepared to learn to draw?”
Creativity still requires technical skills. To achieve something wonderful when mixing music, you first have to achieve something pretty good and address issues with masking, microphone placement, level balancing and so on.
Video Caption: Time offset (comb filtering) correction, a technical problem in music production solved by an intelligent system.
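To make the idea of time offset correction concrete, here is a minimal sketch of one generic way to estimate the delay between two microphones picking up the same source: slide one signal against the other and keep the lag with the highest cross-correlation. This is an illustration under stated assumptions, not necessarily the method used in the system shown in the video; the function names `estimate_delay` and `align`, the 7-sample offset and the test signal are all hypothetical.

```python
import random

def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`.

    Brute-force cross-correlation over non-negative lags only; a generic
    textbook approach, assumed here purely for illustration.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference with the delayed signal shifted back by `lag`.
        score = sum(r * d for r, d in zip(reference, delayed[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def align(delayed, lag):
    """Advance `delayed` by `lag` samples, padding the end with silence."""
    return delayed[lag:] + [0.0] * lag

# Hypothetical example: a far microphone hears the same source 7 samples
# later and at 60% of the level of the close microphone.
random.seed(0)
close_mic = [random.uniform(-1.0, 1.0) for _ in range(200)]
far_mic = [0.0] * 7 + [0.6 * x for x in close_mic]

lag = estimate_delay(close_mic, far_mic, max_lag=20)
far_mic_aligned = align(far_mic, lag)
```

Once the two signals are aligned, summing them no longer produces the notches in the frequency response that are characteristic of comb filtering.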
The real benefit is not replacing sound engineers. It’s dealing with all those situations when a talented engineer is not available: the band practicing in the garage, the small restaurant venue that does not provide any support, or game audio, where dozens of sounds need to be mixed and there is no miniature sound engineer living inside the games console.
Jill A. Linz – jlinz@skidmore.edu
Skidmore College, 815 N. Broadway, Saratoga Springs, NY, 12866, United States
Christian Howat
Skidmore College, Class of 2022
815 N. Broadway
Saratoga Springs, NY 12866
Popular version of 4aMU5-Atom Tones: investigating waveforms and spectra of atomic elements in an audible periodic chart using techniques found in music production, presented at the 183rd ASA Meeting.
Atom Tones is an audible periodic table that allows us to identify elements through sound and to investigate the atomic world with methods used by sound engineers. The periodic table of Atom Tones can be accessed on the Atom Tones website. The Atom Music project was introduced in 2019 and explained the background ideas for creating audible tones for each atom. Each tone is clearly unique and can be used to identify the element by its sound. Audible tones can also be used in conjunction with the visual interpretations of the sound’s waveform to possibly gain insight into the atom.
In the same way that sunlight can be decomposed into individual colors of the rainbow, light produced from different elements can be decomposed into rainbow-like patterns that are unique to that element. The rainbow colors of the element appear as a series of bright lines known as spectral lines, or atomic spectra. Figure 1 shows examples of several element patterns, along with the element’s signature tone. The pattern of lines is unique to each atom.
Figure 1: Spectral lines produced by three different elements. These lines are unique for each element and are used to identify the element itself. The tones can be heard by clicking on each image. Image courtesy of Linz original paper (Proceedings of Meetings on Acoustics)
The relationship between music and physics is so intertwined that translating the spectral lines into sound is a relatively easy thing to do. Tedious perhaps, but not difficult. We can translate those colors into sounds of varying frequency, or pitch. These frequencies act like notes in a scale that can be played individually or combined. It is with these notes that we created the sounds of the elements.
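As a rough illustration of such a translation, the sketch below converts the wavelengths of hydrogen’s visible spectral lines into optical frequencies and then shifts them down by a whole number of octaves so they land in the audible range. Shifting by octaves keeps the frequency ratios between the lines intact, so the musical intervals are preserved. The choice of 40 octaves and the function name `line_to_audio_hz` are assumptions made for this example; the scaling actually used for the Atom Tones may differ.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def line_to_audio_hz(wavelength_nm, octaves_down=40):
    """Map a spectral line to an audible frequency.

    The octave shift is an illustrative assumption, not the published
    Atom Tones procedure.
    """
    optical_hz = SPEED_OF_LIGHT / (wavelength_nm * 1e-9)
    # Dropping one octave halves the frequency, so divide by 2**octaves_down.
    return optical_hz / (2 ** octaves_down)

# Visible (Balmer) lines of hydrogen, wavelengths in nanometres.
HYDROGEN_BALMER_NM = [656.28, 486.13, 434.05, 410.17]

tones = [line_to_audio_hz(w) for w in HYDROGEN_BALMER_NM]
for wavelength, tone in zip(HYDROGEN_BALMER_NM, tones):
    print(f"{wavelength:7.2f} nm -> {tone:7.2f} Hz")
```

With these assumptions the red hydrogen line lands at about 415 Hz, and the remaining lines sit above it as higher notes that can be played individually or together.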
A sound engineer can easily identify specific types of musical instruments as well as the musical intervals and chords played by those instruments by observing the digital waveforms and spectra produced in a recording, in addition to simply listening by ear. Digital audio software adds an extra layer of insight to the sound. Figure 2 shows the different waveforms and spectral lines for a French Horn and Bassoon each playing the same note, D3.
Figure 2: Waveform and spectra of a French Horn compared to a Bassoon. Image courtesy of Linz original paper (Proceedings of Meetings on Acoustics)
Using the techniques developed for audio recording and music synthesis, we can create an audible representation of each element. Possible ways to interpret the tones produced are being investigated. Figure 3 shows the waveforms and spectra for a few elements that exhibit wave patterns that repeat themselves. This is what a sound engineer would expect to see when the recording sounds harmonic, or musical.
Figure 3: These are a few atom tones whose waveforms exhibited similar patterns that repeat themselves. Image courtesy of Linz, Howat original paper (Proceedings of Meetings on Acoustics)
Other combinations of elements exhibit very different patterns. The software allows you to zoom in and observe the pattern from different perspectives. Not only are we hearing the atoms for the first time, perhaps we are also seeing them in a new light.
Alexandra Bruder – alexandra.l.bruder@vanderbilt.edu
Vanderbilt University Medical Center, Department of Anesthesiology, 1211 21st Avenue South, Medical Arts Building, Suite 422, Nashville, TN, 37212, United States
Joseph Schlesinger – joseph.j.schlesinger@vumc.org
Twitter: @DrJazz615
Vanderbilt University Medical Center
Nashville, TN 37205
United States
Clayton D Rothwell – crothwell@infoscitex.com
Infoscitex Corporation, a DCS Company
Dayton, OH, 45431
United States
Popular version of 1pMU4-The Impact of Formal Musical Training on Speech Intelligibility Performance – Implications for Music Pedagogy in High-Consequence Industries, presented at the 183rd ASA Meeting.
Imagine being a waiter… everyone in the restaurant is speaking, music is playing, and co-workers are trying to get your attention, causing you to miss the customer’s order. Communication is necessary but can be hindered due to distractions in many environments, especially in high-risk environments, such as aviation, nuclear power, and healthcare, where miscommunication is a frequent contributing factor to accidents and loss of life. In domains where multitasking is necessary and timely and accurate responses must be ensured, does formal music training help performance?
We used an audio-visual task to test if formal music training can be useful in multitasking environments. Twenty-five students from Vanderbilt University participated in the study and were separated into groups based on their level of formal music training: no formal music training, 1-3 years, 3-5 years, and 5+ years of formal music training. Participants were given three tasks to attend to: a speech comprehension task (modeling distracted communication), a complex visual distraction task (modeling a clinical patient monitor), and an easy visual distraction task (modeling an alarm monitoring task). These tasks were completed in the presence of a combination of alarms and/or background noise and with/without background music.
Image courtesy of Bruder et al. original paper. (Psychology of Music).
Our research focused on results regarding the audio comprehension task and showed that the group with the most formal music training did not show changes in response rate with or without background music added, while all the other groups did. This means that with enough music training, background music is not a factor influencing participant response! Additionally, the number of times the participants responded to the audio task depended on the degree of formal music training. Participants with no formal music training had the highest response rate, followed by the 1-3-year group, then the 3-5-year group, with the 5+ year group having the lowest response rate. However, all participants were similar in accuracy overall, and accuracy decreased for all groups when background music was playing. Given the similar accuracy among groups, but less frequent responding with more formal music training, it appears that formal music training helps participants know not to respond when they don’t know the answer.
Image courtesy of Bruder et al. original paper (Psychology of Music).
Why does this matter? There are many situations when responding and getting something wrong can be more detrimental than not responding, especially in time-pressured situations where mistakes are costly to correct. Although the accuracy was similar between all groups, the groups with some formal music training seemed to respond with overconfidence, but did not know enough to increase accuracy, resulting in a potentially dangerous situation. This is contrasted with the 5+ year formal music training group, who showed no effect of background music on response rate, who used their trained ears to better judge the extent of their understanding of the information, and who were less eager to respond to a difficult task under distraction. It turns out that those middle school band lessons paid off after all, that is, if you work in a distracting, multitasking environment.