Virtual Reality Musical Instruments for the 21st Century

Rob Hamilton – hamilr4@rpi.edu
Twitter: @robertkhamilton

Rensselaer Polytechnic Institute, 110 8th St, Troy, New York, 12180, United States

Popular version of 1aCA3 – Real-time musical performance across and within extended reality environments
Presented at the 184th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0018060

Have you ever wanted to just wave your hands and make beautiful music? Sad your epic air-guitar skills don't translate into pop/rock superstardom? Given the speed and accessibility of modern computers, it may come as little surprise that artists and researchers have been looking to virtual and augmented reality to build the next generation of musical instruments. Borrowing heavily from video game design, a new generation of digital luthiers is already exploring new techniques to bring the joys and wonders of live musical performance into the 21st Century.

Image courtesy of Rob Hamilton.

One such instrument is ‘Coretet’: a virtual reality bowed string instrument that can be reshaped by the user into familiar forms such as a violin, viola, cello or double bass. While wearing a virtual reality headset such as Meta’s Oculus Quest 2, performers bow and pluck the instrument in familiar ways, albeit without any physical interaction with strings or wood. Sound is generated in Coretet using a computer model of a bowed or plucked string called a ‘physical model’ driven by the motion of a performer’s hands and the use of their VR game controllers. And borrowing from multiplayer online games, Coretet performers can join a shared network server and perform music together.
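Coretet's exact synthesis engine isn't detailed here, but the classic textbook example of a plucked-string physical model is the Karplus-Strong algorithm, sketched below in Python. The real instrument's model is presumably more elaborate, driven continuously by bow velocity and pressure rather than a single excitation.

```python
import random

def karplus_strong(frequency, sample_rate=44100, duration=1.0):
    """Minimal Karplus-Strong plucked-string model (illustrative parameters).

    A delay line seeded with noise models the string's initial 'pluck';
    repeated averaging acts as a low-pass filter, damping high
    frequencies the way a real string does as it rings out.
    """
    period = int(sample_rate / frequency)   # delay-line length sets the pitch
    buf = [random.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for i in range(int(sample_rate * duration)):
        sample = buf[i % period]
        # average with the next sample and decay slightly
        buf[i % period] = 0.996 * 0.5 * (sample + buf[(i + 1) % period])
        out.append(sample)
    return out

tone = karplus_strong(220.0, duration=0.5)  # an A3 'pluck', half a second
```

The averaging step is what makes the result sound string-like: higher partials die away faster than the fundamental, just as on a physical string.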

Our understanding of music, and of live performance on traditional physical instruments, is tightly coupled to time: when a finger plucks a string or a stick strikes a drum head, we expect sound immediately, without any delay or latency. And while modern computers can stream large amounts of data at nearly the speed of light – far faster than the speed of sound – bottlenecks in the CPUs or GPUs themselves, in the code designed to mimic our physical interactions with instruments, or in the network connections that link users and computers often introduce latency, making virtual performances feel sluggish or awkward.
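To make the numbers concrete, audio buffering alone imposes a floor on latency: a buffer of N samples at sample rate R takes N/R seconds to fill before it can even be processed. A small illustrative calculation (the buffer size and network figure below are hypothetical, not measurements from Coretet):

```python
def buffer_latency_ms(buffer_size, sample_rate):
    """Milliseconds needed to fill one audio buffer before processing."""
    return 1000.0 * buffer_size / sample_rate

# A 256-sample buffer at 48 kHz adds ~5.3 ms on input and again on
# output; stack a hypothetical 20 ms network round trip on top and the
# total is well past the few milliseconds a performer expects between
# gesture and sound.
one_way = buffer_latency_ms(256, 48000)
total = 2 * one_way + 20.0
```

Shrinking the buffer reduces latency but raises the risk of audible dropouts, which is one of the trade-offs instrument designers must balance.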

This research focuses on some common causes for this kind of latency and looks at ways that musicians and instrument designers can work around or mitigate these latencies both technically and artistically.

Coretet overview video: Video courtesy of Rob Hamilton.

Lead Vocal Tracks in Popular Music Go Quiet


An analysis of top popular music from 1946 to 2020 shows a marked decrease in volume of the lead vocal track and differences across musical genres.

Estimated lead-to-accompaniment ratio (LAR) for songs in five genres from 1990 to 2020. Purple circles correspond to solo artists and green squares to bands. Credit: Kai Siedenburg

WASHINGTON, April 25, 2023 – A general rule of music production involves mixing various soundtracks so the lead singer’s voice is in the foreground. But it is unclear how such track mixing – and closely related lyric intelligibility – has changed over the years.
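As a rough illustration of the kind of measurement involved, a lead-to-accompaniment ratio can be expressed as the level difference in decibels between the vocal and the rest of the mix. The sketch below assumes the two signals are already separated into stems; the actual study had to estimate these levels from finished mixes, which is considerably harder.

```python
import math

def lead_to_accompaniment_ratio(vocal, accompaniment):
    """RMS level difference in dB between lead vocal and accompaniment.

    Simplification: assumes separated stems; the published analysis
    estimates these levels from complete mixes instead.
    """
    def rms(signal):
        return math.sqrt(sum(s * s for s in signal) / len(signal))
    return 20.0 * math.log10(rms(vocal) / rms(accompaniment))

# A vocal at twice the accompaniment's amplitude sits about +6 dB above it
lar = lead_to_accompaniment_ratio([0.5, -0.5, 0.5, -0.5],
                                  [0.25, -0.25, 0.25, -0.25])
```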

Scientists from the University of Oldenburg in Germany carried out an analysis of hundreds of popular song recordings from 1946 to 2020 to determine how the level of the lead vocal has changed over that period.

From the Journal: JASA Express Letters
Article: Lead-vocal level in recordings of popular music 1946-2020
DOI: 10.1121/10.0017773

Can a Playlist be Your Therapist? Balancing Emotions Through Music #ASA183


Music app provides therapy by consoling, relaxing, uplifting users

Media Contact:
Ashley Piccone
AIP Media
301-209-3090
media@aip.org

NASHVILLE, Tenn., Dec. 5, 2022 – Music has the potential to change emotional states and can distract listeners from negative thoughts and pain. It has also been proven to help improve memory, performance, and mood.

The Emotion Equalization App surveys your mood and energy to create a corresponding therapeutic playlist. Credit: Man Hei Law

At the upcoming meeting of the Acoustical Society of America, Man Hei Law of Hong Kong University of Science and Technology will present an app that creates custom playlists to help listeners care for their emotions through music. The presentation, “Emotion equalization app: A first study and results,” will take place at the Grand Hyatt Nashville Hotel on Dec. 5 at 3:15 p.m. Eastern U.S. in the Rail Head room, as part of ASA’s 183rd meeting running Dec. 5-9.

“As humanity’s universal language, music can significantly impact a person’s physical and emotional state,” said Law. “For example, music can help people to manage pain. We developed this app as an accessible first aid strategy for balancing emotions.”

The app could be used by people who may not want to receive counseling or treatment because of feelings of shame, inadequacy, or distrust. By taking listeners on an emotional roller-coaster ride, the app aims to leave them in a more positive and focused state than where they began.

Users take three self-led questionnaires in the app to measure their emotional status and provide the information needed to create a playlist. Current and long-term emotional states are gauged with a pictorial assessment tool that identifies emotions in terms of energy level and mood. Energy level can be high, medium, or low, and mood can register as positive, neutral, or negative. A Patient Health Questionnaire and a General Anxiety Disorder screening are also used to establish personalized music therapy treatments.

By determining the emotional state of the user, the app creates a customized and specifically sequenced playlist of songs using one of three strategies: consoling, relaxing, or uplifting. Consoling music reflects the energy and mood of the user, while relaxing music provides a positive, low energy. Uplifting music is also positive but more high energy.
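The app's internal decision logic isn't published, but the description above suggests a simple mapping from assessed state to strategy. The sketch below is purely hypothetical, encoding only the three strategies as described:

```python
def choose_strategy(mood, energy):
    """Hypothetical first-strategy selection for the playlist.

    Encodes the three strategies described: consoling music mirrors a
    negative state, relaxing music is positive and low-energy,
    uplifting music is positive and high-energy.
    """
    if mood == "negative":
        return "consoling"   # meet the listener where they are first
    if energy == "low":
        return "relaxing"    # gentle, positive, low-energy material
    return "uplifting"       # positive, high-energy material

strategy = choose_strategy(mood="negative", energy="high")
```

The real app sequences several strategies into one playlist (the "roller-coaster"), so a selection like this would only be the starting point.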

“In our experiments, we found that the relaxing and uplifting methods can significantly move listeners from negative to more positive emotional states. In particular, when listeners are in a neutral mood, all three proposed methods can shift their emotions to be more positive,” said Law.

———————– MORE MEETING INFORMATION ———————–
Main meeting website: https://acousticalsociety.org/asa-meetings/
Technical program: https://eppro02.ativ.me/web/planner.php?id=ASAFALL22&proof=true

ASA PRESS ROOM
In the coming weeks, ASA’s Press Room will be updated with newsworthy stories and the press conference schedule at https://acoustics.org/asa-press-room/.

LAY LANGUAGE PAPERS
ASA will also share dozens of lay language papers about topics covered at the conference. Lay language papers are 300 to 500 word summaries of presentations written by scientists for a general audience. They will be accompanied by photos, audio, and video. Learn more at https://acoustics.org/lay-language-papers/.

PRESS REGISTRATION
ASA will grant free registration to credentialed and professional freelance journalists. If you are a reporter and would like to attend the meeting or virtual press conferences, contact AIP Media Services at media@aip.org.  For urgent requests, AIP staff can also help with setting up interviews and obtaining images, sound clips, or background information.

ABOUT THE ACOUSTICAL SOCIETY OF AMERICA
The Acoustical Society of America (ASA) is the premier international scientific society in acoustics devoted to the science and technology of sound. Its 7,000 members worldwide represent a broad spectrum of the study of acoustics. ASA publications include The Journal of the Acoustical Society of America (the world’s leading journal on acoustics), JASA Express Letters, Proceedings of Meetings on Acoustics, Acoustics Today magazine, books, and standards on acoustics. The society also holds two major scientific meetings each year. See https://acousticalsociety.org/.

Artificial intelligence in music production: controversy & opportunity

Joshua Reiss – joshua.reiss@qmul.ac.uk
Twitter: @IntelSoundEng

Queen Mary University of London, Mile End Road, London, England, E1 4NS, United Kingdom

Popular version of 3aSP1-Artificial intelligence in music production: controversy and opportunity, presented at the 183rd ASA Meeting.

Music production
In music production, one typically has many sources. They each need to be heard simultaneously, but can all be created in different ways, in different environments and with different attributes. The mix should have all sources sound distinct yet contribute to a nice clean blend of the sounds. To achieve this is labour intensive and requires a professional engineer. Modern production systems help, but they’re incredibly complex and all require manual manipulation. As technology has grown, it has become more functional but not simpler for the user.

Intelligent music production
Intelligent systems could analyse all the incoming signals and determine how they should be modified and combined. This has the potential to revolutionise music production, in effect putting a robot sound engineer inside every recording device, mixing console or audio workstation. Could this be achieved? This question gets to the heart of what is art and what is science, what is the role of the music producer and why we prefer one mix over another.
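One elementary building block of such a system is automatic level balancing: computing a gain for each track so that no source drowns out the others. The RMS-matching sketch below is a deliberately simplified stand-in; real automatic mixers use perceptual loudness models and also handle masking, panning, and equalization.

```python
import math

def auto_balance(tracks):
    """Per-track gains that bring every track to the same RMS level.

    A deliberately naive stand-in for one stage of an automatic mixer;
    production systems use perceptual loudness models, not raw RMS.
    """
    def rms(signal):
        return math.sqrt(sum(s * s for s in signal) / len(signal))
    levels = [rms(t) for t in tracks]
    target = sum(levels) / len(levels)   # balance toward the mean level
    return [target / level for level in levels]

# A loud track and a quiet track: the quiet one gets boosted,
# the loud one pulled down.
gains = auto_balance([[0.8, -0.8], [0.2, -0.2]])
```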

Figure 1: The architecture of an automatic mixing system. [Image courtesy of the author]

Perception of mixing
But there is little understanding of how we perceive audio mixes. Almost all studies have been restricted to lab conditions, like measuring the perceived level of a tone in the presence of background noise. This tells us very little about real-world cases; it doesn't say how well one can hear the lead vocals over guitar, bass and drums.

Best practices
And we don't know why one production will sound dull while another makes you laugh and cry, even though both are of the same piece of music, performed by competent sound engineers. So we needed to establish what good production is, how to translate it into rules, and how to exploit it within algorithms. We needed to step back and explore more fundamental questions, filling gaps in our understanding of production and perception.

Knowledge engineering
We used an approach that incorporates one of the earliest machine learning methods, knowledge engineering. It's so old school that it's gone out of fashion. It assumes experts have already figured things out – they are experts, after all – so let's capture best practices as a set of rules and processes. But this is no easy task. Most sound engineers don't know what they did. Ask a famous producer what he or she did on a hit song and you often get an answer like 'I turned the knob up to 11 to make it sound phat.' How do you turn that into a mathematical equation? Or worse, they say it was magic and can't be put into words.

We systematically tested all the assumptions about best practices and supplemented them with listening tests that helped us understand how people perceive complex sound mixtures. We also curated multitrack audio, with detailed information about how it was recorded, multiple mixes and evaluations of those mixes.

This enabled us to develop intelligent systems that automate much of the music production process.

Video Caption: An automatic mixing system based on a technology we developed.

Transformational impact
I gave a talk about this once in a room with panel windows all around. These talks are usually half full, but this time it was packed, and I could see faces outside pressed up against the windows. They all wanted to find out about this idea of automatic mixing. It's a unique opportunity for academic research to have a transformational impact on an entire industry. It addresses the fact that music production technologies are often not fit for purpose. Intelligent systems open up new opportunities. Amateur musicians can create high quality mixes of their content, small venues can put on live events without needing a professional engineer, time and preparation for soundchecks could be drastically reduced, and large venues and broadcasters could significantly cut manpower costs.

Taking away creativity
It's controversial. We entered an automatic mix in a student recording competition as a sort of Turing Test. Technically we cheated, because the mixes were supposed to be made by students, not by an 'artificial intelligence' (AI) created by a student. Afterwards I asked the judges what they thought of the mix. The first two were surprised and curious when I told them how it was done. The third judge offered useful comments when he thought it was a student mix. But when I told him it was an 'automatic mix', he suddenly switched, said it was rubbish, and claimed he could tell all along.

Mixing is a creative process where stylistic decisions are made. Is this taking away creativity, is it taking away jobs? Such questions come up time and time again with new technologies, going back to 19th century protests by the Luddites, textile workers who feared that time spent on their skills and craft would be wasted as machines could replace their role in industry.

Not about replacing sound engineers
These are valid concerns, but it's important to see other perspectives. A tremendous amount of music production work is technical, and audio quality would be improved by addressing these technical problems. As the graffiti artist Banksy said, “All artists are willing to suffer for their work. But why are so few prepared to learn to draw?”

Creativity still requires technical skills. To achieve something wonderful when mixing music, you first have to achieve something pretty good and address issues with masking, microphone placement, level balancing and so on.

Video Caption: Time offset (comb filtering) correction, a technical problem in music production solved by an intelligent system.
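The delay estimation underlying this kind of correction can be done with cross-correlation: slide one signal against the other and pick the lag at which they line up best. A minimal sketch (a real implementation would use FFT-based correlation and sub-sample interpolation):

```python
def estimate_offset(ref, delayed, max_lag=100):
    """Estimate the sample delay between two signals by cross-correlation.

    Comb filtering appears when a delayed copy of a source is mixed with
    the original; once the lag is found, the late track can be shifted
    back into alignment before mixing.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * d for r, d in zip(ref, delayed[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

sig = [0.0] * 40
sig[5] = 1.0                # an impulse in the close-mic signal
delayed = [0.0] * 3 + sig   # the same source arriving 3 samples late
lag = estimate_offset(sig, delayed)
```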

The real benefit is not replacing sound engineers. It's dealing with all those situations when a talented engineer is not available: the band practicing in the garage, the small restaurant venue that does not provide any sound support, or game audio, where dozens of sounds need to be mixed and there is no miniature sound engineer living inside the games console.

Atom Tones – A periodic table of audible elements

Jill A. Linz – jlinz@skidmore.edu

Skidmore College, 815 N. Broadway, Saratoga Springs, NY, 12866, United States

Christian Howat
Skidmore College, Class of 2022
815 N. Broadway
Saratoga Springs, NY 12866

Popular version of 4aMU5-Atom Tones: investigating waveforms and spectra of atomic elements in an audible periodic chart using techniques found in music production, presented at the 183rd ASA Meeting.

Atom Tones is an audible periodic table that allows us to identify elements through sound and to investigate the atomic world with methods used by sound engineers. The periodic table of Atom Tones can be accessed on the Atom Tones website. The Atom Music project was introduced in 2019 and explained the background ideas for creating audible tones for each atom. Each tone is clearly unique and can be used to identify the element by its sound. Audible tones can also be used in conjunction with the visual interpretations of the sound's waveform to possibly gain insight into the atom.

In the same way that sunlight can be decomposed into individual colors of the rainbow, light produced from different elements can be decomposed into rainbow-like patterns that are unique to that element. The rainbow colors of the element appear as a series of bright lines known as spectral lines, or atomic spectra. Figure 1 shows examples of several element patterns, along with the element’s signature tone. The pattern of lines is unique to each atom.

Spectral lines produced by carbon. Image courtesy of Linz original paper (Proceedings of Meetings on Acoustics)
Spectral lines produced by nitrogen. Image courtesy of Linz original paper (Proceedings of Meetings on Acoustics)
Spectral lines produced by oxygen. Image courtesy of Linz original paper (Proceedings of Meetings on Acoustics)
Figure 1: Spectral lines produced by three different elements. These lines are unique to each element and are used to identify it. The tones can be heard by clicking on each image. Image courtesy of Linz original paper (Proceedings of Meetings on Acoustics)

The relationship between music and physics is so intertwined that translating the spectral lines into sound is a relatively easy thing to do. Tedious perhaps, but not difficult. We can translate those colors into sounds of varying frequency, or pitch. These frequencies act like notes in a scale that can be played individually or combined. It is with these notes that we created the sounds of the elements.
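The translation can be sketched as additive synthesis: scale each optical line frequency down into the audible range and render it as one sine partial, then sum the partials into the element's tone. The scale factor below is illustrative only; the actual Atom Tones mapping is described in the authors' paper.

```python
import math

def atom_tone(line_freqs_thz, sample_rate=44100, duration=0.5, scale=1e-12):
    """Additive synthesis from a set of spectral-line frequencies.

    Each optical line (in THz) is scaled down into the audible range and
    rendered as one sine partial; the partials sum to the element's tone.
    The scale factor here is illustrative, not the published mapping.
    """
    audible = [f * 1e12 * scale for f in line_freqs_thz]  # THz -> audible Hz
    n = int(sample_rate * duration)
    tone = []
    for i in range(n):
        t = i / sample_rate
        tone.append(sum(math.sin(2 * math.pi * f * t) for f in audible)
                    / len(audible))
    return tone

# Three hypothetical lines in the visible-light band (~450-650 THz)
samples = atom_tone([450.0, 520.0, 650.0])
```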

A sound engineer can easily identify specific types of musical instruments as well as the musical intervals and chords played by those instruments by observing the digital waveforms and spectra produced in a recording, in addition to simply listening by ear. Digital audio software adds an extra layer of insight to the sound. Figure 2 shows the different waveforms and spectral lines for a French Horn and Bassoon each playing the same note, D3.

Figure 2: Waveform and spectra of a French horn compared to a bassoon. Image courtesy of Linz original paper (Proceedings of Meetings on Acoustics)

Using the techniques developed for audio recording and music synthesis, we can create an audible representation of each element. Possible ways to interpret the tones produced are being investigated. Figure 3 shows the waveforms and spectra for a few elements that exhibit wave patterns that repeat themselves. This is what a sound engineer would expect to see when the recording sounds harmonic, or musical.

Figure 3: A few atom tones whose waveforms exhibit similar repeating patterns. Image courtesy of Linz, Howat original paper (Proceedings of Meetings on Acoustics)

Other combinations of elements exhibit very different patterns. The software allows you to zoom in and observe the pattern from different perspectives. Not only are we hearing the atoms for the first time, perhaps we are also seeing them in a new light.

The Impact of Formal Musical Training on Speech Comprehension in Heavily Distracting Environments

Alexandra Bruder – alexandra.l.bruder@vanderbilt.edu

Vanderbilt University Medical Center, Department of Anesthesiology, 1211 21st Avenue South, Medical Arts Building, Suite 422, Nashville, TN, 37212, United States

Joseph Schlesinger – joseph.j.schlesinger@vumc.org
Twitter: @DrJazz615

Vanderbilt University Medical Center
Nashville, TN 37205
United States

Clayton D Rothwell – crothwell@infoscitex.com
Infoscitex Corporation, a DCS Company
Dayton, OH, 45431
United States

Popular version of 1pMU4-The Impact of Formal Musical Training on Speech Intelligibility Performance – Implications for Music Pedagogy in High-Consequence Industries, presented at the 183rd ASA Meeting.

Imagine being a waiter… everyone in the restaurant is speaking, music is playing, and co-workers are trying to get your attention, causing you to miss the customer’s order. Communication is necessary but can be hindered due to distractions in many environments, especially in high-risk environments, such as aviation, nuclear power, and healthcare, where miscommunication is a frequent contributing factor to accidents and loss of life. In domains where multitasking is necessary and timely and accurate responses must be ensured, does formal music training help performance?

We used an audio-visual task to test whether formal music training can be useful in multitasking environments. Twenty-five students from Vanderbilt University participated in the study and were separated into groups based on their level of formal music training: none, 1-3 years, 3-5 years, and 5+ years. Participants were given three tasks to attend to: a speech comprehension task (modeling distracted communication), a complex visual distraction task (modeling a clinical patient monitor), and an easy visual distraction task (modeling an alarm monitoring task). These tasks were completed in the presence of a combination of alarms and/or background noise, with and without background music.

Image courtesy of Bruder et al. original paper (Psychology of Music).

Our analysis of the audio comprehension task showed that the group with the most formal music training did not change its response rate when background music was added, while all the other groups did. In other words, with enough music training, background music stops influencing whether participants respond. Additionally, how often participants responded to the audio task depended on their degree of formal music training: participants with no formal training had the highest response rate, followed by the 1-3-year group, then the 3-5-year group, with the 5+ year group responding least often. However, accuracy was similar across all participants, and it decreased for every group when background music was playing. Given the similar accuracy among groups but less frequent responding with more formal music training, it appears that formal music training helps participants hold back when they don't know the answer.

Image courtesy of Bruder et al. original paper (Psychology of Music).

Why does this matter? There are many situations where responding and getting something wrong is more detrimental than not responding, especially under time pressure, when mistakes are costly to correct. Although accuracy was similar across all groups, the groups with some formal music training seemed to respond with overconfidence: they did not know enough to increase their accuracy, a potentially dangerous combination. This contrasts with the 5+ year group, who showed no effect of background music on response rate, used their trained ears to better judge how well they understood the information, and were less eager to respond to a difficult task under distraction. It turns out those middle school band lessons paid off after all – that is, if you work in a distracting, multitasking environment.