How Do the Erhu and Violin Shape Our Emotions? A Cross-Cultural Study

Wenyi Song – wsongak@cse.ust.hk
Twitter: @sherrys72539831

Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Hong Kong SAR

Andrew Brian Horner
horner@cse.ust.hk
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Hong Kong SAR

Popular version of 1aMU3 – Emotional characteristics of the erhu and violin: a comparative study of emotional intensity in musical excerpts
Presented at the 187th ASA Meeting
Read the abstract at https://eppro01.ativ.me//web/index.php?page=IntHtml&project=ASAFALL24&id=3767558

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–


Music speaks to us across cultures, but can the instruments we choose shape our emotions in different ways?

This study compares the emotional responses evoked by two similar yet culturally distinct string instruments: the Chinese erhu and the Western violin. Both are bowed string instruments, but they have distinct sounds and cultural roles that could lead listeners to experience different emotions. Our research focuses on whether these instruments, along with variations in performance and listener familiarity, influence emotional intensity in unique ways.

Western violin performance example: violinist Ray Chen playing ‘Mendelssohn Violin Concerto in E minor, Op. 64’

 

Chinese erhu performance example: erhu player Guo Gan playing the Chinese piece ‘Horse Racing’ (feat. Pianist Lang Lang)

 

To explore these questions, we conducted three online listening experiments. Participants were asked to listen to a series of short musical pieces performed on both the erhu and violin. They then rated each piece using two emotional measures: specific emotion categories (such as happy, sad, calm, and agitated) and emotional positivity and intensity.

Our results show clear emotional differences between the instruments. The violin often evokes positive, energetic emotions, which may be due to its bright tone and dynamic range. By contrast, the erhu tends to evoke sadness, possibly because of its softer timbre and its traditional association with melancholy in Chinese music.

Interestingly, familiarity with the instrument played a significant role in listeners’ emotional responses. Those who were more familiar with the violin rated the pieces as more emotionally intense, suggesting that cultural background and previous exposure shape how we emotionally connect with music. However, our analysis also found that different performances of the same piece generally did not change emotional ratings, emphasizing that the instrument itself is a major factor in shaping our emotional experience.

These findings open new paths for understanding how cultural context and personal experiences influence our emotional reactions to music. The distinct emotional qualities of the erhu and violin reveal how musical instruments can evoke different emotional responses, even when playing the same piece.

How voice training changes the tongue in chest versus head voice

Jiu Song – jiusongjd@gmail.com
Integrated Speech Research Lab
University of British Columbia
Vancouver, British Columbia, V6T 1Z4
Canada

Additional authors:
Jaida Siu – jaidasiu@gmail.com
Jahurul Islam – jahurul.islam@ubc.ca
Bryan Gick – gick@mail.ubc.ca

Popular version of 1aMU8 – Effect of years of voice training on chest and head register tongue shape variability
Presented at the 187th ASA Meeting
Read the abstract at https://eppro01.ativ.me/web/page.php?page=IntHtml&project=ASAFALL24&id=3767562

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–


Imagine being in a voice lesson, and as you try to hit a high note, your voice coach says, “suppress your tongue” or “pretend your tongue doesn’t exist!” What does this mean, and why do singers do this?

One vocal technique used by professional singers is to sing in different vocal registers. Generally, a man’s natural speaking voice and the voice people use to sing lower notes is called the chest voice—you can feel a vibration in your chest if you place your hand over it as you vocalize. When moving to higher notes, singers shift to their head voice, where vibrations feel stronger in the head. However, what role does the tongue play in this transition? Do all singers, including amateurs, naturally adjust their tongue when switching registers, or is this adjustment a learned skill?

Figure 1: Approximate location of feeling/sensation for chest and head voice.

We are interested in vowels and the pitch range during the passaggio, which is the shift or transition point between different vocal registers. The voice is very unstable and prone to audible cracking during the passaggio, and singers are trained to navigate it smoothly. We also know that different vowels are produced in different locations in the mouth and possess different qualities. One way that singers successfully navigate the passaggio is by altering the vowel through slight adjustments to tongue shape. To study this, we utilized ultrasound imaging to monitor the position and shape of the tongue while participants with varying levels of vocal training sang vowels across their pitch range, similar to a vocal warm-up.

Video 1: Example of ultrasound recording

The results indicated that, in head voice, the tongue is generally positioned higher in the mouth than in chest voice. Unsurprisingly, this difference is more pronounced for certain vowels than for others.

Figure 2: Tongue position in chest and head voice for front and back vowel groups. Overlapping shades indicate that there is virtually no difference.

Singers’ tongues are also shaped by training. Recall the voice coach’s advice to lower your jaw and tongue while singing—this technique is employed to create more space in the mouth to enhance resonance and vocal projection. Indeed, trained singers generally have a lower overall tongue position.

As professional singers’ transitions between registers sound more seamless, we speculated that trained singers would exhibit smaller differences in tongue position between registers than untrained singers, who have less developed tongue control. In fact, it turns out that the opposite is true: the tongue behaves differently in chest voice and head voice, but only for individuals with vocal training.

Figure 3: Tongue position in chest and head voice for singers with different levels of training.

In summary, our research suggests that tongue adjustments for register shifts may be a learned technique. The manner in which singers adjust their tongues for different vowels and vocal registers could be an essential component in achieving a seamless transition between registers, as well as in the effective use of various vocal qualities. Understanding the interactions among vowels, registers, and the tongue provides insight into the mechanisms of human vocal production and voice pedagogy.

Walk to the Beat: How Your Playlist Can Shape Your Emotional Balance

Man Hei LAW – mhlawaa@connect.ust.hk

Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong

Andrew HORNER
Computer Science and Engineering
Hong Kong University of Science and Technology
Hong Kong

Popular version of 1aCA2 – Exploring the Therapeutic Effects of Emotion Equalization App During Daily Walking Activities
Presented at the 187th ASA Meeting
Read the abstract at https://eppro01.ativ.me/web/index.php?page=Inthtml&project=ASAFALL24&id=3771973

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–


We spend much of each day getting things done, and walking from place to place is part of that routine. Some people find walking boring and feel that time drags on. Others see it as a chance to think and plan ahead. We believe this short period of time can be used to help people rebalance their emotions, so that they feel refreshed and energized as they walk to their next destination.

Our idea is to provide each participant with a specific music playlist to listen to while walking. The playlists consisted of Uplifting, Relaxing, Angry, and Sad music, each lasting 15 minutes. While walking, listeners used our Emotion Equalization App (Figures 1a to 1d), which both played the playlist and collected the users’ data.

Figures 1a to 1d: The interface of the Emotion Equalization App

The key data we focused on were changes in emotion. To understand the listeners’ emotions, we used the Self-Assessment Manikin scale (SAM), a visual tool that depicts emotions in terms of internal energy level and mood positivity (refer to Figure 2). After the tests, we analyzed how listeners’ emotions changed from before to after listening to the music.

Figure 2: The Self-Assessment Manikin scale, showing energy levels at the top and mood positivity at the bottom [1]

The study found that the type of music influenced how far participants walked. Those listening to Uplifting music walked the farthest, followed by Angry, Relaxing, and Sad music. As expected, the music’s energy appeared to affect the participants’ physical energy.

So, if music can affect physical energy, can it also have a positive effect on emotions? Can negative music help with mood regulation? Unexpectedly, Angry music turned out to be the most effective therapeutic music for walking: listening to it while walking not only raised internal energy levels but also promoted positive feelings. Uplifting and Sad music, on the other hand, only elicited positive emotions in listeners, and Relaxing music did not contribute to increased internal energy levels or positive feelings. This result challenges the usual impression of how music should be used therapeutically during walking. Angry music has a negative vibe, but our study found it beneficial in helping individuals relieve stress while walking, ultimately enhancing internal energy and mood.

If you’re having a tough day, consider listening to an Angry music playlist while taking a walk. It can help in balancing your emotions and uplifting your mood for your next activity.

[1] A. Mehrabian and J. A. Russell, An Approach to Environmental Psychology. Cambridge, MA, US: The MIT Press, 1974, pp. xii, 266.

The Trump Rally Shooting: Listening to an Assassination Attempt

Robert C Maher – rmaher@montana.edu

Montana State University, Electrical and Computer Engineering Department, PO Box 173780, Bozeman, MT, 59717-3780, United States

Popular version of 3pSP10 – Interpreting user-generated recordings from the Trump assassination attempt on July 13, 2024
Presented at the 187th ASA Meeting
Read the abstract at https://eppro01.ativ.me//web/index.php?page=IntHtml&project=ASAFALL24&id=3771549

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–


On Saturday, July 13, 2024, thousands of supporters attended an outdoor rally held by presidential candidate Donald J. Trump at the Butler Farm Show grounds in Butler, Pennsylvania. Shortly after Mr. Trump began speaking, gunshots rang out. One person in the crowd was killed and two others were seriously wounded.

While the gunfire was clearly audible to thousands at the scene—and soon millions online—many significant details of the incident could only be discerned by the science of audio forensic analysis. More than two dozen mobile phone videos from rally attendees provided an unprecedented amount of relevant audio information for quick forensic analysis. Audio forensic science identified a total of ten gunshots: eight shots from a single location later determined to be the perpetrator’s perch, and two shots from law enforcement rifles.

In our era of rapid spread of speculative rumors on the internet, the science of audio forensics was critically important in quickly documenting and confirming the actual circumstances from the Trump rally scene.

 

Where did the shots come from?

Individuals near the stage described hearing pop pop pop noises that they reported to be “small-arms fire.” However, scientific audio forensic examination of the audio picked up by the podium microphones immediately revealed that the gunshot sounds were not small-arms fire as the earwitnesses had reported, but instead showed the characteristic sounds of supersonic bullets from a rifle.

When a bullet travels faster than sound, it creates a small sonic boom that moves with the bullet as it travels down range. A microphone near the bullet’s path will pick up the “crack” of the bullet passing by, and then a fraction of a second later, the familiar “bang” of the gun’s muzzle blast arrives at the microphone (see Figure 1).

Figure 1: Sketch depicting the position of the supersonic bullet’s shock wave and the firearm’s muzzle blast.

 

At the Trump rally, audio forensic analysis of the first audible shots in the podium microphone recording showed the “crack” sound due to the supersonic bullet passing the microphone, followed by the “bang” sound of the firearm’s muzzle blast. Only a small fraction of a second separated the “crack” and the “bang” for each audible shot, but the audio forensic measurement of those tiny time intervals (see Figure 2) was sufficient to estimate that the shooter was 130 meters from the microphone, a little more than the length of a football field away. The acoustic prediction was soon confirmed when the body of the presumed perpetrator was found on a nearby rooftop, precisely that distance away from the podium.

Figure 2: Stereo audio waveform and spectrogram from podium microphone recording showing the first three shots (A, B, C), with manual annotation.
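
The arithmetic behind this kind of distance estimate can be sketched in a few lines of Python. This is only an illustration, assuming the microphone lies close to the bullet's path and the bullet travels at a constant average speed. The speed values (343 m/s for sound, 900 m/s for the bullet) and the function names are assumptions chosen for the example, not figures from the forensic analysis.

```python
def crack_bang_delay(distance_m, bullet_speed_mps=900.0, sound_speed_mps=343.0):
    """Delay (s) between the bullet's "crack" and the muzzle "bang".

    Assumes the microphone is near the bullet's path, so the crack arrives
    after distance / bullet_speed and the bang after distance / sound_speed.
    """
    return distance_m / sound_speed_mps - distance_m / bullet_speed_mps


def shooter_distance(delay_s, bullet_speed_mps=900.0, sound_speed_mps=343.0):
    """Invert the relation above to estimate the shooter's distance (m)."""
    if bullet_speed_mps <= sound_speed_mps:
        raise ValueError("the bullet must be supersonic for a crack to precede the bang")
    # delay = d / c - d / v   =>   d = delay / (1 / c - 1 / v)
    return delay_s / (1.0 / sound_speed_mps - 1.0 / bullet_speed_mps)


if __name__ == "__main__":
    delay = crack_bang_delay(130.0)
    print(f"expected crack-to-bang delay at 130 m: {delay * 1000:.0f} ms")
    print(f"distance recovered from that delay: {shooter_distance(delay):.1f} m")
```

With these assumed speeds, a shooter 130 meters away produces a delay of roughly a quarter of a second, which fits the description of a small fraction of a second between the two sounds.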

 

How many shots were fired?

The availability of nearly two dozen video and audio recordings of the gunfire from bystanders at locations all around the venue offered a remarkable audio forensic opportunity, and our audio forensic analysis identified a total of ten gunshots, labeled A-J in Figure 3.

Figure 3: User-generated mobile phone recording from a location near the sniper’s position, showing the ten audible gunshots.

 

The audio forensic analysis revealed that the first eight shots (labeled A-H) came from the identified perpetrator’s location, because all the available recordings gave the same time sequence between each of those first eight shots. This audio forensic finding was confirmed later when officials released evidence that eight spent shell casings had been recovered from the perpetrator’s location on the rooftop.

Comparing the multiple audio recordings, the two additional audible shots (I and J) did not come from the perpetrator’s location, but from two different locations. Audio forensic analysis placed shot “I” as coming from a location northeast of the podium. Matching the audio forensic analysis, officials later confirmed that shot “I” came from a law enforcement officer firing toward the perpetrator from the ground northeast of the podium. The final audible shot “J” came from a location south of the podium. Again, consistent with the audio forensic analysis, officials confirmed that shot “J” was the fatal shot at the perpetrator by a Secret Service counter-sniper located on the roof of a building southeast of the podium.
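
The reasoning about shot timing can be illustrated with a small simulation. Shots fired from one fixed spot reach any given recorder after the same travel delay, so the gaps between them are identical in every recording, while a shot from a different spot shifts the gap by a different amount at each recorder. The sketch below considers only muzzle-blast arrivals. The positions, firing times, tolerance, and function names are hypothetical and are not data from the rally.

```python
import math

SOUND_SPEED = 343.0  # m/s, assumed


def arrival_times(shots, recorder):
    """Muzzle-blast arrival time of each shot at one recorder.

    shots: list of (fire_time_s, (x, y)); recorder: (x, y), all in meters.
    """
    return [t + math.dist(pos, recorder) / SOUND_SPEED for t, pos in shots]


def intervals(times):
    """Gaps between consecutive arrival times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def consistent_intervals(recordings, tol_s=0.002):
    """For each inter-shot gap, check whether all recordings agree within tol_s."""
    gaps_per_recording = [intervals(times) for times in recordings]
    return [max(gap) - min(gap) <= tol_s for gap in zip(*gaps_per_recording)]


# Hypothetical layout: three shots from one spot, then a fourth from elsewhere.
source_a, source_b = (0.0, 0.0), (120.0, 60.0)
shots = [(0.0, source_a), (0.8, source_a), (1.5, source_a), (2.4, source_b)]
recorders = [(100.0, 0.0), (40.0, -80.0), (150.0, 90.0)]
recordings = [arrival_times(shots, r) for r in recorders]

if __name__ == "__main__":
    print(consistent_intervals(recordings))
```

In this toy layout the first two gaps agree across all three recorders and the last one does not, which is the pattern that separates shots from a common location from shots fired elsewhere.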

Analysis of sounds from the Trump rally accurately described the location and characteristics of the audible gunfire, and helped limit the spread of rumors and speculation after the incident. While the unique audio forensic viewpoint cannot answer every question, this incident demonstrated that many significant details of timing, sound identification, and geometric orientation can be discerned and documented using the science of audio forensic analysis.

Please feel free to contact the author for more information.

How to find the best material for making exciter-based plate speakers

David Anderson – and10445@d.umn.edu

Instagram: @earthtoneselectronics
Assistant Professor, Electrical Engineering, University of Minnesota Duluth, Duluth, Minnesota, 55812, United States

Popular version of 2aEA1 – A Method for Comparing Candidate Materials in Subjective Tests of Flat-Panel Loudspeakers
Presented at the 187th ASA Meeting
Read the abstract at https://eppro01.ativ.me//web/index.php?page=Session&project=ASAFALL24&id=3771459

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–


Exciters are devices that can be stuck to just about anything in order to make a speaker. Many DIY speaker makers wonder what material is going to sound the best for their product or home speaker project. The sound of an exciter-based plate speaker depends on many things, including the plate material, size, and shape, and where the exciter is attached. Because so many factors control the sound, it is difficult to directly compare the sound of different materials. For example, a plastic plate and an aluminum plate of the same size and thickness will have completely different frequency ranges when set up as speakers. In this paper, a method is proposed to calculate the required shape and size for any set of materials so that speakers made from them will have the same loudness and frequency range, allowing the effect of each material on the speaker sound to be compared easily.

Equations derived in the paper demonstrate that the vibrations and loudness of plates made from different materials will match when they have the same length-to-width ratio and weight (volume times density). Three materials (foam poster board, plastic, and aluminum) were chosen for comparison because they are commonly used by DIY makers to create speakers. Figure 1 shows the simulated relative loudness over a range of audio frequencies for these three materials with the same length-to-width ratio and weight. The loudness graphs mostly overlap, but they diverge at high frequencies because the ring shape of the exciter interacts differently with each material. This effect can be mitigated by using a smaller exciter.

Figure 1 – Simulated speaker loudness using three different materials with matching length-to-width ratios and weights.
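
The matching condition stated above (same length-to-width ratio and same weight, where weight is volume times density) can be turned into plate dimensions with a short calculation. The following is a minimal sketch rather than the method from the paper: it treats each plate's thickness as a given input, and the densities, thicknesses, target mass, and the function name `plate_dimensions` are illustrative assumptions.

```python
import math


def plate_dimensions(mass_kg, aspect_ratio, density_kg_m3, thickness_m):
    """Length and width (m) of a rectangular plate with a given mass and length-to-width ratio.

    Uses mass = density * length * width * thickness and length = aspect_ratio * width.
    """
    area = mass_kg / (density_kg_m3 * thickness_m)
    width = math.sqrt(area / aspect_ratio)
    return aspect_ratio * width, width


# Assumed (density in kg/m^3, thickness in m) pairs; not values from the paper.
materials = {
    "foam poster board": (60.0, 0.005),
    "plastic": (1200.0, 0.003),
    "aluminum": (2700.0, 0.001),
}

if __name__ == "__main__":
    for name, (density, thickness) in materials.items():
        length, width = plate_dimensions(0.2, 1.5, density, thickness)
        print(f"{name}: {length * 100:.1f} cm x {width * 100:.1f} cm")
```

Every plate produced this way has the same mass and the same length-to-width ratio, so lighter materials come out with a larger area than denser ones.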

Simulated plate responses are then compared with experimentally measured loudness results using actual plates made from plastic, aluminum, and foam poster board. These comparisons, shown in Figure 2, allow us to identify whether there are any material-specific deviations from the simulated response that would lend each material its unique “sound.”

Figure 2 – Simulated vs. experimentally measured plate loudness for three different materials.

The plastic and aluminum plates match their simulations closely. The aluminum plate has sharper peaks than the plastic plate, indicating a more “hollow” sound. The foam poster board does not match its simulation well, showing that this material adds a distinctive “color” to the sound at mid-range and high audio frequencies.

Applying this method to additional materials that DIY speaker builders use, such as wood, cardboard, and foam insulation, can shed light on their unique “sounds” as well.

Enhancing Speech Recognition in Healthcare

Andrzej Czyzewski – andczyz@gmail.com

Gdańsk University of Technology, Faculty of Electronics, Telecommunications and Informatics, Multimedia Systems Department, Gdańsk, Pomerania, 80-233, Poland

Popular version of 1aSP6 – Strategies for Preprocessing Speech to Enhance Neural Model Efficiency in Speech-to-Text Applications
Presented at the 187th ASA Meeting
Read the abstract at https://eppro01.ativ.me//web/index.php?page=IntHtml&project=ASAFALL24&id=3771522

–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–


Effective communication in healthcare is essential, as accurate information can directly impact patient care. This paper discusses research aimed at improving speech recognition technology to help medical professionals document patient information more effectively. By using advanced techniques, we can make speech-to-text systems more reliable for healthcare, ensuring they accurately capture spoken information.

In healthcare settings, professionals often need to quickly and accurately record patient interactions. Traditional typing can be slow and error-prone, while speech recognition allows doctors to dictate notes directly into electronic health records (EHRs), saving time and reducing miscommunication.

The main goal of our research was to test various ways of enhancing speech-to-text accuracy in healthcare. We compared several methods to help the system understand spoken language more clearly. These methods included different ways of analyzing sound, like looking at specific sound patterns or filtering background noise.

In this study, we recorded around 80,000 voice samples from medical professionals. These samples were then processed to highlight important speech patterns, making it easier for the system to learn and recognize medical terms. We used a method called Principal Component Analysis (PCA) to keep the data simple while ensuring essential information was retained.

Our findings showed that combining several techniques to capture speech patterns improved system performance. We saw an average accuracy improvement, with fewer word and character recognition errors.
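
Word and character recognition errors are commonly summarized as word error rate (WER) and character error rate (CER), both based on edit distance between a reference transcript and the system's output. The sketch below shows how such rates can be computed. The exact scoring procedure used in the study is not described here, and the example sentences and function names are invented for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference item
                            curr[j - 1] + 1,      # insertion of an extra item
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "patient reports mild chest pain"   # invented example
    hypothesis = "patient report mild chest pain"   # invented recognizer output
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
    print(f"CER: {character_error_rate(reference, hypothesis):.3f}")
```

Lower values of either rate mean the transcript is closer to what was actually said, so an improvement in accuracy shows up as a drop in both numbers.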

The potential benefits of this work are significant:

  • Smoother documentation: Medical staff can record notes more efficiently, freeing up time for patient care.
  • Improved accuracy: Patient records become more reliable, reducing the chance of miscommunication.
  • Better healthcare outcomes: Enhanced communication can improve the quality of care.

This study highlights the promise of advanced speech recognition in healthcare. With further development, these systems can support medical professionals in delivering better patient care through efficient and accurate documentation.

 

Figure 1. Front page of the ADMEDVOICE corpus, which contains medical texts and their spoken equivalents