–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Underwater Radiated Noise (URN) is the sound that objects such as ships or submarines emit into the water. It is generated by various sources, including the vessel’s machinery, propellers, and movement through the water. For naval vessels, URN is critically important because it can be detected underwater, which directly affects a vessel’s ability to remain undetected and therefore its survivability. Various studies have accordingly been conducted to reduce the URN of submarines so that they can maintain stealth and silence.
This study focuses on the ‘absorptive fluid silencer’ installed in piping to reduce noise from the complex machinery system. An absorptive fluid silencer is similar to a car muffler, reducing noise by placing sound-absorbing materials inside.
We measured how well the silencer reduced noise by comparing sound levels at the beginning and end of the silencer. Polyurethane, a porous elastic material, was used as the internal sound-absorbing material, and five types of absorbent materials suitable for actual manufacturing were selected. By applying a ‘global optimization method,’ we designed a high-performance ‘fluid silencer.’
The above graph shows a partial analysis result. It can be observed that using composite absorbing materials provides superior sound absorption performance compared to using a single absorbing material.
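The comparison of sound levels at the two ends of the silencer can be sketched as a simple decibel calculation. This is only an illustration, assuming that RMS sound pressures are available at the inlet and outlet; the function name and the example values are hypothetical and are not taken from the study.

```python
import math


def noise_reduction_db(p_in_rms, p_out_rms):
    """Noise reduction in dB from RMS sound pressures (in Pa).

    p_in_rms:  pressure at the beginning (inlet) of the silencer
    p_out_rms: pressure at the end (outlet) of the silencer
    """
    if p_in_rms <= 0 or p_out_rms <= 0:
        raise ValueError("RMS pressures must be positive")
    # A pressure ratio maps to decibels as 20 * log10(ratio).
    return 20.0 * math.log10(p_in_rms / p_out_rms)


# Hypothetical example: the outlet pressure is one tenth of the inlet pressure.
print(round(noise_reduction_db(1.0, 0.1), 1))
```

A larger value means the silencer removes more of the noise; a tenfold drop in pressure corresponds to 20 dB.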
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Have you ever listened to a song and later been surprised to hear the artist speak with a different accent than the one you heard in the song? Take country singer Keith Urban’s song “What About Me” for instance; when listening, you might assume that he has a Southern American (US) English accent. However, in his interviews, he speaks with an Australian English accent. So why did you think he sounded Southern?
Research suggests that specific accents or dialects are associated with musical genres [2], that singers adjust their accents based on genre [4], and that foreign accents are more difficult to recognize in songs than in speech [5]. However, when listeners perceive an accent in a song, it is unclear which type of information they rely on: the acoustic speech information or information about the musical genre. Our previous research investigated this question for Country and Reggae music and found that genre recognition may play a larger role in dialect perception than the actual sound of the voice [9].
Our current study explores American Blues and Folk music, genres that allow for easier separation of vocals from instrumentals, with more refined stimuli manipulation. Blues is strongly associated with African American English [3], while Folk can be associated with a variety of (British, American, etc.) dialects [1]. Participants listened to manipulated clips of sung and “spoken” lines taken from songs in both genres, which were transcribed for participants (see Figure 1). AI applications were used to remove instrumentals for both sung and spoken clips, while “spoken” clips also underwent rhythm and pitch normalization so that they sounded like spoken rather than sung speech. After hearing each sung or spoken line, participants were asked to identify the dialect they heard from six options [7, 8] (see Figure 2).
Figure 1: Participant view of a transcript from a Folk song clip.
Figure 2: Participant view of six dialect options after hearing a clip.
Participants were much more confident and accurate in categorizing accents for clips in the Sung condition, regardless of genre. The proportion of uncertainty (“Not Sure” responses) in the Spoken condition was consistent across genres (see “D” in Figure 3), suggesting that participants were more certain of dialect when musical cues were present. Dialect categories followed genre expectations, as can be seen from the increase in identifying African American English for Blues in the Sung condition (see “A”). Removing uncertainty by adding genre cues did not increase the likelihood of “Irish English” or “British English” being chosen for Blues, though it did for Folk (see “B” and “C” in Figure 3), in line with genre-based expectations.
Figure 3: Participant dialect responses.
These findings enhance our understanding of the relationship between musical genre and accent. Referring again to the example of Keith Urban, the singer’s stylistic accent change may not be the only culprit for our interpretation of a Southern drawl. Rather, we may have assumed we were listening to a musician with a Southern American English accent when we heard the first banjo-like twang or tuned into iHeartCountry Radio. When we listen to a song and perceive a singer’s accent, we are not only listening to the sounds of their speech, but are also shaping our perception through our expectations of dialect based on the musical genre.
References:
Carrigan, J., Henry L. (2004). Lornell, Kip. The NPR Curious Listener’s Guide to American Folk Music. Library Journal (1976), 129(19), 63.
De Timmerman, Romeo, et al. (2024). The globalization of local indexicalities through music: African‐American English and the blues. Journal of Sociolinguistics, 28(1), 3–25. https://doi.org/10.1111/josl.12616.
Gibson, A. M. (2019). Sociophonetics of popular music: insights from corpus analysis and speech perception experiments [Doctoral dissertation, University of Canterbury]. http://dx.doi.org/10.26021/4007.
Mageau, M., Mekik, C., Sokalski, A., & Toivonen, I. (2019). Detecting foreign accents in song. Phonetica, 76(6), 429–447. https://doi.org/10.1159/000500187.
RStudio. (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA. http://www.rstudio.com/.
Stoet, G. (2010). PsyToolkit – A software package for programming psychological experiments using Linux. Behavior Research Methods, 42(4), 1096-1104.
Stoet, G. (2017). PsyToolkit: A novel web-based method for running online questionnaires and reaction-time experiments. Teaching of Psychology, 44(1), 24-31.
Walter, M., Bengtson, G., Maitinsky, M., Islam, M. J., & Gick, B. (2023). Dialect perception in song versus speech. The Journal of the Acoustical Society of America, 154(4_supplement), A161. https://doi.org/10.1121/10.0023131.
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Hong Kong SAR
Andrew Brian Horner horner@cse.ust.hk
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Hong Kong SAR
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Music speaks to us across cultures, but can the instruments we choose shape our emotions in different ways?
This study compares the emotional responses evoked by two similar yet culturally distinct string instruments: the Chinese erhu and the Western violin. Both are bowed string instruments, but they have distinct sounds and cultural roles that could lead listeners to experience different emotions. Our research focuses on whether these instruments, along with variations in performance and listener familiarity, influence emotional intensity in unique ways.
Western violin performance example: violinist Ray Chen playing ‘Mendelssohn Violin Concerto in E minor, Op. 64’
Chinese erhu performance example: erhu player Guo Gan playing the Chinese piece ‘Horse Racing’ (feat. Pianist Lang Lang)
To explore these questions, we conducted three online listening experiments. Participants were asked to listen to a series of short musical pieces performed on both the erhu and violin. They then rated each piece using two emotional measures: specific emotion categories (such as happy, sad, calm, and agitated) and emotional positivity and intensity.
Our results show clear emotional differences between the instruments. The violin often evokes positive, energetic emotions, which may be due to its bright tone and dynamic range. By contrast, the erhu tends to evoke sadness, possibly because of its softer timbre and its traditional association with melancholy in Chinese music.
Interestingly, familiarity with the instrument played a significant role in listeners’ emotional responses. Those who were more familiar with the violin rated the pieces as more emotionally intense, suggesting that cultural background and previous exposure shape how we emotionally connect with music. However, our analysis also found that different performances of the same piece generally did not change emotional ratings, emphasizing that the instrument itself is a major factor in shaping our emotional experience.
These findings open new paths for understanding how cultural context and personal experiences influence our emotional reactions to music. The distinct emotional qualities of the erhu and violin reveal how musical instruments can evoke different emotional responses, even when playing the same piece.
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Imagine being in a voice lesson, and as you try to hit a high note, your voice coach says, “suppress your tongue” or “pretend your tongue doesn’t exist!” What does this mean, and why do singers do this?
One vocal technique used by professional singers is to sing in different vocal registers. Generally, a man’s natural speaking voice and the voice people use to sing lower notes are called the chest voice; you can feel a vibration in your chest if you place your hand over it as you vocalize. When moving to higher notes, singers shift to their head voice, where vibrations feel stronger in the head. But what role does the tongue play in this transition? Do all singers, including amateurs, naturally adjust their tongue when switching registers, or is this adjustment a learned skill?
Figure 1: Approximate location of feeling/sensation for chest and head voice.
We are interested in vowels and the pitch range during the passaggio, which is the shift or transition point between different vocal registers. The voice is very unstable and prone to audible cracking during the passaggio, and singers are trained to navigate it smoothly. We also know that different vowels are produced in different locations in the mouth and possess different qualities. One way that singers successfully navigate the passaggio is by altering the vowel through slight adjustments to tongue shape. To study this, we utilized ultrasound imaging to monitor the position and shape of the tongue while participants with varying levels of vocal training sang vowels across their pitch range, similar to a vocal warm-up.
Video 1: Example of ultrasound recording
The results indicated that, in head voice, the tongue is generally positioned higher in the mouth than in chest voice. Unsurprisingly, this difference is more pronounced for certain vowels than for others.
Figure 2: Tongue position in chest and head voice for front and back vowel groups. Overlapping shades indicate that there is virtually no difference.
Singers’ tongues are also shaped by training. Recall the voice coach’s advice to lower your jaw and tongue while singing—this technique is employed to create more space in the mouth to enhance resonance and vocal projection. Indeed, trained singers generally have a lower overall tongue position.
As professional singers’ transitions between registers sound more seamless, we speculated that trained singers would exhibit smaller differences in tongue position between registers than untrained singers, who have less developed tongue control. In fact, it turns out that the opposite is true: the tongue behaves differently in chest voice and head voice, but only for individuals with vocal training.
Figure 3: Tongue position in chest and head voice for singers with different levels of training.
In summary, our research suggests that tongue adjustments for register shifts may be a learned technique. The manner in which singers adjust their tongues for different vowels and vocal registers could be an essential component in achieving a seamless transition between registers, as well as in the effective use of various vocal qualities. Understanding the interactions among vowels, registers, and the tongue provides insight into the mechanisms of human vocal production and voice pedagogy.
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
During our daily tasks, we spend a lot of time getting things done. When walking, some people may find it boring and feel like time drags on. On the other hand, some see it as a chance to think and plan ahead. Our researchers believe that we can use this short period of time to help people rebalance their emotions. This way, individuals can feel refreshed and energized as they walk to their next destination.
Our idea is to provide each participant with a specific music playlist to listen to while walking. The playlists consisted of Uplifting, Relaxing, Angry, and Sad music, each lasting 15 minutes. While walking, listeners used our Emotion Equalization App (Figures 1a to 1d) to access the playlist; the app also collected all users’ data.
Figures 1a to 1d: The interface of the Emotion Equalization App
Our key measure was the change in emotions. To understand the listeners’ emotions, we used the Self-Assessment Manikin scale (SAM), a visual tool that helps depict emotions based on internal energy levels and mood positivity (refer to Figure 2). After the tests, we analyzed how their emotions changed from before to after listening to the music.
Figure 2: The Self-Assessment Manikin scale, showing energy levels at the top and mood positivity at the bottom [1]
The study found that the type of music influenced how far participants walked. Those listening to Uplifting music walked the farthest, followed by Angry, Relaxing, and Sad music. As expected, the music’s energy appeared to affect the participants’ physical energy.
So, if music can affect physical energy, can it also have a positive effect on emotions? Can negative music help with mood regulation? Unexpectedly, Angry music turned out to be the most effective therapeutic music for walking: listening to it while walking not only elevated internal energy levels but also promoted positive feelings. Uplifting and Sad music, on the other hand, only elicited positive emotions in listeners, and Relaxing music did not contribute to increased internal energy levels or positive feelings. This result challenges the common impression of how music works therapeutically during walking. Angry music has a negative vibe, but in our study it helped individuals relieve stress while walking, ultimately enhancing internal energy and mood.
If you’re having a tough day, consider listening to an Angry music playlist while taking a walk. It can help in balancing your emotions and uplifting your mood for your next activity.
[1] A. Mehrabian and J. A. Russell, An Approach to Environmental Psychology. Cambridge, MA, US: The MIT Press, 1974, pp. xii, 266.
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
On Saturday, July 13, 2024, thousands of supporters attended an outdoor rally held by presidential candidate Donald J. Trump at the Butler Farm Show grounds in Butler, Pennsylvania. Shortly after Mr. Trump began speaking, gunshots rang out. Several individuals in the crowd were seriously wounded, one of them fatally.
While the gunfire was clearly audible to thousands at the scene—and soon millions online—many significant details of the incident could only be discerned by the science of audio forensic analysis. More than two dozen mobile phone videos from rally attendees provided an unprecedented amount of relevant audio information for quick forensic analysis. Audio forensic science identified a total of ten gunshots: eight shots from a single location later determined to be the perpetrator’s perch, and two shots from law enforcement rifles.
In our era of rapid spread of speculative rumors on the internet, the science of audio forensics was critically important in quickly documenting and confirming the actual circumstances from the Trump rally scene.
Where did the shots come from?
Individuals near the stage described hearing pop pop pop noises that they reported to be “small-arms fire.” However, scientific audio forensic examination of the audio picked up by the podium microphones immediately revealed that the gunshot sounds were not small-arms fire as the earwitnesses had reported, but instead showed the characteristic sounds of supersonic bullets from a rifle.
When a bullet travels faster than sound, it creates a small sonic boom that moves with the bullet as it travels down range. A microphone near the bullet’s path will pick up the “crack” of the bullet passing by, and then a fraction of a second later, the familiar “bang” of the gun’s muzzle blast arrives at the microphone (see Figure 1).
Figure 1: Sketch depicting the position of the supersonic bullet’s shock wave and the firearm’s muzzle blast.
From the Trump rally, audio forensic analysis of the first audible shots in the podium microphone recording showed the “crack” sound due to the supersonic bullet passing the microphone, followed by the “bang” sound of the firearm’s muzzle blast. Only a small fraction of a second separated the “crack” and the “bang” for each audible shot, but the audio forensic measurement of those tiny time intervals (see Figure 2) was sufficient to estimate that the shooter was 130 meters from the microphone, a little more than the length of a football field away. The acoustic prediction was soon confirmed when the body of the presumed perpetrator was found on a nearby rooftop, precisely that distance away from the podium.
Figure 2: Stereo audio waveform and spectrogram from podium microphone recording showing the first three shots (A, B, C), with manual annotation.
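The crack-to-bang timing described above can be turned into a rough distance estimate. The sketch below is a simplified illustration rather than the analysis actually performed: it assumes the microphone sits essentially on the bullet’s path, that the bullet travels at a constant speed, and that sound travels at 343 m/s. The bullet speed of 900 m/s and both function names are hypothetical choices for the example.

```python
def crack_bang_delay_s(distance_m, bullet_speed_mps, sound_speed_mps=343.0):
    """Delay between the bullet's 'crack' and the muzzle 'bang' at the microphone.

    Simplification: the crack arrives when the bullet does (distance / bullet
    speed), and the bang arrives at the speed of sound (distance / sound speed).
    """
    return distance_m / sound_speed_mps - distance_m / bullet_speed_mps


def shooter_distance_m(delay_s, bullet_speed_mps, sound_speed_mps=343.0):
    """Invert the relation above to estimate the shooter's distance."""
    if bullet_speed_mps <= sound_speed_mps:
        raise ValueError("bullet must be supersonic for a crack-bang delay")
    return delay_s / (1.0 / sound_speed_mps - 1.0 / bullet_speed_mps)


# Hypothetical round trip: a shooter 130 m away and an assumed 900 m/s bullet.
delay = crack_bang_delay_s(130.0, 900.0)
print(f"delay: {delay:.3f} s")
print(f"estimated distance: {shooter_distance_m(delay, 900.0):.1f} m")
```

Under these assumptions a shooter 130 meters away produces a delay of about a quarter of a second, which is why such a small interval is enough for a useful estimate. A real analysis would also need to account for bullet deceleration, air temperature, and the geometry between the bullet’s path and the microphone.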
How many shots were fired?
The availability of nearly two dozen video and audio recordings of the gunfire from bystanders at locations all around the venue offered a remarkable audio forensic opportunity, and our audio forensic analysis identified a total of ten gunshots, labeled A-J in Figure 3.
Figure 3: User-generated mobile phone recording from a location near the sniper’s position, showing the ten audible gunshots.
The audio forensic analysis revealed that the first eight shots (labeled A-H) came from the identified perpetrator’s location, because all the available recordings gave the same time sequence between each of those first eight shots. This audio forensic finding was confirmed later when officials released evidence that eight spent shell casings had been recovered from the perpetrator’s location on the rooftop.
Comparing the multiple audio recordings, the two additional audible shots (I and J) did not come from the perpetrator’s location, but from two different locations. Audio forensic analysis placed shot “I” as coming from a location northeast of the podium. Matching the audio forensic analysis, officials later confirmed that shot “I” came from a law enforcement officer firing toward the perpetrator from the ground northeast of the podium. The final audible shot “J” came from a location south of the podium. Again, consistent with the audio forensic analysis, officials confirmed that shot “J” was the fatal shot at the perpetrator by a Secret Service counter-sniper located on the roof of a building southeast of the podium.
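The reasoning behind comparing recordings can be sketched in a few lines. Shots fired from one fixed location reach a stationary recorder after the same travel time, so the intervals between those shots match in every recording; a shot from a different location breaks that match. The timestamps, tolerance, and function names below are hypothetical and serve only to illustrate the idea.

```python
def inter_shot_intervals(times_s):
    """Time gaps between consecutive shot arrivals in one recording."""
    return [later - earlier for earlier, later in zip(times_s, times_s[1:])]


def intervals_match(reference_times_s, other_times_s, tolerance_s=0.01):
    """For each gap, report whether two recordings agree within a tolerance."""
    ref = inter_shot_intervals(reference_times_s)
    other = inter_shot_intervals(other_times_s)
    return [abs(r - o) <= tolerance_s for r, o in zip(ref, other)]


# Hypothetical arrival times (seconds) of four shots in two recordings.
# The first three shots share a source; the fourth comes from elsewhere.
recording_a = [0.00, 0.80, 1.60, 4.00]
recording_b = [0.30, 1.10, 1.90, 4.15]
print(intervals_match(recording_a, recording_b))  # [True, True, False]
```

The recordings do not need synchronized clocks, because only the gaps between shots are compared, not the absolute timestamps.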
Analysis of sounds from the Trump rally accurately described the location and characteristics of the audible gunfire, and helped limit the spread of rumors and speculation after the incident. While the unique audio forensic viewpoint cannot answer every question, this incident demonstrated that many significant details of timing, sound identification, and geometric orientation can be discerned and documented using the science of audio forensic analysis.
Please feel free to contact the author for more information.