Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong
Andrew HORNER
Computer Science and Engineering
Hong Kong University of Science and Technology
Hong Kong
Popular version of 1aCA2 – Exploring the Therapeutic Effects of Emotion Equalization App During Daily Walking Activities
Presented at the 187th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0034927
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
During our daily tasks, we spend a lot of time getting things done. When walking, some people may find it boring and feel like time drags on. On the other hand, some see it as a chance to think and plan ahead. Our researchers believe that we can use this short period of time to help people rebalance their emotions. This way, individuals can feel refreshed and energized as they walk to their next destination.
Our idea was to provide each participant with a specific music playlist to listen to while walking. The playlists consisted of Uplifting, Relaxing, Angry, and Sad music, each lasting 15 minutes. While walking, our listeners used our Emotion Equalization App (Figures 1a to 1d) to access the playlist, and the app collected all users’ data.
Figures 1a to 1d: The interface of the Emotion Equalization App
The key data we focused on were the changes in emotions. To understand the listeners’ emotions, we used the Self-Assessment Manikin scale (SAM), a visual tool that helps depict emotions based on internal energy levels and mood positivity (refer to Figure 2). After the tests, we analyzed how their emotions changed before and after listening to the music.
Figure 2: The Self-Assessment Manikin scale, showing energy levels at the top and mood positivity at the bottom [1]
The study found that the type of music influenced how far participants walked. Those listening to Uplifting music walked the farthest, followed by Angry, Relaxing, and Sad music. This was expected, since the music’s energy could affect the participants’ physical energy.
So, if music can affect physical energy, can it also have a positive effect on emotions? Can negative music help in mood regulation? Unexpectedly, Angry music turned out to be the most effective therapeutic music for walking. Listening to Angry music while walking not only elevated internal energy levels but also promoted positive feelings. Uplifting and Sad music, on the other hand, only elicited positive emotions in listeners, and Relaxing music did not contribute to increased internal energy levels or positive feelings. This result challenges the common impression of how music can be used therapeutically during walking activities. Angry music has a negative vibe, but our study found it to be beneficial in helping individuals relieve stress while walking, ultimately enhancing internal energy and mood.
If you’re having a tough day, consider listening to an Angry music playlist while taking a walk. It can help in balancing your emotions and uplifting your mood for your next activity.
[1] A. Mehrabian and J. A. Russell, An Approach to Environmental Psychology. Cambridge, MA, US: The MIT Press, 1974, pp. xii, 266.
Montana State University, Electrical and Computer Engineering Department, PO Box 173780, Bozeman, MT, 59717-3780, United States
Popular version of 3pSP10 – Interpreting user-generated recordings from the Trump assassination attempt on July 13, 2024
Presented at the 187th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0035346
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
On Saturday, July 13, 2024, thousands of supporters attended an outdoor rally held by presidential candidate Donald J. Trump at the Butler Farm Show grounds in Butler, Pennsylvania. Shortly after Mr. Trump began speaking, gunshots rang out. Several individuals in the crowd were killed or seriously wounded.
While the gunfire was clearly audible to thousands at the scene—and soon millions online—many significant details of the incident could only be discerned by the science of audio forensic analysis. More than two dozen mobile phone videos from rally attendees provided an unprecedented amount of relevant audio information for quick forensic analysis. Audio forensic science identified a total of ten gunshots: eight shots from a single location later determined to be the perpetrator’s perch, and two shots from law enforcement rifles.
In our era of rapid spread of speculative rumors on the internet, the science of audio forensics was critically important in quickly documenting and confirming the actual circumstances from the Trump rally scene.
Where did the shots come from?
Individuals near the stage described hearing pop pop pop noises that they reported to be “small-arms fire.” However, scientific audio forensic examination of the audio picked up by the podium microphones immediately revealed that the gunshot sounds were not small-arms fire as the earwitnesses had reported, but instead showed the characteristic sounds of supersonic bullets from a rifle.
When a bullet travels faster than sound, it creates a small sonic boom that moves with the bullet as it travels down range. A microphone near the bullet’s path will pick up the “crack” of the bullet passing by, and then a fraction of a second later, the familiar “bang” of the gun’s muzzle blast arrives at the microphone (see Figure 1).
Figure 1: Sketch depicting the position of the supersonic bullet’s shock wave and the firearm’s muzzle blast.
In the podium microphone recording from the Trump rally, audio forensic analysis of the first audible shots showed the “crack” of the supersonic bullet passing the microphone, followed by the “bang” of the firearm’s muzzle blast. Only a small fraction of a second separated the “crack” and the “bang” for each audible shot, but the audio forensic measurement of those tiny time intervals (see Figure 2) was sufficient to estimate that the shooter was 130 meters from the microphone, a little more than the length of a football field away. The acoustic prediction was soon confirmed when the body of the presumed perpetrator was found on a nearby rooftop, at that distance from the podium.
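The crack-to-bang reasoning can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used in the actual analysis: it assumes the bullet passes very close to the microphone and travels at a constant average speed, so the crack arrives after d / v_bullet seconds and the bang after d / c seconds. The sound speed (343 m/s) and the bullet speed used in the usage note are illustrative assumptions, not values from the recordings.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees Celsius


def crack_to_bang_interval(distance_m, bullet_speed_m_s,
                           sound_speed_m_s=SPEED_OF_SOUND_M_S):
    """Delay between the bullet's shock wave and the muzzle blast.

    Simplified model: the bullet passes right by the microphone at a
    constant average speed, so only the two travel times matter.
    """
    return distance_m / sound_speed_m_s - distance_m / bullet_speed_m_s


def shooter_distance(interval_s, bullet_speed_m_s,
                     sound_speed_m_s=SPEED_OF_SOUND_M_S):
    """Invert the model above: estimate range from a measured interval."""
    if bullet_speed_m_s <= sound_speed_m_s:
        raise ValueError("the model only applies to supersonic bullets")
    return interval_s / (1.0 / sound_speed_m_s - 1.0 / bullet_speed_m_s)
```

For example, with a hypothetical average bullet speed of 900 m/s, `crack_to_bang_interval(130.0, 900.0)` is a little under a quarter of a second, and passing that interval back to `shooter_distance` recovers 130 meters. A real analysis would also account for how far the bullet's path is from the microphone, the bullet slowing down in flight, and the air temperature.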
Figure 2: Stereo audio waveform and spectrogram from podium microphone recording showing the first three shots (A, B, C), with manual annotation.
How many shots were fired?
The availability of nearly two dozen video and audio recordings of the gunfire from bystanders at locations all around the venue offered a remarkable audio forensic opportunity, and our audio forensic analysis identified a total of ten gunshots, labeled A-J in Figure 3.
Figure 3: User-generated mobile phone recording from a location near the sniper’s position, showing the ten audible gunshots.
The audio forensic analysis revealed that the first eight shots (labeled A-H) came from the identified perpetrator’s location, because all the available recordings showed the same time intervals between those first eight shots. This audio forensic finding was confirmed later when officials released evidence that eight spent shell casings had been recovered from the perpetrator’s location on the rooftop.
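The logic behind comparing recordings can be illustrated with a small simulation. Sounds from a single location reach any given recorder with the same travel delay, so the gaps between successive shots are identical in every recording even though each phone has its own clock. A shot from a different location has a different travel delay to each phone, so the gap before it varies from recording to recording. The sketch below is only an illustration under stated assumptions: all positions, firing times, and clock offsets are invented, and only the muzzle blast is modeled (the supersonic crack is ignored).

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees Celsius


def arrival_times(shots, recorder_xy, clock_offset_s):
    """Muzzle-blast arrival times on one recorder's own clock.

    shots is a list of (fire_time_s, (x_m, y_m)) tuples.
    """
    return [
        fire_time + math.dist(source_xy, recorder_xy) / SPEED_OF_SOUND_M_S
        + clock_offset_s
        for fire_time, source_xy in shots
    ]


def intervals(times):
    """Gaps between successive arrivals in one recording."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def consistent_gaps(recordings, tolerance_s=0.005):
    """For each gap, report whether all recordings agree within tolerance."""
    gaps_per_recording = [intervals(times) for times in recordings]
    return [max(gap) - min(gap) <= tolerance_s
            for gap in zip(*gaps_per_recording)]


# Hypothetical scene: three shots from one position, a fourth from elsewhere.
ROOFTOP = (0.0, 0.0)
OTHER = (150.0, 100.0)
SHOTS = [(0.0, ROOFTOP), (1.0, ROOFTOP), (2.0, ROOFTOP), (3.0, OTHER)]
RECORDERS = [((100.0, 20.0), 5.0), ((30.0, -80.0), 12.3), ((140.0, 60.0), 0.7)]
RECORDINGS = [arrival_times(SHOTS, xy, offset) for xy, offset in RECORDERS]
```

With these invented values, `consistent_gaps(RECORDINGS)` returns `[True, True, False]`: the first two gaps match across all three recorders, while the gap before the fourth shot does not, flagging it as coming from a different position.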
Comparing the multiple audio recordings, the two additional audible shots (I and J) did not come from the perpetrator’s location, but from two different locations. Audio forensic analysis placed shot “I” as coming from a location northeast of the podium. Matching the audio forensic analysis, officials later confirmed that shot “I” came from a law enforcement officer firing toward the perpetrator from the ground northeast of the podium. The final audible shot “J” came from a location south of the podium. Again, consistent with the audio forensic analysis, officials confirmed that shot “J” was the fatal shot at the perpetrator by a Secret Service counter-sniper located on the roof of a building southeast of the podium.
Analysis of sounds from the Trump rally accurately described the location and characteristics of the audible gunfire, and helped limit the spread of rumors and speculation after the incident. While the unique audio forensic viewpoint cannot answer every question, this incident demonstrated that many significant details of timing, sound identification, and geometric orientation can be discerned and documented using the science of audio forensic analysis.
Please feel free to contact the author for more information.
Instagram: @earthtoneselectronics
Assistant Professor, Electrical Engineering, University of Minnesota Duluth, Duluth, Minnesota, 55812, United States
Popular version of 2aEA1 – A Method for Comparing Candidate Materials in Subjective Tests of Flat-Panel Loudspeakers
Presented at the 187th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0035130
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Exciters are devices that can be stuck to just about anything in order to make a speaker. Many DIY speaker makers wonder what material is going to sound the best for their product or home speaker project. The sound of an exciter-based plate speaker depends on many things, including the plate material, size, and shape, and where the exciter is attached. Because so many factors control the sound, it is difficult to directly compare the sound of materials. For example, a plastic plate and an aluminum plate of the same size and thickness will have completely different frequency ranges when set up as speakers. In this paper, a method is proposed to calculate the required shape and size of any set of materials so that speakers made from them will have the same loudness and frequency range, allowing the effect of the materials on the speaker sound to be easily compared.
Equations derived in the paper demonstrate that the vibrations and loudness of plates made from different materials will match when they have the same length-to-width ratio and weight (volume times density). Three materials (foam poster board, plastic, and aluminum) were chosen for comparison in this paper because they are commonly used by DIY makers to create speakers. Figure 1 shows the simulated relative loudness over a range of audio frequencies for plates of these three materials with the same length-to-width ratio and weight. The loudness graphs mostly overlap, but they diverge at high frequencies because the ring shape of the exciter interacts differently with each material. This effect can be mitigated by using a smaller exciter.
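The bookkeeping behind "same length-to-width ratio and weight" can be sketched as follows. This is not the paper's derivation, only the geometry implied by weight = volume times density. It treats the sheet thickness as a given input, which is an assumption made here for illustration; the text above does not say how thickness is chosen. The function name and the numbers in the usage note are hypothetical.

```python
import math


def plate_dimensions(mass_kg, aspect_ratio, density_kg_m3, thickness_m):
    """Length and width of a rectangular plate with a target mass.

    mass = density * length * width * thickness, and
    aspect_ratio = length / width, so the area and both sides follow.
    """
    if min(mass_kg, aspect_ratio, density_kg_m3, thickness_m) <= 0:
        raise ValueError("all inputs must be positive")
    area_m2 = mass_kg / (density_kg_m3 * thickness_m)
    width_m = math.sqrt(area_m2 / aspect_ratio)
    length_m = aspect_ratio * width_m
    return length_m, width_m
```

As a usage example, `plate_dimensions(0.2, 1.5, 2700.0, 0.001)` gives the sides of a 200 g aluminum plate (density about 2700 kg/m³) cut from 1 mm sheet with a 3:2 length-to-width ratio. Calling it again with another material's density and sheet thickness gives a plate of the same weight and proportions for comparison.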
Figure 1 – Simulated speaker loudness using three different materials with matching length-to-width ratios and weights.
Simulated plate responses are then compared with experimentally measured loudness results using actual plates made from plastic, aluminum, and foam poster board. These comparisons, shown in Figure 2, allow us to identify whether there are any material-specific deviations from the simulated response that would lend each material its unique “sound.”
Figure 2 – Simulated vs. experimentally measured plate loudness for three different materials.
The plastic and aluminum plates match their simulations closely. The aluminum plate has sharper peaks than the plastic plate, indicating a more “hollow” sound. The foam poster board does not match its simulation well, showing that this material adds a distinctive “color” to the sound at mid-range and high audio frequencies.
Applying this method to additional materials that DIY speaker builders use, such as wood, cardboard, and foam insulation, can shed light on their unique “sounds” as well.
Gdańsk University of Technology, Faculty of Electronics, Telecommunications and Informatics, Multimedia Systems Department, Gdańsk, Pomerania, 80-233, Poland
Popular version of 1aSP6 – Strategies for Preprocessing Speech to Enhance Neural Model Efficiency in Speech-to-Text Applications
Presented at the 187th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0034984
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Effective communication in healthcare is essential, as accurate information can directly impact patient care. This paper discusses research aimed at improving speech recognition technology to help medical professionals document patient information more effectively. By using advanced techniques, we can make speech-to-text systems more reliable for healthcare, ensuring they accurately capture spoken information.
In healthcare settings, professionals often need to quickly and accurately record patient interactions. Traditional typing can be slow and error-prone, while speech recognition allows doctors to dictate notes directly into electronic health records (EHRs), saving time and reducing miscommunication.
The main goal of our research was to test various ways of enhancing speech-to-text accuracy in healthcare. We compared several methods to help the system understand spoken language more clearly. These methods included different ways of analyzing sound, like looking at specific sound patterns or filtering background noise.
In this study, we recorded around 80,000 voice samples from medical professionals. These samples were then processed to highlight important speech patterns, making it easier for the system to learn and recognize medical terms. We used a method called Principal Component Analysis (PCA) to keep the data simple while ensuring essential information was retained.
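To show what Principal Component Analysis does to a set of feature vectors, here is a minimal pure-Python sketch. It is not the study's pipeline: the actual speech features, the number of components kept, and the software used are not given here, and in practice a numerical library would normally do this work. The sketch finds the directions of greatest variance by power iteration and projects the data onto them.

```python
import random


def mean_center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dominant_eigenpair(matrix, iterations=200):
    """Power iteration for the largest eigenvalue of a symmetric matrix."""
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    v = [rng.gauss(0.0, 1.0) for _ in matrix]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return v, eigenvalue


def pca(rows, n_components):
    """Return (projected rows, fraction of variance kept per component)."""
    centered = mean_center(rows)
    cov = covariance(centered)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    components, explained = [], []
    for _ in range(n_components):
        v, lam = dominant_eigenpair(cov)
        components.append(v)
        explained.append(lam / total_variance)
        # Deflation: remove the found direction before the next search.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in centered]
    return projected, explained
```

Given made-up three-dimensional points that lie almost along a single line, `pca(points, 1)` reduces each point to one number while keeping nearly all of the variance, which is the sense in which PCA keeps data simple while retaining the essential information.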
Our findings showed that combining several techniques to capture speech patterns improved system performance. On average, accuracy improved, with fewer word and character recognition errors.
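Word and character recognition errors are commonly summarized as word error rate (WER) and character error rate (CER): the minimum number of substitutions, insertions, and deletions needed to turn the system's output into the reference text, divided by the reference length. The sketch below assumes these standard definitions. The exact scoring used in the study is not described here, and the sentences in the usage note are invented.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (words or characters)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Character-level edits divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For the invented reference "patient reports mild chest pain" and the recognized text "patient report mild chest pain", one of five words is wrong, so the WER is 0.2, while only one of 31 characters differs, so the CER is about 0.03. Lower values on both measures mean more accurate transcription.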
The potential benefits of this work are significant:
Smoother documentation: Medical staff can record notes more efficiently, freeing up time for patient care.
Improved accuracy: Patient records become more reliable, reducing the chance of miscommunication.
Better healthcare outcomes: Enhanced communication can improve the quality of care.
This study highlights the promise of advanced speech recognition in healthcare. With further development, these systems can support medical professionals in delivering better patient care through efficient and accurate documentation.
Figure 1. Front page of the ADMEDVOICE corpus, containing medical texts and their spoken equivalents
Department of Communicative Disorders and Sciences, University at Buffalo, Buffalo, NY, 14214, United States
Popular version of 4aSCb6 – Age and category structure in phonetic category learning
Presented at the 186th ASA Meeting
Read the abstract at https://doi.org/10.1121/10.0027460
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
Imagine being a native English speaker learning to speak French for the first time. You’ll have to do a lot of learning, including learning new ways to fit words together to form sentences and a new set of words. Beyond that, though, you must also learn to tell sounds apart that you’re not used to. Even the French word for “sound”, son, is different from the word for “bucket”, seau, in a way that English speakers don’t usually pay attention to. How do you manage to learn to tell these sounds apart when you’re listening to others? You need to group those sounds into categories. In this study, museum and library visitors interacting with aliens in a simple game helped us to understand which categories people might find harder to learn. The visitors were of many different ages, which allowed us to see how this might change as we get older.
One thing that might help is starting out with the knowledge that certain types of categories are impossible. If you’re in a new city trying to choose a restaurant, it can be really daunting if you decide to investigate every single restaurant in the city. The decision becomes less overwhelming if you narrow yourself to a specific cuisine or neighborhood. Similarly, if you’re learning a new language, it might be very difficult if you entertain every possible category, but limiting yourself to certain options might help. My previous research (Heffner et al., 2019) indicated that learners might start the language learning process with biases against complicated categories, like ones that you need the word “or” to describe. For example, I can describe a day as uncomfortable in its temperature if it is too hot or too cold. We compared these complicated categories to simple ones and saw that the complicated ones were hard to learn.
In this study, I examined this sort of bias across many different ages. Brains change as we grow into adulthood and continue to change as we grow older. I was curious whether the bias we have against those complicated categories would shift with age, too. To study this, I enlisted visitors to a variety of community sites, by way of partnerships with, among others, the Buffalo Museum of Science, the Rochester Museum and Science Center, and the West Seneca Public Library, all located in Western New York. My lab brought portable equipment to those sites and recruited visitors. The visitors were able to learn about acoustics, a branch of science they had probably not heard much about before; the community spaces got a cool, interactive activity for their guests; and we as the scientists got access to a broader population than we could get sitting inside the university.
Figure 1. The three aliens that my participants got to know over the course of the experiment. Each alien made a different combination of sounds, or no sounds at all.
We told the visitors that they were park rangers in Neptune’s first national park. They had to learn which aliens in the park made which sounds. The visitors didn’t know that the sounds they were hearing were taken from German. Over the course of the experiment, they learned to group sounds together according to categories that we made up from the German speech sounds. What we found is that learning of simple and complicated categories was different across ages. Nobody liked the complicated categories. Everyone, no matter their age, found them difficult to learn. However, the responses to the simple categories differed a lot depending on age. Kids found them very difficult, too, but learning got easier for the teens. Learning peaked in young adulthood, then was a bit harder for those in older age. This suggests that the brain systems that help us learn simple categories might change over time, while everyone seems to have the bias against the complicated categories.
Figure 2. A graph, created by me, showing how accurate people were at matching the sounds they heard with aliens. There are three pairs of bars, and within each pair, the red bars (on the right) show the accuracy for the simple categories, while the blue bars (on the left) show the accuracy for the complicated categories. The left two bars show participants aged 7-17, the middle two bars show participants aged 18-39, and the right two show participants aged 40 and up. Note that the simple categories are easier than the complicated ones for participants above 18, while for those younger than 18, there is no difference between the categories.