2pPP7 – Your ears never sleep: auditory processing of nonwords during sleep in children

Adrienne Roman – adrienne.s.roman@vumc.org
Carlos Benitez – carlos.r.benitez@vanderbilt.edu
Alexandra Key – sasha.key@vanderbilt.edu
Anne Marie Tharpe – anne.m.tharpe@vumc.org

The brain needs a variety of stimulation from the environment to develop and grow. The ability of the brain to change as a result of sensory input and experience is often referred to as experience-dependent plasticity. When children are young, their brains are more susceptible to experience-dependent plasticity (e.g., Kral, 2013), so the quantity and quality of input is important. Because our ears are always “on”, our auditory system receives a great deal of input to process, especially while we are awake. But what can we “hear” when we are asleep? And does what we hear while we are asleep help our brains develop?

Although there has been research in infants and adults examining the extent to which our brains process sounds during sleep, very little research has focused on young children, a group that spends a significant portion of the day asleep (Paruthi et al., 2016). We decided to start our investigation by trying to answer the question: do children process and retain information heard during sleep? To investigate this question, we used electroencephalography (EEG) to measure the electrical activity of children’s brains in response to different sounds – sounds they heard when asleep and sounds they heard when awake.

First, during the child’s regular naptime, each child was hooked up to a portable EEG system. Using the EEG, a technician could tell us when the child fell asleep. Once the child was asleep, we played three made-up words over and over in random order for ten minutes. Then we let the child continue to sleep until he or she woke up.

When the children awoke from their naps, we took them to our EEG lab for event-related potential (ERP) testing. ERPs are segments of ongoing EEG recordings, appearing as waveforms, that reflect the brain’s response to events or stimulation (such as a sound being played).

The children wore “hats” consisting of 128 spongy electrodes while listening to the same three made-up words heard during the nap, mixed in with new made-up words that they had never heard before. We then analyzed the ERPs to determine whether the children’s brains responded differently to the words played during sleep than to the new words. We were looking for ‘memory traces’ in the EEG that would indicate that the children ‘remembered’ the words heard while sleeping.
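For readers curious about what “analyzing the ERPs” involves in practice, the sketch below illustrates the general idea of epoching and averaging: short EEG segments time-locked to each word onset are cut out, baseline-corrected, and averaged separately for the familiar (heard-during-nap) and new words, and the two averaged waveforms are then compared. This is a minimal illustration in Python with hypothetical array names and parameters, not the analysis pipeline actually used in the study.

```python
import numpy as np

def average_erp(eeg, onsets, sfreq, tmin=-0.1, tmax=0.8):
    """Average EEG segments (epochs) time-locked to stimulus onsets.

    eeg    : array of shape (n_channels, n_samples), continuous recording
    onsets : list of stimulus onset times in seconds
    sfreq  : sampling rate in Hz
    """
    start, stop = int(tmin * sfreq), int(tmax * sfreq)
    epochs = []
    for t in onsets:
        i = int(t * sfreq)
        seg = eeg[:, i + start:i + stop]
        # Baseline-correct each epoch using the pre-stimulus interval
        seg = seg - seg[:, :-start].mean(axis=1, keepdims=True)
        epochs.append(seg)
    return np.mean(epochs, axis=0)  # the ERP: (n_channels, n_times)

# Hypothetical usage: compare responses to familiar vs. brand-new nonwords
# erp_familiar = average_erp(eeg, onsets_familiar, sfreq=250)
# erp_new      = average_erp(eeg, onsets_new, sfreq=250)
# difference   = erp_familiar - erp_new  # a "memory trace" would appear here
```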

We found that the children’s brains were able to differentiate the nonsense words “heard” during the nap from the brand-new words played during the ERP testing. This means that the brain did not just filter the incoming information, but also retained it long enough for the children to recognize it after they woke up. This is a first step in understanding the impact of a child’s auditory environment during sleep on the brain.

Kral, A. (2013). Auditory critical periods: a review from system’s perspective. Neuroscience, 247, 117-133.

Paruthi, S., Brooks, L. J., D’Ambrosio, C., Hall, W. A., Kotagal, S., Lloyd, R. M., Malow, B. A., Maski, K., Nichols, C., Quan, S. F., Rosen, C. L., Troester, M. M., & Wise, M. S. (2016). Recommended amount of sleep for pediatric populations: a consensus statement of the American Academy of Sleep Medicine. Journal of Clinical Sleep Medicine, 12(6), 785.


3pID2 – Yanny or Laurel? Acoustic and non-acoustic cues that influence speech perception

Brian B. Monson, monson@illinois.edu

Speech and Hearing Science
University of Illinois at Urbana-Champaign
901 S Sixth St
Champaign, IL 61820
USA

Popular version of paper 3pID2, “Yanny or Laurel? Acoustic and non-acoustic cues that influence speech perception”
Presented Wednesday afternoon, November 7, 1:25-1:45pm, Crystal Ballroom FE
176th ASA Meeting, Victoria, Canada

“What do you hear?” This question, which divided the masses earlier this year, highlights the complex nature of speech perception and, more generally, of each individual’s perception of the world. From the Yanny v. Laurel phenomenon, it should be clear that what we perceive depends not only upon the physics of the world around us, but also upon our individual anatomy and individual life experiences. For speech, this means our perception can be influenced greatly by individual differences in auditory anatomy, physiology, and function, but also by factors that may at first seem unrelated to speech.

In our research, we are learning that one’s ability (or inability) to hear at extended high frequencies can have substantial influence over one’s performance in common speech perception tasks. These findings are striking because it has long been presumed that extended high-frequency hearing is not terribly useful for speech perception.

Extended high-frequency hearing is defined as the ability to hear at frequencies beyond 8,000 Hz. These are the highest audible frequencies for humans; they are not typically assessed during standard hearing exams and are believed to be of little consequence when it comes to speech. Notably, sensitivity to these frequencies is the first thing to go in most forms of hearing loss, and age-related extended high-frequency hearing loss begins early in life for nearly everyone. (This is why the infamous “mosquito tone” ringtones are audible to most teenagers but inaudible to most adults.)

Previous research from our lab and others has revealed that a surprising amount of speech information resides in the highest audible frequency range for humans, including information about the location of a speech source, the consonants and vowels being spoken, and the sex of the talker. Most recently, we ran two experiments assessing what happens when we simulate extended high-frequency hearing loss. We found that one’s ability to detect the head orientation of a talker is diminished without extended high frequencies. Why might that be important? Knowing a talker’s head orientation (i.e., “Is this person facing me or facing away from me?”) helps to answer the question of whether a spoken message is intended for you or for someone else. Relatedly, and most surprisingly, we found that restricting access to the extended high frequencies diminishes one’s ability to overcome the “cocktail party” problem. That is, extended high-frequency hearing improves one’s ability to “tune in” to a specific talker of interest when many interfering talkers are talking simultaneously, as when attending a cocktail party or other noisy gathering.

Do you seem to have a harder time understanding speech at a cocktail party than you used to? Are you middle-aged? It may be that typical age-related hearing loss at extended high frequencies is contributing to this problem. Our hope is that assessment of hearing at extended high frequencies will become a standard part of audiological exams. This would allow us to determine the severity of extended high-frequency hearing loss in the population and whether techniques such as hearing aids could be used to address it.
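The summary above does not spell out the signal processing, but a common way to simulate extended high-frequency hearing loss in the laboratory is simply to low-pass filter the stimuli so that energy above roughly 8,000 Hz is removed before presentation. The sketch below shows one way this could be done in Python with SciPy; the cutoff frequency, filter order, and file name are illustrative choices, not the study’s actual parameters.

```python
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def simulate_ehf_loss(signal, fs, cutoff_hz=8000, order=8):
    """Remove extended high-frequency (EHF) energy above `cutoff_hz`."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Hypothetical usage with a speech recording sampled at 44.1 kHz:
# fs, speech = wavfile.read("talker.wav")
# speech_no_ehf = simulate_ehf_loss(speech.astype(float), fs)
```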


Figure 1. Spectrographic representation of the phrase “Oh, say, can you see by the dawn’s early light.” While the majority of energy in speech lies below about 6,000 Hz (dotted line), extended high-frequency (EHF) energy beyond 8,000 Hz is audible and assists with speech detection and comprehension.
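For those who want to see this for themselves, a spectrographic display like Figure 1 can be generated with standard tools. The short Python sketch below computes and plots a spectrogram of a hypothetical mono recording and marks the 6,000 Hz and 8,000 Hz boundaries discussed above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, speech = wavfile.read("oh_say_can_you_see.wav")  # hypothetical mono recording
f, t, Sxx = spectrogram(speech.astype(float), fs=fs, nperseg=1024)

plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="auto")  # dB scale
plt.axhline(6000, linestyle=":", color="w")   # most speech energy lies below this line
plt.axhline(8000, linestyle="--", color="w")  # extended high-frequency region begins here
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()
```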

1pPPB – Emotion Recognition from Speaker-dependent low-level acoustic features

Tejal Udhan – tu13b@my.fsu.edu
Shonda Bernadin – bernadin@eng.famu.fsu.edu
FAMU-FSU College of Engineering,
Department of Electrical and Computer Engineering
2525 Pottsdamer Street
Tallahassee, Florida 32310

Popular version of paper 1pPPB: ‘Speaker-dependent low-level acoustic feature extraction for emotion recognition’
Presented Monday afternoon May 7, 2018
175th ASA Meeting, Minneapolis

Speech is the most common and fastest means of communication between humans. This fact has compelled researchers to study acoustic signals as a fast and efficient means of interaction between humans and machines. For authentic human-machine interaction, machines must be intelligent enough to recognize human voices and their emotional states. Speech emotion recognition, that is, extracting the emotional state of a speaker from acoustic data, plays an important role in making machines ‘intelligent’. Audio and speech processing also provides solutions that are better, noninvasive, and easier to acquire than other biomedical signals such as electrocardiograms (ECG) and electroencephalograms (EEG).

Speech is an informative source for the perception of emotions. For example, we may talk in a loud voice when feeling very happy, speak in an uncharacteristically high-pitched voice when greeting someone we find desirable, or show a vocal tremor after experiencing something fearful or sad. This cognitive recognition of emotions indicates that listeners can infer the emotional state of a speaker reasonably accurately even in the absence of visual information [1]. This theory of cognitive emotion inference forms the basis for speech emotion recognition. Acoustic emotion recognition has many applications in the modern world, ranging from interactive entertainment systems to medical therapies and monitoring to various human-safety devices.

We conducted preliminary experiments to classify four human emotions – anger, happiness, sadness, and neutral (no emotion) – in male and female speakers. We chose two simple acoustic features, pitch and intensity, for this analysis; the choice was based on readily available tools for their calculation. Pitch is the relative highness or lowness of a tone as perceived by the ear, and intensity is the energy contained in speech as it is produced. Since these are one-dimensional features, they can be easily analyzed by any acoustic emotion recognition system. We designed a decision-tree-based algorithm in MATLAB to perform emotion classification, using samples from the LDC Emotional Prosody dataset [2]. One sample of each emotion for one male and one female speaker is given below.

{audio missing}
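To give a concrete sense of what “low-level” pitch and intensity features look like in practice, the sketch below extracts a median pitch and a median intensity from a single utterance and then applies simple threshold rules in the spirit of a decision tree. It is written in Python (the study itself used MATLAB), and the file name and thresholds are hypothetical placeholders rather than the values learned from the LDC training data.

```python
import numpy as np
import librosa

def extract_features(path):
    """Return (median pitch in Hz, median intensity in dB) for one utterance."""
    y, sr = librosa.load(path, sr=None)
    f0 = librosa.yin(y, fmin=75, fmax=500, sr=sr)  # frame-wise pitch estimates
    rms = librosa.feature.rms(y=y)[0]              # frame-wise RMS energy
    intensity_db = 20 * np.log10(rms + 1e-10)
    return float(np.median(f0)), float(np.median(intensity_db))

def classify_emotion(pitch_hz, intensity_db):
    """Toy decision-tree-style rules; the thresholds are illustrative only."""
    if intensity_db > -20:                        # higher energy: high-arousal emotions
        return "anger" if pitch_hz < 250 else "happy"
    else:                                         # lower energy: low-arousal emotions
        return "neutral" if pitch_hz < 180 else "sad"

# Hypothetical usage:
# pitch, intensity = extract_features("sample.wav")
# print(classify_emotion(pitch, intensity))
```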

We observed that the male speaker showed little variation in pitch across the emotions; the pitch was consistently similar for any given emotion. The median intensity for each emotion class, though it changed, remained consistently similar to the training-data values. As a result, emotion recognition for the male speaker reached an accuracy of 88% on the acoustic test signals. Although pitch was nearly constant, there was a clear distinction between the intensities of the happy and sad emotions, and this dissimilarity in intensity produced the higher recognition accuracy for the male speaker’s data. For the female speaker, the pitch ranged anywhere from 230 Hz to 435 Hz across three different emotions, namely happy, sad, and anger, so the median intensity became the sole criterion for emotion recognition. The intensities of the happy and angry emotions were almost identical, since both are high-arousal emotions. This resulted in a lower emotion recognition accuracy for the female speaker of about 63%. The overall accuracy of emotion recognition using this method was 75%.


Fig. 1. Emotion Recognition Accuracy Comparison

Our algorithm successfully recognized emotions in the male speaker’s data. Because pitch was consistent within each emotion for the male speaker, the selected features, pitch and intensity, yielded better recognition accuracy. For the female acoustic data, the selected features were insufficient to describe the emotions; in future work, features that are independent of voice quality, such as prosodic, formant, or spectral features, will be evaluated.

[1] Fonagy, I. Emotions, voice and music. In J. Sundberg (Ed.), Research aspects on singing. Royal Swedish Academy of Music and Payot, Stockholm and Paris, pp. 51–79, 1981.
[2] Liberman, M., et al. Emotional Prosody Speech and Transcripts LDC2002S28. Web download. Philadelphia: Linguistic Data Consortium, 2002.

1pPP – Trends that are shaping the future of hearing aid technology

Brent Edwards – Brent.Edwards@nal.gov.au

Popular version of paper 1pPPa, “Trends that are shaping the future of hearing aid technology”
Presented Monday afternoon, May 7, 2018, 1:00PM, Nicollet D2 Room
175th ASA Meeting, Minneapolis

Hearing aid technology is changing faster now than at any point in its history. A primary reason is its convergence with consumer electronics, which is accelerating the pace of innovation and shifting its character from incremental to disruptive.

Hearables and wearables are non-medical devices that use sensors to measure and inform the user about their biometric data, in addition to providing other sensory information. Since hearing aids are worn every day and the ear is an ideal location for many of these sensors, hearing aids have the potential to become the ideal form factor for consumer wearables. Conversely, hearable devices that augment and enhance audio for normal-hearing consumers while also measuring their biometric data have the potential to become a new form of hearing aid for those with hearing loss, combining the medical functionality of hearing loss compensation with such consumer functionality as speech recognition and always-on access to Siri. The photo below shows one hearable on the market that allows the wearer to measure their hearing with a smartphone app and adjust audibility to personalise the sound for their individual hearing ability, a process that has similarities to the fitting of a traditional hearing aid by an audiologist.

{photo: hearing aid technology}

Accelerating this convergence between medical and consumer hearing technologies is the recently passed congressional bill mandating the creation of a new over-the-counter hearing aid category, allowing consumers to purchase a device in a store and fit it to their own prescription. E-health technologies already exist that allow consumers to measure their own hearing loss and apply clinically validated prescriptions to their hearable devices. This technology development will explode once over-the-counter hearing aids are a reality.
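To make the idea of “applying a prescription” to a self-measured hearing test more concrete, the sketch below turns a set of measured thresholds into per-band gains using the classic half-gain rule. This is a deliberately simplified illustration: real clinically validated prescriptions (for example, NAL-NL2) also account for input level, compression, and other factors, and the threshold values shown are hypothetical.

```python
# Minimal sketch: convert self-measured audiometric thresholds (in dB HL)
# into per-band amplification using the simple half-gain rule.
thresholds_db_hl = {  # hypothetical self-test results
    500: 15,
    1000: 20,
    2000: 35,
    4000: 50,
    8000: 60,
}

def half_gain_prescription(thresholds):
    """Return insertion gain (dB) per frequency band: half of the measured loss."""
    return {freq: 0.5 * loss for freq, loss in thresholds.items()}

for freq, gain in half_gain_prescription(thresholds_db_hl).items():
    print(f"{freq} Hz: apply about {gain:.0f} dB of gain")
```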

Deep science is also shaping hearing aid innovation. The integration of cognitive function with hearing aid technology will continue to be one of the strongest trends in the field. Neural measures of the brain using EEG have the potential to be used to fit hearing devices and also to demonstrate hearing aid benefit by showing how wearing devices affects activity in the brain. Brain sensors have been shown to be able to determine which talker a person is listening to, a capability that could be included in future hearing aids to enhance the speech of the desired talker and suppress all other sounds. Finally, science continues to advance our understanding of how hearing aid technology can benefit cognitive function. These scientific and other medical developments, such as light-driven hearing aids, will advance hearing aid benefit through the more traditional medical channel, complementing the advances on the consumer side of the healthcare delivery spectrum.
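One widely used approach behind that “which talker are you listening to” capability, though not necessarily the specific method referenced in the talk, is stimulus reconstruction: a decoder trained on EEG produces an estimate of the speech envelope the listener is attending to, and that estimate is correlated with the envelope of each candidate talker. The sketch below shows only the final comparison step and assumes the reconstructed envelope is already available from a trained decoder.

```python
import numpy as np

def decode_attended_talker(reconstructed_env, talker_envs):
    """Pick the talker whose speech envelope best matches the envelope
    reconstructed from the listener's EEG (stimulus-reconstruction approach).

    reconstructed_env : 1-D array, envelope estimated from EEG by a trained decoder
    talker_envs       : list of 1-D arrays, one speech envelope per talker
    """
    scores = [np.corrcoef(reconstructed_env, env)[0, 1] for env in talker_envs]
    return int(np.argmax(scores)), scores

# Hypothetical usage with two competing talkers:
# attended, scores = decode_attended_talker(env_from_eeg, [env_talker_a, env_talker_b])
```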

4aPP7 – The Loudness of an Auditory Scene

William A. Yost – william.yost@asu.edu
Michael Torben Pastore – m.torben.pastore@gmail.com

Speech and Hearing Science
Arizona State University
PO Box 870102
Tempe, AZ 85287-0102

Popular version of paper 4aPP7
Presented Thursday morning, May 10, 2018
175th ASA Meeting, Minneapolis, MN

This paper is part of a special session honoring Dr. Neal Viemeister of the University of Minnesota for his brilliant career. One of the topics Dr. Viemeister studies is loudness perception. Our presentation deals with the perceived loudness of an auditory scene in which several people talk at about the same time. In the real world, the sounds of all the talkers are combined into one complex sound before they reach a listener’s ears. The auditory brain sorts this single complex sound into acoustic “images”, where each image represents the sound of one of the talkers. In our research, we try to understand how many such images can be “pulled out” of an auditory scene so that they are perceived as separate, identifiable talkers.

In one type of simple experiment, listeners are asked to determine how many talkers must be added before they notice that the number of talkers has increased. When we increase the number of talkers, the additional talkers make the overall sound louder, and this change in loudness can be used as a cue to help listeners discriminate which sound has more talkers. If we instead make the overall loudness of, say, a four-talker scene and a six-talker scene the same, the loudness of each individual talker in the six-talker scene will be less than the loudness of each individual talker in the four-talker scene.
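The level arithmetic behind that statement is straightforward: if the two scenes are equalized to the same overall level, each of the six talkers must be presented at a lower level than each of the four talkers. The short calculation below, which assumes equal-level, independent talkers whose powers simply add (and an arbitrary overall level of 65 dB), shows that the per-talker difference is about 1.8 dB.

```python
import math

# If N equal-level, independent talkers are summed and the mixture is then
# scaled to a fixed overall level, each talker sits 10*log10(N) dB below that level.
def per_talker_level_db(n_talkers, overall_level_db=65.0):
    return overall_level_db - 10 * math.log10(n_talkers)

four = per_talker_level_db(4)  # about 59.0 dB per talker
six = per_talker_level_db(6)   # about 57.2 dB per talker
print(f"Each talker is {four - six:.2f} dB quieter in the six-talker scene")
# -> about 1.76 dB
```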

If listeners can focus on the individual talkers in the two scenes, they might be able to use the change in loudness of individual talkers as a cue for discrimination. If listeners cannot focus on individual talkers in a scene, then the two scenes may not be discriminable and they are likely to be judged as equally loud. We have found that listeners can make loudness judgments of the individual talkers for scenes of two or three talkers, but not more. This indicates that the loudness of a complex sound may depend on how well the individual components of the sound are perceived and, if so, that only two or three such components (images, talkers) can be processed by the auditory brain at a given time.

Trying to listen to one or more people when many people are talking at the same time is difficult, especially for people who are hard of hearing. If the normal auditory system can only process a few sound sources presented at the same time, this reduces the complexity of devices (e.g., hearing aids) that might be designed to help people with hearing impairment process sounds in complex acoustic environments. In auditory virtual reality (AVR) scenarios, there is a computational cost associated with processing each sound source. If an AVR system only has to process a few sound sources to mimic normal hearing, it would be much less expensive than one that has to process many sound sources. (Supported by grants from the National Institutes of Health, NIDCD, and Oculus VR, LLC.)

1aPP7 – Say what? Brief periods of hearing loss in childhood can have consequences later in life

Kelsey L Anbuhl – kla@nyu.edu
Daniel J Tollin – Daniel.tollin@ucdenver.edu

Department of Physiology & Biophysics
University of Colorado School of Medicine
RC1-N, 12800 E 19th Avenue
Aurora, CO 80045

Popular version of paper 1aPP7
Presented Monday morning, May 7, 2018
175th ASA Meeting, Minneapolis, MN

The sense of hearing enables us to effortlessly and precisely pinpoint the sounds around us. Even in total darkness, listeners with normal, healthy hearing can distinguish sound sources only inches apart. This remarkable ability depends on the coordinated use of sounds at the two ears, known as binaural hearing. Binaural hearing helps us to discern and learn speech sounds as infants, to listen to the teacher’s voice rather than the chatter of nearby students as children, and to navigate and communicate in a noisy world as adults.

For individuals with hearing loss, these tasks are notoriously more challenging, and often remain so even after treatment with hearing aids or other assistive devices. Classrooms, restaurants, and parties represent troublesome settings where listening is effortful and interrupted. Perplexingly, some individuals who appear to have normal hearing (as assessed with an audiogram, a common test of hearing) experience similar difficulties, as if the two ears are not working together. Such binaural hearing difficulties can lead to a diagnosis of central auditory processing disorder (CAPD). CAPD is defined by auditory deficits that are not explained by typical hearing loss (as would be seen on an audiogram) and indicates dysfunction in the auditory brain. The prevalence of CAPD has been estimated at 5-20% in adults and roughly 5-7% in children.

Interestingly, CAPD is especially prevalent in children that have experienced frequent ear infections during the first few years of life. Ear infections can lead to a temporary conductive hearing loss from the buildup of fluid in the middle ear (called otitis media with effusion) which prevents sound from reaching the inner ear normally. For children who experience repeated ear infections, the developing auditory brain might receive unbalanced input from the two ears for weeks or months at a time. While infections generally dissipate later in childhood and the audiograms of both ears return to normal, early disruptions in auditory input could have lasting consequences for the binaural centers of the brain.

We hypothesized that persistent conductive hearing loss (such as that caused by ear infections) disrupts the fine-tuning of binaural hearing in the developing auditory system. Using an animal model (the guinea pig), we found that chronic conductive hearing loss during development (induced by an earplug) caused the brain to generate an altered representation of auditory space. When the hearing loss was reversed by simply removing the earplug, the brain misinterpreted the normal sounds arriving at the two ears, and the animals consequently pinpointed sounds less precisely; in fact, they were roughly three times worse at a simple sound-location discrimination task than animals that had not worn earplugs, as if their sense of auditory space had been blurred. These results provide a model for CAPD: a child with CAPD may struggle to understand a teacher because a less precise (“blurry”) representation of sound location in the brain makes it difficult to disentangle the teacher’s voice from competing sounds (Figure 1). Overall, the results suggest that even temporary hearing loss during early development can alter the normal maturation of the auditory brain. These findings underscore the importance of early detection and treatment of hearing loss.

Figure 1: A typical classroom is an acoustically complex environment that can be difficult for a child with CAPD. Children with normal binaural hearing (blue shading) can separate the teacher’s voice from background noise sources, but those with impaired binaural hearing (red shading) may have a much harder time accomplishing this task.