For many decades, speech scientists have marveled at the complexity of speech sounds. In English, the relatively simple task of distinguishing “bat” from “pat” can involve as many as 16 different sound cues. English vowels are also pronounced so differently across speakers that one person’s “Dan” can sound like another’s “done”. Despite all this, most adult native English speakers understand English speech sounds rapidly, effortlessly, and accurately. In contrast, learning a new language is not easy, partly because the characteristics of foreign speech sounds are unfamiliar to us. For instance, Mandarin Chinese is a tonal language, which means that the pitch pattern used to produce a syllable can change the meaning of the word. Thus, the syllable “ma” can mean “mother”, “hemp”, “horse”, or “to scold”, depending on whether it is produced with a flat, rising, dipping, or falling pitch pattern. It is no surprise that many native English speakers struggle to learn Mandarin Chinese. At the same time, some seem to master these new speech sounds with relative ease. With our research, we seek to discover the neural and genetic bases of this individual variability in language learning success. In this paper, we focus on genes that affect activity in two distinct brain regions: the prefrontal cortex and the striatum.
Recent advances in speech science research strongly suggest that, for adults, learning speech sounds for the first time is a cognitively challenging task. In practice, this means that every time you hear a new speech sound, a region of your brain called the prefrontal cortex (the part of the cerebral cortex that sits right behind your forehead) must do extra work to extract relevant sound patterns and parse them according to learned rules. Such activity in the prefrontal cortex is driven by dopamine, one of the many chemicals that brain cells use to communicate with each other. In general, higher dopamine activity in the prefrontal cortex is associated with better performance on complex and difficult tasks.
Interestingly, there is a well-studied gene called COMT that affects the level of dopamine activity in the prefrontal cortex. Everybody has a COMT gene, but it comes in different subtypes. Individuals with a subtype of the COMT gene that promotes dopamine activity perform hard tasks better than those with other subtypes do. In our study, native English speakers with the dopamine-promoting subtype of the COMT gene (40 of 169 participants) learned Mandarin Chinese speech sounds better than those with other subtypes. This suggests that assessing your COMT gene profile might help predict how well you will learn the speech sounds of a new language.
However, this is only half the story. While new learners may initially rely on the prefrontal cortex to discern foreign speech sound contrasts, expert learners are less likely to do so. As with any other skill, speech perception becomes more rapid, effortless, and accurate with practice. At this stage, your brain can bypass all that burdensome cognitive reasoning in the prefrontal cortex. Instead, it can use the striatum, a structure deep within the brain, to decode the speech sounds directly. We find that the striatum is more active in expert learners of new speech sounds. Furthermore, individuals with a subtype of a gene called FOXP2 that promotes flexibility of the striatum in response to new experiences (31 of 204 participants) learned Mandarin Chinese speech sounds better than those with other subtypes.
Our research suggests that learning the speech sounds of a foreign language involves multiple brain regions, and that genetic variations that affect activity within those regions can lead to better or worse learning. In other words, your genetic makeup may contribute to how well you learn to understand a new language. What we do not yet know is how these genetic factors interact with other sources of variability, such as prior experience. Previous studies have shown, for example, that extensive musical training can enhance the learning of foreign speech sounds. We are a long way from cracking the code of how the brain, a highly complex organ, functions. We hope that a neurocognitive genetic approach may help bridge the gap between biology and language.
Han-Gyol Yi
W. Todd Maddox
The University of Texas at Austin
2504A Whitis Ave. (A1100)
Austin, TX 78712
Valerie S. Knopik
Rhode Island Hospital
593 Eddy Street
Providence, RI 02903
John E. McGeary
Providence Veterans Affairs Medical Center
830 Chalkstone Avenue
Providence, RI 02908
Bharath Chandrasekaran
The University of Texas at Austin
2504A Whitis Ave. (A1100)
Austin, TX 78712
Popular version of paper 4aSCb16
Presented Thursday morning, October 30, 2014
168th ASA Meeting, Indianapolis
Consider a common scenario in a conversation: your friend is in the middle of asking you a question, and you already know the answer. To be polite, you wait to respond until your friend finishes the question. But what are you doing while you are waiting?
You might think that you are passively waiting for your turn to speak, but the results of this study suggest that you may be more impatient than you think. When such circumstances were recreated experimentally, speakers moved their vocal organs (such as the tongue, lips, and jaw) to positions appropriate for the sounds they intended to produce in the near future. Instead of waiting passively for their turn to speak, they were actively preparing to respond.
To examine how speakers control their vocal organs before speaking, this study used real-time magnetic resonance imaging (MRI) of the vocal tract. This recently developed technology takes a picture of the tissue in the middle of the vocal tract, much like an x-ray, about 200 times every second. This makes it possible to measure rapid changes in the positions of the vocal organs before, during, and after speaking.
A video is available online (http://youtu.be/h2_NFsprEF0).
To understand how changes in the positions of vocal organs are related to different speech sounds, it is helpful to think of your mouth and throat as a single tube, with your lips at one end and your vocal folds at the other. When your vocal folds vibrate, they create sound waves that resonate in this tube. By using your lips and tongue to make closures or constrictions in the tube, you can change the frequencies of the resonating sound waves. You can also use an organ called the velum to control whether sound resonates in your nasal cavity. These relations between vocal tract postures and sounds provide a basis for extracting articulatory features from images of the vocal tract. For example, to make a “p” sound you close your lips, to make an “m” sound you close your lips and lower your velum, and to make a “t” sound you press the tip of your tongue against the roof of your mouth.
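The tube analogy can be made quantitative with a textbook simplification that is not part of this study. If the vocal tract is treated as a uniform tube that is closed at the vocal folds and open at the lips, it resonates at odd multiples of a quarter wavelength: F_n = (2n − 1)c / (4L). The Python sketch below assumes a speed of sound of 35,000 cm/s, a value commonly used for warm, humid air in the vocal tract. The function name `tube_resonances` is hypothetical.

```python
# Assumed value: speed of sound in warm, humid air (cm/s); not taken from the study.
SPEED_OF_SOUND_CM_S = 35_000.0


def tube_resonances(length_cm: float, count: int = 3, c: float = SPEED_OF_SOUND_CM_S) -> list[float]:
    """Resonance frequencies (Hz) of a uniform tube closed at one end and open at the other.

    Idealized stand-in for the vocal tract: closed at the vocal folds, open at the lips.
    """
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    # Odd quarter-wavelength resonances: (2n - 1) * c / (4 * L) for n = 1, 2, ...
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


if __name__ == "__main__":
    print(tube_resonances(17.5))  # a roughly adult-male-length tube
```

For a 17.5 cm tube, this gives resonances of 500, 1500, and 2500 Hz, close to those of a neutral vowel. Closures and constrictions like the ones described above shift these values, and a uniform tube cannot capture that. The model is only meant to show why changing the tube's shape or length changes the sound.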
Participants in this study produced simple syllables with a consonant and vowel (such as “pa” and “na”) in several different conditions. In one condition, speakers knew ahead of time what syllable to produce, so that they could prepare their vocal tract specifically for the response. In another condition, they produced the syllable immediately without any time for response-specific preparation. The experiment also manipulated whether speakers were free to position their vocal organs however they wanted before responding, or whether they were constrained by the requirement to produce the vowel “ee” before their response.
All of the participants in the study adopted a generic “speech-ready” posture prior to making a response, but only some of them adjusted this posture specifically for the upcoming response. This response-specific anticipation only occurred when speakers knew ahead of time exactly what response to produce. Some examples of anticipatory posturing are shown in the figures below.
Figure 2. Examples of anticipatory postures for “p” and “t” sounds. The lips are closer together in anticipation of “p” and the tongue tip is raised in anticipation of “t”.
Figure 3. Examples of anticipatory postures for “p” and “m” sounds. The velum is raised in anticipation of “p” and lowered in anticipation of “m”.
The surprising finding of this study was that only some speakers anticipatorily postured their vocal tracts in a response-specific way, and that speakers differed greatly in which vocal organs they used for this purpose. Furthermore, some of the anticipatory posturing that was observed facilitates production of an upcoming consonant, while other anticipatory posturing facilitates production of an upcoming vowel. The figure below summarizes these results.
Figure 4. Summary of anticipatory posturing effects, after controlling for generic speech-ready postures.
Why do some people anticipate vocal responses while others do not? Unfortunately, we don’t know: the finding that different speakers use different vocal organs to anticipate different sounds in an upcoming utterance is challenging to explain with current models of speech production. Future research will need to investigate the mechanisms that give rise to anticipatory posturing and the sources of variation across speakers.
Sam Tilsen
Peter Doerschuk
Wenming Luh
Robin Karlin
Hao Yi
Ithaca, NY 14850
Pascal Spincemaille
Bo Xu
Yi Wang
Weill Medical College
New York, NY 10065
Popular version of paper 2aSC8
Presented Tuesday morning, October 28, 2014
168th ASA Meeting, Indianapolis
The human voice is a pattern of sound generated by both the mind and the body, and it carries information about a speaker’s mental and physical state. Qualities such as gender, age, physique, dialect, health, and emotion are often embedded in the voice, which can sound comforting and pleasant, intense and urgent, sad or happy, and so on. The human voice can also project a sense of eeriness when the sound has qualities that are human-like but not typical of the speech we hear every day. A person with an unusually large head and neck, for example, may produce highly intelligible speech that is nonetheless oddly dominated by low-frequency sounds, revealing the talker’s atypical size. Excessively slow or fast speaking rates, strangely timed and irregular speech, breathiness, and tremor may also contribute to eeriness if they fall outside the boundaries of typical speech.
The sound pattern of the human voice is produced by the respiratory system, the larynx, and the vocal tract. The larynx, located at the bottom of the throat, is composed of a left and a right vocal fold (often called vocal cords) and a surrounding framework of cartilage and muscle. During breathing, the vocal folds are spread far apart so that air can flow easily to and from the lungs. To generate sound, they are brought together firmly, allowing air pressure to build up below them. This pressure forces the vocal folds into vibration, creating the sound waves that are the “raw material” the vocal tract forms into speech. The length and mass of the vocal folds largely determine vocal pitch and vocal quality. Small, light vocal folds generally produce a high-pitched sound, whereas low pitch typically originates with large, heavy vocal folds.
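The link between vocal fold length, mass, and pitch can be illustrated with an idealized model. This model is an assumption made here, not the simulation described in this study. It treats a vocal fold as a vibrating string fixed at both ends, whose fundamental frequency is f0 = (1 / 2L) · √(T / μ), where L is the length, T is the tension, and μ is the mass per unit length. The sketch below holds tension fixed. Its parameter values are arbitrary and hypothetical, chosen only to show the direction of each effect.

```python
import math


def string_f0(length_m: float, tension_n: float, mass_per_length_kg_m: float) -> float:
    """Fundamental frequency (Hz) of an ideal string fixed at both ends.

    Idealized stand-in for a vocal fold; real folds are layered, nonuniform tissue.
    """
    if length_m <= 0 or tension_n <= 0 or mass_per_length_kg_m <= 0:
        raise ValueError("length, tension, and mass per unit length must be positive")
    wave_speed = math.sqrt(tension_n / mass_per_length_kg_m)  # transverse wave speed on the string
    return wave_speed / (2.0 * length_m)


if __name__ == "__main__":
    # Hypothetical reference values, chosen only for illustration.
    base = string_f0(length_m=0.015, tension_n=0.05, mass_per_length_kg_m=0.001)
    longer = string_f0(length_m=0.030, tension_n=0.05, mass_per_length_kg_m=0.001)
    heavier = string_f0(length_m=0.015, tension_n=0.05, mass_per_length_kg_m=0.004)
    print(f"reference: {base:.0f} Hz")
    print(f"twice as long: {longer / base:.2f} x reference")
    print(f"four times heavier per unit length: {heavier / base:.2f} x reference")
```

In this model, doubling the length or quadrupling the mass per unit length each halves f0. That is consistent with the statement that large, heavy vocal folds tend to produce low pitch. Real vocal folds also change tension and shape as pitch changes, which a string model ignores.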
The vocal tract is the airspace formed by the throat and the mouth, whose shape at any instant depends on the positions of the tongue, jaw, lips, velum, and larynx. During speech it is a continuously changing, tube-like structure that “sculpts” the raw sound produced by the vocal folds into a stream of vowels and consonants. The size and shape of the vocal tract impose another layer of information about the talker. A long throat and large mouth may convey the impression of a large body, while subtler characteristics, such as the contour of the roof of the mouth, may add qualities that are unique to the talker.
For this study, speech was simulated with a mathematical representation of the vocal folds and vocal tract. Such simulations allow for modifications of size and shape of structures, as well as temporal aspects of speech. The goal was to simulate extremes in vocal tract length, unusual timing patterns of speech movements, and odd combinations of breathiness and tremor. The result can be both eerie and amusing because the sounds produced are almost human, but not quite.
Three examples are included to demonstrate these effects. The first is a set of seven simulations of the word “abracadabra” produced while gradually decreasing the vocal tract length from 22 cm to 6.6 cm, raising the vocal pitch from very low to very high, and increasing the speaking rate from slow to fast. The longest and shortest vocal tracts are shown in Figure 1 and are both configured as “ah” vowels; for production of the entire word, the vocal tract shape changes continuously. The set of simulations can be heard in sound sample 1.
Although it may be tempting to assume that the changes in sound sample 1 are equivalent to simply increasing the playback speed of the audio, they are based on physiological scaling of the vocal tract and vocal folds, together with an increase in speaking rate. Sound sample 2 contains the same seven simulations, except that the speaking rate is identical in each case, which eliminates the sense of increased playback speed.
The third example demonstrates the effect of modifying the timing of the vowels and consonants within the word “abracadabra” while simultaneously adding a shaky, tremor-like quality and increased breathiness. A series of six simulations can be heard in sound sample 3. The first three versions of the word are based on the structure of an unusually large male talker, whereas the last three represent an adult female talker.
The simulation model used for these demonstrations was developed to study and understand human speech production and speech development. Using the model to investigate extreme cases of structure and unusual timing patterns helps us better understand the limits of human speech.
Figure 1 caption:
Unnaturally long and short tube-like representations of the human vocal tract. Each vocal tract is configured as an “ah” vowel (as in “hot”), but during speech the vocal tract continuously changes shape. Vocal tract lengths for typical adult male and adult female talkers are approximately 17.5 cm and 15 cm, respectively. Thus, the 22 cm tract would be representative of a person with an unusually large head and neck, whereas the 6.6 cm tract is even shorter than that of a typical infant.
Brad Story
Dept. of Speech, Language, and Hearing Sciences
University of Arizona
P.O. Box 210071
Tucson, AZ 85721
Popular version of paper 4pAAa10
Presented Thursday afternoon, October 30, 2014
168th ASA Meeting, Indianapolis
In focused ultrasound surgery (FUS), an ultrasound source radiates pressure waves into the patient’s body to achieve a desired therapeutic effect. FUS has already gained regulatory approval in the U.S. for treating uterine fibroids and for palliating pain from bone metastases. Other applications, including prostate cancer, liver cancer, and neurosurgery, remain active topics of clinical trials and research. Because FUS applications often involve high intensity levels, insufficient knowledge of the acoustic field in the patient could lead to damage of healthy tissue away from the targeted treatment site. In this sense, high-intensity ultrasound treatments could cause collateral effects much like radiotherapy treatments that use ionizing radiation. In radiotherapy, treatment planning is critical for delivering an effective and safe treatment. Typically, CT or MRI is used to form a virtual patient, and the treatment is planned by computer-aided design. Radiation transport simulations are used to plan the geometric, radiological, and dosimetric aspects of the therapy. Analogous to a radiation beam, ultrasound therapy uses an acoustic beam as a 3D “scalpel” to treat tumors or other tissues. Accordingly, there is motivation to establish standard procedures for FUS treatment planning that parallel those in radiotherapy [1, 2]. However, such treatment planning first requires very precise knowledge of the source transducer in order to accurately predict the acoustic beam structure inside the patient.
Fig. 1 Acoustic holography to characterize an ultrasound source, with schematic illustration of the corresponding ultrasound field. A measured hologram in a plane can be used to reconstruct the entire wave field anywhere in 3D space.
Toward this end, it is instructive to recognize that ultrasound consists of pressure waves and thus has several basic features of wave physics that can be exploited in practice. One such feature is the possibility of reproducing a 3D wave field from a 2D distribution of the wave amplitude and phase. This principle was made famous in optics by Dennis Gabor (Nobel Prize, 1971), who invented holography [3]. A similar approach is possible in acoustics [4–8] and is illustrated in Fig. 1 for a therapeutic ultrasound source. To measure an acoustic hologram, a hydrophone (i.e., a microphone used underwater) can be scanned across a plane in front of the transducer. Because these 2D measurements capture the whole field, the measured hologram can be used to reconstruct the surface vibrations of the source transducer. In turn, once the vibrations of the source are known, the corresponding acoustic field can be computed in water, tissue, or any other medium with known properties.
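One standard way to carry out the reconstruction step numerically is the angular spectrum method (the subject of reference [5]). The measured complex pressure on a plane is Fourier-transformed into plane-wave components. Each component is then advanced by a phase factor exp(i·k_z·z), with k_z = √(k² − k_x² − k_y²), and the result is transformed back. The NumPy sketch below is not the authors' code. It makes these assumptions:

- a single frequency
- a lossless, homogeneous medium
- an exp(−iωt) time convention
- uniform sampling at spacing `dx` in both directions

It also discards evanescent components, which cannot be stably back-propagated. Recovering the surface normal velocity from the back-propagated pressure would require a further step that is not shown.

```python
import numpy as np


def angular_spectrum_propagate(field: np.ndarray, dx: float, wavelength: float, z: float) -> np.ndarray:
    """Propagate a monochromatic complex pressure field, sampled on a plane, by a distance z.

    Positive z moves away from the source; negative z back-propagates toward it.
    Evanescent components (kx^2 + ky^2 >= k^2) are set to zero.
    """
    ny, nx = field.shape
    k = 2.0 * np.pi / wavelength
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)  # spatial angular frequencies along x
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dx)  # and along y (same spacing assumed)
    KX, KY = np.meshgrid(kx, ky)  # arrays of shape (ny, nx), matching the field
    kz_sq = k**2 - KX**2 - KY**2
    propagating = kz_sq > 0.0
    kz = np.sqrt(np.where(propagating, kz_sq, 0.0))
    transfer = np.where(propagating, np.exp(1j * kz * z), 0.0)  # per-component phase shift
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


if __name__ == "__main__":
    # Hypothetical example: 1 MHz in water (c ~ 1500 m/s), hologram sampled every 0.5 mm.
    wavelength = 1500.0 / 1.0e6
    dx = 0.5e-3
    n = 64
    x = (np.arange(n) - n / 2) * dx
    X, Y = np.meshgrid(x, x)
    hologram = np.exp(-(X**2 + Y**2) / (2 * 4e-3**2)).astype(complex)  # toy "measured" plane
    at_source = angular_spectrum_propagate(hologram, dx, wavelength, z=-20e-3)
    print(at_source.shape, float(np.abs(at_source).max()))
```

Sampling at no more than half a wavelength (here 0.5 mm versus a 1.5 mm wavelength) keeps all propagating components on the grid. For transient pulses, such as those of the lithotripter and imaging probes discussed in this article, one way to extend the sketch is to Fourier-transform each recorded waveform in time and apply this step to each frequency component.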
Besides ultrasound surgery, holography techniques can be applied to characterize ultrasound transducers used in other therapeutic and diagnostic applications. In this work, we used holography for the first time to characterize a shock wave lithotripter source. Shock wave lithotripters radiate high-intensity pulses that are focused on a kidney stone. High pressures, short rise times, and path-dependent nonlinearity make characterization in water and extrapolation to tissue difficult.
The electromagnetic lithotripter characterized in this effort is a commercial model (Dornier Compact S, Dornier MedTech GmbH, Wessling, Germany) with a 6.5 mm focal width. A broadband hydrophone (a fiber optic probe hydrophone, model FOPH 2000, RP Acoustics, Leutenbach, Germany) was used to sequentially measure the field over a set of points in a plane in front of the source. Following the previously developed transient holography approach, the recorded pressure field was numerically back-propagated to the source surface (Fig. 2). The method provides an accurate boundary condition from which the field in tissue can be simulated.
Fig. 2 Characterization of an electromagnetic shock wave lithotripter. Top: A photo of the lithotripter head. Bottom: Holographically reconstructed peak-to-peak pressure along the transducer face.
In addition, we use acoustic holography to characterize imaging probes, which generate short, transient pulses of ultrasound (Fig. 3). Accurate 3D field representations have been confirmed [9].
Fig. 3 Characterization of a diagnostic imaging probe. Top: A photo of the HDI C5-2 probe, which was excited at a frequency of 2.3 MHz. Middle: Holographically reconstructed pattern of vibration velocities along the probe surface. Bottom: Corresponding phase distribution.
We believe that our research on acoustic holography will make it possible in the near future for manufacturers to supply each medical ultrasound transducer with a “source hologram” as part of its calibration. This practice would enable calculation of the 3D ultrasound and temperature fields produced by each source in situ, from which the “dose” delivered to a patient could be inferred more accurately than is currently possible.
1. White PJ, Andre B, McDannold N, Clement GT. A pre-treatment planning strategy for high-intensity focused ultrasound (HIFU) treatments. Proceedings 2008 IEEE International Ultrasonics Symposium, 2056-2058 (2008).
2. Pulkkinen A, Hynynen K. Computational aspects in high intensity ultrasonic surgery planning. Comput. Med. Imaging Graph. 34(1), 69-78 (2010).
3. Gabor D. A new microscopic principle. Nature 161, 777-778 (1948).
4. Maynard JD, Williams EG, Lee Y. Nearfield acoustic holography: I. Theory of generalized holography and the development of NAH. J. Acoust. Soc. Am. 78, 1395-1413 (1985).
5. Schafer ME, Lewin PA. Transducer characterization using the angular spectrum method. J. Acoust. Soc. Am. 85(5), 2202-2214 (1989).
6. Sapozhnikov O, Pishchalnikov Y, Morozov A. Reconstruction of the normal velocity distribution on the surface of an ultrasonic transducer from the acoustic pressure measured on a reference surface. Acoustical Physics 49(3), 354–360 (2003).
7. Sapozhnikov OA, Ponomarev AE, Smagin MA. Transient acoustic holography for reconstructing the particle velocity of the surface of an acoustic transducer. Acoustical Physics 52(3), 324–330 (2006).
8. Kreider W, Yuldashev PV, Sapozhnikov OA, Farr N, Partanen A, Bailey MR, Khokhlova VA. Characterization of a multi-element clinical HIFU system using acoustic holography and nonlinear modeling. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 60(8), 1683-1698 (2013).
9. Kreider W, Maxwell AD, Yuldashev PV, Cunitz BW, Dunmire B, Sapozhnikov OA, Khokhlova VA. Holography and numerical projection methods for characterizing the three-dimensional acoustic fields of arrays in continuous-wave and transient regimes. J. Acoust. Soc. Am. 134(5), Pt 2, 4153 (2013).
Oleg A. Sapozhnikov (1, 2), Sergey A. Tsysar (1), Wayne Kreider (2), Guangyan Li (3), Vera A. Khokhlova (1, 2), and Michael R. Bailey (2, 4)
(1) Physics Faculty, Moscow State University, Leninskie Gory, Moscow 119991, Russia
(2) Center for Industrial and Medical Ultrasound, Applied Physics Laboratory, University of Washington, 1013 NE 40th Street, Seattle, WA 98105, USA
(3) Department of Anatomy and Cell Biology, Indiana University School of Medicine, 635 Barnhill Dr. MS 5055, Indianapolis, IN 46202-5120, USA
(4) Department of Urology, University of Washington Medical Center, 1959 NE Pacific Street, Box 356510, Seattle, WA 98195, USA
Traumatic brain injuries from blast exposure have been called the “signature injury” of the military conflicts in Iraq and Afghanistan, largely because of our enemies’ unprecedented reliance on explosive weaponry such as improvised explosive devices (IEDs) (Figure 1). Estimates indicate that approximately 19.5% of deployed military personnel have suffered traumatic brain injuries since 2001 (RAND Report, Invisible Wounds of War, 2008). With more than 2 million service members deployed to Iraq and Afghanistan (19.5% of 2 million is about 390,000), this implies that roughly 400,000 American Veterans are currently living with the chronic effects of blast exposure.
Figure 1 caption: An armored military vehicle lies on its side after surviving a buried IED blast on April 15, 2007. The vehicle was hit by a deeply buried improvised explosive device while conducting operations just south of the Shiek Hamed village in Iraq. Photograph courtesy of the U.S. Army: http://www.army.mil/article/9708/general-lee-rides-again/
When a blast wave from a high-intensity explosive strikes the head, a wave of intense heat and pressure moves through the skull and brain. Delicate neural tissues are stretched and compressed, potentially leading to cell damage, cell death, hemorrhage, and inflammation. All regions of the brain are at risk of damage, and the auditory system is no exception. In recent years, increasing numbers of young Veterans with blast exposure have sought help from VA audiologists for hearing-related problems such as poor speech understanding. However, standard tests of hearing sensitivity often show no signs of hearing loss. This combination of findings often suggests damage to areas of the brain that process auditory signals. The efforts of hearing health professionals to help these Veterans are hampered by a lack of information about the effects of blast exposure on auditory function. This presentation reports early results of a study currently underway at the National Center for Rehabilitative Auditory Research (NCRAR) investigating the long-term consequences of blast exposure on hearing. Discovering the types of auditory problems caused by blast exposure is a crucial step toward developing effective rehabilitation options for this population.
Study participants include Veterans who experienced high-intensity blast waves within the past twelve years. Most participants have experienced multiple blast episodes, with the most severe events occurring approximately eight years before they enrolled in the study. A control group of participants of similar age and gender but with no blast exposure is also included for comparison with the blast-exposed group. On questionnaires assessing hearing ability in different contexts, blast-exposed Veterans reported more difficulties in many listening situations than control participants did. Commonly reported challenges include understanding speech in background noise, understanding when multiple people are talking simultaneously, and recalling multiple spoken instructions. Further, blast-exposed Veterans are more likely than control participants to rate the overall quality of sounds such as music and voices poorly, and they often report that listening requires greater effort.
Several tests of listening ability revealed areas of difficulty that probably help explain these self-reports. First, blast-exposed Veterans often have a poorer ability than control participants to distinguish timing cues, so sounds may seem blurry or smeared over time. Second, the ability to process sounds presented to both ears is often poorer in blast-exposed Veterans. Normally, listeners use small differences in the timing and level of sound arriving at the two ears to improve listening performance, especially in noisy environments; this ability is often degraded in blast-exposed Veterans. Third, blast-exposed Veterans are poorer than control participants at distinguishing changes in the pitch of sounds, even when the pitch change is large. Lastly, blast-exposed Veterans often have greater difficulty ignoring distracting information in order to focus on listening.
This leads to problems such as trouble conversing with others when the television is on, or in restaurants or at parties. Our study results show that these listening difficulties are often great enough to affect daily life, causing blast-exposed Veterans to avoid social situations that they once enjoyed.
Figure 2 caption: Average EEG responses from blast-exposed and control participant groups in response to a large change in tone pitch. The horizontal axis shows time since the pitch changed (which occurred at time 0 on this axis). The vertical axis shows the magnitude of the neural response of the brain. Notice that the peak of activity in the control group (blue star labeled ‘P300’) is considerably larger and occurs earlier in time compared to the blast-exposed group (red star labeled ‘P300’).
Self-assessment and behavioral performance measures are supported by numerous direct measures of auditory processing. Using a type of electroencephalography (EEG), we non-invasively measure the response of the brain to sound by assessing the timing and size of neural activity associated with sound perception and processing. These tests reveal that the brains of blast-exposed Veterans require more time to analyze sound and respond less actively to changes in sounds. For example, the average EEG responses of blast-exposed and control participants are shown in Figure 2. These waveforms reflect neural detection of a large change in the pitch of tones presented to participants. Notice that the peak marked ‘P300’ is larger and occurs earlier in time in control participants compared to blast-exposed Veterans. Similar effects are seen in response to more complex sounds, such as when participants are asked to identify target words among non-target filler words. Overall, these EEG results suggest degraded sound processing in the brains of blast-exposed Veterans compared to control participants.
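The timing and size of a response such as the P300 are commonly measured in two steps. First, many single-trial EEG epochs, each time-locked to the sound, are averaged. Then the largest positive peak within a post-stimulus search window is located. The Python sketch below illustrates only these steps. The 250 to 500 ms search window, the sampling rate, and the synthetic data are all assumptions for illustration. A real pipeline would also include filtering, baseline correction, and artifact rejection, none of which is described in the text.

```python
import numpy as np


def p300_peak(epochs: np.ndarray, fs: float, t_start: float, window=(0.25, 0.50)):
    """Return (latency_s, amplitude) of the largest positive peak of the trial-averaged response.

    epochs:  array of shape (n_trials, n_samples), time-locked to the pitch change.
    fs:      sampling rate in Hz.
    t_start: time of the first sample relative to the pitch change, in seconds (e.g. -0.1).
    window:  search window in seconds after the change (an assumed, typical P300 range).
    """
    erp = epochs.mean(axis=0)  # averaging across trials suppresses activity not locked to the sound
    times = t_start + np.arange(erp.size) / fs
    in_window = np.flatnonzero((times >= window[0]) & (times <= window[1]))
    if in_window.size == 0:
        raise ValueError("search window lies outside the recorded epoch")
    peak = in_window[np.argmax(erp[in_window])]
    return float(times[peak]), float(erp[peak])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, t_start = 500.0, -0.1
    t = t_start + np.arange(400) / fs
    # Synthetic groups: a larger, earlier peak vs. a smaller, later one (values are made up).
    group_a = 8.0 * np.exp(-((t - 0.32) / 0.05) ** 2) + rng.normal(0, 5.0, (60, t.size))
    group_b = 5.0 * np.exp(-((t - 0.40) / 0.05) ** 2) + rng.normal(0, 5.0, (60, t.size))
    for name, data in (("group A", group_a), ("group B", group_b)):
        latency, amplitude = p300_peak(data, fs, t_start)
        print(f"{name}: peak {amplitude:.1f} (arb. units) at {latency * 1000:.0f} ms")
```

Comparing the returned latency and amplitude across groups corresponds to the comparison described for Figure 2, where a larger, earlier peak is seen in the control group. Simple peak picking is sensitive to noise; group comparisons often use other measures instead, such as mean amplitude over a window.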
In summary, our results strongly suggest that blast exposure can cause chronic problems in multiple areas of the brain where sound is processed. Blast exposure has the potential to damage auditory areas of the brain as well as cognitive regions, both of which likely contribute to hearing difficulties. Thus, though Veterans may have normal hearing sensitivity, blast exposure may cause problems processing complex sounds. These difficulties may persist for many years after blast exposure.