ASA Lay Language Papers
164th Acoustical Society of America Meeting


Training Speakers of Other Languages to Hear the Sounds of English

Charles S. Watson -- watson@indiana.edu
James D. Miller -- jamdmill@indiana.edu
Gary R. Kidd -- kidd@indiana.edu
Communication Disorders Technology, Inc. (CDT)
501 N. Morton Street, Sta 215
Bloomington, IN 45405

Popular version of paper 2aSC3
Presented Tuesday morning, October 23, 2012
164th ASA Meeting, Kansas City, Missouri

Background: Many adult learners of English have difficulty understanding English as spoken by native speakers, even when they have acquired basic or even advanced skills in reading and writing English. A commonly heard complaint is that native speakers of English "talk too fast." The underlying problem is that, early in life, these learners' brains were trained to analyze the sounds and cues of their native languages, which differ from those used in English. The differences among the sounds and cues used in various languages are a subject of continuing study, as is the development of first- and second-language perception (Best et al., 2011; Flege et al., 1997; Tsao, Liu, & Kuhl, 2004; Werker & Tees, 2002).

The Software Program: At CDT we have developed a software program, the Speech Perception Assessment and Training System for learners of English as a Second Language (SPATS-ESL). The system is designed for learners of English who have acquired reading and writing skills in English but who have trouble understanding English spoken by native speakers at customary speeds, along with associated difficulties in pronunciation. SPATS-ESL enables the learner to master the sounds of English and the skills needed to identify words in naturally spoken sentences presented in varying amounts of background noise. This computerized system allows learners to work independently and at their own pace.

The program trains the user to identify 109 basic sounds of spoken English. Spoken languages are made up of syllables, and syllables can be broken into constituent parts: beginnings (consonants and consonant clusters, called onsets), middles (vowel-like sounds, called nuclei), and endings (consonants and consonant clusters, called codas). By our analyses, there are 45 onsets, 28 nuclei, and 36 codas that are most important for spoken English. Within each constituent type, we have also ranked these speech sounds in their relative order of importance. Training is progressive: it begins with the most important sounds of each type and adds new sounds as performance criteria are met. Training is also adaptive: an algorithm called Adaptive Item Selection (AIS) focuses practice on items of intermediate difficulty for each individual learner until that learner has mastered, or nearly mastered, all of the sounds in a set. Because of this and other adaptive features, the training program has proved efficient regardless of the learner's first language and particular perceptual problems.
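The idea of focusing practice on items of intermediate difficulty can be sketched in a few lines of Python. This is only an illustration of the general approach, not the actual SPATS-ESL algorithm: the item names, accuracy thresholds, and update rule below are all assumptions chosen for clarity.

```python
import random

def select_item(accuracy, active_items, low=0.4, high=0.8):
    """Pick the next training item, favoring items of intermediate
    difficulty (running accuracy between `low` and `high`) for this
    learner; fall back to the whole set if none qualify."""
    intermediate = [i for i in active_items if low <= accuracy[i] <= high]
    pool = intermediate if intermediate else active_items
    return random.choice(pool)

def update_accuracy(accuracy, item, correct, rate=0.1):
    """Nudge the running accuracy estimate for an item toward the
    outcome of the latest trial (1.0 if correct, 0.0 if not)."""
    accuracy[item] += rate * ((1.0 if correct else 0.0) - accuracy[item])

# Illustrative use: three onset contrasts, each starting at chance-like 0.5.
accuracy = {"/b/": 0.5, "/v/": 0.5, "/th/": 0.5}
items = list(accuracy)
item = select_item(accuracy, items)     # all three are "intermediate" here
update_accuracy(accuracy, item, correct=True)
```

After enough correct trials an item's estimate drifts above the upper threshold and drops out of the intermediate pool, so practice naturally shifts toward the sounds the learner still finds difficult.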

The program also trains the user to identify the words in simple, naturally spoken sentences presented in varying amounts of background noise (multi-talker babble). To identify the words in naturally spoken sentences, the listener must be able to segment the speech stream into syllables and words and to use linguistic context to infer words that may be masked by the background noise. SPATS-ESL includes 1000 distinct sentences for use in sentence training.

All of the materials used in SPATS-ESL were recorded by speakers of Middle American English. Syllable constituents were recorded in a variety of phonetic contexts by 8 talkers: 4 female and 4 male, with 4 under and 4 over forty years of age. The sentences were recorded by 12 talkers similarly distributed in age and sex. These materials provide "high variability" training, which has been found to be effective in teaching listeners the speech-sound distinctions of a new language (e.g., Bradlow et al., 1997). The program developers believe that mastery of one dialect of English with these materials provides skills that allow the learner to master other dialects of English, much as native speakers of one English dialect adapt to others.

Results: The program has been used by over 200 students and adult learners of English with a wide variety of first languages, including Arabic, Cantonese, Japanese, Kazakh, Korean, Mandarin, Mongolian, Portuguese, Spanish, Taiwanese, Turkish, and Vietnamese. Almost all have achieved native or near-native performance on the SPATS-ESL tasks (both syllable constituents and sentences) after using the program for 15 to 30 hours. Most successful users work on the program about four hours per week, achieving mastery in four to eight weeks.

Successful users report an increased awareness of the distinctions among English speech sounds and an improved ability to understand native speakers of English in everyday situations and in difficult listening conditions, such as over cell phones. They also believe that the perceptual training will allow them to improve their own pronunciation. The developers believe that SPATS-ESL training will let students benefit more from immersion programs, since they will be better able to identify spoken words and thus to learn their meanings. The improved ability to distinguish speech sounds gained from SPATS-ESL training should also help learners of English monitor their new pronunciation skills and profit more from pronunciation and accent-reduction instruction.

References:

Best, C. T., Bradlow, A. R., Guion-Anderson, S., and Polka, L. (2011). Cross-language speech perception and variations in linguistic experience [Special issue]. J. Phonetics, 39(4), 453-710.

Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. i. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. Journal of the Acoustical Society of America, 101, 2299-2310.

Flege, J. E., Bohn, O. S. and Jang, S. (1997) Effects of experience on non-native speakers’ production and perception of English vowels. J. Phonetics 25, 437-470.
http://dx.doi.org/10.1006/jpho.1997.0052

Tsao, F.-M., Liu, H.-M. and Kuhl, P. K. (2004) Speech Perception in Infancy Predicts Language Development in the Second Year of Life: A Longitudinal Study, Child Development, 75, 1067-1084.

Werker, J. F., & Tees, R. C. (2002). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 25, 121-133.
http://dx.doi.org/10.1016/S0163-6383(02)00093-0
