Training effects on speech prosody production by Cantonese-speaking children with autism spectrum disorder

Si Chen

Bei Li

Fang Zhou

Angel Wing Shan Chan

Tempo Po Yi Tang

Eunjin Chun

Phoebe Choi

Chakling Ng

Fiona Cheng

Xinrui Gou


Department of Chinese and Bilingual Studies

The Hong Kong Polytechnic University

11 Yuk Choi Road, Hung Hom, Hong Kong, China


Popular version of paper 1aSCa3

Presented Monday, December 07, 2020, 9:30 AM – 10:15 AM EST

179th ASA Meeting, Acoustics Virtually Everywhere


Speakers can use prosodic variation to express their intentions, states, and emotions. Specifically, the relatively new information in an utterance, namely the focus, is often associated with an expanded range of prosodic cues. The main types of focus are broad, narrow, and contrastive focus. Broad focus places the whole sentence in focus (A: What did you say? B: [Emily ate an apple]FOCUS), whereas narrow focus emphasizes the one constituent asked about in the question (A: What did Emily eat? B: Emily ate an [apple]FOCUS). Contrastive focus rejects an alternative statement (A: Did Emily eat an orange? B: (No,) Emily ate an [apple]FOCUS).

Children with autism spectrum disorder (ASD) have been reported to have difficulty using speech prosody to mark focus. This research tests whether speech training and sung speech training can improve their use of speech prosody to mark focus. Fifteen Cantonese-speaking children with ASD completed pre- and post-training speech production tasks and received either speech or sung speech training. In the pre- and post-training speech production tasks, we designed games to measure participants’ ability to mark focus in conversations. The training tasks were designed to strengthen the mapping between acoustic cues and information structure categories through a series of tasks. The conversations used in sung speech training were set to melodies that imitated the change of acoustic cues in speech.

Training.mp4: An example of congruous and incongruous conversation pairs in sung speech training

Both training methods consisted of three phases. In the first phase, participants listened attentively to congruous conversation pairs in a designed game. In the second phase, participants were told that the odd-numbered trials were incongruous (the focus type elicited by the question did not match that of the answer) and that the even-numbered trials were congruous. They needed to attend to the differences between the odd and even trials. In the third phase, all the trials were presented in a random order, and participants needed to judge whether each pair was congruous or not. Instant feedback was provided after each response.
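The three phases can be pictured as a simple trial schedule. The Python sketch below is only an illustration under stated assumptions, not the software used in the study: the names Trial, build_phase, and feedback are hypothetical, the number of trials is arbitrary, focus types are drawn at random, and the even split of congruous and incongruous pairs in the third phase is assumed rather than reported.

```python
import random
from dataclasses import dataclass

FOCUS_TYPES = ("broad", "narrow", "contrastive")


@dataclass(frozen=True)
class Trial:
    """One question-answer pair (hypothetical representation)."""
    question_focus: str  # focus type the question elicits
    answer_focus: str    # focus type marked by the answer's prosody

    @property
    def congruous(self) -> bool:
        return self.question_focus == self.answer_focus


def congruous_trial(rng: random.Random) -> Trial:
    focus = rng.choice(FOCUS_TYPES)
    return Trial(focus, focus)


def incongruous_trial(rng: random.Random) -> Trial:
    # Draw two distinct focus types so question and answer never match.
    question, answer = rng.sample(FOCUS_TYPES, 2)
    return Trial(question, answer)


def build_phase(phase: int, n_trials: int, rng: random.Random) -> list:
    """Return the trial list for one training phase (n_trials is an assumption)."""
    if phase == 1:
        # Phase 1: congruous pairs only, for attentive listening.
        return [congruous_trial(rng) for _ in range(n_trials)]
    if phase == 2:
        # Phase 2: counting trials from 1, odd trials are incongruous
        # and even trials are congruous.
        return [
            incongruous_trial(rng) if number % 2 == 1 else congruous_trial(rng)
            for number in range(1, n_trials + 1)
        ]
    if phase == 3:
        # Phase 3: both kinds of pair in random order.
        # The roughly equal mix is an assumption of this sketch.
        trials = [
            incongruous_trial(rng) if i % 2 == 0 else congruous_trial(rng)
            for i in range(n_trials)
        ]
        rng.shuffle(trials)
        return trials
    raise ValueError("phase must be 1, 2, or 3")


def feedback(trial: Trial, judged_congruous: bool) -> str:
    """Instant feedback on a phase-3 congruity judgment."""
    return "correct" if judged_congruous == trial.congruous else "incorrect"
```

Calling build_phase(3, 6, random.Random(0)) yields six shuffled pairs, and feedback(trial, True) returns "correct" only when the question and answer share a focus type.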

We extracted acoustic cues from ASD children’s speech before and after training and performed statistical analyses. Our pilot results showed that both speech and sung speech training might have improved the use of prosodic cues such as intensity and f0 in marking focus across various focus positions (e.g. meanF0.tiff). However, ASD children may still have difficulties in integrating all the prosodic cues across focus conditions.


Mean f0 of narrow focus in the initial position before and after training


