ASA Lay Language Papers
165th Acoustical Society of America Meeting

Voice Types of American English and the New Science of Vocal Typology

Tyler McPeek – tyler@floridalinguistics.com
James Harnsberger – jharnsberger@gmail.com
University of Florida
Linguistics Department
4131 Turlington Hall, PO Box 115454
Gainesville FL 32611-5454
Contact: Office (352-284-2431); Cell (609-240-2292)

Popular version of paper 4aSCb3
Presented Thursday morning, June 6, 2013
ICA 2013 Montreal

Are you a Barry White or a Gilbert Gottfried, a Marilyn Monroe or a Shirley Temple, a droner or a diva, an authoritarian or a nurturer? How similar or dissimilar is your voice to other voices and how might that affect how you are perceived by those around you? Are you uniquely suited with a voice type to be a policeman, a public safety announcer, a voice talent actor, a preacher, a teacher, or some other occupation? These are the types of questions that flow from the study of vocal typology. Individual voices are not uniformly similar to others, even when controlling for other speaker characteristics such as one’s gender, age, and dialect. Some speakers share common features, forming groups based on gross vocal similarity, and colloquial terms exist for these groups, such as “nasal”, “whiny”, “gravelly”, “droning”, “staccato”, and others. In the forensic domain, speaker identification is a very common analysis required of audio evidence in civil and criminal cases and, yet, the duration of the speech samples and their quality can often preclude a highly confident judgment of the match/mismatch to the voice of a defendant or a relevant party in the case. However, such evidence recordings may be of sufficient caliber to permit a match/mismatch determination on the basis of a grosser category, such as a voice type. The evaluation of voice talent is also a growing field given the increasing use of digital animation in the entertainment industry. While individual vocal attributes such as “pleasantness” or “authority” have been examined in prior work, there is currently no automatic method for classifying all of the relevant characteristics of a talented voice. The positing of a voice type taxonomy ultimately serves to reduce the vast number of speaker identities within a given gender/age/dialect subpopulation down to a manageable and useful number of categories.

To date, however, no attempt has been made to describe normal voices systematically. In this study, voice types were developed through extensive similarity judgments by human listeners of pairs of voices from a large database that represents healthy young and middle aged voices speaking American English. Using a statistical analysis that derives groupings from such data, a male and female vocal taxonomy was generated, consisting of ten male voice types and nine female types. An acoustic analysis was performed in order to describe the characteristics of each type. Five acoustic cues proved to be important in predicting the classification of an individual male or female voice into a voice type: speaking rate, voice quality, mean fundamental frequency (perceived as pitch), fundamental frequency variability, and vowel space area, referring to some acoustic consequences of greater enunciation

From these measures, a simple coding system was utilized for simple characterization (and naming) of the voice types:
•   Speaking Rate: S(low) vs. F(ast)
•   Voice Quality: R(ough) vs. Clear
•   Mean Pitch: H(igh) vs. L(ow)
•   Pitch Variability: M(onotone) vs. D(ynamic)
•   Articulatory Effort: E(nunciated) vs. R(educed)

The inventory of male and female voice types appears in Tables 1 and 2, respectively. The two taxonomies differed from one another in terms of the relative importance of the acoustic characteristics, the heavy reliance of female voices on fewer characteristics, and most importantly, the influence of speaker age. Male voice types used all five characteristics liberally, although a slow speaking rate characterized the two most populated voice types. Most male voice types also consisted of a mix of voices of different ages. For female voices, chronological age influenced distribution of voices into types, with younger voices grouped separately from middle aged voices, while male voice types were more heterogeneous with respect to age. In female voices, the aging process is more perceptible, and therefore, voice types were first grouped by age, and then into acoustically-defined types.

To hear samples of the American English types discovered in this study, please visit www.voicetypes.com.

Table 1

ASA Lay Language Papers 165th Acoustical Society of America Meeting

Voice Types of American English and the New Science of Vocal Typology

ASA Lay Language Papers
165th Acoustical Society of America Meeting