Judith C. Brown, brown@media.mit.edu
Physics Dept, Wellesley College
Wellesley, MA 02181
I will be at 508 653 7345 until 5/30
Popular Version of Paper
2pMU5
Wednesday Afternoon, May 31
139th ASA Meeting, Atlanta, GA
Historically there has been great interest in the ability of humans to distinguish the sounds made by musical instruments. The woodwind group is particularly interesting because its members have similar attacks and decays (the beginning and end of a sound) and similar note ranges. All of the members of this instrument family consist of a cylindrical or conical tube excited by either a vibrating reed or a vibrating air column.
The question arises as to whether a computer might be able to distinguish among these same sounds. This question is currently of practical importance for its potential to free humans from time-consuming searches of the internet. The speech analog, computer identification of speakers, has been studied in depth because of its many practical applications, but there has been little work done on the identification of musical instruments by computer. See Brown (1999) for a summary.
Description of sound properties:
The current study focuses on four members of the woodwind family: the oboe, saxophone, clarinet, and flute. Sounds produced by each of these instruments are given below with each instrument playing the same note (middle C). The same note was chosen so that any differences between the sounds are due only to the sound qualities associated with each particular instrument.
All samples are 16-bit mono AIFF files.
Notice that each sound has a unique quality, called its timbre. The challenge for a computer calculation is to find particular qualities ("features") of these sounds that distinguish them from one another. These features fall into two categories: those describing changes in time and those describing frequency content. The time changes are straightforward: they characterize how the loudness changes as the sound begins (the attack), is sustained, and dies away (the decay).
The left column of the figure below shows the amplitude plotted against time for the four sounds studied. In each there is a rapid attack, followed by a long sustain. The flute is very distinctive: its amplitude changes periodically, indicating the presence of tremolo, an amplitude variation that often accompanies vibrato.
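For readers who want to experiment, an amplitude-against-time curve of the kind described above can be approximated by measuring the loudness of a sound in short consecutive slices. The following is only a minimal sketch, not the procedure used in the study: the function name `rms_envelope`, the 50 ms slice length, the 8000 Hz sample rate, and the synthetic test tone (a sine near middle C with a 5 Hz loudness wobble standing in for tremolo) are all assumptions made for illustration.

```python
import math

def rms_envelope(samples, frame_len):
    """Return the RMS amplitude of each non-overlapping frame of `samples`."""
    envelope = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return envelope

# Synthetic test tone (an assumption, not one of the recorded samples):
# a 262 Hz sine whose loudness wobbles 5 times per second, like tremolo.
sample_rate = 8000
tone = [
    (1.0 + 0.3 * math.sin(2 * math.pi * 5 * n / sample_rate))
    * math.sin(2 * math.pi * 262 * n / sample_rate)
    for n in range(sample_rate)  # one second of sound
]

# One loudness value per 50 ms slice (400 samples at 8000 Hz).
envelope = rms_envelope(tone, frame_len=400)
```

Plotting `envelope` against time would show a periodic rise and fall in loudness, which is the kind of pattern the text attributes to the flute.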
The sound spectrum, also called the "recipe" or frequency content, is a less familiar quantity and tells what frequencies are present in a sound. What we commonly call "the" frequency is the quantity which is changing when one whistles several notes of a melody. But within a single note, even though we perceive but one pitch (or frequency), there are almost always many frequencies present. In a sustained sound with non-varying loudness, it is the frequencies present which enable a listener to distinguish between a whistle and a soprano.
The right column in the figure is a plot of the sound spectrum for each of the four sounds studied. This is an average over the entire sound. The most important peaks are labeled in the figure for the flute at the bottom. It can be seen that these plots are very distinctive for each of these instruments.
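The idea of a spectrum averaged over a whole sound can be sketched in a few lines. The example below is an illustration under stated assumptions, not the analysis used for the figure: it builds a synthetic tone from three frequencies (260, 520, and 780 Hz, chosen near middle C so they fall exactly on analysis bins), computes a plain discrete Fourier transform for each slice of the sound, and averages the results. The function names, sample rate, and slice length are hypothetical choices.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of the discrete Fourier transform for bins 0 .. N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))) / n
        for k in range(n // 2 + 1)
    ]

def average_spectrum(samples, frame_len):
    """Average the magnitude spectra of consecutive non-overlapping frames."""
    frames = [samples[s:s + frame_len]
              for s in range(0, len(samples) - frame_len + 1, frame_len)]
    spectra = [magnitude_spectrum(f) for f in frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]

# Synthetic note (an assumption): one perceived pitch, three frequencies.
sample_rate = 8000
frame_len = 400  # gives bins spaced 20 Hz apart
partials = [(260, 1.0), (520, 0.5), (780, 0.25)]  # (frequency in Hz, amplitude)
tone = [
    sum(a * math.sin(2 * math.pi * f * n / sample_rate) for f, a in partials)
    for n in range(4 * frame_len)
]

spectrum = average_spectrum(tone, frame_len)

# Frequencies (in Hz) of the three strongest peaks, lowest first.
strongest = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)[:3]
peaks = sorted(k * sample_rate / frame_len for k in strongest)
```

Here `peaks` recovers the three frequencies that went into the tone, even though a listener would hear a single pitch. Changing the amplitudes in `partials` changes the "recipe" without changing the pitch.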
Experiment and results:
Twenty-five or more examples each of oboe, saxophone, clarinet, and flute sounds, playing many notes in a musical context, were collected from CDs and cassettes. Properties (features) related to the sound spectrum were calculated for these sounds, and a statistical comparison to the same properties for known examples of the instruments was carried out. The most successful features for classification gave correct results in the range 79-84%, which compares favorably with results on human perception of these same instruments.
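The general idea of comparing an unknown sound's features with those of known examples can be illustrated with a very simple stand-in. The sketch below is not the statistical method of the study (see Brown (1999) for that). It uses a nearest-average rule instead, and the two-number feature vectors and the labels "instrument A" and "instrument B" are invented purely for illustration.

```python
import math

def centroid(vectors):
    """Average feature vector of a set of known examples."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def classify(features, centroids):
    """Return the label whose average feature vector is closest."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))

# Hypothetical feature vectors for known examples (invented numbers).
training = {
    "instrument A": [[0.9, 0.1], [0.8, 0.2]],
    "instrument B": [[0.2, 0.7], [0.3, 0.9]],
}
centroids = {name: centroid(examples) for name, examples in training.items()}

# An unknown sound is assigned to the closest known group.
guess = classify([0.85, 0.2], centroids)
```

In this toy case `guess` is "instrument A", because the unknown vector lies much closer to that group's average. A percentage of correct results, like the one quoted above, comes from repeating such a decision over many test sounds whose true instrument is known.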
Reference:
Brown, J. C. (1999). "Computer identification of musical instruments using pattern recognition with cepstral coefficients as features," J. Acoust. Soc. Am. 105, 1933-1941.