Acoustical Society of America - 138th Meeting Lay Language Papers - Finding the Missing Fundamental: A Connectionist Model of Harmonic Detection

Acoustical Society of America
138th Meeting Lay Language Papers


Finding the Missing Fundamental:
A Connectionist Model of Harmonic Detection

Clifford F. Lewis - clewis@gwis.com
Stephen B. Fountain
Dept. of Psych., Kent State University
P.O. Box 5190
Kent, OH 44242-0001

Michael K. McBeath
Dept. of Psych., Arizona State University
Tempe, AZ 85287-1104

Note: First author can be reached at (216) 371-3981

Popular version of paper 4pPP3
Presented Thursday afternoon, November 4, 1999
138th ASA Meeting, Columbus, Ohio

A neural network model can provide a possible explanation for virtual pitch. Also known as the missing fundamental, virtual pitch is one of the classic phenomena in auditory perception. The model demonstrates that the harmonics of complex tones form recognizable patterns, and it suggests that the brain may fill in missing parts of those patterns to create perceptions of virtual pitch.

Virtual pitch

Nearly all natural sounds that have a distinctive pitch-like quality consist of a combination of many frequencies, which are called harmonics. These harmonic frequencies are related in a simple way. The lowest-frequency harmonic is called the fundamental. The other harmonics are multiples of the fundamental; that is, two times the fundamental frequency, three times, four times, and so forth. When the pitch of a complex sound is compared to that of a single-frequency standard, people generally report that the pitches match if the fundamental frequency of the complex sound is the same as the frequency of the standard. When musical instruments play the same note, all of them produce the same fundamental frequency and the same harmonics. However, the intensities and phase angles of the harmonics differ for each type of instrument, and these differences give the instrument its unique sound, or timbre.

Virtual pitch presents an intriguing problem in understanding how people perceive the pitch of musical notes. Virtual pitch, or the missing fundamental, refers to the fact that if the fundamental frequency is filtered out of a complex sound, the perceived pitch usually does not change. The lowest remaining harmonic is now twice the fundamental, which is the same frequency as the fundamental of a note an octave higher. In many ways, however, the sound with the missing fundamental is still more similar to the lower note than to the higher one. For example, an A2 has a fundamental frequency of 110 Hz, with harmonics at 220, 330, 440, 550, etc. An A3, an octave above, has a fundamental of 220 Hz, with harmonics at 440, 660, 880, etc. Without the fundamental, the lowest harmonic in an A2 is 220 Hz, but the odd harmonics at 330, 550, etc. are still present. Given this combination of harmonics, people report hearing a pitch that corresponds to the 110 Hz fundamental. This seems counterintuitive: listeners perceive the fundamental frequency even though it isn't there.
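The arithmetic in this example can be checked with a few lines of Python. The sketch below is only an illustration: the function name harmonics and the choice of five harmonics per note are assumptions made here, not part of the study. It lists the harmonics of A2 and A3, removes the A2 fundamental, and shows that the remaining frequencies still share a common spacing of 110 Hz, whereas the harmonics of A3 share a spacing of 220 Hz.

```python
from functools import reduce
from math import gcd

def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics (integer multiples) of a fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]

a2 = harmonics(110, 5)          # [110, 220, 330, 440, 550]
a3 = harmonics(220, 5)          # [220, 440, 660, 880, 1100]
a2_missing = a2[1:]             # A2 with its fundamental filtered out

# The greatest common divisor of the remaining harmonics gives the spacing
# of the series, which equals the missing fundamental.
print(a2_missing, reduce(gcd, a2_missing))   # [220, 330, 440, 550] 110
print(a3, reduce(gcd, a3))                   # [220, 440, 660, 880, 1100] 220
```

The greatest common divisor is used here only to make the pattern in the numbers visible. It is not a claim about how the auditory system computes pitch.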

One well-accepted theory of pitch perception is the place coding theory. According to this theory, portions of the basilar membrane of the inner ear are tuned to respond to different frequencies, with the vibrations occurring close to the entrance of the inner ear for high-frequency sounds, and farther away for low-frequency sounds. Neural receptors in the inner ear detect these vibrations, and the perceived pitch is determined by what position on the basilar membrane is vibrating the most. This model cannot by itself explain virtual pitch, however, because there is no sound energy to cause the basilar membrane to vibrate at the low-frequency position.
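The limitation can be illustrated with a deliberately oversimplified reading of place coding, in which the perceived pitch is simply the frequency whose position receives the most energy. The function name place_code_pitch and the intensity values below are hypothetical and chosen only for illustration.

```python
def place_code_pitch(spectrum):
    """Return the frequency (Hz) whose place on the membrane is driven hardest.

    `spectrum` maps a frequency in Hz to a relative intensity.
    """
    return max(spectrum, key=spectrum.get)

# Hypothetical relative intensities for an A2 (110 Hz); the values are made up.
full_a2 = {110: 1.0, 220: 0.8, 330: 0.6, 440: 0.5, 550: 0.4}
# The same tone with the fundamental filtered out.
filtered_a2 = {freq: level for freq, level in full_a2.items() if freq != 110}

print(place_code_pitch(full_a2))      # 110
print(place_code_pitch(filtered_a2))  # 220
```

Under this simple rule the filtered tone can never be assigned a pitch of 110 Hz, because no energy remains at that position.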

The appeal of neural networks

Neural networks, also known as connectionist models, are part of a growing trend toward the approach of modeling the brain with computers, and away from the approach of modeling the brain as a computer. (Scientists model many processes with computers, and those processes rarely resemble the operation of the computer itself.) Neural network models are based on the principle that the brain consists of many neurons that act simultaneously (in parallel), and that each neuron responds to simultaneous inputs from many other neurons. Neural networks are able to learn without following a specific set of programmed instructions. Instead, they react to a variety of inputs, adjusting the weights between individual neural units so as to identify patterns and form associations.

Most of these models are still much too simple to be regarded as realistic models of brain functions. However, they do often yield results that are suggestive of brain characteristics. They excel at making associations and identifying patterns. Their abilities to learn and to respond to stimuli are not constrained to the strict logic of a computer program. They show an ability to adapt to changes, and to recognize degraded or modified inputs.

The neural network pitch model

In the present model, each harmonic of a complex tone is modeled as an input level at one position on the network, in a manner analogous to the place-coding theory. An autoassociator model is used, which differs from other neural networks in that it is not given a "right" answer to learn for any of the inputs. Instead, the network's task is to learn to produce an output that matches the input. This goal is achieved by adjusting numerical weights between the input and the output. The larger the difference between the input and the output, the more the weights are adjusted. A diagram of the network is shown in Figure 1.
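The learning rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the network used in the study: it is a linear autoassociator with only eight inputs, the learning rate and number of passes are arbitrary, the two training patterns are made-up intensities, and self-connections (a unit feeding its own output) are assumed to be absent.

```python
def forward(weights, x):
    """Linear autoassociator: each output is a weighted sum of the inputs."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]

def train(patterns, rate=0.1, epochs=2000):
    """Delta rule: change each weight in proportion to the output error."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for _ in range(epochs):
        for x in patterns:
            out = forward(weights, x)
            for i in range(n):
                error = x[i] - out[i]          # the target is the input itself
                for j in range(n):
                    if i != j:                 # assumption: no self-connections
                        weights[i][j] += rate * error * x[j]
    return weights

# Hypothetical harmonic intensities for a low note and a note one octave higher.
low = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.3, 0.2]
high = [0.0, 0.9, 0.0, 0.7, 0.0, 0.4, 0.0, 0.2]
weights = train([low, high])

low_missing = [0.0] + low[1:]                  # fundamental removed
full_out = forward(weights, low)
filled_out = forward(weights, low_missing)
print(round(full_out[0], 3), round(filled_out[0], 3))
```

In this sketch the two printed values are equal: because a unit has no connection to itself, the output for the fundamental is computed entirely from the other harmonics, so it is unchanged when the fundamental input is removed.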

The network was "trained" using harmonic data for a variety of musical instruments. All of the A notes from A2 to A6 were used in training. Each input of the network corresponded to the intensity of one harmonic, with a total of 90 inputs. The network was trained by repeatedly inputting the harmonics of each instrument one at a time until the difference between input and output was small for the last instrument in the set.

For the test part of the experiment, the network was given the same inputs as in the training phase, but with zeros as input for the fundamental frequency of each note.
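The input coding for training and testing can be sketched as follows. The sketch assumes that input k of the network stands for k times 110 Hz (the fundamental of A2), so that higher notes use only every second, fourth, or eighth input, and so on. The function names and the harmonic levels are hypothetical. Only the count of 90 inputs and the range of notes come from the description above.

```python
N_INPUTS = 90                       # number of network inputs stated in the text
BASE_HZ = 110                       # assumption: input k stands for k * 110 Hz (A2)
A_NOTES_HZ = {"a2": 110, "a3": 220, "a4": 440, "a5": 880, "a6": 1760}

def encode(note, levels):
    """Place the harmonic levels of `note` on the inputs they correspond to."""
    step = A_NOTES_HZ[note] // BASE_HZ       # a3 uses every 2nd input, a4 every 4th
    vector = [0.0] * N_INPUTS
    for n, level in enumerate(levels, start=1):
        index = n * step - 1                 # 0-based position of the n-th harmonic
        if index < N_INPUTS:
            vector[index] = level
    return vector

def remove_fundamental(note, vector):
    """Return a copy of `vector` with the note's fundamental input set to zero."""
    test_vector = list(vector)
    test_vector[A_NOTES_HZ[note] // BASE_HZ - 1] = 0.0
    return test_vector

# Made-up harmonic levels for one instrument playing a3.
a3 = encode("a3", [0.66, 0.53, 0.38, 0.19])
a3_test = remove_fundamental("a3", a3)
print(a3[:8])        # [0.0, 0.66, 0.0, 0.53, 0.0, 0.38, 0.0, 0.19]
print(a3_test[:8])   # [0.0, 0.0, 0.0, 0.53, 0.0, 0.38, 0.0, 0.19]
```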

Results and Discussion

Results are given in Table 1.

The neural network model compensated for the missing fundamental, filling in the missing part of the pattern and producing an output that was very similar to the output produced when the fundamental was present. Even the output corresponding to the fundamental showed a significant activation.

If our brains have a similar ability to make associations between harmonics, then this model of pitch coding can explain the problem of virtual pitch. From the time we are born, and even before, we hear sounds that are rich in harmonics. Most of the time, the same harmonics are present for a given pitch, regardless of the sound source. The network model demonstrates that it is possible even for a simple network to learn to associate these co-occurring frequencies with each other. A missing fundamental (or any other missing harmonic) results in a degraded input. The network model recognizes the partial input and simply fills in the missing parts.

The model helps explain why low-quality sound systems, such as telephones, still produce satisfactory sounds even when low and high frequencies are not reproduced. It also helps explain why people with minor hearing losses do not notice their losses. According to this model, we don't notice the missing harmonics because our brains can fill in the missing parts, given a sufficient match to the harmonics that are present.



Figure 1. The autoassociator network. Each input receives the intensity of one harmonic. Higher pitched sounds receive zero input for harmonics that are not multiples of the fundamental. Each input node also receives feedback from one corresponding output node. 

Table 1. Average over all instruments of the first eight harmonics input to and output by the autoassociator network. The input line for each note gives the input levels for the complete sounds. Inputs are in sound pressure level, with 1.00 representing the pressure level of the most intense harmonic for each sound. The output line gives the output levels for the complete sounds. The missfund line gives the output levels when the input for the fundamental frequency is set to zero. Note that the output for the fundamental is identical whether or not the fundamental input is present, and is an order of magnitude higher than outputs for other harmonics that are not present. Outputs are an order of magnitude higher for harmonics that are present than for harmonics that are not present.
Type     Note      H01      H02      H03      H04      H05      H06      H07      H08
input    a2      0.537   0.5994   0.3502   0.3676   0.3701   0.2748   0.3002   0.1801
output   a2     0.3153   0.5362   0.3108   0.3726   0.3221   0.2723   0.3189   0.1734
missfund a2     0.3153   0.1535   0.2052   0.3738   0.3524   0.3808   0.2604   0.1607
input    a3          0   0.6568        0   0.5297        0   0.3815        0   0.1908
output   a3     0.0372   0.3367   0.0216   0.4611    0.028   0.3362  -0.0162   0.1704
missfund a3    -0.0619   0.3367  -0.0255   0.3872   0.0397   0.2421    4E-05   0.1847
input    a4          0        0        0   0.8349        0        0        0   0.4595
output   a4     0.0208  -0.0032   0.0011   0.5229   0.0028    0.061    0.006   0.3965
missfund a4     0.0373  -0.0692  -0.0098   0.5229  -0.0205  -0.0227    0.029   0.2576
input    a5          0        0        0        0        0        0        0    0.849
output   a5    -0.0076  -0.0217   0.0076   0.1108   0.0019   0.0227  -0.0041   0.3968
missfund a5    -0.0146   0.0168  -0.0126  -0.0777  -0.0114  -0.0301  -0.0108   0.3968
input    a6          0        0        0        0        0        0        0        0
output   a6     0.0214   0.0856  -0.0074  -0.0897   0.0118   0.0025  -0.0155   0.4005
missfund a6     0.0749  -0.0655   0.0414  -0.0287   0.0092  -0.0166   0.0024  -0.0257
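
The observation that the output for the fundamental is identical with and without the fundamental input can be checked against the tabulated values. The short Python sketch below copies the output and missfund rows of Table 1 for a2 through a5. It assumes that columns H01 to H08 index multiples of 110 Hz, so that the fundamental of a3 falls in H02, that of a4 in H04, and that of a5 in H08. Under that assumption the fundamental of a6 lies beyond H08, so a6 is left out.

```python
# H01..H08 from Table 1: (complete-sound output, missing-fundamental output).
TABLE = {
    "a2": ([0.3153, 0.5362, 0.3108, 0.3726, 0.3221, 0.2723, 0.3189, 0.1734],
           [0.3153, 0.1535, 0.2052, 0.3738, 0.3524, 0.3808, 0.2604, 0.1607]),
    "a3": ([0.0372, 0.3367, 0.0216, 0.4611, 0.028, 0.3362, -0.0162, 0.1704],
           [-0.0619, 0.3367, -0.0255, 0.3872, 0.0397, 0.2421, 4e-05, 0.1847]),
    "a4": ([0.0208, -0.0032, 0.0011, 0.5229, 0.0028, 0.061, 0.006, 0.3965],
           [0.0373, -0.0692, -0.0098, 0.5229, -0.0205, -0.0227, 0.029, 0.2576]),
    "a5": ([-0.0076, -0.0217, 0.0076, 0.1108, 0.0019, 0.0227, -0.0041, 0.3968],
           [-0.0146, 0.0168, -0.0126, -0.0777, -0.0114, -0.0301, -0.0108, 0.3968]),
}
# Assumed 0-based column of each note's fundamental (multiples of 110 Hz).
FUNDAMENTAL_COLUMN = {"a2": 0, "a3": 1, "a4": 3, "a5": 7}

for note, (output, missfund) in TABLE.items():
    col = FUNDAMENTAL_COLUMN[note]
    print(note, output[col], missfund[col], output[col] == missfund[col])
# a2 0.3153 0.3153 True
# a3 0.3367 0.3367 True
# a4 0.5229 0.5229 True
# a5 0.3968 0.3968 True
```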