Keepin’ it Real:
The Importance of Maintaining the Natural Color Spectrum of Video Recordings
Authors:
Rachael Frush Holt, Ph.D.
Department of Speech and Hearing Sciences,
Indiana University
raholt@indiana.edu
Tessa Bent
Department of Speech and Hearing Sciences,
Indiana University
tbent@indiana.edu
Luis Hernandez
Department of Psychological and Brain Sciences,
Indiana University
hernande@indiana.edu
Popular version of paper 3pSC15
Presented Wednesday, May 20 at 1:00 p.m. in Galleria North and South
157th ASA Meeting, Portland, OR
Your mother was partly right. You should look at people when they are talking, not because it's rude to do otherwise, but because faces provide visual cues that make speech much easier to understand than sound alone.
Prior research has found that people with normal hearing and people with hearing loss understand speech better when they can both hear and see the speaker than when they can only do one or the other. This is particularly true when listeners are trying to decipher speech in a crowded environment with competing sounds, such as a busy bar or restaurant.
One component of the visual signal that has not been investigated is the naturalness of the color spectrum. Relative color differences across the face might be important for distinguishing between the visible articulators, such as the deep red, pink, or brown of the lips against the off-white of the teeth. The appearance of color can vary naturally with changes in light and artificially when cameras exaggerate the color of light. Previous research suggests that color itself does not play a large role in the visual perception of speech for very simple tasks: videos of a person saying individual syllables (such as “ba”) were equally understandable whether the videos were in full color or gray-scale (like a black-and-white movie). Still, the naturalness of the colors across the image certainly could influence understanding of a spoken message in more realistic communication situations.
We altered the color spectrum of video recordings of a woman saying simple 5- to 7-word sentences using a video-editing technique called “color correction.” This technique alters the color spectrum across a video image to make it appear more natural. Film sometimes appears to exaggerate the color of the light because it cannot reproduce color in the same way our visual system does. Color-correction techniques can fix such recording artifacts by digitally redistributing the color spectrum across the image to better match the true appearance of the object or person that was recorded. An example of a sentence that has and has not been color-corrected is shown below.
Both versions are of good quality; one simply has a more natural color spectrum to the eye than the other.
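The study relied on professional video-editing tools for color correction, but the general idea of digitally redistributing the color spectrum can be illustrated with a simple “gray-world” white-balance sketch in Python with NumPy. This is only a minimal illustration under our own assumptions (the function name and the gray-world method are ours for illustration), not the authors’ actual editing pipeline.

```python
import numpy as np

def gray_world_correct(frame):
    """Rebalance an RGB video frame so its average color is neutral gray.

    frame: uint8 array of shape (height, width, 3).
    Returns a corrected uint8 frame of the same shape.
    """
    pixels = frame.astype(np.float64)
    channel_means = pixels.reshape(-1, 3).mean(axis=0)  # average R, G, B
    target = channel_means.mean()                       # neutral gray level
    gains = target / channel_means                      # per-channel correction
    return np.clip(pixels * gains, 0, 255).astype(np.uint8)
```

Applied frame by frame, a correction like this pulls an image with, say, a bluish cast back toward neutral, which is the same goal the editing software pursues when it redistributes the color spectrum across a recording.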
In our first experiment at Indiana University, both versions of the sentences were presented to two groups of normal-hearing adults with normal vision, who were asked to repeat each sentence. We found that when the videos were played without sound, participants recognized words from the color-corrected videos better than from the original recordings. In short, when the color spectrum was more natural, people were better lipreaders. Further, although all speech sounds showed some improvement in identification, the speech sounds that are visible on the face (for example, “b,” which is made by putting the lips together) showed the most improvement. Finally, when the videos were played with sound, participants’ word recognition was not affected by whether the videos were color-corrected. Being able to hear the speech clearly was enough to overcome any difficulty identifying words in the non-color-corrected videos.
In the second experiment, we replaced the auditory speech with noise (which sounds like static) that followed the loudness contour of the speech, to examine the effect of the color spectrum on understanding of audiovisual speech. “Loudness contour” refers to the fact that speech sounds fluctuate in level over the course of a word or sentence; sounds like “s” and “t” are quiet relative to vowels like “ee” and “ah.” The noise we used followed these changes in loudness over the course of each sentence. An example of a sentence in which the speech was replaced by noise is shown below: the video on the left is a non-color-corrected video and the one on the right is a color-corrected video.
LISTEN and WATCH: Noisy non-color-corrected sentence (QUICKTIME)
LISTEN and WATCH: Noisy color-corrected sentence (QUICKTIME)
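The paper does not spell out how this noise was constructed, but a standard way to make noise follow a sentence’s loudness contour is to extract the speech’s amplitude envelope and impose it on white noise. Below is a minimal sketch in Python, assuming SciPy is available; the function name and the smoothing cutoff are illustrative assumptions, not details taken from the study.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def envelope_modulated_noise(speech, sample_rate, cutoff_hz=16.0):
    """Replace a speech waveform with noise that follows its loudness contour.

    speech: 1-D float array of audio samples.
    cutoff_hz: low-pass cutoff for smoothing the envelope (illustrative value).
    """
    # Amplitude envelope of the speech via the Hilbert transform.
    envelope = np.abs(hilbert(speech))
    # Smooth the envelope so only the slow loudness changes remain.
    b, a = butter(4, cutoff_hz / (sample_rate / 2), btype="low")
    smooth = np.clip(filtfilt(b, a, envelope), 0.0, None)
    # Impose the envelope on white noise of the same length.
    noise = np.random.randn(len(speech))
    return smooth * noise / np.max(np.abs(noise))
```

The result sounds like static that rises and falls in loudness exactly when the talker’s voice did, preserving the loudness contour while removing the speech content.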
New participants were asked to repeat back each sentence in one of three conditions: noisy non-color-corrected, noisy color-corrected, and noisy color-inverted. In the noisy color-inverted condition, the color-corrected videos’ color spectra were inverted (similar to a photo negative). This condition was included to evaluate the effect of large color spectrum changes on the understanding of audiovisual speech. An example of a noisy color-inverted sentence is shown below.
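Color inversion itself is a simple per-pixel operation. A minimal sketch, assuming 8-bit RGB frames stored as NumPy arrays (the function name is ours, for illustration):

```python
import numpy as np

def invert_colors(frame):
    """Invert an 8-bit RGB frame, producing a photo-negative effect."""
    # Each channel value v becomes 255 - v: e.g., red lips shift toward cyan.
    return 255 - frame

# Example: a pure red pixel inverts to cyan.
red_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
print(invert_colors(red_pixel))  # [[[  0 255 255]]]
```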
We found that as long as listeners could hear an auditory speech signal, even a severely degraded one (as in Experiment 2, in which speech was replaced with noise), small changes in the naturalness of the color spectrum did not affect word recognition. However, when the color spectrum was completely inverted, participants had more difficulty understanding the sentences.
The results suggest that when people must rely on visual cues alone for speech understanding, a natural color spectrum enhances the perception of visible speech cues. However, if an auditory speech signal is available, even a severely degraded one (as is the case for people with cochlear implants), small disturbances to the natural color spectrum do not significantly influence speech understanding, whereas large disturbances do impair audiovisual speech understanding.