Institut de recherche criminelle de la Gendarmerie Nationale (IRCGN)
Telecom Paris Tech
CNRS, LTCI, Telecom Paris Tech
Popular version of paper 5aSCa5
"The question of disguised voice"
Presented Friday morning, July 4, 2008
Many applications, in banking, multimedia and biometrics for example, now integrate speaker recognition. Its current performance can be considered sufficient in many fields, but in forensic science caution must remain the priority because the systems lack robustness. Nevertheless, the problem of identification is essential in forensic science: the voice is often the only link between an offender and the investigators. Examples include anonymous calls made to demand a ransom or to claim responsibility for a terrorist attack. In most criminal cases, offenders try to disguise their voices before making an anonymous or malicious call. The principle is to transform one’s own voice using simple techniques. The most common are to change the pitch of one’s voice (higher or lower) or to cover one’s mouth, with a handkerchief for instance.
LISTEN: Sound 1 (normal voice)
LISTEN: Sound 2
LISTEN: Sound 3
LISTEN: Sound 4
The main disguise techniques have been studied in order to evaluate how well a disguise can be detected and identified. The aim is to be able to decide whether a voice is disguised or not, because the main drawback of voice evidence in an identification case is that there is no rule for telling whether the voice is normal or disguised. In forensic science, the risk of confusing the disguised voice of a suspect with the normal voice of an innocent person must be taken very seriously. Eliminating this risk is the primary aim of this work.
The impact of disguise is illustrated in figure n°1, which shows the performance of an automatic speaker recognition system on 25 speakers. We observed a significant degradation once a disguise was used: after disguise, there is one chance in two of confusing speakers. That is not acceptable from a forensic perspective.
Different approaches can be used to detect whether a voice is disguised, but their results are not sufficient because most of these techniques are designed for only one or two specific disguises. Our approach consists in applying a statistical method to evaluate the probability that a voice is disguised and to determine what kind of disguise is used. The first step is to extract features that represent the voice well. We chose MFCCs (Mel Frequency Cepstral Coefficients), which provide very good results in the field of speaker discrimination.
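To make the feature-extraction step more concrete, here is a minimal sketch of how MFCCs can be computed with NumPy alone. It is an illustration, not the configuration used in the study: the sampling rate (8 kHz), frame length, hop size, number of mel filters and number of coefficients are all assumed values, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):      # rising slope
            fbank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope
            fbank[i, k] = (right - k) / (right - centre)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=256, hop=128,
         n_filters=20, n_ceps=12):
    """Return an array of shape (n_frames, n_ceps). All defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)
    # Cut the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2 / frame_len
    # Log energy in each mel band (small constant avoids log(0)).
    fbank = mel_filterbank(n_filters, frame_len, sample_rate)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II of the log energies; coefficient 0 is dropped.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps + 1)[:, None]
                   * (2 * n + 1) / (2 * n_filters))
    return (log_energy @ basis.T)[:, 1:]

# Synthetic one-second tone standing in for a speech recording.
tone = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
features = mfcc(tone)
print(features.shape)
```

Each row of `features` describes one short frame of the signal; these per-frame vectors are what a statistical classifier would then be trained on.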
The results are promising and are presented below.
Figure n°2 shows identification results based on a GMM (Gaussian Mixture Model) approach for the disguises studied, which are the most common in criminal cases. A GMM evaluates the probability that a speech segment is represented by a model made of Gaussian components. Each disguise is modelled beforehand by its own set of Gaussian components.
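The GMM principle described above can be sketched as follows: one mixture is fitted per voice condition, and a test segment is assigned to the condition whose model gives it the highest average log-likelihood. This is a simplified illustration under stated assumptions, not the system used in the study: the mixtures use diagonal covariances and only two components, the labels `normal` and `high_pitch` are placeholders, and random vectors stand in for real MFCC features.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of every row of X under every diagonal Gaussian: shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def fit_gmm(X, n_components=2, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM with the EM algorithm."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_p = np.log(weights) + log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    weights, means, variances = gmm
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    return float(np.mean(logsumexp(log_p, axis=1)))

def classify(X, models):
    """Return the label of the model that best explains the frames in X."""
    return max(models, key=lambda name: avg_log_likelihood(X, models[name]))

# Synthetic stand-ins for MFCC frames of two voice conditions (hypothetical data).
rng = np.random.default_rng(42)
models = {
    "normal": fit_gmm(rng.normal(0.0, 1.0, size=(500, 12))),
    "high_pitch": fit_gmm(rng.normal(3.0, 1.0, size=(500, 12))),
}
segment = rng.normal(3.0, 1.0, size=(100, 12))
print(classify(segment, models))
```

Averaging the log-likelihood over all frames of a segment makes the decision depend on the segment as a whole rather than on any single frame.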
Another way to detect or identify a disguised voice is to find the best boundaries between the feature distributions of the different disguises. This is the principle of the SVM (Support Vector Machine), whose results are presented in Figure n°3.
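The idea of a boundary between feature distributions can be illustrated with a small linear SVM trained by gradient descent on the hinge loss. This is only a sketch: the kernel and settings used in the study are not stated here, so the linear boundary, the regularisation strength `lam`, the learning rate and the two-class setup (normal versus disguised) are all assumptions, and the data are synthetic.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, n_epochs=200):
    """Minimise lam/2 * ||w||^2 + mean hinge loss. Labels y must be -1 or +1."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        active = margins < 1           # points inside the margin or misclassified
        # Subgradient of the objective with respect to w and b.
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

# Synthetic 2-D features: -1 = normal voice, +1 = disguised voice (hypothetical).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 1.0, size=(200, 2)),
               rng.normal(3.0, 1.0, size=(200, 2))])
y = np.concatenate([-np.ones(200), np.ones(200)])

w, b = train_linear_svm(X, y)
accuracy = np.mean(predict(X, w, b) == y)
print(accuracy)
```

To separate several disguises rather than two classes, one such classifier could be trained per disguise against all the others, with the highest score deciding the label.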
Figure n°3 shows a good level of performance, except for the low-pitched voice. To bring our experiment closer to operational conditions, we added a specific noise to our speech samples. The chosen noise is babble noise, the kind of noise encountered at a cocktail party for instance, where several people talk simultaneously.
This is what we hear in the following sound.
LISTEN: babble noise
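Adding babble noise to a speech sample amounts to scaling the noise and summing the two signals. The sketch below scales the noise to reach a chosen signal-to-noise ratio (SNR). The SNR value is an assumption made for illustration, since no noise level is given here, and random signals stand in for the real speech and babble recordings.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the result has the requested SNR in dB.

    Assumes the noise recording is at least as long as the speech.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Synthetic stand-ins: one "speaker" and a babble made of several overlapping "talkers".
rng = np.random.default_rng(0)
speech = rng.normal(size=8000)
babble = sum(rng.normal(size=8000) for _ in range(6))

noisy = mix_at_snr(speech, babble, snr_db=10)   # 10 dB is an assumed value
achieved = 10 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
print(round(achieved, 2))
```

Lowering `snr_db` makes the babble louder relative to the speech, which is one way to test how quickly a classifier degrades.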
The identification results based on an SVM classifier are shown in figure n°4.
Figure n°4 shows how much identification degrades in a noisy environment. This experiment shows how difficult it is to identify some specific disguises with a good level of performance.
To conclude, the question of disguised voice detection and identification is very important in forensic science in order to avoid confusion. The rapid development of new communication media such as PDAs, mobile phones, the internet and voice over IP opens up a very large area where the voice is the only way to identify a person. In addition, new applications such as virtual worlds (Second Life) provide a new space in which people can communicate under a forged identity in order to commit crimes or offences. Being able to identify a disguise could open the door to reversing the disguise process and identifying the speaker.