2aAAa7 – Gunshot recordings from a criminal incident: who shot first?

Robert C. Maher – rob.maher@montana.edu
Electrical & Computer Engineering Department
Montana State University
P.O. Box 173780
Bozeman, MT 59717-3780

Popular version of paper 2aAAa7, “Gunshot recordings from a criminal incident: Who shot first?”
Presented Tuesday morning, May 24, 2016, 10:20 AM, Salon E
171st ASA Meeting, Salt Lake City

In the United States, criminal actions involving firearms are of ongoing concern to law enforcement and the public.  The FBI’s 2013 National Incident-Based Reporting System (NIBRS) report lists 50,721 assault incidents and 30,915 robbery incidents involving firearms that year [1].

As more and more law enforcement officers wear vest cameras and more and more citizens carry smartphones, the number of investigations involving audio forensic evidence continues to grow—and in some cases the audio recordings may include the sound of gunshots.

Is it possible to analyze a forensic audio recording containing gunshot sounds to discern useful forensic evidence?  In many cases the answer is yes.

Audio forensics, or forensic acoustics, involves evaluation of audio evidence for either a court of law or for some other official investigation [2].  Experts in audio forensics typically have special knowledge, training, and experience in the fields of acoustics, electrical engineering, and audio signal processing.

One common request in audio forensic investigations involving gunshots is “who fired first?”  There may be a dispute about the circumstances of a firearms incident, such as one party claiming that shots were fired in self-defense after the other party fired first, while the other party makes the opposite claim.  A dispute can also arise if a witness reports that a law enforcement officer shot an armed but fleeing suspect without justification, while the officer claims that the suspect had fired.


Figure 1: Muzzle blast recording of a 9mm handgun obtained under controlled conditions [4].

The sound of a gunshot is often depicted in movies and computer games as a very dramatic “BOOM” sound that lasts for as long as a second before diminishing away.  But the actual muzzle blast of a common handgun is really only about 1 millisecond (1/1000th of a second) in duration (see Figure 1).  More than 20-30 meters away, most of the audible sound of a gunshot is actually sound waves reflected by nearby surfaces [3].

Let’s consider a simplified case example from an investigation in an unnamed jurisdiction.  In this case, a shooting incident on a city street involving two perpetrators was recorded by a residential surveillance system located down the street.  The camera’s field-of-view did not show the incident, but the microphone picked up the sounds of gunfire.  Based on witness reports and the identification of shell casings and other physical evidence at the scene, the police investigators determined that the two perpetrators were several meters apart and fired their handguns toward each other, one pointing north and the other pointing south.  Which gun was fired first could not be determined from the physical evidence at the scene or from witness testimony, so attorneys for the suspects requested analysis of the audio recording to determine whether or not it could help answer the “who shot first?” question.

The waveform and the corresponding spectrogram from the portion of the recording containing the first two gunshot sounds are shown in Figure 2.  The spectrogram is a special kind of graph that depicts time on the horizontal axis and frequency on the vertical axis, with the brightness of the graph indicating the amount of sound energy present at a particular time in a particular frequency range.  The sound energy envelope for this same signal is shown in Figure 3.  The microphone picked up the direct sound of the gunshots, but also the reflected sound from the street, nearby buildings, and other obstacles, causing the relatively long duration of the two shots in the recording.

In this case, we note that the first gunshot has a distinctive echo (indicated by the arrow), while the second gunshot does not show this feature.  What might account for this peculiar difference?


Figure 2:  Sound waveform and spectrogram of two gunshots recorded by a residential surveillance system.  The arrow indicates the distinctive echo.


Figure 3:  Sound energy envelope for the two gunshots depicted in Figure 2.  The arrow indicates the echo.
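A sound energy envelope of the kind plotted in Figure 3 can be approximated by averaging the squared signal over short frames.  The sketch below is only an illustration and is not the processing used in the investigation: the function name, the 10 ms frame length, the 8 kHz sample rate, and the synthetic two-pulse signal (a direct sound followed by a weaker echo 0.54 seconds later) are all assumptions made for this example.

```python
import math

def energy_envelope_db(samples, frame_len, hop):
    """Mean-square energy per frame, in dB relative to the loudest frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    peak = max(energies)
    if peak == 0.0:
        return [0.0] * len(energies)  # silent input: nothing to compare
    floor = peak * 1e-10  # clamp silent frames at -100 dB to avoid log(0)
    return [10.0 * math.log10(max(e, floor) / peak) for e in energies]

# Synthetic stand-in for a recording (hypothetical values, not case data).
SAMPLE_RATE = 8000                       # samples per second (assumed)
signal = [0.0] * SAMPLE_RATE             # one second of silence
direct_index = round(0.10 * SAMPLE_RATE)
echo_index = direct_index + round(0.54 * SAMPLE_RATE)
signal[direct_index] = 1.0               # direct sound
signal[echo_index] = 0.2                 # weaker reflection

FRAME = 80                               # 10 ms frames, no overlap (assumed)
envelope = energy_envelope_db(signal, FRAME, FRAME)

# Loudest frame is the direct sound; the next loudest is the echo.
direct_frame = envelope.index(max(envelope))
echo_frame = max(
    (i for i in range(len(envelope)) if i != direct_frame),
    key=lambda i: envelope[i],
)
delay_s = (echo_frame - direct_frame) * FRAME / SAMPLE_RATE
print(f"echo arrives {delay_s:.2f} s after the direct sound")
```

A real recording would show a far messier envelope, because reflections from the street and nearby buildings overlap the direct sound, so picking peaks this simply would rarely be adequate in practice.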

Examining the neighborhood street where the shooting incident took place (Figure 4) revealed that there was a “T” intersection about 90 meters north of the shooting scene with a large building facing the street.  The length of the reflected sound path from the shooting site to the large building and back is therefore a little over 180 meters, which corresponds to the 0.54 seconds of time delay between the direct sound of the gunshot and the echo, and so accounts for the timing of the distinct reflection.  The microphone was located 30 meters south of the shooting scene.  But why would the observed reflection differ for the two firearms if they were located quite close together at the time of the gunfire?


Figure 4:  Sketch of the shooting scene (plan view)
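The path-length arithmetic behind the echo timing can be checked with a few lines of code.  This is a rough sketch under stated assumptions: the speed of sound is taken as 343 meters per second (dry air near 20 °C, a value not given in the case description), the building, shooting site, and microphone are treated as lying along one straight line, and the function names are made up for this example.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air near 20 degrees C

def echo_delay_s(source_to_reflector_m, source_to_mic_m,
                 speed=SPEED_OF_SOUND_M_PER_S):
    """Delay of the echo after the direct sound, with the microphone and
    the reflector on opposite sides of the source along one line."""
    direct_path = source_to_mic_m
    reflected_path = 2.0 * source_to_reflector_m + source_to_mic_m
    return (reflected_path - direct_path) / speed

def reflector_distance_m(delay_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance to the reflector implied by a measured echo delay."""
    return delay_s * speed / 2.0

# Building about 90 m north of the shooters, microphone 30 m south.
predicted_delay = echo_delay_s(90.0, 30.0)
implied_distance = reflector_distance_m(0.54)

print(f"predicted delay for 90 m: {predicted_delay:.3f} s")
print(f"distance implied by 0.54 s: {implied_distance:.1f} m")
```

Under these assumptions, exactly 90 meters gives a delay slightly under 0.54 seconds, and a 0.54 second delay implies a building a little more than 90 meters away, which agrees with the "about 90 meters" and "a little over 180 meters" figures above.  Note that the 30 meter microphone distance cancels out, since both the direct and the reflected sound travel it.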

Our conclusion was that the firearm pointing north toward the “T” intersection would likely produce a stronger reflection because the muzzle blast of a handgun is louder in the direction the gun is pointing [5]. Thus, the gun pointing toward the reflecting surface would produce a stronger reflected sound than the gun pointing away from the reflecting surface.

While the availability of additional acoustic evidence of firearm incidents can only be a positive development for the U.S. justice system, interpreting audio recordings of gunshots remains a challenge for audio forensic examiners for several reasons. First, the acoustical characteristics of gunshots must be studied carefully because the recording is likely to include sound reflections, diffraction, reverberation, background sounds, and other content that can interfere with the analysis.  Second, common audio recorders are intended for speech signals, and therefore they are not designed to capture the very brief and very intense sounds of gunfire.  Finally, the acoustical similarities and differences among different types of firearms are still the subject of research, so the notion of having a simple database of gunshot sounds to compare with an evidentiary recording is not yet feasible.


[1]  U.S. Department of Justice, 2013 National Incident-Based Reporting System (NIBRS) Data Tables (2013). Available at https://www.fbi.gov/about-us/cjis/ucr/nibrs/2013/data-tables . Accessed May 6, 2016.

[2]  Maher, R.C., Lending an ear in the courtroom: forensic acoustics, Acoustics Today, vol. 11, no. 3, pp. 22-29, 2015.

[3]  Maher, R.C., Acoustical characterization of gunshots, Proceedings of the IEEE SAFE Workshop on Signal Processing Applications for Public Security and Forensics, Washington, DC, pp. 109-113 (2007).

[4]  Maher, R.C. and Shaw, S.R., Gunshot recordings from digital voice recorders, Proceedings of the Audio Engineering Society 54th Conference, Audio Forensics—Techniques, Technologies, and Practice, London, UK (2014).

[5]  Maher, R.C. and Shaw, S.R., Directional aspects of forensic gunshot recordings, Proceedings of the Audio Engineering Society 39th Conference, Audio Forensics—Practices and Challenges, Hillerød, Denmark (2010).

1aSC9 – Challenges when using mobile phone speech recordings as evidence in a court of law

Balamurali B. T. Nair – bbah005@aucklanduni.ac.nz
Esam A. Alzqhoul – ealz002@aucklanduni.ac.nz
Bernard J. Guillemin – bj.guillemin@auckland.ac.nz

Dept. of Electrical & Computer Engineering,
Faculty of Engineering,
The University of Auckland,
Private Bag 92019, Auckland Mail Centre,
Auckland 1142, New Zealand.

Phone: (09) 373 7599 Ext. 88190
DDI: (09) 923 8190
Fax: (09) 373 7461

Popular version of paper 1aSC9, “Impact of mismatch conditions between mobile phone recordings on forensic voice comparison”
Presented Monday morning, October 27, 2014
168th ASA Meeting, Indianapolis

When Motorola’s vice president, Martin Cooper, made his first call from a mobile phone, a device priced at about four thousand dollars back in 1983, one could not have imagined that in just a few decades mobile phones would become a crucial and ubiquitous part of everyday life. Not surprisingly, this technology is also being increasingly misused by the criminal fraternity to coordinate their activities, which range from threatening calls to ransom demands and even bank frauds and robberies.

Recordings of mobile phone conversations can sometimes be presented as major pieces of evidence in a court of law. However, identifying a criminal by their voice is not a straightforward task and poses many challenges. Unlike DNA and fingerprints, an individual’s voice is far from constant and changes as a result of a wide range of factors. For example, a person’s state of health can substantially change their voice, so that the same words spoken on one occasion sound different on another.

The process of comparing voice samples and then presenting the outcome to a court of law is technically known as forensic voice comparison. This process begins by extracting a set of features from the available speech recordings of an offender, whose identity obviously is unknown, in order to capture information that is unique to their voice. These features are then compared using various procedures with those of the suspect charged with the offence.

One approach that is becoming widely accepted amongst forensic scientists for undertaking forensic voice comparison is known as the likelihood ratio framework. The likelihood ratio compares how probable the observed evidence is under two competing hypotheses. The first is the prosecution hypothesis, which states that the suspect and offender voice samples have the same origin (i.e., the suspect committed the crime). The second is the defense hypothesis, which states that the compared voice samples were spoken by different people who just happen to sound similar.
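As a toy illustration of the likelihood ratio idea, the sketch below divides the probability density of one measured voice feature under the same-speaker hypothesis by its density under the different-speaker hypothesis. Everything in it is an assumption made for this example and not the method used by the researchers: the single scalar feature, the Gaussian models, the function names, and all of the numbers. Real forensic voice comparison systems use many features and far more elaborate statistical models.

```python
import math

def gaussian_pdf(x, mean, std):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def likelihood_ratio(offender_value, suspect_mean, suspect_std,
                     population_mean, population_std):
    """Density of the evidence if the suspect spoke (prosecution hypothesis)
    divided by its density if someone else from the relevant population
    spoke (defense hypothesis)."""
    numerator = gaussian_pdf(offender_value, suspect_mean, suspect_std)
    denominator = gaussian_pdf(offender_value, population_mean, population_std)
    return numerator / denominator

# Hypothetical feature values (for example, an average pitch in Hz).
lr = likelihood_ratio(
    offender_value=120.0,   # measured from the offender recording
    suspect_mean=118.0,     # suspect's typical value
    suspect_std=5.0,        # suspect's occasion-to-occasion variation
    population_mean=130.0,  # typical value across other speakers
    population_std=20.0,    # spread across other speakers
)
print(f"likelihood ratio: {lr:.2f}")
```

A ratio above 1 means the evidence is more probable under the prosecution hypothesis, and a ratio below 1 means it is more probable under the defense hypothesis. With these made-up numbers the ratio comes out a little above 4.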

When undertaking this task of comparing voice samples, forensic practitioners might erroneously assume that mobile phone recordings can all be treated in the same way, irrespective of which mobile phone network they originated from. But this is not the case. There are two major mobile phone technologies in use today: the Global System for Mobile Communications (GSM) and Code Division Multiple Access (CDMA), and these two technologies are fundamentally different in the way they process speech. One difference, for example, is that the CDMA network incorporates a procedure for reducing the effect of background noise picked up by the sending-end mobile microphone, whereas the GSM network does not. Therefore, the impact of these networks on voice samples is going to be different, which in turn will impact the accuracy of any forensic analysis undertaken.

A typical scenario in forensic casework involves two mobile phone recordings, one of the suspect and another of the offender, that originate from different networks. This situation is normally referred to as a mismatched condition (see Figure 1). Researchers at the University of Auckland, New Zealand, have conducted a number of experiments to investigate in what ways and to what extent such mismatch conditions can impact the accuracy and precision of a forensic voice comparison. This study used speech samples from 130 speakers, where the voice of each speaker had been recorded on three occasions, separated by one-month intervals. This was important in order to account for the variability in a person’s voice which naturally occurs from one occasion to another. In these experiments the suspect and offender speech samples were processed using the same speech codecs as used in the GSM and CDMA networks. Mobile phone networks use these codecs to compress speech in order to minimize the amount of data required for each call. In addition, the speech codec dynamically interacts with the network and changes its operation in response to changes occurring in the network. The codecs in these experiments were set to operate in a manner similar to what happens in a real, dynamically changing, mobile phone network.

Figure 1: Typical scenario in forensic casework

The results suggest that the degradation in the accuracy of a forensic analysis under mismatch conditions can be very significant (as high as 150%). Surprisingly, though, these results also suggest that the precision of a forensic analysis might actually improve. Nonetheless, precise but inaccurate results are clearly undesirable. The researchers have proposed a strategy for lessening the impact of mismatch by passing the suspect’s speech samples through the same speech codec as the offender’s (i.e., either GSM or CDMA) prior to forensic analysis. This strategy has been shown to improve the accuracy of a forensic analysis by about 70%, but performance is still not as good as analysis under matched conditions.