The term ‘soundscape’ is widely used to describe the sonic landscape and can be considered the auditory equivalent of a visual landscape. Current soundscape research assesses sound in terms of perception, i.e. the emotional attributes associated with particular sounds, and has been the subject of large-scale projects such as the Positive Soundscapes Project (Davies et al. 2009). This research addresses the limitations of current noise assessment methods by taking into account the relationship between the acoustic environment and the emotional responses and behavioural characteristics of the people living within it. Related research suggests that a variety of objective and subjective factors influence the effects of exposure to noise, including age, locale, cross-cultural differences (Guyot et al. 2005) and the time of year (Yang and Kang, 2005). A key aspect of this research area is the subjective effect of the soundscape on the listener: the paradigm emphasises how sound in an environment is perceived, and whether that perception is positive or negative. This approach dovetails with advances in sound and music classification research, which aims to categorise sounds in terms of their emotional impact on the listener.

Annoyance is one of the main factors which contribute to a negative view of environmental noise, and can lead to stress-related health conditions. Subjective perception of environmental sounds is dependent upon a variety of factors related to the sound, the geographical location and the listener. Noise maps used to communicate information to the public about environmental noise in a given geographic location are based on simple noise level measurements, and do not include any information regarding how perceptually annoying or otherwise the noise might be.

Figure 1 Selected locations for recording – image courtesy of Scottish Noise Mapping

In this study, a large panel of listeners (N=167) subjectively assessed a corpus of sixty pre-recorded urban soundscapes collected from a variety of locations around Glasgow City Centre (see Figure 1). Binaural recordings were taken at three points during each 24-hour period in order to capture urban noise during the day, evening and night. Perceived annoyance was measured using Likert and numerical scales, and each soundscape was also rated in terms of arousal and positive/negative valence (see Figure 2).

Figure 2 Arousal/Valence Circumplex Model Presented in Listening Tests

Coding each of the soundscapes was an essential step in testing the effects of location on the variables provided by the online survey, namely annoyance score (verbal), annoyance score (numeric), quadrant score, arousal score and valence score. The coding was based on the environment, i.e. urban (U), semi-open (S) or open (O); the density of traffic, i.e. high (H), mid (M) or low (L); and the distance from the main noise source (road traffic), using two criteria: >10m (10+) and <10m (10-). The coding resulted in eight different location types: UH10-, UH10+, UM10+, UL10-, SM10+, SL10-, SL10+ and OL10+.
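The coding scheme can be made concrete with a short sketch. The Python below is only an illustration of how the three-part location codes could be decoded; the function and variable names are hypothetical and are not taken from the study.

```python
import re
from typing import NamedTuple

# Lookup tables for the three parts of a location code (labels from the text).
ENVIRONMENTS = {"U": "urban", "S": "semi-open", "O": "open"}
TRAFFIC = {"H": "high", "M": "mid", "L": "low"}
DISTANCE = {"10-": "<10m", "10+": ">10m"}


class LocationType(NamedTuple):
    environment: str
    traffic: str
    distance: str


# One environment letter, one traffic letter, then "10+" or "10-".
_CODE = re.compile(r"^([USO])([HML])(10[+-])$")


def decode_location(code: str) -> LocationType:
    """Split a code such as 'UH10-' into its three descriptive parts."""
    match = _CODE.match(code)
    if match is None:
        raise ValueError(f"unrecognised location code: {code!r}")
    env, traffic, dist = match.groups()
    return LocationType(ENVIRONMENTS[env], TRAFFIC[traffic], DISTANCE[dist])


# The eight location types reported in the text.
STUDY_CODES = ["UH10-", "UH10+", "UM10+", "UL10-", "SM10+", "SL10-", "SL10+", "OL10+"]

for code in STUDY_CODES:
    print(code, decode_location(code))
```

Decoding the codes into separate factors in this way would allow each factor (environment, traffic density, distance) to be tested against the survey variables on its own as well as in combination.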

To capture quantitative information about the audio recordings themselves, the MIRtoolbox for MATLAB was used to extract acoustical features from the dataset. Several functions were identified that could be meaningful for measuring the soundscapes in terms of loudness, spectral shape and also rhythm, which here is understood less in musical terms than as the rate and distribution of events within a soundscape.
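As a rough illustration of the kinds of features mentioned above, the following Python sketch computes an energy measure, a spectral-shape measure and a crude event rate with NumPy. It is not the MATLAB code used in the study: the function names, the frame length and the energy-jump threshold are all assumptions chosen for the example.

```python
import numpy as np


def rms_energy(signal):
    """Root-mean-square level, a simple proxy for overall energy."""
    return float(np.sqrt(np.mean(np.square(signal))))


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency, a simple spectral-shape measure."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = spectrum.sum()
    return float((freqs * spectrum).sum() / total) if total > 0 else 0.0


def event_rate(signal, sample_rate, frame_len=1024, threshold=2.0):
    """Events per second, counting frames whose energy exceeds
    `threshold` times the energy of the preceding frame (assumed rule)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1) + 1e-12  # avoid comparing exact zeros
    onsets = int(np.sum(energy[1:] > threshold * energy[:-1]))
    duration = n_frames * frame_len / sample_rate
    return onsets / duration


# Synthetic stand-in for a recording: one second of a 1 kHz tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
print(rms_energy(tone), spectral_centroid(tone, sample_rate), event_rate(tone, sample_rate))
```

A steady tone gives an event rate of zero, whereas a recording with many transient events (passing vehicles, horns) would give a higher value under this rule.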

As expected, correlations between extracted features and locations suggest that locations with many transient events, higher energy levels and harsh, dissonant sounds (e.g. heavy traffic) received higher annoyance and arousal scores and were perceived more negatively than quiet areas. Locations with fewer transient events, lower energy levels and less harsh, possibly more positive sounds (e.g. birdsong) received lower annoyance and arousal scores and were perceived more positively than busy urban areas. The results shed light on the subjective annoyance of environmental sound in a range of locations and give the reader an insight into which psychoacoustic features may contribute to these views of urban soundscapes.


When Motorola’s vice president, Martin Cooper, made the first call from a mobile phone, a device priced at about four thousand dollars back in 1983, no one could have imagined that in just a few decades mobile phones would become a crucial and ubiquitous part of everyday life. Not surprisingly, this technology is also increasingly misused by the criminal fraternity to coordinate their activities, which range from threatening calls to ransom demands and even bank frauds and robberies.

Recordings of mobile phone conversations can sometimes be presented as major pieces of evidence in a court of law. However, identifying a criminal by their voice is not a straightforward task and poses many challenges. Unlike DNA and fingerprints, an individual’s voice is far from constant and changes as a result of a wide range of factors. For example, a person’s state of health can substantially change their voice, so that the same words spoken on one occasion sound different on another.

The process of comparing voice samples and then presenting the outcome to a court of law is technically known as forensic voice comparison. This process begins by extracting a set of features from the available speech recordings of an offender, whose identity obviously is unknown, in order to capture information that is unique to their voice. These features are then compared using various procedures with those of the suspect charged with the offence.

One approach that is becoming widely accepted among forensic scientists for forensic voice comparison is the likelihood ratio framework. The likelihood ratio addresses two competing hypotheses and compares the probability of the evidence under each. The first is the prosecution hypothesis, which states that the suspect and offender voice samples have the same origin (i.e., the suspect committed the crime). The second is the defense hypothesis, which states that the compared voice samples were spoken by different people who just happen to sound similar.
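To illustrate the idea behind a likelihood ratio, here is a deliberately simplified Python sketch. It assumes a single numeric voice feature modelled by Gaussian distributions; the function names, parameters and numbers are hypothetical, and practical forensic systems use many features and more elaborate statistical models.

```python
import math


def gaussian_pdf(x, mean, std):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood_ratio(offender_value, suspect_mean, suspect_std,
                     population_mean, population_std):
    """Ratio of the probability of the offender's feature value under the
    same-speaker (prosecution) model to its probability under the
    different-speaker (defense) model."""
    p_same = gaussian_pdf(offender_value, suspect_mean, suspect_std)
    p_different = gaussian_pdf(offender_value, population_mean, population_std)
    return p_same / p_different


# Hypothetical numbers: the offender's value sits at the suspect's mean
# but two standard deviations away from the population mean.
lr = likelihood_ratio(offender_value=2.0,
                      suspect_mean=2.0, suspect_std=1.0,
                      population_mean=0.0, population_std=1.0)
print(f"likelihood ratio = {lr:.2f}")
```

A ratio above 1 lends support to the prosecution hypothesis and a ratio below 1 to the defense hypothesis; a ratio of exactly 1 means the evidence does not discriminate between the two.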

When comparing voice samples, forensic practitioners might erroneously assume that mobile phone recordings can all be treated in the same way, irrespective of which mobile phone network they originated from. This is not the case. There are two major mobile phone technologies in use today: the Global System for Mobile Communications (GSM) and Code Division Multiple Access (CDMA), and they differ fundamentally in the way they process speech. One difference, for example, is that the CDMA network incorporates a procedure for reducing the effect of background noise picked up by the sending-end mobile microphone, whereas the GSM network does not. The impact of these networks on voice samples will therefore be different, which in turn will affect the accuracy of any forensic analysis undertaken.

Having two mobile phone recordings, one of the suspect and another of the offender, that originate from different networks represents a typical scenario in forensic casework. This situation is normally referred to as a mismatched condition (see Figure 1). Researchers at the University of Auckland, New Zealand, have conducted a number of experiments to investigate in what ways and to what extent such mismatched conditions can affect the accuracy and precision of a forensic voice comparison. The study used speech samples from 130 speakers, where the voice of each speaker had been recorded on three occasions separated by one-month intervals. This was important in order to account for the variability in a person’s voice that naturally occurs from one occasion to another. In these experiments the suspect and offender speech samples were processed using the same speech codecs as those used in the GSM and CDMA networks. Mobile phone networks use these codecs to compress speech in order to minimize the amount of data required for each call. In addition, the speech codec interacts dynamically with the network and changes its operation in response to changes occurring in the network. The codecs in these experiments were set to operate in a manner similar to what happens in a real, dynamically changing mobile phone network.

Figure 1: Typical scenario in forensic casework

The results suggest that the degradation in the accuracy of a forensic analysis under mismatch conditions can be very significant (as high as 150%). Surprisingly, though, these results also suggest that the precision of a forensic analysis might actually improve. Nonetheless, precise but inaccurate results are clearly undesirable. The researchers have proposed a strategy for lessening the impact of mismatch by passing the suspect’s speech samples through the same speech codec as the offender’s (i.e., either GSM or CDMA) prior to forensic analysis. This strategy has been shown to improve the accuracy of a forensic analysis by about 70%, but performance is still not as good as analysis under matched conditions.

It is 5 o’clock in the morning and only a hint of sunlight is visible on the horizon. Besides the sound of a light breeze swirling through the grass, all is quiet on the Nebraska prairie. Everything seems to be asleep. Then, suddenly, “whhooo-doo-doooohh” breaks the silence. The prairie-chickens have arrived.

The Greater Prairie-Chicken is a medium-sized grouse that lives on the prairies of central North America (Figure 1a) (Schroeder and Robb 1993). Prairie-chickens are well-known for their breeding activities in which the males congregate in groups each spring and perform elaborate courtship displays to attract females (Figure 1b). The areas where the males gather, called “leks,” are distributed across the landscape. Female prairie-chickens visit leks every morning to observe and compare males until a suitable one is chosen. After mating, females leave the leks to nest and raise their broods on their own, while the males remain on the leks and continue to perform courtship displays.

Figure 1a: A male Greater Prairie-Chicken. Figure 1b: A male prairie-chicken performs a courtship display for a female.

These complex courtship behaviors do not occur in silence. Vocalization plays an important role in the mate choice behavior of prairie-chickens. As part of a larger study addressing the effects of electricity-producing wind farms on prairie-chicken ecology, we wanted to learn more about the acoustic properties of prairie-chicken calls. We did this by recording prairie-chicken vocalizations at leks in the Nebraska Sandhills. We visited the leks in the very early morning and set up audio recorders, which were placed close enough to the prairie-chickens on their leks to obtain high-quality recordings (Figure 2a). Sitting in a blind at the edge of each lek (Figure 2b), we observed the prairie-chickens while they were lekking and collected the audio recordings.

Figure 2a: We used audio recorders to record male prairie-chicken vocalizations at the leks. Figure 2b: We observed lekking prairie-chickens and recorded vocalizations by sitting in a blind at the edge of a lek.

Male Greater Prairie-Chickens use four prominent vocalizations while on the leks: the “boom,” “cackle,” “whine” and “whoop.” The four vocalizations are distinct and serve different purposes.

The boom is used as part of the courtship display, so one function is to attract mates. Booms travel a long distance across the prairie, so another purpose of the call is to advertise lek location to other prairie-chickens (Sparling 1981, 1983).

The “cackles” are short calls typically given in rapid succession. Prairie-chickens use the cackle as an aggressive or territorial call (Sparling 1981, 1983) or as a warning to alert other prairie-chickens of potential danger, such as an approaching prairie falcon, coyote or other predator.

The “whine” is slightly longer in duration than the cackle; whines and cackles are often used together. The purpose of the whine is similar to that of the cackle. It serves as an aggressive and territorial call, although it is thought that whines are somewhat less aggressive than cackles (Sparling 1981, 1983).

The “whoop” is used for mate attraction. Males typically use the whoop when females are present on the lek (Sparling 1981, 1983).

We measured acoustic characteristics of the vocalizations captured on the recordings so we could evaluate their features in detail. We are using this information about the vocalizations in a study of the effects of wind turbine sound on Greater Prairie-Chickens (Figure 3). We hope to determine whether the vocalizations produced by prairie-chickens near a wind farm differ in any way from those produced by prairie-chickens farther away. For example, do the prairie-chickens near wind turbines call at a higher pitch in response to wind turbine sound? Do they vocalize more loudly? Ultimately, we would like to know whether components of the prairie-chickens’ vocalizations are masked by the sounds of the wind turbines.
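Questions about pitch and loudness come down to measuring quantities such as the dominant frequency and the level of a recorded call. The Python sketch below shows one simple way such measurements could be made with NumPy; it is an assumption-laden illustration rather than the analysis used in this study, and the 300 Hz test tone is an arbitrary stand-in, not a measured call frequency.

```python
import numpy as np


def peak_frequency(signal, sample_rate):
    """Frequency (Hz) of the strongest spectral component of a clip."""
    window = np.hanning(len(signal))  # taper the clip to reduce leakage
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])


def level_db(signal, reference=1.0):
    """RMS level in decibels relative to `reference`.
    This is a relative level, not a calibrated sound pressure level."""
    rms = float(np.sqrt(np.mean(np.square(signal))))
    return 20.0 * np.log10(max(rms, 1e-12) / reference)


# Two synthetic "calls": same pitch, one at half the amplitude of the other.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
near_call = 1.0 * np.sin(2 * np.pi * 300 * t)
far_call = 0.5 * np.sin(2 * np.pi * 300 * t)

print(peak_frequency(near_call, sample_rate))
print(level_db(near_call) - level_db(far_call))
```

Comparing such measurements between leks near a wind farm and leks farther away is the kind of comparison the questions above call for; a fair comparison would also need to account for the distance between the bird and the recorder.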


Figure 3: We are conducting a study of the effects of wind turbine noise on Greater Prairie-Chickens.

The effect of anthropogenic noise is an issue not limited to Greater Prairie-Chickens and wind turbines. As humans create increasingly noisy landscapes through residential and industrial development, vehicle traffic, air traffic and urban sprawl, the threats posed to birds and other wildlife are likely to be significant. It is important to be aware of the potential effects of anthropogenic sound and find ways to mitigate those effects as landscapes become noisier.



Investigations into green roofs have shown that they provide many environmental benefits, such as thermal conditioning, air cleaning and rainwater absorption. The way green roofs are usually constructed suggests that they may also have two interesting acoustical properties: sound insulation and sound absorption. Sound insulation would protect the interior of the house from environmental noise produced outside it. Sound absorption, on the other hand, would reduce noise in the environment itself by dissipating sound energy that reaches the roof from environmental noise sources. Thus, sound absorption can help to reduce environmental noise in urban settings. Although these are interesting characteristics, information on the acoustic properties of green roofs and their effects on the noise environment is still sparse. This work looked into the sound absorption of two types of green roof commercially available in Brazil: the alveolar system and the hexa system.

Fig 1: illustration of the alveolar system (left) and hexa system (right)

Sound absorption can be quantified by means of a sound absorption coefficient α, which ranges between 0 and 1 and is usually a function of frequency. A value of α = 0 means that all incident energy is reflected back into the environment, and α = 1 means that all energy is dissipated in the layers of the material, here the green roof. To find out how much sound energy the alveolar and hexa systems absorb, standardized measurements were made in a reverberant chamber according to ISO 354 for different variations of both systems. The alveolar system was measured with a thin 2.5 cm layer of soil-like substrate, with and without grass, and with a 4 cm layer of substrate only. The hexa system was measured with layers of 4 and 6 cm of substrate without vegetation, and with 6 cm of substrate plus a layer of sedum vegetation. For all systems, high absorption coefficients (α > 0.7) were found at medium and high frequencies. This was expected because of the highly porous structure of the substrate. Notably, the alveolar system with grass, the alveolar system with 4 cm of substrate, the hexa system with 6 cm of substrate and the hexa system with sedum already provide high absorption at frequencies as low as 250 or 400 Hz. These green roof systems are therefore particularly interesting in urban settings, as traffic noise is usually low-frequency noise and is hardly absorbed by smooth surfaces such as pavements or façades.
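In a reverberation-chamber measurement of this kind, the absorption coefficient is derived from how much the sample shortens the chamber's reverberation time. The Python sketch below shows the basic Sabine-type calculation in simplified form: it leaves out the correction for changes in air attenuation between the two measurements, and the chamber volume, sample area and reverberation times are hypothetical values, not data from this work.

```python
def equivalent_absorption_area(volume_m3, rt_seconds, speed_of_sound=343.0):
    """Equivalent sound absorption area A = 55.3 * V / (c * T), in m^2."""
    return 55.3 * volume_m3 / (speed_of_sound * rt_seconds)


def absorption_coefficient(volume_m3, sample_area_m2, rt_empty, rt_with_sample,
                           speed_of_sound=343.0):
    """Absorption coefficient of the sample: the extra absorption area
    it adds to the chamber, divided by the area the sample covers."""
    a_empty = equivalent_absorption_area(volume_m3, rt_empty, speed_of_sound)
    a_sample = equivalent_absorption_area(volume_m3, rt_with_sample, speed_of_sound)
    return (a_sample - a_empty) / sample_area_m2


# Hypothetical measurement in one frequency band: a 200 m^3 chamber,
# a 10 m^2 sample, and reverberation time dropping from 5.0 s to 3.0 s.
alpha = absorption_coefficient(volume_m3=200.0, sample_area_m2=10.0,
                               rt_empty=5.0, rt_with_sample=3.0)
print(f"alpha = {alpha:.2f}")
```

Repeating the calculation for each frequency band gives an absorption curve as a function of frequency.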

Fig 2: absorption coefficient of the alveolar samples (left) and hexa samples (right).

The next step of this research is to run computational simulations of the noise reduction provided by the hexa and alveolar systems in different noisy situations, such as near airports or in intense urban traffic.

Real-world speech understanding in naturally “crowded” auditory soundscapes is a complex operation that acts upon an integrated speech-plus-noise signal. Does all of the auditory “clutter” that surrounds speech make its way into our heads along with the speech? Or do we perceptually isolate and discard background noise at an early stage of processing, based on general acoustic properties that differentiate sounds from non-speech noise sources from those produced by human vocal tracts (i.e. speech)?

We addressed these questions by first examining the ability to tune into speech while simultaneously tuning out noise. Is this ability influenced by properties of the listener (their experience-dependent knowledge) as well as by properties of the signal (factors that make it more or less difficult to separate a given target from a given masker)? Listeners were presented with English sentences in a background of competing speech that was either English (matched-language, English-in-English recognition) or another language (mismatched-language, e.g. English-in-Mandarin recognition). Listeners were either native or non-native listeners of English and were either familiar or unfamiliar with the language of the to-be-ignored background speech (English, Mandarin, Dutch, or Croatian). Overall, we found that matched-language speech-in-speech understanding (English-in-English) is significantly harder than mismatched-language speech-in-speech understanding (e.g. English-in-Mandarin). Importantly, listener familiarity with the background language modulated the magnitude of the mismatched-language benefit. On a smaller time scale of experience, we also found that this benefit is modulated by short-term adaptation to a consistent background language within a test session. Thus, we conclude that speech understanding in conditions that involve competing background speech engages experience-dependent knowledge in addition to signal-dependent processes of auditory stream segregation.

Experiment Series 2 then asked if listeners’ memory traces for spoken words with concurrent background noise remain associated in memory with the background noise. Listeners were presented with a list of spoken words and for each word they were asked to indicate if the word was “old” (i.e. had occurred previously in the test session) or “new” (i.e. had not been presented over the course of the experiment). All words were presented with concurrent noise that was either aperiodic in a limited frequency band (i.e. like wind in the trees) or a pure tone. Importantly, both types of noise were clearly from a sound source that was very different from the speech source. In general, words were more likely to be correctly recognized as previously-heard if the noise on the second presentation matched the noise on the first presentation (e.g. pure tone on both first and second presentations of the word). This suggests that the memory trace for spoken words that have been presented in noisy backgrounds includes an association with the specific concurrent noise. That is, even sounds that quite clearly emanate from an entirely different source remain integrated with the cognitive representation of speech rather than being permanently discarded during speech processing.
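The comparison at the heart of this experiment can be sketched in a few lines of Python. The example tallies how often previously heard words are correctly called “old”, separately for trials where the background noise matched and did not match across presentations. The trial records are invented purely for illustration and the function name is hypothetical; none of it is data or code from the study.

```python
from collections import defaultdict


def hit_rates(trials):
    """Proportion of repeated words correctly judged 'old', split by
    whether the noise matched across the two presentations.

    Each trial is (first_noise, second_noise, responded_old) and is
    assumed to describe a word that really had been presented before.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [hits, total]
    for first_noise, second_noise, responded_old in trials:
        condition = "matched" if first_noise == second_noise else "mismatched"
        counts[condition][1] += 1
        if responded_old:
            counts[condition][0] += 1
    return {cond: hits / total for cond, (hits, total) in counts.items()}


# Invented toy trials, for illustration only.
toy_trials = [
    ("tone", "tone", True),
    ("band noise", "band noise", True),
    ("tone", "band noise", True),
    ("band noise", "tone", False),
]
print(hit_rates(toy_trials))
```

A higher hit rate in the matched condition than in the mismatched condition is the pattern described above.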

These findings suggest that real-world speech understanding in naturally “crowded” auditory soundscapes involves an integrated speech-plus-noise signal at various stages of processing and representation. All of the auditory “clutter” that surrounds speech somehow makes its way into our heads along with the speech, leaving us with exquisitely detailed auditory memories from which we build rich representations of our unique experiences.

Important note: The work in this presentation was conducted in a highly collaborative laboratory at Northwestern University. Critical contributors to this work are former group members Susanne Brouwer (now at Utrecht University, Netherlands), Lauren Calandruccio (now at UNC-Chapel Hill), and Kristin Van Engen (now at Washington University, St. Louis), and current group member, Angela Cooper.