Mylan Cook – email@example.com
Eric Todd – firstname.lastname@example.org
Kent L. Gee – email@example.com
Mark K. Transtrum – firstname.lastname@example.org
Brigham Young University
Provo, UT 84602
David S. Woolworth – email@example.com
Roland, Woolworth & Associates, LLC
356 CR 102
Oxford, MS 38655
Popular version of paper 2aNSb7
Presented Tuesday morning, December 03, 2019
178th ASA Meeting, San Diego, CA
Audio processing is often used to deal with a single person’s voice, but how do things change when dealing with an entire crowd? While it is relatively easy for a person to judge whether a crowd is booing or cheering, teaching a computer to differentiate between different crowd responses is a challenging problem. Of particular interest herein is the challenge of determining when a crowd is making a concentrated, unified, or focused effort. This research has applications in rewarding crowds, sales, and riot prevention.
Previous work studied crowds at basketball games using machine learning techniques such as K-means clustering. Using spectral sound levels (the loudness at different frequencies), K-means automatically divides the sound samples into different groups, separating levels of crowd noise from levels of band noise or PA system noise.
Video 1 presents a graphical representation of some of these features and how they fluctuate with time. The colors in the video show the different sub-groups found, and on examination, the purple sub-group is found to consist primarily of audio that demonstrates the most focused crowd effort.
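To make the grouping step concrete, here is a minimal sketch of K-means applied to spectral levels. It is an illustration only, not the code used in the study: the three-band levels, the number of groups, and the group descriptions in the comments are invented toy values, and the deterministic farthest-point initialization is a choice made here so the example is repeatable.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two spectral-level vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic start (an assumption of this sketch): begin with the
    first frame, then repeatedly add the frame farthest from all chosen ones."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each frame joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its frames.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty group in place
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, labels


# Toy frames: levels in dB for three hypothetical bands (low, mid, high).
frames = (
    [[70 + d, 80 + d, 65 - d] for d in (-1, 0, 1, 2)]    # "crowd-like"
    + [[88 + d, 70 - d, 55 + d] for d in (-1, 0, 1, 2)]  # "band-like"
    + [[60 - d, 74 + d, 82 + d] for d in (-1, 0, 1, 2)]  # "PA-like"
)
centroids, labels = kmeans(frames, k=3)
print(labels)
```

K-means only returns numbered groups. Deciding which group corresponds to focused crowd effort still requires listening to or inspecting the audio in each one, as described above.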
The purpose of this investigation is to determine whether a similar process can be followed to find focused crowd efforts in another type of crowd, namely that at a Mardi Gras parade, as recorded from a microphone mounted on a float. There are some challenges here. They arise from differences in frequency between crowds and from the fact that the audio from the Mardi Gras crowd shows very little variation: the crowd is essentially cheering the entire time, so changes in crowd behavior get buried in the crowd’s clamorous cacophony.
There is still something we can do, however. Within the high-involvement basketball data sub-group we find two audio features with very large numerical values: flux, which is the change in energy over time, and slope, which marks how quickly the energy increases as frequency increases. By setting a threshold for each of these features, we can mark all the Mardi Gras data that exceed the thresholds as likely to exhibit focused crowd involvement.
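The thresholding idea can be sketched as follows. This is a hedged illustration rather than the study's implementation: the formulas used for flux (Euclidean change between consecutive frames) and slope (least-squares slope of level against band index) are common definitions assumed here, the threshold values and frame data are invented, and requiring both features to exceed their thresholds is also an assumption.

```python
def spectral_slope(levels, freqs):
    """Least-squares slope of level versus frequency (assumed definition)."""
    n = len(levels)
    mean_f = sum(freqs) / n
    mean_l = sum(levels) / n
    num = sum((f - mean_f) * (l - mean_l) for f, l in zip(freqs, levels))
    den = sum((f - mean_f) ** 2 for f in freqs)
    return num / den


def spectral_flux(prev_levels, levels):
    """Euclidean change in the spectrum between consecutive frames
    (assumed definition)."""
    return sum((b - a) ** 2 for a, b in zip(prev_levels, levels)) ** 0.5


def flag_focused(frames, freqs, flux_threshold, slope_threshold):
    """Flag frames whose flux AND slope both exceed their thresholds."""
    flags = [False]  # the first frame has no predecessor, so it has no flux
    for prev, cur in zip(frames, frames[1:]):
        flux = spectral_flux(prev, cur)
        slope = spectral_slope(cur, freqs)
        flags.append(flux > flux_threshold and slope > slope_threshold)
    return flags


# Toy data: three hypothetical bands indexed 0, 1, 2, with levels in dB.
freqs = [0, 1, 2]
frames = [
    [70, 70, 70],
    [70, 71, 70],  # small change, flat spectrum
    [70, 76, 82],  # large change, energy rising with frequency
    [70, 76, 82],  # same rising spectrum, but no change from previous frame
    [82, 76, 70],  # large change, but energy falling with frequency
]
# Hypothetical thresholds; in practice they would be chosen from the
# high-involvement sub-group of the basketball data.
flags = flag_focused(frames, freqs, flux_threshold=5.0, slope_threshold=2.0)
print(flags)
```

Only the third frame is flagged, because it is the only one with both a large jump from the previous frame and energy that rises with frequency.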
Video 2 presents a graphical representation of the Mardi Gras parade crowd noise. Audio segments that surpass the threshold, and so are likely to contain a concentrated crowd effort, are shown in green; all other segments are shown in red. While validation is ongoing, these results show promise for automatically identifying focused crowd involvement in different types of crowds.