Fearless Steps: Taking the Next Step towards Advanced Speech Technology for Naturalistic Audio

John H. L. Hansen – john.hansen@utdallas.edu
Aditya Joglekar – aditya.joglekar@utdallas.edu
Meena Chandra Shekar – meena.chandrashekar@utdallas.edu
Abhijeet Sangwan – abhijeet.sangwan@utdallas.edu
CRSS: Center for Robust Speech Systems;
University of Texas at Dallas (UTDallas),
Richardson, TX – 75080, USA

Popular version of paper 3pSC10 “Fearless Steps: Taking the Next Step towards Advanced Speech Technology for Naturalistic Audio”
To be Presented between, Dec 2-6, 2019,
178th ASA Meeting, San Diego

J.H.L. Hansen, A. Joglekar, A. Sangwan, L. Kaushik, C. Yu, M.M.C. Shekhar, “Fearless Steps: Taking the Next Step towards Advanced Speech Technology for Naturalistic Audio,” 178th Acoustical Society of America, Session: 3pSC12, (Wed., 1:00pm-4:00pm; Dec. 4, 2019), San Diego, CA, Dec. 2-6, 2019.

NASA’s Apollo program represents one of the greatest achievements of humankind in the 20th century. During a span of 4 years (from 1968 to 1972), nine lunar missions were launched with 12 astronauts who walked on the surface of the moon. To monitor and assess this massive team challenge, all communications between NASA personnel and astronauts were recorded on 30-track 1-inch analog audio tapes. NASA recorded this in order to be able to review and determine best practices to improve success in subsequent Apollo missions. This analog audio collection essentially was set aside when the Apollo program was completed with Apollo-17, and all tapes stored in NASA’s tape archive. Clearly there are opportunities for research on this audio for both technology and historical purposes. The entire Apollo mission consists of well over ~150,000 hours. Through the Fearless Steps initiative, CRSS-UTDallas digitized 19,000 hours of audio data from Apollo missions: A-1, A-11 and A-13. The focus of this current effort is to contribute to the development of Spoken Language Technology based algorithms to analyze and understand various aspects of conversational speech. To achieve this goal, a new 30-track analog audio decoder was designed using NASA Soundscriber.

Figure 1: (left): The SoundScriber device used to decode 30 track analog tapes, and (right): The UTD-CRSS designed read-head decoder retrofitted to the SoundScriber [5]

To develop efficient speech technologies towards analyzing conversations and interactions, multiple sources of data such as interviews, flight journals, debriefs, and other text sources along with videos were used [8, 12, 13]. This initial research direction allowed CRSS-UTDallas to develop document linking and web application called ‘Explore Apollo’ wherein a user can access certain moments/stories in the Apollo-11 mission. Tools such as the exploreapollo.org enable us to display our findings in an interactive manner [10, 14]. A case in point is to illustrate team communication dynamics via a chord diagram. This diagram (Figure 2 (right)) illustrates the amount of conversation each astronaut has with each other during the mission, and the communication interactions with the capsule communicator (only personnel directly communicating with the astronauts). Analyses such as these provide an interesting insight into the interaction dynamics for astronauts in deep space.

Figure 2: (left): Explore Apollo Online Platform [14] and (right): Chord Diagram of Astronauts’ Conversations [9]

With a massive aggregated data, CRSS-UTDallas sought to explore the problem of automatic speech understanding using algorithmic strategies to answer the questions: (1) when were they talking; (2) who was talking; (3) what was being said; and (4) how were they talking. These questions formulated in technical terminologies are represented as the following tasks: (1) Speech Activity Detection [5], (2) Speaker Identification, (3) Automatic Speech Recognition and Speaker Diarization [6], (4) Sentiment and Topic Understanding [7].

The general task of recognizing what was being said at what time is called the “Diarization pipeline”. In an effort to answer these questions, CRSS-UTDallas developed solutions for automated diarization and transcript generation using Deep Learning strategies for speech recognition along with Apollo mission specific language models [9]. We further developed algorithms which would help answer the other questions including detecting speech activity, and speaker identity for segments of the corpus [6, 8].

Figure 3: Illustration of the Apollo Transcripts using the Transcriber tool

These transcripts allow us to search through the 19,000 hours of data to find keywords, phrases, or any other points on interest in a matter of seconds as opposed to listening to the audio for hours to search for the answers [10, 11]. The transcripts along with the complete Apollo-11 and Apollo-13 corpora are now freely available on the website fearlesssteps.exploreapollo.org

Audio: Air-to-ground communication during the Apollo-11 Mission

Figure 4: The Fearless Steps Challenge website

Phase one of the Fearless Steps Challenge [15] involved performing five challenge tasks on 100 hours of time and mission critical audio out of the 19,000 hours of the Apollo 11 mission. The five challenge tasks are:

Speech Activity Detection
Speaker Identification
Automatic Speech Recognition
Speaker Diarization
Sentiment detection.

The goal of this Challenge was to evaluate the performance of state-of-the-art speech and language systems for large task oriented teams with naturalistic audio in challenging environments. In the future, we aim to digitize all of the Apollo missions and make it publicly available.

A. Sangwan, L. Kaushik, C. Yu, J. H. L. Hansen and Douglas W. Oard. ”Houston, we have a solution: using NASA Apollo program to advance speech and language processing technology.” INTERSPEECH. 2013.
C. Yu, J. H. L. Hansen, and Douglas W. Oard. “`Houston, We Have a Solution’: A Case Study of the Analysis of Astronaut Speech During NASA Apollo 11 for Long-Term Speaker Modeling,” INTERSPEECH. 2014.
Douglas W. Oard, J. H. L. Hansen, A. Sangwan, B. Toth, L. Kaushik, and C. Yu. “Toward Access to Multi-Perspective Archival Spoken Word Content.” In Digital Libraries: Knowledge, Information, and Data in an Open Access Society, 10075:77–82. Cham: Springer International Publishing, 2016.
A. Ziaei, L. Kaushik, A. Sangwan, J. H.L. Hansen, & D. W. Oard, (2014). “Speech activity detection for NASA Apollo Space Missions: Challenges and Solutions.” (pp. 1544-1548) INTERSPEECH. 2013.
L. Kaushik, A. Sangwan, and J. H.L. Hansen. “Multi-Channel Apollo Mission Speech Transcripts Calibration,” 2799–2803. IINERSPEECH, 2017.
C. Yu and J. H. L. Hansen, “Active Learning Based Constrained Clustering For Speaker Diarization,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 11, pp. 2188-2198, Nov. 2017. doi: 10.1109/TASLP.2017.2747097
L. Kaushik, A. Sangwan and J. H. L. Hansen, “Automatic Sentiment Detection in Naturalistic Audio,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 8, pp. 1668-1679, Aug. 2017.
C. Yu, and J. H. L. Hansen. ”A study of voice production characteristics of astronaut speech during Apollo 11 for speaker modeling in space.” Journal of the Acoustic Society of America (JASA), 2017 Mar: 141(3):1605.
L. Kaushik. “Conversational Speech Understanding in highly Naturalistic Audio Streams” PhD Dissertation, The University of Texas at Dallas, 2017.
A. Joglekar, C. Yu, L. Kaushik, A. Sangwan, J. H. L. Hansen, “Fearless Steps Corpus: A Review Of The Audio Corpus For Apollo-11 Space Mission And Associated Challenge Tasks” In NASA Human Research Program Investigators’ Workshop (HRP), 2018.
L. Kaushik, A. Sangwan, J. H. L. Hansen, “Apollo Archive Explorer: An Online Tool To Explore And Study Space Missions” In NASA Human Research Program Investigators’ Workshop (HRP), 2017.
Apollo 11 Mission Overview: https://www.nasa.gov/-mission_pages/apollo/missions/apollo11.html
Apollo 11 Mission Reports: https://www.hq.nasa.gov/alsj/a11-/a11mr.html
Explore Apollo Document Linking Application: https://app.-exploreapollo.org/
Hansen, J. H., Joglekar, A., Shekar, M. C., Kothapally, V., Yu, C., Kaushik, L., & Sangwan, A. (2019). The 2019 inaugural fearless steps challenge: A giant leap for naturalistic audio. In Proc. Interspeech (Vol. 2019).

Additional News Releases from 2017-2019:
https://www.youtube.com/watch?v=CTJtRNMac0E&t
https://www.nasa.gov/feature/nasa-university-of-texas-at-dallas-reveal-apollo-11-behind-the-scenes-audio
https://www.foxnews.com/tech/engineers-restore-audio-recordings-from-apollo-11-mission
https://www.nbcnews.com/mach/science/nasa-releases-19-000-hours-audio-historic-apollo-11-mission-ncna903721
https://www.insidescience.org/news/thousands-hours-newly-released-audio-tell-backstage-story-apollo-11-moon-mission

3pSC10 – Fearless Steps: Taking the Next Step towards Advanced Speech Technology for Naturalistic Audio

Like this:

Keep reading!

3pSC10 – Fearless Steps: Taking the Next Step towards Advanced Speech Technology for Naturalistic Audio

Share this:

Like this:

Keep reading!

Search for papers by Acoustics Keyword