1pMU4 – Flow Visualization and Aerosols in Performance

Abhishek Kumar – abku6744@colorado.edu
Tehya Stockman – test7645@colorado.edu
Jean Hertzberg – jean.hertzberg@colorado.edu

University of Colorado Boulder
1111 Engineering Drive
Boulder, CO 80309

Popular version of 1pMU4 – Flow visualization and aerosols in performance
Presented Monday afternoon, May 23, 2022
182nd ASA Meeting in Denver, Colorado
Click here to read the abstract

Outbreaks from choir performances, such as the Skagit Valley Choir, showed that singing brought potential risk of COVID-19 infection. The risks of airborne infection from other musical performances, such as playing wind instruments or performing theater, are less well known. It is also important to understand methods that can be used to reduce infection risk. In this study, we used a variety of methods, including flow visualization and aerosol and CO2 measurements, to understand the different components of musical performance that can contribute to transmission risk, and how that risk can be mitigated. We tested eight musical instruments, both brass and woodwind, as well as singing, with and without a mask or bell cover.

We started with flow visualization of exhalations (from singers and voice actors) and the resulting jets (from musical instruments) using (a) the schlieren method and (b) imaging with a laser sheet in a room filled with stage fog. These visualization tools showed the structure of the flows and helped identify the spatial locations of maximum airflow velocity, where the aerosol and CO2 measurements were then made.


Figure 1: Schlieren method – proof of concept, opera singer. Courtesy: Flowvis.org

Figure 2: Laser sheet imaging – proof of concept, oboe. Courtesy: Flowvis.org

Our flow visualization velocity estimates indicated that using a barrier, such as a mask or a bell cover, significantly reduced axial (exhaust-direction) velocities. Keep in mind that the jets observed with either method have the same composition as human exhalation, i.e., N2, O2, CO2, and trace gases.

Figure 3: Maximum measured axial velocities, with and without a cover/mask. Courtesy: Flowvis.org

We measured exhaled/exhausted CO2 and aerosol particles from the musicians. Our results indicate that spikes in aerosol concentration can be expected whenever there is a spike in the CO2 measurements.
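To illustrate the kind of comparison behind this observation, here is a minimal sketch, assuming time-aligned aerosol and CO2 readings in a CSV file with hypothetical column names; it is not the processing pipeline used in the study.

# Minimal sketch: compare time-aligned aerosol and CO2 readings.
# The CSV layout and column names ("co2_ppm", "aerosol_counts") are
# hypothetical; this is not the study's actual processing code.
import pandas as pd

def co2_aerosol_correlation(csv_path, resample_rule="1s"):
    """Load CO2 and aerosol time series, put them on a common time grid,
    and report their Pearson correlation."""
    df = pd.read_csv(csv_path, parse_dates=["time"], index_col="time")
    df = df[["co2_ppm", "aerosol_counts"]].resample(resample_rule).mean().interpolate()
    return df["co2_ppm"].corr(df["aerosol_counts"])

# Example (hypothetical file): a value near +1 means aerosol spikes closely
# track CO2 spikes.
# print(co2_aerosol_correlation("singing_trial.csv"))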


Figure 4: Combined Aerosol and CO2 time series for singing. Courtesy: Tehya Stockman


Figure 5: Aerosol data for performance with and without a mask/cover. Courtesy: Tehya Stockman

These results show that masks on instruments and singers during performance significantly decrease the amount of aerosol measured, providing one effective way to reduce the risk of airborne viral transmission. Musicians reported small differences in how the instruments felt, but very little difference in how they sounded.

4aSC3 – Talkers prepare their lips before audibly speaking – Is this the same thing as coarticulated speech?

Peter A. Krause – peter.krause066@csuci.edu
CSU Channel Islands
One University Dr.
Camarillo, CA 93012

Popular version of 4aSC3 – Understanding anticipatory speech postures: does coarticulation extend outside the acoustic utterance?
Presented 9:45 Thursday Morning, May 26, 2022
182nd ASA Meeting
Click here to read the abstract

A speech sound like /s/ is not fixed. The sound at the beginning of “soon” is not identical to the sound at the beginning of “seen.” We call this contextual variability coarticulation.

An audio recording of “soon” and “seen.” Listen closely to the subtle differences in the initial /s/ sound.

A spectrogram of the same recording of “soon” and “seen.” Note how the /s/ sounds have a slightly different distribution of intensity over the frequency range shown.

Some theoretical models explain coarticulation by assuming that talkers retrieve slightly different versions of their /s/ sound from memory, depending on the sound to follow. Others emphasize that articulatory actions overlap in time, rather than keeping to a regimented sequence: talkers usually start rounding their lips for the /u/ (“oo”) sound in “soon” while still making the hissy sound of the /s/, instead of waiting for the voiced part of the vowel. (See picture below.) But even overlapping-action accounts of coarticulation disagree on how “baked in” the coarticulation is. Does the dictionary in your head tell you that the word “soon” is produced with a rounded /s/? Or is there a more general, flexible process whereby, if you know an /u/ sound is coming up, you round your lips if you are able?


A depiction of the same speaker’s lips when producing the /s/ sound in “seen” (top) and the /s/ sound in “soon” (bottom). Note that in the latter case, the lips are already rounded.

If the latter, it is reasonable to ask whether coarticulation only happens during the audible portions of speech. My work suggests that the answer is no! For example, I have shown that during word-reading tasks, talkers tend to pre-round their lips a bit if they have been led to believe that an upcoming (but not yet seen) word will include an /u/ sound. This effect goes away if the word is equally likely to have an /u/ sound or an /i/ (“ee”) sound. More recently, I have shown that talkers awaiting their turn in natural, bi-directional conversation anticipate their upcoming utterance with their lips, by shaping them in sound-specific ways. (At least, they do so when preparing very short phrases like “yeah” or “okay.” For longer phrases, this effect disappears, which remains an interesting mystery.) Nevertheless, talkers apparently “lean forward” into their speech actions some of the time. In my talk I will argue that much of what we call “coarticulation” may be a special case of a more general pattern relating speech planning to articulatory action. In fact, it may reflect processes generally at work in all human action planning.


Plots of lip area taken from my recent study of bi-directional conversation. Plots trace backward in time from the moment at which audible speech began (Latency 0). “Labially constrained” utterances are those requiring shrunken-down lips, like those starting with /p/ or having an early /u/ sound. Note that for short phrases, lip areas are partially set several seconds before audible speech begins.

2aSC2 – Identifying race from speech

Yolanda Holt – holty@ecu.edu
East Carolina University
600 Moye Boulevard
Greenville, NC 27834

Tessa Bent – tbent@indiana.edu
Indiana University
2631 East Discovery Parkway
Bloomington, IN 47408

Popular version of 2aSC2 – Socio-ethnic expectations of race in speech perception
Presented Tuesday morning May 24, 2022
182nd ASA Meeting
Click here to read the abstract

Did I really have you at Hello?! Listening to a person we don’t see, we make mental judgments about the speaker, such as their age, presenting sex (man or woman), regional dialect, and sometimes their race. At times, we can accurately categorize the person from hearing just a single word, like hello. We wanted to know if listeners from the same community could listen to single words and accurately categorize the race of a speaker better than listeners from far away. We also wanted to know if regional dialect differences between Black and white speakers would interfere with accurate race identification.

In this listening experiment, people from North Carolina and Indiana heard single words produced by 24 Black and white talkers from two communities in North Carolina. Both Black and white people living in the western North Carolina community, near the Blue Ridge Mountains, are participating in a sound change known as the Southern Vowel Shift.

It is thought that the Southern Vowel Shift makes the vowels in the word pair heed and hid sound alike, and the vowels in the word pair heyd and head sound alike. It is also thought that many white Southern American English speakers produce the vowel in the word whod with rounded lips.

In the eastern community, near the Atlantic coast of North Carolina, the Southern American English speakers do not produce the vowels in the word pair heed and hid alike, nor do the vowels in the word pair heyd and head sound alike. In this community it is also expected that many white Americans produce the vowel in the word whod with rounded lips.

Black and white men and women from Indiana and North Carolina listened to recordings of the eastern and western talkers saying the words heed, hid, heyd, head, had, hood, whod, hid, howed, and hoyd in random order, a total of 480 times.


The North Carolina listeners, as expected, completed the race categorization task with greater accuracy than the Indiana listeners. Both listener groups categorized the east and west white talkers and the east Black talkers with around 80% accuracy. The west Black talkers, who are participating in the Southern sound change, were the most difficult to categorize: they were identified with just above 55% accuracy.

We interpret the results to suggest that when a talker’s speech does not meet the listener’s expectations, it is difficult for the listener to categorize the race of the speaker.

In this experiment, the white talkers from both communities were expected to produce the vowel in whod in a similar manner. In contrast, the west Black talkers were expected to produce several vowels (heed, hid, heyd, and head) similarly to their west white peers and differently from the east Black talkers. We thought this difference would make it difficult for listeners to accurately categorize the race of the west Black talkers by their speech alone. The results suggest that listener accuracy in race identification decreases when the speech produced doesn’t meet the listener’s mental expectations of what a talker should sound like.

Answer key to sound file (bb bb bb ww ww ww bb bb bb ww ww ww)

4aBA13 – In vivo assessment of lymph nodes using quantitative ultrasound on a clinical scanner: A preliminary study

Cameron Hoerig, Ph.D., cah4016@med.cornell.edu
Weill Cornell Medicine
Department of Radiology
416 E 55th St., MR-007
New York, NY 10022

Popular version of 4aBA13 – In vivo assessment of lymph nodes using quantitative ultrasound on a clinical scanner: A preliminary study
Presented Thursday morning, May 26, 2022
182nd ASA Meeting, Denver
Click here to read the abstract

Cancer can spread through the body via the lymphatic system. When a primary tumor is found in a patient, biopsies may be performed on one or more nearby lymph nodes (LNs) to look for evidence of cancerous cells and aid in disease staging and treatment planning. LN biopsies typically involve first removing the node, slicing it into very thin sections (thinner than a human hair), and staining the sections. Next, a pathologist views these sections under a microscope to look for abnormal cells. Because the tissue sections are so thin and the node is comparatively large, it is infeasible for a pathologist to look at every slide for each LN. Consequently, small clumps of cancerous cells may be missed. Similarly, biopsies performed via fine needle aspiration (FNA) – wherein a very thin needle is used to extract very small tissue samples throughout an LN while it is still in the body – also come with the risk of missing cancerous cells. As an example, the false-negative rate for biopsies on axillary lymph nodes is as high as 10%!

In this work, we are using an ultrasonography technique called quantitative ultrasound (QUS) to assess LNs in vivo and determine whether metastatic cells are present without the need for biopsy. Different tissue types scatter the ultrasound wave in different ways. However, the processing that typically occurs in clinical scanners strips this information away before displaying conventional B-mode images. Examples of B-mode images from benign and metastatic lymph nodes are displayed in Fig. 1, along with optical microscopy pictures of the corresponding FNA results. The microscopy images show a clear contrast in microstructure between normal and cancerous cells that is not visible in the ultrasound B-mode images.


Figure 1: (Left column) B-mode images of metastatic and benign lymph nodes. (Right column) The corresponding optical microscopy images of stained tissue samples from FNA biopsy show the difference in tissue microstructure between benign and metastatic lymph nodes.

QUS methods extract information from the ultrasonic signal before the typical image-processing steps in order to make inferences about tissue microstructure. In theory, these methods are independent of the scanner and operator: the same information can be obtained by any sonographer using any scanner, because it depends only on the underlying tissue microstructure. The QUS methods used in this study glean information about the scatterer diameter, the effective acoustic concentration, and the scatterer organization (randomly positioned vs. organized).
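As a rough illustration of the idea (and not the algorithm used in this study), the sketch below fits a simplified Gaussian-form-factor backscatter model to a measured backscatter coefficient curve to recover an effective scatterer diameter and a lumped amplitude term standing in for acoustic concentration; the model form, constants, and names are simplifications chosen for illustration.

# Simplified sketch of QUS-style parameter estimation (illustrative only,
# not the study's processing pipeline).
import numpy as np
from scipy.optimize import curve_fit

SOUND_SPEED_MM_PER_US = 1.54  # assumed soft-tissue sound speed, mm/microsecond

def bsc_model(freq_mhz, esd_mm, eac):
    """BSC(f) ~ EAC * f^4 * exp(-0.827 * (k*a)^2): a simplified Gaussian
    form-factor model, where k is the wavenumber and a = ESD/2."""
    k = 2.0 * np.pi * freq_mhz / SOUND_SPEED_MM_PER_US  # wavenumber, rad/mm
    a = esd_mm / 2.0                                     # effective radius, mm
    return eac * freq_mhz**4 * np.exp(-0.827 * (k * a) ** 2)

def fit_esd_eac(freq_mhz, bsc_measured):
    """Least-squares fit of effective scatterer diameter (mm) and the lumped
    concentration term to a measured BSC curve."""
    p0 = (0.05, bsc_measured.max() / freq_mhz.max() ** 4)  # rough initial guess
    (esd_mm, eac), _ = curve_fit(bsc_model, freq_mhz, bsc_measured, p0=p0)
    return esd_mm, eac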


Figure 2: The left and middle columns are representative color overlays of scatterer diameter and acoustic concentration from QUS processing. The right column is the resulting classification from the trained LDA.

We have thus far collected data on 16 LNs from 15 cancer patients with a known primary tumor. The same clinical GE Logiq E9 scanner was used to collect ultrasound echo data for QUS processing and for ultrasound-guided FNA. The metastatic status of each LN was determined from the FNA results. QUS methods were applied to the ultrasound data to obtain a total of nine parameters. From these, we determined that scatterer diameter and effective acoustic concentration were most effective at differentiating benign and metastatic nodes. Using these two parameters as input to a linear discriminant analysis (LDA) – a type of machine learning algorithm – we correctly classified 95% of US images as containing a benign or metastatic LN. Examples of QUS parameter maps overlaid on B-mode images, and the resulting classification by LDA, are provided in Fig. 2. The associated receiver operating characteristic (ROC) curve had an area under the curve of 0.90, showing excellent ability of the LDA to identify metastatic nodes from only two QUS parameters. These preliminary results demonstrate the feasibility of characterizing LNs in vivo at conventional frequencies using a clinical scanner, potentially offering a means to complement US-FNA practice and reduce unnecessary LN biopsies.
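To make the classification step concrete, here is a minimal sketch of fitting an LDA classifier on two features and computing the ROC area under the curve with scikit-learn; the feature values and labels below are synthetic stand-ins, not the patient data from this study.

# Minimal sketch of the LDA classification step, with synthetic data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
# X: one row per image; the two columns stand in for
# [scatterer diameter, effective acoustic concentration].
X = np.vstack([rng.normal([25.0, -40.0], [4.0, 5.0], (40, 2)),   # "benign"
               rng.normal([35.0, -25.0], [4.0, 5.0], (40, 2))])  # "metastatic"
y = np.array([0] * 40 + [1] * 40)  # 0 = benign, 1 = metastatic

lda = LinearDiscriminantAnalysis()
# Cross-validated probabilities keep the evaluation honest on a small dataset.
scores = cross_val_predict(lda, X, y, cv=5, method="predict_proba")[:, 1]
print("ROC AUC:", roc_auc_score(y, scores))
print("Accuracy:", np.mean((scores > 0.5) == y))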

3aSC4 – Effects of two-talker child speech on novel word learning in preschool-age children

Tina M. Grieco-Calub, tina_griecocalub@rush.edu
Rush University Medical Center
Rush NeuroBehavioral Center

Popular version of 3aSC4 – Effects of two-talker child speech on novel word learning in preschool-age children
Presented Wednesday morning, May 25, 2022
182nd ASA Meeting in Denver, Colorado
Click here to read the abstract

One of the most important tasks for children during preschool and kindergarten is building vocabulary knowledge. This vocabulary is the foundation upon which later academic knowledge and reading skills are built. Children acquire new words through exposure to speech by other people including their parents, teachers, and friends. However, this exposure does not occur in a vacuum. Rather, these interactions often occur in situations where there are other competing sounds, including other people talking or environmental noise. Think back to a time when you tried to have a conversation with someone in a busy restaurant with multiple other conversations happening around you. It can be difficult to focus on the conversation of interest and ignore the other conversations in noisy settings.

Now, think about how a preschool- or kindergarten-aged child might navigate a similar situation, such as a noisy classroom. This child has less mature language and cognitive skills than you do. Therefore, they have a harder time ignoring those irrelevant conversations to process what the teacher says. Also, children in classrooms must not only hear and understand the words they already know, but also learn new words. Children who have a hard time ignoring background noise can have a particularly hard time building essential vocabulary knowledge in classroom settings.

In this study, we are testing the extent to which background speech like what might occur in a preschool classroom influences word learning in preschool- and kindergarten-aged children. We are testing children’s ability to learn and remember unfamiliar words either in quiet and in a noise condition when two other children are talking in the background. In the noise condition, the volume of the teacher is slightly louder than the background talkers, like what a child would experience in a classroom. During the word learning task, children are first shown unfamiliar objects and are asked to repeat their names (e.g., This is a topin. You say topin; see attached movie clip). Children then receive training on the objects and their names. After training, children are asked to name each object. Children’s performance is quantified by how close their production of the object’s name is to the actual name. For example, a child might call the “topin” a “dobin”. Preliminary results suggest that children in quiet and in noise are fairly accurate at repeating the unfamiliar words:    they can focus on the teacher’s speech and repeat all the sounds of the word immediately regardless of condition. Children can also learn the words in both quiet and noise. However, children’s spoken productions of the words are less accurate when they are trained in noise than in quiet. These findings tentatively suggest that when there is background noise, children need more training to learn the precise sounds of words. We will be addressing this issue in future iterations of this study.