Acoustic Cloaking Using the Principles of Active Noise Cancellation
Jordan Cheer, Institute of Sound and Vibration Research
University of Southampton
Popular version of paper 4pEA7, “Cancellation, reproduction and cloaking using sound field control”
Presented Thursday morning, December 1, 2016
172nd ASA Meeting, Honolulu
Loudspeakers are synonymous with audio reproduction and are widely used to play sounds people want to hear. Loudspeakers have also been used for the opposite purpose: to attenuate noise that people do not want to hear. Active noise cancellation is one example; it combines loudspeakers, microphones and digital signal processing to adaptively control unwanted noise sources [1].
More recently, the scientific community has focused on controlling and manipulating sound fields to acoustically cloak objects, that is, to render them acoustically invisible. A new class of engineered materials, called metamaterials, has already demonstrated this ability [2]. However, acoustic cloaking has also been demonstrated using methods based on sound field reproduction and on active noise cancellation [3]. Despite these demonstrations, there has been limited research exploring the physical links between acoustic cloaking, active noise cancellation and sound field reproduction. We therefore began exploring these links, aiming to develop active acoustic cloaking systems that build on the extensive knowledge of implementing audio reproduction and active noise cancellation systems.
Acoustic cloaking attempts to control the sound scattered from a solid object. We therefore used numerical simulations to investigate the physical limits of active acoustic cloaking for a rigid scattering sphere. The sphere, shown in Figure 1, was surrounded by an array of sources (loudspeakers), shown as black dots, which were used to control the sound field. We first investigated the effect of the scattering sphere on a simple sound field.
Comparing a horizontal slice through the simulated sound field without a scattering object (Figure 2) with the same slice when the sphere is present (Figure 3) makes the changes caused by the sphere obvious. Scattering from the sphere distorts the sound field, rendering the sphere acoustically visible.
Figure 1 – The geometry of the rigid scattering sphere and the array of sources, or loudspeakers, used to control the sound field (black dots).
Figure 2 – The sound field due to an acoustic plane wave in the free field (without scattering).
Figure 3 – The sound field produced when an acoustic plane wave is incident on the rigid scattering sphere.
To understand the physical limitations on controlling this sound field, and thus on implementing an active acoustic cloak, we investigated the ability of the array of loudspeakers surrounding the scattering sphere to achieve acoustic cloaking [4]. Unlike active noise cancellation, which attempts to cancel the total sound field, we only attempted to control the scattered component of the sound field, and thus render the sphere acoustically invisible.
With active acoustic cloaking, the scattered component is significantly attenuated and the sound field appears undisturbed: the resulting field, shown in Figure 4, is indistinguishable from the object-free simulation in Figure 2.
Figure 4 – The sound field produced when active acoustic cloaking is used to attempt to cancel the sound field scattered by a rigid scattering sphere and thus render the scattering sphere acoustically ‘invisible’.
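The control strategy described above can be illustrated with the classic least-squares formulation of active control found in Nelson and Elliott's Active Control of Sound: choose the loudspeaker strengths q that minimize the scattered pressure d measured at a set of microphones, plus a small effort penalty beta, which gives q = -(G^H G + beta I)^-1 G^H d, where G holds the transfer responses from each loudspeaker to each microphone. The pure-Python sketch below is a minimal illustration under stated assumptions, not the simulation used in this work: free-field monopole loudspeakers, a hypothetical ring of 8 loudspeakers and ring of 24 microphones, and, as a stand-in for the field scattered by the rigid sphere, the field of a point source at the origin. The function names and the value of `beta` are illustrative.

```python
import cmath
import math


def greens(k, src, rcv):
    """Free-field monopole transfer function exp(-jkr) / (4*pi*r) between two 3-D points."""
    r = math.dist(src, rcv)
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)


def solve(a, b):
    """Solve the square complex system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def optimal_strengths(G, d, beta=1e-8):
    """Strengths q minimizing ||d + G q||^2 + beta * ||q||^2, i.e. q = -(G^H G + beta I)^-1 G^H d."""
    rows, cols = len(G), len(G[0])
    GhG = [[sum(G[m][i].conjugate() * G[m][j] for m in range(rows)) + (beta if i == j else 0.0)
            for j in range(cols)] for i in range(cols)]
    Ghd = [sum(G[m][i].conjugate() * d[m] for m in range(rows)) for i in range(cols)]
    return [-x for x in solve(GhG, Ghd)]


def energy(p):
    """Sum of squared pressure magnitudes over the microphones."""
    return sum(abs(v) ** 2 for v in p)


def ring(radius, n):
    """n points equally spaced on a horizontal circle of the given radius (metres)."""
    return [(radius * math.cos(2 * math.pi * i / n), radius * math.sin(2 * math.pi * i / n), 0.0)
            for i in range(n)]


# Hypothetical set-up: 500 Hz in air, 8 loudspeakers on a 0.3 m ring, 24 microphones on a
# 1 m ring, and a point source at the origin standing in for the sphere's scattered field.
k = 2 * math.pi * 500 / 343
sources, mics = ring(0.3, 8), ring(1.0, 24)
d = [0.5 * greens(k, (0.0, 0.0, 0.0), p) for p in mics]   # "scattered" pressure at the mics
G = [[greens(k, s, p) for s in sources] for p in mics]    # loudspeaker-to-mic transfer matrix
q = optimal_strengths(G, d)
residual = [d[i] + sum(G[i][j] * q[j] for j in range(len(q))) for i in range(len(mics))]
print(f"scattered-field attenuation at the mics: {10 * math.log10(energy(d) / energy(residual)):.1f} dB")
```

Because q = 0 is always a candidate, the optimized residual energy can never exceed the uncontrolled scattered energy; how much lower it goes depends on the geometry and frequency.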
Our results indicate that active acoustic cloaking can be achieved using an array of loudspeakers surrounding a sphere that would otherwise scatter sound detectably. In this and related work, further investigations showed that active acoustic cloaking is most effective when the loudspeakers are close to the object being cloaked. This may lead to design concepts in which acoustic sources are embedded in objects to cloak them or to control the sound field they scatter.
Future work will attempt to demonstrate active acoustic cloaking experimentally and to overcome the significant challenge of not only controlling the scattered sound field but also detecting it using an array of microphones.
[1] P. Nelson and S. J. Elliott, Active Control of Sound, 436 pp. (Academic Press, London, 1992).
[2] L. Zigoneanu, B.-I. Popa, and S. A. Cummer, “Three-dimensional broadband omnidirectional acoustic ground cloak,” Nat. Mater. 13(4), 352–355 (2014).
[3] E. Friot and C. Bordier, “Real-time active suppression of scattered acoustic radiation,” J. Sound Vib. 278, 563–580 (2004).
[4] J. Cheer, “Active control of scattered acoustic fields: Cancellation, reproduction and cloaking,” J. Acoust. Soc. Am. 140(3), 1502–1512 (2016).
Musical mind control: Human speech takes on characteristics of background music
Department of Linguistics, University of Canterbury
20 Kirkwood Avenue, Upper Riccarton
Christchurch, NZ, 8041
Popular version of paper 1aNS4, “Musical mind control: Acoustic convergence to background music in speech production.”
Presented Monday morning, November 28, 2016
172nd ASA Meeting, Honolulu
People often adjust their speech to resemble that of their conversation partners, a phenomenon known as speech convergence. Broadly defined, convergence describes automatic synchronization to some external source, much like running to the beat of the music playing at the gym without intentionally choosing to do so. A variety of studies show a general trend of people automatically synchronizing to various aspects of their environment [1, 2, 3]. In language use specifically, convergence effects have been observed in many linguistic domains, such as sentence formation [4], word formation [5], and vowel production [6] (differences in vowel production are closely associated with perceived accentedness [7, 8]). This prevalence raises many interesting questions about the extent to which speakers converge. This research uses a speech-in-noise paradigm to explore whether speakers also converge to non-linguistic signals in the environment. Specifically, will a speaker’s rhythm, pitch, or intensity (which is closely related to loudness) be influenced by fluctuations in background music, so that the speech echoes specific characteristics of that music? For example, if the tempo of background music slows down, will listeners unconsciously decrease their speech rate?
In this experiment, participants read passages aloud while hearing music through headphones. The experimenter composed the background music to be relatively stable in pitch, tempo/rhythm, and intensity, so that only one of these dimensions could be manipulated and tested at a time within each test condition. Each manipulation was imposed gradually and consistently toward a target (Figure 1) and, after reaching that target, returned in the same way to its starting level. Between manipulated sessions, participants heard the music with no experimental changes. (Examples of what participants heard through the headphones are available as Sound-files 1 and 2.)
Fig. 1 Using software designed for digital signal processing (analyzing and altering sound), manipulations were applied linearly (in a straight line) toward a target; this is shown above as the blue line, which first rises and then falls. NOTE: After a manipulation reached its target (the dashed vertical red line), the degree of manipulation returned to its starting level in the same linear fashion. Graphic captured while using Praat [9] to increase and then decrease the perceived loudness of the background music.
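The gradual manipulation described above is a triangular profile in time: a linear ramp toward the target, then a linear return to the starting level. The helper below is a minimal sketch of that profile; the function name and the 30 s / 60 s timings are hypothetical, since the durations used in the experiment are not given here.

```python
def manipulation_depth(t, t_start, t_target, t_end, target):
    """Depth of a manipulation at time t (seconds): zero before t_start, a linear rise
    to `target` at t_target, then a linear return to zero at t_end."""
    if t <= t_start or t >= t_end:
        return 0.0
    if t <= t_target:
        return target * (t - t_start) / (t_target - t_start)
    return target * (t_end - t) / (t_end - t_target)


# Hypothetical pitch condition: reach -200 cents after 30 s, back to 0 cents at 60 s.
for t in (0, 15, 30, 45, 60):
    print(t, manipulation_depth(t, 0, 30, 60, -200))
```

The same profile applies to any of the three dimensions; only the unit of `target` changes (cents for pitch, decibels for intensity, and so on).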
Data from 15 native speakers of New Zealand English were analyzed using statistical models that allow effects to vary for each participant, and we observed significant convergence in both the pitch and intensity conditions. The tempo condition, however, has not yet been analyzed. Interestingly, these effects appear to differ systematically with a person’s previous musical training. While non-musicians show the predicted effect and follow the manipulations, musicians appear to invert it, reliably shifting their pitch and intensity in the direction opposite to the manipulation (see Figure 2). Sociolinguistic research indicates that under certain conditions speakers emphasize characteristics of their speech to distinguish themselves socially from conversation partners or groups, rather than converging with them [6]. It therefore seems plausible that, given a heightened ability to recognize low-level variations in sound, musicians may on some cognitive level be more aware of the variation in their sound environment and, as a result, resist the more typical effect in a similar way. However, more work is required to understand this phenomenon.
Fig. 2 The plots above show pitch on the y-axis (the vertical axis on the left edge) and the portions of background music that were manipulated on the x-axis (across the bottom). The blue lines show that speakers generally lower their pitch as an unmanipulated condition progresses. The red lines, however, show that when global pitch is lowered during a test condition, the lowering is more dramatic for non-musicians (left plot) and is reversed by those with musical training (right plot). NOTE: A follow-up model that also accounts for the relatedness of pitch and intensity shows much the same effect.
This work indicates that speakers are influenced not only by human conversation partners but also, to some degree, by sound in the immediate speech environment, which suggests that environmental noise may continually influence certain aspects of our speech production in specific and predictable ways. Human listeners are rather talented at recognizing subtle cues in speech [10], an ability that computers and algorithms cannot yet match. Some language scientists argue that such changes in speech occur to make understanding easier for listeners [11]. Work like this is therefore likely to resonate in both academia and the private sector: a better understanding of how speech changes in different environments can contribute to more effective aids for the hearing impaired and to improvements in many devices used in global communications.
Sound-file 1. An example of what participants heard as a control condition (no experimental manipulation) in between test-conditions.
Sound-file 2. An example of what participants heard in a test condition (a pitch manipulation that lowers the music by 200 cents, i.e., one whole step).
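Musical pitch shifts are measured in cents, with 1200 cents per octave, so a shift of c cents multiplies every frequency by 2^(c/1200). The 200-cent drop in Sound-file 2 therefore scales frequencies by about 0.891. The short sketch below shows the conversion in both directions; the 440 Hz note is only an example.

```python
import math


def cents_to_ratio(cents):
    """Frequency ratio corresponding to a pitch shift in cents (1200 cents per octave)."""
    return 2 ** (cents / 1200)


def ratio_to_cents(ratio):
    """Pitch shift in cents corresponding to a frequency ratio."""
    return 1200 * math.log2(ratio)


# Lowering a 440 Hz note by 200 cents (one whole step):
print(round(440 * cents_to_ratio(-200), 1))  # 392.0
```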
1. Hill, A. R., Adams, J. M., Parker, B. E., & Rochester, D. F. (1988). Short-term entrainment of ventilation to the walking cycle in humans. Journal of Applied Physiology, 65(2), 570-578.
2. Will, U., & Berg, E. (2007). Brain wave synchronization and entrainment to periodic acoustic stimuli. Neuroscience letters, 424(1), 55-60.
3. McClintock, M. K. (1971). Menstrual synchrony and suppression. Nature, 229, 244-245.
4. Branigan, H. P., Pickering, M. J., McLean, J. F., & Cleland, A. A. (2007). Syntactic alignment and participant role in dialogue. Cognition, 104(2), 163-197.
5. Beckner, C., Rácz, P., Hay, J., Brandstetter, J., & Bartneck, C. (2015). Participants Conform to Humans but Not to Humanoid Robots in an English Past Tense Formation Task. Journal of Language and Social Psychology, 0261927X15584682. Retrieved from: http://jls.sagepub.com.ezproxy.canterbury.ac.nz/content/early/2015/05/06/0261927X15584682.
6. Babel, M. (2012). Evidence for phonetic and social selectivity in spontaneous phonetic imitation. Journal of Phonetics, 40(1), 177-189.
7. Major, R. C. (1987). English voiceless stop production by speakers of Brazilian Portuguese. Journal of Phonetics, 15, 197—
8. Rekart, D. M. (1985). Evaluation of foreign accent using synthetic speech. Ph.D. dissertation, Louisiana State University.
9. Boersma, P., & Weenink, D. (2014). Praat: Doing phonetics by computer (Version 5.4.04) [Computer program]. Retrieved
10. Hay, J., Podlubny, R., Drager, K., & McAuliffe, M. (under review). Car-talk: Location-specific speech production and
11. Lane, H., & Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of Speech, Language, and Hearing Research, 14(4), 677-709.
Popular version of paper 1pMU4, “Optimal insertion timing of symbolic music to induce laughter in video content.”
Presented Monday afternoon, November 28, 2016
172nd ASA Meeting, Honolulu
In television variety shows and comedy programs, various sound effects and music are combined with humorous scenes to induce more pronounced laughter from viewers [1, 2]. The aim of our study was to clarify the optimal insertion timing of symbolic music to induce laughter in video content. Symbolic music is music associated with a particular meaning, such as a funny cue that serves as a kind of “punch line” emphasizing the humor of a scene.
Fig. 1 Sequence of video and audio tracks in the video editing timeline
We conducted a series of rating experiments to find the best timing for inserting such music into humorous video content. We also examined viewers’ affective responses to the audiovisual content. The experimental stimuli were four short videos created by combining two video clips (V1 and V2) with four music clips (M1, M2, M3 and M4).
The rating experiments showed that the insertion timing of symbolic music affected how much laughter the videos induced. For a purely comical scene (V1), the optimal insertion time for a high funniness rating was the shortest, 0–0.5 seconds. For a tragicomic scene, a humorous accident (V2), the optimal insertion time was longer, 0.5–1 second after the scene; that is, a short pause before the music increased funniness.
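In an editing timeline like the one in Fig. 1, these timing results translate into a start position for the music track. The helper below is a hypothetical sketch, not a tool from the study: it converts the end time of a humorous scene plus a pause into frame numbers, assuming a 30 fps timeline, and uses the pause windows reported above for the two scene types.

```python
# Pause windows (seconds after the scene ends) taken from the ratings reported above.
PAUSE_WINDOWS = {"purely_comical": (0.0, 0.5), "tragicomic": (0.5, 1.0)}


def onset_frame(scene_end_s, pause_s, fps=30.0):
    """Timeline frame at which the symbolic-music clip starts (hypothetical 30 fps timeline)."""
    return round((scene_end_s + pause_s) * fps)


def onset_window_frames(scene_end_s, scene_type, fps=30.0):
    """Earliest and latest start frames for the music, given the type of humorous scene."""
    earliest, latest = PAUSE_WINDOWS[scene_type]
    return onset_frame(scene_end_s, earliest, fps), onset_frame(scene_end_s, latest, fps)


# A tragicomic scene ending 10 s into the video:
print(onset_window_frames(10.0, "tragicomic"))  # (315, 330)
```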
Fig. 2 Subjective funniness ratings for each insertion timing of symbolic music, for each video scene.
Furthermore, the rating experiments showed that the optimal timing was associated with the highest impressiveness of the videos, the highest overall evaluations, the highest congruence between moving pictures and sounds, and the most laughter. All of the correlation coefficients were very high, as summarized in Table 1.
Table 1 Correlation coefficients between the optimal timing for symbolic music and the affective responses to the audiovisual content.
** p < .01
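A correlation coefficient such as those in Table 1 measures how closely two sets of ratings rise and fall together, from -1 to +1. The sketch below computes Pearson's r in plain Python; whether the authors used Pearson's r rather than, for example, a rank correlation is an assumption, and the ratings are invented placeholders, not results from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up mean ratings for five insertion timings (NOT data from this study).
funniness = [3.1, 3.8, 3.5, 2.9, 2.4]
congruence = [3.0, 3.9, 3.4, 3.0, 2.5]
print(round(pearson_r(funniness, congruence), 3))
```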
In television variety shows and comedy programs, when symbolic music is dubbed over the video as a punch line just after a humorous scene, a short pause of between half a second and one second before the music (particularly after a tragicomic scene) is very effective at emphasizing the humor of the scene and increasing its impression on viewers.
1. Kim, K.H., et al. Effectiveness of Sound Effects and Music to Induce Laugh in Comical Entertainment Television Show. The 13th International Conference on Music Perception and Cognition, 2014. CD-ROM.
2. Kim, K.H., et al., Effects of Music and Sound Effects to Increase Laughter in Television Programs. Media & Information Resources, 2014. 21(2): 15-28. (in Japanese with English abstract).
Popular version of paper 5aSC43, “Appropriateness of acoustic characteristics on perception of disaster warnings.”
Presented Friday morning, December 2, 2016
172nd ASA Meeting, Honolulu
As you might know, Japan has often been hit by natural disasters, such as typhoons, earthquakes, flooding, landslides, and volcanic eruptions. According to the Japan Institute of Country-ology and Engineering [1], 20.5% of all M6 and greater earthquakes in the world occurred in Japan, and 0.3% of deaths caused by natural disasters worldwide were in Japan. These figures are striking given that Japan occupies only 0.28% of the world’s land mass.
Municipalities in Japan issue evacuation calls to local residents through community wireless systems or home receivers when a disaster is approaching; however, many cases have been reported in which people did not evacuate even after hearing the warnings [2]. One reason is normalcy bias: people tend to disbelieve or disregard warnings [3]. Given this reality, it is necessary to find ways to make evacuation calls more effective and trustworthy. This study focused on how the acoustic characteristics of a warning call (voice gender, pitch, and speaking rate) influence listeners’ perception of the call, with the aim of suggesting ways to communicate warnings better.
Three short warnings were created: 1) Kyoo wa ame ga furimasu. Kasa wo motte dekakete kudasai. ‘It’s going to rain today. Please take an umbrella with you.’ 2) Ookina tsunami ga kimasu. Tadachini hinan shitekudasai. ‘A big tsunami is coming. Please evacuate immediately.’ and 3) Gakekuzure no kiken ga arimasu. Tadachini hinan shitekudasai. ‘There is a risk of landslide. Please evacuate immediately.’ A female and a male native speaker of Japanese, both with relatively clear voices and good articulation, read the warnings aloud at a normal speed (see Table 1 for the acoustic information of the utterances), and their utterances were recorded in a sound-attenuated booth with a high-quality microphone and recording device. Each female and male utterance was modified using the acoustic analysis software Praat [4] to create stimuli with 20% higher or lower pitch and 20% faster or slower speech rate. In total, 54 tokens were created (3 warning types × 2 genders × 3 pitch levels × 3 speech rates); of the 18 tokens for warning 1), only 4 were used in the perception experiment, as practice stimuli.
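The stimulus count follows from a full factorial crossing of the four factors. The sketch below enumerates that design; the labels are illustrative, and it only counts combinations (the exact Praat settings used to realize the ±20% changes are not specified in the text).

```python
from itertools import product

# Illustrative labels for the full factorial design described above.
WARNINGS = ("rain", "tsunami", "landslide")
GENDERS = ("female", "male")
PITCH = ("-20%", "normal", "+20%")
RATE = ("-20%", "normal", "+20%")

tokens = list(product(WARNINGS, GENDERS, PITCH, RATE))
rain_tokens = [t for t in tokens if t[0] == "rain"]

print(len(tokens))       # 54 = 3 x 2 x 3 x 3
print(len(rain_tokens))  # 18 tokens for warning 1), of which 4 served as practice items
```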
Table 1: Acoustic Data of Normal Tokens
Thirty-four university students listened to each stimulus through two loudspeakers placed at the front left and right corners of a classroom (930 cm × 1,500 cm). Another group of 42 students and 11 members of the public listened to the same stimuli through one loudspeaker placed at the front of a lab (510 cm × 750 cm). All participants rated each token on a 1-to-5 scale (1: lowest, 5: highest) for Intelligibility, Reliability, and Urgency.
Figure 1 summarizes the evaluation responses (n = 87) in a bar chart, together with the average scores on the 1–5 scale for each combination of acoustic conditions. For Intelligibility, for example, the average score was highest when the calls were spoken in a female voice at normal speed and normal pitch. Reliability shows similar results. Respondents perceived greater Urgency, on the other hand, for both faster speed and higher pitch.
Figure 1. Evaluation responses (bar graph, in percent) and average scores (data labels and line graph, on a 1–5 scale)
The data were then analyzed with an analysis of variance (ANOVA, Table 2), and Figure 2 shows the resulting variance decomposition as stacked bar charts. For all three measures (Intelligibility, Reliability, and Urgency), the main effect of speaking speed was the most dominant. In particular, the speed factor alone accounted for up to 43% of the variance in Urgency ratings.
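A share such as "up to 43%" is a proportion of the total variance. For one factor in a balanced design, that proportion can be computed as eta squared, the sum of squares between the factor's levels divided by the total sum of squares. The sketch below assumes that reading of Figure 2, and the ratings are invented for illustration, not taken from this study.

```python
def eta_squared(groups):
    """Proportion of total variance explained by one factor: SS_between / SS_total.
    `groups` maps each level of the factor to the ratings observed at that level."""
    scores = [s for level in groups.values() for s in level]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    ss_between = sum(len(level) * (sum(level) / len(level) - grand_mean) ** 2
                     for level in groups.values())
    return ss_between / ss_total


# Invented Urgency ratings (1-5) at three speaking rates; NOT data from this study.
urgency_by_speed = {"slow": [2, 2, 3], "normal": [3, 3, 4], "fast": [5, 4, 5]}
print(f"{eta_squared(urgency_by_speed):.0%} of the variance is attributable to speed")
```

In a full factorial ANOVA, the same calculation is done for each factor and interaction, which is what a stacked bar of "decomposed variances" displays.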
Table 2: ANOVA results
Figure 2: Decomposed variances in stacked bar charts based on the ANOVA results
Finally, to estimate how strongly speed influences Urgency, we calculated the expected average evaluation scores at each speed level for a female speaker at normal pitch (Figure 3). Indeed, setting the speed to fast raises perceived Urgency to the highest level, albeit at some cost to Intelligibility and Reliability. Based on these results, we argue that speech rate could be varied according to the purpose of an evacuation call, depending on whether it prioritizes Urgency, or Intelligibility and Reliability.
Figure 3: Expected average evaluation scores on a 1–5 scale, for a female voice and normal pitch
1. Japan Institute of Country-ology and Engineering (2015). Kokudo wo shiru [To know the national land]. Retrieved from: http://www.jice.or.jp/knowledge/japan/commentary09.
2. Nakamura, Isao. (2008). Dai 6 sho Hinan to joho, dai 3 setsu Hinan to jyuumin no shinri [Chapter 6 Evacuation and Information, Section 3 Evacuation and Residents’ Mind]. In H. Yoshii & A. Tanaka (Eds.), Saigai kiki kanriron nyuumon [Introduction to Disaster Management Theory] (pp.170-176). Tokyo: Kobundo.
3. Drabek, Thomas E. (1986). Human System Responses to Disaster: An Inventory of Sociological Findings. NY: Springer-Verlag New York Inc.
4. Boersma, Paul & Weenink, David (2013). Praat: doing phonetics by computer [Computer program]. Retrieved from: http://www.fon.hum.uva.nl/praat/.
Moscow State University, Moscow, Russian Federation
Center for Industrial and Medical Ultrasound, Applied Physics Laboratory, University of Washington, Seattle, Washington, United States
Popular version of paper 4pBA1, “Kidney stone pushing and trapping using focused ultrasound beams of different structure.”
Presented Thursday afternoon, December 1, 2016, at 1:00 pm HAST.
172nd ASA Meeting, Honolulu
Urinary stones (such as kidney or bladder stones) are an important health care problem. One in 11 Americans now has urinary stone disease (USD), and the prevalence is increasing. According to a 2012 report from the National Institute of Diabetes and Digestive and Kidney Diseases (Urological Diseases in America), the direct medical cost of USD in the United States is $10 billion annually, making it the most expensive urologic condition.
Our lab is working to develop more effective and more efficient ways to treat stones. Existing treatments such as shock wave lithotripsy or ureteroscopy are minimally invasive, but can leave behind stone fragments that remain in the kidney and may regrow into larger stones over time. We have successfully developed and demonstrated the use of ultrasound to noninvasively move stones in the kidneys of human subjects. This technology, called ultrasonic propulsion (UP), uses ultrasound to apply a directional force to the stone, propelling it away from the sound source, or transducer. Some stones need to be moved toward the ureteropelvic junction (the exit from the kidney that allows stones to pass through the ureter to the bladder) to aid their passage. In other cases, this technology may be useful to relieve an obstruction caused by a stone that just needs a nudge or a rotation to pass, or at least to allow urine to flow and decompress the kidney.
While UP can help stones pass, it is limited in how it can manipulate stones: the transducer is in contact with the skin outside the body, and the radiation force pushes stones away from it. Some applications require the stone to be moved sideways or toward the transducer instead.
To manipulate stones more versatilely, we are developing a new strategy to trap a stone in an ultrasound beam. Acoustic trapping has been explored by several other researchers, particularly for trapping and manipulating cells, bubbles, droplets, and particles much smaller than the wavelength of sound. Different configurations have been used to trap particles in standing waves and focused fields. Once the stone is trapped in an ultrasound beam, we can move the transducer or electronically steer the beam to move the stone with it.
In this work, we accomplished trapping through the use of vortex beams. Typical focused beams create a single region of high ultrasound pressure, producing radiation force away from the transducer. Vortex beams, on the other hand, are focused beams that create a tube-shaped intensity pattern with a spiraling wave front (Fig. 1). The ultrasound pressure in the middle is very low, while the pressure around the center is high. The result is that there is a component of the ultrasound radiation force pushing the stone towards the center and trapping it in the middle of the beam. In addition to trapping, such a beam can apply a torque to the stone and can rotate it.
To test this idea, we simulated the radiation force on spheres of different materials (including stones) to determine how each would respond in a vortex beam. An example is shown in Fig. 2, which displays a lateral-axial cross section of the beam with a spherical stone off-center in the tube-shaped beam. The red arrow shows that the force on the sphere points away from the center because the stone lies outside the vortex. Once the center of the stone crosses the pressure peak, the force is directed inward. Usually some force still pushes the object away from the transducer, but the object can be trapped against a surface.
We also built transducers and electrically excited them to generate vortex beams in experiments. First, we used the vortex to trap, rotate, and drag an object on the water surface (Fig. 3). By changing the charge of the vortex beam (the rate of spiraling generated by the transducer), we controlled the diameter of the vortex beam, as well as the direction and speed at which objects rotated. We also tested manipulation of objects placed deep in a water tank: glass or plastic beads and kidney stones placed on a platform of tissue-mimicking material. By physically shifting the transducer, we were able to move these objects a specified distance and direction along the platform (Fig. 4). These results are best seen in videos at apl.uw.edu/pushing stones.
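The charge of a vortex beam sets how many full turns of phase the wavefront makes around the beam axis. With a transducer divided into sectors (Fig. 3), one common way to approximate this is to step the drive phase by 2π × charge / N from one of N sectors to the next. The sketch below computes those phases; the 8-sector layout and the sampling check are assumptions for illustration, not the parameters of the transducers used here.

```python
import math


def sector_phases(n_sectors, charge):
    """Drive phase (radians, in [0, 2*pi)) for each of n_sectors sectors arranged around the
    transducer so that the phase winds `charge` full turns around the beam axis.
    The sign of `charge` sets the direction in which the wavefront spirals."""
    if n_sectors <= 2 * abs(charge):
        # Steps of pi or more between neighbouring sectors make the spiral ambiguous.
        raise ValueError("need more than 2*|charge| sectors")
    step = 2 * math.pi * charge / n_sectors
    return [(i * step) % (2 * math.pi) for i in range(n_sectors)]


# Hypothetical 8-sector transducer: 45-degree steps for charge 1, 90-degree steps for charge 2.
for charge in (1, 2, -1):
    print(charge, [round(math.degrees(p)) for p in sector_phases(8, charge)])
```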
We have since worked on developing vortex beams with a 256-element focused array transducer. Our complex array can electronically move the beam and drag the stone without physically moving the transducer. In a highly focused transducer, such as our array, sound can even be focused beyond the stone to generate an axial high pressure spot to help trap a stone axially or even pull the stone toward the transducer.
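Electronic steering with a multi-element array comes down to choosing a drive phase for each element. In a minimal sketch, assuming continuous-wave excitation and point-like elements, each element is advanced by k × d (k = 2πf/c, d its distance to the chosen focus) so that all contributions arrive in phase; adding an azimuthal term charge × θ about the beam axis turns the bright focus into a vortex with a dark core. The flat 16 × 16 layout, 2 mm pitch and 1 MHz frequency are hypothetical, not those of the 256-element array described above.

```python
import cmath
import math


def focus_phases(elements, focus, freq_hz, c=1500.0):
    """Drive phase for each element so all contributions arrive at `focus` in phase.
    A wave travelling a distance d accrues phase -k*d, so each element is advanced by +k*d."""
    k = 2 * math.pi * freq_hz / c
    return [(k * math.dist(e, focus)) % (2 * math.pi) for e in elements]


def vortex_phases(elements, focus, freq_hz, charge, c=1500.0):
    """Focusing phases plus an azimuthal term charge * atan2(y - yf, x - xf) about the beam axis,
    assumed here to run parallel to z through the focus."""
    base = focus_phases(elements, focus, freq_hz, c)
    return [(p + charge * math.atan2(e[1] - focus[1], e[0] - focus[0])) % (2 * math.pi)
            for p, e in zip(base, elements)]


def field_at(point, elements, phases, freq_hz, c=1500.0):
    """Complex pressure (arbitrary units) from unit-amplitude monopole elements."""
    k = 2 * math.pi * freq_hz / c
    return sum(cmath.exp(1j * (ph - k * math.dist(e, point))) / math.dist(e, point)
               for e, ph in zip(elements, phases))


# Hypothetical flat 16 x 16 array (256 elements, 2 mm pitch) driven at 1 MHz in water.
pitch = 2e-3
elements = [((i - 7.5) * pitch, (j - 7.5) * pitch, 0.0) for i in range(16) for j in range(16)]
focus = (0.0, 0.0, 0.05)  # 5 cm in front of the array, on axis
plain = focus_phases(elements, focus, 1e6)
vortex = vortex_phases(elements, focus, 1e6, charge=1)
print(abs(field_at(focus, elements, plain, 1e6)))   # bright spot at the focus
print(abs(field_at(focus, elements, vortex, 1e6)))  # near zero: dark core of the vortex
```

Moving `focus` re-steers the beam with no mechanical motion, which is the property that lets an array drag a trapped object.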
There are several ways in which this technology might be useful for kidney stones. It might be used to gather small particles together and move them collectively, to hold a larger stone in place during fragmentation techniques such as lithotripsy, to sweep a stone when the anatomy prevents pushing it directly away from the transducer, or, as addressed here, to drag or pull a stone. In the future, we expect to continue developing phased array technology to manipulate stones more controllably. We are also working to develop and validate new beam profiles and electronic controls to remotely gather a stone and move it to a new location. We expect that this sort of tractor beam could also have applications in manufacturing, such as ever-shrinking electronics, and even in space.
This work was supported by RFBR 14-02-00426, NIH NIDDK DK43881 and DK104854, and NSBRI through NASA NCC 9-58.
O.A. Sapozhnikov and M.R. Bailey. Radiation force of an arbitrary acoustic beam on an elastic sphere in a fluid. – J. Acoust. Soc. Am., 2013, v. 133, no. 2, pp. 661-676.
A.V. Nikolaeva, S.A. Tsysar, and O.A. Sapozhnikov. Measuring the radiation force of megahertz ultrasound acting on a solid spherical scatterer. – Acoustical Physics, 2016, v. 62, no. 1, pp. 38-45.
J.D. Harper, B.W. Cunitz, B. Dunmire, F.C. Lee, M.D. Sorensen, R.S. Hsi, J. Thiel, H. Wessells, J.E. Lingeman, and M.R. Bailey. First in human clinical trial of ultrasonic propulsion of kidney stones. – J. Urology, 2016, v. 195, no. 4 (Part 1), pp. 956–964.
Figure 1. The cross section at the focus for the ultrasound pressure of a vortex beam. The pressure magnitude (left) has a donut-shape distribution, whereas the phase (right) has a spiral-shape structure. A stone can be trapped at the center of the ring.
Figure 2. The simulated tubular field such as occurs in a vortex beam and its force on a stone. In this simulation the transducer is on the left and the ultrasound propagates to the right. The arrow indicates the force which depends on the position of the stone (see video ‘Vortex beam for transversal trapping.wmv’)
Figure 3. A small object trapped in the center of a vortex beam on the water surface. The ring-shaped impression due to radiation force on the surface can be seen. The phase difference between the sector elements of the transducer affects the diameter of the beam and the spin rate. The 2 mm plastic object floating on the surface is made to rotate by the vortex beam (see video ‘Vortex beam rotates piece of plastic on water surface.wmv’).
Figure 4. A focused vortex beam transducer in water (top) traps one of the styrofoam beads (bottom) and translates it in the lateral direction (see video ‘Trapping and moving styrofoam beads underwater.wmv’).
Popular version of poster, “Writer recognition with a sound in hand-writing”
172nd ASA Meeting, Honolulu
We can notice a car approaching from the noise it makes on the road, or recognize a person by the sound of their footsteps. Many studies have analyzed and recognized such noises. In computer security, studies have even estimated what is being typed from the sound of typing on a keyboard [1] and extracted RSA keys from the noise made by a PC [2].
Because a noise is related to its cause, the noise carries information about that cause. The sound of a person writing, or “hand-writing sound,” is one of the noises in our everyday environment. A previous study recognized handwritten numeric characters from their writing sound, with an average recognition rate of 88.4% [3]. Building on that study, we explored whether a writer can be recognized and identified from the sound of their handwriting. If accurate identification is possible, it could become a method of signature verification that never requires looking at the signature.
We conducted recognition experiments using the handwriting sounds of nine participants. We asked them to write the same text, names in Kanji (Chinese characters), under several different conditions, such as writing slowly or writing on a different day. Figure 1 shows an example spectrogram of the hand-writing sound we analyzed. The horizontal axis represents time and the vertical axis shows frequency. Colors represent the magnitude, or intensity, of each frequency, with red indicating high intensity and blue low intensity.
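A spectrogram like Figure 1 is built by cutting the recording into short overlapping frames, windowing each frame, and taking the magnitude of its discrete Fourier transform. The sketch below does this in plain Python with a Hann window and a direct DFT for readability (a real system would use an FFT); the frame length, hop size and test tone are illustrative choices, not the settings used in this study.

```python
import cmath
import math


def spectrogram(x, frame_len=256, hop=128):
    """Magnitude spectrogram of signal x: Hann-windowed frames, one DFT per frame.
    Returns one list of |X[k]| (k = 0 .. frame_len // 2) per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        spectrum = [abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                            for n in range(frame_len)))
                    for k in range(frame_len // 2 + 1)]
        frames.append(spectrum)
    return frames


def bin_to_hz(k, frame_len, fs):
    """Centre frequency (Hz) of DFT bin k."""
    return k * fs / frame_len


# Illustration with a synthetic 1 kHz tone sampled at 8 kHz (not a handwriting recording).
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1024)]
S = spectrogram(tone)
peak_bin = max(range(len(S[0])), key=lambda k: S[0][k])
print(len(S), bin_to_hz(peak_bin, 256, fs))  # 7 1000.0
```

Each row of `S` is one vertical slice of a plot like Figure 1; mapping magnitudes to colors gives the familiar image.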
The spectrogram showed features corresponding to the number of strokes in the Kanji. We used a recognition system based on a hidden Markov model (HMM), a model commonly used in speech recognition that represents how spectral patterns change over time. The results showed an average identification rate of 66.3%, well above the chance level of about 11% for nine writers, indicating that writer identification is possible in this manner. However, the identification rate decreased under certain conditions, especially slow writing.
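HMM-based identification of this kind typically trains one model per writer and assigns a new recording to the writer whose model gives it the highest likelihood, computed with the forward algorithm. The sketch below assumes discrete observations (for example, spectral frames quantized to a small codebook) and tiny two-state left-to-right models with made-up probabilities; the system in this study likely modeled continuous spectral features, so this illustrates the scoring step only.

```python
import math


def _log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == -math.inf:
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_likelihood(obs, model):
    """Forward algorithm in the log domain: log P(obs | model) for a discrete-output HMM.
    model = (start[i], trans[i][j], emit[i][symbol])."""
    start, trans, emit = model
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [_logsumexp([alpha[i] + _log(trans[i][j]) for i in range(n)]) + _log(emit[j][o])
                 for j in range(n)]
    return _logsumexp(alpha)


def identify_writer(obs, models):
    """Pick the writer whose HMM assigns the observation sequence the highest likelihood."""
    return max(models, key=lambda writer: log_likelihood(obs, models[writer]))


# Two made-up, two-state left-to-right models over a 3-symbol codebook.
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "A": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "B": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}
print(identify_writer([0, 0, 1, 1], MODELS), identify_writer([2, 2, 2, 1], MODELS))  # A B
```

Working in log probabilities avoids numerical underflow on long recordings, where products of many small probabilities would otherwise round to zero.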
To improve performance, we need more handwriting samples, a wider variety of written texts, and more participants. We also intend to include writing of English characters and numbers. We expect that deep learning, which is attracting increasing attention around the world, will also help us achieve a higher recognition rate in future experiments.
1. Zhuang, L., Zhou, F., and Tygar, J. D., Keyboard Acoustic Emanations Revisited, ACM Transactions on Information and System Security, 2009, vol.13, no.1, article 3, pp.1-26.
2. Genkin, D., Shamir, A., and Tromer, E., RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis, Proceedings of CRYPTO 2014, 2014, pp.444-461.
3. Kitano, S., Nishino, T. and Naruse, H., Handwritten digit recognition from writing sound using HMM, 2013, Technical Report of the Institute of Electronics, Information and Communication Engineers, vol.113, no.346, pp.121-125.