Shhh! Smart Tech at Work: Zoning in on Target Sounds Amid the Noise
Jingya Yang – jing.ya161@gmail.com
Department of Power Mechanical Engineering, National Tsing Hua University, Hsinchu 300, Taiwan
Popular version of 1aSP2 – Target-Direction Sound Extraction Using a Hybrid DSP/Deep Learning Approach
Presented at the 187th ASA Meeting
Read the abstract at https://eppro01.ativ.me//web/index.php?page=IntHtml&project=ASAFALL24&id=3771518
–The research described in this Acoustics Lay Language Paper may not have yet been peer reviewed–
In a noisy world, capturing clear audio from a specific direction can be a game-changer. Imagine a system that can zero in on a target sound even amid background noise. This is the goal of Target-Direction Sound Extraction (TDSE), a process designed to isolate sounds arriving from a particular direction while filtering out unwanted noise.
Our team has developed a new TDSE system that combines digital signal processing (DSP) with deep learning. Traditional sound extraction relies on signal processing alone, but it struggles when multiple sounds arrive from different directions or when only a few microphones are available. Deep learning can help, but it sometimes distorts the audio. By integrating DSP-based spatial filtering with a deep neural network (DNN), our system extracts clear target audio with minimal interference, even with only a few microphones.
The system relies on two spatial filtering techniques: beamforming and blocking. Beamforming acts as a signal estimator, enhancing sounds from the target direction, while blocking acts as a noise estimator, suppressing sounds from the target direction and leaving the other, unwanted sounds intact. Using a deep learning model, our system processes spatial features and sound embeddings (the unique characteristics of the target sound) to produce clear, isolated audio. In our tests, this method improved sound quality by 3 to 9 dB and performed well with different microphone setups, including setups not used during training.
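For readers curious how a beamformer and a blocking branch can split one multi-microphone recording into "target" and "everything else", here is a minimal Python sketch. It is not the authors' implementation, which this article does not describe in detail. It shows the textbook pairing of a delay-and-sum beamformer with a delay-and-subtract blocking branch (the same pairing used in the classic generalized sidelobe canceller), under stated assumptions: microphones on a straight line, a distant (far-field) source, no room echoes, a speed of sound of 343 m/s, and an example layout of two microphones 5 cm apart. The function names `steering_vector` and `beamform_and_block` are hypothetical, and the deep learning stage is left out entirely.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def steering_vector(freqs, mic_positions, angle_deg, c=SPEED_OF_SOUND):
    """Far-field phase factors for microphones placed along a straight line.

    freqs: (F,) frequencies in Hz.
    mic_positions: (M,) microphone positions along the line, in metres.
    angle_deg: arrival angle measured from the line of microphones.
    Returns an (F, M) array of unit-magnitude complex phase factors.
    """
    theta = np.deg2rad(angle_deg)
    delays = mic_positions * np.cos(theta) / c  # relative delay at each mic (s)
    return np.exp(-2j * np.pi * np.outer(freqs, delays))


def beamform_and_block(X, freqs, mic_positions, angle_deg):
    """Split multichannel spectra into a signal estimate and noise estimates.

    X: (F, M, T) complex spectra indexed by frequency, microphone, frame.
    Returns signal_est with shape (F, T) and noise_est with shape (F, M - 1, T).
    """
    d = steering_vector(freqs, mic_positions, angle_deg)  # (F, M)
    # Undo the target direction's phase shifts so the target lines up on every mic.
    aligned = np.conj(d)[:, :, None] * X
    # Delay-and-sum beamformer (signal estimator): averaging keeps the aligned
    # target intact while sounds from other directions partly cancel.
    signal_est = aligned.mean(axis=1)
    # Blocking branch (noise estimator): differences of neighbouring aligned
    # channels cancel the target and keep sound from other directions.
    noise_est = aligned[:, 1:, :] - aligned[:, :-1, :]
    return signal_est, noise_est


if __name__ == "__main__":
    freqs = np.array([500.0, 1000.0, 2000.0])
    mics = np.array([0.0, 0.05])  # assumed layout: two mics 5 cm apart
    frames = np.exp(1j * np.arange(12).reshape(3, 4))  # toy source spectra

    d_target = steering_vector(freqs, mics, 30.0)  # target arrives from 30 degrees
    d_other = steering_vector(freqs, mics, 120.0)  # interferer from 120 degrees

    # Target alone: the beamformer passes it unchanged, the blocking branch removes it.
    sig_t, noise_t = beamform_and_block(d_target[:, :, None] * frames[:, None, :],
                                        freqs, mics, 30.0)
    print("target kept:", np.allclose(sig_t, frames))
    print("target blocked:", np.allclose(noise_t, 0))

    # Interferer alone: fraction of its amplitude that the beamformer lets through
    # at each frequency (values below 1 mean it is attenuated).
    sig_o, _ = beamform_and_block(d_other[:, :, None] * frames[:, None, :],
                                  freqs, mics, 30.0)
    print("interferer gain per frequency:", np.abs(sig_o[:, 0]))
```

With only two microphones, the beamformer attenuates an off-target sound only partially, and less so at low frequencies. This is one reason a pure DSP approach struggles with few microphones, and why a learned stage on top of such spatial cues can help. With M microphones, the blocking branch in this sketch yields M − 1 noise references.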
TDSE could transform various industries, from virtual meetings to entertainment, by enhancing audio clarity in real time. Our system’s design offers flexibility, making it adaptable for real-world applications where clear directional audio is crucial.
This approach is an exciting step toward more robust, adaptive audio processing systems, allowing users to capture target sounds even in challenging environments.