Dissertations, Theses, and Capstone Projects
Date of Degree
2-2025
Document Type
Dissertation
Degree Name
Ph.D.
Program
Computer Science
Advisor
Michael I Mandel
Committee Members
Rivka Levitan
Lei Xie
Keelan Evanini
Subject Categories
Computer Engineering | Signal Processing
Keywords
speech enhancement, speech dereverberation, multi-channel, beamforming
Abstract
Traditional single-channel speech enhancement and separation methods enhance the target speech signal by suppressing noise and interfering speech signals. These methods suffer from nonlinear distortion introduced by the algorithm itself, which hurts both the intelligibility of the speech and downstream tasks such as automatic speech recognition (ASR). We propose methods that leverage multi-channel input to robustly reduce this nonlinear speech distortion. We first demonstrate that better time-frequency mask estimation improves the mask-based minimum variance distortionless response (MVDR) beamforming algorithm. We then propose a novel mask-dependent training criterion to improve phase estimation for speech separation. Additionally, we propose an end-to-end multi-channel neural network (WPD++) that simultaneously separates and dereverberates a multi-channel noisy speech mixture. Finally, we show that integrating self-supervised learning models into the multi-channel speech enhancement and dereverberation network further reduces the word error rate (WER) of the downstream ASR task.
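For readers unfamiliar with the mask-based MVDR beamforming mentioned in the abstract, the following is a minimal NumPy sketch of the general technique, not the dissertation's implementation. It assumes the common reference-channel formulation, in which time-frequency masks weight the spatial covariance estimates for speech and noise. The function name mask_based_mvdr, its parameters, and the diagonal-loading constant are hypothetical choices made for illustration; in practice the masks would come from a neural network rather than being supplied by hand.

```python
import numpy as np


def mask_based_mvdr(stft, speech_mask, noise_mask, ref_channel=0, eps=1e-8):
    """Apply a mask-based MVDR beamformer to a multi-channel STFT.

    Illustrative sketch only (names and defaults are assumptions).
    stft:        complex array, shape (channels, freqs, frames)
    speech_mask: real array in [0, 1], shape (freqs, frames)
    noise_mask:  real array in [0, 1], shape (freqs, frames)
    Returns the enhanced single-channel STFT, shape (freqs, frames).
    """
    n_ch, n_freq, n_frames = stft.shape
    enhanced = np.zeros((n_freq, n_frames), dtype=complex)
    for f in range(n_freq):
        y = stft[:, f, :]  # (channels, frames)
        # Mask-weighted spatial covariance matrices for speech and noise.
        phi_s = (speech_mask[f] * y) @ y.conj().T / (speech_mask[f].sum() + eps)
        phi_n = (noise_mask[f] * y) @ y.conj().T / (noise_mask[f].sum() + eps)
        # Small diagonal loading keeps the noise covariance well conditioned.
        phi_n = phi_n + (eps * np.trace(phi_n).real + eps) * np.eye(n_ch)
        # w = (phi_n^{-1} phi_s) u / trace(phi_n^{-1} phi_s),
        # where u is a one-hot vector selecting the reference channel.
        ratio = np.linalg.solve(phi_n, phi_s)
        w = ratio[:, ref_channel] / (np.trace(ratio) + eps)
        # Beamformer output w^H y for every frame of this frequency bin.
        enhanced[f] = w.conj() @ y
    return enhanced
```

Because the beamformer is a linear filter per frequency bin, the mask only influences the covariance estimates and is never multiplied directly onto the output, which is why better mask estimation can improve the result without adding nonlinear distortion.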
Recommended Citation
Ni, Zhaoheng, "Toward More Intelligible Simultaneous Multi-channel Speech Enhancement and Recognition" (2025). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/6163