Dissertations, Theses, and Capstone Projects

Date of Degree

2-2025

Document Type

Dissertation

Degree Name

Ph.D.

Program

Computer Science

Advisor

Michael I Mandel

Committee Members

Rivka Levitan

Lei Xie

Keelan Evanini

Subject Categories

Computer Engineering | Signal Processing

Keywords

speech enhancement, speech dereverberation, multi-channel, beamforming

Abstract

Traditional single-channel speech enhancement and separation methods enhance the target speech signal by suppressing noise and interfering speech. These methods introduce nonlinear distortion, which degrades speech intelligibility and hurts downstream tasks such as automatic speech recognition (ASR). We propose a method that leverages multi-channel input to robustly reduce this nonlinear speech distortion. We first demonstrate that better time-frequency mask estimation improves the mask-based minimum variance distortionless response (MVDR) beamforming algorithm. We then propose a novel mask-dependent training criterion that improves phase estimation for speech separation. Additionally, we propose an end-to-end multi-channel neural network (WPD++) that simultaneously separates and dereverberates multi-channel noisy speech mixtures. Finally, we show that integrating self-supervised learning models into the multi-channel speech enhancement and dereverberation network further reduces the word error rate (WER) on the downstream ASR task.
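
For readers unfamiliar with mask-based MVDR beamforming, the following is a minimal NumPy sketch of one common textbook formulation (the reference-channel form). It is not taken from the dissertation: the function names, the diagonal-loading constant, and the assumption that speech and noise masks are already available (for example from a neural network) are all illustrative choices. It shows how time-frequency masks weight the multi-channel STFT to form spatial covariance matrices, from which linear beamforming weights are derived per frequency bin.

```python
import numpy as np

def masked_covariance(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance (hypothetical helper).

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of m(t, f) * y(t, f) y(t, f)^H
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    return cov / (mask.sum(axis=0)[:, None, None] + eps)

def mvdr_weights(cov_speech, cov_noise, ref_channel=0, diag_load=1e-6):
    """Reference-channel MVDR weights, shape (freqs, channels).

    w(f) = [Phi_n(f)^-1 Phi_s(f)] u / trace(Phi_n(f)^-1 Phi_s(f)),
    where u is a one-hot vector selecting the reference microphone.
    """
    n_channels = cov_noise.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned (assumed value).
    load = diag_load * np.trace(cov_noise, axis1=-2, axis2=-1).real / n_channels
    cov_noise = cov_noise + load[:, None, None] * np.eye(n_channels)
    numerator = np.linalg.solve(cov_noise, cov_speech)  # Phi_n^-1 Phi_s
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[..., ref_channel] / (trace[:, None] + 1e-10)

def apply_beamformer(weights, stft):
    """Apply w(f)^H y(t, f); returns a single-channel STFT, shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

Because the output is a linear combination of the microphone signals in each frequency bin, errors in the estimated masks affect only the filter weights rather than being applied directly to the signal, which is the usual argument for why beamforming of this kind introduces less nonlinear distortion than single-channel masking.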

This work is embargoed and will be available for download on Monday, February 01, 2027
