Dissertations, Theses, and Capstone Projects

Date of Degree

6-2026

Document Type

Master's Thesis

Degree Name

Master of Arts

Program

Linguistics

Advisor

Kyle Gorman

Subject Categories

Computational Linguistics

Keywords

grapheme-to-phoneme conversion, side-pronunciation, multi-source learning

Abstract

This thesis introduces G&P2P, a multi-source framework for grapheme-to-phoneme (G2P) conversion that integrates side pronunciations from multiple lexical resources. Unlike traditional single-source approaches, G&P2P combines data from several pronunciation dictionaries (CELEX, PronLex, NETTalk, and WikiPron) using several fusion strategies. The goal is to improve model performance on out-of-vocabulary words. Experiments were conducted with attentive LSTM, pointer-generator LSTM, and pointer-generator Transformer architectures. Models were trained on combinations of datasets and evaluated by word error rate (WER) across five random seeds.
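As an illustration of the evaluation metric, the sketch below computes WER in the sense commonly used for G2P: the percentage of words whose predicted phoneme sequence does not exactly match the gold pronunciation. It assumes that this is the definition the thesis uses and that per-seed scores are summarized by their mean; the function name `word_error_rate` and the toy data are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def word_error_rate(gold, predicted):
    """Return the percentage of words whose predicted pronunciation
    differs from the gold pronunciation in any way (exact-match scoring)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute WER on an empty test set")
    errors = sum(g != p for g, p in zip(gold, predicted))
    return 100.0 * errors / len(gold)


# Toy gold pronunciations (made up for illustration only).
gold = [["k", "æ", "t"], ["d", "ɔ", "g"], ["f", "ɪ", "ʃ"], ["b", "ɜ", "d"]]

# Hypothetical predictions from two training runs with different seeds.
seed_predictions = {
    0: [["k", "æ", "t"], ["d", "ɔ", "g"], ["f", "ɪ", "ʃ"], ["b", "ɪ", "d"]],
    1: [["k", "æ", "t"], ["d", "ɑ", "g"], ["f", "ɪ", "s"], ["b", "ɜ", "d"]],
}

# One WER per seed, then the mean over seeds.
per_seed = {seed: word_error_rate(gold, preds)
            for seed, preds in seed_predictions.items()}
mean_wer = mean(per_seed.values())
```

Under this definition a single wrong phoneme makes the whole word count as an error, so WER is stricter than a phoneme-level error rate.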

Results show that fusing expert-curated dictionaries such as CELEX and PronLex consistently improves accuracy, yielding an 11.81-point absolute reduction in WER with the pointer-generator LSTM. In contrast, incorporating noisy, crowd-sourced resources (e.g., WikiPron) can degrade performance. A Friedman test found no significant differences across fusion strategies, suggesting that dataset quality rather than fusion method is the critical factor. Moreover, LSTM-based architectures with pointer mechanisms outperformed Transformer-based models in nearly all conditions. These findings highlight the value of high-quality multi-source supervision in G2P modeling and offer practical guidance for the future development of speech technology.
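For readers unfamiliar with the Friedman test, the following minimal sketch computes its statistic from a table of WER scores. It assumes each row is one evaluation condition (for example, a dataset combination) and each column is one fusion strategy; this layout, the function names, and the toy numbers are hypothetical, not the thesis's actual design or results. Ties receive average ranks, and no tie correction is applied to the statistic.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic for n blocks (rows) and k treatments (columns)."""
    n, k = len(blocks), len(blocks[0])
    if n == 0 or k < 2 or any(len(row) != k for row in blocks):
        raise ValueError("need a non-empty rectangular table with at least 2 columns")
    rank_sums = [0.0] * k
    for row in blocks:
        for treatment, rank in enumerate(average_ranks(row)):
            rank_sums[treatment] += rank
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)


# Toy WER table (made-up values): rows are conditions, columns are fusion strategies.
toy_wer = [
    [30.1, 30.4, 30.2],
    [28.7, 28.5, 28.9],
    [33.0, 32.8, 32.6],
]
statistic = friedman_statistic(toy_wer)
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom. When the strategies are ranked differently from one condition to the next, as in the toy table, the rank sums are nearly equal and the statistic is small, which is the pattern behind a non-significant result.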
