Dissertations, Theses, and Capstone Projects
Date of Degree
6-2026
Document Type
Master's Thesis
Degree Name
Master of Arts
Program
Linguistics
Advisor
Kyle Gorman
Subject Categories
Computational Linguistics
Keywords
grapheme-to-phoneme conversion, side-pronunciation, multi-source learning
Abstract
This thesis introduces G&P2P, a multi-source framework for grapheme-to-phoneme (G2P) conversion that integrates side pronunciations from multiple lexical resources. Unlike traditional single-source approaches, G&P2P combines data from several pronunciation dictionaries, including CELEX, PronLex, NETTalk, and WikiPron, using a range of fusion strategies. The goal is to improve model performance on out-of-vocabulary words through multi-source learning. Experiments were conducted with attentive LSTM, pointer-generator LSTM, and pointer-generator Transformer architectures. Models were trained on combinations of datasets and evaluated by word error rate (WER), with each configuration run under five random seeds.
Results show that fusing expert-curated dictionaries such as CELEX and PronLex consistently improves accuracy, yielding an absolute WER reduction of 11.81 points with the pointer-generator LSTM. In contrast, incorporating noisy, crowd-sourced resources (e.g., WikiPron) can degrade performance. A Friedman test found no significant differences across fusion strategies, indicating that dataset quality rather than fusion method is the critical factor. Moreover, LSTM-based architectures with pointer mechanisms outperformed Transformer-based models in nearly all conditions. These findings highlight the value of high-quality multi-source supervision in G2P modeling and offer practical guidance for the future development of speech technology.
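For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed in G2P work (the percentage of words whose predicted pronunciation does not exactly match the reference), how it might be averaged over seeds, and what an absolute reduction in points means. It is not the thesis's evaluation code: the function names, the averaging by arithmetic mean, and the toy pronunciations are all assumptions made for illustration.

```python
from statistics import mean


def word_error_rate(gold, predicted):
    """Percentage of words whose predicted pronunciation differs from the gold one."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and the same length")
    errors = sum(g != p for g, p in zip(gold, predicted))
    return 100.0 * errors / len(gold)


def mean_wer_across_seeds(gold, runs):
    """Average WER over several runs (one list of predictions per random seed)."""
    return mean(word_error_rate(gold, run) for run in runs)


# Toy, invented data: space-separated phoneme strings for four words.
gold = ["k ae t", "d ao g", "f ih sh", "b er d"]

# Hypothetical single-source model, two seeds (two wrong words in each run).
baseline_runs = [
    ["k ae t", "d aa g", "f ih s", "b er d"],
    ["k ae t", "d ao g", "f ih s", "b ih d"],
]

# Hypothetical multi-source model, two seeds (zero and one wrong word).
fused_runs = [
    ["k ae t", "d ao g", "f ih sh", "b er d"],
    ["k ae t", "d ao g", "f ih s", "b er d"],
]

baseline_wer = mean_wer_across_seeds(gold, baseline_runs)
fused_wer = mean_wer_across_seeds(gold, fused_runs)

# Absolute reduction is a difference in percentage points, not a ratio.
absolute_reduction = baseline_wer - fused_wer

print(f"baseline WER: {baseline_wer:.2f}")
print(f"fused WER: {fused_wer:.2f}")
print(f"absolute reduction: {absolute_reduction:.2f} points")
```

In this toy setting the reduction is reported in percentage points, the same unit as the 11.81-point figure quoted in the abstract; a relative reduction would instead divide by the baseline WER.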
Recommended Citation
Peng, Chun-Yi, "G&P2P: A Multi-Source Approach to Grapheme to Phoneme Conversion" (2026). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/6707
