Date of Degree

9-2021

Document Type

Dissertation

Degree Name

Ph.D.

Program

Linguistics

Advisor

William Sakas

Committee Members

Kyle Gorman

Alla Rozovskaya

Subject Categories

Computational Linguistics

Keywords

homograph disambiguation, label imputation, natural language processing, machine learning, deep learning, token classification

Abstract

This dissertation presents the first implementation of label imputation for the task of homograph disambiguation using 1) transcribed audio, and 2) parallel, or translated, corpora. For label imputation from parallel corpora, a hypothesis of interlingual alignment between homograph pronunciations and text word forms is developed and formalized. Both the audio-based and the parallel corpus-based label imputation techniques are tested empirically in experiments that compare homograph disambiguation model performance using 1) hand-labeled training data, and 2) hand-labeled training data augmented with label-imputed data. Regularized multinomial logistic regression and pre-trained ALBERT, BERT, and XLNet language models fine-tuned as token classifiers are developed for homograph disambiguation. Model performance after training on data augmented with parallel corpus-based imputed labels shows improvement over training on hand-labeled data alone in classes with low-prevalence samples. Four homograph disambiguation data sets generated during the work on the dissertation are made available to the research community. In addition, this dissertation offers a novel typology of homographs with practical implications for both the label imputation process and homograph disambiguation.

Recommended Citation

Seale, Jennifer M., "Label Imputation for Homograph Disambiguation: Theoretical and Practical Approaches" (2021). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/4518

Included in

Computational Linguistics Commons

Dissertations, Theses, and Capstone Projects

Label Imputation for Homograph Disambiguation: Theoretical and Practical Approaches
