Dissertations, Theses, and Capstone Projects
Date of Degree
2-2024
Document Type
Thesis
Degree Name
M.A.
Program
Linguistics
Advisor
Kyle Gorman
Subject Categories
Computational Linguistics
Keywords
Low-resource languages, Finno-Ugric languages, Uralic languages, Ingrian, Computational morphology, Morphophonology
Abstract
This thesis presents a two-part approach to data enrichment for low-resource languages. Using Yoyodyne, a Fairseq-inspired neural library for small-vocabulary sequence-to-sequence generation, a morphological generation task was evaluated on labeled data for the low-resource language Ingrian at multiple stages of enrichment. Because the available Ingrian data are limited, weighted finite-state transducers (WFSTs) were used to generate an expanded vocabulary. These WFSTs came from HFST's toolkit for Uralic languages and from GiellaLT, a source of FST-driven lexica for low-resource languages. Later stages of experimentation added labeled data from related, higher-resource languages (Finnish and Estonian) to encourage cross-lingual transfer for paradigm completion.
Recommended Citation
Harrison, Andrea M., "Consonant (De)gradation in Ingrian?" (2024). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/5677