Date of Degree
Low-resource languages, Finno-Ugric languages, Uralic languages, Ingrian, Computational morphology, Morphophonology
This paper presents a two-part approach to data enrichment for low-resource languages. Using Yoyodyne, a Fairseq-inspired neural library for small-vocabulary sequence-to-sequence generation, a morphological generation task was tested on labeled data from multiple stages of enrichment for the low-resource language Ingrian. Because the available Ingrian data are limited, weighted finite-state transducers (WFSTs) were used to generate an expanded vocabulary, drawing on HFST's toolkit for Uralic languages and on GiellaLT, a source of FST-driven lexica for low-resource languages. Further stages of experimentation added labeled data from related, higher-resource languages (Finnish, Estonian) to encourage cross-lingual transfer in the interest of paradigm completion.
Harrison, Andrea M., "Consonant (De)gradation in Ingrian?" (2024). CUNY Academic Works.