Dissertations, Theses, and Capstone Projects

Phonologically-Informed Speech Coding for Automatic Speech Recognition-based Foreign Language Pronunciation Training

Anthony J. Vicario, The Graduate Center, City University of New York

Date of Degree

2-2020

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor

Kyle Gorman

Subject Categories

Computational Linguistics | Linguistics | Phonetics and Phonology

Keywords

NLP, Computational Phonology, Automatic Speech Recognition, CAPT, Machine Learning, Neural Networks

Abstract

Automatic speech recognition (ASR) and computer-assisted pronunciation training (CAPT) systems used in foreign-language educational contexts are often not developed with the specific task of second-language acquisition in mind. Systems that are built for this task are often excessively targeted to one native language (L1) or a single phonemic contrast and are therefore burdensome to train. Current algorithms have been shown to provide erroneous feedback to learners and show inconsistencies between human and computer perception. These discrepancies have thus far hindered more extensive application of ASR in educational systems.

This thesis reviews computational models of the human perception of American English vowels for use in an educational context, exploring and comparing two types of acoustic representation: a low-dimensional, "linguistically-informed" formant representation and the more traditional Mel-frequency cepstral coefficients (MFCCs). We first compare two algorithms for phoneme classification (support vector machines and long short-term memory recurrent neural networks) trained on American English vowel productions from the TIMIT corpus. We then conduct a perceptual study of non-native English vowel productions as perceived by native American English speakers. We compare the results of the computational experiment and the human perception experiment to assess human/model agreement, and we explore the dissimilarities between human and model classification. More phonologically-informed audio signal representations should create a more human-aligned, less L1-dependent vowel classification system with higher interpretability, one that can be further refined through additional phonetic and phonological research. Results show that linguistically-informed speech coding produces results that better align with human classification, supporting use of the proposed coding for ASR-based CAPT.

Recommended Citation

Vicario, Anthony J., "Phonologically-Informed Speech Coding for Automatic Speech Recognition-based Foreign Language Pronunciation Training" (2020). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/3636

Included in

Computational Linguistics Commons, Phonetics and Phonology Commons
