Dissertations, Theses, and Capstone Projects
Date of Degree
9-2024
Document Type
Thesis
Degree Name
M.A.
Program
Linguistics
Advisor
Rivka Levitan
Subject Categories
Computational Linguistics | Linguistics
Keywords
Synthesizing Speech, Text-to-Speech, TTS, Tacotron2, Adverbs, Complex Adverbial Phrases
Abstract
This study investigates the use of adverbial modifiers drawn from audiobook data as a resource for training speech synthesizers to handle a wider range of speech descriptions. The study uses the Tacotron2 text-to-speech (TTS) model. On the LibriTTS dataset, Tacotron2 is trained under two experimental conditions: one in which adverbial modifiers are incorporated into the input text, and one without them. During preprocessing, the speech descriptions are embedded with word embeddings and the speaker IDs are encoded with machine learning techniques. The model architecture also includes a prosody encoder inspired by prior research. The trained models are evaluated through subjective assessments by human listeners. The evaluation offers insight into how effectively adverbial modifiers control speech synthesis across a wider range of speech descriptions, and the results are weakly positive but promising.
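The abstract does not say which word embeddings or which speaker-ID encoding were used. The following is therefore only a minimal illustrative sketch of the two preprocessing steps. It assumes that a multi-word description such as "very angrily" is reduced to a single vector by averaging its word vectors, and that speaker IDs are label-encoded as contiguous integer indices (for example, to index a speaker embedding table). The toy embedding table and the function names `embed_description` and `encode_speaker_ids` are hypothetical and are not taken from the thesis.

```python
from statistics import fmean

# Hypothetical toy embedding table (assumption): a real pipeline would load
# pretrained word vectors instead of these hand-written 3-d values.
TOY_EMBEDDINGS = {
    "very": [0.1, 0.0, 0.2],
    "angrily": [0.9, -0.4, 0.3],
    "softly": [-0.5, 0.6, 0.1],
}


def embed_description(phrase, table, dim):
    """Average the vectors of the words in `phrase` found in `table`.

    Words missing from the table are skipped; if no word is known,
    a zero vector of length `dim` is returned.
    """
    vectors = [table[w] for w in phrase.lower().split() if w in table]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the i-th component of every vector together.
    return [fmean(component) for component in zip(*vectors)]


def encode_speaker_ids(speaker_ids):
    """Label-encode speaker IDs: map each distinct ID to an index 0..n-1.

    Returns the encoded list and the ID-to-index mapping, so the same
    mapping can be reused at inference time.
    """
    mapping = {sid: i for i, sid in enumerate(sorted(set(speaker_ids)))}
    return [mapping[sid] for sid in speaker_ids], mapping


if __name__ == "__main__":
    print(embed_description("very angrily", TOY_EMBEDDINGS, dim=3))
    print(encode_speaker_ids(["84", "19", "84"]))
```

Averaging is a simple way to turn variable-length adverbial phrases into fixed-size conditioning vectors, but it discards word order. The thesis may have used a different scheme.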
Recommended Citation
Akande, Zainab T., "Controlling Emotional Text to Speech Using Complex Adverbial Phrases" (2024). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/6027