Dissertations, Theses, and Capstone Projects

Date of Degree

9-2024

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor

Rivka Levitan

Subject Categories

Computational Linguistics | Linguistics

Keywords

Synthesizing Speech, Text-to-Speech, TTS, Tacotron2, Adverbs, Complex Adverbial Phrases

Abstract

This study investigates the use of adverbial modifiers from audiobook data as a resource for training speech synthesizers with a greater range of speech descriptions. Using the LibriTTS dataset, the Tacotron2 text-to-speech (TTS) model is trained under two experimental conditions: one that incorporates adverbial modifiers into the input text and one that does not. Dataset preprocessing involves embedding the descriptions with word embeddings and encoding speaker IDs with machine learning techniques. The model architecture also includes a prosody encoder inspired by prior research. The trained models are evaluated through subjective assessments by human listeners. The evaluation provides insight into the efficacy of adverbial modifiers for controlling speech synthesis with a wider range of speech descriptions, and it shows weakly positive but promising results.
