Dissertations, Theses, and Capstone Projects
Date of Degree
6-2014
Document Type
Thesis
Degree Name
M.A.
Program
Linguistics
Advisor
Andrew Rosenberg
Subject Categories
Computer Sciences | Linguistics
Keywords
Babel, burstiness, cache, keyword search, spoken term detection, word-burst
Abstract
State-of-the-art technologies for speech recognition are very accurate for heavily studied languages like English. They perform poorly, though, for languages in which the recorded archives of speech data available to researchers are relatively scant. In the context of these low-resource languages, the task of keyword search within recorded speech is formidable. We demonstrate a method that generates more accurate keyword search results on low-resource languages by studying a pattern not exploited by the speech recognizer. The word-burst, or burstiness, pattern is the tendency for word utterances to appear together in bursts as conversational topics fluctuate. We give evidence that the burstiness phenomenon exhibits itself across varied languages. Using burstiness features to train a machine-learning algorithm, we are able to assess the likelihood that a hypothesized keyword location is correct and adjust its confidence score accordingly, yielding improvements in the efficacy of keyword search in low-resource languages.
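The rescoring idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Hit` record, the 30-second window, and the hand-set `weight` are all assumptions made for illustration, and the simple weighted sum stands in for the trained machine-learning model the abstract describes. For each hypothesized keyword hit, the sketch counts nearby hits of the same keyword and raises the confidence of hits that occur in a burst.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothesized keyword location (hypothetical record layout)."""
    keyword: str
    time: float   # seconds from the start of the recording
    score: float  # recognizer confidence in [0, 1]


def burst_features(hits, window=30.0):
    """For each hit, return (count, summed score) of the other hits of
    the same keyword that fall within `window` seconds of it."""
    by_keyword = defaultdict(list)
    for h in hits:
        by_keyword[h.keyword].append(h)
    features = []
    for h in hits:
        neighbors = [o for o in by_keyword[h.keyword]
                     if o is not h and abs(o.time - h.time) <= window]
        features.append((len(neighbors), sum(o.score for o in neighbors)))
    return features


def rescore(hits, window=30.0, weight=0.1):
    """Raise each hit's score in proportion to its neighbors' scores.
    The fixed `weight` is an assumption standing in for a learned model."""
    return [min(1.0, h.score + weight * support)
            for h, (_, support) in zip(hits, burst_features(hits, window))]
```

In this sketch an isolated hit keeps its original score, while hits of the same keyword that cluster in time reinforce one another. A trained classifier over such features, as the abstract describes, would replace the fixed weight.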
Recommended Citation
Richards, Justin, "Echolocation: Using Word-Burst Analysis to Rescore Keyword Search Candidates in Low-Resource Languages" (2014). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/273