Date of Degree

6-2014

Document Type

Thesis

Degree Name

M.A.

Program

Linguistics

Advisor(s)

Andrew Rosenberg

Subject Categories

Computer Sciences | Linguistics

Keywords

Babel, burstiness, cache, keyword search, spoken term detection, word-burst

Abstract

State-of-the-art speech recognition technologies are very accurate for heavily studied languages like English. They perform poorly, though, for languages in which relatively little recorded speech data is available to researchers. For these low-resource languages, keyword search within recorded speech is a formidable task. We demonstrate a method that generates more accurate keyword search results on low-resource languages by exploiting a pattern that the speech recognizer does not use. The word-burst, or burstiness, pattern is the tendency for utterances of a word to appear together in bursts as conversational topics fluctuate. We give evidence that the burstiness phenomenon appears across varied languages. Using burstiness features to train a machine-learning algorithm, we are able to assess the likelihood that a hypothesized keyword location is correct and adjust its confidence score accordingly, yielding improvements in the efficacy of keyword search in low-resource languages.

