Dissertations and Theses

Date of Award

2020

Document Type

Thesis

First Advisor

Jie Wei

Second Advisor

Michael Grossberg

Third Advisor

Zhigang Zhu

Keywords

speaker verification, edge device machine learning, Arduino, speech embedding, speaker separation

Abstract

The continued shrinking of processors and other physical hardware, in concert with the development of embeddable machine learning frameworks, has enabled new use cases that place machine learning directly in the “wild”. Speaker verification has long been deployed to perform inference on systems with significant computational resources. More recently, these systems have been built for smaller, cheaper devices which can be placed in people's homes or other edge locations. Here, we aim to demonstrate that a reasonably accurate, generalizable, text-independent speaker verification system can be built, trained, and, ultimately, deployed onto a microcontroller with as little as 1MB of flash memory; that is, a system which can enroll new speakers onto the device in an online fashion. Previous research has demonstrated that online enrollment is possible using a generalizable speaker verification model with speaker-specific embeddings. Recent work has outlined embeddable systems which work on mobile phones and Internet-of-Things (IoT) devices located in users' homes and other disparate locations with limited access to computational hardware. So far, however, the feasibility of building such a system for a microcontroller has not been established. While these systems have been successfully deployed on larger edge devices, our aim here is to explore the possibility of doing so on a single microcontroller. We use a concatenation of the publicly available LibriSpeech and VoxCeleb datasets to train several small, generalizable speaker verification models. Models trained on this data include those implementing recurrent and convolutional neural network architectures. In order to deploy our inference system to a microcontroller, we reimplement a log-mel spectrogram framework, originally implemented in Python, in the language supported by our target device: C++.
We show that it is possible to build a reasonably accurate (EER ≤ 11%), generalizable, text-independent speaker verification model which will fit on even the smallest microcontroller. In conjunction with our log-mel spectrogram implementation in C++, it is possible to deploy this system in its entirety onto an Arduino Nano device with an on-board microphone for online speaker enrollment and inference. The field of edge device machine learning (TinyML) is an active area of research. Our contribution demonstrates the possibility of building systems which can perform inference on a small-form-factor microcontroller, accepting the trade-offs inherent in the problem.
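
As an illustration of embedding-based online enrollment and verification, the following is a minimal C++ sketch and not the thesis implementation. It assumes a model has already mapped each utterance to a fixed-length embedding, that a speaker is enrolled by averaging a few such embeddings, and that a test utterance is accepted when its cosine similarity to the enrolled profile reaches a threshold. The names `Embedding`, `enroll`, `cosine_similarity`, and `verify`, and the choice of cosine scoring, are assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type and function names; none are taken from the thesis.
using Embedding = std::vector<float>;

// Enroll a speaker by averaging the embeddings of several utterances.
// Assumes all embeddings have the same length; returns an empty profile
// when no utterances are given.
Embedding enroll(const std::vector<Embedding>& utterances) {
    if (utterances.empty()) return {};
    Embedding profile(utterances.front().size(), 0.0f);
    for (const Embedding& e : utterances) {
        for (std::size_t i = 0; i < profile.size() && i < e.size(); ++i) {
            profile[i] += e[i];
        }
    }
    for (float& v : profile) {
        v /= static_cast<float>(utterances.size());
    }
    return profile;
}

// Cosine similarity in [-1, 1]; returns 0 for mismatched, empty,
// or zero-norm inputs.
float cosine_similarity(const Embedding& a, const Embedding& b) {
    if (a.size() != b.size() || a.empty()) return 0.0f;
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0f || norm_b == 0.0f) return 0.0f;
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Accept the test utterance if it is close enough to the enrolled profile.
// The threshold is a tunable assumption; raising it trades false accepts
// for false rejects.
bool verify(const Embedding& profile, const Embedding& test, float threshold) {
    return cosine_similarity(profile, test) >= threshold;
}
```

In this sketch the threshold is the operating point that would be tuned on held-out data; the equal error rate (EER) quoted above corresponds to the threshold at which the false-accept and false-reject rates are equal.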

Included in

Data Science Commons
