speaker verification, edge device machine learning, Arduino, speech embedding, speaker separation
The continued shrinking of processors and other physical hardware, in concert with the development of embeddable machine learning frameworks, has enabled new use cases that place machine learning directly in the "wild". Speaker verification has long been deployed as an inference task on systems with significant computational resources. More recently, these systems have been built for smaller, cheaper devices which can be placed in people's homes or other edge locations. Here, we aim to demonstrate that a reasonably accurate, generalizable, text-independent speaker verification system can be built, trained, and, ultimately, deployed onto a microcontroller with as little as 1 MB of flash memory; that is, a system able to enroll new speakers on the device in an online fashion.

Previous research has demonstrated that online enrollment is possible with a generalizable speaker verification model that produces speaker-specific embeddings. Recent work has outlined embeddable systems which run on mobile phones and Internet-of-Things (IoT) devices located in users' homes and other disparate locations with limited access to computational hardware. So far, however, the feasibility of building such a system for a microcontroller has not been established. While these systems have been successfully deployed on larger edge devices, our aim here is to explore the possibility of doing so on a single microcontroller.

We use a concatenation of the publicly available LibriSpeech and VoxCeleb datasets to train several small, generalizable speaker verification models, including recurrent and convolutional neural network architectures. To deploy our inference system to a microcontroller, we reimplement a log-mel spectrogram framework, originally written in Python, in our target device's supported language: C++.
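The log-mel front end mentioned above can be sketched as follows. This is an illustrative Python version rather than the thesis's actual implementation, and the frame length, hop length, FFT size, and filterbank size are plausible assumptions (typical for 16 kHz speech), not the exact parameters used in the work:

```python
import numpy as np

def log_mel_spectrogram(signal, sample_rate=16000, frame_len=400,
                        hop_len=160, n_fft=512, n_mels=40):
    """Frame the waveform, window it, take the magnitude spectrum,
    project onto a triangular mel filterbank, and take the log."""
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    frames = np.stack([signal[i * hop_len:i * hop_len + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hanning(frame_len)

    # Magnitude spectrum via the real FFT (frames are zero-padded to n_fft).
    spectrum = np.abs(np.fft.rfft(frames, n=n_fft))  # (n_frames, n_fft//2 + 1)

    # Mel scale conversions.
    hz_to_mel = lambda hz: 2595.0 * np.log10(1.0 + hz / 700.0)
    mel_to_hz = lambda mel: 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

    # Filterbank edges: evenly spaced on the mel scale, mapped back to FFT bins.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)

    # Triangular filters rising from bins[m-1] to bins[m], falling to bins[m+1].
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    # Mel energies, with a small floor so the log is always defined.
    return np.log(spectrum @ fbank.T + 1e-6)  # (n_frames, n_mels)
```

Because the computation is only framing, an FFT, and a fixed matrix of filter weights, it ports naturally to C++ on a microcontroller: the filterbank can be precomputed and stored in flash, leaving only the FFT and a small matrix multiply per frame at runtime.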
We show that it is possible to build a reasonably accurate (EER ≤ 11%), generalizable, text-independent speaker verification model which will fit on even the smallest microcontroller. In conjunction with our log-mel spectrogram implementation in C++, it is possible to deploy this system in its entirety onto an Arduino Nano device with an on-board microphone for online speaker enrollment and inference. The field of edge device machine learning (TinyML) is an active area of research. Our contribution demonstrates the possibility of building systems which can perform inference on a small form-factor microcontroller, accepting the trade-offs inherent in the problem.
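Embedding-based online enrollment of the kind described above typically reduces to averaging a few utterance embeddings into a speaker profile and thresholding cosine similarity at verification time. A minimal sketch, where the 0.7 threshold is a hypothetical stand-in for a value that would in practice be tuned at an operating point such as the EER:

```python
import numpy as np

def enroll(embeddings):
    """Average several utterance embeddings into one unit-norm speaker profile."""
    profile = np.mean(np.asarray(embeddings, dtype=float), axis=0)
    return profile / np.linalg.norm(profile)

def verify(profile, embedding, threshold=0.7):
    """Accept the claimed identity if cosine similarity between the
    enrolled profile and the new utterance embedding meets the threshold."""
    embedding = np.asarray(embedding, dtype=float)
    embedding = embedding / np.linalg.norm(embedding)
    return float(np.dot(profile, embedding)) >= threshold
```

Because enrollment is just an average and verification a single dot product, both steps are cheap enough to run on-device, which is what makes online enrollment on a microcontroller plausible once the embedding model itself fits in flash.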
Duffy, Thomas P., "Edge Device Speaker Verification" (2020). CUNY Academic Works.