Publications and Research
Document Type
Article
Publication Date
2023
Abstract
This article presents the Brazilian Portuguese-Russian (BraPoRus) corpus, whose goal is to collect, analyze, and preserve for posterity the spoken heritage Russian still used today in Brazil by approximately 1,500 elderly bilingual heritage Russian–Brazilian Portuguese speakers. Their unique 100-year-old variety of moribund Russian is disappearing because it has not been passed on to their descendants born in Brazil. During the COVID-19 pandemic, we remotely collected 170 h of speech samples in heritage Russian from 26 participants (mean age = 75.7 years) in naturalistic settings using Zoom or a phone call. To estimate the quality of the collected data, we focus on two methodological challenges: automatic transcription and the acoustic quality of remote recordings. First, we find that among commercially available transcription programs, Sonix far outperforms Google Transcribe and Vocalmatic on the measure of word error rate (WER). Second, we establish that the acoustic quality of the remote recordings was adequate for intonational and speech rate analysis. Moreover, this remote method of collecting and analyzing speech samples works successfully with elderly bilingual participants who speak a heritage language different from their dominant societal language, and it can become a new norm when face-to-face communication with elderly participants is not possible.
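For readers unfamiliar with the WER measure mentioned in the abstract, the following is a minimal sketch of how it is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and an automatic transcript, divided by the number of words in the reference. The function name `word_error_rate`, the whitespace tokenization, and the toy strings are assumptions made for illustration; the abstract does not describe the authors' own scoring procedure or any text normalization they applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: tokens are split on whitespace, with no case or
    punctuation normalization (an assumption, not the article's method).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical toy transcripts: one substitution and one deletion over four words.
example_wer = word_error_rate("a b c d", "a x c")
```

Lower values indicate better transcription; because insertions are counted, WER can exceed 1.0 when the automatic transcript is much longer than the reference.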