Dissertations, Theses, and Capstone Projects
Date of Degree
6-2026
Document Type
Doctoral Dissertation
Degree Name
Doctor of Philosophy
Program
Speech-Language-Hearing Sciences
Advisor
Valerie Shafer
Committee Members
Isabelle Barrière
Martin Chodorow
Irina Sekerina
Subject Categories
Applied Linguistics | Computational Linguistics | First and Second Language Acquisition | Social and Behavioral Sciences
Keywords
Developmental Language Disorder, Machine Learning, Language Acquisition, Russian
Abstract
This study investigated a machine learning (ML) approach to identifying Developmental Language Disorder (DLD) in Russian-speaking children using narrative data. ML methods can capture subtle linguistic patterns that distinguish typical from atypical development, which is especially important in cross-linguistic contexts where morphosyntactic variation affects how DLD manifests. Diagnosis remains challenging in less-studied languages because of limited knowledge of language-specific deficits and a lack of validated assessment tools. This study evaluated whether ML algorithms can provide a more efficient alternative to traditional screening methods.
Two binary classification studies were conducted using corpus data. Study 1 classified narratives told by 4- to 9-year-old Russian monolingual children with typical development (TD) versus those with DLD. Study 2 classified narratives told by 4- to 9-year-old Russian-Dutch bilingual TD children versus Russian monolingual children with DLD. The studies tested (1) whether computational linguistic features would outperform the traditional language sample analysis (LSA) measures employed in clinical settings; and (2) whether n-gram features (word and part-of-speech (POS)) would yield superior classification performance. Feature models were trained and tested using Logistic Regression and Support Vector Machine classifiers.
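As a rough illustration of what a POS n-gram feature can look like, the sketch below turns one narrative's tag sequence into relative n-gram frequencies. The tag sequence, the function names, the choice of bigrams, and the relative-frequency normalization are all assumptions made for this example; the abstract does not specify how the dissertation extracted or weighted its features.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(tokens, n_values=(1, 2)):
    """Map each n-gram to its relative frequency within one narrative.

    Normalizing by the total n-gram count is an illustrative choice so
    that narratives of different lengths are comparable.
    """
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical POS tags for a short utterance (not taken from the corpus).
pos_tags = ["PRON", "VERB", "ADJ", "NOUN", "VERB", "NOUN"]
features = ngram_features(pos_tags, n_values=(2,))
```

Feature dictionaries of this kind, one per narrative, could then be vectorized and passed to a Logistic Regression or Support Vector Machine classifier. The same functions work on word tokens instead of POS tags.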
The results supported both hypotheses. Computational features outperformed traditional LSA measures and achieved high sensitivity, specificity, and F1 scores at or above the clinical threshold in both studies. POS n-grams demonstrated the highest performance, capturing diagnostic patterns in part-of-speech distributions among monolingual and bilingual participants, while word n-grams identified lexical-semantic differences among groups that traditional clinical measures did not reflect. These findings support the viability of ML-based approaches as efficient, narrative-based pre-screening tools for DLD in highly inflectional languages. Some of the best-performing features also showed potential as cross-content and cross-linguistic DLD markers. This study is the first to apply automated methods to Russian child language narratives and to compare bilingual TD and monolingual DLD groups, demonstrating the clinical utility of an ML-based approach for DLD identification in multilingual contexts.
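For readers less familiar with the reported metrics, the sketch below computes sensitivity, specificity, and F1 from true and predicted labels using their standard definitions, treating DLD as the positive class. The labels are toy values invented for this example and are not results from the dissertation; the function name is also hypothetical.

```python
def screening_metrics(y_true, y_pred, positive="DLD"):
    """Compute sensitivity, specificity, and F1 for a binary screener."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)

    # Sensitivity: share of children with DLD who are correctly flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of TD children who are correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1: harmonic mean of precision and sensitivity (recall).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Toy labels for eight narratives (illustrative only).
y_true = ["DLD", "DLD", "DLD", "TD", "TD", "TD", "TD", "DLD"]
y_pred = ["DLD", "DLD", "TD", "TD", "TD", "DLD", "TD", "DLD"]
metrics = screening_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity separately matters for screening because the two kinds of error have different costs: a missed case of DLD delays intervention, while a false alarm leads to an unnecessary referral.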
Recommended Citation
Aharodnik, Katsiaryna, "A Machine Learning Approach to Disentangling Developmental Language Disorder from Typical Development in Russian-Speaking Children" (2026). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/6785
Included in
Applied Linguistics Commons, Computational Linguistics Commons, First and Second Language Acquisition Commons
