Dissertations, Theses, and Capstone Projects

Date of Degree

6-2026

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy

Program

Speech-Language-Hearing Sciences

Advisor

Valerie Shafer

Committee Members

Isabelle Barrière

Martin Chodorow

Irina Sekerina

Subject Categories

Applied Linguistics | Computational Linguistics | First and Second Language Acquisition | Social and Behavioral Sciences

Keywords

Developmental Language Disorder, Machine Learning, Language Acquisition, Russian

Abstract

This study investigated a machine learning (ML) approach to identifying Developmental Language Disorder (DLD) in Russian-speaking children using narrative data. ML methods can capture subtle linguistic patterns that distinguish typical and atypical development, which is especially important in cross-linguistic contexts where morphosyntactic variation affects the manifestation of DLD. Diagnosis remains challenging in less-studied languages due to limited knowledge of language-specific deficits and a lack of validated assessment tools. This study evaluated whether ML algorithms can provide a more efficient alternative to traditional screening methods.

Two binary classification studies were conducted using corpus data: (1) classification of narratives told by four- to nine-year-old Russian monolingual children with typical development (TD) versus those with DLD; and (2) classification of narratives told by four- to nine-year-old Russian-Dutch bilingual children with TD versus Russian monolingual children with DLD. The studies tested two hypotheses: (a) that computational linguistic features would outperform traditional language sample analysis (LSA) measures employed in clinical settings; and (b) that n-gram features (word and part-of-speech (POS)) would yield superior classification performance. Feature models were trained and tested using Logistic Regression and Support Vector Machine classifiers.
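As a rough illustration of what word or POS n-gram features look like, the sketch below counts contiguous n-grams in a token sequence using only the Python standard library. It is not the dissertation's pipeline: the function names, the tag labels, and the example sequence are all hypothetical, and the abstract does not specify which n-gram orders or tagger were used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens, n_values=(1, 2)):
    """Count n-grams of every requested order in one token sequence."""
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical POS tags for one utterance (illustrative, not corpus data).
pos_tags = ["NOUN", "VERB", "ADP", "NOUN"]

# Unigram and bigram counts; the choice of orders here is an assumption.
features = ngram_counts(pos_tags, n_values=(1, 2))
```

Counts of this kind, computed per narrative, could then be arranged as feature vectors for a classifier such as Logistic Regression or a Support Vector Machine.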

The results supported both hypotheses. Computational features outperformed traditional LSA measures and achieved high sensitivity, specificity, and F1 scores at or above the clinical threshold in both studies. POS n-grams demonstrated the highest performance, capturing diagnostic patterns in part-of-speech distributions among monolingual and bilingual participants, while word n-grams identified lexical-semantic differences between groups that are not reflected in traditional clinical measures. These findings support the viability of ML-based approaches as efficient, narrative-based pre-screening tools for DLD in highly inflectional languages. Some of the best-performing features also showed potential as cross-content and cross-linguistic DLD markers. This study is the first to apply automated methods to Russian child language narratives and to compare bilingual TD and monolingual DLD groups, demonstrating the clinical utility of an ML-based approach to DLD identification in multilingual contexts.
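For readers unfamiliar with the reported metrics, the following minimal sketch shows how sensitivity, specificity, and F1 are derived from a binary confusion matrix, treating DLD as the positive class. The function name and the counts are made up for illustration and are not results from this study.

```python
def screening_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and F1 from confusion-matrix counts.

    tp: DLD narratives classified as DLD    fn: DLD narratives classified as TD
    tn: TD narratives classified as TD      fp: TD narratives classified as DLD
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class has no examples.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # share of DLD cases detected (recall)
    specificity = ratio(tn, tn + fp)   # share of TD cases correctly cleared
    precision = ratio(tp, tp + fp)     # share of DLD predictions that are correct
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = screening_metrics(tp=18, fn=2, tn=17, fp=3)
```

Sensitivity and specificity are reported separately because a screening tool can miss affected children or over-refer typically developing ones, and the two errors carry different clinical costs.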
