Dissertations, Theses, and Capstone Projects
Date of Degree
9-2022
Document Type
Dissertation
Degree Name
Ph.D.
Program
Psychology
Advisor
Virginia Valian
Advisor
Martin Chodorow
Committee Members
Kyle Gorman
Sandeep Prasada
Subject Categories
Cognitive Psychology | Computational Linguistics | Developmental Psychology | First and Second Language Acquisition
Keywords
language acquisition, corpus linguistics, computational models, utterance length, syntactic development
Abstract
How early do children produce multiword utterances? Do children's early utterances reflect abstract syntactic knowledge or are they the result of data-driven learning? We examine these questions through corpus analysis, computational modeling, and adult simulation experiments. Chapter 1 investigates when children start producing multiword utterances; we use corpora to establish the development of multiword utterances and a probabilistic computational model to account for the quantitative change in early multiword utterances. We find that multiword utterances of different lengths appear early in acquisition and increase together, and that the pattern of length growth can be viewed as a probabilistic and dynamic process.
Chapter 2 asks whether very early combinatorial speech reflects abstract syntactic knowledge or simply item-based learning driven by linguistic input. We use different language models (LMs) to track syntactic and lexical development separately. The results show that the syntactic structure behind children’s early combinatorial speech may exceed the development of word combinations acquired from the learning input. Chapter 3 investigates whether the ungrammatical utterances produced by children at an early age (such as 'key-open-door') have adult-like syntactic structure despite their incorrect word choices or missing words, or whether those sequences come from data-driven learning of words without syntactic knowledge. We ask a) adult native speakers, b) statistical LMs, and c) deep neural LMs to produce intelligible utterances from scrambled versions of children's multiword utterances (e.g., 'door-key-open'). We find that statistical LMs that rely on local statistical learning and are trained on child-directed speech can account for the production of those early multiword utterances. The predictive fit of a simple statistical model is as good as or even better than that of human subjects and of the neural model, which assumes more complex learning mechanisms and was trained on a larger dataset. Taken together, the three chapters provide a new, systematic account of when and how children's very early combinatorial speech develops.
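To make the unscrambling task of Chapter 3 concrete, the following is a minimal sketch of how a local statistical model could reorder a scrambled set of words. It is an illustration only, not the dissertation's implementation: it assumes an add-one-smoothed bigram model, and the toy corpus and the names `TOY_CORPUS`, `train_bigram`, `log_prob`, and `best_order` are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical toy stand-in for child-directed speech (not data from the dissertation).
TOY_CORPUS = [
    "open the door",
    "the key is here",
    "close the door",
    "where is the key",
    "open the box",
]

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


def train_bigram(sentences):
    """Count bigrams and their left contexts over boundary-padded sentences."""
    bigram_counts, context_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        vocab.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            bigram_counts[(prev, nxt)] += 1
            context_counts[prev] += 1
    return bigram_counts, context_counts, len(vocab)


def log_prob(words, model):
    """Log-probability of a word order under the add-one-smoothed bigram model."""
    bigram_counts, context_counts, vocab_size = model
    tokens = [BOS] + list(words) + [EOS]
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, nxt)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def best_order(words, model):
    """Return the permutation of `words` that the model scores highest."""
    return max(permutations(words), key=lambda order: log_prob(order, model))


model = train_bigram(TOY_CORPUS)
print(best_order(["door", "the", "open"], model))  # ('open', 'the', 'door')
```

Exhaustive search over permutations is only practical for short utterances, which suits the very early multiword utterances discussed here. The same function can be applied to a scrambled sequence such as 'door-key-open', although with so small a toy corpus the preferred order depends mostly on the smoothing.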
Recommended Citation
Xu, Qihui, "Linguistic Abstractions in Children’s Very Early Utterances" (2022). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/5061
Included in
Cognitive Psychology Commons, Computational Linguistics Commons, Developmental Psychology Commons, First and Second Language Acquisition Commons