Date of Degree

6-2026

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy

Program

Psychology

Advisor

Patricia J. Brooks

Committee Members

Wei Wang

Irina Sekerina

Martin Chodorow

Jay Verkuilen

Subject Categories

Cognitive Psychology | Cognitive Science | Developmental Psychology

Keywords

Lexical Networks, Mental Lexicon, Network Science

Abstract

Lexical networks are graphs in which words are treated as nodes and relations between words are treated as edges. The adoption of lexical networks in psycholinguistics is impeded by a lack of readily available inferential methods, which limits work to purely descriptive analyses with little ability to make inferences about which variables account for lexical network structure. The current research explores the use of latent space network models as a method for analyzing lexical networks. Latent space network models are latent variable regression models that model the presence or strength of edges by estimating pairwise distances between nodes in a latent space. This dissertation comprises four papers using latent space network models to examine lexical networks constructed from word-association data.
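As a rough illustration of the idea, the sketch below uses the common distance formulation of a latent space model for binary edges, in which the log-odds of an edge equal an intercept plus covariate effects minus the Euclidean distance between the two nodes' latent positions. The abstract does not state the exact link function or weighting used in the dissertation, so the function name, parameters, and values here are assumptions for illustration only.

```python
import math

def edge_probability(z_i, z_j, x_ij, beta, intercept):
    """Edge probability under a distance-based latent space model (illustrative).

    z_i, z_j  : latent positions of the two nodes (hypothetical values)
    x_ij      : pairwise covariates, e.g. a similarity score between two words
    beta      : regression coefficients for the covariates
    intercept : baseline log-odds of an edge
    """
    distance = math.dist(z_i, z_j)  # Euclidean distance in the latent space
    eta = intercept + sum(b * x for b, x in zip(beta, x_ij)) - distance
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit

# Nodes that are closer in the latent space are more likely to share an edge.
p_near = edge_probability((0.0, 0.0), (0.3, 0.4), [1.0], [2.0], 3.0)
p_far = edge_probability((0.0, 0.0), (3.0, 4.0), [1.0], [2.0], 3.0)
```

In a fitted model the latent positions and coefficients are estimated from the observed network; here they are simply supplied by hand.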

Chapter 2 presents an initial application of latent space network models examining lexical networks constructed from participants with below- and above-average vocabulary sizes (n = 22 per group). Participants completed a repeated word-association task, and overlapping responses were used to construct weighted cue-cue word association networks for each group. Using latent space network models, we examined the impact of distributional, taxonomic, and phonological similarity, as well as the absolute difference between words’ concreteness, age of acquisition, and frequency. We found effects of distributional and taxonomic similarity in both networks, with effects of phonological similarity and concreteness only in the below-average network. In terms of latent spaces, qualitatively, the above-average network appeared to have a more clustered space, suggestive of greater word differentiation. However, the model fit was lacking, suggesting that the “out-of-the-box” application of latent space network models to word-association networks was potentially problematic. As such, this paper is akin to a pilot study, assessing the appropriateness of using latent space network models with lexical networks.

Chapter 3 sought to improve model fit by exploring alternative network estimation methods in order to avoid the sparseness often associated with word-association networks. To do so, we introduce a method for inducing a cue-cue word-association subgraph from a larger cue-response word-association network. We suggest using the weighted geodesic distance between cues in the larger network, where weights correspond to the reciprocal of the number of times a response was given to a cue. This method finds the path between cues with the strongest word associations and sums their weights to produce a distance metric. The resulting graph is a fully connected, weighted cue-cue graph with normally distributed edge weights. Using the same dataset from chapter 2, we found that model performance improved dramatically, which allowed us to explore higher dimensional latent spaces. Here, we found that distributional similarity was significant in the above-average vocabulary group but not the below-average vocabulary group. We confirmed findings of additional clustering in the above-average network. However, we were still unable to make direct comparisons between networks.
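The distance described above can be sketched with a standard shortest-path search. This is a minimal illustration, not the dissertation's implementation: the toy counts are invented, the function names are hypothetical, and treating associations as undirected is an assumption that the abstract does not confirm.

```python
import heapq
from collections import defaultdict

def build_graph(counts):
    """Build an adjacency map from {(cue, response): count}.

    Each edge weight is the reciprocal of the response count, so stronger
    associations are shorter. Edges are treated as undirected (an assumption).
    """
    graph = defaultdict(dict)
    for (cue, response), n in counts.items():
        weight = 1.0 / n
        if weight < graph[cue].get(response, float("inf")):
            graph[cue][response] = weight
            graph[response][cue] = weight
    return graph

def geodesic(graph, source, target):
    """Weighted shortest-path distance between two words (Dijkstra's algorithm)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf")  # no path between the two words

# Invented toy data: three cues (dog, cat, cheese) and their responses.
counts = {
    ("dog", "cat"): 4, ("dog", "bone"): 2, ("cat", "mouse"): 2,
    ("cheese", "mouse"): 1, ("cheese", "bone"): 1,
}
graph = build_graph(counts)
cues = ["dog", "cat", "cheese"]
# Induced cue-cue distances: one value for every pair of cues.
cue_distances = {(a, b): geodesic(graph, a, b)
                 for i, a in enumerate(cues) for b in cues[i + 1:]}
```

Because every pair of cues that is connected through some chain of responses receives a finite distance, the induced cue-cue graph is dense even when the direct cue-cue overlaps are sparse.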

Chapter 4 compares two modeling strategies for analyzing the lexical networks of children (n = 21, ages 7 to 11 years) and young adults (n = 21). Using robust metrics of lexical features, we applied a traditional mixed-effects modeling approach and a novel multinetwork bipartite extension of the latent space model. The mixed-effects approach treated the relational features (distributional, taxonomic, and phonological similarities) between cues and responses, as well as the word-level features (concreteness, age of acquisition, and frequency) of the response, as separate outcome variables, each modeled at the individual level. The multinetwork bipartite latent space model operates directly on the cue-response adjacency matrices. It accommodates multiple networks by modeling each network’s latent space separately while fitting covariate effects across both networks, along with interaction terms to compare the influence of covariates across networks. The mixed-effects models revealed group effects for distributional, taxonomic, and phonological similarities, such that adults relied more on distributional and taxonomic similarity for their word associations, while children relied more on phonological similarity. We also found that these similarities decreased over time. For word-level features, we found that adults gave more concrete responses, and that responses became less concrete and tended to be acquired later in life across repetitions. The multinetwork bipartite model revealed effects of distributional similarity and phonological similarity, with adults relying more on co-occurrence statistics and children relying more on phonology. There were slight differences in network structure, with adults having a more spread-out space and slightly more clusters in their network than children.

Chapter 5 applies the same traditional and multinetwork bipartite analyses to the datasets described in chapters 2 and 3, with the model extended to capture heterogeneous degree distributions. Remarkably, we found very few differences between the below- and above-average vocabulary groups. The mixed-model analysis revealed patterns of decreasing similarities across the three repetitions, with no group differences. Regarding features of responses, we found that participants with larger vocabulary sizes tended to give words that were acquired at later ages and were more frequent in the language. We found stronger edge weights for responses that were distributionally similar to the cue, as well as for responses that were more frequent in the language. Additionally, we found that the strongest associations tended to be given in the first repetition, and that the above-average group tended to have more uniformly strong word associations across repetitions than the below-average group. We found no other differences in the features of word associations between groups. The largest differences between networks were in the structure of their latent spaces. The above-average network had many more clusters than the below-average network, confirming qualitative results reported in chapters 2 and 3.

The chapters presented in this dissertation are a collection of published and submitted papers that iterate on the investigation of word-association networks based on vocabulary knowledge, and the feasibility of applying latent space network models to such data. As such, there is considerable overlap between papers. In particular, chapters 4 and 5 have a large amount of overlap in their introductions.

Recommended Citation

Gravelle, Christopher D., "An Inferential Analysis of Lexical Networks" (2026). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/6625

Included in

Cognitive Psychology Commons, Cognitive Science Commons, Developmental Psychology Commons
