Date of Degree


Document Type


Degree Name



Educational Psychology


Jay Verkuilen

Committee Members

Howard Everson

Wei Wang

Subject Categories

Bilingual, Multilingual, and Multicultural Education | Educational Methods | Educational Psychology | Educational Technology | Science and Mathematics Education


Keywords

text mining, language of assessment, mathematics assessment, assessment equity, machine learning methods, parts of speech analysis


Abstract

The following is a five-chapter dissertation on the use of text mining techniques to better understand the language of mathematics items from standardized tests, with the goal of improving the linguistic equity of these items and supporting the assessment of English Language Learners (ELLs).

Introduction: The dissertation begins with an overview of the problem: English Language Learners are often unable to demonstrate their full mathematical ability because of the construct-irrelevant variance introduced when test items are written in English. The introduction also presents text mining as a methodology for exploring this test design issue.

Article 1: This article presents an exploratory study of the vocabulary used in released math test items for grades 3-8. The author collected and cleaned the data to arrive at a final corpus of 5674 math problems. A series of text mining techniques was then applied, including the “bag of words” approach, sentiment analysis, and Latent Dirichlet Allocation (LDA). The bag of words approach generated word lists for the entire corpus, by grade level, and by mathematical domain. In each of these lists, the majority of the words were polysemous (having multiple meanings), which can create unnecessary difficulty for ELLs. The sentiment analysis showed no obvious negative sentiment in these items. Finally, the LDA results identified nine latent topics within the language of these items.

Article 2: This article is an exploratory study of the parts of speech used in released standardized math test items for grades 3-8. The author collected and cleaned the data to arrive at a corpus of 5674 math problems. A series of parts-of-speech analyses was then performed to better understand the grammatical structures used in current mathematics items, along with a bigram and trigram analysis of the most common phrases found in these items. The variation in parts of speech and the readability of these items were tracked across grade levels and were found to become more complex as grade level increased. The parts of speech were also used to predict item difficulty for the items with available difficulty estimates (N = 1627), and some parts of speech were found to be negatively correlated with the item difficulty estimates.
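The bigram and trigram frequency idea can be sketched with only the standard library, as below; a full parts-of-speech analysis would additionally run a tagger (for example, NLTK's pos_tag) over each item. The items shown are illustrative placeholders, not the dissertation's corpus.

```python
# Minimal sketch: counting the most frequent two- and three-word
# phrases (bigrams and trigrams) across a set of illustrative items.
from collections import Counter

items = [
    "what is the value of x in the equation",
    "what is the area of the figure",
    "which of the following is the value of y",
]

def ngrams(tokens, n):
    """All consecutive n-token windows over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams, trigrams = Counter(), Counter()
for item in items:
    tokens = item.split()
    bigrams.update(ngrams(tokens, 2))
    trigrams.update(ngrams(tokens, 3))

print(bigrams.most_common(3))   # most frequent two-word phrases
print(trigrams.most_common(3))  # most frequent three-word phrases
```

Run over a real corpus, the highest-frequency n-grams surface the stock phrasing of test items ("which of the following", "what is the value"), which is what the article's phrase analysis examines.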

Article 3: This article describes the development of an open-source text parser for multiple-choice mathematics items intended for students in grades 3-8. To train this parser, seven machine learning classification algorithms were initially used to predict item difficulty as measured by p-value. The most accurate of these models was a Support Vector Machine used for classification (a Support Vector Classifier), which achieved almost 50% accuracy. The parser was trained to estimate approximate item difficulty, identify problematic vocabulary words, estimate the readability of the question, and flag problematic parts of speech used in the item. Math Item Parse is operational but remains a prototype, because a larger training set is needed to improve model accuracy.
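The classification step can be sketched as follows: item text is converted to features and a Support Vector Classifier predicts a binned difficulty label. This is a minimal illustration assuming scikit-learn; the items, labels, and binning are invented for the example and do not reflect the dissertation's training data or reported accuracy.

```python
# Minimal sketch: predicting a binned item-difficulty label from
# item text with a Support Vector Classifier (toy data only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

items = [
    "add the two numbers",
    "what is two plus three",
    "determine the quotient of the polynomial expressions",
    "evaluate the expression given the constraints below",
]
# Illustrative binned p-value labels, e.g. "easy" vs "hard".
labels = ["easy", "easy", "hard", "hard"]

# Vectorize the text, then fit a linear-kernel SVC on the labels.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(items, labels)

print(model.predict(["what is three plus four"]))
```

In practice the labels would come from binning the empirical p-values of the 1627 items with difficulty estimates, and accuracy would be evaluated on held-out items rather than the training set.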

Final Discussion: The dissertation concludes with a short discussion of how these findings affect educators, test developers, methodologists, and policy makers; it also addresses the main limitations of the dissertation and offers next steps.