Dissertations, Theses, and Capstone Projects

Propensity Score Analysis and Machine Learning: A Comparison and Application to Post-Secondary Education Data

Khudodod Khudododov, The Graduate Center, City University of New York

Date of Degree

6-2023

Document Type

Dissertation

Degree Name

Ph.D.

Program

Social Welfare

Advisor

Michael A. Lewis

Committee Members

Alexis Kuerbis

Harriet Goodman

Maria Rodriguez

Hal Salzman

Subject Categories

Quantitative, Qualitative, Comparative, and Historical Methodologies | Social Statistics | Social Work

Keywords

education research, machine-learning, causal inference, causal estimation, STEM education, STEM research

Abstract

Methods for estimating causal relationships between dependent and independent variables are fundamental to social science research. For social workers, these methods provide crucial knowledge about how different factors relate to a particular issue. Such knowledge helps social workers be more effective change agents at the micro, mezzo, and macro levels.

Causal estimation methods range from randomized controlled studies to methods designed for observational studies. In observational studies, which are the focus of this dissertation, participants self-select into the intervention or treatment. Self-selection makes causal estimation more challenging because the intervention and control groups then differ in both observed and unobserved characteristics. One dominant and well-known method for addressing this challenge is propensity score matching.
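
As an illustration of the matching idea, the sketch below pairs each treated unit with the untreated unit whose propensity score is closest. The abstract does not state which matching algorithm the dissertation used, so greedy 1:1 nearest-neighbor matching without replacement, the function name `greedy_match`, the caliper of 0.05, and the scores are all assumptions chosen for the example.

```python
def greedy_match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping a unit id to its propensity score.
    caliper: largest score gap accepted for a match (assumed value).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # control units not yet matched
    pairs = []
    # Visit treated units from highest to lowest score (one common ordering).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        # Closest remaining control unit on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
    return pairs


# Hypothetical propensity scores, not taken from the dissertation.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
control = {"c1": 0.78, "c2": 0.50, "c3": 0.10, "c4": 0.52}
pairs = greedy_match(treated, control)
print(pairs)
```

In this toy input, `t3` stays unmatched because no remaining control unit falls within the caliper, which is the usual trade-off between match quality and sample size.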

Logistic regression has traditionally been the primary approach to calculating propensity scores. However, other approaches, particularly those using machine learning models, are becoming more prominent. Although the application of machine learning to causal estimation is still in its nascent stage, several studies have applied it to simulated and actual data. Nevertheless, research in this area still needs to be expanded. This study follows a similar approach, comparing two machine learning models to the logistic regression model, and thus aims to add to this knowledge.
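
For reference, the propensity score and its traditional logistic regression form can be written as follows. The notation is an assumption for illustration (T is the binary intervention indicator, X the vector of observed covariates, and β the regression coefficients); it is not taken from the dissertation.

```latex
% Propensity score: probability of receiving the intervention given covariates
e(x) = \Pr(T = 1 \mid X = x)

% Logistic regression model for the propensity score
e(x) = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta^{\top} x)\bigr)}
```

Machine learning models such as a random forest or gradient boosted machine replace the second expression with a more flexible estimate of the same conditional probability.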

Using data from the National Center for Education Statistics (NCES) Baccalaureate and Beyond Longitudinal Study 2008/12, this research compared two machine learning models, namely the Random Forest (RF) and the Gradient Boosted Machine (GBM), with logistic regression. All three models were used to calculate the probabilities of assignment to intervention, also known as propensity scores. In this study, the intervention group consists of students graduating with Science, Technology, Engineering, and Mathematics (STEM) majors, and the comparison group consists of students graduating with non-STEM majors. Observed covariates included students' background characteristics, high school performance, scholastic scores, and early college performance. The three models were assessed on how well they predicted assignment to intervention and how much they reduced differences in observed characteristics between the intervention and comparison groups.
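
One common way to quantify differences in an observed characteristic between two groups is the standardized mean difference (SMD). The abstract does not name the balance metric used, so the sketch below is an assumption, and the covariate values are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated_values, control_values):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated_values) + variance(control_values)) / 2)
    if pooled_sd == 0:
        raise ValueError("covariate has no variation in either group")
    return (mean(treated_values) - mean(control_values)) / pooled_sd


# Hypothetical covariate values (for example, a test score) for each group.
stem = [3.0, 4.0, 5.0]
non_stem = [1.0, 2.0, 3.0]
smd = standardized_mean_difference(stem, non_stem)
print(round(smd, 3))
```

Computing this value for each covariate before and after matching shows whether a given propensity score model narrowed or widened the gap between groups.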

Results indicated that all three models did well in overall prediction accuracy. However, the logistic regression model had a lower sensitivity score than both machine learning models. Additionally, of the three models, the Random Forest model reduced differences in observed characteristics between the intervention and comparison groups the most, and the logistic regression model did better than the Gradient Boosted Machine (GBM). Furthermore, both machine learning models increased the differences in those observed characteristics that had not differed between the intervention and comparison groups.
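
The gap between overall accuracy and sensitivity can be seen in a small worked example. The labels below are invented for illustration (1 = intervention group, 0 = comparison group) and do not come from the study.

```python
def accuracy_and_sensitivity(actual, predicted):
    """Return (accuracy, sensitivity) for binary labels coded 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # share of actual positives that were found
    return accuracy, sensitivity


# Hypothetical data: 2 of 10 units are in the intervention group.
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
accuracy, sensitivity = accuracy_and_sensitivity(actual, predicted)
print(accuracy, sensitivity)
```

When the intervention group is a minority, a model can classify most units correctly while still missing a large share of intervention-group members, which is why the two measures are reported separately.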

Recommended Citation

Khudododov, Khudodod, "Propensity Score Analysis and Machine Learning: A Comparison and Application to Post-Secondary Education Data" (2023). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/5290

Included in

Quantitative, Qualitative, Comparative, and Historical Methodologies Commons, Social Statistics Commons, Social Work Commons
