Balancing Inference and Prediction in Institutional Research: A Practical Comparison of Logistic Regression With Machine Learning Techniques in Modeling Student Persistence
Higher education research has traditionally prioritized causal inference over predictive power, relying on statistical models designed to understand and explain the relationships between variables. In institutional research in particular, there is a growing need not only to understand these causal relationships but also to predict what is likely to occur in the future (Which students are most likely to succeed at our college, and whom should we admit? Which students are we most likely to lose to attrition, and how can we engage them? Which students are most likely to struggle academically, and what interventions can we provide?). While many institutional researchers are adept at statistical analysis, machine learning methods, widely touted as more nimble and powerful at making predictions, remain relatively untapped in the field.
Using a standard institutional research dataset from a large public urban university system, this study compared the efficacy of conventional logistic regression with that of several machine learning classification techniques in predicting postsecondary educational outcomes. The analysis found that theory-based logistic regression performed similarly overall to the machine learning methods, though the types of predictions made by each model varied. A discussion of the practical use of these methods by institutional researchers follows.
Weingarten, Alison, "Balancing Inference and Prediction in Institutional Research: A Practical Comparison of Logistic Regression With Machine Learning Techniques in Modeling Student Persistence" (2023). CUNY Academic Works.