Dissertations, Theses, and Capstone Projects

Date of Degree

2-2025

Document Type

Capstone Project

Degree Name

M.S.

Program

Data Analysis & Visualization

Advisor

Michelle McSweeney

Subject Categories

Adult and Continuing Education | Artificial Intelligence and Robotics | Categorical Data Analysis | Curriculum and Social Inquiry | Digital Humanities | Educational Assessment, Evaluation, and Research | Higher Education and Teaching | Inequality and Stratification | Language and Literacy Education | Language Interpretation and Translation | Other Social and Behavioral Sciences | Quantitative, Qualitative, Comparative, and Historical Methodologies | Race and Ethnicity

Keywords

Automated essay scoring, Human-AI comparative analysis, Bias in AI, Educational Equity, Language Proficiency

Abstract

This study evaluates the capabilities and limitations of large language models (LLMs), specifically OpenAI’s ChatGPT-4o, in grading essays written by students in the City University of New York’s Language Immersion Program. The program serves English language learners from diverse linguistic and demographic backgrounds, offering intensive language instruction to prepare them for academic success in college. Using a dataset of 30 pre- and post-program essays scored both by program instructors and by ChatGPT-4o under three paradigms, this research examines the alignment between human and AI-generated scores across five rubric-based competency areas. Findings show that ChatGPT-4o aligns moderately with human grading. Agreement is strongest on the essays’ critical response and organization, with significant discrepancies in word choice and grammar, where ChatGPT-4o frequently assigns lower scores than instructors. A further demographic analysis, though preliminary and directional only, indicates that Black-identifying students consistently receive lower scores than other groups, suggesting the presence of algorithmic biases that can perpetuate educational inequities. AI has unquestionably been powerful as a supplement to human efforts in educational assessment, but its limitations in interpreting nuance and its implications for equity raise critical concerns. The paper argues that AI tools should be viewed as complementary aids rather than as replacements for human graders, and it contributes to the growing discourse on the role of LLMs in educational settings.

beninbar-Capstone-main.zip (1638 kB)
Archived GitHub repo files
