Date of Award
Spring 5-2-2025
Document Type
Thesis
Degree Name
Master of Arts (MA)
Department
Computer Science
First Advisor
Dr. Anita Raja
Second Advisor
Dr. Ioannis Stamos
Third Advisor
Dr. Subash Shankar
Academic Program Adviser
Dr. Subash Shankar
Abstract
Missing data is pervasive in healthcare, where incomplete observations commonly arise from patient dropout, sensor failures, or privacy constraints. This research investigates how to handle such data, focusing on (1) Missingness-Aware Dynamic Ensemble Weighting (MDEW), (2) feature selection under varying missing rates, (3) autoencoder-based imputation (ODAE), and (4) a meta-feature analysis that guides pipeline selection. We evaluate our experiments on four diverse datasets: Cleveland Heart Disease, Diabetic Retinopathy, Breast Cancer Wisconsin, and EEG Eye State. Our results show that MDEW adaptively selects imputer-classifier pipelines and outperforms single-model and uniform-averaging baselines at moderate-to-high missingness (10% to 50%). Filter-based feature selection often improves performance at moderate missing rates, while the baseline can prevail under extreme data loss. ODAE shows promise in reconstructing data under severe missingness, though its benefits vary with dataset scale and missingness mechanism. Finally, meta-features help explain why particular pipelines excel, pointing toward automatic, dataset-aware recommendations. Overall, these findings support flexible, context-specific strategies for robustly handling missing data in complex healthcare settings.
Recommended Citation
Dominguez Sulca, Dylan, "Dynamic Approaches to Missing Data in Healthcare: Evaluating Ensemble Models, Feature Selection, and Meta-Features" (2025). CUNY Academic Works.
https://academicworks.cuny.edu/hc_sas_etds/1299
