Date of Award

Spring 5-2-2025

Document Type

Thesis

Degree Name

Master of Arts (MA)

Department

Computer Science

First Advisor

Dr. Anita Raja

Second Advisor

Dr. Ioannis Stamos

Third Advisor

Dr. Subash Shankar

Academic Program Adviser

Dr. Subash Shankar

Abstract

Missing data is pervasive in healthcare, where incomplete observations commonly arise from patient dropout, sensor failures, or privacy constraints. This research investigates how to handle such data, focusing on (1) Missingness-Aware Dynamic Ensemble Weighting (MDEW), (2) feature selection under varying missing rates, (3) autoencoder-based imputation (ODAE), and (4) a meta-feature analysis that guides pipeline selection. We evaluate these methods on four diverse datasets: Cleveland Heart Disease, Diabetic Retinopathy, Breast Cancer Wisconsin, and EEG Eye State. Our results show that MDEW adaptively selects imputer-classifier pipelines and outperforms single-model and uniform-averaging baselines at moderate-to-high missingness (10% to 50%). Filter-based feature selection often improves performance at moderate missing rates, while the baseline can prevail under extreme data loss. ODAE shows promise for reconstructing data under severe missingness, though its benefits vary with dataset scale and missingness mechanism. Finally, meta-features help explain why particular pipelines excel, pointing toward automatic, dataset-aware recommendations. Overall, these findings favor flexible, context-specific strategies for robustly handling missing data in complex healthcare settings.
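
The abstract does not spell out how MDEW assigns its weights. As a purely hypothetical illustration of what missingness-aware dynamic weighting can look like, the sketch below assumes each imputer-classifier pipeline has a validation-accuracy profile measured at a few missing rates. For a new sample, it interpolates each pipeline's expected accuracy at that sample's observed missing rate and turns those accuracies into softmax weights. The pipeline names, the toy accuracy numbers, the `temperature` parameter, and every function name here are illustrative assumptions, not the thesis's method or results.

```python
import math

# Hypothetical sketch of missingness-aware ensemble weighting.
# Pipeline names and accuracy values are toy numbers, NOT thesis results.
RATES = [0.1, 0.3, 0.5]  # missing rates at which validation accuracy was measured
profiles = {
    "mean+logreg": [0.85, 0.78, 0.70],
    "knn+rf": [0.83, 0.80, 0.76],
}


def missing_rate(row):
    """Fraction of entries in a sample that are missing (None)."""
    return sum(v is None for v in row) / len(row)


def interpolate(rates, scores, r):
    """Linearly interpolate a score at missing rate r; clamp outside the profiled range."""
    if r <= rates[0]:
        return scores[0]
    for i in range(1, len(rates)):
        if r <= rates[i]:
            t = (r - rates[i - 1]) / (rates[i] - rates[i - 1])
            return scores[i - 1] + t * (scores[i] - scores[i - 1])
    return scores[-1]


def missingness_aware_weights(profiles, rates, miss_rate, temperature=0.05):
    """Softmax over each pipeline's interpolated accuracy at this missing rate."""
    scores = {name: interpolate(rates, accs, miss_rate) for name, accs in profiles.items()}
    best = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp((s - best) / temperature) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def ensemble_proba(weights, probas):
    """Weighted average of each pipeline's positive-class probability."""
    return sum(weights[name] * probas[name] for name in weights)


if __name__ == "__main__":
    row = [1.2, None, 3.4, None, 0.5]  # 40% of features missing
    w = missingness_aware_weights(profiles, RATES, missing_rate(row))
    print(w)
    print(ensemble_proba(w, {"mean+logreg": 0.62, "knn+rf": 0.71}))
```

With these toy profiles, the weight shifts toward `knn+rf` as the missing rate rises and toward `mean+logreg` when little data is missing. As `temperature` grows, the weights approach a uniform average, which corresponds to the uniform-averaging baseline mentioned in the abstract.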

Available for download on Monday, March 23, 2026
