Date of Award

Spring 5-2-2025

Document Type

Thesis

Degree Name

Master of Arts (MA)

Department

Computer Science

First Advisor

Dr. Anita Raja

Second Advisor

Dr. Ioannis Stamos

Third Advisor

Dr. Subash Shankar

Academic Program Adviser

Dr. Subash Shankar

Abstract

Missing data is pervasive in healthcare, where incomplete observations commonly arise from patient dropout, sensor failures, or privacy constraints. This research investigates how to handle such data, focusing on (1) Missingness-Aware Dynamic Ensemble Weighting (MDEW), (2) feature selection under varying missing rates, (3) autoencoder-based imputation (ODAE), and (4) a meta-feature analysis guiding pipeline selection. We run our experiments on four diverse datasets: Cleveland Heart Disease, Diabetic Retinopathy, Breast Cancer Wisconsin, and EEG Eye State. Our results show that MDEW adaptively selects imputer-classifier pipelines, outperforming single-model and uniform-averaging baselines at moderate to high missingness (10% to 50%). Filter-based feature selection often enhances performance at moderate missing rates, while the baseline can prevail under extreme data loss. ODAE shows promise in reconstructing data under severe missingness, though its benefits vary with dataset scale and missingness mechanism. Finally, meta-features illuminate why particular pipelines excel, underscoring the path toward automatic, dataset-aware recommendations. Overall, these findings promote flexible, context-specific strategies to robustly handle missing data in complex healthcare settings.

