Dissertations, Theses, and Capstone Projects

Date of Degree

2-2024

Document Type

Capstone Project

Degree Name

M.S.

Program

Data Analysis & Visualization

Advisor

Howard T. Everson

Subject Categories

Cardiovascular Diseases | Data Science

Keywords

Heart Disease Analysis, Supervised Machine Learning, Unsupervised Machine Learning

Abstract

Heart disease, a leading cause of mortality worldwide, presents complex public health challenges because of its varied manifestations. Accurate diagnosis and patient stratification are essential for effective management and improved outcomes. In response, this study applied machine learning techniques to heart disease data obtained from the UCI Machine Learning Repository, aiming to enhance patient care through advanced data analysis.

The study began with K-Nearest Neighbors (KNN) classification, which categorized patients into 'Disease' and 'No Disease' groups. This preliminary step provided initial insights into the structure of the dataset. Subsequently, K-means clustering was applied in two rounds that used the same value of k but different initialization settings. Comparing the two rounds tested how strongly the initialization affected the patient groupings, and therefore how consistent and meaningful the clusters were.
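The two-round K-means comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the synthetic data, the two initialization schemes ('random' and 'farthest'), the seeds, and the helper names `kmeans` and `rand_index` are all hypothetical, since the abstract does not name the initialization settings that were used.

```python
import numpy as np

def kmeans(X, k, init, seed=0, n_iter=100):
    """Plain Lloyd's algorithm. `init` is 'random' or 'farthest' (illustrative choices)."""
    rng = np.random.default_rng(seed)
    if init == "random":
        # Pick k distinct data points as starting centers.
        centers = X[rng.choice(len(X), size=k, replace=False)]
    elif init == "farthest":
        # Start from one random point, then repeatedly add the point
        # farthest from the centers chosen so far.
        chosen = [X[rng.integers(len(X))]]
        while len(chosen) < k:
            d = np.min([np.linalg.norm(X - c, axis=1) for c in chosen], axis=0)
            chosen.append(X[np.argmax(d)])
        centers = np.array(chosen)
    else:
        raise ValueError(f"unknown init: {init}")

    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (label names ignored)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[upper] == same_b[upper]))

# Synthetic stand-in for the patient data (assumption: 3 numeric features).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, size=(50, 3)), rng.normal(8, 1, size=(50, 3))])

labels_a = kmeans(X, k=2, init="random", seed=1)    # round 1
labels_b = kmeans(X, k=2, init="farthest", seed=2)  # round 2: same k, different init
agreement = rand_index(labels_a, labels_b)
print(f"Pairwise agreement between rounds: {agreement:.3f}")
```

An agreement close to 1 indicates that both rounds found essentially the same patient groups, while a markedly lower value suggests that the groupings depend on where the algorithm starts.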

Principal Component Analysis (PCA) was used to visualize the clusters formed by K-means. The dataset was reduced to its first two principal components (PC1 and PC2), creating a two-dimensional representation that captures the majority of the data’s variance. This visualization assisted in interpreting the clustering results, offering a clear view of the patient groups in the reduced space.
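The projection onto PC1 and PC2 can be sketched with a singular value decomposition. The example below is an illustration only: the synthetic data, the helper name `pca_2d`, and the decision to standardize features before projecting are assumptions, not details taken from the study.

```python
import numpy as np

def pca_2d(X, standardize=True):
    """Project the rows of X onto the first two principal components.

    Returns the (n, 2) scores and the share of variance explained by PC1 and PC2.
    """
    Xc = X - X.mean(axis=0)            # center each feature
    if standardize:
        Xc = Xc / Xc.std(axis=0)       # assumption: features are on different scales
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T             # coordinates along PC1 and PC2
    explained = S**2 / np.sum(S**2)    # variance ratio per component
    return scores, explained[:2]

# Synthetic stand-in for the patient data: 5 features, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * rng.normal(size=100)

scores, ratio = pca_2d(X)
print("Variance explained by PC1 and PC2:", ratio, "total:", ratio.sum())
```

Summing the two returned ratios gives a direct check on how much of the variance the two-dimensional plot actually retains for a given dataset.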

Complementing K-means, hierarchical clustering was performed in two rounds, each using different linkage criteria, to understand the data's hierarchical structure and natural groupings.
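The effect of the linkage criterion can be illustrated with a naive agglomerative procedure. This sketch assumes single and complete linkage purely for illustration, because the abstract does not state which criteria the two rounds used; the synthetic data and the helper name `agglomerative` are likewise hypothetical.

```python
import numpy as np

def agglomerative(X, n_clusters, linkage):
    """Naive agglomerative clustering; `linkage` is 'single' or 'complete'."""
    # Single linkage uses the closest pair between clusters, complete the farthest.
    agg = {"single": np.min, "complete": np.max}[linkage]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    clusters = [[i] for i in range(len(X))]  # start with one cluster per point

    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        # Find the pair of clusters with the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = agg(D[np.ix_(clusters[a], clusters[b])])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]

    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Synthetic stand-in for the patient data: two well-separated groups.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, size=(20, 2)), rng.normal(10, 1, size=(20, 2))])

labels_single = agglomerative(X, n_clusters=2, linkage="single")
labels_complete = agglomerative(X, n_clusters=2, linkage="complete")

# Compare co-membership so that arbitrary label names do not matter.
same_partition = np.array_equal(
    labels_single[:, None] == labels_single[None, :],
    labels_complete[:, None] == labels_complete[None, :],
)
print("Both linkage criteria give the same two groups:", same_partition)
```

On clearly separated groups the two criteria tend to agree; on real patient data they may diverge, which is the kind of difference a two-round comparison would expose.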

By integrating KNN, K-means with PCA visualization, and hierarchical clustering, the study presented a comprehensive analysis of heart disease patient data. The conclusions drawn provide valuable insights into patient categorization, which are pivotal for targeted treatments, and contribute to the ongoing efforts to mitigate the global impact of heart disease.

HeartDiseaseAnalysis-GithubRepository.zip (13178 kB)
Export of GitHub repo at time of deposit
