Dissertations, Theses, and Capstone Projects
Date of Degree
2-2024
Document Type
Capstone Project
Degree Name
M.S.
Program
Data Analysis & Visualization
Advisor
Howard T. Everson
Subject Categories
Cardiovascular Diseases | Data Science
Keywords
Heart Disease Analysis, Supervised Machine Learning, Unsupervised Machine Learning
Abstract
Heart disease, a leading cause of mortality worldwide, presents complex public health challenges because of its varied manifestations. Accurate diagnosis and patient stratification are essential for effective management and improved outcomes. In response, this study applied machine learning techniques to heart disease data from the UCI Machine Learning Repository, aiming to enhance patient care through advanced data analysis.
The study began with K-Nearest Neighbors (KNN) classification, which assigned patients to 'Disease' and 'No Disease' groups and provided initial insight into the structure of the dataset. K-means clustering was then applied in two rounds, keeping the same k-value but varying the initialization parameter in the second round. Comparing the two rounds tested how strongly the initialization affected the patient groupings, and therefore how consistent and significant the clusters were.
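The two-round K-means comparison can be illustrated with a minimal sketch. It is not the study's code: the abstract does not say which initialization parameter was varied, so the sketch simply passes different starting centroids explicitly, uses made-up two-feature points rather than the UCI data, and measures agreement with a Rand index, which is an assumed choice of consistency measure. In practice a library such as scikit-learn would normally be used instead of this plain-Python version.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm started from explicitly supplied centroids."""
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree (1.0 = identical)."""
    n = len(labels_a)
    agree = 0
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            agree += same_a == same_b
            total += 1
    return agree / total


# Hypothetical toy data: two loose groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Same k (two centroids) in both rounds, different starting centroids.
round_one = kmeans(points, initial_centroids=[points[0], points[3]])
round_two = kmeans(points, initial_centroids=[points[0], points[1]])
agreement = rand_index(round_one, round_two)
print(round_one, round_two, agreement)
```

On this toy data both rounds converge to the same partition even though the second round starts with both centroids inside one group. A Rand index well below 1.0 would instead indicate that the groupings depend on the initialization.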
Principal Component Analysis (PCA) was used to visualize the clusters formed by K-means. The dataset was reduced to its first two principal components (PC1 and PC2), creating a two-dimensional representation that captures the majority of the data’s variance. This visualization aided interpretation of the clustering results by giving a clear view of the patient groups in the reduced space.
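The projection onto PC1 and PC2, and the share of variance those two components capture, can be sketched as follows. This is an illustrative plain-Python version under stated assumptions: the rows are invented three-feature values rather than the heart disease data, features are centered but not rescaled (the abstract does not say whether scaling was applied), and eigenvectors are found by power iteration with deflation rather than by a library routine.

```python
import math


def mean_center(rows):
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    n = len(centered)
    d = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and its unit eigenvector."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    value = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)
    )
    return value, vec


def first_two_components(rows):
    centered = mean_center(rows)
    cov = covariance_matrix(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components, ratios = [], []
    for _ in range(2):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        ratios.append(value / total_variance)
        # Deflation: subtract the component just found so the next pass
        # converges to the following principal direction.
        cov = [
            [cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
            for i in range(d)
        ]
    # Scores: coordinates of each centered row on PC1 and PC2.
    scores = [
        [sum(x * w for x, w in zip(row, comp)) for comp in components]
        for row in centered
    ]
    return scores, ratios, components


# Hypothetical toy rows with three features.
rows = [
    (2.0, 1.0, 0.5),
    (4.0, 2.1, 0.4),
    (6.0, 2.9, 0.6),
    (8.0, 4.2, 0.5),
    (10.0, 4.8, 0.5),
]
scores, ratios, components = first_two_components(rows)
print("explained variance ratios:", ratios)
print("PC1/PC2 coordinates:", scores)
```

The sum of the two ratios is the quantity behind a statement such as "captures the majority of the variance". The two-column `scores` list is what would be passed to a scatter plot, colored by K-means cluster label.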
Complementing K-means, hierarchical clustering was performed in two rounds, each with a different linkage criterion, to understand the data's hierarchical structure and natural groupings.
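How the choice of linkage can change the resulting groups is shown in the small sketch below. The abstract does not name the linkage criteria that were used, so single and complete linkage are assumed here purely for illustration, and the one-dimensional points are invented rather than taken from the dataset.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(points, left, right, linkage):
    """Distance between two clusters (lists of point indices)."""
    pairwise = [euclidean(points[i], points[j]) for i in left for j in right]
    if linkage == "single":
        return min(pairwise)  # closest pair across the two clusters
    if linkage == "complete":
        return max(pairwise)  # farthest pair across the two clusters
    if linkage == "average":
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, n_clusters, linkage):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(points, clusters[a], clusters[b], linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy points forming a chain with slowly growing gaps.
points = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (9.0,)]

single_result = agglomerative(points, n_clusters=3, linkage="single")
complete_result = agglomerative(points, n_clusters=3, linkage="complete")
print("single:  ", single_result)
print("complete:", complete_result)
```

Single linkage follows the chain and absorbs the first four points into one group, while complete linkage splits the same chain into more compact groups. Running both, as the study did with its two criteria, shows which groupings hold up regardless of the linkage.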
By integrating KNN, K-means with PCA visualization, and hierarchical clustering, the study presented a comprehensive analysis of heart disease patient data. The conclusions drawn provide valuable insights into patient categorization, which are pivotal for targeted treatments, and contribute to the ongoing efforts to mitigate the global impact of heart disease.
Recommended Citation
Cinar, Mukadder, "Clustering of Patients with Heart Disease" (2024). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/5688
Export of GitHub repo at time of deposit