Dissertations, Theses, and Capstone Projects
Date of Degree
6-2022
Document Type
Capstone Project
Degree Name
M.S.
Program
Data Analysis & Visualization
Advisor
Howard T. Everson
Subject Categories
Other Computer Engineering
Keywords
Supervised machine learning
Abstract
Type II diabetes is a disease that affects how the body regulates and uses sugar (glucose) as fuel. This chronic disease results in too much sugar circulating in the bloodstream, and high blood sugar levels can lead to disorders of the circulatory, nervous, and immune systems. Machine learning (ML) techniques have proven effective in diabetes diagnosis. In this paper, we aimed to contribute to the literature on ML methods by examining the value of four supervised machine learning algorithms (logistic regression, decision tree classifiers, random forest classifiers, and support vector classifiers) for identifying factors and indicators (such as pregnancy and blood pressure) that may lead to more accurate predictions and classifications of Type II diabetes in women. Identifying these indicators may help women take the necessary actions to prevent the onset of Type II diabetes. To apply these ML techniques, the Pima Indian Women Diabetes dataset was downloaded from the Kaggle website, and several experiments were conducted on it. Each machine learning algorithm was trained on unscaled data using both a balanced and an unbalanced version of the dataset, and again on scaled data using both a balanced and an unbalanced version. Consequently, sixteen models (four algorithms under four data conditions each) were generated to evaluate the performance of the different ML classifiers and select the best model. The results of these analyses are presented, and model-based findings are contrasted.
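The experimental design described in the abstract can be read as a full factorial grid of algorithm, scaling, and class balance. The short Python sketch below only enumerates that grid to show where the count of sixteen comes from. The algorithm names are taken from the abstract, while the variable names, the factor labels, and the ordering are illustrative assumptions rather than the author's notebook code. It does not fit any model, and it makes no assumption about how the data were scaled or balanced.

```python
from itertools import product

# Algorithm names come from the abstract. The factor labels below are
# illustrative placeholders, not identifiers from the author's notebook.
ALGORITHMS = [
    "logistic regression",
    "decision tree classifier",
    "random forest classifier",
    "support vector classifier",
]
SCALING = ["unscaled", "scaled"]
CLASS_BALANCE = ["unbalanced", "balanced"]


def experiment_grid():
    """Return one (algorithm, scaling, class balance) tuple per model."""
    return list(product(ALGORITHMS, SCALING, CLASS_BALANCE))


configs = experiment_grid()
for i, (algorithm, scaling, balance) in enumerate(configs, start=1):
    print(f"{i:2d}. {algorithm} | {scaling} | {balance}")

# 4 algorithms x 2 scaling options x 2 balance options = 16 models
print(f"total models: {len(configs)}")
```

Each tuple corresponds to one trained model, so comparing results across the grid separates the effect of the algorithm from the effects of scaling and class balancing.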
Recommended Citation
Benarbia, Meriem, "A Machine Learning Approach to Predicting the Onset of Type II Diabetes in a Sample of Pima Indian Women" (2022). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/4895
GitHub repository containing the Pima-Indians-diabetes Dataset and Jupyter notebook