Date of Degree

6-2022

Document Type

Capstone Project

Degree Name

M.S.

Program

Data Analysis & Visualization

Advisor

Howard T. Everson

Subject Categories

Other Computer Engineering

Keywords

Supervised machine learning

Abstract

Type II diabetes is a disease that affects how the body regulates and uses sugar (glucose) as a fuel. This chronic disease results in too much sugar circulating in the bloodstream. High blood sugar levels can lead to disorders of the circulatory, nervous, and immune systems. Machine learning (ML) techniques have proven their strength in diabetes diagnosis. In this paper, we aimed to contribute to the literature on the use of ML methods by examining the value of several supervised machine learning algorithms (logistic regression, decision tree classifiers, random forest classifiers, and support vector classifiers) for identifying factors and indicators (such as pregnancy, blood pressure, etc.) that may lead to more accurate predictions and classifications of Type II diabetes in women. By identifying these indicators, women will be able to take the necessary actions to prevent the onset of Type II diabetes. To apply these ML techniques, the Pima Indian Women Diabetes dataset was downloaded from the Kaggle website. Different experiments were conducted on the dataset. Each machine learning algorithm was trained on unscaled data using a balanced and an unbalanced dataset, and again on scaled data using a balanced and an unbalanced dataset. Consequently, sixteen models (four algorithms, each under four data conditions) were generated to evaluate the different ML classifiers' performance and select the best model. The results of these analyses are presented, and model-based findings are contrasted.
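
The experimental design described in the abstract can be sketched as a small grid: four classifier families crossed with two scaling settings and two class-balance settings. The snippet below is only an illustrative enumeration of that grid, not the project's notebook code; the names `ALGORITHMS`, `SCALING`, `BALANCE`, and `experiment_grid` are hypothetical, and the abstract does not state which scaling or balancing methods were used.

```python
from itertools import product

# Classifier families named in the abstract.
ALGORITHMS = [
    "logistic regression",
    "decision tree",
    "random forest",
    "support vector classifier",
]

# Two preprocessing choices, each with two settings (labels are illustrative).
SCALING = ["unscaled", "scaled"]
BALANCE = ["unbalanced", "balanced"]


def experiment_grid():
    """Return one (algorithm, scaling, balance) tuple per model configuration."""
    return list(product(ALGORITHMS, SCALING, BALANCE))


grid = experiment_grid()

# List every configuration that would be trained and evaluated.
for i, (algorithm, scaling, balance) in enumerate(grid, start=1):
    print(f"{i:2d}. {algorithm} | {scaling} | {balance}")
```

Enumerating the conditions this way makes the count explicit: 4 algorithms × 2 scaling settings × 2 balance settings gives the sixteen models mentioned above.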

Data-Analysis-Capstone-Spring-2022-main.zip (1010 kB)
GitHub repository containing the Pima-Indians-diabetes Dataset and Jupyter notebook
