<font size="8">**Final Report**

<font size="5">**Introduction**

**What is Diabetes?**

Diabetes is a metabolic condition in which the body is unable to regulate blood sugar levels effectively (American Diabetes Association, 2013). It is a common disease: the estimated lifetime risk of developing diabetes for people born in the US in 2000 is 32.8% for men and 38.5% for women (Gray et al., 2015). There are two main types of diabetes: Type I and Type II. Type I diabetes accounts for around 5-10% of people with diabetes. It occurs when the body does not *produce* insulin (a hormone that regulates blood sugar) and is therefore unable to regulate blood sugar levels. Type II diabetes accounts for around 90-95% of people diagnosed with diabetes. It occurs when the body either does not produce *enough* insulin or does not use it effectively (American Diabetes Association, 2013).

**Diagnosing Diabetes**

One standard criterion for diagnosing diabetes is a blood test showing a Hemoglobin A1c (HbA1c, the percentage of hemoglobin with glucose attached) level of ≥ 6.5% (American Diabetes Association, 2013; Patel et al., 2023). Studies of factors associated with diabetes also suggest that Body Mass Index (BMI) is associated with diabetes: even moderately higher BMIs are associated with an increased risk of developing the condition (Gray et al., 2015; Patel et al., 2023). Thus, for this project, we aim to answer the question: Can we predict a patient's diabetes diagnosis based on their blood glucose level (mg/dL) and BMI (kg/m²)?
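One way to turn the HbA1c threshold into an outcome variable for the prediction task is sketched below in base R. It assumes that the dataset's `glyhb` column holds HbA1c as a percentage; the helper name `label_diagnosis` and the label strings are hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not part of the original analysis): turn HbA1c values (%)
# into a two-level diagnosis label using the cut-off of 6.5%.
label_diagnosis <- function(hba1c, cutoff = 6.5) {
  # Values at or above the cut-off are labelled "diabetic"; NA inputs stay NA
  label <- ifelse(hba1c >= cutoff, "diabetic", "non-diabetic")
  factor(label, levels = c("non-diabetic", "diabetic"))
}

# 4.31, 4.44 and 7.72 appear in the glyhb column of the dataset preview;
# 6.5 is added to show that the boundary value counts as diabetic.
diagnosis <- label_diagnosis(c(4.31, 4.44, 7.72, 6.5))
print(diagnosis)
```

Using `>=` rather than `>` matches the "≥ 6.5%" wording, so a patient exactly at the cut-off is labelled diabetic.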
    
**Dataset & Question**

The dataset we will use for this project contains demographic and laboratory variables on African-American patients, including height, weight, gender, age, Hemoglobin A1c level and blood pressure. It was compiled by Mohamadreza Momeni for use in machine learning models for diabetes diagnosis (Momeni, 2023).
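The research question uses BMI in kg/m², but the dataset records height and weight rather than BMI. The preview of the data further down (for example, a height of 62 and a weight of 121) suggests these are in inches and pounds; that is an assumption, not something stated in this report. Under that assumption, BMI = weight (kg) / height (m)², which is roughly 703 × weight (lb) / height (in)². A minimal base R sketch, with a hypothetical helper name:

```r
# Hypothetical helper: convert weight (pounds) and height (inches) to BMI in kg/m^2.
# Assumption: the dataset's `weight` is in pounds and `height` is in inches.
bmi_from_imperial <- function(weight_lb, height_in) {
  weight_kg <- weight_lb * 0.45359237  # exact pound-to-kilogram factor
  height_m  <- height_in * 0.0254      # exact inch-to-metre factor
  weight_kg / height_m^2
}

# Heights and weights of the first two rows in the preview (ids 1000 and 1001)
bmi <- bmi_from_imperial(weight_lb = c(121, 218), height_in = c(62, 64))
print(round(bmi, 1))
```

The exact conversion factors are used here; the common 703 shortcut gives the same values to one decimal place for these rows.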

**Biases in the Diabetes Literature**

The motivation for using this dataset is to encourage equity in medical research by using data from a population that is underrepresented in diabetes research. A 2023 study found that the current literature on diabetes diagnosis is biased, because many diagnostic models are built on data collected largely from non-Hispanic Whites. This implies a dangerous overdiagnosis of diabetes among non-Hispanic Whites *and* an underdiagnosis of diabetes among non-Hispanic Blacks (Cronjé et al., 2023). We have therefore chosen this dataset, which consists of African-American participants, with the aim of reducing bias in diagnostic models and promoting equity in healthcare by contributing more diverse data to the diabetes literature.
    
<font size="5">**References**

American Diabetes Association. (2013). Diagnosis and Classification of Diabetes Mellitus. Diabetes Care, 37(Suppl. 1), S81–S90. https://doi.org/10.2337/dc14-S081

Cronjé, H. T., Katsiferis, A., Elsenburg, L. K., Andersen, T. O., Rod, N. H., & Varga, T. V. (2023). Assessing racial bias in type 2 diabetes risk prediction algorithms. PLOS Global Public Health, 3(5), e0001556. https://doi.org/10.1371/journal.pgph.0001556

Gray, N., Picone, G., Sloan, F., & Yashkin, A. (2015). The Relationship between BMI and Onset of Diabetes Mellitus and its Complications. Southern Medical Journal, 108(1), 29–36. https://doi.org/10.14423/SMJ.0000000000000214

Momeni, M. (2023). Diabetes (Version 1) [Data set]. Kaggle. Retrieved October 24, 2023, from https://www.kaggle.com/datasets/imtkaggleteam/diabetes

Patel, B. J., Mehta, D. N., Vaghani, A., & Patel, K. (2023). Correlation of Body Mass Index (BMI) with Saliva and Blood Glucose Levels in Diabetic and Non-Diabetic Patients. Journal of Pharmacy & Bioallied Sciences, 15(Suppl. 2), S1204–S1207. https://doi.org/10.4103/jpbs.jpbs_159_23


<font size="5">**Methods**
    
*Description of methods: to be written once the analysis code is complete.*

Run the following cell to load the necessary packages.

In [1]:
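# Load the packages used throughout the analysis
library(rvest)
library(tidyverse)
library(tidymodels)
# Install themis (class-imbalance recipe steps) only if it is not already available
if (!requireNamespace("themis", quietly = TRUE)) install.packages("themis")
library(themis)
# Fix the random seed so that random steps (e.g. data splitting) are reproducible
set.seed(0102)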

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.2     ✔ purrr   1.0.1
✔ tibble  3.2.1     ✔ dplyr   1.1.1
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()         masks stats::filter()
✖ readr::guess_encoding() masks rvest::guess_encoding()
✖ dplyr::lag()            masks stats::lag()
── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──

✔ broom       1.0.2     ✔ rsample     1.1.1
✔ dials       1.1.0     ✔ tune

In [2]:
#1: Loading Data from URL

URL <- 'https://raw.githubusercontent.com/wmma2/group_18_project/main/diabetes.csv'
diabetes_data <- read_csv(URL)

diabetes_data

Rows: 403 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): location, gender, frame
dbl (16): id, chol, stab.glu, hdl, ratio, glyhb, age, height, weight, bp.1s,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


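| id | chol | stab.glu | hdl | ratio | glyhb | location | age | gender | height | weight | frame | bp.1s | bp.1d | bp.2s | bp.2d | waist | hip | time.ppn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1000 | 203 | 82 | 56 | 3.6 | 4.31 | Buckingham | 46 | female | 62 | 121 | medium | 118 | 59 |  |  | 29 | 38 | 720 |
| 1001 | 165 | 97 | 24 | 6.9 | 4.44 | Buckingham | 29 | female | 64 | 218 | large | 112 | 68 |  |  | 46 | 48 | 360 |
| 1002 | 228 | 92 | 37 | 6.2 | 4.64 | Buckingham | 58 | female | 61 | 256 | large | 190 | 92 | 185 | 92 | 49 | 57 | 180 |
| 1003 | 78 | 93 | 12 | 6.5 | 4.63 | Buckingham | 67 | male | 67 | 119 | large | 110 | 50 |  |  | 33 | 38 | 480 |
| 1005 | 249 | 90 | 28 | 8.9 | 7.72 | Buckingham | 64 | male | 68 | 183 | medium | 138 | 80 |  |  | 44 | 41 | 300 |
| 1008 | 248 | 94 | 69 | 3.6 | 4.81 | Buckingham | 34 | male | 71 | 190 | large | 132 | 86 |  |  | 36 | 42 | 195 |
| 1011 | 195 | 92 | 41 | 4.8 | 4.84 | Buckingham | 30 | male | 69 | 191 | medium | 161 | 112 | 161 | 112 | 46 | 49 | 720 |
| 1015 | 227 | 75 | 44 | 5.2 | 3.94 | Buckingham | 37 | male | 59 | 170 | medium |  |  |  |  | 34 | 39 | 1020 |
| 1016 | 177 | 87 | 49 | 3.6 | 4.84 | Buckingham | 45 | male | 69 | 166 | large | 160 | 80 | 128 | 86 | 34 | 40 | 300 |
| 1022 | 263 | 89 | 40 | 6.6 | 5.78 | Buckingham | 55 | female | 63 | 202 | small | 108 | 72 |  |  | 45 | 50 | 240 |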
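The data preparation needed for the research question can be sketched on the ten rows shown in the preview above: compute BMI, label the diagnosis from HbA1c, and keep only glucose, BMI and the label. This sketch assumes that `stab.glu` is the blood glucose measure (mg/dL), that height and weight are in inches and pounds, and that the HbA1c cut-off of 6.5% defines the diagnosis; the object names `preview` and `model_data` are hypothetical. It uses base R so that it runs without the packages loaded above.

```r
# Ten rows copied from the preview above; in the notebook the same steps
# would run on the full `diabetes_data` tibble instead.
preview <- data.frame(
  id       = c(1000, 1001, 1002, 1003, 1005, 1008, 1011, 1015, 1016, 1022),
  stab.glu = c(82, 97, 92, 93, 90, 94, 92, 75, 87, 89),
  glyhb    = c(4.31, 4.44, 4.64, 4.63, 7.72, 4.81, 4.84, 3.94, 4.84, 5.78),
  height   = c(62, 64, 61, 67, 68, 71, 69, 59, 69, 63),
  weight   = c(121, 218, 256, 119, 183, 190, 191, 170, 166, 202)
)

# Assumption: height in inches, weight in pounds; 703 converts lb/in^2 to kg/m^2
preview$bmi <- 703 * preview$weight / preview$height^2

# Assumption: HbA1c >= 6.5% defines a diabetic diagnosis
preview$diagnosis <- factor(
  ifelse(preview$glyhb >= 6.5, "diabetic", "non-diabetic"),
  levels = c("non-diabetic", "diabetic")
)

# Keep only the two predictors and the outcome, dropping rows with missing values
model_data <- na.omit(preview[, c("stab.glu", "bmi", "diagnosis")])

# Count how many rows fall into each class
print(table(model_data$diagnosis))
```

In these ten rows only one patient (id 1005, HbA1c 7.72%) meets the threshold, so the classes are unbalanced; the full 403-row dataset may look different. If it is also imbalanced, the `themis` package loaded in the first cell provides recipe steps such as `step_upsample()` that could rebalance the training data; that use is an assumption here rather than something the report states.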
