# 

In [None]:
install.packages("tidymodels")

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)



In [None]:
library(tidyverse)
library(repr)
library(tidymodels)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.4.2     ✔ purrr   1.0.1
✔ tibble  3.2.1     ✔ dplyr   1.1.2
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.4     ✔ forcats 1.0.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

── Attaching packages ────────────────────────────────────── tidymodels 1.1.0 ──

✔ broom        1.0.4     ✔ rsample      1.1.1
✔ dials        1.2.0     ✔ tune         1.1.1
✔ infer        1.0.4     ✔ workflows    1.1.3
✔

##Introduction

Heart disease is typically diagnosed through an angiogram, an X-ray examination of the heart, major arteries and blood vessels. Even a simple routine angiogram costs between 675 CAD and 2200 CAD. What if there were an easier and less costly way to diagnose heart disease?

In this project we look at attributes that can be obtained from a simple routine checkup, such as blood pressure and cholesterol levels, along with age, to diagnose heart disease.

The dataset records 76 different attributes, 14 of which were used by researchers in their data analysis. These range from identifiers such as age and sex to maximum heart rate and fasting blood sugar levels. The researchers collected data from 4 different locations: Cleveland, Hungary, Switzerland, and the VA Long Beach. In our analysis we combined the Cleveland and Hungary datasets.


##Preliminary exploratory data analysis 

---



In [None]:
# Read the raw data; the file has no header row, so the columns arrive as X1..X14
heart_data <- read_delim('Data/processed.hungarian.data', delim = ',', col_names = FALSE)

# Assign names to the columns based on information from the website
colnames(heart_data) <- c('age', 'sex', 'chest_pain', 'trestbps', 'chol', 'fbs', 'restecg', 'max_hr',
                          'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num')

heart_data <- heart_data |>
  # Convert all categorical attributes into factors
  mutate(across(c(sex, chest_pain, num, fbs, restecg, exang, thal, slope, ca), as.factor)) |>
  # Remove rows with missing values ('?') in the cholesterol, resting blood pressure,
  # maximum heart rate and fasting blood sugar columns
  filter(chol != '?', trestbps != '?', max_hr != '?', fbs != '?') |>
  # Convert the numerical attributes that were read as text into doubles
  mutate(chol = as.double(chol), trestbps = as.double(trestbps), max_hr = as.double(max_hr))
heart_data

Rows: 294 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): X4, X5, X6, X7, X8, X9, X11, X12, X13
dbl (5): X1, X2, X3, X10, X14

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


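| age | sex | chest_pain | trestbps | chol | fbs | restecg | max_hr | exang | oldpeak | slope | ca | thal | num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<fct>` | `<fct>` | `<dbl>` | `<dbl>` | `<fct>` | `<fct>` | `<dbl>` | `<fct>` | `<dbl>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` |
| 28 | 1 | 2 | 130 | 132 | 0 | 2 | 185 | 0 | 0 | ? | ? | ? | 0 |
| 29 | 1 | 2 | 120 | 243 | 0 | 0 | 160 | 0 | 0 | ? | ? | ? | 0 |
| 30 | 0 | 1 | 170 | 237 | 0 | 1 | 170 | 0 | 0 | ? | ? | 6 | 0 |
| 31 | 0 | 2 | 100 | 219 | 0 | 1 | 150 | 0 | 0 | ? | ? | ? | 0 |
| 32 | 0 | 2 | 105 | 198 | 0 | 0 | 165 | 0 | 0 | ? | ? | ? | 0 |
| 32 | 1 | 2 | 110 | 225 | 0 | 0 | 184 | 0 | 0 | ? | ? | ? | 0 |
| 32 | 1 | 2 | 125 | 254 | 0 | 0 | 155 | 0 | 0 | ? | ? | ? | 0 |
| 33 | 1 | 3 | 120 | 298 | 0 | 0 | 185 | 0 | 0 | ? | ? | ? | 0 |
| 34 | 0 | 2 | 130 | 161 | 0 | 0 | 190 | 0 | 0 | ? | ? | ? | 0 |
| 34 | 1 | 2 | 150 | 214 | 0 | 1 | 168 | 0 | 0 | ? | ? | ? | 0 |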
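The `?` placeholders visible in the preview could also be handled at read time instead of being filtered out as text afterwards. The following is a minimal sketch of that alternative, not the approach used in this notebook: it assumes readr's `na` argument and tidyr's `drop_na()`, and it parses three rows copied from the preview as literal text so that it runs without the data file. The object names `raw_rows`, `preview` and `complete_rows` are made up for illustration.

```r
library(readr)
library(tidyr)

# Three rows copied from the preview above, joined into literal CSV text
raw_rows <- paste(
  "28,1,2,130,132,0,2,185,0,0,?,?,?,0",
  "29,1,2,120,243,0,0,160,0,0,?,?,?,0",
  "30,0,1,170,237,0,1,170,0,0,?,?,6,0",
  sep = "\n"
)

# na = "?" tells readr to treat the placeholder as a missing value,
# so numeric columns are parsed as numbers rather than as text
preview <- read_csv(I(raw_rows), col_names = FALSE, na = "?", show_col_types = FALSE)

# Drop rows that are missing any of the measurements filtered on above
# (X4 = trestbps, X5 = chol, X6 = fbs, X8 = max_hr)
complete_rows <- drop_na(preview, X4, X5, X6, X8)
complete_rows
```

With this approach the later `as.double()` conversions would not be needed, because the affected columns are already numeric when they are read.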


In [None]:
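heart_select <- select(heart_data, chol, fbs, num)
summary(heart_select)
# Cholesterol against fasting blood sugar, with point shape showing the diagnosis
heart_plot <- ggplot(heart_select, aes(x = chol, y = fbs, shape = num)) +
  geom_point() +
  labs(x = "Serum cholesterol (mg/dl)", y = "Fasting blood sugar > 120 mg/dl (1 = true)", shape = "Diagnosis (num)")
heart_plot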

In [None]:
split <- initial_split(heart_select, prop = 0.75, strata = num) # prop is the proportion of the data used for training
train <- training(split)
test <- testing(split)
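Because the split is stratified on `num`, the share of each diagnosis class should be roughly the same in the training and testing sets as in the full data. The sketch below shows one way to check this. It runs on a made-up stand-in data frame (`toy`, with an arbitrary 60/40 class balance) rather than on `heart_select`, and the seed value is arbitrary, chosen here only so the example is reproducible.

```r
library(rsample)

set.seed(2023)  # arbitrary seed, assumed here only for reproducibility

# Made-up stand-in for heart_select: 60 negative and 40 positive diagnoses
toy <- data.frame(
  chol = seq(150, 348, by = 2),
  num  = factor(rep(c(0, 1), times = c(60, 40)))
)

# Same call pattern as the split above
toy_split <- initial_split(toy, prop = 0.75, strata = num)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Class proportions in the full data and in each piece
prop_all   <- prop.table(table(toy$num))
prop_train <- prop.table(table(toy_train$num))
prop_test  <- prop.table(table(toy_test$num))
rbind(all = prop_all, train = prop_train, test = prop_test)
```

The same three `prop.table()` calls could be pointed at `heart_select`, `train` and `test` to check the real split.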


##Methods

The variables of importance for our analysis are blood pressure, cholesterol levels and age, which we consider as cheaper ways to diagnose heart disease.

We will use scatterplots to show the correlation between blood pressure (on the x axis) and age (on the y axis), with colour indicating the heart disease diagnosis.
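The scatterplot described above could look roughly like the following sketch. The helper `plot_bp_vs_age()` is a hypothetical name introduced for illustration, and the demo data are only the ten rows shown in the data preview (all of which have `num = 0`), so the result illustrates the layout rather than any finding.

```r
library(ggplot2)

# Age, resting blood pressure and diagnosis for the ten rows printed in the preview
preview <- data.frame(
  age      = c(28, 29, 30, 31, 32, 32, 32, 33, 34, 34),
  trestbps = c(130, 120, 170, 100, 105, 110, 125, 120, 130, 150),
  num      = factor(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
)

# Hypothetical helper: blood pressure on x, age on y, diagnosis as colour
plot_bp_vs_age <- function(data) {
  ggplot(data, aes(x = trestbps, y = age, colour = num)) +
    geom_point(alpha = 0.7) +
    labs(x = "Resting blood pressure (mm Hg)",
         y = "Age (years)",
         colour = "Diagnosis (num)")
}

bp_age_plot <- plot_bp_vs_age(preview)
bp_age_plot
```

Calling `plot_bp_vs_age(heart_data)` would draw the same plot for the full cleaned dataset.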

##Expected Outcomes and Significance

Based on some preliminary research, we expect to find a relationship between heart disease and blood pressure (high blood pressure indicates an increased risk of a heart disease diagnosis). Cholesterol and blood pressure are also linked, which suggests that cholesterol levels could be related to heart disease diagnosis. Age has been linked to all three: higher age comes with an increased risk of high blood pressure, high cholesterol and heart disease.
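One simple way to start examining these expected links is a pairwise correlation matrix of age, blood pressure and cholesterol. The sketch below uses only the ten rows shown in the data preview, which are far too few to support any conclusion; it is meant to show the shape of the calculation, and Pearson correlation is an assumption here rather than a method chosen in this proposal.

```r
# Age, resting blood pressure and cholesterol for the ten rows printed in the preview
preview <- data.frame(
  age      = c(28, 29, 30, 31, 32, 32, 32, 33, 34, 34),
  trestbps = c(130, 120, 170, 100, 105, 110, 125, 120, 130, 150),
  chol     = c(132, 243, 237, 219, 198, 225, 254, 298, 161, 214)
)

# Pairwise Pearson correlations between the three numeric variables
cor_matrix <- cor(preview, method = "pearson")
round(cor_matrix, 2)
```

The same call on the numeric columns of `heart_data` would give the values for the full cleaned dataset.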

The impact is clear: if we could diagnose heart disease using indicators other than an angiogram, the procedure would be less costly for physicians and patients, which in turn would increase access to heart disease diagnosis.

Future questions may relate to rethinking current standard diagnostic practices in medicine and exploring easier and less expensive options.