# Heart Disease Graph

Author: Baoye Liu

Course Project, UC Irvine, Math 10, Summer 2023

## Introduction


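This dataset comprises several medical predictor variables and a target variable, which indicates the presence of heart disease in the patient. The dataset can be used to develop a model that predicts the probability of a heart attack based on various parameters. In this project, the predictors are first explored with charts and then used to predict the maximum heart rate achieved (`thalach`) with linear regression and a random forest.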
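To make the predictor and target roles concrete, the columns can be laid out as a small glossary. This is only a sketch: the predictor names match those used in the code of this project, but the descriptions are assumptions based on the public UCI Heart Disease documentation, and the label column name `target` is also an assumption that should be checked against `heart.csv`.

```python
# Glossary of columns. Descriptions are assumptions taken from the public
# UCI Heart Disease documentation, not from this notebook.
COLUMN_DESCRIPTIONS = {
    "age": "age in years",
    "sex": "sex (1 = male, 0 = female)",
    "cp": "chest pain type",
    "trestbps": "resting blood pressure",
    "chol": "serum cholesterol",
    "fbs": "fasting blood sugar > 120 mg/dl (1 = true, 0 = false)",
    "restecg": "resting electrocardiographic results",
    "thalach": "maximum heart rate achieved",
    "exang": "exercise-induced angina (1 = yes, 0 = no)",
    "oldpeak": "ST depression induced by exercise relative to rest",
    "slope": "slope of the peak exercise ST segment",
    "ca": "number of major vessels colored by fluoroscopy",
    "thal": "thalassemia test result (categorical)",
    "target": "presence of heart disease (the dataset's label)",
}

# Assumed name of the label column; verify against the CSV header.
TARGET_COLUMN = "target"
PREDICTOR_COLUMNS = [c for c in COLUMN_DESCRIPTIONS if c != TARGET_COLUMN]

for name in PREDICTOR_COLUMNS:
    print(f"{name:>9}: {COLUMN_DESCRIPTIONS[name]}")
```

Note that the models in this project use `thalach` as the quantity to predict, rather than the dataset's own label.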


## Whole Code

import pandas as pd
import altair as alt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

heart_data = pd.read_csv('heart.csv')

columns_to_keep = [
    'age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal'
]
heart_data_filtered = heart_data[columns_to_keep]

even_data = heart_data_filtered.iloc[::2]
odd_data = heart_data_filtered.iloc[1::2]
heart_data_filtered = pd.concat([even_data, odd_data], axis=0).sort_index()

chart1 = alt.Chart(heart_data_filtered).mark_bar().encode(
    alt.X("thalach:Q", bin=True),
    y='count()',
).properties(title="Distribution of Maximum Heart Rate Achieved")

chart2 = alt.Chart(heart_data_filtered).mark_bar().encode(
    alt.X("fbs:O"),
    y='count()',
).properties(title="Distribution of Fasting Blood Sugar > 120 mg/dl")

X = heart_data_filtered.drop('thalach', axis=1)
y = heart_data_filtered['thalach']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error for 'maximum heart rate achieved' using Linear Regression: {mse:.2f}")

predictions_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
chart3 = alt.Chart(predictions_df).mark_point().encode(
    x='Actual',
    y='Predicted'
).properties(title="Linear Regression: Actual vs Predicted Maximum Heart Rate Achieved")

rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
rf_y_pred = rf_model.predict(X_test)

rf_mse = mean_squared_error(y_test, rf_y_pred)
print(f"Mean Squared Error for 'maximum heart rate achieved' using Random Forest: {rf_mse:.2f}")

rf_predictions_df = pd.DataFrame({'Actual': y_test, 'Predicted by RF': rf_y_pred})
chart4 = alt.Chart(rf_predictions_df).mark_point().encode(
    x='Actual',
    y='Predicted by RF'
).properties(title="Random Forest: Actual vs Predicted Maximum Heart Rate Achieved")

(chart1 & chart2) | (chart3 & chart4)

Mean Squared Error for 'maximum heart rate achieved' using Linear Regression: 342.67
Mean Squared Error for 'maximum heart rate achieved' using Random Forest: 59.53
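
Mean squared error is expressed in squared units of the target, which makes the two numbers above hard to read directly. Taking the square root gives the root mean squared error (RMSE), which is on the same scale as `thalach` itself. A minimal sketch using only the two printed values (the dictionary and variable names are illustrative, not part of the original code):

```python
import math

# MSE values copied from the printed output above.
mse_by_model = {
    "Linear Regression": 342.67,
    "Random Forest": 59.53,
}

# RMSE is the square root of MSE, so it is in the same units as the target.
rmse_by_model = {name: math.sqrt(mse) for name, mse in mse_by_model.items()}

for name, rmse in rmse_by_model.items():
    print(f"{name}: RMSE = {rmse:.2f}")
# Linear Regression: RMSE = 18.51
# Random Forest: RMSE = 7.72
```

Read this way, the linear model's typical error is about 18.5 and the random forest's about 7.7, in the units of `thalach`.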


## Summary


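This project explored how factors such as maximum heart rate achieved and fasting blood sugar are distributed in the heart disease dataset, using bar charts to uncover patterns. Two machine learning models, linear regression and a random forest, were then trained to predict maximum heart rate achieved from the remaining variables. On the held-out test set, the random forest reached a much lower mean squared error (59.53) than linear regression (342.67).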

## References


* What is the source of your dataset(s)?

Lapp, David. “Heart Disease Dataset.” Kaggle, 6 June 2019, www.kaggle.com/datasets/johnsmith88/heart-disease-dataset?resource=download. 

* List any other references that you found helpful.

Gurav, Suraj. “Python Pandas Tricks: 3 Best Methods to Join Datasets.” Medium, Towards Data Science,
    17 May 2022, towardsdatascience.com/python-pandas-tricks-3-best-methods4a909843f5bc#a00b. 

“sklearn.ensemble.RandomForestRegressor.” scikit-learn, scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html. Accessed 13 Sept. 2023.
