## Predicting Heart Disease Using Machine Learning

We aim to build a machine learning model that detects heart disease from patients' medical records.  
This notebook uses several Python libraries for data science and machine learning.

## Approach

1. Problem definition  
2. Data  
3. Evaluation  
4. Features  
5. Modeling  
6. Experimentation

## Problem definition

Given a patient's clinical records, can we predict whether they have heart disease?  
The machine learning problem is **supervised learning / binary classification**.

## Data

The data comes from the UCI Heart Disease dataset, which includes the Cleveland database, and is publicly available:  
[UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/45/heart+disease)  
[Kaggle](https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data)
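A minimal loading sketch, assuming the data has been downloaded as a CSV file whose columns match the data dictionary below. The file name `heart_disease_uci.csv` and the helper `load_heart_data` are hypothetical placeholders, not part of either source. The demo rows are made up purely to show the behavior.

```python
import io

import pandas as pd


def load_heart_data(source):
    """Read the heart disease CSV and count missing values per column.

    `source` may be a file path or any file-like object that
    pandas.read_csv accepts.
    """
    df = pd.read_csv(source)
    # Missing entries per column, worth checking before any modeling
    missing = df.isna().sum()
    return df, missing


# Real usage (hypothetical file name for the downloaded CSV):
# df, missing = load_heart_data("heart_disease_uci.csv")

# Demo on two made-up rows (not real patient records);
# the empty chol cell is read as NaN
demo_csv = io.StringIO("age,sex,chol,num\n50,1,200,0\n60,0,,1\n")
demo_df, demo_missing = load_heart_data(demo_csv)
print(demo_df.shape)  # (2, 4)
print(demo_missing["chol"])  # 1
```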

## Evaluation

If the proof-of-concept model reaches 95% accuracy at predicting whether a patient has heart disease, we will pursue the project further.
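As a sketch of how the 95% target could be checked: accuracy is the fraction of held-out predictions that match the true labels. The helper `meets_accuracy_target` and the toy label lists are illustrative assumptions, not results from this project.

```python
from sklearn.metrics import accuracy_score

TARGET_ACCURACY = 0.95  # proof-of-concept threshold from the Evaluation section


def meets_accuracy_target(y_true, y_pred, target=TARGET_ACCURACY):
    """Return (accuracy, passed) for predictions on a held-out set."""
    # accuracy = correct predictions / total predictions
    acc = accuracy_score(y_true, y_pred)
    return acc, acc >= target


# Toy labels (1 = heart disease, 0 = none), invented for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
acc, passed = meets_accuracy_target(y_true, y_pred)
print(acc, passed)  # 0.9 False
```

Accuracy alone can hide poor recall on the disease class, which is one reason to also look at precision, recall, and F1.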

## Features

**Data Dictionary (information about each data feature)**  
* id (unique id for each patient)  
* age (age of patient in years)  
* origin (place of study)  
* sex (1=male, 0=female)  
* cp (chest pain type: 1=typical angina, 2=atypical angina, 3=non-anginal, 4=asymptomatic)  
* trestbps (resting blood pressure in mmHg on admission to hospital)  
* chol (serum cholesterol in mg/dl)  
* fbs (whether fasting blood sugar > 120 mg/dl, 1=true, 0=false)  
* restecg (resting electrocardiographic results: 0=normal, 1=ST/T abnormality, 2=left ventricular hypertrophy)  
* thalach (maximum heart rate achieved)  
* exang (exercise-induced angina: 1=yes, 0=no)  
* oldpeak (ST depression induced by exercise relative to rest)  
* slope (slope of the peak exercise ST segment: 1=upsloping, 2=flat, 3=downsloping)  
* ca (number of major vessels, 0-3, colored by fluoroscopy)  
* thal (3=normal, 6=fixed defect, 7=reversible defect)  
* num (the predicted attribute: 0=no heart disease, 1-4=heart disease; values 1-4 are grouped as 1 for binary classification)
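In the UCI release, `num` ranges from 0 (absence) to 4, and UCI notes that experiments with the Cleveland data have concentrated on distinguishing presence (1-4) from absence (0). A minimal sketch of that binarization, assuming a DataFrame with the column names from the dictionary above; the helper `split_features_target` and the demo rows are hypothetical.

```python
import pandas as pd


def split_features_target(df, target_col="num", drop_cols=("id",)):
    """Split a heart disease DataFrame into features X and a binary target y.

    Any nonzero `num` value (1-4 in the UCI data) is mapped to 1 = heart disease.
    Identifier columns listed in `drop_cols` are removed from the features.
    """
    y = (df[target_col] > 0).astype(int)
    # Drop the target plus identifier columns that carry no clinical signal
    to_drop = [target_col] + [c for c in drop_cols if c in df.columns]
    X = df.drop(columns=to_drop)
    return X, y


# Made-up rows for illustration only (not real patient records)
demo = pd.DataFrame({
    "id": [1, 2, 3],
    "age": [50, 61, 45],
    "cp": [4, 2, 3],
    "num": [0, 3, 1],
})
X, y = split_features_target(demo)
print(list(X.columns))  # ['age', 'cp']
print(y.tolist())  # [0, 1, 1]
```

Whether `origin` should also be dropped is a modeling choice; it can be added to `drop_cols`.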

## Preparing the Tools

We use NumPy, pandas, Matplotlib, and Seaborn for data analysis, manipulation, and visualization.  
We use scikit-learn for machine learning.

```python
# Import exploratory data analysis (EDA) tools
import numpy
import pandas
import seaborn
from matplotlib import pyplot

# Render plots inside this notebook
%matplotlib inline

# Import sklearn model selection tools
from sklearn.model_selection import (
    train_test_split,
    cross_val_score,
    RandomizedSearchCV,
    GridSearchCV,
)

# Import sklearn machine learning algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Import sklearn model evaluation tools
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.metrics import RocCurveDisplay
```
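As a hedged sketch of how the imported tools might fit together in the Modeling and Experimentation steps, the block below compares the three imported classifiers with 5-fold cross-validation and a held-out test split. It runs on synthetic data from `make_classification` (an assumption, used only so the example is self-contained), so its scores say nothing about the heart disease data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in with 13 features, matching the number of clinical
# features in the data dictionary; swap in the real X, y once loaded.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# Hold out 20% of the rows for a final test-set check
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy on the training portion
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    # Fit on the full training portion, then score on the held-out test set
    model.fit(X_train, y_train)
    results[name] = {
        "cv_mean": cv_scores.mean(),
        "test_accuracy": model.score(X_test, y_test),
    }

for name, scores in results.items():
    print(f"{name}: cv={scores['cv_mean']:.3f}, test={scores['test_accuracy']:.3f}")
```

KNN is distance-based, so on the real features (which use different units, such as mmHg and mg/dl) scaling them first, for example with `StandardScaler` in a `Pipeline`, is likely to matter.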