# Heart Attack Prediction

This notebook trains a Logistic Regression model to predict the likelihood of a heart attack. The training dataset is available on Kaggle: https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset?select=heart.csv

### Import libraries

In this section, we import the libraries needed to preprocess the dataset and to train and deploy the model.

In [60]:
import sagemaker
from sagemaker.sklearn.model import SKLearnModel
from sagemaker import get_execution_role
import numpy as np

### Preprocessing of data

In this section, we load the data from the file "heart.csv" into a NumPy array and split it into a training set (the first 80% of rows) and a testing set (the remaining 20%). We then separate the features and labels of each set.

In [49]:
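# Load the CSV into a 2-D float array, skipping the header row.
rawdata = np.genfromtxt("heart.csv", delimiter=',', skip_header=1)
# Use the first 80% of rows for training and the remaining 20% for testing.
split = int(len(rawdata) * 0.8)
train = rawdata[:split]
test = rawdata[split:]
# The last column is the label; all other columns are features.
Xtr = train[:, :-1]
Ytr = train[:, -1]
Xts = test[:, :-1]
Yts = test[:, -1]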
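
The split above takes rows in file order. If the rows of "heart.csv" turn out to be grouped by label (worth checking before training), the last 20% could contain mostly one class. A minimal sketch of a shuffled split follows; the helper name `shuffled_split`, the seed, and the synthetic `demo` array are illustrative assumptions and not part of the original notebook.

```python
import numpy as np

def shuffled_split(data, train_fraction=0.8, seed=0):
    """Shuffle the rows reproducibly, then split into train and test parts."""
    rng = np.random.default_rng(seed)             # fixed seed for repeatable splits
    shuffled = data[rng.permutation(len(data))]   # reorder rows, keep columns intact
    split = int(len(shuffled) * train_fraction)
    return shuffled[:split], shuffled[split:]

# Synthetic stand-in for rawdata (hypothetical): 10 rows, 3 feature columns
# plus a label column whose values are deliberately sorted.
demo = np.column_stack([np.arange(30).reshape(10, 3),
                        np.repeat([1, 0], 5)]).astype(float)
demo_train, demo_test = shuffled_split(demo)
print(demo_train.shape, demo_test.shape)
```

In the notebook, the same helper could be applied as `train, test = shuffled_split(rawdata)` before extracting features and labels.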

### Training

In this section, we set up SageMaker's Logistic Regression model (the built-in Linear Learner algorithm) on an 'ml.m5.xlarge' instance, convert the data to the "RecordIO Protobuf" format that SageMaker's built-in algorithms expect, and train the model on it.

In [None]:
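# Configure Linear Learner as a binary classifier so that it fits a
# logistic-regression-style model to the 0/1 labels instead of a regressor.
linear = sagemaker.LinearLearner(role=get_execution_role(),
                                 instance_count=1,
                                 instance_type='ml.m5.xlarge',
                                 predictor_type='binary_classifier',
                                 sagemaker_session=sagemaker.Session())
# record_set serializes the float32 arrays to RecordIO Protobuf and uploads them to S3.
train_data_records = linear.record_set(Xtr.astype(np.float32), labels=Ytr.astype(np.float32), channel='train')
linear.fit(train_data_records)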
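
As a rough picture of what a logistic regression model computes at prediction time, the sketch below applies a sigmoid to a linear score and thresholds the result. It is not SageMaker's internal implementation; the function names, the 0.5 threshold, and the weights `w_demo` and `b_demo` are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Probability of the positive class for each row of X."""
    return sigmoid(X @ w + b)

def predict_label(X, w, b, threshold=0.5):
    """Label a row 1.0 when its probability reaches the threshold, else 0.0."""
    return (predict_proba(X, w, b) >= threshold).astype(np.float32)

# Hypothetical weights and inputs, chosen only to show the mechanics.
w_demo = np.array([2.0, -1.0])
b_demo = 0.0
X_demo = np.array([[0.0, 0.0], [3.0, 1.0], [-3.0, 1.0]])
proba_demo = predict_proba(X_demo, w_demo, b_demo)
labels_demo = predict_label(X_demo, w_demo, b_demo)
print(proba_demo, labels_demo)
```

Training amounts to choosing the weights and bias so that these probabilities match the observed labels as closely as possible.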

### Endpoint

In this section, we create an endpoint on an 'ml.t2.medium' instance to host the trained model and serve predictions.

In [None]:
predictor = linear.deploy(initial_instance_count=1, instance_type='ml.t2.medium')
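
Once the endpoint is up, the held-out test set can be used to check the model. The sketch below computes accuracy and confusion-matrix counts from true and predicted labels. The helper `binary_metrics` and the demo labels are hypothetical. The commented lines show how predictions might be read from the endpoint, assuming the model was trained as a binary classifier so that each returned record carries a `predicted_label` field.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Return accuracy and confusion-matrix counts for 0/1 labels."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # correctly predicted positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # correctly predicted negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false alarms
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # missed positives
    return {"accuracy": (tp + tn) / len(y_true),
            "tp": tp, "tn": tn, "fp": fp, "fn": fn}

# In the notebook, y_pred could come from the endpoint (assumed response format):
#   result = predictor.predict(Xts.astype(np.float32))
#   y_pred = [r.label["predicted_label"].float32_tensor.values[0] for r in result]
#   print(binary_metrics(Yts, y_pred))

# Hypothetical labels, used here only so the sketch runs without an endpoint.
demo_true = [1, 0, 1, 1, 0]
demo_pred = [1, 0, 0, 1, 1]
demo_metrics = binary_metrics(demo_true, demo_pred)
print(demo_metrics)
```

When the endpoint is no longer needed, `predictor.delete_endpoint()` removes it so that it stops incurring charges.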