# ML for Bioinformatics
## HW6 - Long short-term memory

---

Name: 

Student No.:

---

# ECG Heartbeat Classification

In this exercise you will implement an LSTM neural network that classifies individual heartbeats from ambulatory ECG recordings into 5 classes.

# Collect Data
We use a preprocessed version of a well-known heartbeat classification dataset, [the MIT-BIH Arrhythmia Dataset](https://physionet.org/content/mitdb/1.0.0/).
The signals in this dataset are electrocardiogram (ECG) heartbeat shapes for the normal case and for cases affected by different arrhythmias and myocardial infarction. We use the arrhythmia dataset from [this paper](https://arxiv.org/abs/1805.00794), in which the signals have already been preprocessed and segmented so that each segment corresponds to one heartbeat. It contains 109446 samples divided into 5 categories. Download the dataset from [here](https://drive.google.com/file/d/1a8IetOZkvnq8D8K6k8EdMsFGeqE2Hv5M/view?usp=sharing). It is also available on [Kaggle](https://www.kaggle.com/shayanfazeli/heartbeat).
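A minimal sketch for loading one of the CSV files and checking how the 5 classes are distributed. It assumes (this is not stated above) that each file has no header row and stores one heartbeat per row, with the integer class label in the last column. The helper name `load_mitbih` is hypothetical.

```python
import pandas as pd


def load_mitbih(csv_path):
    """Split a MIT-BIH CSV into a feature matrix and a label vector.

    Assumption: no header row, one heartbeat per row, label in the last column.
    """
    df = pd.read_csv(csv_path, header=None)
    signals = df.iloc[:, :-1].to_numpy(dtype="float32")  # one heartbeat per row
    labels = df.iloc[:, -1].to_numpy().astype("int64")   # class index per heartbeat
    return signals, labels


# Usage (assuming the files are in ./dataset):
# signals, labels = load_mitbih("dataset/mitbih_train.csv")
# print(signals.shape)
# print(pd.Series(labels).value_counts().sort_index())  # samples per class
```

Looking at the class counts before training is worthwhile, since a heavily skewed distribution makes overall accuracy a misleading metric.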

The training and test data are in the files `mitbih_train.csv` and `mitbih_test.csv`. Place these files in a `dataset` folder next to this notebook. If you are running this notebook on Colab, the following cell copies the dataset folder from your Google Drive (`MLB_RNN_Assignment/dataset`) and unzips the archive for you.

In [None]:
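from google.colab import drive

# Mount your Google Drive so the notebook can read files stored there.
drive.mount('/content/drive', force_remount=True)

# Path of the dataset folder relative to "My Drive"; change it if you stored the data elsewhere.
FOLDERNAME = 'MLB_RNN_Assignment/dataset'

assert FOLDERNAME is not None, "[!] Enter the foldername."

# Copy the dataset folder from Drive into /content, then unzip the archive there.
%cd drive/My\ Drive
%cp -r $FOLDERNAME ../../
%cd ../../
%cd dataset
!unzip 29414_37484_bundle_archive.zip
!rm 29414_37484_bundle_archive.zip
%cd ..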

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.utils import class_weight
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# Create Custom Dataset
You need to create a custom dataset so that you can define data loaders for the training and test phases. Your class must implement the `__init__`, `__getitem__` and `__len__` methods.

In [None]:
class HeartbeatDataset(Dataset):
    
    def __init__(self, csv_file):
        #TODO
        pass

    def __getitem__(self, index):
        #TODO
        pass

    def __len__(self):
        #TODO
        pass

batch_size = 64

train_set = HeartbeatDataset('dataset/mitbih_train.csv')
train_loader = DataLoader(dataset=train_set,
                          batch_size=batch_size,
                          shuffle=True)                          

test_set = HeartbeatDataset('dataset/mitbih_test.csv')
test_loader = DataLoader(dataset=test_set,
                         batch_size=batch_size,
                         shuffle=False)  # no need to shuffle for evaluation
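One possible way to fill in the TODOs of `HeartbeatDataset`, sketched under the assumption that each CSV has no header row and keeps the label in its last column. Each signal is reshaped to `(seq_len, 1)` so the LSTM receives one value per time step, and labels are stored as `torch.long` because `nn.CrossEntropyLoss` expects integer class indices. Re-run the cell that builds `train_loader` and `test_loader` after defining it.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class HeartbeatDataset(Dataset):
    """Sketch: assumes no header row and the class label in the last column."""

    def __init__(self, csv_file):
        df = pd.read_csv(csv_file, header=None)
        # Shape (num_beats, seq_len, 1): one scalar signal value per LSTM time step.
        self.signals = torch.tensor(
            df.iloc[:, :-1].to_numpy(), dtype=torch.float32
        ).unsqueeze(-1)
        # Integer class indices, as required by nn.CrossEntropyLoss.
        self.labels = torch.tensor(df.iloc[:, -1].to_numpy(), dtype=torch.long)

    def __getitem__(self, index):
        return self.signals[index], self.labels[index]

    def __len__(self):
        return len(self.labels)
```

Loading the whole file into tensors once in `__init__` keeps `__getitem__` cheap; this is practical here because each heartbeat is a short, fixed-length row.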

# Create LSTM Classifier
Now you are ready to implement your LSTM network, which inherits from `nn.Module`. The structure of the network must be as follows:

LSTMClassifier( <br>
  (lstm): LSTM(1, hidden_dim) <br>
  (fc1): Linear(in_features=hidden_dim, out_features=fc_dim, bias=True) <br>
  (fc2): Linear(in_features=fc_dim, out_features=5, bias=True) <br>
  (softmax): Softmax(dim=1) <br>
)

At each time step, one value of the signal is fed to the LSTM as input. The hidden and cell states are updated at every step, and the final output is computed by passing the final hidden state through the two fully connected layers, followed by a softmax layer.
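To make "final hidden state" concrete, the sketch below feeds a random batch through `nn.LSTM` with `batch_first=True` and prints the shapes it returns. The sequence length of 187 and the hidden size of 64 are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# batch_first=True makes the LSTM expect input of shape (batch, seq_len, input_dim).
lstm = nn.LSTM(input_size=1, hidden_size=64, batch_first=True)

# A dummy batch: 8 heartbeats, 187 time steps each, one signal value per step.
x = torch.randn(8, 187, 1)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # hidden state at every step: torch.Size([8, 187, 64])
print(h_n.shape)     # final hidden state per layer: torch.Size([1, 8, 64])
print(c_n.shape)     # final cell state per layer: torch.Size([1, 8, 64])

# For a single-layer, unidirectional LSTM the last step of `output`
# equals the final hidden state of the last layer.
print(torch.allclose(output[:, -1, :], h_n[-1]))  # True
```

Either `h_n[-1]` or `output[:, -1, :]` can therefore be passed to the first fully connected layer.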



In [None]:
class LSTMClassifier(nn.Module):

    def __init__(self, input_dim, hidden_dim, fc_dim, target_dim):
        #TODO
        pass

    def forward(self, x):
        #TODO
        pass
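A sketch of one way to complete `LSTMClassifier` following the structure above: run the LSTM over the whole sequence, take the last layer's final hidden state, and pass it through the two linear layers and the softmax. The ReLU between the linear layers is an assumption that the required structure does not mention (without some non-linearity, two stacked linear layers reduce to a single linear map).

```python
import torch
import torch.nn as nn


class LSTMClassifier(nn.Module):
    """Sketch of the LSTM -> Linear -> Linear -> Softmax structure described above."""

    def __init__(self, input_dim, hidden_dim, fc_dim, target_dim):
        super().__init__()
        # batch_first=True: inputs are (batch, seq_len, input_dim).
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc1 = nn.Linear(hidden_dim, fc_dim)
        self.fc2 = nn.Linear(fc_dim, target_dim)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        # h_n has shape (num_layers, batch, hidden_dim); keep the last layer.
        _, (h_n, _) = self.lstm(x)
        out = self.fc1(h_n[-1])
        out = torch.relu(out)  # assumed activation, not part of the required structure
        out = self.fc2(out)
        return self.softmax(out)  # class probabilities, each row sums to 1
```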

# Define loss function and optimizer
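Next, instantiate a classifier, a loss function and an optimizer. You are free to change the loss function, the optimizer and their parameters, as well as the dimensions of the hidden state and of the fully connected layer of the classifier. Note that `nn.CrossEntropyLoss` applies log-softmax internally and expects raw, unnormalized scores (logits); feeding it the output of a softmax layer applies the softmax twice, which can weaken the gradients, so consider returning the pre-softmax scores when training with this loss.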
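Because normal beats far outnumber the arrhythmia classes in this dataset, one option (not required by the assignment) is to weight the loss per class using `sklearn.utils.class_weight`, which the notebook already imports. The label vector below is made up for illustration; in practice you would pass the training labels.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.utils import class_weight

# Hypothetical labels standing in for the real training labels.
y_train = np.array([0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 4, 4])

# 'balanced' weights are n_samples / (n_classes * count_per_class),
# so rare classes get larger weights than frequent ones.
weights = class_weight.compute_class_weight(
    class_weight="balanced", classes=np.unique(y_train), y=y_train
)
criterion = nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))
print(weights)
```

With these toy labels, class 0 (6 samples) gets weight 0.4 while classes 1 and 3 (1 sample each) get 2.4, so mistakes on the rare classes cost more.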

In [None]:
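lstm_classifier = LSTMClassifier(1, 64, 32, 5)
criterion = nn.CrossEntropyLoss()
# Adam has no `momentum` argument (passing one raises a TypeError);
# its momentum-like behaviour is set through `betas` (default (0.9, 0.999)).
optimizer = optim.Adam(lstm_classifier.parameters(), lr=0.01)
print(lstm_classifier)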

# Train the classifier
The following cell runs the training phase using what you have built so far. Feel free to edit it.

In [None]:
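epoch_num = 5
print_every = 100

lstm_classifier.train()  # enable training mode
for epoch in range(epoch_num):
    running_loss = 0
    for i, data in enumerate(train_loader):
        # inputs should be (batch, seq_len, 1) for a batch_first LSTM; labels are (batch,)
        inputs, labels = data
        optimizer.zero_grad()               # clear gradients from the previous step
        outputs = lstm_classifier(inputs)
        loss = criterion(outputs, labels)
        loss.backward()                     # backpropagation through time
        optimizer.step()                    # update the weights
        running_loss += loss.item()
        if (i + 1) % print_every == 0:
            # average loss over the last `print_every` mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / print_every))
            running_loss = 0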

# Test the classifier
The following cell runs the LSTM classifier on the test data and reports the accuracy for each class as well as the overall accuracy.

In [None]:
# Class names in label order; assumed to follow the dataset's usual
# encoding 0 = N, 1 = S, 2 = V, 3 = F, 4 = Q.
classes = ['N', 'S', 'V', 'F', 'Q']

correct = 0
total = 0
class_correct = [0 for i in range(5)]
class_total = [0 for i in range(5)]
lstm_classifier.eval()  # switch to evaluation mode
with torch.no_grad():
    for data in test_loader:
        inputs, labels = data
        outputs = lstm_classifier(inputs)
        _, predicted = torch.max(outputs, 1)
        # No .squeeze(): it would turn a final batch of size 1 into a 0-d tensor.
        c = (predicted == labels)
        for i in range(labels.size(0)):
            label = labels[i].item()
            correct += c[i].item()
            total += 1
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(5):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
    print(class_correct[i], class_total[i])

print('%d / %d' % (correct, total))
print('Accuracy: %0.2f' % (100 * correct / total))
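The per-class accuracies above can be complemented with precision, recall and F1 from scikit-learn, using the `classification_report` and `confusion_matrix` functions imported at the top. The helper `collect_predictions` is a hypothetical name, and the class names in the usage comment assume the label order N, S, V, F, Q.

```python
import torch
from sklearn.metrics import classification_report, confusion_matrix


def collect_predictions(model, loader):
    """Run `model` over `loader` and return (true labels, predicted labels) as lists."""
    model.eval()  # disable training-only behaviour such as dropout
    y_true, y_pred = [], []
    with torch.no_grad():
        for inputs, labels in loader:
            outputs = model(inputs)
            y_pred.extend(outputs.argmax(dim=1).tolist())  # most probable class
            y_true.extend(labels.tolist())
    return y_true, y_pred


# Usage with the notebook's objects:
# y_true, y_pred = collect_predictions(lstm_classifier, test_loader)
# print(classification_report(y_true, y_pred, target_names=['N', 'S', 'V', 'F', 'Q']))
# cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
# sns.heatmap(cm, annot=True, fmt='d')
```

On an imbalanced test set, the per-class recall and F1 in the report are more informative than overall accuracy.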

# Useful Links and Acknowledgements

[An Effective LSTM Recurrent Network to Detect Arrhythmia on Imbalanced ECG Dataset](https://www.hindawi.com/journals/jhe/2019/6320651/)

[Classify ECG Signals Using Long Short-Term Memory Networks](https://www.mathworks.com/help/signal/examples/classify-ecg-signals-using-long-short-term-memory-networks.html)

[Sequence Models and Long Short-Term Memory Networks](https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html)

[LSTMs for Time Series in PyTorch](https://www.jessicayung.com/lstms-for-time-series-in-pytorch/)

[ECG Heartbeat Classification: A Deep Transferable Representation](https://arxiv.org/abs/1805.00794)

[MIT-BIH Arrhythmia Database](https://physionet.org/content/mitdb/1.0.0/)