# Homework Template

This is a template homework for CS498 Deep Learning for Healthcare.

In [1]:
import os
import random
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F

# set seed
seed = 7
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
os.environ["PYTHONHASHSEED"] = str(seed)

## 2 ECG Data Classification [? points]

In this section, you will implement an advanced CNN+RNN model with an attention mechanism to classify ECG recordings. This is a binary classification problem: the goal is to distinguish atrial fibrillation (AF), an alternative rhythm, from normal sinus rhythm. We will use a fraction of the data from the public [Physionet 2017 Challenge](https://physionet.org/content/challenge-2017/1.0.0/). More details can be found at the link.

ECG recordings were sampled at 300Hz, and for the purpose of this task, the data is split into 10-second segments.

### 2.1 Load the Data

Because preprocessing the data requires a large amount of memory and time, the data for this homework has already been preprocessed.

Specifically, starting from the raw data (ECG recordings sampled at 300Hz), we did the following:
1. split the dataset into training/validation/test sets with a ratio of [placeholder]
2. normalize each recording to have a mean of 0 and a standard deviation of 1
3. slide and cut each recording into overlapping 10-second segments (stride = 500 for class 0, 50 for class 1 to oversample)
4. use FIR bandpass filtering to transform the data from 1 channel to 4 channels
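
Steps 2 and 3 above can be sketched in plain Python. This is only an illustration under assumptions: the helper names `normalize` and `sliding_segments` are hypothetical, the recording is a synthetic toy signal, the population standard deviation is assumed, and the actual preprocessing code is not part of this notebook. At 300Hz a 10-second segment holds 3000 samples, and the smaller stride for class 1 yields many more (heavily overlapping) segments from a recording of the same length.

```python
import math

SAMPLING_RATE_HZ = 300
SEGMENT_SECONDS = 10
SEGMENT_LEN = SAMPLING_RATE_HZ * SEGMENT_SECONDS  # 3000 samples per segment


def normalize(recording):
    """Rescale a recording to zero mean and unit (population) standard deviation."""
    mean = sum(recording) / len(recording)
    var = sum((v - mean) ** 2 for v in recording) / len(recording)
    std = math.sqrt(var)
    return [(v - mean) / std for v in recording]


def sliding_segments(recording, segment_len, stride):
    """Cut a recording into overlapping windows of segment_len samples."""
    last_start = len(recording) - segment_len
    return [recording[s:s + segment_len] for s in range(0, last_start + 1, stride)]


# Toy 30-second "recording" (9000 samples); a real ECG trace would go here.
recording = normalize([math.sin(0.05 * t) for t in range(9000)])

normal_segments = sliding_segments(recording, SEGMENT_LEN, stride=500)  # class 0
af_segments = sliding_segments(recording, SEGMENT_LEN, stride=50)       # class 1 (oversampled)

print(len(normal_segments), len(af_segments))  # 13 121
```

The same 30-second recording contributes 13 segments at stride 500 but 121 at stride 50, which is how the minority class is oversampled.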

Due to resource constraints, the data and the knowledge features have already been computed. You will need to write a dataloader for the training and test datasets.

In [2]:
data_path = r'G:\MINA\data\challenge2017\100_cached_data_permuted7'
train_dict = pd.read_pickle(os.path.join(data_path, 'train.pkl'))
test_dict = pd.read_pickle(os.path.join(data_path, 'test.pkl'))

print(f"There are {len(train_dict['Y'])} training data, {len(test_dict['Y'])} test data")
print(f"Shape of X: {train_dict['X'][:, 0,:].shape} = (#channels, n)")
print(f"Shape of beat feature: {train_dict['K_beat'][:, 0, :].shape} = (#channels, n)")
print(f"Shape of rhythm feature: {train_dict['K_rhythm'][:, 0, :].shape} = (#channels, M)")
print(f"Shape of frequency feature: {train_dict['K_freq'][:, 0, :].shape} = (#channels, 1)")

There are 1696 training data, 425 test data
Shape of X: (4, 3000) = (#channels, n)
Shape of beat feature: (4, 3000) = (#channels, n)
Shape of rhythm feature: (4, 60) = (#channels, M)
Shape of frequency feature: (4, 1) = (#channels, 1)


In [3]:
from torch.utils.data import Dataset

class ECGDataset(Dataset):

    def __init__(self, data_dict):
        """
        TODO: init the Dataset instance.
        """
        ### BEGIN SOLUTION
        self.X = data_dict['X']
        self.Y = data_dict['Y']
        self.K_beat = data_dict['K_beat']
        self.K_rhythm = data_dict['K_rhythm']
        self.K_freq = data_dict['K_freq']
        ### END SOLUTION

    def __len__(self):
        """
        TODO: Denotes the total number of samples
        """

        ### BEGIN SOLUTION
        return len(self.Y)
        ### END SOLUTION

    def __getitem__(self, i):
        """
        TODO: Generates one sample of data
            return the ((X, K_beat, K_rhythm, K_freq), Y) for the i-th data.
            Be careful about which dimension you are indexing.
        """

        ### BEGIN SOLUTION
        # The sample index is the middle axis: arrays are (#channels, #samples, length).
        return (self.X[:, i, :], self.K_beat[:, i, :], self.K_rhythm[:, i, :], self.K_freq[:, i, :]), self.Y[i]
        ### END SOLUTION

In [4]:
from torch.utils.data import DataLoader
def load_data(dataset, batch_size=128):
    """
    Return a DataLoader instance based on a Dataset instance, with batch_size specified.
    Note that since the data has already been shuffled, we set shuffle=False
    """
    def my_collate(batch):
        """
        :param batch: this is essentially [dataset[i] for i in [...]]
        batch[i] should be ((Xi, Ki_beat, Ki_rhythm, Ki_freq), Yi)
        TODO: write a collate function such that it outputs ((X, K_beat, K_rhythm, K_freq), Y)
            each output variable is a batched version of what's in the input *batch*
            Each output variable should be a float tensor, except Y, which should be a long tensor. The channel dim precedes the batch dim.
            e.g. the shape of each Xi is (# channels, n). In the output, X should be of shape (# channels, batch_size, n)
        """
        ### BEGIN SOLUTION
        X = torch.tensor([[_x[0][0][c] for _x in batch] for c in range(4)], dtype=torch.float)
        K_beat = torch.tensor([[_x[0][1][c] for _x in batch] for c in range(4)], dtype=torch.float)
        K_rhythm = torch.tensor([[_x[0][2][c] for _x in batch] for c in range(4)], dtype=torch.float)
        K_freq = torch.tensor([[_x[0][3][c] for _x in batch] for c in range(4)], dtype=torch.float)
        Y = torch.tensor([_x[1] for _x in batch], dtype=torch.long)
        ### END SOLUTION
        return (X, K_beat, K_rhythm, K_freq), Y

    return torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False, collate_fn=my_collate)


train_loader = load_data(ECGDataset(train_dict))
test_loader = load_data(ECGDataset(test_dict))
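
The nested comprehensions in `my_collate` effectively transpose the batch so that the channel axis comes first. A pure-Python sketch of that reordering follows; the helper `stack_channel_first` and the toy samples (2 channels of length 3 instead of 4 channels of length 3000) are hypothetical and only serve to make the axis order visible.

```python
def stack_channel_first(samples, n_channels):
    """Turn a list of (n_channels, n) samples into nested lists of shape (n_channels, batch, n)."""
    return [[sample[c] for sample in samples] for c in range(n_channels)]


# Three toy samples, each with 2 channels of length 3 (values encode sample and channel).
batch = [
    [[10, 11, 12], [13, 14, 15]],  # sample 0
    [[20, 21, 22], [23, 24, 25]],  # sample 1
    [[30, 31, 32], [33, 34, 35]],  # sample 2
]

stacked = stack_channel_first(batch, n_channels=2)
shape = (len(stacked), len(stacked[0]), len(stacked[0][0]))

print(shape)       # (2, 3, 3) = (n_channels, batch, n)
print(stacked[1])  # channel 1 of every sample: [[13, 14, 15], [23, 24, 25], [33, 34, 35]]
```

Indexing the first axis of the result, as `FreqNet.forward` does with `x[i]`, therefore selects one channel for the whole batch.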

## 3 Model Definitions [? points]

Now, let us implement a model that combines an RNN, a CNN, and an attention mechanism. More specifically, we will implement [MINA: Multilevel Knowledge-Guided Attention for Modeling Electrocardiography Signals](https://www.ijcai.org/Proceedings/2019/0816.pdf).

### 3.1 Knowledge-guided attention 
[Placeholder for explanation] 

In [5]:
import torch.nn.functional as F

class KnowledgeAttn(nn.Module):
    def __init__(self, input_features, attn_dim):
        """
        This is the general knowledge-guided attention module.
        It transforms the input and knowledge with 2 linear layers, computes attention, and then aggregates.
        :param input_features: the number of features of each input element (the last dimension of x)
        :param attn_dim: the number of hidden nodes in the attention mechanism
        TODO:
            define the following 2 linear layers WITHOUT bias (with the names provided)
                att_W: a Linear layer of shape (input_features + n_knowledge, attn_dim)
                att_v: a Linear layer of shape (attn_dim, 1)
            init the weights using self.init() (already given)
        """
        super(KnowledgeAttn, self).__init__()
        self.input_features = input_features
        self.attn_dim = attn_dim
        self.n_knowledge = 1

        ### BEGIN SOLUTION
        self.att_W = nn.Linear(self.input_features + self.n_knowledge, self.attn_dim, bias=False)
        self.att_v = nn.Linear(self.attn_dim, 1, bias=False)
        ### END SOLUTION

        self.init()

    def init(self):
        nn.init.normal_(self.att_W.weight)
        nn.init.normal_(self.att_v.weight)

    @classmethod
    def attention_sum(cls, x, attn):
        """

        :param x: of shape (-1, D, nfeatures)
        :param attn: of shape (-1, D, 1)
        TODO: return the weighted sum of x along the middle axis with weights given in attn. output should be (-1, nfeatures)
        """
        ### BEGIN SOLUTION
        return torch.sum(torch.mul(attn, x), 1)
        ### END SOLUTION


    def forward(self, x, k):
        """
        :param x: shape of (-1, D, input_features)
        :param k: shape of (-1, D, 1)
        :return:
            out: shape of (-1, input_features), the aggregated x
            attn: shape of (-1, D, 1)
        TODO:
            concatenate the input x and knowledge k together (on the last dimension)
            pass the concatenated output through the learnable Linear transforms
                first att_W, then tanh, then att_v
                the output shape should be (-1, D, 1)
            to get attention values, apply softmax on the output of linear layer
                You could use F.softmax(). Be careful which dimension you apply softmax over
            aggregate x using the attention values via self.attention_sum, and return
        """
        ### BEGIN SOLUTION
        tmp = torch.cat([x, k], dim=-1)
        e = self.att_v(torch.tanh(self.att_W(tmp)))
        attn = F.softmax(e, 1)
        out = self.attention_sum(x, attn)
        ### END SOLUTION
        return out, attn

In [6]:
'''
AUTOGRADER CELL. DO NOT MODIFY THIS.
'''

def float_tensor_equal(a, b, eps=1e-3):
    return torch.norm(a-b).abs().max().tolist() < eps

def testKnowledgeAttn():
    m = KnowledgeAttn(2, 2)
    m.att_W.weight.data = torch.tensor([[0.3298,  0.7045, -0.1067],
                                        [0.9656,  0.3090,  1.2627]], requires_grad=True)
    m.att_v.weight.data = torch.tensor([[-0.2368,  0.5824]], requires_grad=True)

    x = torch.tensor([[[-0.6898, -0.9098], [0.0230,  0.2879], [-0.2534, -0.3190]],
                      [[ 0.5412, -0.3434], [0.0289, -0.2837], [-0.4120, -0.7858]]])
    k = torch.tensor([[ 0.5469,  0.3948, -1.1430], [0.7815, -1.4787, -0.2929]]).unsqueeze(2)
    out, attn = m(x, k)

    tout = torch.tensor([[-0.2817, -0.2531], [0.2144, -0.4387]])
    tattn = torch.tensor([[[0.3482], [0.4475], [0.2043]],
                          [[0.5696], [0.1894], [0.2410]]])
    assert float_tensor_equal(attn, tattn), "The attention values are wrong"
    assert float_tensor_equal(out, tout), "output of the attention module is wrong"
testKnowledgeAttn()
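
To see what the autograder values mean, the forward pass of `KnowledgeAttn` can be traced without PyTorch. The sketch below is an illustrative re-implementation in plain Python (the function `knowledge_attention` is hypothetical, not part of the homework code): it concatenates each input row with its knowledge scalar, applies `att_W`, `tanh`, and `att_v`, takes a softmax over the D positions, and forms the weighted sum. The weights and the first sample are copied from `testKnowledgeAttn` above.

```python
import math


def knowledge_attention(x, k, W, v):
    """x: D rows of features, k: D knowledge scalars,
    W: attn_dim rows acting on [x; k], v: attn_dim output weights (no biases)."""
    scores = []
    for x_t, k_t in zip(x, k):
        z = x_t + [k_t]  # concatenate input and knowledge on the last dimension
        hidden = [math.tanh(sum(w * z_j for w, z_j in zip(row, z))) for row in W]
        scores.append(sum(v_j * h_j for v_j, h_j in zip(v, hidden)))
    exp_scores = [math.exp(s) for s in scores]
    total = sum(exp_scores)
    attn = [e / total for e in exp_scores]  # softmax over the D positions
    n_features = len(x[0])
    out = [sum(a * x_t[j] for a, x_t in zip(attn, x)) for j in range(n_features)]
    return out, attn


W = [[0.3298, 0.7045, -0.1067], [0.9656, 0.3090, 1.2627]]
v = [-0.2368, 0.5824]
x = [[-0.6898, -0.9098], [0.0230, 0.2879], [-0.2534, -0.3190]]  # first sample
k = [0.5469, 0.3948, -1.1430]

out, attn = knowledge_attention(x, k, W, v)
print([round(a, 3) for a in attn])
print([round(o, 3) for o in out])
```

The printed values should agree with the first rows of `tattn` and `tout` to within the autograder tolerance.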


MINA has three different knowledge-guided attention mechanisms:
 - Beat Level $K_\alpha$: beat knowledge, represented by the first-order difference and a convolutional operation $Conv_\alpha$ on each segment
 - Rhythm Level $K_\beta$: rhythm features, represented by the standard deviation of each segment
 - Frequency Level $K_\gamma$: frequency features, represented by the power spectral density (PSD), a popular measure of signal energy in signal processing
 
As a result, let's define three Modules that handle Beat/Rhythm/Frequency level information separately. Note that although the input has 4 channels, we need to handle each channel separately, because the channels have different meanings after FIR filtering. Thus, we will need 4 BeatNet, 4 RhythmNet, and 1 FreqNet. The pipeline is the following:
    BeatNet_i -> RhythmNet_i -> FreqNet
 

In [7]:
class BeatNet(nn.Module):
    #Attention for the CNN step/ beat level/local information
    def __init__(self, n=3000, T=50,
                 conv_out_channels=64):
        """
        :param n: size of each 10-second-data
        :param T: size of each smaller segment used to capture local information in the CNN stage
        :param conv_out_channels: also called number of filters/kernels
        TODO: We will define a network that does two things. Specifically:
            1. use one 1-D convolutional layer to capture local information, on x and k_beat (see forward())
                conv: The kernel size should be set to 32, and the number of filters should be set to *conv_out_channels*. Stride should be *conv_stride*
                conv_k: same as conv, except that it has only 1 filter instead of *conv_out_channels*
            2. an attention mechanism to aggregate the convolution outputs. Specifically:
                attn: KnowledgeAttn with input_features equaling conv_out_channels, and attn_dim=att_cnn_dim
        """
        super(BeatNet, self).__init__()
        self.n, self.M, self.T = n, int(n/T), T
        self.conv_out_channels = conv_out_channels
        self.conv_kernel_size = 32
        self.conv_stride = 2
        ### BEGIN SOLUTION
        self.conv = nn.Conv1d(in_channels=1,
                              out_channels=self.conv_out_channels,
                              kernel_size=self.conv_kernel_size,
                              stride=self.conv_stride)

        self.conv_k = nn.Conv1d(in_channels=1,
                                out_channels=1,
                                kernel_size=self.conv_kernel_size,
                                stride=self.conv_stride)
        ### END SOLUTION

        self.att_cnn_dim = 8
        ### BEGIN SOLUTION
        self.attn = KnowledgeAttn(self.conv_out_channels, self.att_cnn_dim)
        ### END SOLUTION

    def forward(self, x, k_beat):
        """
        :param x: shape (batch, n)
        :param k_beat: shape (batch, n)
        :return:
            out: shape (batch * M, conv_out_channels)
            alpha: shape (batch * M, N, 1) where N is the output length of the convolution
        TODO:
            reshape the data - convert x/k_beat of shape (batch, n) to (batch * M, 1, T), where n = MT
            apply convolution on x and k_beat
                pass the reshaped x through self.conv, and then ReLU
                pass the reshaped k_beat through self.conv_k, and then ReLU
            (at this step, you might need to swap axes to align the dimensions depending on how you defined the layers)
            pass the conv'd x and conv'd knowledge through attn to get the output (*out*) and alpha
        """
        ### BEGIN SOLUTION
        x = x.view(-1, self.T).unsqueeze(1)
        k_beat = k_beat.view(-1, self.T).unsqueeze(1)

        x = F.relu(self.conv(x))  # Here number of filters K=64
        k_beat = F.relu(self.conv_k(k_beat))  # Conv1d(1, 1, kernel_size=(32,), stride=(2,)) => k_beat:[128*60,1,10].

        x = x.permute(0, 2, 1)  # x:[128*60,10,64]
        k_beat = k_beat.permute(0, 2, 1)
        out, alpha = self.attn(x, k_beat)
        ### END SOLUTION
        return out, alpha
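
The shape comments in `BeatNet.forward` follow from the standard output-length formula for a 1-D convolution. A small check in plain Python, using the defaults above (`n=3000`, `T=50`, kernel size 32, stride 2); the helper name `conv1d_out_len` is hypothetical:

```python
def conv1d_out_len(length, kernel_size, stride, padding=0, dilation=1):
    """Output length of a 1-D convolution (the formula documented for nn.Conv1d)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


n, T = 3000, 50
M = n // T                                        # sub-segments per 10-second window
N = conv1d_out_len(T, kernel_size=32, stride=2)   # positions the beat attention runs over

print(M, N)  # 60 10
```

Each 10-second window is thus cut into M = 60 pieces of length T = 50, and the beat-level attention aggregates N = 10 convolution outputs per piece.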

In [8]:
class RhythmNet(nn.Module):
    def __init__(self, n=3000, T=50, input_size=64, rhythm_out_size=8):
        """
        :param n: size of each 10-second-data
        :param T: size of each smaller segment used to capture local information in the CNN stage
        :param input_size: This is the same as the # of filters/kernels in the CNN part.
        :param rhythm_out_size: output size of this network
        TODO: We will define a network that does three things to handle rhythms. Specifically:
            1. use a bi-directional LSTM to process the learned local representations from the CNN part
                lstm: bidirectional, 1 layer, batch_first, and hidden_size should be set to *rnn_hidden_size*
            2. an attention mechanism to aggregate the LSTM outputs. Specifically:
                attn: KnowledgeAttn with input_features equaling the lstm output size, and attn_dim=att_rnn_dim
            3. output layers
                fc: a Linear layer making the output of shape (..., self.out_size)
                do: a Dropout layer with p=0.5
        """
        #input_size is the cnn_out_channels
        super(RhythmNet, self).__init__()
        self.n, self.M, self.T = n, int(n/T), T
        self.input_size = input_size

        ### LSTM Input: (batch size, M, input_size)
        self.rnn_hidden_size = 32
        ### BEGIN SOLUTION
        self.lstm = nn.LSTM(input_size=self.input_size, #self.conv_out_channels,
                            hidden_size=self.rnn_hidden_size,
                            num_layers=1, batch_first=True, bidirectional=True)
        ### END SOLUTION

        ### Attention mechanism
        self.att_rnn_dim = 8
        ### BEGIN SOLUTION
        self.attn = KnowledgeAttn(2 * self.rnn_hidden_size, self.att_rnn_dim)
        ### END SOLUTION

        ### Dropout and fully connected layers
        self.out_size = rhythm_out_size
        ### BEGIN SOLUTION
        self.fc = nn.Linear(2 * self.rnn_hidden_size, self.out_size)
        self.do = nn.Dropout(p=0.5)
        ### END SOLUTION



    def forward(self, x, k_rhythm):
        """
        :param x: shape (batch * M, self.input_size), i.e. the output of BeatNet
        :param k_rhythm: shape (batch, M)
        :return:
            out: shape (batch, self.out_size)
            beta: shape (batch, M, 1)
        TODO:
            reshape the data - convert x to of shape (batch, M, self.input_size), k_rhythm->(batch, M, 1)
            pass the reshaped x through lstm
            pass the lstm output and knowledge through attn
            pass the result through fully connected layer - ReLU - Dropout
            denote the final output as *out*
        """

        ### BEGIN SOLUTION
        ### reshape for rnn
        self.batch_size = int(x.size()[0] / self.M)
        x = x.view(self.batch_size, self.M, -1)
        ### rnn
        k_rhythm = k_rhythm.unsqueeze(-1)  # [128, 60, 1]
        o, (ht, ct) = self.lstm(x)  # o:[batch,60,64] (in the paper this is called h)

        x, beta = self.attn(o, k_rhythm)
        ### fc and Dropout
        x = F.relu(self.fc(x))  # [128, 64->8]
        out = self.do(x)
        ### END SOLUTION
        return out, beta

In [9]:
class FreqNet(nn.Module):
    def __init__(self, n_channels=4, n=3000, T=50):
        """
        :param n_channels: number of channels (F in the paper). We will need to define this many BeatNet & RhythmNet nets.
        :param n: size of each 10-second-data
        :param T: size of each smaller segment used to capture local information in the CNN stage
        TODO: This is the main network that orchestrates the previously defined attention modules:
            1. define n_channels many BeatNet and RhythmNet modules. (Hint: use nn.ModuleList)
                beat_nets: for each beat_net, pass parameter conv_out_channels into the init()
                rhythm_nets: for each rhythm_net, pass conv_out_channels as input_size, and self.rhythm_out_size as the output size
            2. define frequency (channel) level knowledge-guided attention module
                attn: KnowledgeAttn with input_features equaling rhythm_out_size, and attn_dim=att_channel_dim
            3. output layer: a Linear layer for 2 classes output
        """
        super(FreqNet, self).__init__()
        self.n, self.M, self.T = n, int(n / T), T
        self.n_class = 2
        self.n_channels = n_channels
        self.conv_out_channels=64
        self.rhythm_out_size=8

        ### BEGIN SOLUTION
        self.beat_nets = nn.ModuleList()
        self.rhythm_nets = nn.ModuleList()
        for channel_i in range(self.n_channels):
            self.beat_nets.append(BeatNet(self.n, self.T, self.conv_out_channels))
            self.rhythm_nets.append(RhythmNet(self.n, self.T, self.conv_out_channels, self.rhythm_out_size))
        ### END SOLUTION

        ### frequency attention
        self.att_channel_dim = 2
        ### BEGIN SOLUTION
        self.attn = KnowledgeAttn(self.rhythm_out_size, self.att_channel_dim)
        ### END SOLUTION

        ### fully-connected output layer
        ### BEGIN SOLUTION
        self.fc = nn.Linear(self.rhythm_out_size, self.n_class)
        ### END SOLUTION


    def forward(self, x, k_beats, k_rhythms, k_freq):
        """
        We need to use the attention submodules to process data from each channel separately, and then pass the
            output through an attention on frequency for the final output

        :param x: shape (n_channels, batch, n)
        :param k_beats: (n_channels, batch, n)
        :param k_rhythms: (n_channels, batch, M)
        :param k_freq: (n_channels, batch, 1)
        :return:
            out: softmax output for each data point, shape (batch, n_class)
        TODO:
            1. pass each channel of x through the corresponding beat_net, then rhythm_net.
                We will discard the attention (alpha and beta) outputs for now
            2. stack the output from 1 together into a tensor of shape (batch, n_channels, rhythm_out_size)
            3. pass result from 2 and k_freq through attention module, to get the aggregated result and gama
            4. pass aggregated result from 3 through the final fully connected layer.
            5. Apply Softmax to normalize output to a probability distribution (over 2 classes)
        """
        ### BEGIN SOLUTION
        new_x = [None for _ in range(self.n_channels)]
        att_dic = {}
        for i in range(self.n_channels):
            tx, att_dic['alpha_%d'%i] = self.beat_nets[i](x[i], k_beats[i])
            new_x[i], att_dic['beta_%d'%i] = self.rhythm_nets[i](tx, k_rhythms[i])
        x = torch.stack(new_x, 1)  # [128,8] -> [128,4,8]

        # ### attention on channel
        k_freq = k_freq.permute(1, 0, 2) #[4,128,1] -> [128,4,1]
        x, gama = self.attn(x, k_freq)

        ### fc
        out = F.softmax(self.fc(x), 1)

        ### return
        att_dic['gama'] = gama
        ### END SOLUTION
        return out, gama
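
The tensor shapes at each stage of the BeatNet -> RhythmNet -> FreqNet pipeline can be summarized with a little arithmetic. The sketch below only computes shape tuples from the hyperparameters defined in the classes above; the function `mina_shape_trace` is a hypothetical helper and runs no model code.

```python
def mina_shape_trace(batch=128, n_channels=4, n=3000, T=50,
                     conv_out_channels=64, kernel_size=32, stride=2,
                     rnn_hidden_size=32, rhythm_out_size=8, n_class=2):
    """Return the tensor shape expected at each stage of the pipeline."""
    M = n // T                            # sub-segments per window
    N = (T - kernel_size) // stride + 1   # conv output length (no padding)
    return {
        "x (all channels)": (n_channels, batch, n),
        "BeatNet input": (batch * M, 1, T),
        "after conv + permute": (batch * M, N, conv_out_channels),
        "BeatNet out": (batch * M, conv_out_channels),
        "LSTM input": (batch, M, conv_out_channels),
        "LSTM output": (batch, M, 2 * rnn_hidden_size),  # bidirectional
        "RhythmNet out": (batch, rhythm_out_size),
        "stacked channels": (batch, n_channels, rhythm_out_size),
        "FreqNet attention out": (batch, rhythm_out_size),
        "class probabilities": (batch, n_class),
    }


trace = mina_shape_trace()
for name, shape in trace.items():
    print(f"{name:24s} {shape}")
```

Comparing these tuples with `tensor.shape` at the matching points in `forward()` is a quick way to locate a mis-ordered `view` or `permute`.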

## Training and Evaluation
In this part we will define the training procedures, train the model, and evaluate the model on the test set.

In [10]:
def train_model(model, train_dataloader, n_epoch=5, lr=0.003, device=None):
    """
    :param model: The instance of FreqNet that we are training
    :param train_dataloader: the DataLoader of the training data
    :param n_epoch: number of epochs to train
    :param lr: learning rate for the optimizer
    :param device: torch.device to train on (defaults to CPU)
    :return:
        model: trained model
        loss_history: recorded training loss history - should be just a list of float
    TODO:
        Specify the optimizer to be optim.Adam
        Specify the loss function to be CrossEntropyLoss
        Hint: to use dataloader, you can do:
            for (X, K_beat, K_rhythm, K_freq), Y in train_dataloader:
                ....

    """
    import torch.optim as optim  # imported after the docstring so the docstring is recognized

    device = device or torch.device('cpu')
    model.train()

    loss_history = []

    ### BEGIN SOLUTION
    from tqdm import tqdm
    optimizer = optim.Adam(model.parameters(), lr=lr)
    loss_func = torch.nn.CrossEntropyLoss()
    for epoch in range(n_epoch):
        curr_epoch_loss = []
        for (X, K_beat, K_rhythm, K_freq), Y in tqdm(train_dataloader, desc='train', ncols=80):
            X, K_beat, K_rhythm, K_freq, Y = X.to(device), K_beat.to(device), K_rhythm.to(device), K_freq.to(device), Y.to(device)
            pred, _ = model(X, K_beat, K_rhythm, K_freq)
            loss = loss_func(pred, Y)
            curr_epoch_loss.append(loss.item())  # store a plain Python float
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch{epoch}: curr_epoch_loss={np.mean(curr_epoch_loss)}")
        loss_history += curr_epoch_loss
    ### END SOLUTION
    return model, loss_history
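
One detail is worth keeping in mind when reading the loss values: `FreqNet.forward` already returns softmax probabilities, while `nn.CrossEntropyLoss` expects unnormalized scores and applies a log-softmax internally. Training still works, but the loss cannot approach zero. The sketch below reproduces the per-sample loss in plain Python (the helper `cross_entropy` is illustrative, not part of the notebook) and shows the floor of about 0.313 that even a perfectly confident, correct prediction reaches.

```python
import math


def cross_entropy(scores, target):
    """Cross-entropy of one sample: negative log-softmax of the target entry."""
    log_norm = math.log(sum(math.exp(s) for s in scores))
    return log_norm - scores[target]


# Raw logits: a confident, correct prediction gives a loss close to 0.
loss_logits = cross_entropy([-10.0, 10.0], target=1)

# Probabilities fed in as if they were logits: entries are confined to [0, 1],
# so the best case [0, 1] still yields log(1 + e^-1).
loss_probs = cross_entropy([0.0, 1.0], target=1)

print(f"{loss_logits:.6f} {loss_probs:.6f}")
```

This is consistent with the training losses reported in this notebook, which decrease but stay well above 0.31.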

In [11]:
def eval_model(model, dataloader, device=None):
    """
    :return:
        pred_all: prediction of the model on the dataloader.
            Should be a 2D numpy float array where the second dimension has length 2.
        Y_test: truth labels. Should be a numpy array of ints
    TODO:
        evaluate the model on the data in the dataloader.
        Add all the prediction and truth to the corresponding list
        Convert pred_all and Y_test to numpy arrays.
    """
    device = device or torch.device('cpu')
    model.eval()
    pred_all = []
    Y_test = []
    ### BEGIN SOLUTION
    from tqdm import tqdm
    with torch.no_grad():  # no gradients are needed for evaluation
        for (X, K_beat, K_rhythm, K_freq), Y in tqdm(dataloader, desc='test', ncols=80):
            X, K_beat, K_rhythm, K_freq, Y = X.to(device), K_beat.to(device), K_rhythm.to(device), K_freq.to(device), Y.to(device)

            pred, _ = model(X, K_beat, K_rhythm, K_freq)

            pred_all.append(pred.cpu().numpy())
            Y_test.append(Y.cpu().numpy())
    pred_all = np.concatenate(pred_all, axis=0)
    Y_test = np.concatenate(Y_test, axis=0)
    ### END SOLUTION

    return pred_all, Y_test

In [12]:
device = torch.device('cpu')
n_epoch = 5
lr = 0.003
n_channel = 4
n_dim=3000
T=50

model = FreqNet(n_channel, n_dim, T)
model = model.to(device)

model, loss_history = train_model(model, train_loader, n_epoch=n_epoch, lr=lr, device=device)
pred, truth = eval_model(model, test_loader, device=device)

train: 100%|████████████████████████████████████| 14/14 [00:21<00:00,  1.55s/it]
epoch0: curr_epoch_loss=0.6898131370544434
train: 100%|████████████████████████████████████| 14/14 [00:20<00:00,  1.49s/it]
epoch1: curr_epoch_loss=0.6576957106590271
train: 100%|████████████████████████████████████| 14/14 [00:20<00:00,  1.45s/it]
epoch2: curr_epoch_loss=0.5675951242446899
train: 100%|████████████████████████████████████| 14/14 [00:21<00:00,  1.56s/it]
epoch3: curr_epoch_loss=0.503008246421814
train: 100%|████████████████████████████████████| 14/14 [00:19<00:00,  1.42s/it]
epoch4: curr_epoch_loss=0.47429385781288147
test: 100%|███████████████████████████████████████| 4/4 [00:03<00:00,  1.33it/s]
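
The batch counts in the progress bars (14 for training, 4 for testing) follow directly from the dataset sizes printed earlier and the default `batch_size=128`. Since `load_data` does not set `drop_last`, the final partial batch is kept. A quick check:

```python
import math

batch_size = 128
n_train, n_test = 1696, 425  # sizes reported when the data was loaded

train_batches = math.ceil(n_train / batch_size)
test_batches = math.ceil(n_test / batch_size)
last_train_batch = n_train - (train_batches - 1) * batch_size  # size of the final partial batch

print(train_batches, test_batches, last_train_batch)  # 14 4 32
```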


In [13]:

def evaluate_predictions(truth, pred):
    """
    TODO: Evaluate the performance of the prediction via AUROC, AUPRC, and F1 score

    each prediction in pred is a vector representing [p_0, p_1].
    When defining the scores we are interested in detecting class 1 only
    (Hint: use roc_auc_score, average_precision_score, f1_score from sklearn.metrics)
    return: auroc, auprc, f1
    """
    from sklearn.metrics import roc_auc_score, average_precision_score, f1_score

    ### BEGIN SOLUTION
    pred_label = np.argmax(pred, axis=1)  # predicted class = index of the larger probability
    auroc = roc_auc_score(truth, pred[:, 1])
    auprc = average_precision_score(truth, pred[:, 1])
    f1 = f1_score(truth, pred_label)
    ### END SOLUTION

    return auroc, auprc, f1
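
For intuition about the F1 score computed above, here is a hand-rolled version in plain Python on a toy example. The helper `f1_for_class_1` and the five predictions are hypothetical; the homework itself should keep using `sklearn.metrics`. Labels come from the arg-max of `[p_0, p_1]`, and only class 1 counts as the positive class.

```python
def f1_for_class_1(truth, pred_probs):
    """Precision, recall, and F1 for class 1 after taking the arg-max of [p_0, p_1].

    Toy helper: assumes at least one predicted and one true positive.
    """
    pred_label = [1 if p1 > p0 else 0 for p0, p1 in pred_probs]
    tp = sum(1 for t, p in zip(truth, pred_label) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred_label) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred_label) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)


truth = [0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.6, 0.4]]

precision, recall, f1 = f1_for_class_1(truth, pred_probs)
print(precision, recall, f1)
```

Here two of the three true positives are found and one negative is misclassified, so precision, recall, and F1 all equal 2/3. AUROC and AUPRC, by contrast, are computed from the raw `p_1` scores rather than from thresholded labels.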

In [14]:
'''
AUTOGRADER CELL. DO NOT MODIFY THIS.
'''
auroc, auprc, f1 = evaluate_predictions(truth, pred)
print(f"AUROC={auroc}, AUPRC={auprc}, F1={f1}")

assert auroc > 0.85 and f1 > 0.8, "Performance is too low. Something's probably off."

AUROC=0.9107982261640798, AUPRC=0.8887289211695684, F1=0.8693586698337292
