## Introduction
Finally, you are going to train the tumor segmentation network. <br />

## Imports
**Task: Import the necessary libraries** <br />
Hint: Copy your Dataset and model classes into separate Python files so that you can import them.

## Dataset Creation
**Task: Create the train and val datasets and the augmentation pipeline. Use affine augmentations with**:
1. 15% translation,
2. scaling between 0.85 and 1.15,
3. rotations from -45° to 45°.

Additionally, use **ElasticTransformation**.

In [None]:
train_dataset = ...  # TODO: create the training dataset

## Oversampling to tackle strong class imbalance
Lung tumors are often very small, thus we need to make sure that our model does not learn a trivial solution which simply outputs 0 for all voxels.<br />
In this notebook we will use oversampling to sample slices which contain a tumor more often.

To do so, we can use the **WeightedRandomSampler** provided by PyTorch, which needs a weight for each sample in the dataset.
Typically you have one weight per class. Here, this means that we need to calculate two weights, one for slices without a tumor and one for slices with a tumor, and then create a list that assigns the corresponding weight to each sample in the dataset.

To do so, we first need to create a list containing only the class labels:

In [None]:
target_list = []
for _, label in tqdm(train_dataset):
    # Check if mask contains a tumorous pixel:
    if np.any(label):
        target_list.append(1)
    else:
        target_list.append(0)

Then we need to calculate the weight for each class.
To do so, we can simply compute the ratio between the class counts (slices without a tumor divided by slices with a tumor) and then create the weight list:

In [None]:
uniques = np.unique(target_list, return_counts=True)
uniques

In [None]:
# Ratio of slices without a tumor (class 0) to slices with a tumor (class 1)
fraction = uniques[1][0] / uniques[1][1]
fraction

Subsequently, we assign the weight 1 to each slice without a tumor and the computed fraction (~9) to each slice with a tumor:

In [None]:
weight_list = []
for target in target_list:
    if target == 0:
        weight_list.append(1)
    else:
        weight_list.append(fraction)
weight_list[:50]
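
The loop above can also be written in vectorized form. The following is a minimal sketch on hypothetical toy labels (90 slices without a tumor, 10 with a tumor) rather than on the real `target_list`; it shows that with this weighting both classes end up with the same total weight, which is what makes the sampler draw them equally often on average.

```python
import numpy as np

# Hypothetical labels standing in for target_list: 1 = slice with tumor, 0 = slice without
toy_targets = np.asarray([0] * 90 + [1] * 10)

# counts[0] = number of slices without a tumor, counts[1] = number of slices with a tumor
counts = np.bincount(toy_targets, minlength=2)
toy_fraction = counts[0] / counts[1]

# Weight 1 for slices without a tumor, the class ratio for slices with a tumor
toy_weight_list = np.where(toy_targets == 1, toy_fraction, 1.0)

print("fraction:", toy_fraction)
print("total weight without tumor:", toy_weight_list[toy_targets == 0].sum())
print("total weight with tumor:", toy_weight_list[toy_targets == 1].sum())
```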

Finally, we create the sampler, which we can pass to the DataLoader.
**Important:** Only use a sampler for the train loader! The validation data must keep its original class distribution so that the validation results stay realistic.

In [None]:
sampler = torch.utils.data.sampler.WeightedRandomSampler(weight_list, len(weight_list))
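
To see the effect of the sampler in isolation, here is a small sketch on a hypothetical toy dataset (random tensors instead of CT slices, 10% positive labels). All `toy_*` names are made up for illustration. Note that the sampler is passed via the `sampler` argument and `shuffle` is left unset, because the DataLoader does not accept both at the same time.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical toy data: 1000 "slices", 10% of which are labelled as containing a tumor
toy_labels = torch.zeros(1000, dtype=torch.long)
toy_labels[:100] = 1
toy_dataset = TensorDataset(torch.randn(1000, 1, 8, 8), toy_labels)

# Same weighting scheme as above: 1 for the majority class, the class ratio for the minority class
toy_fraction = (toy_labels == 0).sum().item() / (toy_labels == 1).sum().item()
toy_weights = [toy_fraction if label == 1 else 1.0 for label in toy_labels.tolist()]

# Draw as many samples per epoch as the dataset has entries (with replacement)
toy_sampler = WeightedRandomSampler(toy_weights, num_samples=len(toy_weights))
toy_loader = DataLoader(toy_dataset, batch_size=50, sampler=toy_sampler)

# Collect all labels drawn during one epoch and measure the share of tumor slices
drawn = torch.cat([batch_labels for _, batch_labels in toy_loader])
tumor_share = drawn.float().mean().item()
print(f"Share of tumor slices drawn in one epoch: {tumor_share:.2f}")
```

Without the sampler the share would stay at the original 10%; with it, roughly half of the drawn samples should come from the minority class.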

**Task: Create the train and val loaders. Set the batch size and the number of workers according to your hardware. Use the sampler for the train_loader.**

We can verify that our sampler works by taking a batch from the train loader and counting how many masks contain at least one tumor pixel:

In [None]:
verify_sampler = next(iter(train_loader))  # Take one batch
(verify_sampler[1][:,0]).sum([1, 2]) > 0   # True for each mask with a tumor; expect ~ half of the batch

## Loss

As this task is harder to train, you might try different loss functions.
We achieved the best results with Binary Cross Entropy instead of the Dice Loss.
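
To make the two options concrete, here is a minimal sketch of both losses evaluated on a toy mask. The `dice_loss` helper, its smoothing term `eps`, and the choice of `BCEWithLogitsLoss` (which expects raw logits rather than probabilities) are assumptions for illustration, not necessarily the exact variants behind the results mentioned above.

```python
import torch

def dice_loss(probs, mask, eps=1e-8):
    """Soft Dice loss: 1 - 2 * |P * M| / (|P| + |M|), computed over all pixels."""
    probs = probs.flatten()
    mask = mask.flatten()
    intersection = (probs * mask).sum()
    dice = 2 * intersection / (probs.sum() + mask.sum() + eps)
    return 1 - dice

# Assumed BCE variant: applies the sigmoid internally, so it takes raw logits
bce_loss = torch.nn.BCEWithLogitsLoss()

# Toy mask with a small "tumor" of 4 pixels in an 8x8 slice
toy_mask = torch.zeros(1, 1, 8, 8)
toy_mask[0, 0, 2:4, 2:4] = 1.0

# Trivial solution: confidently predict "no tumor" everywhere
trivial_logits = torch.full_like(toy_mask, -10.0)
# Near-perfect prediction: large positive logits inside the tumor, negative elsewhere
perfect_logits = 20.0 * toy_mask - 10.0

for name, logits in [("trivial", trivial_logits), ("perfect", perfect_logits)]:
    d = dice_loss(torch.sigmoid(logits), toy_mask).item()
    b = bce_loss(logits, toy_mask).item()
    print(f"{name}: dice loss = {d:.4f}, BCE loss = {b:.4f}")
```

Both losses penalize the trivial all-zero prediction, but they scale differently with tumor size, which is one reason why it is worth trying more than one of them.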

## Full Segmentation Model

**Task: Create the PyTorch Lightning model. Use Binary Cross Entropy as the loss function and the Adam optimizer with a learning rate of 1e-4.**

**Task: Instantiate the model, create a checkpoint callback and define the trainer.<br /> Train the model for 30 epochs and use a TensorBoardLogger to log your training process.**

## Evaluation
**Task: Load the latest checkpoint, compute the predictions for the complete validation dataset, and then compute the Dice score over them.**
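
As a reference for the metric itself, the following sketch computes the Dice score between binarized predictions and labels with NumPy. The `dice_score` helper, the toy arrays, and the 0.5 threshold are assumptions for illustration; in the actual task, `toy_probs` and `toy_labels` would be replaced by the stacked model outputs and masks of the validation set.

```python
import numpy as np

def dice_score(pred_mask, true_mask, eps=1e-8):
    """Dice score between two binary masks of identical shape."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred_mask, true_mask).sum()
    return 2 * intersection / (pred_mask.sum() + true_mask.sum() + eps)

# Hypothetical stand-ins for the stacked validation outputs and labels
toy_probs = np.zeros((2, 1, 8, 8))
toy_labels = np.zeros((2, 1, 8, 8))
toy_labels[0, 0, 2:4, 2:4] = 1      # 4 tumor pixels in the first slice
toy_probs[0, 0, 2:4, 3:5] = 0.9     # prediction shifted by one column: 2 pixels overlap

toy_preds = toy_probs > 0.5         # assumed threshold for binarizing the outputs
val_dice = dice_score(toy_preds, toy_labels)
print(f"Val Dice score: {val_dice:.3f}")
```

Computing one score over all validation slices at once, as done here, avoids undefined per-slice values for the many slices that contain no tumor at all.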

## Visualization
**Task: Compute a prediction for a patient and visualize it.**

Congratulations! You just built a lung cancer segmentation model!