# Preprocessing

For traditional statistical analysis there exist established preprocessing pipelines such as
- [FreeSurfer](https://surfer.nmr.mgh.harvard.edu/)
- [CAT12](https://neuro-jena.github.io/cat/)
- [SPM](https://www.fil.ion.ucl.ac.uk/spm/)

which align NIfTI images through a series of operations (see image), enabling simple regression analysis at individual voxel positions.


<p>
<img src="https://neuro-jena.github.io/cat12-help/images/cat_processing_steps.png" width=600/>
<figcaption>Taken from <a href="https://neuro-jena.github.io/cat12-help/images/cat_processing_steps.png">https://neuro-jena.github.io/cat12-help/images/cat_processing_steps.png</a></figcaption>
</p>

Luckily, neural networks are more flexible than simple regression methods and can make high-precision predictions without rigorous alignment.

Therefore (and for brevity), we only apply minimal preprocessing.

## 1. Brain Extraction
Since we are only interested in the participants' brains, we first set all voxels outside the brain to 0.
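
Conceptually, this is an element-wise masking operation: voxels where a binary brain mask is true keep their intensity, and all other voxels become 0. A minimal NumPy sketch on a small synthetic volume (the arrays are made up for illustration, not real MRI data):

```python
import numpy as np

# Synthetic 3x3x3 "T1w" volume with made-up intensities 1..27
t1 = np.arange(27, dtype=np.float32).reshape(3, 3, 3) + 1.0

# Hypothetical binary brain mask: only the central voxel counts as "brain"
mask = np.zeros(t1.shape, dtype=bool)
mask[1, 1, 1] = True

# Keep intensities inside the mask, set everything else to 0
brain = np.where(mask, t1, 0.0)

print(brain[1, 1, 1])           # intensity of the central voxel is preserved
print(np.count_nonzero(brain))  # only one voxel survives
```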

Luckily, there is a neural-network-based tool called [deepbet](https://github.com/wwu-mmll/deepbet) that performs brain extraction:

```python
from deepbet import run_bet

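# T1w input images and the corresponding output paths (one entry per subject)
input_paths = ['path/to/sub_1/t1.nii.gz', 'path/to/sub_2/t1.nii.gz']
brain_paths = ['path/to/sub_1/brain.nii.gz', 'path/to/sub_2/brain.nii.gz']  # skull-stripped images
mask_paths = ['path/to/sub_1/mask.nii.gz', 'path/to/sub_2/mask.nii.gz']  # binary brain masks

run_bet(input_paths, brain_paths, mask_paths)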
```


________________
### Enable GPU on Colab 🔥

Like most neural-network-based tools, deepbet can run on a GPU, which is typically much faster than running on the CPU 💨

Thankfully, Colab allows you to use a GPU for free 🍀

To enable a GPU, simply click on
- *Runtime*
- *Change runtime type*
- *T4 GPU*
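
To check from inside the notebook whether the GPU runtime is actually active, a small sketch like the following can help. It assumes that deepbet runs on PyTorch, so a CUDA device visible to `torch` is what matters; `gpu_available` is a hypothetical helper, not part of deepbet.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is installed and sees a CUDA GPU (hypothetical helper)."""
    # Assumption: deepbet uses PyTorch under the hood
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


print("GPU available:", gpu_available())
```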
________________

# Exercise 1

## 🚨 Warning 🚨

This notebook builds on 1_Introduction and the exercise in 2_Data_Exploration.

If you haven't done so already, run those notebooks, then mount your Google Drive in this notebook via
```python
from google.colab import drive
drive.mount('/content/drive')
```
then you are ready to go!

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


1. Install `deepbet`

In [None]:
!pip install deepbet


2. Load the DataFrame you saved at the end of the exercise in 2_Data_Exploration


In [None]:
import pandas as pd
PATH = '/content/drive/MyDrive/openneuro'

df = pd.read_csv(f'{PATH}/dataframe.csv')

3. Create two new columns called `brain_filepath` and `mask_filepath` containing the output filepaths needed for the `run_bet` function

The output filepaths should follow this pattern:
- the `t1w_filepath` "...ds000001/sub-XY/anat/sub-XY_T1w.nii.gz"

results in

- the `brain_filepath`: "...ds000001/derivatives/sub-XY/anat/sub-XY_deepbet-brain_T1w.nii.gz"
- the `mask_filepath`: "...ds000001/derivatives/sub-XY/anat/sub-XY_deepbet-mask_T1w.nii.gz"

In [None]:
df['brain_filepath'] = df.t1w_filepath.str.replace('ds000001', 'ds000001/derivatives')
df['brain_filepath'] = df.brain_filepath.str.replace('T1w', 'deepbet-brain_T1w')
df['mask_filepath'] = df.t1w_filepath.str.replace('ds000001', 'ds000001/derivatives')
df['mask_filepath'] = df.mask_filepath.str.replace('T1w', 'deepbet-mask_T1w')
df

| Unnamed: 0 | participant_id | sex | age | t1w_filepath | t1w_file_exists | brain_filepath | mask_filepath |
|---|---|---|---|---|---|---|---|
| 0 | sub-01 | F | 26 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 1 | sub-02 | M | 24 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 2 | sub-03 | F | 27 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 3 | sub-04 | F | 20 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 4 | sub-05 | M, | 22 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 5 | sub-06 | F | 26 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 6 | sub-07 | M | 24 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 7 | sub-08 | M | 21 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 8 | sub-09 | M | 26 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 9 | sub-10 | F | 21 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
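
`str.replace` substitutes the substring wherever it occurs in the path, so a parent directory whose name happened to contain `ds000001` or `T1w` would be rewritten as well. A stricter sketch that only touches the expected path components could look like this; `to_derivative_path` is a hypothetical helper, and the example paths are made up to follow the pattern above.

```python
from pathlib import PurePosixPath


def to_derivative_path(t1w_path: str, dataset: str, desc: str) -> str:
    """Map .../<dataset>/sub-XY/anat/sub-XY_T1w.nii.gz to
    .../<dataset>/derivatives/sub-XY/anat/sub-XY_<desc>_T1w.nii.gz (hypothetical helper)."""
    parts = list(PurePosixPath(t1w_path).parts)
    # Insert "derivatives" right after the first path component equal to the dataset name
    # (raises ValueError if the dataset folder is not part of the path)
    idx = parts.index(dataset)
    parts.insert(idx + 1, "derivatives")
    # Only rewrite the file-name suffix, not other occurrences of "T1w"
    suffix = "_T1w.nii.gz"
    name = parts[-1]
    if not name.endswith(suffix):
        raise ValueError(f"unexpected file name: {name}")
    parts[-1] = name[: -len(suffix)] + f"_{desc}" + suffix
    return str(PurePosixPath(*parts))


print(to_derivative_path("path/to/ds000001/sub-01/anat/sub-01_T1w.nii.gz", "ds000001", "deepbet-brain"))
# In the notebook, e.g.:
# df['brain_filepath'] = df.t1w_filepath.apply(lambda p: to_derivative_path(p, 'ds000001', 'deepbet-brain'))
```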


4. Save the DataFrame to "drive/MyDrive/openneuro/dataframe_after_deepbet.csv"

In [None]:
df.to_csv(f'{PATH}/dataframe_after_deepbet.csv', index=False)

5. Create the needed "...ds000001/derivatives/sub-XY/anat/" directories using `Path`

In [None]:
from pathlib import Path

for fpath in df.brain_filepath:
  Path(fpath).parent.mkdir(parents=True, exist_ok=True)

6. Run brain extraction via `run_bet` using the filepath columns from step 3.

In [None]:
from deepbet import run_bet

run_bet(df.t1w_filepath, df.brain_filepath, df.mask_filepath)

100%|██████████| 16/16 [00:21<00:00,  1.37s/it]
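
After `run_bet` finishes, it can be worth confirming that every expected output file was actually written before moving on. A minimal sketch; `missing_files` is a hypothetical helper, and in the notebook you would pass it the `brain_filepath` and `mask_filepath` columns.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the entries of `paths` that are not existing files (hypothetical helper)."""
    return [str(p) for p in paths if not Path(p).is_file()]


# In the notebook, e.g.:
# missing = missing_files(list(df.brain_filepath) + list(df.mask_filepath))
# print(f"{len(missing)} output files missing")
```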


## 2. Intensity Normalization

Image normalization is a non-trivial preprocessing step, and there is no established best way to apply it ([here](https://github.com/jcreinhold/intensity-normalization) is a detailed resource).

Again, for brevity, we only discuss two commonly used normalization techniques that are also applied to non-image data.

### 2.1 Min-max Normalization

Min-Max normalization scales the values of an array to the range [0, 1]:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
x_min = x.min()
x_max = x.max()
x_normalized = (x - x_min) / (x_max - x_min)
```

**Task 2.1:** What are the pros and cons of Min-Max normalization?

Pros
- fixed value range
- no loss of information

Cons

- sensitive to outliers, since a single extreme minimum or maximum determines the scaling of all other values
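
The outlier sensitivity is easy to see on a toy array: one extreme value stretches the range so that all other values are squeezed close to 0. The numbers below are made up for illustration.

```python
import numpy as np


def min_max_normalize(x: np.ndarray) -> np.ndarray:
    # Same formula as above: (x - min) / (max - min)
    return (x - x.min()) / (x.max() - x.min())


clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

print(min_max_normalize(clean))         # evenly spread over [0, 1]
print(min_max_normalize(with_outlier))  # the first four values all end up below 0.04
```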

### 2.2 Z-score Normalization

Z-score normalization, also known as standardization, scales the values of an array to have a mean of 0 and a standard deviation of 1

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
mu = x.mean()
std = x.std()
x_normalized = (x - mu) / std
```

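**Task 2.2:** What are the pros and cons of Z-score normalization?

Pros

- less sensitive to individual outliers than Min-Max normalization, since the mean and standard deviation depend on all values (though not insensitive: extreme values still shift the mean and inflate the standard deviation)
- no loss of information

Cons

- value range not fixed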
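
A short sketch on the same made-up toy arrays illustrates both points: the output has mean 0 and standard deviation 1 but no fixed range, and an outlier still affects the result. Note that `np.std` uses the population standard deviation (`ddof=0`) by default.

```python
import numpy as np


def z_score_normalize(x: np.ndarray) -> np.ndarray:
    # Same formula as above: (x - mean) / std, with the population std (ddof=0)
    return (x - x.mean()) / x.std()


clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

z_clean = z_score_normalize(clean)
z_outlier = z_score_normalize(with_outlier)

print(z_clean.mean(), z_clean.std())  # approximately 0 and 1
print(z_clean.min(), z_clean.max())   # about -1.41 and 1.41: the range depends on the data
print(z_outlier)                      # the outlier shifts mean/std; the other four values land near -0.5
```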


# Exercise 2
1. Load the DataFrame from Exercise 1

In [None]:
import pandas as pd

df = pd.read_csv('drive/MyDrive/openneuro/dataframe_after_deepbet.csv')

2. Add two columns `minmax_filepath` and `zscore_filepath` containing the output filepaths. These should be in the same folder as `df.brain_filepath` and be named "sub-XY_deepbet-brain-minmax_T1w.nii.gz" and "sub-XY_deepbet-brain-zscore_T1w.nii.gz", respectively.

In [None]:
# e.g. "sub-XY_deepbet-brain_T1w.nii.gz" -> "sub-XY_deepbet-brain-minmax_T1w.nii.gz"
df['minmax_filepath'] = df.brain_filepath.str.replace('deepbet-brain', 'deepbet-brain-minmax')
df['zscore_filepath'] = df.brain_filepath.str.replace('deepbet-brain', 'deepbet-brain-zscore')
df

| Unnamed: 0 | participant_id | sex | age | t1w_filepath | t1w_file_exists | brain_filepath | mask_filepath | minmax_filepath | zscore_filepath |
|---|---|---|---|---|---|---|---|---|---|
| 0 | sub-01 | F | 26 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 1 | sub-02 | M | 24 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 2 | sub-03 | F | 27 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 3 | sub-04 | F | 20 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 4 | sub-05 | M, | 22 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 5 | sub-06 | F | 26 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 6 | sub-07 | M | 24 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 7 | sub-08 | M | 21 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 8 | sub-09 | M | 26 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |
| 9 | sub-10 | F | 21 | /content/drive/MyDrive/openneuro/ds000001/sub-... | True | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... | /content/drive/MyDrive/openneuro/ds000001/deri... |


3. Save the DataFrame to "drive/MyDrive/openneuro/dataframe_after_preprocessing.csv"

In [None]:
df.to_csv('drive/MyDrive/openneuro/dataframe_after_preprocessing.csv', index=False)

4. For each NIfTI image in the `brain_filepath` column, apply Min-max and Z-score normalization and save the results to the corresponding `minmax_filepath` / `zscore_filepath` output *path*.

In [None]:
def min_max_normalize(x):
  x_min = x.min()
  x_max = x.max()
  return (x - x_min) / (x_max - x_min)

def z_score_normalize(x):
  mu = x.mean()
  std = x.std()
  return (x - mu) / std
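
Both functions divide by zero if every voxel of an image has the same value (for example, an empty output after a failed brain extraction). A guarded variant could look like this; the function names and the fallback of returning all zeros are our own assumptions, not part of the exercise.

```python
import numpy as np


def min_max_normalize_safe(x: np.ndarray) -> np.ndarray:
    """Min-max normalization that returns zeros for constant arrays (hypothetical variant)."""
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumption: a constant image carries no contrast, so map it to all zeros
        return np.zeros_like(x, dtype=float)
    return (x - x_min) / (x_max - x_min)


def z_score_normalize_safe(x: np.ndarray) -> np.ndarray:
    """Z-score normalization that returns zeros when the std is 0 (hypothetical variant)."""
    std = x.std()
    if std == 0:
        return np.zeros_like(x, dtype=float)
    return (x - x.mean()) / std
```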

In [None]:
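import nibabel as nib
import numpy as np

for in_fp, mm_fp, zs_fp in zip(df.brain_filepath, df.minmax_filepath, df.zscore_filepath):
  img = nib.load(in_fp)
  x = img.get_fdata()  # voxel intensities as float64
  x_minmax = min_max_normalize(x)
  x_zscore = z_score_normalize(x)
  # Reuse the input's affine and header so the outputs stay in the same space
  img_minmax = nib.Nifti1Image(x_minmax, img.affine, img.header)
  img_zscore = nib.Nifti1Image(x_zscore, img.affine, img.header)
  # Store as float32; otherwise the (possibly integer) dtype of the input header is kept
  img_minmax.set_data_dtype(np.float32)
  img_zscore.set_data_dtype(np.float32)
  img_minmax.to_filename(mm_fp)
  img_zscore.to_filename(zs_fp)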
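
After brain extraction, most voxels of a volume are typically background zeros, so the mean and standard deviation computed above are strongly influenced by the background rather than by brain tissue alone. A common variation (not what this exercise asks for) computes the statistics only inside the brain mask. A sketch, assuming the mask is available as a boolean array (e.g. via `nib.load(mask_fp).get_fdata() > 0`); `z_score_normalize_masked` is a hypothetical helper:

```python
import numpy as np


def z_score_normalize_masked(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Z-score x using mean/std of voxels inside `mask`; voxels outside stay 0 (hypothetical helper)."""
    mask = mask.astype(bool)
    brain_values = x[mask]
    mu = brain_values.mean()
    std = brain_values.std()
    out = np.zeros_like(x, dtype=float)
    # Normalize only brain voxels; the background remains 0 as after brain extraction
    out[mask] = (brain_values - mu) / std
    return out


# Toy example with made-up values: four background zeros plus four "brain" voxels
x = np.array([0.0, 0.0, 0.0, 0.0, 10.0, 20.0, 30.0, 40.0])
mask = x > 0
z = z_score_normalize_masked(x, mask)
print(z[mask].mean(), z[mask].std())  # approximately 0 and 1 within the brain
```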