# Preprocessing pipeline tutorial

## Outline

<img src="static/preprocessing_pipeline_diagram.svg">

1. __Temporal filtering__

High-frequency artefacts and slow drifts are removed with a zero-phase bandpass filter
using MNE-Python [1].

2. __Segmenting the data__

Epochs are non-overlapping data segments of a given duration, created from the
continuous data.
Epochs can be created (1) from events, using a custom method that creates epochs
based on annotations in the raw data, or (2) without events, in which case data
segments are created from the beginning of the raw data.

3. __Outlier data rejection__  

- _Preliminary rejection_

Epochs are rejected based on a global threshold on the z-score (> 3) of the epoch 
variance and amplitude range.

- _ICA decomposition_  

The default method is the infomax algorithm; it can be changed in the
configuration file, along with the number of components and the decimation parameter.
Components containing blink artefacts are automatically marked with MNE-Python.
The ICA sources can be visualized and interactively selected and rejected based on
their topographies, time courses, or frequency spectra.

- _Autoreject_  

Autoreject [2, 3] uses unsupervised learning to estimate the rejection threshold for
the epochs. Because computation time increases with the number of segments and
channels, autoreject can be fitted on a representative subset of epochs
(25% of total epochs). Once the parameters are learned, the solution can be applied to
any data that contains the channels used during the fit.

4. __Outlier channel interpolation__

The Random Sample Consensus (RANSAC) algorithm [4] selects a random subsample of good
channels to predict each channel in short, non-overlapping 4-second time windows.
It uses spherical spline interpolation (Perrin et al., 1989) to
interpolate the bad sensors.
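
The preliminary rejection step in the outline can be illustrated with a small NumPy sketch. It is not the pipeline's implementation: the array layout `(n_epochs, n_channels, n_times)`, the averaging of each statistic across channels, and the helper name `find_outlier_epochs` are assumptions made for illustration. Only the z-score threshold of 3 on epoch variance and amplitude range comes from the description above.

```python
import numpy as np


def find_outlier_epochs(data, threshold=3.0):
    """Return indices of epochs whose variance or amplitude range is an outlier.

    data: array of shape (n_epochs, n_channels, n_times) (assumed layout).
    """
    # Per-epoch statistics, averaged across channels (an assumption)
    variance = data.var(axis=2).mean(axis=1)
    amp_range = np.ptp(data, axis=2).mean(axis=1)

    def zscore(x):
        # Standardize across epochs
        return (x - x.mean()) / x.std()

    bad = (zscore(variance) > threshold) | (zscore(amp_range) > threshold)
    return np.flatnonzero(bad)


rng = np.random.default_rng(0)
data = rng.standard_normal((100, 8, 200))  # synthetic epochs
data[42] *= 10.0                           # simulate one noisy epoch
bad_epochs = find_outlier_epochs(data)
print(bad_epochs)
```

In this synthetic example only the artificially amplified epoch (index 42) exceeds the threshold.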


#### References

[1] A. Gramfort, M. Luessi, E. Larson, D. Engemann, D. Strohmeier, C. Brodbeck, R. Goj, M. Jas, T. Brooks, L. Parkkonen, M. Hämäläinen, MEG and EEG data analysis with MNE-Python, Frontiers in Neuroscience, Volume 7, 2013, ISSN 1662-453X

[2] Mainak Jas, Denis Engemann, Federico Raimondo, Yousra Bekhti, and Alexandre Gramfort, “Automated rejection and repair of bad trials in MEG/EEG.” In 6th International Workshop on Pattern Recognition in Neuroimaging (PRNI), 2016.

[3] Mainak Jas, Denis Engemann, Yousra Bekhti, Federico Raimondo, and Alexandre Gramfort. 2017. “Autoreject: Automated artifact rejection for MEG and EEG data”. NeuroImage, 159, 417-429.

[4] Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K. M., & Robbins, K. A. (2015). The PREP pipeline: standardized preprocessing for large-scale EEG analysis. Frontiers in neuroinformatics, 9, 16.



## Import packages


```%matplotlib qt``` is the recommended backend for interactive visualization (can be slower); switch to ```%matplotlib inline``` for faster but static plots.

In [19]:
import os
from pathlib import Path
from ipyfilechooser import FileChooser

import numpy as np
import pandas as pd
from meeg_tools.preprocessing import *
from meeg_tools.utils.epochs import create_epochs_from_events
from meeg_tools.utils.raw import read_raw_measurement, filter_raw
from meeg_tools.utils.log import update_log

%matplotlib qt

## Load raw data


See [this](https://mne.tools/stable/auto_tutorials/io/20_reading_eeg_data.html) documentation for help with supported file formats.  


In [8]:
# Use the widget to navigate to the experiment folder path and select an EEG file 
base_path = 'data/'
fc = FileChooser(base_path)
fc.filter_pattern = ['*.vhdr', '*.edf']

display(fc)


In [10]:
# Load selected file
read_raw_measurement?

## Select condition

Preprocessed files are currently saved in nested subfolders of `base_path`: a folder
named "preprocessed", and inside it a folder named after the condition (e.g. "epochs_asrt", "epochs_rs").

In [11]:
condition = 'epochs_asrt'


# Name of the folder for preprocessed and interim files
folder_name = 'preprocessed'
# Path to the epoch files of the selected condition
epochs_path = os.path.join(base_path, folder_name, condition)

# Create the folder (and any missing parents) if it does not exist yet
os.makedirs(epochs_path, exist_ok=True)

print(epochs_path)

data/preprocessed/epochs_asrt


## Temporal filtering

We apply a bandpass filter to the continuous data using the `filter_raw` function.

The default parameters can be checked with `settings['bandpass_filter']`.

In [14]:
settings['bandpass_filter']

{'low_freq': 0.5, 'high_freq': 45}

In [12]:
filter_raw?
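
To make "zero-phase bandpass" concrete, here is a minimal NumPy sketch on synthetic data. It swaps in a brick-wall FFT mask, which is zero-phase by construction but is not how MNE-Python filters (MNE designs proper FIR or IIR filters). In MNE-Python the corresponding call on a `Raw` object is `raw.filter(l_freq=0.5, h_freq=45)`; that `filter_raw` wraps this call is an assumption. The helper name `fft_bandpass`, the sampling rate, and the test signal are also assumptions.

```python
import numpy as np


def fft_bandpass(signal, sfreq, low_freq, high_freq):
    """Zero-phase bandpass: zero all FFT bins outside [low_freq, high_freq]."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    keep = (freqs >= low_freq) & (freqs <= high_freq)
    return np.fft.irfft(spectrum * keep, n=signal.size)


sfreq = 250.0                                 # assumed sampling rate (Hz)
t = np.arange(2500) / sfreq                   # 10 s of data
rhythm = np.sin(2 * np.pi * 10 * t)           # 10 Hz activity to keep
drift = 5 * np.sin(2 * np.pi * 0.1 * t)       # slow drift (below 0.5 Hz)
noise = 0.5 * np.sin(2 * np.pi * 100 * t)     # high-frequency artefact (above 45 Hz)

filtered = fft_bandpass(rhythm + drift + noise, sfreq, low_freq=0.5, high_freq=45)
max_error = np.max(np.abs(filtered - rhythm))
print(max_error)
```

Because the mask is real-valued, the phases of the retained components are untouched, so the 10 Hz rhythm is recovered without a time shift.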

## Create epochs

### Create epochs for event-related analysis

We create epochs from __selected__ events (stimuli) in the data.

Each epoch spans the window from `start_time` to `end_time` (in seconds) relative to
the stimulus onset; both values are set in `settings['epochs']`.

In [16]:
settings['epochs']

{'start_time': 0.0, 'end_time': 1.0, 'duration': 1}

In [22]:
settings['epochs']['start_time'] = -0.250
settings['epochs']['end_time'] = 0.750
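
The effect of `start_time` and `end_time` can be sketched without MNE: each epoch is the slice of the continuous recording from `start_time` to `end_time` around an event onset. The helper `cut_epochs`, the sampling rate, and the onsets below are hypothetical. The end sample is included here, an assumption chosen to mirror MNE-Python's convention, and events too close to the edges of the recording are skipped.

```python
import numpy as np


def cut_epochs(data, onsets, sfreq, start_time, end_time):
    """Cut fixed windows around event onsets from data of shape (n_channels, n_times)."""
    first = int(round(start_time * sfreq))  # offset of the first sample
    last = int(round(end_time * sfreq))     # offset of the last sample (inclusive)
    segments = []
    for onset in onsets:
        lo, hi = onset + first, onset + last + 1
        if lo < 0 or hi > data.shape[1]:
            continue  # window would run past the recording: skip this event
        segments.append(data[:, lo:hi])
    return np.stack(segments)


sfreq = 100.0                                            # hypothetical sampling rate (Hz)
data = np.arange(2 * 1000, dtype=float).reshape(2, 1000)  # 2 channels, 10 s
onsets = [10, 200, 500, 990]                             # event onsets in samples
epochs_array = cut_epochs(data, onsets, sfreq, start_time=-0.250, end_time=0.750)
print(epochs_array.shape)
```

With these values each epoch starts 25 samples before the onset, so the onset itself sits at index 25 of every epoch.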

In [None]:
from mne import events_from_annotations
events, _ = events_from_annotations(raw)

In [None]:
events_ids = np.concatenate([np.arange(10, 53, 1), 
                             np.arange(10, 53, 1) + 100,
                            [211, 212, 213, 214, 215, 216]]) # boundaries of epochs
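
`events_from_annotations` returns an array with one row per event and three columns (sample index, previous value, event id). Keeping only the events listed in `events_ids` can be sketched with `np.isin`. The example rows are made up, and whether `create_epochs_from_events` filters events this way internally is an assumption.

```python
import numpy as np

# Hypothetical events array in MNE's layout: (sample, previous value, event id)
example_events = np.array([
    [100, 0, 10],
    [350, 0, 99],
    [600, 0, 152],
    [900, 0, 211],
    [1200, 0, 60],
])

# Same id list as in the tutorial cell
selected_ids = np.concatenate([np.arange(10, 53, 1),
                               np.arange(10, 53, 1) + 100,
                               [211, 212, 213, 214, 215, 216]])

# Keep rows whose event id (third column) is one of the selected ids
mask = np.isin(example_events[:, 2], selected_ids)
selected_events = example_events[mask]
print(selected_events)
```

The ids 99 and 60 are not in the list, so the corresponding rows are dropped.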

In [None]:
create_epochs_from_events?

## Create metadata for epochs (optional)

- adding metadata makes it easier to select epochs of different types
- custom triggers are selected from the raw instance
- metadata can be added or replaced later (e.g. after preprocessing)

In [24]:
create_metadata?

# We have to assign it to the epochs instance to take effect
# epochs.metadata = metadata

Object `create_metadata` not found.


In [None]:
# Subselect epochs
# Here we could also include trills, repetitions, or practice stimuli.
# ICA should not run on duplicate data (epochs should not overlap!)

# epochs = epochs["triplet == 'L' | triplet == 'H'"]
# epochs = epochs["answer == 'correct'"]
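
The commented selections above are pandas query strings that MNE evaluates against `epochs.metadata`. The same strings can be tried on a plain DataFrame first, which is a cheap way to check a selection before applying it to the epochs. The rows below, including the label 'X' for any other triplet type, are hypothetical.

```python
import pandas as pd

# Hypothetical metadata: one row per epoch
metadata = pd.DataFrame({
    "triplet": ["H", "L", "X", "X", "H"],
    "answer": ["correct", "incorrect", "correct", "correct", "correct"],
})

# Same query strings as in the commented cell
by_triplet = metadata.query("triplet == 'L' | triplet == 'H'")
by_answer = by_triplet.query("answer == 'correct'")

# Row labels of the epochs that survive both selections
kept_rows = by_answer.index.tolist()
print(kept_rows)
```

Chaining the two queries mirrors indexing the epochs twice, once per condition.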