# Preprocessing pipeline


This pipeline aims to serve as a semiautomatic and reproducible framework for preprocessing EEG signals prior to time-frequency-based analyses. It minimizes the manual steps required to clean the data based on visual inspection. Nevertheless, it is advised to inspect the cleaned epochs visually before writing the final preprocessed file.


## Outline

1. Temporal filtering  
High-frequency artefacts and slow drifts are removed with a zero-phase bandpass filter using MNE-Python [1]. The cutoff frequencies (0.5 - 45 Hz) can be modified in the configuration file (config.py) in the utils folder.

2. Create epochs  
Epochs are non-overlapping data segments with a duration of 1 second, created from the continuous data. The epoch length can be changed in the configuration file.
Epochs can be created (1) from events, using a custom method that creates epochs based on annotations in the raw data, or (2) without events, in which case data segments are created from the beginning of the raw data.

3. Outlier data rejection  
3.1. Preliminary rejection  
Epochs are rejected based on a global threshold on the z-score (> 3) of the epoch variance and amplitude range.

3.2. ICA decomposition  
The default method is the infomax algorithm; it can be changed in the configuration file, along with the number of components and the decimation parameter. Components containing blink artefacts are automatically marked with MNE-Python.
The ICA sources can be visualized, and components can be interactively selected and rejected based on their topographies, time courses or frequency spectra. The number of components removed from the data is documented in the "description" field of the epochs instance "info" structure.

3.3. Autoreject  
Autoreject [2, 3] uses unsupervised learning to estimate the rejection threshold for the epochs. To reduce computation time, which increases with the number of segments and channels, autoreject can be fitted on a representative subset of epochs (25% of total epochs). Once the parameters are learned, the solution can be applied to any data that contains the channels that were used during the fit.

4. Outlier channel interpolation  
The Random Sample Consensus (RANSAC) algorithm [4] selects a random subsample of good channels to predict each channel in small, non-overlapping, 4-second-long time windows. It uses the method of spherical splines (Perrin et al., 1989) to interpolate the bad sensors. The interpolated sensors are added to the "description" field of the epochs "info" structure.


## References

[1] A. Gramfort, M. Luessi, E. Larson, D. Engemann, D. Strohmeier, C. Brodbeck, R. Goj, M. Jas, T. Brooks, L. Parkkonen, M. Hämäläinen, MEG and EEG data analysis with MNE-Python, Frontiers in Neuroscience, Volume 7, 2013, ISSN 1662-453X

[2] Mainak Jas, Denis Engemann, Federico Raimondo, Yousra Bekhti, and Alexandre Gramfort, “Automated rejection and repair of bad trials in MEG/EEG.” In 6th International Workshop on Pattern Recognition in Neuroimaging (PRNI), 2016.

[3] Mainak Jas, Denis Engemann, Yousra Bekhti, Federico Raimondo, and Alexandre Gramfort. 2017. “Autoreject: Automated artifact rejection for MEG and EEG data”. NeuroImage, 159, 417-429.

[4] Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K. M., & Robbins, K. A. (2015). The PREP pipeline: standardized preprocessing for large-scale EEG analysis. Frontiers in neuroinformatics, 9, 16.



## Import packages


```%matplotlib qt``` is the recommended backend for interactive visualization (can be slower);    

switch to ```%matplotlib inline``` for (faster) static plots

In [None]:
import os
from pathlib import Path
from ipyfilechooser import FileChooser

from eeg_preprocessing.preprocessing import *
from eeg_preprocessing.utils.events import get_events_from_raw, create_epochs_from_events
from eeg_preprocessing.utils.io_raw import read_raw

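# numpy is used below (autoreject stage 2); import it explicitly
# rather than relying on the wildcard import above
import numpy as np
from matplotlib import pyplot as plt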
%matplotlib qt

## Load raw data

EEG data can be imported using the custom read_raw() method, which accepts the BrainVision (.vhdr) and EDF (.edf) formats. To import other file formats, this custom method can be replaced with the corresponding MNE functions.

See the [MNE documentation on reading EEG data](https://mne.tools/stable/auto_tutorials/io/20_reading_eeg_data.html) for help with importing data.  

In [6]:
# Set base path to EEG data
base_path = '/Volumes/crnl-memo-hd/TMS_rewiring/Raw_data'

# Use the widget to navigate to the experiment folder path and select an EEG file 
fc = FileChooser(base_path)
fc.filter_pattern = ['*.vhdr', '*.edf']

display(fc)

In [7]:
# Load selected file
raw = read_raw(raw_file_path=fc.selected, add_info=True)
print(raw.info)

Extracting parameters from /Volumes/crnl-memo-hd/TMS_rewiring/Raw_data/15_L/Day1/EEG/15_L_Day1.vhdr...
Setting channel info structure...
<Info | 12 non-empty values
 bads: []
 ch_names: Fp1, Fz, F3, F7, FT9, FC5, FC1, C3, T7, TP9, CP5, CP1, Pz, P3, ...
 chs: 64 EEG
 condition: L
 custom_ref_applied: False
 dig: 64 items (64 EEG)
 fid: 15_L_Day1
 highpass: 0.0 Hz
 lowpass: 1000.0 Hz
 meas_date: 2020-10-26 09:34:10 UTC
 nchan: 64
 num_day: 1
 projs: []
 sfreq: 500.0 Hz
 subject: 15
>


## Event processing

In [8]:
# Extract triggers from raw instance
events = get_events_from_raw(raw)

resting_event_names = events.loc[events['event'].str.contains('rs_'), 'event'].tolist()
asrt_event_names = events.loc[events['event'].str.contains('asrt_'), 'event'].tolist()

In [9]:
# Show the first 20 events
events.head(20)

| | start_time | event_id | event | sequence | end_time | duration |
|---|---|---|---|---|---|---|
| 805 | 1167.746 | 91 | asrt_1_1 | A | 1282.248 | 114.502 |
| 1068 | 1324.152 | 91 | asrt_1_1 | A | 1438.654 | 114.502 |
| 1331 | 1477.654 | 91 | asrt_1_1 | A | 1592.106 | 114.452 |
| 1595 | 1633.542 | 91 | asrt_1_1 | A | 1748.062 | 114.52 |
| 1860 | 1790.782 | 91 | asrt_1_1 | A | 1905.468 | 114.686 |
| 2128 | 2092.35 | 93 | asrt_1_2 | A | 2206.82 | 114.47 |
| 2391 | 2256.246 | 93 | asrt_1_2 | A | 2370.648 | 114.402 |
| 2652 | 2416.14 | 93 | asrt_1_2 | A | 2530.558 | 114.418 |
| 2915 | 2579.734 | 93 | asrt_1_2 | A | 2694.236 | 114.502 |
| 3179 | 2740.828 | 93 | asrt_1_2 | A | 2855.346 | 114.518 |


## Cut raw data and create epochs based on triggers

### Create epochs

- bandpass filter the continuous data (0.5 - 45 Hz)
- create fixed length epochs (1 second)
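
To make the second step concrete, here is a minimal NumPy sketch of what cutting continuous data into fixed-length, non-overlapping segments means. It is only an illustration: the helper name `make_fixed_length_segments` and the toy data are assumptions, and the pipeline's own `create_epochs_from_events()` (whose internals are not shown in this notebook) may work differently.

```python
import numpy as np


def make_fixed_length_segments(data, sfreq, duration=1.0):
    """Split (n_channels, n_samples) data into non-overlapping segments.

    Returns an array of shape (n_segments, n_channels, samples_per_segment).
    Samples left over at the end of the recording are discarded.
    """
    samples_per_segment = int(round(duration * sfreq))
    n_channels, n_samples = data.shape
    n_segments = n_samples // samples_per_segment
    trimmed = data[:, :n_segments * samples_per_segment]
    segments = trimmed.reshape(n_channels, n_segments, samples_per_segment)
    return segments.transpose(1, 0, 2)


# Toy example (hypothetical data): 4 channels, 10.5 s sampled at 500 Hz
rng = np.random.default_rng(0)
toy_data = rng.standard_normal((4, 5250))
segments = make_fixed_length_segments(toy_data, sfreq=500.0, duration=1.0)
print(segments.shape)  # (10, 4, 500)
```

The last half second of the toy recording does not fill a complete segment and is therefore dropped.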

In [None]:
epochs = create_epochs_from_events(raw=raw, events=events)

In [None]:
epochs['asrt_2_1'].plot()

## Run preprocessing


### 1.1. Preliminary epoch rejection
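The outline describes this step as a global z-score threshold (> 3) on the epoch variance and amplitude range. The sketch below illustrates that idea with plain NumPy. It is not the code behind `prepare_epochs_for_ica()`: the helper name `find_outlier_epochs`, the averaging across channels, and the use of the absolute z-score are all assumptions made for the example.

```python
import numpy as np


def find_outlier_epochs(epochs_data, threshold=3.0):
    """Return a boolean mask of epochs that are z-score outliers.

    epochs_data has shape (n_epochs, n_channels, n_times).
    """
    # One summary value per epoch (assumption: average across channels)
    variance = epochs_data.var(axis=2).mean(axis=1)
    amplitude_range = np.ptp(epochs_data, axis=2).mean(axis=1)

    def zscore(values):
        return (values - values.mean()) / values.std()

    # Assumption: outliers in either direction are flagged
    return ((np.abs(zscore(variance)) > threshold)
            | (np.abs(zscore(amplitude_range)) > threshold))


# Toy example (hypothetical data): 50 epochs, one of them very noisy
rng = np.random.default_rng(0)
toy_epochs = rng.standard_normal((50, 4, 500))
toy_epochs[7] *= 20
bad_mask = find_outlier_epochs(toy_epochs)
print(np.flatnonzero(bad_mask).tolist())  # [7]
```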

In [None]:
epochs_faster = prepare_epochs_for_ica(epochs=epochs)

### 1.2. Run ICA

ICA is run on the resting and ASRT periods together; this takes a few minutes.
By default, 32 ICA components are estimated with the [infomax](https://mne.tools/stable/generated/mne.preprocessing.infomax.html) algorithm. 

When visualizing the components, it is recommended to subset the data (see below).

In [None]:
ica = run_ica(epochs=epochs_faster)

In [None]:
# Visualize components on epochs
# Subset epochs to reduce execution time
subset = [asrt_event_names[1]]
# Exclude components by selecting them; right-click on a component name to visualize its source:
ica.plot_sources(epochs_faster[subset], start=0, stop=10)

In [None]:
# After selecting the components to exclude, apply ICA to epochs
# Document the number of excluded components
ica.apply(epochs_faster)
epochs_faster.info['description'] = f'n_components: {len(ica.exclude)}'

### 1.3. Visualize ICA cleaned epochs (optional)

This step can be repeated after each preprocessing step, or you can also do a final inspection at the end. 

In [None]:
epochs_faster

In [None]:
epochs_faster['asrt_2_2'].plot(n_epochs=10, scalings={'eeg': 20e-6}, title=raw.info['fid'])

In [None]:
# Optional

# If you found a component that should have been excluded but was not, you can exclude it here:
ica.plot_sources(epochs_faster['rs_3_1'], start=0, stop=10)

In [None]:
# Optional

# After selecting the components to exclude, apply ICA to epochs
# Document the number of excluded components
ica.apply(epochs_faster)
epochs_faster.info['description'] = f'n_components: {len(ica.exclude)}'

### 1.4. Save cleaned epochs (recommended)

In [None]:

# Create folder for preprocessed and interim files
folder_name = 'preprocessed'
interim_path = os.path.join(base_path, folder_name)

# Create path to epoch files
interim_epochs_path = os.path.join(interim_path, raw.info['condition'], 'epochs')
os.makedirs(interim_epochs_path, exist_ok=True)

# Save ICA cleaned epochs 
fid = epochs_faster.info['fid']
epochs_clean_fname = f'{fid}_ICA'
postfix = '-epo.fif.gz'
epochs_faster.save(os.path.join(interim_epochs_path, f'{epochs_clean_fname}{postfix}'), overwrite=True)

### 2.1. Run autoreject
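The outline mentions that autoreject can be fitted on a representative subset of epochs (25% of the total) to save computation time. How `run_autoreject()` selects that subset is not shown in this notebook; the sketch below simply assumes uniform random sampling without replacement, and the helper name `pick_random_subset` is hypothetical.

```python
import numpy as np


def pick_random_subset(n_epochs, fraction=0.25, seed=42):
    """Return sorted indices of a random subset holding `fraction` of the epochs."""
    rng = np.random.default_rng(seed)
    n_subset = max(1, int(round(n_epochs * fraction)))
    return np.sort(rng.choice(n_epochs, size=n_subset, replace=False))


# Toy example: 1200 epochs, 25% of them are selected
subset_idx = pick_random_subset(n_epochs=1200)
print(len(subset_idx))  # 300
```

Under this assumption, the learned thresholds would come from a fit on `epochs_faster[subset_idx]` and could then be applied to the full set of epochs.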

In [None]:
ar = run_autoreject(epochs_faster, n_jobs=11, subset=False)

In [None]:
# Drop bad epochs (stage 1)

reject_log = ar.get_reject_log(epochs_faster)

epochs_autoreject = epochs_faster.copy().drop(reject_log.bad_epochs, reason='AUTOREJECT')

In [None]:
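# Drop bad epochs (stage 2) - after visual inspection
# Flag epochs in which more than half of the channels are not labelled good
many_bad_channels = np.count_nonzero(reject_log.labels, axis=1) > epochs_faster.info['nchan'] / 2
idx = np.where(many_bad_channels)[0].tolist()

# Plot just the flagged epochs
if idx:
    epochs_faster[idx].plot(n_epochs=10,
                            scalings={'eeg': 20e-6},
                            n_channels=32)

In [None]:
# idx refers to positions in epochs_faster; translate it to positions in
# epochs_autoreject, from which the stage 1 epochs have already been removed
kept_after_stage1 = ~np.asarray(reject_log.bad_epochs, dtype=bool)
idx_remaining = np.where(many_bad_channels[kept_after_stage1])[0]
epochs_autoreject.drop(idx_remaining, reason='AUTOREJECT')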

In [None]:
epochs_autoreject['asrt_2_1'].plot(n_epochs=10)

In [None]:
# save clean epochs
fid = epochs_autoreject.info['fid']
epochs_clean_fname = f'{fid}_ICA_autoreject'
postfix = '-epo.fif.gz'
epochs_autoreject.save(os.path.join(interim_epochs_path, f'{epochs_clean_fname}{postfix}'), overwrite=True)

### 3. Run RANSAC

In [None]:
epochs_ransac = run_ransac(epochs_autoreject)

In [None]:
# inspect which sensors were interpolated (if any)
epochs_ransac.info

### 4. Final visual inspection

Mark epochs that should be dropped, select electrodes that should be interpolated etc.

In [None]:
epochs_ransac

In [None]:
epochs_ransac.plot(n_epochs=10,
                       n_channels=32,
                       # group_by='position',
                       scalings={'eeg': 20e-6})

In [None]:
# if there are additional channels marked for interpolation, we can interpolate them here.

if epochs_ransac.info['bads']:
    bads_str = ', '.join(epochs_ransac.info['bads'])
    epochs_ransac.interpolate_bads()
    epochs_ransac.info.update(description=epochs_ransac.info['description'] + ', interpolated: ' + bads_str)

### 5. Set average reference
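Called without arguments, MNE's `set_eeg_reference()` uses the average of the EEG channels as the reference. As a minimal illustration of what that means numerically, the NumPy sketch below subtracts the across-channel mean from every sample; the helper name `average_reference` and the toy data are assumptions for the example only.

```python
import numpy as np


def average_reference(data):
    """Re-reference (n_channels, n_times) data to the mean across channels."""
    return data - data.mean(axis=0, keepdims=True)


# Toy example (hypothetical data): 64 channels sharing a common offset
rng = np.random.default_rng(1)
toy = rng.standard_normal((64, 1000)) + 5.0
referenced = average_reference(toy)
print(np.allclose(referenced.mean(axis=0), 0.0))  # True
```

After re-referencing, the channels sum to zero at every time point, so any signal common to all channels (such as the offset above) is removed.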

In [None]:
epochs_ransac.set_eeg_reference()

### 6. Annotate continuous data
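The cell below converts the drop log of the cleaned epochs into annotation onsets on the continuous recording. The following plain-Python sketch shows the same bookkeeping on toy values. It assumes one drop-log entry per original epoch, with an empty entry meaning the epoch was kept; the helper name `bad_segment_onsets` is hypothetical.

```python
def bad_segment_onsets(event_samples, drop_log, sfreq):
    """Return onsets (in seconds) of epochs whose drop-log entry is non-empty.

    event_samples[i] is the first sample of the i-th original epoch and
    drop_log[i] holds the reasons it was dropped (empty if it was kept).
    """
    return [sample / sfreq
            for sample, reasons in zip(event_samples, drop_log)
            if reasons]


# Toy example: five 1-second epochs at 500 Hz, two of them dropped
event_samples = [0, 500, 1000, 1500, 2000]
drop_log = ((), ('AUTOREJECT',), (), ('USER',), ())
onsets = bad_segment_onsets(event_samples, drop_log, sfreq=500.0)
duration = (event_samples[1] - event_samples[0]) / 500.0
print(onsets, duration)  # [1.0, 3.0] 1.0
```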


In [None]:
start_times = [epochs.events[idx][0] / raw.info['sfreq'] 
               for idx, value in enumerate(epochs_ransac.drop_log) if value]

duration = (epochs_ransac.events[1][0] - epochs_ransac.events[0][0]) / raw.info['sfreq'] 

raw.annotations.append(onset=start_times,
                       duration=[duration] * len(start_times),
                       description='BAD_auto')

In [None]:
# Create path to annotated files
annotated_raw_path = os.path.join(interim_path, raw.info['condition'], 'raw')
if not os.path.exists(annotated_raw_path):
    os.makedirs(annotated_raw_path)

# Save annotated continuous data
fid = raw.info["fid"]
raw_annotated_fname = f'{fid}_bad_annotated'
postfix = '-raw.fif.gz'
raw.save(os.path.join(annotated_raw_path, f'{raw_annotated_fname}{postfix}'), overwrite=True)

### 7. Save cleaned epochs

#### 7.1. Resting period before ASRT

In [None]:
# Create path to resting-state epoch files
epochs_rs_path = os.path.join(interim_path, raw.info['condition'], 'epochs_rs')
if not os.path.exists(epochs_rs_path):
    os.makedirs(epochs_rs_path)

rs_period_name = f'rs_{raw.info["num_day"]}_1'
fid = f'{raw.info["subject"]}_{raw.info["condition"]}_{rs_period_name}'
epochs_clean_fname = f'{fid}_ICA_autoreject_ransac'
postfix = '-epo.fif.gz'

epochs_ransac[rs_period_name].save(os.path.join(epochs_rs_path, f'{epochs_clean_fname}{postfix}'), overwrite=True)

#### 7.2. Resting period after ASRT

In [None]:
rs_period_name = f'rs_{raw.info["num_day"]}_2'
fid = f'{raw.info["subject"]}_{raw.info["condition"]}_{rs_period_name}'
epochs_clean_fname = f'{fid}_ICA_autoreject_ransac'
postfix = '-epo.fif.gz'

epochs_ransac[rs_period_name].save(os.path.join(epochs_rs_path,
                                                f'{epochs_clean_fname}{postfix}'), overwrite=True)

#### 7.3. ASRT

In [None]:
# Create path to ASRT epoch files
epochs_asrt_path = os.path.join(interim_path, raw.info['condition'], 'epochs_asrt')
if not os.path.exists(epochs_asrt_path):
    os.makedirs(epochs_asrt_path)

In [None]:
for sequence, periods in events.groupby('sequence')['event'].apply(set).to_dict().items():
    #epochs_to_merge = [epochs_ransac[period] for period in periods]
    #merged_epochs = mne.concatenate_epochs(epochs_to_merge, offset=True)
    fid = f'{raw.info["subject"]}_{raw.info["condition"]}_asrt_{raw.info["num_day"]}_{sequence}'
    epochs_clean_fname = f'{fid}_ICA_autoreject_ransac'
    postfix = '-epo.fif.gz'
    
    epochs_ransac[sorted(set(periods))].save(os.path.join(epochs_asrt_path,
                                                          f'{epochs_clean_fname}{postfix}'), overwrite=True)

In [None]:
epochs_ransac

In [None]:
# cleanup from memory
del raw, epochs, epochs_autoreject, epochs_ransac

plt.close('all')