# Replay in Aversive Environments - Localiser decoding

#### _This is a template that will be parameterised and run via [Papermill](http://papermill.readthedocs.io/) for each subject_

This notebook trains a classifier on the localiser data to identify the neural signature associated with each image in the task.

Classification steps:

1. Loading preprocessed data
2. Running classification over multiple trial timepoints to generate a decoding timecourse
3. Hyperparameter estimation
4. Producing a confusion matrix to assess classifier performance


## Imports

In [None]:
# import os
# os.chdir('..')
# %load_ext autoreload
# %autoreload 2

import sys
sys.path.insert(0, 'code')

from mne.io import read_raw_ctf
import mne
import matplotlib.pyplot as plt
from mne.preprocessing import ICA, create_eog_epochs
import numpy as np
import pandas as pd
import re
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.preprocessing import FunctionTransformer, scale
from sklearn.model_selection import RandomizedSearchCV, cross_val_predict
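try:
    import joblib  # standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23
except ImportError:
    from sklearn.externals import joblib  # fallback for older scikit-learn versions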
from scipy.stats import halfcauchy
from mne.decoding import (SlidingEstimator, cross_val_multiscore)
from mne.decoding import UnsupervisedSpatialFilter
import plotly
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import os
import papermill as pm
from utils import add_features, select_timepoints
from plotting import plot_confusion_matrix

%matplotlib inline

np.random.seed(100)

## Parameters

In [None]:
session_id = 'MG05572'  # ID of the scanning session
output_dir = 'data/derivatives'  # Where the output data should go
n_stim = 8  # Number of stimuli, including null
shifts = [-5, 6]  # Additional timepoints to use as features
n_iter_search = 100  # Number of iterations of the random search parameter optimisation procedure
cores = 1  # Number of cores to use for parallel processing
os.environ['OMP_NUM_THREADS'] = str(cores)

## Get data

In [None]:
localiser_epochs = mne.read_epochs(os.path.join(output_dir, 'preprocessing/localiser', 'sub-{0}_ses-01_task-AversiveLearningReplay_run-localiser_proc_ICA-epo.fif.gz').format(session_id))

### Plot the responses to image stimuli in sensor space

In [None]:
times = np.arange(0.06, 0.3, 0.02)
evoked = localiser_epochs.average()
evoked.plot_topomap(times, ch_type='mag')
evoked.plot_topomap(0.2, ch_type='mag', show_names=True, colorbar=False, size=3, res=128);

## Decoding analysis

### Timecourse of decoding accuracy

This gives us an idea of when in time we're able to decode stimulus identity. Accuracy should be around chance at stimulus onset (0s) and start rising shortly before 200ms.
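
As a rough illustration of what a decoding timecourse measures, the sketch below fits and scores a separate classifier at every timepoint of some synthetic epochs. Everything in it is an assumption made for illustration: the data are simulated, a simple nearest-centroid classifier stands in for the logistic regression used in this notebook, and a single hold-out split replaces the 5-fold cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "epochs": n_epochs x n_channels x n_times, with a class-specific
# spatial pattern that only appears from sample `onset` onwards.
n_classes, n_per_class, n_channels, n_times = 4, 40, 10, 20
onset = 10
y = np.repeat(np.arange(n_classes), n_per_class)
patterns = rng.normal(size=(n_classes, n_channels))
X = rng.normal(size=(y.size, n_channels, n_times))
X[:, :, onset:] += 2.0 * patterns[y][:, :, np.newaxis]

# Simple hold-out split (the notebook uses 5-fold CV instead)
order = rng.permutation(y.size)
train, test = order[: y.size // 2], order[y.size // 2:]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit class centroids on the training set and score on the test set."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test epoch to every centroid
    dists = ((X_test[:, np.newaxis, :] - centroids[np.newaxis]) ** 2).sum(axis=-1)
    return np.mean(classes[dists.argmin(axis=1)] == y_test)


# Fit and score a separate classifier at each timepoint
scores = np.array([
    nearest_centroid_accuracy(X[train, :, t], y[train], X[test, :, t], y[test])
    for t in range(n_times)
])

chance = 1.0 / n_classes
print("Pre-onset mean accuracy:  {0:.2f} (chance = {1:.2f})".format(scores[:onset].mean(), chance))
print("Post-onset mean accuracy: {0:.2f}".format(scores[onset:].mean()))
```

Before the simulated onset there is no class information in the data, so accuracy hovers around `1 / n_classes`. This is the same chance level drawn as the dashed line in the plot produced by this section.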

In [None]:
# Get epoch data
X_raw = localiser_epochs.get_data()  # All channels: n_epochs, n_channels, n_times (non-MEG channels are dropped below)
y_raw = localiser_epochs.events[:600, 2]  # Get event types

# select events and time period of interest
picks_meg = mne.pick_types(localiser_epochs.info, meg=True, ref_meg=False)
event_selector = (y_raw < 23) | (y_raw == 99)
X_raw = X_raw[event_selector, ...]
y_raw = y_raw[event_selector]
X_raw = X_raw[:, picks_meg, :]

print("Number of unique events = {0}\n\nEvent types = {1}".format(len(np.unique(y_raw)),
                                                                  np.unique(y_raw)))

# Do PCA with 50 components
pca = UnsupervisedSpatialFilter(PCA(50), average=False)
pca_data = pca.fit_transform(X_raw)

# CLASSIFIER
# Logistic regression with L2 penalty, multi-class classification performed with a multinomial loss
# Data is transformed to have zero mean and unit variance before being passed to the classifier
clf = make_pipeline(StandardScaler(), LogisticRegression(multi_class='multinomial', C=0.1, penalty='l2', class_weight="balanced",
                                                         solver='saga', max_iter=100000, tol=0.2))

# Try classifying at all time points with 5 fold CV
time_decod = SlidingEstimator(clf, n_jobs=1, scoring='accuracy')
scores = cross_val_multiscore(time_decod, pca_data, y_raw,
                              cv=5, n_jobs=1)

# Mean scores across cross-validation splits
mean_scores = np.mean(scores, axis=0)
best_idx = np.where(mean_scores == mean_scores.max())[0][0]

print("Best classification at index {0}, {1:.0f}ms".format(best_idx, localiser_epochs.times[best_idx] * 1000))

# Plot
fig, ax = plt.subplots(dpi=100)
# ax.plot(range(10), mean_scores, label='Score')
ax.axhline(1. / n_stim, color='#a8a8a8', linestyle='--', label='Chance')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Subset accuracy')
ax.axvline(.0, color='#515151', linestyle='-')
ax.set_title('Decoding accuracy')
ax.plot(localiser_epochs.times[:len(mean_scores)], mean_scores, label='Score')
ax.axvline(localiser_epochs.times[best_idx], color='#76b9e8', linestyle='--')

ax.legend()
plt.tight_layout()

### Plot the response timecourse across trials

This shows the response of each principal component, averaged across all trials.

In [None]:
ev = mne.EvokedArray(np.mean(pca_data, axis=0),
                     mne.create_info(50, localiser_epochs.info['sfreq']), tmin=localiser_epochs.tmin)
ev.plot(show=False, window_title="PCA", time_unit='s');

### Optimise hyperparameters using randomised search

We optimise the regularisation parameter (C) and the number of PCA components. Randomised search works like grid search, but rather than exhaustively evaluating a grid of predefined parameter values, it samples from specified parameter distributions. This is useful here because C values closer to 0 tend to work better, although this is not always the case. We therefore sample C from a half-Cauchy distribution so that low values are tested more frequently, without having to manually specify a grid with this property.
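
To get a feel for which C values the search will try, the sketch below draws samples from a half-Cauchy distribution with the same scale of 5 that is passed to the random search. It uses plain NumPy (the absolute value of a standard Cauchy draw, multiplied by the scale) rather than `scipy.stats.halfcauchy`. The sample size and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(100)
scale = 5.0

# |standard Cauchy| * scale is a half-Cauchy draw with the given scale
c_samples = scale * np.abs(rng.standard_cauchy(size=100_000))

print("Median C: {0:.2f}".format(np.median(c_samples)))
print("Fraction of draws below 5:  {0:.2f}".format(np.mean(c_samples < 5)))
print("Fraction of draws above 50: {0:.2f}".format(np.mean(c_samples > 50)))
```

The median of a half-Cauchy distribution equals its scale, so about half of the sampled C values fall below 5. The heavy tail means that much larger values are still tried occasionally.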

To make the process more streamlined, we create a classifier pipeline containing the following steps:
1. Spatial PCA (reducing dimensionality in the channel dimension)
2. Adding features from adjacent timepoints - although we're focusing on the 200ms mark, we add timepoints from before and after this point as additional features. This tends to boost decoding accuracy by ~10%.
3. Scaling the data to be in a standard range.
4. Logistic regression with lasso (L1) regularisation and one-vs-rest multi-class classification.
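
The way array shapes change through the first three steps can be sketched in plain NumPy. This is only an approximation under stated assumptions: the data are random, an SVD stands in for scikit-learn's `PCA`, and `flatten_timepoints` is a hypothetical stand-in for the `add_features` helper in `utils`, whose actual implementation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_channels, n_times, n_components = 60, 30, 11, 5
X = rng.normal(size=(n_epochs, n_channels, n_times))

# Step 1: spatial PCA. Treat every (epoch, timepoint) pair as one sample with
# n_channels features, then project onto the leading principal components.
samples = X.transpose(0, 2, 1).reshape(-1, n_channels)
mean = samples.mean(axis=0)
_, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
components = vt[:n_components]                      # (n_components, n_channels)
X_pca = ((samples - mean) @ components.T).reshape(n_epochs, n_times, n_components)
X_pca = X_pca.transpose(0, 2, 1)                    # (n_epochs, n_components, n_times)


# Step 2: hypothetical stand-in for `add_features` (assumption: the real helper
# in utils.py may differ). Each timepoint's components become extra columns.
def flatten_timepoints(data):
    return data.reshape(data.shape[0], -1)


X_flat = flatten_timepoints(X_pca)                  # (n_epochs, n_components * n_times)

# Step 3: z-score each feature column, as StandardScaler does
X_scaled = (X_flat - X_flat.mean(axis=0)) / X_flat.std(axis=0)

print(X.shape, "->", X_pca.shape, "->", X_flat.shape)
```

The classifier in step 4 then sees one row per epoch, with one column for every combination of component and timepoint.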

This is then iteratively run and evaluated with 3-fold cross validation across different hyperparameter settings.

All of this is performed on data from around the 200ms point, as this has been used successfully in previous studies. Interestingly, decoding from other timepoints doesn't seem to produce the kind of sequential replay we get when decoding from 200ms.
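
The sketch below shows how a centre sample and the `shifts` parameter translate into a time window. The epoch timing is an assumption rather than something stated in this notebook: a 100 Hz sampling rate and epochs starting 130ms before stimulus onset, chosen so that sample 33 falls at 200ms. In practice these values should be read from `localiser_epochs.info['sfreq']` and `localiser_epochs.times`.

```python
import numpy as np

# Assumed epoch timing (not stated in the notebook)
sfreq = 100.0      # samples per second
tmin = -0.13       # epoch start relative to stimulus onset, in seconds
n_times = 80       # arbitrary epoch length for illustration
times = tmin + np.arange(n_times) / sfreq

best_idx = 33
shifts = [-5, 6]   # same values as the notebook's `shifts` parameter

# The slice best_idx + shifts[0] : best_idx + shifts[1] excludes its end point
window = np.arange(best_idx + shifts[0], best_idx + shifts[1])

print("Centre sample {0} is at {1:.0f} ms".format(best_idx, times[best_idx] * 1000))
print("Window covers {0} samples, {1:.0f} to {2:.0f} ms".format(
    window.size, times[window[0]] * 1000, times[window[-1]] * 1000))
```

Because the end of a Python slice is exclusive, `shifts = [-5, 6]` gives a symmetric window of 11 samples: five before the centre sample, the centre itself, and five after.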


In [None]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.decomposition import PCA

# Set idx to 33 for 200ms
best_idx = 33

# Select data from timepoint of interest
X, y = (X_raw.copy(), y_raw.copy())
# X_null = X[..., :11]
X = X[..., best_idx + shifts[0]:best_idx + shifts[1]] 
# X = X[y!=99]
# y = y[y!=99]


# Create a pipeline that combines PCA, feature augmentation, scaling, and the logistic regression classifier
clf = make_pipeline(UnsupervisedSpatialFilter(PCA(50), average=False), 
                    FunctionTransformer(add_features, validate=False), StandardScaler(), 
                    LogisticRegression(multi_class='ovr', C=0.1, penalty='l1', class_weight="balanced",
                                                         solver='saga', max_iter=100000, tol=0.2))

# Parameter distributions passed to the random search procedure
param_dist = {"unsupervisedspatialfilter__estimator__n_components": range(30, 60),
              "logisticregression__C": halfcauchy(scale=5)}

# run randomized search
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search, cv=3, n_jobs=1, scoring='accuracy')
random_search.fit(X, y)

# Produce a dataframe of the search results
results = pd.DataFrame(random_search.cv_results_)

print("Parameter optimisation done")



Parameter optimisation done


In [None]:
# task_epochs = mne.read_epochs(os.path.join(output_dir, 'preprocessing/task', 'sub-{0}_ses-01_task-AversiveLearningReplay_run-task_proc_ICA-epo.fif.gz').format(session_id))
# planning_epochs = task_epochs['planning']
# rest_epochs = task_epochs['rest']

In [None]:
# picks_meg = mne.pick_types(task_epochs.info, meg=True, ref_meg=False)
# planning_X = planning_epochs.get_data()[:, picks_meg, :] # MEG signals: n_epochs, n_channels, n_times
# rest_X = rest_epochs.get_data()[:, picks_meg, :]

# X, y = (pca_data.copy(), y_raw.copy())
# X_null = np.zeros_like(X[..., :11])
# X = X[..., best_idx + -5:best_idx + 6] 

# rest_pca = pca.transform(rest_X)

# for i in range(rest_X.shape[0]):
#     for n, j in enumerate(np.random.randint(0, 589, 6)):
#         if (i * 6) + n < X_null.shape[0] - 1:
#             X_null[(i * 6) + n, ...] = rest_pca[i, :, j:j+11]
            
# X = np.vstack([X, X_null])
# y = np.hstack([y, [99] * X_null.shape[0]])

In [None]:
# from sklearn.tree import DecisionTreeClassifier
# from baggingPU import BaggingClassifierPU
# bc = BaggingClassifierPU(
#     LogisticRegression(solver='lbfgs', max_iter=1000, penalty='l2', C=0.1), n_estimators = 100, n_jobs = 1, 
#     max_samples = np.unique(y[y!=99], return_counts=True)[1].max()  # Each training sample will be balanced
# )
# from sklearn.multiclass import OneVsRestClassifier
# # clf = OneVsRestClassifier(bc)
# clf = make_pipeline(FunctionTransformer(add_features, validate=False), RobustScaler(), 
#                     OneVsRestClassifier(bc))

In [None]:
# clf.fit(X, y)

Show the five best-scoring hyperparameter settings from the optimisation procedure.

In [None]:
results.sort_values('mean_test_score', ascending=False).head()

### Plot the results of hyperparameter optimisation

We can plot the results of the randomised search on a 3D mesh, with the two optimised parameters on the X and Y axes and accuracy on the Z axis. This is produced using [plotly](http://plot.ly/).

In [None]:
# init_notebook_mode(connected=True)

# trace = go.Mesh3d(x=results.param_logisticregression__C,
#                   y=results.param_unsupervisedspatialfilter__estimator__n_components,
#                   z=results.mean_test_score, 
#                   color='#275fb5', opacity=0.20)

# layout = go.Layout(
#     title='Hyperparameter optimisation results',
#     autosize=True,
#     width=700,
#     height=700,
#     scene = dict(
#     xaxis = dict(
#         title='Logistic regression C'),
#     yaxis = dict(
#         title='PCA N components'),
#     zaxis = dict(
#         title='Mean accuracy'),)
# )

# fig = go.Figure(data=[trace], layout=layout)
# iplot(fig)

### Make confusion matrix with 5-fold CV

The confusion matrix gives us an idea of whether any individual stimuli are being poorly decoded.
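
As a small worked example of reading a normalised confusion matrix, the sketch below builds one from made-up labels for three stimuli (purely hypothetical values) using NumPy only. Rows are true labels and columns are predicted labels, so dividing each row by its total puts the proportion of correctly decoded trials for each stimulus on the diagonal.

```python
import numpy as np

# Hypothetical labels for three stimuli (illustration only)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 2, 2])

n_classes = 3
counts = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(counts, (y_true, y_pred), 1)   # rows: true label, columns: predicted label

# Divide each row by its total so that every row sums to 1.
# keepdims=True keeps the row sums as a column vector, so the division is row-wise.
row_normalised = counts / counts.sum(axis=1, keepdims=True)

print(counts)
print(np.round(row_normalised, 2))
print("Per-stimulus recall:", np.round(np.diag(row_normalised), 2))
```

A stimulus that is poorly decoded shows up as a low value on the diagonal, and the off-diagonal entries in its row show which other stimuli it is being mistaken for.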

In [None]:
clf.set_params(**random_search.best_params_)

# Get predictions with 5 fold CV
y_pred = cross_val_predict(clf, X, y, cv=5)
y_pred_proba = cross_val_predict(clf, X, y, cv=5, method='predict_proba')
mean_conf_mat = confusion_matrix(y, y_pred)
mean_accuracy = accuracy_score(y, y_pred)
mean_conf_mat = mean_conf_mat.astype('float') / mean_conf_mat.sum(axis=1, keepdims=True)  # normalise each row (true class) to sum to 1

print("Mean accuracy = {0}".format(mean_accuracy))
    
# Plot mean confusion matrix
plot_confusion_matrix(mean_conf_mat, title='Normalised confusion matrix, accuracy = {0}'.format(np.round(mean_accuracy, 2)))

# # Save things
# np.save(os.path.join(output_dir, 'localiser_classifier_performance', 'y', 'sub-{0}_localiser_y').format(session_id), y)
# np.save(os.path.join(output_dir, 'localiser_classifier_performance', 'predictions', 'sub-{0}_localiser_predictions').format(session_id), y)
# np.save(os.path.join(output_dir, 'localiser_classifier_performance', 'predictions_prob', 'sub-{0}_localiser_predictions_prob').format(session_id), y)

### Save components of the analysis for later use

First save the classifier that was fit to all the localiser data using the best hyperparameter values.

In [None]:
joblib.dump(random_search.best_estimator_ , os.path.join(output_dir, 'classifier', 'sub-{0}_classifier.pkl').format(session_id))
# joblib.dump(pca, os.path.join(output_dir, 'classifier', 'sub-{0}_pca.pkl').format(session_id)) 

We can use papermill to save certain details, such as the mean accuracy, in the notebook so that we can read them later on.

In [None]:
pm.record("mean_accuracy", mean_accuracy)
pm.record('best_C', random_search.best_params_['logisticregression__C'])
# pm.record('best_n_components', random_search.best_params_['unsupervisedspatialfilter__estimator__n_components'])