# Searchlight analysis: think outside the box

## This searchlight analysis uses the non-smoothed (`NS`) betas

Recall that we think of each $\beta$ as a point in high-dimensional voxel space. We have to choose the voxels on which to run the MVPA, for two main reasons:

1. **Cognitive neuroscience assumes the brain is made up of functional regions.** On this modular view of the brain, the relevant unit of analysis is a local region rather than the whole brain.
2. **Avoid overfitting.** From a statistical point of view, we prefer to have as few dimensions as possible. Including only the most relevant voxels reduces the noise in our dataset.


<img src="http://drive.google.com/uc?export=view&id=1andZMeSCqfIQSfr7QwIRoYfHP0z7dGTD" style="height:150px"/>

## Searchlight analysis

from ([Joset et al., 2013](https://linkinghub.elsevier.com/retrieve/pii/S1053811913002917))

- a type of Multivoxel Pattern Analysis (MVPA), sometimes referred to as *information mapping*.
- a *searchlight* is a spatial moving window (kernel) that exhaustively searches the brain to localise representations. Searchlight analysis (SA) produces maps by measuring the information (read: variation in signal activity) in small spheres around each voxel.
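
The moving-window idea can be sketched in plain NumPy. This is a toy illustration, not how `brainiak` implements it: it uses a cube instead of a sphere, skips centers whose cube would leave the volume, and the names `toy_searchlight`, `volume`, and `info_map` are hypothetical.

```python
import numpy as np

def toy_searchlight(volume, mask, rad, kernel):
    """Apply `kernel` to the cube of half-width `rad` around every mask voxel."""
    out = np.full(mask.shape, np.nan)
    shape = np.array(mask.shape)
    for x, y, z in zip(*np.nonzero(mask)):
        lo = np.array([x, y, z]) - rad
        hi = np.array([x, y, z]) + rad + 1
        # Simplification: skip centers whose cube would extend beyond the volume
        if (lo < 0).any() or (hi > shape).any():
            continue
        cube = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
        out[x, y, z] = kernel(cube)
    return out

rng = np.random.default_rng(0)
volume = rng.normal(size=(6, 6, 6, 4))   # toy 4D data: x, y, z, conditions
mask = np.zeros((6, 6, 6), dtype=bool)
mask[3, 3, 3] = True                     # a single center voxel

# Use the signal variance inside the window as a stand-in "information" measure
info_map = toy_searchlight(volume, mask, rad=1, kernel=np.var)
```

Each center voxel receives one number summarising its neighbourhood, so the result is a map with the same spatial shape as the mask.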


> We want to perform searchlight analyses with different ROIs such as:

[from (Shuck et al., 2016)]()

- PFC
- 

[(Balaguer et al., 2016)]()

- ...


<font color=red> Finish this and plot the t-statistics like before, see Jiajia's work [here](https://github.com/tomov/VGDL-fMRI-Python-Data-Analysis/blob/master/fMRI_analysis_jiajia/RSA_Searchlight.ipynb) </font>



## Searchlight workflow

- The two main things that determine the speed of a searchlight: the **kernel algorithm** and the **amount of parallelization**.
- Rule of thumb: start your searchlight analysis small!

### Steps

1. Create a mask of one voxel and run the searchlight interactively to check whether the code works.
2. Use timestamps to extract the execution time.
3. Print the number of voxels that are passed to the searchlight function.
4. Run the searchlight as a job on the smallest unit of real data you have (a single run or single participant).
5. Check the runtime and memory usage of this searchlight (e.g. on slurm: `sacct -j $JID --format=jobid,maxvmsize,elapsed`).
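
Steps 1 to 3 can be sketched as follows. This is a minimal illustration under assumptions: `timed` and `dummy_searchlight` are hypothetical helpers, and the dummy function only stands in for a real searchlight call. The mask shape and voxel index match the one-voxel mask used later in this notebook.

```python
import time
import numpy as np

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Step 1: a mask containing a single center voxel
mask = np.zeros((79, 95, 79))
mask[42, 28, 26] = 1

# Step 3: report how many center voxels the searchlight will visit
n_centers = int(mask.sum())
print(f"Number of searchlight centers: {n_centers}")

def dummy_searchlight(mask):
    """Hypothetical stand-in for the real searchlight call."""
    return np.where(mask == 1, 1.0, np.nan)

# Step 2: time the call
result, elapsed = timed(dummy_searchlight, mask)
print(f"Elapsed: {elapsed:.4f} s")
```

Once the one-voxel run works, the measured time gives a rough per-voxel cost to extrapolate from before submitting a larger job.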

In [34]:
import h5py
import warnings
import sys 
if not sys.warnoptions:
    warnings.simplefilter("ignore")
# Import libraries
import nibabel as nib
import numpy as np
import os 
import time
from nilearn import plotting
from brainiak.searchlight.searchlight import Searchlight
from brainiak.fcma.preprocessing import prepare_searchlight_mvpa_data
from brainiak import io
from pathlib import Path
from shutil import copyfile
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d 
import seaborn as sns 
import pandas as pd
from importlib import reload 

# Import machine learning libraries
from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score
from sklearn.svm import SVC
import scipy.stats  # import the submodule explicitly; scipy.stats is used in the kernel

# import own functions
import utils
reload(utils)

%autosave 30
%matplotlib inline
sns.set(style = 'white', context='talk', font_scale=1, rc={"lines.linewidth": 2})

Autosaving every 30 seconds


# Load the data

In [4]:
# specify local path
path = '/Users/Daphne/data/'

# load betas 
all_bold_vol = np.load(path+'bold_data_levels_NS.npy')

In [16]:
# mask_nii is the functional mask, this selects the brain voxels
mask_nii_NS = nib.load(os.path.join(path, 'mask_nii_NS.nii')) 

# the mask from the mask .mat files, use this one to get the coordinates
mask_mat_NS = np.load(path+'mask_mat_NS.npy')

coords_mat = np.array(np.where(mask_mat_NS == 1)) # so need one set of voxel coordinates for all
coords_mat[[0, 2]] = coords_mat[[2, 0]] # exchange the rows

# the anatomical/structural image that we plot our mask on (sometimes called brain_nii)
mean_nii = nib.load(os.path.join(path, 'mean.nii')) 

In [19]:
brain_mask = np.array(mask_nii_NS.dataobj)
affine_mat = mask_nii_NS.affine
dimsize = mask_nii_NS.header.get_zooms()

In [18]:
coords_mat.shape # dimensions by voxel number

(3, 179595)

# 1. Prepare the data and set SA parameters 

- We place the betas in a 4D array indexed as `[x, y, z, row]`, whose values are the voxel intensities. This takes us from the 3D array `all_bold_vol` to one 4D volume per subject.

TODO: sanity check going from 3D --> 4D

In [21]:
coords = tuple(coords_mat) # take the coordinates, make them tuple

# ==== Make a mask of a voxel of choice (ROI) ===
small_mask = np.zeros(brain_mask.shape)
small_mask[42, 28, 26] = 1
print(np.where(small_mask))

Nrows = 54 # levels data has 54 rows (TRs)

(array([42]), array([28]), array([26]))


In [22]:
vol4D = mask_nii_NS.shape+(Nrows,)

# For 8 subjects
all_bold = []
for sub_id in range(8):
    isc_vol = np.zeros(vol4D)
    bold_vol = all_bold_vol[:,:,sub_id]
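    for i in range(6):
        for j in range(len(coords[0])):
            # translate to 4D space in order to perform searchlight
            # i = level (row of the betas)
            # j = voxel
            # index order of the 4D volume: (x, y, z, level)
            # note: only rows 0-5 are filled; the rest of the Nrows-long last axis stays zero
            isc_vol[(coords[0][j], coords[1][j], coords[2][j], i)] = bold_vol[i][j]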
    all_bold.append(isc_vol)
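
One way to approach the 3D to 4D sanity check is a round trip on toy data: scatter a rows-by-voxels array into a 4D volume, read it back with the same coordinates, and confirm nothing changed. The sketch below is an assumption-laden illustration, not the notebook's data: `toy_mask`, `bold_2d`, and `vol4d` are hypothetical, and it fills one row at a time with NumPy fancy indexing instead of looping over voxels. On the real data, the same coordinates (after the axis swap) would have to be used for both the fill and the read-back.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mask and its voxel coordinates (one array per spatial axis)
toy_mask = rng.random((4, 5, 3)) > 0.5
coords = np.nonzero(toy_mask)
n_rows, n_vox = 6, coords[0].size

# Toy betas for one subject: rows x voxels
bold_2d = rng.normal(size=(n_rows, n_vox))

# Scatter into 4D: (x, y, z, row)
vol4d = np.zeros(toy_mask.shape + (n_rows,))
for i in range(n_rows):
    vol4d[coords[0], coords[1], coords[2], i] = bold_2d[i]

# Round trip: reading the masked voxels back should reproduce the input
recovered = vol4d[coords].T
print("Round trip matches:", np.array_equal(recovered, bold_2d))
print("Voxels outside the mask are zero:", bool(np.all(vol4d[~toy_mask] == 0)))
```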

## Searchlight parameters

1. **data** = The brain data as a 4D volume.

2. **mask** = A binary mask specifying the "center" voxels in the brain around which you want to perform searchlight analyses. A searchlight will be drawn around every voxel with the value of 1. Note that if you use the whole-brain mask as the searchlight mask, a searchlight may include voxels outside the mask when its "center" voxel lies at the mask's border. It is up to you to decide whether to include these results.

3. **bcvar** = An additional variable (a list, numpy array, dictionary, etc.) that you want to use in your searchlight kernel. For instance, you might want the condition labels so that you can determine to which condition each 3D volume corresponds. If you don't need to broadcast anything, e.g., when doing RSA, set this to `None`.

4. **sl_rad** = The size of the searchlight's radius in voxels (*not* mm), excluding the center voxel. This means the total volume of the searchlight, if using a cube, is `((2 * sl_rad) + 1) ^ 3` voxels.

5. **max_blk_edge** = When the searchlight function carves the data up into chunks, it doesn't distribute only a single searchlight's worth of data. Instead, it creates a block of data, with the edge length specified by this variable, which determines the number of searchlights to run within a job.

6. **pool_size** = Maximum number of cores running on a block (typically 1).
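
To get a feel for how quickly the searchlight grows with `sl_rad`, the cube formula above can be evaluated directly. The helper name `searchlight_cube_voxels` is hypothetical; the formula is the one given for **sl_rad**.

```python
def searchlight_cube_voxels(sl_rad):
    """Number of voxels in a cubic searchlight of radius sl_rad (in voxels)."""
    edge = 2 * sl_rad + 1      # radius on both sides plus the center voxel
    return edge ** 3

for sl_rad in (1, 2, 4):
    print(f"sl_rad={sl_rad}: {searchlight_cube_voxels(sl_rad)} voxels")
```

A radius of 4 already gives 729 voxels per searchlight, which is worth keeping in mind when judging runtime and the risk of overfitting.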



<font color=red> Use the following radii: r = 2.6 voxels (4 mm); 4 voxels (6 mm); 6.6 voxels (10 mm) </font>

In [28]:
# Preset the variables
data = all_bold
mask = small_mask
bcvar = None
sl_rad = 4
max_blk_edge = 5
pool_size = 1

# 2. Execute searchlight window

In [29]:
# Start the clock to time searchlight
begin_time = time.time()

# Create the searchlight object
sl = Searchlight(sl_rad=sl_rad, max_blk_edge=max_blk_edge)
print("Setup searchlight inputs")
print("Number of subjects: " + str(len(data)))
print("Input data shape: " + str(data[0].shape))
print("Input mask shape: " + str(mask.shape) + "\n")

# Distribute the information to the searchlights (preparing it to run)
sl.distribute(data, mask)
# Data that is needed by all searchlights is sent to all cores via the sl.broadcast function.
# Here bcvar is None, so nothing extra (e.g. condition labels) is broadcast.
sl.broadcast(bcvar)

Setup searchlight inputs
Number of subjects: 8
Input data shape: (79, 95, 79, 54)
Input mask shape: (79, 95, 79)



<font color=red> What is the bcvar and how should I use it??? </font>
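
As described under the searchlight parameters, `bcvar` is an extra object handed unchanged to every kernel call. A minimal sketch of one possible use in an RSA setting: pass a model RDM as `bcvar` and correlate it with the neural RDM inside each searchlight. The kernel signature `(data, sl_mask, myrad, bcvar)` follows the commented-out kernel in this notebook. Everything else is an assumption for illustration: the name `rdm_kernel`, the toy data, and the use of a Pearson correlation (to stay NumPy-only) where a Spearman correlation might be preferred.

```python
import numpy as np

def rdm_kernel(data, sl_mask, myrad, bcvar):
    """Hypothetical kernel: mean correlation between neural and model RDMs."""
    model_rdm = bcvar                          # (n_conditions, n_conditions)
    n_cond = data[0].shape[3]
    lower = np.tril_indices(n_cond, k=-1)      # lower triangle, no diagonal
    voxels = sl_mask.astype(bool)
    rhos = []
    for vol in data:                           # one 4D volume per subject
        patterns = vol[voxels].T               # (n_conditions, n_voxels)
        neural_rdm = 1 - np.corrcoef(patterns)
        rhos.append(np.corrcoef(neural_rdm[lower], model_rdm[lower])[0, 1])
    return float(np.mean(rhos))

# Toy check: one subject, one 3x3x3 searchlight, 4 conditions
rng = np.random.default_rng(0)
vol = rng.normal(size=(3, 3, 3, 4))
sl_mask = np.ones((3, 3, 3))
# Use the subject's own RDM as the "model", so the correlation should be 1
model_rdm = 1 - np.corrcoef(vol.reshape(-1, 4).T)
rho = rdm_kernel([vol], sl_mask, 1, model_rdm)
print(rho)
```

With the objects from this notebook, the model RDM would be sent with `sl.broadcast(model_rdm)` and the kernel run with `sl.run_searchlight(rdm_kernel, pool_size=pool_size)`.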

In [35]:
# # Set up the kernel, RDM analysis
# def rdm_all(data, sl_mask, myrad, bcvar):
#     all_rho = []
#     behavior_RDM = None
#     # Loop over subject: 
#     for idx in range(len(data)):
#         data4D = data[idx]
#         bolddata_sl = data4D.reshape(sl_mask.shape[0] * sl_mask.shape[1] * sl_mask.shape[2], data[0].shape[3]).T
#         neural_RDM = np.tril(1-np.corrcoef(bolddata_sl), -1)
#         neural_RDM = neural_RDM.ravel()
#         neural_RDM = neural_RDM[neural_RDM != 0]
#         # TODO get partial corr
#         subject_spearman = scipy.stats.spearmanr(neural_RDM, behavior_RDM,axis=None)

#         # print("bbb",behavior_RDM.append(neural_RDM))
#         # df = pd.DataFrame(data=behavior_RDM.append(neural_RDM), index=['visual','goal','rule','difficulty','neural']).transpose()
#         # subject_partial_spearman = pg.partial_corr(data=df, x='neural', y='rule', covar=['visual', 'goal', 'difficulty'],
#         #         method='spearman').round(6)
#         all_rho.append(subject_spearman.correlation)
#     tstats,p = scipy.stats.ttest_1samp(np.arctanh(all_rho), popmean=0)
#     return (tstats, p)

# # Execute searchlight on 8 subjects
# print("Begin Searchlight\n")
# sl_result_allsubj = sl.run_searchlight(rdm_all, pool_size=pool_size)
# print("End Searchlight\n")
# print(sl_result_allsubj[mask==1])

## Save data 

np.save(path+'diffi_pvalues_all', all_pvalues)

np.save(path+'diffi_tstats_all', all_rho)

## Quicklinks & resources 

[brainiak searchlight tutorial](https://brainiak.org/tutorials/07-searchlight/)

[brainiak searchlight package](https://brainiak.org/docs/brainiak.searchlight.html#module-brainiak.searchlight)

### From `nilearn`

- [Searchlight with nilearn](http://nilearn.github.io/auto_examples/02_decoding/plot_haxby_searchlight.html#sphx-glr-auto-examples-02-decoding-plot-haxby-searchlight-py)