# Setup and workflow

## Main goals

The BOLD signal contains noise. Here, we compute inter-subject correlations (ISC) to reduce that noise and estimate task-relevant signals. We want to find brain regions that show the same pattern of activity across subjects. The key predictions are that, in the theory-encoding region, the ISC should:

1. Be highest for same levels, medium for same games, and lowest for random (shuffled) games
2. Increase over levels of the same game

---

## Workflow

1. Load the data in the notebook (we are dealing with `.mat` files). **Remark.** Note that the files differ (glm1 != glm24).
2. Explore the fMRI data (what the structure looks like, etc.)
    - preprocessing
3. Do ISC analysis; see (Chen et al., 2017) and [Brainiak ISC tutorial](https://brainiak.org/tutorials/10-isc/)
4. Do searchlight analysis

[Brainiak ISC analysis documentation](https://brainiak.org/docs/brainiak.html#module-brainiak.isc)

[Brainiak specific examples](https://github.com/brainiak/brainiak/tree/master/examples)

In [115]:
import brainiak

In [92]:
import h5py
import warnings
import sys 
if not sys.warnoptions:
    warnings.simplefilter("ignore")
import os 
import glob
import time
from copy import deepcopy
import numpy as np
import pandas as pd 

from nilearn import datasets
from nilearn import surface
from nilearn import plotting
from nilearn.input_data import NiftiMasker, NiftiLabelsMasker
import nibabel as nib

from brainiak import image, io
#from brainiak.isc import isc, isfc, permutation_isc
import matplotlib.pyplot as plt
import seaborn as sns 

%autosave 5
%matplotlib inline
sns.set(style = 'white', context='talk', font_scale=1, rc={"lines.linewidth": 2})

Autosaving every 5 seconds


# 0. Loading in the data

In [93]:
data_dir = '/Users/Daphne/Desktop/beta_series/' # local directory
os.path.exists(data_dir) 

True

From [this Stack Overflow answer](https://stackoverflow.com/questions/874461/read-mat-files-in-python) and the [SciPy documentation](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.loadmat.html):
```
Neither scipy.io.savemat, nor scipy.io.loadmat work for MATLAB arrays version 7.3. But the good part is that MATLAB version 7.3 files are hdf5 datasets. So they can be read using a number of tools, including NumPy.
```

[h5py documentation](http://docs.h5py.org/en/stable/quick.html#core-concepts)

We want to access the following variables in the dataset:
- `B = [blocks, voxels]`:  the average activity in each block
- `names = [18,1]` : the names of the games
- `Vmask = [1,1]` : can be used to convert mask to standardised brain coordinates
- `mask = [79, 95, 79]` : binary mask 
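
The `names` variable probably needs decoding before it is readable. The sketch below assumes that `names` is a MATLAB cell array of strings, which v7.3 files typically store as HDF5 object references pointing to arrays of 16-bit character codes. That layout is an assumption about these files, not something verified here. The helper names and the example string `level_1` are hypothetical.

```python
import numpy as np


def decode_matlab_string(codes):
    """Turn an array of character codes (a MATLAB char array) into a str."""
    return "".join(chr(int(c)) for c in np.asarray(codes).ravel())


def decode_names(h5file, key="names"):
    """Dereference every entry of a cell array of strings.

    Assumes h5file is an open h5py.File and that each entry of
    h5file[key] is an HDF5 object reference (hypothetical layout).
    """
    refs = np.asarray(h5file[key]).ravel()
    return [decode_matlab_string(h5file[ref][()]) for ref in refs]


# Stand-in for one dereferenced entry: the codes of a made-up name,
# stored as a column vector as MATLAB char arrays usually appear in HDF5.
example_codes = np.array([[ord(ch)] for ch in "level_1"], dtype=np.uint16)
decoded = decode_matlab_string(example_codes)
```

With a real file, this would be called as `decode_names(subject)` while the file is still open.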

In [94]:
# specify filename
filename = 'beta_series_glm1_subj1_nosmooth.mat'

In [95]:
# each subject is stored in a separate file, so we open one file per subject
subject = h5py.File(data_dir+filename,'r')

list(subject.keys()) # these are the variables in the data

['#refs#', 'B', 'Vmask', 'mask', 'names']

In [100]:
# now we read in all the variables of interest for a given subject
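# (dataset[()] replaces the .value attribute, which was removed in h5py 3.0)
B = subject['B'][()]

mask = subject['mask'][()]
names = subject['names'][()]
Vmask = subject['Vmask']  # HDF5 group, kept as a handle into the open file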

# # alternatively, using dictionary syntax..
# B = subject['B']
# B[0]

B

array([[ 13.67853355,   9.34046936,   8.19104671, ...,  13.98070717,
          1.051754  ,   4.48804474],
       [ 19.71709633,  14.8930378 ,  18.61097145, ...,   8.22801685,
         -2.43646121,   3.49442911],
       [  1.22297895,   3.4824307 ,   3.93117332, ...,   8.7571125 ,
          5.50793028,   3.66078162],
       ...,
       [ -9.61049461, -38.04869461, -41.12371445, ...,  22.4010582 ,
         17.33374023, -16.49692345],
       [-14.54338264,  -4.19539356,  -0.50012857, ...,  -3.23373842,
         -7.53230715, -12.56788921],
       [-20.06606102,  -7.664783  ,  -4.73619652, ...,  26.66522789,
         39.77856445,  40.31108856]])

In [111]:
# get all the data for GLM1 (subjects 1-8)
num_subjects = 8

B_data = []
mask_data = []
Vmask_data = []
names_data = []

for i in range(num_subjects):
    idx = i+1
    
    # build the filename for subject idx (data_dir is defined above)
    filename = f'beta_series_glm1_subj{idx}_nosmooth.mat'
    
    subject = h5py.File(data_dir+filename,'r') 
    print(f'Save data for subject {idx}')
    # load the data for the respective subject
    # (dataset[()] replaces the .value attribute, which was removed in h5py 3.0)
    B = subject['B'][()]
    mask = subject['mask'][()]
    names = subject['names'][()]
    Vmask = subject['Vmask']  # HDF5 group, kept as a handle into the open file
    
    # append to lists
    B_data.append(B)
    mask_data.append(mask)
    Vmask_data.append(Vmask)
    names_data.append(names)

Save data for subject 1
Save data for subject 2
Save data for subject 3
Save data for subject 4
Save data for subject 5
Save data for subject 6
Save data for subject 7
Save data for subject 8


Check data shapes (sanity check)

In [114]:
B_data[0].shape

(179595, 18)

In [117]:
mask_data[0].shape

(79, 95, 79)

In [119]:
Vmask_data[0]

<HDF5 group "/Vmask" (8 members)>

In [120]:
names_data[0].shape

(1, 18)
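
Note that `B_data[0]` has shape `(179595, 18)`, i.e. voxels first, although `B` is described as `[blocks, voxels]`. This is expected: h5py reports the dimensions of MATLAB v7.3 arrays in reverse order. BrainIAK's `isc` is documented to accept an array of shape `(n_TRs, n_voxels, n_subjects)`, so the per-subject arrays need to be transposed and stacked first. The sketch below shows one way to do this on toy data; `stack_subjects` is a hypothetical helper, and it assumes every subject has the same number of voxels and blocks.

```python
import numpy as np


def stack_subjects(per_subject, voxels_first=True):
    """Stack per-subject 2-D arrays into (n_samples, n_voxels, n_subjects).

    per_subject  : list of arrays, one per subject
    voxels_first : True if each array is (n_voxels, n_samples), as h5py
                   returns it for a MATLAB [samples, voxels] matrix
    """
    arrays = [np.asarray(a).T if voxels_first else np.asarray(a)
              for a in per_subject]
    shapes = {a.shape for a in arrays}
    if len(shapes) != 1:
        raise ValueError(f"subjects have different shapes: {sorted(shapes)}")
    return np.stack(arrays, axis=-1)


# Toy stand-in for B_data: 8 subjects, 50 voxels, 18 blocks
rng = np.random.default_rng(0)
toy_B_data = [rng.normal(size=(50, 18)) for _ in range(8)]
stacked = stack_subjects(toy_B_data)
print(stacked.shape)
```

On the real data the call would be `stack_subjects(B_data)`. The explicit shape check is there because stacking silently requires identical voxel counts across subjects.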

We stick to the recommended sequence of steps for running ISC using BrainIAK:
1. **Data preparation**. Create a whole-brain mask. The outcome of this step is an array of anatomically aligned and temporally aligned brain data.
2. **Compute ISC**. The ISC function computes correlations across subjects for corresponding voxels in the mask. It uses the `compute_correlation` function in BrainIAK, which is optimized for fast execution (and was used in FCMA).
3. **Permutation test for ISC**. Perform a statistical analysis to determine which ISC values are significant.
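
To make step 2 concrete, here is a minimal NumPy sketch of leave-one-out ISC: each subject's series is correlated, voxel by voxel, with the average of all other subjects. It illustrates the idea only and is not BrainIAK's implementation; the function name and the toy data are made up for this example.

```python
import numpy as np


def leave_one_out_isc(data):
    """Leave-one-out ISC.

    data : array of shape (n_samples, n_voxels, n_subjects)
    Returns an array of shape (n_subjects, n_voxels) with Pearson r values.
    Voxels with zero variance would give NaN; they are not handled here.
    """
    data = np.asarray(data, dtype=float)
    n_samples, n_voxels, n_subjects = data.shape
    iscs = np.empty((n_subjects, n_voxels))
    for s in range(n_subjects):
        left_out = data[:, :, s]
        others = np.delete(data, s, axis=2).mean(axis=2)
        # z-score over samples, so the mean of the products is Pearson r
        a = (left_out - left_out.mean(axis=0)) / left_out.std(axis=0)
        b = (others - others.mean(axis=0)) / others.std(axis=0)
        iscs[s] = (a * b).mean(axis=0)
    return iscs


# Toy data: 18 samples, 2 voxels, 8 subjects.
# Voxel 0 carries a signal shared by all subjects, voxel 1 is pure noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=18)
toy = rng.normal(size=(18, 2, 8))
toy[:, 0, :] += 3 * shared[:, None]

iscs = leave_one_out_isc(toy)
mean_isc = iscs.mean(axis=0)  # average over subjects, one value per voxel
```

In the toy data the voxel with the shared signal should come out with a clearly higher mean ISC than the noise voxel, which is the kind of contrast the analysis looks for.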


# 1. Data preparation


In [121]:
"""load brain template"""

# Load the brain mask
brain_mask = io.load_boolean_mask(mask_name)

# Get the list of nonzero voxel coordinates
coords = np.where(brain_mask)

# Load the brain nii image
brain_nii = nib.load(mask_name)

NameError: name 'mask_name' is not defined
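
The cell above fails because `mask_name` (a path to a mask image) is never defined. As a possible workaround, and assuming the `mask` array stored in each `.mat` file is the intended brain mask, the boolean mask and voxel coordinates can be derived directly from that array. The sketch below uses a small toy array; `mask_to_coords` is a hypothetical helper. It does not build a NIfTI image, since that would need an affine (presumably from `Vmask`).

```python
import numpy as np


def mask_to_coords(mask_array):
    """Return a boolean mask and the coordinates of its nonzero voxels.

    mask_array : 3-D array where nonzero entries mark brain voxels.
    Axis order follows the array as loaded; with h5py this may be the
    reverse of the MATLAB order.
    """
    brain_mask = np.asarray(mask_array) != 0
    coords = np.where(brain_mask)  # tuple of three index arrays
    return brain_mask, coords


# Toy stand-in for mask_data[0]
toy_mask = np.zeros((4, 5, 4))
toy_mask[1:3, 2:4, 1] = 1

brain_mask, coords = mask_to_coords(toy_mask)
n_voxels = int(brain_mask.sum())
```

On the real data this would be `mask_to_coords(mask_data[0])`. A useful sanity check is to compare `n_voxels` with the voxel dimension of `B`.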