## Preprocess `.mat` files and reorder the betas, then save


### Load data in notebook
We are working with `.mat` files. **Remark.** There are three types of data: `glm1`, `glm24`, and `glm25`.

The repetition time (TR) differs for each `glm`. For example, the TR of `glm1` is one block, so 3 × 60 s = 180 s.

- The blocks (`glm1`); TR = 180 s
- The levels (`glm25`); TR = 60 s
- The 10-second boxcars (`glm24`); TR = 10 s
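
As a quick sanity check on these durations, the sketch below computes how many regressors of each length fit into one 180 s block and into all 18 blocks of a subject. The names `DURATIONS_S` and `regressors_per_block` are made up for illustration. The totals for `glm1` (18) and `glm25` (54) match the array shapes that appear later in this notebook; the `glm24` figure is only the arithmetic expectation and has not been checked against the data.

```python
# Durations (in seconds) of one regressor in each GLM, taken from the list above.
DURATIONS_S = {"glm1": 180, "glm25": 60, "glm24": 10}

BLOCK_LENGTH_S = 180   # one block = 3 levels x 60 s
NUM_BLOCKS = 18        # blocks per subject, as seen in the glm1 betas


def regressors_per_block(duration_s: int) -> int:
    """Number of back-to-back regressors of this length that fit in one block."""
    return BLOCK_LENGTH_S // duration_s


expected_counts = {
    glm: regressors_per_block(d) * NUM_BLOCKS for glm, d in DURATIONS_S.items()
}
print(expected_counts)
```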
    
    
From [this Stack Overflow answer](https://stackoverflow.com/questions/874461/read-mat-files-in-python) and the [SciPy documentation](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.loadmat.html):
```
Neither scipy.io.savemat, nor scipy.io.loadmat work for MATLAB arrays version 7.3. But the good part is that MATLAB version 7.3 files are hdf5 datasets. So they can be read using a number of tools, including NumPy.
```
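
One practical consequence of reading a v7.3 file as HDF5: MATLAB stores arrays in column-major order, so a matrix generally comes back from `h5py` with its axes reversed relative to MATLAB. The NumPy-only sketch below imitates that effect on a toy array (no `.mat` file is read). This is presumably why `B` is transposed later in this notebook.

```python
import numpy as np

# A toy "MATLAB" matrix with 2 rows and 3 columns.
matlab_matrix = np.array([[0, 1, 2],
                          [3, 4, 5]])

# MATLAB writes the elements column by column (column-major / Fortran order).
raw_bytes = matlab_matrix.tobytes(order="F")

# A row-major reader sees the same bytes with the dimensions reversed.
as_read = np.frombuffer(raw_bytes, dtype=matlab_matrix.dtype).reshape(3, 2)

# Transposing recovers the original orientation.
restored = as_read.T
print(as_read.shape, restored.shape)
```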

- see `utils.py` for helper functions

### 2. Preprocess and reorder the rows in `B`, the average activity captured by the boxcar regressor for each block (the $\beta_i$'s).

The BOLD signal is modeled as

$$
\begin{aligned}
Y (\text{signal}) &= \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k \\
  &= \sum_{i=1}^{k} \beta_i x_i
\end{aligned}
$$
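
To make the model concrete, here is a minimal sketch on synthetic data, assuming non-overlapping boxcar regressors and no noise or haemodynamic convolution. The variable names and sizes are illustrative and are not taken from the actual GLMs. The point is only that each $\beta_i$ is the weight on its regressor $x_i$ and can be recovered by least squares.

```python
import numpy as np

n_timepoints, n_regressors = 60, 3

# Design matrix X: one boxcar (a run of ones) per regressor, 20 time points each.
X = np.zeros((n_timepoints, n_regressors))
for i in range(n_regressors):
    X[i * 20:(i + 1) * 20, i] = 1.0

beta_true = np.array([2.0, -1.0, 0.5])

# Y = sum_i beta_i * x_i (noise-free for clarity).
Y = X @ beta_true

# Ordinary least squares estimate of the betas.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(beta_hat, 3))
```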

### Data variables & structure 


<img src="http://drive.google.com/uc?export=view&id=1Dp27c1wmHMr0aNFHgBF9vZaCWFBpJGUI" style="height:230px"/>


In [2]:
import h5py
import warnings
import sys 
if not sys.warnoptions:
    warnings.simplefilter("ignore")
import os 
import glob
import time
from copy import deepcopy
import numpy as np
import pandas as pd 

from nilearn import datasets
from nilearn import surface
from nilearn import plotting
from nilearn.input_data import NiftiMasker, NiftiLabelsMasker
import nibabel as nib

from brainiak import image, io
from brainiak.isc import isc, isfc, permutation_isc
from brainiak.isc import compute_summary_statistic
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d 
import seaborn as sns 
from importlib import reload 
import scipy.io as sio

# import own functions
import utils
from utils import decode_variable, get_in_shape_levels, get_in_shape_blocks, plot_sub_isc_statmap
reload(utils)

%autosave 30
%matplotlib inline
sns.set(style = 'white', context='talk', font_scale=1, rc={"lines.linewidth": 2})

Autosaving every 30 seconds


# 1. Blocks

For blocks we take the `glm1` files.

```Python
filename = 'beta_series_glm1_subj*_smooth.mat'
```

## 1.1 Load data in

In [48]:
num_subjects = 8

B_data_blocks = []
mask_data_blocks = []
Vmask_data_blocks = []
names_data_blocks = []

for i in range(num_subjects):
    idx = i+1
    
    # build the filename for this subject
    data_dir = '/Users/Daphne/Desktop/beta_series_smooth/'
    filename = f'beta_series_glm1_subj{idx}_smooth.mat'
    
    subject = h5py.File(data_dir+filename,'r') 
    #print(list(subject.keys()))
    print(f'Get data for subject {idx}')
    # load and save data for respective subject
    # (`Dataset.value` was removed in h5py 3.0; `[()]` reads the full dataset)
    B = subject['B'][()]
    mask = subject['mask'][()]
    Vmask = subject['Vmask']
    
    # === decode block names ===
    names = decode_variable(data_dir+filename, 'names')
    
    # append to lists
    B_data_blocks.append(B)
    mask_data_blocks.append(mask)
    Vmask_data_blocks.append(Vmask)
    names_data_blocks.append(names)

Get data for subject 1
Get data for subject 2
Get data for subject 3
Get data for subject 4
Get data for subject 5
Get data for subject 6
Get data for subject 7
Get data for subject 8


In [49]:
print(names_data_blocks[0][0:10])

['Sn(1) vgfmri3_chase*bf(1)' 'Sn(1) vgfmri3_lemmings*bf(1)'
 'Sn(1) vgfmri3_bait*bf(1)' 'Sn(2) vgfmri3_plaqueAttack*bf(1)'
 'Sn(2) vgfmri3_helper*bf(1)' 'Sn(2) vgfmri3_zelda*bf(1)'
 'Sn(3) vgfmri3_lemmings*bf(1)' 'Sn(3) vgfmri3_plaqueAttack*bf(1)'
 'Sn(3) vgfmri3_zelda*bf(1)' 'Sn(4) vgfmri3_chase*bf(1)']
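
The regressor names encode the session and the game. A hypothetical parser for this format is sketched below. `parse_block_name` is not part of `utils.py`, and the regular expression assumes every name looks like `Sn(<session>) vgfmri3_<game>*bf(1)`, as in the output above.

```python
import re
from typing import Tuple

# Pattern inferred from the names printed above (an assumption, not a spec).
BLOCK_NAME_RE = re.compile(r"Sn\((\d+)\) vgfmri3_(\w+)\*bf\(1\)")


def parse_block_name(name: str) -> Tuple[int, str]:
    """Split a regressor name into (session number, game name)."""
    match = BLOCK_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"Unexpected regressor name: {name!r}")
    return int(match.group(1)), match.group(2)


print(parse_block_name("Sn(2) vgfmri3_plaqueAttack*bf(1)"))
```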


In [101]:
mask_arr = mask_data_blocks

path = '/Users/Daphne/Desktop/VGDL-fMRI-Python-Data-Analysis/Multivariate_analyses/data/'

np.save(path+'mask_arr', mask_arr)


## 1.2 Preprocessing

- Reorder the data in `B` so that all blocks are in the same order.
- Get data in the right shape to perform ISC with brainiak.
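
A minimal sketch of the reordering step, assuming each row of `B` has already been labelled with a game and a session number. This is not the implementation of `get_in_shape_blocks` (see `utils.py` for that). `reorder_rows` is a hypothetical helper that sorts rows by game name and then by session, so that every subject ends up with the same row order.

```python
import numpy as np


def reorder_rows(B, games, sessions):
    """Return B with rows sorted by (game, session), plus the row order used."""
    order = sorted(range(len(games)), key=lambda i: (games[i], sessions[i]))
    return B[order], order


# Toy data: 4 blocks x 2 voxels, presented in a subject-specific order.
B = np.array([[10, 11],
              [20, 21],
              [30, 31],
              [40, 41]])
games = ["chase", "bait", "chase", "bait"]
sessions = [1, 2, 3, 1]

B_ordered, order = reorder_rows(B, games, sessions)
print(order)
print(B_ordered)
```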


In [50]:
ISC_arr_blocks = []
ordered_dfs_blocks = []

for s in range(num_subjects):
    
    print(f'Preprocess fMRI blocks data for subject {s+1}')
    # get the betas and game order from this
    B_s = B_data_blocks[s].T # transpose to get [blocks, voxels]
    names_s = names_data_blocks[s]
    
    dfOrdered, B_ordered = get_in_shape_blocks(B_s, names_s)
    
    ISC_arr_blocks.append(B_ordered)
    ordered_dfs_blocks.append(dfOrdered)
    
ISC_arr_blocks = np.array(ISC_arr_blocks)

Preprocess fMRI blocks data for subject 1
Preprocess fMRI blocks data for subject 2
Preprocess fMRI blocks data for subject 3
Preprocess fMRI blocks data for subject 4
Preprocess fMRI blocks data for subject 5
Preprocess fMRI blocks data for subject 6
Preprocess fMRI blocks data for subject 7
Preprocess fMRI blocks data for subject 8


In [51]:
ordered_dfs_blocks[2].head(10) # check order

Unnamed: 0,block,session,0,1,2,3,4,5,6,7,...,220065,220066,220067,220068,220069,220070,220071,220072,220073,220074
2,bait,Sn(1),-2.481648,-2.389143,-1.37834,-1.863973,-2.141596,-2.16841,-1.980433,-1.621671,...,-0.169477,-0.378513,-0.492044,-0.544052,-0.152289,-0.129932,0.275107,0.07645,-0.039035,0.00781
8,bait,Sn(3),-0.36701,-0.196446,-0.791404,-0.575174,-0.289704,0.028271,0.245217,0.260419,...,0.32528,0.148578,0.014657,-0.060109,0.007292,0.096778,0.420411,0.26072,0.147974,0.505072
14,bait,Sn(5),1.79047,1.536919,0.929763,1.562272,1.970997,1.969237,1.636931,1.190901,...,0.553769,0.426035,0.306009,0.209851,1.350008,1.332053,0.302795,0.288404,0.305053,1.622812
3,chase,Sn(2),-2.51342,-2.53943,-1.154479,-1.404383,-1.73866,-1.94117,-1.944731,-1.694107,...,0.301603,0.545929,0.86838,1.089191,0.263032,0.316593,0.540303,0.688182,0.920353,0.561565
7,chase,Sn(3),-1.036697,-0.917888,-0.651373,-0.770894,-0.810088,-0.718126,-0.54052,-0.364067,...,0.89084,1.101161,1.255906,1.303964,0.119302,0.052019,1.137229,1.390685,1.587178,0.525781
12,chase,Sn(5),-0.951518,-0.895393,-0.903024,-0.755501,-0.707465,-0.66253,-0.588037,-0.562128,...,0.970452,0.671771,0.324982,0.065867,0.462323,0.496855,0.936099,0.691683,0.407407,0.512947
1,helper,Sn(1),0.052872,0.128258,-0.524896,-0.408537,-0.000135,0.450675,0.646527,0.506248,...,0.481889,0.064717,-0.405451,-0.853344,-0.287999,-0.098856,0.83801,0.377683,-0.137316,0.385629
9,helper,Sn(4),-2.311759,-2.429004,-2.139832,-1.691737,-1.467166,-1.420989,-1.489437,-1.539786,...,0.463179,0.641903,0.79012,0.824531,0.208585,0.327628,0.81351,1.019502,1.155964,0.771025
17,helper,Sn(6),-1.562838,-1.694666,-0.946912,-0.825241,-0.777357,-0.831859,-0.959117,-1.106628,...,0.59422,0.76835,0.916759,1.025551,1.285331,0.968586,0.62803,0.729031,0.796998,1.0519
4,lemmings,Sn(2),-0.056267,-0.152029,0.025156,0.128017,0.047057,-0.072915,-0.189363,-0.226938,...,0.317969,0.500673,0.686621,0.80859,-0.093731,-0.179231,0.600261,0.659544,0.744191,0.119157


In [52]:
# get data in the right shape to perform ISC with brainiak
# swap those axes!
blocks_ISC = np.swapaxes(ISC_arr_blocks, 0, 1) # need to get [TRs, voxels, subjects]
blocks_ISC = np.swapaxes(blocks_ISC, 1, 2)

blocks_ISC.shape

(18, 220075, 8)
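
The two `swapaxes` calls can also be written as a single `np.transpose`. The sketch below checks the equivalence on a small random array with the same axis layout ([subjects, TRs, voxels]); the sizes are made up to keep it fast.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((8, 18, 5))   # [subjects, TRs, voxels]

# Two-step version, as in the cell above.
two_step = np.swapaxes(np.swapaxes(arr, 0, 1), 1, 2)

# One-step version: new axes are (TRs, voxels, subjects).
one_step = np.transpose(arr, (1, 2, 0))

print(one_step.shape)
```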

# 2. Games

- Collapse across sessions to obtain a `6 (games) x voxels` matrix.
- For each game, take its 3 rows (one per session in which the game was played) and average them column-wise.
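
Because the rows are already sorted by game, with three rows per game, the averaging can also be expressed as a reshape followed by a mean. This is a NumPy sketch of the idea on toy data, assuming exactly three consecutive rows per game; the notebook itself uses a pandas `groupby`.

```python
import numpy as np

n_games, n_sessions_per_game, n_voxels = 6, 3, 4

# Toy ordered betas: 18 rows (game-major order) x 4 voxels.
B_ordered = np.arange(n_games * n_sessions_per_game * n_voxels, dtype=float)
B_ordered = B_ordered.reshape(n_games * n_sessions_per_game, n_voxels)

# Group every 3 consecutive rows and average within each group.
B_games = B_ordered.reshape(n_games, n_sessions_per_game, n_voxels).mean(axis=1)
print(B_games.shape)
```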

In [69]:
ordered_dfs_games = []
B_ordered_games = []

for df in ordered_dfs_blocks:
    
    # average betas by game name (block); numeric_only skips the string `session` column
    df_game = df.groupby('block').mean(numeric_only=True)
    ordered_dfs_games.append(df_game)
    
    B_games = df_game.values # convert to np array for ISC analysis
    B_ordered_games.append(B_games)
    print(df_game.shape)

B_ordered_games = np.array(B_ordered_games)

(6, 220075)
(6, 220075)
(6, 220075)
(6, 220075)
(6, 220075)
(6, 220075)
(6, 220075)
(6, 220075)


In [70]:
B_ordered_games.shape

(8, 6, 220075)

In [72]:
# get data in the right shape to perform ISC with brainiak
# swap those axes!
games_ISC = np.swapaxes(B_ordered_games, 0, 1) # need to get [TRs, voxels, subjects]
games_ISC = np.swapaxes(games_ISC, 1, 2)

games_ISC.shape

(6, 220075, 8)

In [73]:
games_ISC[0]

array([[ 0.87387085, -0.73672632, -0.35272933, ..., -0.50960539,
        -1.28747742, -1.01653192],
       [ 0.78006113, -0.71179776, -0.34955696, ..., -0.51687793,
        -1.22466587, -1.13050979],
       [ 0.65952041, -0.34637908, -0.41332698, ..., -0.15840919,
        -1.26114808, -0.9080451 ],
       ...,
       [-0.35900998, -0.40345447,  0.20852445, ..., -0.48222077,
         1.42995558,  1.4721481 ],
       [-0.74783314, -0.52178536,  0.13799743, ..., -0.6416115 ,
         0.85561911,  1.28345553],
       [ 1.64073255,  0.82627211,  0.71189812, ..., -0.24329543,
         1.10470839,  0.94246493]])

In [75]:
#sio.savemat('test.mat', {'mydata': arr})

In [76]:
games_isc_maps = isc(games_ISC, pairwise=False) # With pairwise=False, the output of ISC is a participant by
                           # voxel matrix (showing the result of each individual with the group).
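
For intuition, leave-one-out ISC correlates each participant's time course with the average time course of all other participants, voxel by voxel. The function below is a simplified NumPy re-implementation for illustration only (no NaN handling), not BrainIAK's code; `loo_isc` is a hypothetical name.

```python
import numpy as np


def loo_isc(data):
    """Leave-one-out ISC for data shaped [TRs, voxels, subjects].

    Returns an array shaped [subjects, voxels].
    """
    n_trs, n_voxels, n_subjects = data.shape
    iscs = np.empty((n_subjects, n_voxels))
    for s in range(n_subjects):
        left_out = data[:, :, s]
        others_mean = np.delete(data, s, axis=2).mean(axis=2)
        # Pearson correlation per voxel, computed from z-scored time courses.
        a = (left_out - left_out.mean(axis=0)) / left_out.std(axis=0)
        b = (others_mean - others_mean.mean(axis=0)) / others_mean.std(axis=0)
        iscs[s] = (a * b).mean(axis=0)
    return iscs


rng = np.random.default_rng(0)
shared = rng.standard_normal((6, 10, 1))                # signal common to everyone
toy = shared + 0.1 * rng.standard_normal((6, 10, 8))    # plus subject-specific noise
toy_iscs = loo_isc(toy)
print(toy_iscs.shape)
```

With only 6 time points (one per game), each correlation rests on very few samples, so individual voxel values should be read with caution.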

In [77]:
# get average ISC corr
# compute the average across participants 
avg_isc_corrs_games = isc(games_ISC, pairwise=False, summary_statistic='mean', tolerate_nans=True) 
#avg_blocks_isc_maps_med = isc(blocks_ISC, pairwise=False, summary_statistic='median', tolerate_nans=True) 
avg_isc_corrs_games = np.array(avg_isc_corrs_games)

In [79]:
avg_isc_corrs_games.shape

(220075,)

In [80]:
avg_isc_corrs_games

array([ 0.08120056,  0.13898273, -0.02791107, ...,  0.41648498,
        0.40611366,  0.16813451])
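
When a mean over correlations is requested, the values are commonly averaged in Fisher z space rather than directly, and as far as we understand the BrainIAK documentation, `summary_statistic='mean'` does the same. A small sketch of that averaging on made-up values:

```python
import numpy as np

# Made-up ISC values for one voxel across 8 participants.
iscs = np.array([0.10, 0.25, 0.40, 0.05, 0.30, 0.55, 0.20, 0.35])

# Fisher z-transform, average, then transform back to a correlation.
fisher_mean = np.tanh(np.mean(np.arctanh(iscs)))

# Plain arithmetic mean, for comparison.
plain_mean = iscs.mean()

print(round(float(fisher_mean), 4), round(float(plain_mean), 4))
```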

# 3. Levels

For levels we take the `glm25` files.

```Python
filename = 'beta_series_glm25_subj*_smooth.mat'
```

## 3.1 Load in smooth levels data

In [3]:
num_subjects = 8

B_data_levels = []
mask_data_levels = []
Vmask_data_levels = []
names_data_levels = []

for i in range(num_subjects):
    idx = i+1
    
    # build the filename for this subject
    data_dir = '/Users/Daphne/Desktop/beta_series_smooth/'
    filename = f'beta_series_glm25_subj{idx}_smooth.mat'
    
    subject = h5py.File(data_dir+filename,'r') 
    #print(list(subject.keys()))
    print(f'Get data for subject {idx}')
    # load and save data for respective subject
    # (`Dataset.value` was removed in h5py 3.0; `[()]` reads the full dataset)
    B = subject['B'][()]
    mask = subject['mask'][()]
    Vmask = subject['Vmask']
    
    # === decode level names ===
    names = decode_variable(data_dir+filename, 'names')
    
    # append to lists
    B_data_levels.append(B)
    mask_data_levels.append(mask)
    Vmask_data_levels.append(Vmask)
    names_data_levels.append(names)

Get data for subject 1
Get data for subject 2
Get data for subject 3
Get data for subject 4
Get data for subject 5
Get data for subject 6
Get data for subject 7
Get data for subject 8


In [4]:
names_data_levels[0][:10]

array(['Sn(1) vgfmri3_chase_run_1_block_1_instance_1*bf(1)',
       'Sn(1) vgfmri3_chase_run_1_block_1_instance_2*bf(1)',
       'Sn(1) vgfmri3_chase_run_1_block_1_instance_3*bf(1)',
       'Sn(1) vgfmri3_lemmings_run_1_block_2_instance_1*bf(1)',
       'Sn(1) vgfmri3_lemmings_run_1_block_2_instance_2*bf(1)',
       'Sn(1) vgfmri3_lemmings_run_1_block_2_instance_3*bf(1)',
       'Sn(1) vgfmri3_bait_run_1_block_3_instance_1*bf(1)',
       'Sn(1) vgfmri3_bait_run_1_block_3_instance_2*bf(1)',
       'Sn(1) vgfmri3_bait_run_1_block_3_instance_3*bf(1)',
       'Sn(2) vgfmri3_plaqueAttack_run_2_block_1_instance_1*bf(1)'],
      dtype='<U57')

In [5]:
B_data_levels[3].shape

(220075, 54)

## 3.2 Preprocess

In [6]:
ISC_arr_levels = []
ordered_dfs_levels = []
clean_names_arr = [] 

for s in range(num_subjects):
    
    print(f'Preprocess fMRI data for subject {s+1}')
    # get the betas and game order from this
    B_s = B_data_levels[s].T # transpose to get [levels, voxels]
    names_s = names_data_levels[s]
    
    level_names, dfOrdered, B_ordered = get_in_shape_levels(B_s, names_s)
    
    ISC_arr_levels.append(B_ordered)
    ordered_dfs_levels.append(dfOrdered)
    clean_names_arr.append(level_names)
    
ISC_arr_levels = np.array(ISC_arr_levels)

Preprocess fMRI data for subject 1
Preprocess fMRI data for subject 2
Preprocess fMRI data for subject 3
Preprocess fMRI data for subject 4
Preprocess fMRI data for subject 5
Preprocess fMRI data for subject 6
Preprocess fMRI data for subject 7
Preprocess fMRI data for subject 8


In [7]:
ordered_dfs_levels[2].head(10) # check order

Unnamed: 0,game,session,instance,level,0,1,2,3,4,5,...,220065,220066,220067,220068,220069,220070,220071,220072,220073,220074
6,bait_run_1_block_3,Sn(1),1,1,-4.078312,-4.000875,-1.956279,-2.730596,-3.409707,-3.778748,...,0.262834,-0.203534,-0.644644,-0.920552,1.217692,0.997218,0.452487,0.077725,-0.249757,1.150459
7,bait_run_1_block_3,Sn(1),2,2,-3.392328,-3.240817,-1.761241,-2.410842,-2.916747,-3.145504,...,0.252098,0.056709,-0.109983,-0.234372,0.050432,0.028958,0.645768,0.440287,0.249902,0.42093
8,bait_run_1_block_3,Sn(1),3,3,-0.431101,-0.289687,-0.497847,-0.6004,-0.457012,-0.174014,...,-0.44108,-0.523962,-0.49571,-0.423318,-0.336802,-0.219711,0.158012,0.047596,0.026951,-0.169689
24,bait_run_3_block_3,Sn(3),1,4,-1.648642,-1.194657,-1.575249,-1.517209,-1.379423,-1.037922,...,-0.147432,-0.409104,-0.501229,-0.445755,-1.084002,-0.809134,0.000113,-0.086182,-0.102712,-0.760211
25,bait_run_3_block_3,Sn(3),2,5,-0.035355,0.374801,0.377494,-0.029966,-0.318555,-0.206694,...,0.223492,0.007802,-0.009043,0.161127,-0.363223,-0.031964,0.449439,0.326025,0.316556,-0.073289
26,bait_run_3_block_3,Sn(3),3,6,-0.544767,-0.546555,0.130397,-0.186888,-0.51239,-0.653918,...,0.247627,0.030478,-0.059471,0.034244,0.433186,0.424107,0.428847,0.169993,0.028412,0.538861
42,bait_run_5_block_3,Sn(5),1,7,1.601391,1.454119,0.164765,1.149916,1.918923,2.160033,...,0.924717,0.367479,-0.193542,-0.727334,0.534156,1.152215,0.457979,0.101021,-0.242947,1.891566
43,bait_run_5_block_3,Sn(5),2,8,0.366005,0.303563,-0.093082,0.477324,0.817497,0.823092,...,0.50601,0.198096,-0.087675,-0.34286,-0.575285,0.363717,0.349846,0.180999,0.005048,0.715211
44,bait_run_5_block_3,Sn(5),3,9,-0.652914,-0.811983,-0.51216,-0.161988,-0.017548,-0.132819,...,0.065125,-0.106048,-0.233297,-0.295426,0.840334,1.118382,-0.175487,-0.220223,-0.234851,1.281176
9,chase_run_2_block_1,Sn(2),1,1,-2.610836,-2.665869,-1.141243,-1.443048,-1.82891,-2.079875,...,0.337802,0.601614,0.911095,1.057064,0.139441,0.236523,0.562122,0.751068,0.979859,0.562228


In [13]:
ISC_arr_levels.shape

(8, 54, 220075)

In [14]:
# get data in the right shape to perform ISC with brainiak
# swap those axes!
levels_ISC = np.swapaxes(ISC_arr_levels, 0, 1) # need to get [TRs, voxels, subjects]
levels_ISC = np.swapaxes(levels_ISC, 1, 2)

levels_ISC.shape 

(54, 220075, 8)

In [15]:
# save the levels array ([levels, voxels, subjects]) to disk
path = '/Users/Daphne/Desktop/VGDL-fMRI-Python-Data-Analysis/Multivariate_analyses/'

arr = levels_ISC

np.save(path+'bold_data_levels', arr)
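
Note that `np.save` appends `.npy` to the filename when it is missing, so the file written above is `bold_data_levels.npy`. A self-contained round-trip sketch, using a temporary directory and a small stand-in array:

```python
import os
import tempfile

import numpy as np

stand_in = np.zeros((54, 10, 8))   # small stand-in for [levels, voxels, subjects]

with tempfile.TemporaryDirectory() as tmp_dir:
    np.save(os.path.join(tmp_dir, "bold_data_levels"), stand_in)
    saved_files = os.listdir(tmp_dir)
    reloaded = np.load(os.path.join(tmp_dir, "bold_data_levels.npy"))

print(saved_files, reloaded.shape)
```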

# 4. Boxcars (10 s)

For the boxcars we take the `glm24` files.

```Python
filename = 'beta_series_glm24_subj*_smooth.mat'
```

<font color=red>TODO: Boxcar data files are corrupted???</font> 

## 4.1 Load in smooth boxcar data

In [8]:
num_subjects = 8

B_data_boxcars = []
mask_data_boxcars = []
Vmask_data_boxcars = []
names_data_boxcars = []

for i in range(num_subjects):
    idx = i+1
    
    # build the filename for this subject
    data_dir = '/Users/Daphne/Desktop/beta_series_smooth/'
    filename = f'beta_series_glm24_subj{idx}_smooth.mat'
    
    if idx==4:
        print('Skipping sub 4, because is corrupted')
    
    else:
        subject = h5py.File(data_dir+filename,'r') 

        #print(list(subject.keys()))
        print(f'Get data for subject {idx}')
        # load and save data for respective subject
        # (`Dataset.value` was removed in h5py 3.0; `[()]` reads the full dataset)
        B = subject['B'][()]
        mask = subject['mask'][()]
        Vmask = subject['Vmask']

        # === decode level names ===
        names = decode_variable(data_dir+filename, 'names')

        # append to lists
        B_data_boxcars.append(B)
        mask_data_boxcars.append(mask)
        Vmask_data_boxcars.append(Vmask)
        names_data_boxcars.append(names)

Get data for subject 1
Get data for subject 2
Get data for subject 3
Skipping sub 4, because is corrupted


OSError: Unable to open file (file signature not found)

In [5]:
os.path.isfile('/Users/Daphne/Desktop/beta_series_smooth/beta_series_glm24_subj4_smooth.mat')

True
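
`os.path.isfile` only confirms that the path exists. The `file signature not found` error means the HDF5 signature could not be located in the file. The sketch below looks for that 8-byte signature at the first few offsets where the HDF5 format allows it (0, 512, 1024, 2048); MATLAB v7.3 files normally carry it at offset 512, after a text header. `has_hdf5_signature` is a hypothetical helper, and a `False` result does not by itself say why a file is unreadable (truncated copy, older MAT format, and so on).

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path, offsets=(0, 512, 1024, 2048)):
    """Return True if the HDF5 signature is found at one of the given offsets."""
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
    return False


# Demo on two throwaway files: one with a 512-byte header plus signature, one without.
with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, "good.mat")
    bad = os.path.join(tmp_dir, "bad.mat")
    with open(good, "wb") as f:
        f.write(b"MATLAB 7.3 MAT-file".ljust(512, b" ") + HDF5_SIGNATURE)
    with open(bad, "wb") as f:
        f.write(b"not an HDF5 file")
    results = (has_hdf5_signature(good), has_hdf5_signature(bad))

print(results)
```

Running the helper on each `glm24` file before calling `h5py.File` would show which subjects' files lack the signature.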

In [None]:
print(names_data_blocks[0][0:10])

# Quicklinks & Resources
    
- [Brainiak ISC documentation](https://brainiak.org/docs/brainiak.html#module-brainiak.isc)
- [Brainiak specific examples](https://github.com/brainiak/brainiak/tree/master/examples)