<h4><center>AI Project Proposal Draft 2</center></h4>
<h5>Ziqi Chen<br>
CS 344<br>
May 3rd, 2019</h5>



<h5>Vision — Give an overview of the project and its purpose.</h5>
This project was inspired by the neuroscience principle that the visual cortex is a complex system in which specialized areas, or groups of areas, process different types of visual stimuli. Based on this idea, the activation patterns in our brains should differ when we see different classes of visual stimuli, such as a human face versus a cat versus an inanimate object.  
With recent advances in neuroimaging technologies, we have more tools to shed light on how the brain works. Brain scan data is piling up, but it isn't always easy for humans to pick apart the patterns and connections hidden in fMRI images.  
In this project, I want to learn how to apply a strength of the machine-learning models we've studied in class, pattern extraction, to a simple analysis of neuroimaging data. The project trains ML models on a classification task: given a representation of fMRI data, the models look for the distinct brain activation patterns that result from seeing a class of visual stimuli, and predict the type of stimulus the person was seeing when the brain activity was recorded. 


<h5>Background</h5>
This project aims to perform a three-class classification of brain imaging samples. I'm using the [Haxby et al. (2001) dataset](https://zenodo.org/record/1203329#.XNpTO0MpBqs) to train models to distinguish the fMRI brain scans that result from seeing a face, a cat, or a house. The dataset contains block-design fMRI data for 6 subjects who viewed 12 runs of repeated visual presentations of various stimuli. For this project, only subjects 1-4 are included, as they have complete fMRI data and the corresponding text labels describing the stimulus type used in each trial. 
In terms of fMRI data manipulation, this project relies on NiLearn, a Python library built on Scikit-Learn that provides high-level functions for manipulating and analyzing neuroimaging data. In particular, I reference the [documentation on the NiftiMasker class](http://nilearn.github.io/modules/generated/nilearn.input_data.NiftiMasker.html), which does the heavy lifting of preparing the fMRI data for use by models. It applies a mask to 4D fMRI images to extract 2D arrays in which each row is a time point and each column is a voxel within the mask, so each value is the activation of one voxel at one time. In loadingData.py, I followed [this NiLearn example](http://nilearn.github.io/auto_examples/plot_decoding_tutorial.html#sphx-glr-auto-examples-plot-decoding-tutorial-py) to create a data matrix from the fMRIs.
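To make the masking step concrete, here is a minimal NumPy-only sketch of the idea. It is not NiLearn's implementation: the array sizes and the `apply_mask` helper are made up for illustration, and the smoothing and standardization that `NiftiMasker` can also perform are left out.

```python
import numpy as np

def apply_mask(volume_4d, mask_3d):
    """Flatten a (x, y, z, time) volume to (time, n_voxels_in_mask)."""
    # Boolean indexing over the three spatial axes keeps only in-mask voxels
    # and returns an array of shape (n_voxels_in_mask, time).
    voxels_by_time = volume_4d[mask_3d]
    # Transpose so that each row is one time point (one sample for a classifier).
    return voxels_by_time.T

rng = np.random.default_rng(0)
volume = rng.normal(size=(4, 5, 6, 10))   # toy 4D scan: 4x5x6 voxels, 10 time points
mask = np.zeros((4, 5, 6), dtype=bool)
mask[1:3, 1:4, 2:5] = True                # toy "brain" region: 2*3*3 = 18 voxels

X_toy = apply_mask(volume, mask)
print(X_toy.shape)  # (10, 18)
```

Each row of `X_toy` can then be paired with the stimulus label for that time point, which is the layout Scikit-Learn estimators expect.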
In terms of technologies used, I experimented with simple [DNN](https://github.com/kvlinden-courses/cs344-code/blob/master/u08features/keras-mnist.ipynb) and [CNN](https://github.com/kvlinden-courses/cs344-code/blob/master/u09classification/keras-cnn.ipynb) models. Neural networks, especially Convolutional Neural Networks, are widely used for image processing and classification, hence their inclusion in this project.  
Last but not least, I used a linear Support Vector Machine as one of the models. Support Vector Machines are a class of supervised machine learning methods for classification and regression tasks. SVMs represent the training data as points in space (cited from [https://en.wikipedia.org/wiki/Support-vector_machine]) and use only a subset of the training points, called support vectors, in the decision function. The basic idea is to find the hyperplane that separates the classes with the widest margin (cited from [https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/]).  
From talking with Professor Vander Linden during the walkthrough, I learned that Support Vector Machines have been studied for a couple of decades and work well on smaller datasets. Considering the size of this dataset, SVMs seem a good choice. Additionally, Support Vector Classification is known for its strength in classification problems with high-dimensional inputs (cited from [https://scikit-learn.org/stable/modules/svm.html]), which is the case in this project. It is effective not only in binary classification but also in multiclass problems. For the latter, SVMs support two approaches: one-vs-rest and one-vs-one. One-vs-rest trains one classifier per class, comparing that class against all the others, whereas one-vs-one trains one classifier for every pair of classes, which requires more classifiers as the number of classes grows (cited from [https://stats.stackexchange.com/questions/142325/svm-three-way-classification]). This project uses the former approach, one-vs-rest. The implementation of this model is based on [https://scikit-learn.org/stable/modules/svm.html#multi-class-classification].  
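The difference between the two multiclass schemes can be illustrated by listing the binary problems each one creates. This is only a counting sketch in plain Python, not how Scikit-Learn implements either scheme; the helper names and the 8-class example are hypothetical.

```python
from itertools import combinations

def one_vs_rest_problems(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]

def one_vs_one_problems(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))

stimuli = ["face", "cat", "house"]
print(len(one_vs_rest_problems(stimuli)))  # 3
print(len(one_vs_one_problems(stimuli)))   # 3

many = [f"class_{i}" for i in range(8)]    # hypothetical 8-class problem
print(len(one_vs_rest_problems(many)))     # 8
print(len(one_vs_one_problems(many)))      # 28
```

With only three classes the two schemes need the same number of classifiers, so the gap only appears with more classes, where one-vs-one grows as K(K-1)/2.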
The fine-tuning of SVM hyperparameters follows the documentation of the set_params method on [this page](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html). 

<h5>Implementation — Summarize your implementation and, if appropriate, how it extends on the work you’ve referenced</h5>
Given NiLearn's high-level pre-built functions, some of the implementation details will be handled automatically by
the library.
After reading tutorials, I think the first step in the implementation is downloading the dataset, which will then be loaded
into Python. Next, I'll use NiLearn functions to transform the 3D images with a time series into 2D arrays that Scikit-Learn accepts.
Then a Support Vector Machine estimator will be trained on the data for the first two tasks. The parameters will be tuned
to achieve a satisfactory accuracy. Then, I'll need to research NiLearn functions that plot the resulting brain activity
from the prediction result arrays. Finally, for the third task, a correlation matrix needs to be created from the time series data.
A GraphicalLasso estimator will be fit on the correlation matrix. I will look for existing projects on functional connectivity maps
and compare the model's predictions with their findings. 
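As a rough illustration of the correlation-matrix step planned for the third task, the sketch below builds one from region time series with NumPy. The data is random and the sizes are hypothetical; it only shows the shape of the computation, not a result from this project.

```python
import numpy as np

rng = np.random.default_rng(0)
n_timepoints, n_regions = 200, 5           # hypothetical sizes
time_series = rng.normal(size=(n_timepoints, n_regions))
# Make region 1 follow region 0 so that one strong correlation is visible.
time_series[:, 1] = time_series[:, 0] + 0.1 * rng.normal(size=n_timepoints)

# np.corrcoef treats rows as variables by default, so pass rowvar=False
# to correlate the columns (regions) with each other.
corr = np.corrcoef(time_series, rowvar=False)
print(corr.shape)  # (5, 5)
```

The result is symmetric with ones on the diagonal, and entry (i, j) measures how closely the signals of regions i and j move together.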

<h5>Results — Give the results of your system and comparing them with other similar work</h5>


**Loading and Preprocessing of data**  
Here I first download the Haxby dataset and then use the NiftiMasker class from NiLearn to preprocess the 4D fMRI images into NumPy arrays. 

In [4]:
# loadingData.py uses the NiLearn library to download the Haxby dataset,
# which contains the brain scans of four subjects and the accompanying labels
# of the images they were viewing while their brain activity was recorded.
# NiLearn's NiftiMasker class transforms the 4D brain scans into 2D NumPy arrays
# in which each row is one time point (scan volume) and each column is one voxel inside the mask.

from nilearn import datasets, image, plotting
from nilearn.input_data import NiftiMasker
from nilearn.image import index_img, mean_img
import numpy as np
import pandas as pd

# Import Haxby et al. (2001): Faces and Objects in Ventral Temporal Cortex (fMRI).
# Subjects 5 and 6 lack complete label or anatomical information, so only subjects 1-4 are included.
haxby_dataset = datasets.fetch_haxby(subjects=4)

# Return the NIfTI filename for the given subject index (range 0-3).
# The default index 1 corresponds to subject 2.
def loadSubject(subjectNum = 1):
    # 'func' is a list of filenames: one for each subject
    fmri_filename = haxby_dataset.func[subjectNum]
    return fmri_filename

fmri_filename = loadSubject(0)
behavioral = pd.read_csv(haxby_dataset.session_target[0], sep=" ")
conditions = behavioral['labels']

facecat_mask = conditions.isin(['face', 'cat'])
conditions_facecat = conditions[facecat_mask]
session_facecat = behavioral[facecat_mask].to_records(index = False)

facehouse_mask = conditions.isin(['face', 'house'])
conditions_facehouse = conditions[facehouse_mask]
session_facehouse = behavioral[facehouse_mask].to_records(index = False)

threeway_mask = conditions.isin(['face', 'house', 'cat'])
conditions_threeway = conditions[threeway_mask]
session_threeway = behavioral[threeway_mask].to_records(index = False)
mask_filename = haxby_dataset.mask

# Mask the data from a 4D image to a 2D array: time points x voxels,
# with smoothing and standardization
masker = NiftiMasker(mask_img=mask_filename, smoothing_fwhm=4, standardize=True, memory="nilearn_cache", memory_level=1)
X = masker.fit_transform(fmri_filename)

# Apply the condition masks to subject 1's masked data:
FC = X[facecat_mask]
FH = X[facehouse_mask]
FHC = X[threeway_mask]

# References
# Haxby, J., Gobbini, M., Furey, M., Ishai, A., Schouten, J., and Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293, 2425-2430.

def processSubject(sub):
    mask_filename = haxby_dataset.mask
    # Mask the data from a 4D image to a 2D array: time points x voxels,
    # with smoothing and standardization
    masker = NiftiMasker(mask_img=mask_filename, smoothing_fwhm=4, standardize=True, memory="nilearn_cache",
                         memory_level=1)
    X = masker.fit_transform(loadSubject(sub))
    behavioral = pd.read_csv(haxby_dataset.session_target[sub], sep=" ")
    conditions = behavioral['labels']
    threeway_mask = conditions.isin(['face', 'house', 'cat'])
    conditions_threeway = conditions[threeway_mask]
    FHC = X[threeway_mask]
    return FHC, conditions_threeway

def processSessions(sub):
    behavioral = pd.read_csv(haxby_dataset.session_target[sub], sep=" ")
    conditions = behavioral['labels']
    threeway_mask = conditions.isin(['face', 'house', 'cat'])
    session_threeway = behavioral[threeway_mask].to_records(index = False)
    return session_threeway

X_all, Y_all = processSubject(0)
session_all =  processSessions(0)
for sub in range(1, 4):
    x, y = processSubject(sub)
    session = processSessions(sub)
    X_all = np.concatenate((X_all, x), axis = 0)
    Y_all = np.concatenate((Y_all, y))
    session_all = np.concatenate((session_all, session))

Shape of concatenated transformed fMRI data:

In [5]:
X_all.shape

(1296, 39912)

Example row in the resulting 2D array: 

In [6]:
X_all[0]

array([-1.121251  , -1.1173762 , -1.0611467 , ...,  0.84500587,
        0.89118195,  0.96187204], dtype=float32)

Shape of concatenated sessions:  

In [8]:
session_all.shape

(1296,)

First fifteen tuples recording sessions: 

In [9]:
session_all[0:15]


array([('face', 0), ('face', 0), ('face', 0), ('face', 0), ('face', 0),
       ('face', 0), ('face', 0), ('face', 0), ('face', 0), ('cat', 0),
       ('cat', 0), ('cat', 0), ('cat', 0), ('cat', 0), ('cat', 0)],
      dtype=(numpy.record, [('labels', 'O'), ('chunks', '<i8')]))
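The row counts above can be sanity-checked with a little arithmetic. The sketch assumes, based on the nine consecutive 'face' records shown above and the 12 runs mentioned in the Background, that every subject contributes 9 volumes per stimulus block in each run.

```python
volumes_per_block = 9      # assumption: 9 consecutive volumes per stimulus block
runs_per_subject = 12      # from the dataset description
categories = 3             # face, cat, house
subjects = 4

rows_per_subject = volumes_per_block * runs_per_subject * categories
total_rows = rows_per_subject * subjects
print(rows_per_subject, total_rows)  # 324 1296
```

Under that assumption the total matches the first dimension of both `X_all` and `session_all`.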

**Three-way classification with simple Dense Neural Net**  
Here I use a simple neural network on the processed 2D dataset. This is a different type of ML model than the example SVM in the binary classification tutorial. 

In [10]:
import numpy as np
from keras import models
from keras.layers import Dense
from sklearn.preprocessing import OneHotEncoder

#three-way classification with NN
X_train = X_all[:800]
X_val = X_all[800:]

#need to one-hot encode the Y labels
enc = OneHotEncoder()
#cited from https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/
Y = enc.fit_transform(Y_all[:, np.newaxis]).toarray()
Y_train = Y[:800]
Y_val = Y[800:]

#DNN on all 4 subjects (1296 samples)
model = models.Sequential()
model.add(Dense(32, input_dim = 39912, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.summary()

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=324, epochs=10, verbose=1)
score = model.evaluate(X_val, Y_val)


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 32)                1277216   
_________________________________________________________________
dense_2 (Dense)              (None, 16)                528       
_________________________________________________________________
dense_3 (Dense)              (None, 3)                 51        
Total params: 1,277,795
Trainable params: 1,277,795
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
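The parameter counts in the summary can be reproduced by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The `dense_params` helper below is only an illustrative name.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

layer_sizes = [39912, 32, 16, 3]   # input features, then the three Dense layers
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [1277216, 528, 51]
print(sum(per_layer))  # 1277795
```

Almost all of the parameters sit in the first layer, and they have to be fit from only 800 training rows.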


Model using data from all four subjects  
Test loss and test accuracy: 

In [11]:
score[0]

5.149821404487856

In [27]:
score[1]

0.40524193548387094
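Because the subjects are concatenated in order, slicing `X_all` at row 800 is not a random split. The sketch below shows which subjects would land on each side, assuming every subject contributes 324 rows (consistent with the 1296 total, but still an assumption); `subject_ids` is a hypothetical helper array, not something the cells above create.

```python
import numpy as np

rows_per_subject = 324                       # assumption: same count for each subject
subject_ids = np.repeat(np.arange(1, 5), rows_per_subject)   # 1296 ids: 1,1,...,4,4

train_ids, val_ids = subject_ids[:800], subject_ids[800:]
train_counts = {int(s): int((train_ids == s).sum()) for s in np.unique(train_ids)}
val_counts = {int(s): int((val_ids == s).sum()) for s in np.unique(val_ids)}
print(train_counts)  # {1: 324, 2: 324, 3: 152}
print(val_counts)    # {3: 172, 4: 324}
```

Under this assumption, most validation rows come from a subject the network never saw during training, so the reported accuracy may reflect cross-subject generalization more than within-subject decoding.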

**Three-way classification with Convolutional Neural Network**  
Again, CNNs are a different class of model, one that specializes in image classification. Here I use a form of the original brain scans instead of the masker-transformed voxel time series. 

In [29]:
from keras import layers, models
from nilearn.image import crop_img, index_img, iter_img
from sklearn.preprocessing import OneHotEncoder
import numpy as np

def loadFilteredImages(sub):
    behavioral = pd.read_csv(haxby_dataset.session_target[sub], sep=" ")
    conditions = behavioral['labels']
    print("length of all trials: ", len(conditions))
    threeway_mask = conditions.isin(['face', 'house', 'cat'])
    images = index_img(loadSubject(sub), threeway_mask)
    return images

# For subject 1, the filtered images are a set of 324 volumes (time points), each containing 40 slices of 64 x 64 images.
# Original 4D shape: (40, 64, 64, 324)
subj1_images = loadFilteredImages(0)
# Use np.stack to rearrange the 4D image into an array of shape (324, 40, 64, 64)
images = np.stack([np.asarray(img.dataobj) for img in iter_img(subj1_images)])
train_images = images[:250]
val_images = images[250:]
# The Y labels need to be one-hot encoded
enc = OneHotEncoder()
# cited from https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/
# .values converts the pandas Series to a NumPy array before a new axis is added
Y = enc.fit_transform(conditions_threeway.values[:, np.newaxis]).toarray()
Y_train = Y[:250]
Y_val = Y[250:]

#cited from https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.1-introduction-to-convnets.ipynb
model = models.Sequential()
model.add(layers.Conv2D(32, kernel_size = (3, 3), activation='relu', input_shape=(40, 64, 64)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(3, activation='softmax'))
model.summary()

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, Y_train, epochs=7, batch_size=100)

test_loss, test_acc = model.evaluate(val_images, Y_val)


length of all trials:  1452
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_13 (Conv2D)           (None, 38, 62, 32)        18464     
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 19, 31, 32)        0         
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 17, 29, 64)        18496     
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 8, 14, 64)         0         
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 6, 12, 64)         36928     
_________________________________________________________________
flatten_5 (Flatten)          (None, 4608)              0         
_________________________________________________________________
dense_12 (Dense)             (None, 64)         

Test loss & test accuracy: 

In [30]:
test_loss

10.237168692253732

In [31]:
test_acc

0.3648648616429922
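The layer shapes in the summary can be traced by hand, which also shows how the network interprets the input. The sketch assumes Keras' default channels-last layout, which the printed output shapes are consistent with; the helper names are illustrative.

```python
def conv_valid(size, kernel=3):
    """Output length of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool(size, window=2):
    """Output length of non-overlapping max pooling (floor division)."""
    return size // window

h, w = 40, 64            # input_shape=(40, 64, 64) read as height, width, channels
shapes = []
for step in ("conv", "pool", "conv", "pool", "conv"):
    op = conv_valid if step == "conv" else pool
    h, w = op(h), op(w)
    shapes.append((h, w))
print(shapes)            # [(38, 62), (19, 31), (17, 29), (8, 14), (6, 12)]
print(h * w * 64)        # 4608 values after Flatten (64 filters in the last Conv2D)

# First Conv2D: 3x3 kernels over 64 input channels, 32 filters, plus 32 biases.
first_conv_params = 3 * 3 * 64 * 32 + 32
print(first_conv_params)  # 18464
```

The 18464 parameters of the first layer only add up if the last axis of size 64 is treated as channels, so the 2D convolutions slide over a 40 x 64 plane rather than over the full 3D volume.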

**Three-way classification on Subject 1 with SVM**  
This module extends the tutorial I followed by performing three-way rather than binary classification. 

In [17]:
from sklearn.feature_selection import SelectPercentile, f_classif, chi2, SelectKBest
from sklearn.svm import LinearSVC, SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
import matplotlib.pyplot as plt

# cited from: https://nilearn.github.io/auto_examples/02_decoding/plot_haxby_anova_svm.html
# Define the dimension reduction to be used.
# Here we use a classical univariate feature selection based on F-test,
# namely Anova. When doing full-brain analysis, it is better to use
# SelectPercentile, keeping 5% of voxels
# (because it is independent of the resolution of the data).
feature_selection = SelectPercentile(f_classif, percentile=5)
k_features = SelectKBest(f_classif, k = 7)

# Output accuracy
# Define the cross-validation scheme used for validation.
# Here we use a LeaveOneGroupOut cross-validation on the session group
# which corresponds to a leave-one-session-out
def modelAccuracy(model, X, conditions, groups):
    cv = LeaveOneGroupOut()
    # Compute the prediction accuracy for the different folds (i.e. session)
    cv_scores = cross_val_score(model, X, conditions, cv=cv, groups=groups)
    # Compute the mean prediction accuracy across folds
    classification_accuracy = cv_scores.mean()
    # Print the results
    print("Classification accuracy: %.4f / Chance level: %f" %
          (classification_accuracy, 1. / 3))
    return classification_accuracy

# One-vs-the-rest linear kernel
# cited from https://scikit-learn.org/stable/modules/svm.html#multi-class-classification
# Pipelined ANOVA SVM: univariate feature selection by ANOVA F-value percentile, followed by the SVM
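To show what the univariate feature selection step does, here is a small NumPy sketch that computes a one-way ANOVA F score per feature and keeps the top few percent. It is written from the textbook formula rather than taken from Scikit-Learn, and the toy data, helper names, and effect sizes are all made up for illustration.

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F statistic for every column of X, grouped by label y."""
    classes = np.unique(y)
    n_samples, n_classes = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    return (ss_between / (n_classes - 1)) / (ss_within / (n_samples - n_classes))

def top_percentile_mask(scores, percentile):
    """Boolean mask keeping features whose score is in the top `percentile` percent."""
    threshold = np.percentile(scores, 100 - percentile)
    return scores > threshold

rng = np.random.default_rng(0)
y_toy = np.repeat(np.array(["face", "cat", "house"]), 30)   # 90 toy samples
X_toy = rng.normal(size=(90, 100))                           # 100 toy "voxels"
# Make the first five voxels informative by shifting their class means.
X_toy[y_toy == "face", :5] += 5.0
X_toy[y_toy == "house", :5] -= 5.0

keep = top_percentile_mask(anova_f_scores(X_toy, y_toy), percentile=5)
print(np.flatnonzero(keep))   # [0 1 2 3 4]
```

Voxels whose mean signal differs strongly between stimulus classes get large F scores, so only those columns are passed on to the classifier.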

Fitting a linear SVC with pipelined ANOVA F-value feature selection on subject 1 (324 trials): 

In [18]:
lin_svc = LinearSVC()
facecathouse_svc = Pipeline([('anova', feature_selection), ('svc', lin_svc)])
# Step parameters are addressed as <step name>__<parameter name>
facecathouse_svc.set_params(anova__percentile = 3.3, svc__C = 10, svc__max_iter = 750)
facecathouse_svc.fit(FHC, conditions_threeway)

Pipeline(memory=None,
     steps=[('anova', SelectPercentile(percentile=3.3,
         score_func=<function f_classif at 0x000001F09A925620>)), ('svc', LinearSVC(C=10, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=750,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0))])

Pipelined SVM with linear kernel accuracy:

In [19]:
modelAccuracy(facecathouse_svc, FHC, conditions_threeway, session_threeway)
cross_validation = cross_val_score(facecathouse_svc, FHC, conditions_threeway, cv = 6, verbose = 1)

Classification accuracy: 0.6605 / Chance level: 0.333333


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    3.2s finished


Pipelined SVM with linear kernel cross validation score: 

In [21]:
cross_validation.mean()

0.7499999999999999
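The two scores above come from different splitting schemes. `modelAccuracy` uses `LeaveOneGroupOut`, while an integer `cv` with a classifier makes `cross_val_score` use stratified k-fold splitting that ignores sessions, so volumes from the same session can land in both training and test folds; that may be one reason the second score is higher. The sketch below shows the leave-one-group-out idea in plain NumPy with toy session ids. It is not Scikit-Learn's implementation, and the sizes are hypothetical.

```python
import numpy as np

def leave_one_group_out(groups):
    """Yield (train_idx, test_idx) pairs, holding out one group at a time."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        test = groups == g
        yield np.flatnonzero(~test), np.flatnonzero(test)

# Toy session ids: 3 sessions with 4 volumes each (hypothetical sizes).
sessions = np.repeat([0, 1, 2], 4)
splits = list(leave_one_group_out(sessions))
for train_idx, test_idx in splits:
    print(sessions[test_idx][0], len(train_idx), len(test_idx))
```

Note that the `groups` argument passed in the cells above is the full record array with both `labels` and `chunks` fields. If the intent is strictly leave-one-session-out, passing only the `chunks` column may be closer to it, though that is an assumption about the intent.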

**Three-way classification with SVM on all four subjects**  
This module extends the previous section by including all four subjects with complete data in the analysis.  

In [24]:
# fitting on all four subjects
lin_svc1 = LinearSVC()
allSubs_svc = Pipeline([('anova', feature_selection), ('svc', lin_svc1)])
allSubs_svc.set_params(anova__percentile = 2.9, svc__max_iter = 5000)
allSubs_svc.fit(X_all, Y_all)
modelAccuracy(allSubs_svc, X_all, Y_all, session_all)
cross_validation = cross_val_score(allSubs_svc, X_all, Y_all, cv = 7, verbose = 1)
cross_validation.mean()

Classification accuracy: 0.6111 / Chance level: 0.333333


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:   35.3s finished


0.6442673314698698

<h5>Results</h5>

 - **DNN**  
   Model test loss: 5.15, test accuracy: 0.4052  
   This is only a little better than chance (about 0.33). One likely reason is that the NiftiMasker-transformed array has shape (1296, 39912): with 39912 input features and only 800 training rows, a simple DNN probably has too many parameters to fit well.
 - **CNN**  
   Model test loss: 10.24, test accuracy: 0.3649  
   This model also performs only slightly better than chance. Compared to the DNN above, it has a disproportionately large test loss. I suspect this is related to the shape of the fMRI data: each trial is a 3D volume of 40 * 64 * 64, so this dataset probably calls for a Conv3D network.
 - **Linear SVC with single subject**  
   Model accuracy: 0.6605, cross-validation: 0.7500  
   The accuracy is not in the 90% range, but it is well above the 33% chance level, indicating that the model is actually picking out patterns. It is only about 4 percentage points lower than the [0.7037](https://nilearn.github.io/auto_examples/02_decoding/plot_haxby_anova_svm.html) accuracy achieved by the binary model. 
 - **Linear SVC with 4 subjects**  
   Model accuracy: 0.6111, cross-validation: 0.6443  
   This accuracy is lower than that of the model trained and tested on one subject, probably because it is hindered by idiosyncrasies among the subjects. It is still well above chance, which suggests that there are substantial commonalities in how our brains activate in response to different visual stimuli. 
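One rough way to quantify "better than chance" for the single-subject SVM is a binomial tail probability: how likely would at least this many correct predictions be if the classifier guessed among three classes at random? The sketch below back-computes the number of correct trials from the reported accuracy and treats trials as independent. Both are simplifying assumptions, since volumes within an fMRI block are correlated, so the result should be read as optimistic.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_trials = 324                           # subject 1, three-way task
n_correct = round(0.6605 * n_trials)     # 214, back-computed from the accuracy
p_value = binomial_tail(n_trials, n_correct, 1 / 3)
print(n_correct, p_value < 1e-10)        # 214 True
```

Even allowing for the optimistic assumptions, the gap between 0.66 and the 0.33 chance level is far larger than random guessing would be expected to produce on 324 trials.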

An example anatomical brain scan from the dataset:
![title](anatomical_subj2.png)

An example functional MRI brain scan:
![title](functional_subj2.png)

Reverse-plotting the SVM learned weights on brain regions:
Distinguishing faces vs cats:
![title](faceCat.png)
Distinguishing faces vs houses:
![title](faceHouse.png)

<h5>Implications</h5>
First, this project uses a public dataset of fMRI data with no identifiable personal information.    
I became interested in this project because it is exciting that machine-learning algorithms have this potential for analyzing our brains. Neural networks were inspired by the biological layout of neurons, so they may well be able to help humans learn more about the biology of our brains. Other ML algorithms such as SVMs show promise as well.    
Applying ML analyses to brain imaging data emphasizes the side of our brains that is predictable and computational. One of my classmates mentioned that projects in this area could make it possible to "read your mind". If predicting what someone is experiencing mentally becomes a tangible reality, there will be many ethical implications. We will need to consider how to protect our privacy and the personal integrity of our thoughts.    
On the other hand, ML, and CNNs in particular, have positive implications for the medical field. The diagnosis of neurological disorders in particular can benefit from the use of ML. Another area that professional projects can impact is brain-computer interfaces, where researchers are already trying to decode brain activations to control prosthetics and improve the quality of life of people with disabilities. The area where we struggle, finding hidden connections and patterns in brain scans, is an area of strength for AI algorithms, and we can use them to better understand how brain activation patterns indicate neurological illness or translate to limb movements. In these areas, we can use ML methods to advance our current knowledge and improve people's well-being. 
