# PyKale Tutorial: Domain Adaptation (Generalization) for Autism Detection with Multi-site Brain Imaging Data
| [Open in Colab](https://colab.research.google.com/github/sz144/pykale/blob/brain-example/examples/autism_detection/tutorial.ipynb) (click `Runtime` → `Run all (Ctrl+F9)`) | [Launch Binder](https://mybinder.org/v2/gh/pykale/pykale/HEAD?filepath=examples%2Fautism_detection%2Ftutorial.ipynb) (click `Run` → `Run All Cells`) |

## Overview

- Pre-processing:
    - [Data loading](#Data-Preparation)
    - [Construct brain networks](#Extracting-Brain-Networks-Features)
- Machine learning pipeline:
    - [Baseline: Ridge classifier](#Baseline)
    - [Domain adaptation](#Domain-Adaptation)
    - [Domain generalization](#Domain-Generalization)

**References:**

[1] Cameron Craddock, Yassine Benhajali, Carlton Chu, Francois Chouinard, Alan Evans, András Jakab, Budhachandra Singh Khundrakpam, John David Lewis, Qingyang Li, Michael Milham, Chaogan Yan, Pierre Bellec (2013). The Neuro Bureau Preprocessing Initiative: open sharing of preprocessed neuroimaging data and derivatives. In *Neuroinformatics 2013*, Stockholm, Sweden.

[2] Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., ... & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. *Frontiers in neuroinformatics*, 14.

[3] Zhou, S., Li, W., Cox, C.R., & Lu, H. (2020). Side Information Dependence as a Regularizer for Analyzing Human Brain Conditions across Cognitive Experiments. *AAAI 2020*, New York, USA. [[Link](https://ojs.aaai.org//index.php/AAAI/article/view/6179)]

[4] Zhou, S. (2022). Interpretable Domain-Aware Learning for Neuroimage Classification (Doctoral dissertation, University of Sheffield). [[Link](https://etheses.whiterose.ac.uk/31044/1/PhD_thesis_ShuoZhou_170272834.pdf)]

## Setup

In [None]:
if 'google.colab' in str(get_ipython()):
    print('Running on Colab')
    !pip uninstall --yes imgaug && pip uninstall --yes albumentations && pip install git+https://github.com/aleju/imgaug.git
    !pip install git+https://github.com/pykale/pykale.git
    !git clone https://github.com/pykale/pykale.git
    %cd pykale/examples/autism_detection
else:
    print('Not running on Colab')

Import the required modules.

In [None]:
import os
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from nilearn.datasets import fetch_abide_pcp

from config import get_cfg_defaults
from kale.interpret import visualize
from kale.utils.download import download_file_by_url

%matplotlib inline

In [None]:
cfg_path = "configs/tutorial.yaml" # Path to `.yaml` config file

cfg = get_cfg_defaults()
cfg.merge_from_file(cfg_path)
cfg.freeze()
print(cfg)

## Data Preparation

### Fetch ABIDE fMRI timeseries

In [None]:
root_dir = cfg.DATASET.ROOT
pipeline = cfg.DATASET.PIPELINE  # fMRI preprocessing pipeline
atlas = cfg.DATASET.ATLAS
site_ids = cfg.DATASET.SITE_IDS
abide = fetch_abide_pcp(data_dir=root_dir, pipeline=pipeline, 
                        band_pass_filtering=True, global_signal_regression=False, 
                        derivatives=atlas, quality_checked=False,
                        SITE_ID=site_ids, 
                        verbose=0)

### Read Phenotypic data

In [None]:
pheno_file = os.path.join(cfg.DATASET.ROOT, "ABIDE_pcp/Phenotypic_V1_0b_preprocessed1.csv")
pheno_info = pd.read_csv(pheno_file, index_col=0)

View the first rows of the phenotypic data.

In [None]:
pheno_info.head()

### Read timeseries from files

In [None]:
data_dir = os.path.join(root_dir, "ABIDE_pcp/%s/filt_noglobal" % pipeline)
use_idx = []
time_series = []
for i in pheno_info.index:
    data_file_name = "%s_%s.1D" % (pheno_info.loc[i, "FILE_ID"], atlas)
    data_path = os.path.join(data_dir, data_file_name)
    if os.path.exists(data_path):
        time_series.append(np.loadtxt(data_path, skiprows=0))
        use_idx.append(i)

Use "DX_GROUP" (autism vs. control) as the labels and "SITE_ID" as the covariates.

In [None]:
pheno = pheno_info.loc[use_idx, ["SITE_ID", "DX_GROUP"]].reset_index(drop=True)

## Extracting Brain Networks Features

In [None]:
from nilearn.connectome import ConnectivityMeasure

correlation_measure = ConnectivityMeasure(kind='correlation', vectorize=True)
brain_networks = correlation_measure.fit_transform(time_series)
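
To make the layout of these features concrete, here is a minimal NumPy sketch of a vectorized correlation connectome. It uses random toy time series (an assumption for illustration, not ABIDE data) and `np.corrcoef` as a stand-in for nilearn's `ConnectivityMeasure`. The helper name `vectorized_correlation` is hypothetical, and nilearn's own vectorization may order or rescale entries differently, so only the general idea (one row per subject, one column per unique region pair) should be taken from it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_timepoints, n_regions = 3, 50, 4

# Toy data: one (n_timepoints, n_regions) array per subject.
toy_time_series = [
    rng.standard_normal((n_timepoints, n_regions)) for _ in range(n_subjects)
]


def vectorized_correlation(ts):
    """Return the lower triangle (diagonal included) of the region-by-region
    Pearson correlation matrix as a flat vector."""
    corr = np.corrcoef(ts, rowvar=False)  # shape: (n_regions, n_regions)
    rows, cols = np.tril_indices(corr.shape[0])
    return corr[rows, cols]


toy_networks = np.stack([vectorized_correlation(ts) for ts in toy_time_series])

# With the diagonal kept, each subject has n_regions * (n_regions + 1) / 2 features.
print(toy_networks.shape)
```

Because a correlation matrix is symmetric, keeping one triangle loses no information while roughly halving the feature count passed to the classifier.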

## Machine Learning for Multi-site Data

### Cross-validation Pipeline

In [None]:
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
import torch
from torch.nn.functional import one_hot

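def cross_validation(x, y, covariates, estimator, domain_adaptation=False, domain_generalization=False):
    """Leave-one-group-out evaluation: each unique covariate value (site) is
    held out once as the target; the rest serve as the source. Returns a
    DataFrame with per-target accuracy and a sample-weighted "Average" row."""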
    results = {"Target": [], "Num_samples": [], "Accuracy": []}
    unique_covariates = np.unique(covariates)
    n_covariates = len(unique_covariates)
    le = LabelEncoder()
    covariate_mat = one_hot(torch.as_tensor(le.fit_transform(covariates)))
    
    for tgt in unique_covariates:
        idx_tgt = np.where(covariates == tgt)
        idx_src = np.where(covariates != tgt)
        # Index the function argument `x`, not the global `brain_networks`.
        x_tgt = x[idx_tgt]
        x_src = x[idx_src]
        y_tgt = y[idx_tgt]
        y_src = y[idx_src]        
        
        if domain_generalization:
            estimator.fit(x_src, y_src, covariate_mat[idx_src])
        elif domain_adaptation:
            estimator.fit(np.concatenate((x_src, x_tgt)), y_src, 
                          np.concatenate((covariate_mat[idx_src], covariate_mat[idx_tgt])))
        else:            
            estimator.fit(x_src, y_src)
        y_pred = estimator.predict(x_tgt)
        results["Accuracy"].append(accuracy_score(y_tgt, y_pred))
        results["Target"].append(tgt)
        results["Num_samples"].append(x_tgt.shape[0])
    
    mean_acc = sum([results["Num_samples"][i] * results["Accuracy"][i] for i in range(n_covariates)])
    mean_acc /= x.shape[0]
    results["Target"].append("Average")
    results["Num_samples"].append(x.shape[0])
    results["Accuracy"].append(mean_acc)
    
    return pd.DataFrame(results)
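
The "Average" row computed by `cross_validation` weights each site's accuracy by its number of samples. The sketch below, using made-up site labels and hypothetical held-out predictions (none of these values come from ABIDE), illustrates that this weighted mean equals the accuracy pooled over all held-out samples, and generally differs from the plain mean of per-site accuracies.

```python
import numpy as np

# Toy site labels, true labels, and hypothetical held-out predictions.
sites = np.array(["A", "A", "A", "B", "B", "C", "C", "C", "C"])
y_true = np.array([1, 2, 1, 2, 2, 1, 1, 2, 1])
y_pred = np.array([1, 2, 2, 2, 1, 1, 1, 2, 2])

n_samples, accuracies = [], []
for tgt in np.unique(sites):
    mask = sites == tgt  # samples belonging to the held-out site
    n_samples.append(int(mask.sum()))
    accuracies.append(float(np.mean(y_true[mask] == y_pred[mask])))

# Sample-weighted mean, as in the "Average" row.
weighted_acc = sum(n * a for n, a in zip(n_samples, accuracies)) / len(y_true)
# Accuracy over all held-out predictions pooled together.
pooled_acc = float(np.mean(y_true == y_pred))
# Plain mean of per-site accuracies, which treats small and large sites equally.
unweighted_acc = float(np.mean(accuracies))

print(weighted_acc, pooled_acc, unweighted_acc)
```

Weighting by site size keeps a small site with an unusually high or low accuracy from dominating the summary.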

### Baseline

In [None]:
from sklearn.linear_model import RidgeClassifier

estimator = RidgeClassifier()
res_df = cross_validation(brain_networks, pheno["DX_GROUP"].values, pheno["SITE_ID"], estimator)

In [None]:
res_df

### Domain Adaptation

In [None]:
from kale.pipeline.multi_domain_adapter import CoIRLS
estimator = CoIRLS(kernel=cfg.MODEL.KERNEL, lambda_=cfg.MODEL.LAMBDA_, alpha=cfg.MODEL.ALPHA)
res_df = cross_validation(brain_networks, pheno["DX_GROUP"].values, pheno["SITE_ID"], 
                          estimator, domain_adaptation=True)

In [None]:
res_df
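
With `domain_adaptation=True`, `cross_validation` passes the features and covariates of both source and target sites to `fit`, but only the source labels. The following NumPy sketch shows that input layout on toy data. The site names, feature values, and variable names are illustrative assumptions, and it does not call or reproduce `CoIRLS` itself.

```python
import numpy as np

# Toy data: five samples from three hypothetical sites.
sites = np.array(["site_a", "site_a", "site_b", "site_b", "site_c"])
labels = np.array([1, 2, 1, 2, 1])
features = np.arange(10, dtype=float).reshape(5, 2)
target_site = "site_c"

# One-hot encode the site of every sample.
unique_sites, site_codes = np.unique(sites, return_inverse=True)
covariate_mat = np.eye(len(unique_sites))[site_codes]

src = sites != target_site
tgt = ~src

# Features and covariates: source rows first, then (unlabelled) target rows.
x_fit = np.concatenate((features[src], features[tgt]))
cov_fit = np.concatenate((covariate_mat[src], covariate_mat[tgt]))
# Labels: source rows only, so y_fit is shorter than x_fit.
y_fit = labels[src]

print(x_fit.shape, cov_fit.shape, y_fit.shape)
```

Placing the source rows first keeps the labels aligned with the leading rows of the feature matrix, which is the ordering the `fit` call in `cross_validation` uses.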