# Getting Data

First, we want to grab some graphs and subject covariates from a web-accessible URL.
The cell below downloads the following dataset:

| Property | Value |
|:--------:|:-----:|
| Dataset  | SWU4  |
| N-Subjects  | 454   |
| Scans per subject | 2 |
| Atlases | Desikan, CPAC200, Talairach |
| Desikan Nodes | 70 |
| CPAC200 Nodes | 200 |
| Talairach Nodes | 1105 |

The covariates you have are: `SUBID, SESSION, AGE_AT_SCAN_1, SEX, RESTING_STATE_INSTRUCTION, TIME_OF_DAY, SEASON, SATIETY, LMP`. The `.csv` file (downloaded in the next step) contains other columns as well, but they are filled with `#`, which means that the value was not recorded.

Running the cell below will get you the data.

In [1]:
%%bash

mkdir -p /home/sic-user/example/data/SWU4/desikan
mkdir -p /home/sic-user/example/data/SWU4/CPAC200
mkdir -p /home/sic-user/example/data/SWU4/Talairach

# The reject pattern is quoted so the shell passes it to wget unexpanded
cd /home/sic-user/example/data/SWU4/desikan
wget --quiet -r -nH -nd -np -R 'index.html*' http://openconnecto.me/mrdata/share/dti/ndmg_v0033/SWU4/desikan/
cd /home/sic-user/example/data/SWU4/CPAC200
wget --quiet -r -nH -nd -np -R 'index.html*' http://openconnecto.me/mrdata/share/dti/ndmg_v0033/SWU4/CPAC200/
cd /home/sic-user/example/data/SWU4/Talairach
wget --quiet -r -nH -nd -np -R 'index.html*' http://openconnecto.me/mrdata/share/dti/ndmg_v0033/SWU4/Talairach/
cd /home/sic-user/example/data/SWU4/
wget --quiet http://openconnecto.me/mrdata/share/dti/covariates/SWU4.csv

## Loading Graphs + Covariates
Run the following cells to load the graphs and the covariates into memory.

In [2]:
import numpy as np
import networkx as nx
import scipy as sp
import matplotlib.pyplot as plt
import os
import csv

from collections import OrderedDict

In [3]:
# Initializing dataset names
dataset_names = 'SWU4'

basepath = '/home/sic-user/example/data/'
atlas = 'desikan' # or 'CPAC200', or 'Talairach'
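# os.path.join avoids the doubled '/' that plain string concatenation produced
dir_names = os.path.join(basepath, dataset_names, atlas)

# Map the dataset name to the list of all .graphml files under its directory
fs = OrderedDict()
fs[dataset_names] = [os.path.join(root, fl) for root, dirs, files in os.walk(dir_names)
                     for fl in files if fl.endswith(".graphml")]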

ps = '/home/sic-user/example/data/SWU4/SWU4.csv'

# print(...) with a single argument behaves the same in Python 2 and Python 3
print("Datasets: " + ", ".join([fkey + ' (' + str(len(fs[fkey])) + ')'
                                for fkey in fs]))
print("Total Subjects: %d" % (sum([len(fs[key]) for key in fs])))

Datasets: SWU4 (454)
Total Subjects: 454
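
As a quick sanity check on the download, the same `os.walk` pattern can be used to count the `.graphml` files for every atlas at once. This is only a sketch: the helper name `count_graph_files` is hypothetical (it is not part of the original notebook), and it assumes the directory layout created by the download cell above.

```python
import os


def count_graph_files(basepath, dataset, atlases):
    """Return {atlas: number of .graphml files found under its directory}."""
    counts = {}
    for atlas in atlases:
        atlas_dir = os.path.join(basepath, dataset, atlas)
        # os.walk yields nothing for a missing directory, so the count is 0
        counts[atlas] = sum(
            1
            for _, _, files in os.walk(atlas_dir)
            for fl in files
            if fl.endswith('.graphml')
        )
    return counts


# Paths follow the layout assumed above; adjust them if yours differs
counts = count_graph_files('/home/sic-user/example/data', 'SWU4',
                           ['desikan', 'CPAC200', 'Talairach'])
for atlas, n in sorted(counts.items()):
    print('%s: %d graphs' % (atlas, n))
```

One would expect the three atlases to report the same number of graphs, since each scan is parcellated once per atlas. A count of zero or a mismatch may indicate an interrupted download.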


In [4]:
def loadGraphs(filenames, verb=False):
    """
    Given a list of files, returns a dictionary of graphs

    Required parameters:
        filenames:
            - List of filenames for graphs
    Optional parameters:
        verb:
            - Toggles verbose output statements
    """
    #  Initializes empty dictionary (insertion order is preserved)
    gstruct = OrderedDict()
    for fl in filenames:
        if verb:
            print("Loading: " + fl)
        #  Adds graphs to dictionary with key being the file's basename
        fname = os.path.basename(fl)
        gstruct[fname] = nx.read_graphml(fl)
    return gstruct

def constructGraphDict(names, fs, verb=False):
    """
    Given a dataset name and a dictionary of file lists, loads the graphs.

    Required parameters:
        names:
            - Name of the dataset (a single string, e.g. 'SWU4')
        fs:
            - Dictionary mapping each dataset name to its list of graph files
    Optional parameters:
        verb:
            - Toggles verbose output statements
    """
    #  Loads graphs into memory for the dataset
    graphs = OrderedDict()
    if verb:
        print("Loading Dataset: " + names)
    # The key for the dictionary of graphs is the dataset name
    graphs[names] = loadGraphs(fs[names], verb=verb)
    return graphs

In [5]:
graphs = constructGraphDict(dataset_names, fs, verb=False)

In [6]:
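# This gets age and sex, respectively.
with open(ps) as f:  # this is the whole phenotype file
    tmp = list(csv.reader(f))
pheno = OrderedDict()
# Keep [SUBID, age, sex] for rows where both age and sex were recorded;
# the trailing [1:] drops the header row, which also passes the filter.
triple = [[t[0].strip(), t[2], int(t[3] == '2')] for t in tmp
          if t[3] != '#' and t[2] != '#'][1:]  # female=1->0, male=2->1

# Map each subject ID to its [age, sex] pair
for trip in triple:
    pheno[trip[0]] = trip[1:]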
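
The positional indices in the cell above (`t[0]`, `t[2]`, `t[3]`) depend on the column order of the file. A possible alternative, sketched below, is to look columns up by name with `csv.DictReader`. This is not the notebook's own method: the helper `load_pheno` is hypothetical, it assumes the header row uses exactly the names `SUBID`, `AGE_AT_SCAN_1` and `SEX`, and unlike the cell above it converts the age to a float. The sample rows are made up for illustration and are not taken from `SWU4.csv`.

```python
import csv
from collections import OrderedDict

MISSING = '#'  # marker the covariate file uses for unrecorded values


def load_pheno(lines):
    """Map SUBID -> [age, sex] from an iterable of CSV lines.

    Rows with a missing age or sex are skipped. Sex is recoded as in the
    cell above: '2' becomes 1 and anything else becomes 0.
    """
    pheno = OrderedDict()
    for row in csv.DictReader(lines):
        age, sex = row['AGE_AT_SCAN_1'], row['SEX']
        if MISSING in (age, sex):
            continue
        # As in the original loop, a later row for the same SUBID overwrites
        # an earlier one
        pheno[row['SUBID'].strip()] = [float(age), int(sex == '2')]
    return pheno


# Made-up rows for illustration only; the real file has more columns
sample = [
    'SUBID,SESSION,AGE_AT_SCAN_1,SEX',
    '0000001,Baseline,20,2',
    '0000002,Baseline,#,1',
    '0000003,Baseline,19,1',
]
pheno_sample = load_pheno(sample)
print(pheno_sample)
```

With the real file, the same helper could be called as `with open(ps) as f: pheno = load_pheno(f)`, provided the header names match the assumption above.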

# Run up to here