# Initial preparations

<mark> Both the installation and data download only need to be performed once! </mark>

We first need to install the necessary Python packages and download the data used in this tutorial. Before doing so, create a Python environment specifically for processing neuroimaging data. [Anaconda](https://www.anaconda.com/products/distribution) is an easy tool for creating and managing Python environments. Install Anaconda (if you don't already have it) and create a new environment with the latest version of Python. Then, with this environment activated (or selected in VS Code), move on to installing the necessary Python packages.
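Before installing anything, it can help to confirm that the notebook is actually running in the new environment. The following is a minimal sketch using only the standard library; `describe_environment` is an illustrative helper name, not part of any package.

```python
import sys

def describe_environment():
    """Return the interpreter path and Python version of the running kernel."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {"executable": sys.executable, "version": version}

env = describe_environment()
# The interpreter path should point inside the environment you just created
print("Interpreter:", env["executable"])
print("Python version:", env["version"])
```

If the interpreter path points somewhere else (for example, the base Anaconda install), the wrong environment or kernel is probably selected.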

## Python packages
All the packages we need are listed in the requirements.txt file in this repository. We can install them with a call to pip (note: in a Jupyter notebook, the `%pip` magic runs pip for the notebook's own environment; the same command, without the `%`, can also be run in a terminal):

In [3]:
%pip install -r requirements.txt

Collecting datalad
  Downloading datalad-0.17.8-py3-none-any.whl (1.4 MB)
     1.4/1.4 MB 3.1 MB/s eta 0:00:00
Collecting patool>=1.7
  Downloading patool-1.12-py2.py3-none-any.whl (77 kB)
     77.5/77.5 kB 1.8 MB/s eta 0:00:00
Collecting boto
  Downloading boto-2.49.0-py2.py3-none-any.whl (1.4 MB)
     1.4/1.4 MB 3.1 MB/s eta 0:00:00
Collecting fasteners>=0.14
  Downloading fasteners-0.18-py3-none-any.whl (18 kB)
Collecting distro
  Downloading distro-1.8.0-py3-none-any.whl (20 kB)
Collecting whoosh
  Downloading Whoosh-2.7.4-py2.py3-none-any.whl (468 kB)
     468.8/468.8 kB 2.9 MB/s eta 0:00:00
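Once pip finishes, a quick check that the key packages can be found may save confusion later. This is a sketch under the assumption that `datalad` and `pandas` (the two third-party packages imported later in this tutorial) are the ones worth checking; `missing_packages` is a hypothetical helper, and the contents of requirements.txt are not inspected here.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# datalad and pandas are the third-party packages imported later in this tutorial
missing = missing_packages(["datalad", "pandas"])
print("Missing packages:", missing if missing else "none")
```

An empty result only shows that the packages are discoverable, not that every dependency imports cleanly.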

## Download the tutorial data
We are using data from the Pinel Localizer task, which includes a 5-minute functional localizer for a few basic cognitive processes (visual perception, finger tapping, language, math). The [original paper](https://bmcneurosci.biomedcentral.com/articles/10.1186/1471-2202-8-91) and [OSF website](https://osf.io/vhtf6/files/) include all the details. The full dataset includes 94 subjects and is large (~42 GB), so we will only grab a subset of the data (~6 GB).

The data is available through a DataLad instance, which thankfully has a Python API. DataLad was included in the requirements.txt file above, so it was installed with the rest of the Python packages. For DataLad to work properly, however, we also need to install git-annex. See [https://git-annex.branchable.com/install/](https://git-annex.branchable.com/install/) for installation instructions (on macOS, use Homebrew: `brew install git-annex`). **Install git-annex before proceeding!**
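To confirm that git-annex is visible to the notebook, one option is to look for its executable on the system `PATH`. This is a minimal sketch using the standard library; `find_git_annex` is an illustrative name, and the check assumes the executable is called `git-annex`.

```python
import shutil

def find_git_annex():
    """Return the path to the git-annex executable, or None if it is not on PATH."""
    return shutil.which("git-annex")

annex_path = find_git_annex()
if annex_path is None:
    print("git-annex was not found on PATH; install it before continuing.")
else:
    print("git-annex found at:", annex_path)
```

If git-annex was installed after the notebook kernel started, restarting the kernel may be needed before it is found.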

Once git-annex is installed, we can download the data. NOTE: In the code below, make sure to update `localizer_path` to point to a directory on your computer where the data should be downloaded.

In [8]:
import os
import glob
import datalad.api as dl
import pandas as pd

# update this path to a local directory on your computer!
localizer_path = '/Users/michael/Dropbox/work/data/dartbrains/data/localizer'

# clone the datalad repository and create a local dataset instance (this will take several minutes!)
dl.clone(source='https://gin.g-node.org/ljchang/Localizer',path=localizer_path)
ds = dl.Dataset(localizer_path)

Cloning the dataset to your local computer only provides links to the file structure; we still need to actually download the data. The `get` calls below will take some time to complete (30-50 mins!), so get them started and go grab a coffee.

In [14]:
# download the experiment metadata
result = ds.get(glob.glob(os.path.join(localizer_path,'*.json')))
result = ds.get(glob.glob(os.path.join(localizer_path,'*.tsv')))
result = ds.get(glob.glob(os.path.join(localizer_path, 'phenotype')))
# download the fmriprep'd data for the first 5 subjects (the slice below takes the first 10 sorted 'sub*' matches)
file_list = glob.glob(os.path.join(localizer_path,'*','fmriprep','sub*'))
file_list.sort()
for f in file_list[:10]:
    result = ds.get(f)

action summary:
  get (notneeded: 3)
get(ok): derivatives/fmriprep/sub-S04/anat/sub-S04_desc-brain_mask.nii.gz (file) [from origin...]
get(ok): derivatives/fmriprep/sub-S04/anat/sub-S04_desc-preproc_T1w.nii.gz (file) [from origin...]
get(ok): derivatives/fmriprep/sub-S04/anat/sub-S04_dseg.nii.gz (file) [from origin...]
get(ok): derivatives/fmriprep/sub-S04/anat/sub-S04_from-MNI152NLin2009cAsym_to-T1w_mode-image_xfm.h5 (file) [from origin...]
get(ok): derivatives/fmriprep/sub-S04/anat/sub-S04_from-T1w_to-MNI152NLin2009cAsym_mode-image_xfm.h5 (file) [from origin...]
get(ok): derivatives/fmriprep/sub-S04/anat/sub-S04_label-CSF_probseg.nii.gz (file) [from origin...]
get(ok): derivatives/fmriprep/sub-S04/anat/sub-S04_label-GM_probseg.nii.gz (file) [from origin...]
get(ok): derivatives/fmriprep/sub-S04/anat/sub-S04_label-WM_probseg.nii.gz (file) [from origin...]
get(ok): derivatives/fmriprep/sub-S04/anat/sub-S04_space-MNI152NLin2009cAsym_desc-brain_mask.nii.gz (file) [from origin...]
get(ok)
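After the downloads finish, it can be useful to see how much content is actually on disk. The sketch below assumes the dataset uses git-annex's usual symlink layout, where a file that has not been fetched yet is a symlink whose target does not exist; on filesystems without symlink support the result may not be meaningful. `summarize_downloads` and `dataset_root` are hypothetical names introduced for illustration.

```python
import os

def summarize_downloads(root):
    """Count files under `root` whose content is present vs. still missing."""
    present, missing, total_bytes = 0, 0, 0
    for dirpath, dirnames, filenames in os.walk(root):
        # skip git's internal bookkeeping directory
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.exists(path):  # follows symlinks; False for broken links
                present += 1
                total_bytes += os.path.getsize(path)
            else:
                missing += 1
    return present, missing, total_bytes

# placeholder value: replace with the localizer_path used above
dataset_root = "/path/to/localizer"
present, missing, total_bytes = summarize_downloads(dataset_root)
print(f"{present} files present ({total_bytes / 1e9:.2f} GB), {missing} not yet downloaded")
```

A large "not yet downloaded" count is expected here, since only a subset of the subjects is fetched.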