# Data Exploration
Tobias C. Haase \
Master's student of Psychology at [Goethe-University Frankfurt](https://www.goethe-university-frankfurt.de/en?locale=en)


Dataset: [Abrupt hippocampal remapping signals resolution of memory interference](https://openneuro.org/datasets/ds003707/versions/1.0.0)


# Structure of the Data Exploration Notebook
The first part outlines the general structure of the dataset, which is explored with the `pybids` module.

Following this, using one subject as an example, the anatomical and functional files will be explored. 

# Exploration of the Dataset

In [3]:
# Imports: os for file-system tasks, BIDSLayout for querying the BIDS dataset.
import os

from bids import BIDSLayout

# Path to the BIDS dataset.
data_path = "/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/"

# Index the dataset and print a summary.
layout = BIDSLayout(data_path)
print(layout)

BIDS Layout: ...ngsmodul/project/data/ds003707 | Subjects: 36 | Sessions: 0 | Runs: 288


The dataset has no `ses-` level, which `pybids` reports as zero sessions; in practice there is a single session. Each participant has 8 runs, giving 36 × 8 = 288 runs in total.

The dataset originally contained 36 participants. Because the dorsal part of participant 33's cortex is cut off in the T1w image, this participant was excluded:

In [4]:
#!rm -r /home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-33
#This leads to an updated sample size of:
layout = BIDSLayout(data_path) 
print(layout)

BIDS Layout: ...ngsmodul/project/data/ds003707 | Subjects: 36 | Sessions: 0 | Runs: 288


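Excluding sub-33 leaves a sample of 35 participants. The summary above still lists 36 subjects, presumably because the removal command is commented out in the cell as shown.

The 8 runs belong to two different tasks: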
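A non-destructive alternative to deleting the subject folder is to keep the data on disk and filter the subject labels before any analysis. The sketch below assumes the labels come from `layout.get_subjects()`; the helper `exclude_subjects` and the labels `01` to `36` are hypothetical stand-ins, since the actual labels are not shown here.

```python
def exclude_subjects(subjects, excluded):
    """Return the subject labels that are not in `excluded`, keeping their order."""
    excluded = set(excluded)
    return [label for label in subjects if label not in excluded]


# Hypothetical labels standing in for the output of layout.get_subjects().
all_subjects = [f"{i:02d}" for i in range(1, 37)]

# Participant 33 is dropped because of the cut-off T1w image.
included = exclude_subjects(all_subjects, excluded=["33"])
print(len(all_subjects), len(included))
```

The filtered list can then be passed on to later queries, for example as the `subject` argument of `layout.get()`.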

In [15]:
# What are the tasks?
tasks = layout.get_tasks()
print("There are the following tasks: %s" %tasks)

#How many sessions are there?
layout.get_session()

There are the following tasks: ['scene', 'obj']


[]

The tasks are shown above. Only a single session exists, but because the dataset has no `ses-` level, the command returns an empty list.

Each participant has several files:

In [8]:
files_sub10 = layout.get(subject="10")
print("The total number of files for subject 10 is: {}.".format(len(files_sub10)))
all_files = layout.get()
print("The total amount of files is: {}.".format(len(all_files)))


The total number of files for subject 10 is: 32.
The total amount of files is: 1157.


['/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/anat/sub-10_T1w.json',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/anat/sub-10_T1w.nii.gz',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/anat/sub-10_T2w.json',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/anat/sub-10_T2w.nii.gz',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/fmap/sub-10_dir-AP_epi.json',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/fmap/sub-10_dir-AP_epi.nii.gz',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/fmap/sub-10_dir-PA_epi.json',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/fmap/sub-10_dir-PA_epi.nii.gz',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/func/sub-10_task-obj_run

There are 32 files for subject 10. These include `.json` sidecar files, `.nii.gz` images, and event files with a `.tsv` ending.

Anatomical T1w and T2w images are available, along with field maps and functional images for both tasks.
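The file types can also be tallied programmatically. The following is a minimal sketch using only the standard library; it works on the eight complete file names visible in the listing above (with the directory prefix shortened), so the counts cover only those files, not all 32. The helper `full_extension` is a hypothetical name.

```python
from collections import Counter
from pathlib import PurePosixPath

# File names taken from the listing above (directory prefix shortened).
file_names = [
    "sub-10/anat/sub-10_T1w.json",
    "sub-10/anat/sub-10_T1w.nii.gz",
    "sub-10/anat/sub-10_T2w.json",
    "sub-10/anat/sub-10_T2w.nii.gz",
    "sub-10/fmap/sub-10_dir-AP_epi.json",
    "sub-10/fmap/sub-10_dir-AP_epi.nii.gz",
    "sub-10/fmap/sub-10_dir-PA_epi.json",
    "sub-10/fmap/sub-10_dir-PA_epi.nii.gz",
]


def full_extension(path):
    """Return the complete extension, e.g. '.nii.gz' rather than only '.gz'."""
    return "".join(PurePosixPath(path).suffixes)


# Count how often each extension occurs.
extension_counts = Counter(full_extension(name) for name in file_names)
print(extension_counts)
```

Joining all suffixes matters here because a plain `.suffix` lookup would report the compressed images as `.gz` and hide that they are NIfTI files.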

The respective files for each task are:

In [28]:
sub_10_scene = layout.get(subject='10', return_type='file', task="scene", extension='nii.gz')

# pprint makes the output more readable.
from pprint import pprint
pprint(sub_10_scene)
print("There are {} runs for the task scene.".format(len(sub_10_scene)))

['/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/func/sub-10_task-scene_run-01_bold.nii.gz',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/func/sub-10_task-scene_run-03_bold.nii.gz',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/func/sub-10_task-scene_run-04_bold.nii.gz',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/func/sub-10_task-scene_run-05_bold.nii.gz',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/func/sub-10_task-scene_run-06_bold.nii.gz',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/func/sub-10_task-scene_run-07_bold.nii.gz']
There are 6 runs for the task scene.


The 6 runs of the scene task are evidently not numbered consecutively: the numbering skips run 02. Why this is the case will be explained later!

Now to the object task:

In [30]:
sub_10_object = layout.get(subject='10', return_type='file', task="obj", extension='nii.gz')

# pprint makes the output more readable.
from pprint import pprint
pprint(sub_10_object)
print("There are {} runs for the task obj.".format(len(sub_10_object)))

['/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/func/sub-10_task-obj_run-02_bold.nii.gz',
 '/home/tchaase/Documents/Universitaet/Forschungsmodul/project/data/ds003707/sub-10/func/sub-10_task-obj_run-08_bold.nii.gz']
There are 2 runs for the task obj.
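To see how the run numbers of the two tasks interleave, the file names from both listings can be grouped by task. This is a small sketch based on a regular expression rather than on `pybids`; the pattern is an assumption that only covers the `task-..._run-...` naming seen in this dataset.

```python
import re
from collections import defaultdict

# BOLD file names of subject 10, as listed in the two outputs above.
bold_files = [
    "sub-10_task-scene_run-01_bold.nii.gz",
    "sub-10_task-obj_run-02_bold.nii.gz",
    "sub-10_task-scene_run-03_bold.nii.gz",
    "sub-10_task-scene_run-04_bold.nii.gz",
    "sub-10_task-scene_run-05_bold.nii.gz",
    "sub-10_task-scene_run-06_bold.nii.gz",
    "sub-10_task-scene_run-07_bold.nii.gz",
    "sub-10_task-obj_run-08_bold.nii.gz",
]

# Capture the task label and the run number from each file name.
pattern = re.compile(r"task-(?P<task>[a-zA-Z0-9]+)_run-(?P<run>\d+)")

runs_by_task = defaultdict(list)
for name in bold_files:
    match = pattern.search(name)
    if match:
        runs_by_task[match["task"]].append(int(match["run"]))

for task, runs in runs_by_task.items():
    print(task, sorted(runs))
```

The run index is shared across both tasks: the object runs take positions 2 and 8, which is why the scene runs are not numbered consecutively.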


Now that the basic structure is known, what is there to explore within the dataset? Which `entities` does pybids detect?

In [41]:
layout.get_entities()


{'subject': <Entity subject (pattern=[/\\]+sub-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'session': <Entity session (pattern=[_/\\]+ses-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'task': <Entity task (pattern=[_/\\]+task-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'acquisition': <Entity acquisition (pattern=[_/\\]+acq-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'ceagent': <Entity ceagent (pattern=[_/\\]+ce-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'reconstruction': <Entity reconstruction (pattern=[_/\\]+rec-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'direction': <Entity direction (pattern=[_/\\]+dir-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'run': <Entity run (pattern=[_/\\]+run-(\d+), dtype=<class 'bids.layout.utils.PaddedInt'>)>,
 'proc': <Entity proc (pattern=[_/\\]+proc-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'modality': <Entity modality (pattern=[_/\\]+mod-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'echo': <Entity echo (pattern=[_/\\]+echo-([0-9]+), dtype=<class 'str'>)>,
 'flip': <Entity flip (pattern=

This is a lot of output. Each entity is a key in the file name or path (such as `sub-`, `task-`, or `run-`) that `pybids` recognises through a regular expression, so there are many potential parameters to explore.

Slice timing, repetition time, etc. will be explored in more detail later. In the following section, an overview report is generated for the anatomical images as well as the functional runs.
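To make the entity patterns less abstract, the sketch below applies four of the regular expressions from the output above to two file paths of subject 10 using Python's `re` module. It only approximates what `pybids` does internally: the helper `parse_entities` is hypothetical, the paths are shortened, and values stay plain strings (`pybids` additionally converts the run number to its `PaddedInt` type).

```python
import re

# Patterns copied from the get_entities() output above.
entity_patterns = {
    "subject": r"[/\\]+sub-([a-zA-Z0-9]+)",
    "task": r"[_/\\]+task-([a-zA-Z0-9]+)",
    "direction": r"[_/\\]+dir-([a-zA-Z0-9]+)",
    "run": r"[_/\\]+run-(\d+)",
}


def parse_entities(path):
    """Return every entity whose pattern matches somewhere in `path`."""
    found = {}
    for name, pattern in entity_patterns.items():
        match = re.search(pattern, path)
        if match:
            found[name] = match.group(1)
    return found


# Shortened paths of a functional image and a field map of subject 10.
bold = parse_entities("/sub-10/func/sub-10_task-obj_run-02_bold.nii.gz")
fmap = parse_entities("/sub-10/fmap/sub-10_dir-AP_epi.nii.gz")
print(bold)
print(fmap)
```

Entities that do not occur in a file name, such as `task` for the field map, are simply absent from the result.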

In [44]:
from bids.reports import BIDSReport  # Import of the BIDSReport class.
report = BIDSReport(layout)

counter = report.generate()



Number of patterns detected: 1
Remember to double-check everything and to replace <deg> with a degree symbol.




In [48]:
main_report = counter.most_common()[0][0]
print(main_report)

In session None, MR data were acquired using a 3-Tesla Siemens Skyra MRI scanner.
	One run of T1-weighted segmented k-space, spoiled, and MAG prepared gradient recalled and inversion recovery (GR/IR) single-echo structural MRI data were collected (256 slices; repetition time, TR=2500ms; echo time, TE=3.43ms; flip angle, FA=7<deg>; field of view, FOV=176x256mm; matrix size=176x256; voxel size=1x1x1mm).
	One run of T2-weighted segmented k-space, spoiled, and oversampling phase spin echo (SE) single-echo structural MRI data were collected (65 slices; repetition time, TR=13520ms; echo time, TE=88ms; flip angle, FA=150<deg>; field of view, FOV=220x220mm; matrix size=512x512; voxel size=0.43x0.43x2mm).
	Two runs of objects old new identification segmented k-space and steady state echo planar (EP) single-echo fMRI data were collected (72 slices in interleaved ascending order; repetition time, TR=2000ms; echo time, TE=36ms; flip angle, FA=90<deg>; field of view, FOV=211x211mm; matrix size=124x

For all image acquisitions, a 3T Siemens Skyra scanner was used.

Besides these basic metrics, it also becomes apparent that the number of slices for the T2w image is low (65), which might be because it is a partial image. A similar number of slices (72) was acquired for the functional images.
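The reported T2w numbers can be cross-checked with a little arithmetic: the in-plane voxel size is the field of view divided by the matrix size, and the extent of the slab is the number of slices times the slice thickness. The sketch below uses the values from the report above and assumes contiguous slices without a gap, which the report does not state; both helper names are hypothetical.

```python
# T2w acquisition values copied from the report above.
t2w = {
    "fov_mm": (220, 220),
    "matrix": (512, 512),
    "n_slices": 65,
    "slice_thickness_mm": 2.0,
}


def in_plane_voxel_size(fov_mm, matrix):
    """Voxel edge lengths in mm: field of view divided by matrix size."""
    return tuple(round(fov / n, 2) for fov, n in zip(fov_mm, matrix))


def slab_extent_mm(n_slices, slice_thickness_mm):
    """Extent along the slice direction, assuming no gap between slices."""
    return n_slices * slice_thickness_mm


voxel_size = in_plane_voxel_size(t2w["fov_mm"], t2w["matrix"])
extent = slab_extent_mm(t2w["n_slices"], t2w["slice_thickness_mm"])
print(voxel_size, extent)
```

The in-plane result agrees with the 0.43 mm reported above, and an extent of about 130 mm along the slice direction may be consistent with the T2w image covering only part of the head.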

There are furthermore 8 functional runs, split into the two tasks, as explained before. Both were obtained using single echo imaging. More about those in a bit!



## Anatomic Exploration

For the anatomic exploration, only the T1w images will be looked at in detail. 

First, the necessary modules are loaded:

In [None]:
from nilearn.plotting import plot_stat_map, plot_anat, plot_img, show, plot_glass_brain

![tobecontinued](https://media.giphy.com/media/W9wHF6yVazlrW/giphy.gif)