# Writing a preprocessing script in Python


## The first thing we will always do is load our modules

In [7]:
import glob
import os
import pdb
import subprocess

### We often won't know every module we need right away, but I like to keep all of my imports at the top of the script as I go. That way anyone who uses the script can see at a glance which tools they need.

## Now let's start by building a function that will hold all the commands we want to execute

```
def prepro():
    #do something cool
```

## We will make a second function that holds all of our global variables and calls the function above
## I personally like to call this main

```
def main():
    prepro()
```

## Now that we have our two functions, the last step is to call main( ), which sets our global variables and then runs our command function

In [9]:
def prepro(basedir):
    #Do something cool
    print('Hello data in the path '+basedir)
def main():
    #load in all the global variables prepro needs, right now this is just the path to the data
    basedir='/Users/gracer/Desktop/data' 
    prepro(basedir) #call prepro to do cool things 
    
main()#call main to execute all our globals then run our prepro function

Hello data in the path /Users/gracer/Desktop/data
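
A common Python convention, not used in the cells of this notebook, is to guard the call to main( ) so it only runs when the file is executed as a script and not when it is imported. A minimal sketch, reusing the path from the cell above; returning the message is an addition for illustration only:

```python
def prepro(basedir):
    # Placeholder for the real preprocessing steps
    message = 'Hello data in the path ' + basedir
    print(message)
    return message


def main():
    # Settings that prepro needs; right now this is just the path to the data
    basedir = '/Users/gracer/Desktop/data'
    return prepro(basedir)


if __name__ == '__main__':
    # Runs only when the file is executed directly, not when it is imported
    main()
```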


## What do we want the function to accomplish?

1. skull stripping
2. motion correction
  * creating motion regressors
  * creating framewise displacement regressor
  * a nice easy to read PDF/html?
3. re-orient?
4. trim extra TRs?

## Let's fill in our main( ) function first with the global variables we will need.

In [10]:
def main():
    basedir='/Users/gracer/Desktop/data'
    prepro()

## Variables defined inside main( ) are not visible inside prepro( ), so anything prepro( ) needs has to be passed in as an argument.

In [12]:
def prepro(basedir):
    print('Hello data in the path '+basedir)
def main():
    basedir='/Users/gracer/Desktop/data'
    prepro(basedir)

## Let's start with skull stripping using FSL's BET tool. BET is a command-line program, so we need a Python module that can run shell commands for us.

## Normally at the command line we would run something like this:
```
bet input output [options]
```

## In Python we can use the os module to run Linux commands

```
os.system('bet input output -F')
```
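
The subprocess module imported at the top can do the same job. A minimal sketch, assuming the command is passed as a list of arguments: `build_bet_command` and `run_command` are hypothetical helpers, and the demonstration runs a harmless Python command so it works without FSL installed.

```python
import subprocess
import sys


def build_bet_command(in_file, out_file, bet='bet'):
    # Hypothetical helper: each argument is its own list item, so paths
    # that contain spaces do not need any quoting
    return [bet, in_file, out_file, '-F']


def run_command(cmd):
    # subprocess.call runs the command without a shell and returns its exit code
    return subprocess.call(cmd)


# Stand-in command so the sketch runs on any machine
exit_code = run_command([sys.executable, '-c', 'pass'])
print(exit_code)
print(build_bet_command('input.nii.gz', 'output'))
```

As with os.system, an exit code of 0 means the command finished without reporting an error.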

## Next, let's take a close look at the input and output we need. What will the input look like? What do we want the output to look like?


In [13]:
input='/Users/gracer/Desktop/data/<subject number>/func/<nifti_file>'


## Each time we run this command the only things we really need to change are the subject number and the name of the nifti file

## Our subject numbers and nifti files use a predictable pattern, so we can use the glob module to find everything with a similar pattern. Here we are going to use a wildcard character (*) to represent the portions of the subject number that differ.

In [14]:
input=glob.glob('/Users/gracer/Desktop/data/sub-*/func/sub-*.nii.gz')
print(input[1:5])

['/Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_bra.nii.gz', '/Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_bra_mask.nii.gz', '/Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_brain.nii.gz', '/Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_brain_mask.nii.gz']
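
To see the wildcard matching in isolation, here is a small sketch that builds a made-up directory tree in a temporary folder (the subject and file names are invented for illustration) and then globs it. The results are sorted because glob does not promise any particular order.

```python
import glob
import os
import tempfile

# Build a tiny, made-up directory tree that mimics the layout above
root = tempfile.mkdtemp()
for sub in ['sub-01', 'sub-02']:
    func_dir = os.path.join(root, sub, 'func')
    os.makedirs(func_dir)
    for name in [sub + '_task-bart_bold.nii.gz', sub + '_notes.txt']:
        open(os.path.join(func_dir, name), 'w').close()

# * matches any run of characters within one folder or file name
pattern = os.path.join(root, 'sub-*', 'func', 'sub-*.nii.gz')
matches = sorted(glob.glob(pattern))

for path in matches:
    print(os.path.basename(path))
```

Only the two `.nii.gz` files match; the `.txt` files are ignored because they do not fit the pattern.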


## glob has created a list of everything matching our pattern. We can use any of Python's list tools, such as len( ) and indexing, to explore the list further

In [15]:
len(input)

152

In [16]:
input[1]

'/Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_bra.nii.gz'

## Each element of the list is already a string, so we can take any element and use string methods such as split( ) to grab IDs or other parts of interest

In [17]:
x=input[1]
print('this is '+x)
y=x.split('/')
print(y)
sub=y[5]
print(sub)

this is /Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_bra.nii.gz
['', 'Users', 'gracer', 'Desktop', 'data', 'sub-10159', 'func', 'sub-10159_task-bart_bold_bra.nii.gz']
sub-10159


## Let's make this look a little nicer

In [18]:
sub=input[1].split('/')[5]
print(sub)

sub-10159
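
The index 5 only works while the data sit exactly five folders deep. A sketch of an alternative that does not depend on folder depth, assuming every file name begins with the subject label followed by an underscore:

```python
import os

path = '/Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_bra.nii.gz'

# basename drops the directories and leaves only the file name
filename = os.path.basename(path)

# The subject label is everything before the first underscore
sub = filename.split('_')[0]
print(sub)
```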


## Now we have the subject number but it looks like we have multiple tasks. How can we split an element from the list to get the task information and the subject information?

In [19]:
subtask=input[1].split('/')[7].split('.')[0]
#note: subtask.strip('.nii.gz') is not a safe alternative; strip removes characters, not a suffix
print(subtask)

sub-10159_task-bart_bold_bra
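
To pull the subject and the task apart, one option is to split the name on underscores and then on the first hyphen of each piece. This is a sketch only: `parse_name` is a hypothetical helper, and it assumes the `key-value` naming seen in the listing above.

```python
def parse_name(filename):
    # Hypothetical helper: drop the extension, then read the key-value pairs
    stem = filename.split('.')[0]
    fields = {}
    for part in stem.split('_'):
        if '-' in part:
            key, value = part.split('-', 1)
            fields[key] = value
    return fields


info = parse_name('sub-10159_task-bart_bold.nii.gz')
print(info['sub'])
print(info['task'])
```

Pieces without a hyphen, such as `bold`, are simply skipped.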


In [20]:
output=subtask+'_brain'
print(output)

sub-10159_task-bart_bold_bra_brain


## Let's go back to our bet command in the os wrapper. We now have all the elements we need to execute it.

In [21]:
os.system('bet' x output '-F')

SyntaxError: invalid syntax (<ipython-input-21-c263dd53731b>, line 1)

## This is a problem: our input and output are defined, but os.system expects a single string argument.
## We need placeholders inside the string so we can pass our variables in!

In [22]:
#os.system('bet' x output '-F')
os.system('bet %s %s -F'%(x, output))

0

## The %s is a placeholder for a string variable
The **%** after the closing quote tells Python to fill the placeholders, in order, with the values in the parentheses that follow.
We could also use this to pass **integers and floats using %i and %f**, respectively.
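
A short sketch of the placeholders in action. The file names and the numbers are made up for illustration, and nothing is executed here; the strings are only built and printed.

```python
in_file = 'sub-10159_task-bart_bold.nii.gz'
out_file = 'sub-10159_task-bart_bold_brain'

# %s inserts a string; values are taken from the tuple in order
cmd = 'bet %s %s -F' % (in_file, out_file)
print(cmd)

# %i formats an integer, %.2f a float rounded to two decimals
summary = 'subject %i: threshold %.2f' % (10159, 0.5)
print(summary)

# str.format is an equivalent, newer spelling of the same idea
cmd_alt = 'bet {} {} -F'.format(in_file, out_file)
print(cmd_alt == cmd)
```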

## Now we have the ability to run bet through python on one subject.... but what about all the other scans.... ? GLOB!

In [23]:
input=glob.glob('/Users/gracer/Desktop/data/sub-*/func/sub-*.nii.gz')
#this is a little long to type each time, and it is really easy to mess up the / formatting

## os.path.join( ) is super useful for quickly defining paths. It joins the pieces into a path with the correct separators, and the pieces can be variables such as basedir

In [24]:
#input=glob.glob('/Users/gracer/Desktop/data/sub-*/func/sub-*.nii.gz')
basedir='/Users/gracer/Desktop/data'
path=os.path.join(basedir,'sub-*','func','sub-*.nii.gz')
print(path)
input=glob.glob(path)
print(input[1:5])

/Users/gracer/Desktop/data/sub-*/func/sub-*.nii.gz
['/Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_bra.nii.gz', '/Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_bra_mask.nii.gz', '/Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_brain.nii.gz', '/Users/gracer/Desktop/data/sub-10159/func/sub-10159_task-bart_bold_brain_mask.nii.gz']


## Let's put this all together into our function prepro( ) with a loop

In [None]:
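def prepro(basedir):
    for item in glob.glob(os.path.join(basedir,'sub-*','func','sub-*.nii.gz')):
        input=item
        #slice off the extension; strip('.nii.gz') removes characters, not a suffix
        output_path=item[:-len('.nii.gz')]
        output=output_path+'_brain'
        os.system("/usr/local/fsl/bin/bet %s %s -F"%(input, output))
        pdb.set_trace() #pause after each scan so we can inspect the variables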
def main():
    basedir='/Users/gracer/Desktop/data'
    prepro(basedir)

In [None]:
main()

> <ipython-input-25-807966444b94>(2)prepro()
-> for item in glob.glob(os.path.join(basedir,'sub-*','func','sub-*.nii.gz')):
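
One thing to watch when trimming the extension: str.strip( ) treats its argument as a set of characters to remove from both ends, not as a suffix. The sketch below shows the difference on a made-up file name; this behaviour may be where the `_bra` files in the earlier glob listing came from.

```python
name = 'sub-10159_task-bart_bold_brain.nii.gz'

# strip() removes any of the characters . n i g z from both ends,
# so it also eats the trailing "in" of "brain"
stripped = name.strip('.nii.gz')
print(stripped)

# Slicing by the suffix length removes exactly the extension
suffix = '.nii.gz'
if name.endswith(suffix):
    trimmed = name[:-len(suffix)]
else:
    trimmed = name
print(trimmed)
```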


## Ta Da!!! You have your first preprocessing script! 
### But wait... how do you make sure you don't end up running the same function on the same data over and over?
### Let's write in a check statement

## We can use os.path.exists( ) to check if we have already run BET, and tell our function to skip that subject
### This is useful if you have two people preprocessing data, or if something interrupts the run (e.g., your computer runs out of power)

In [None]:
def prepro(basedir):
    for item in glob.glob(os.path.join(basedir,'sub-*','func','sub-*.nii.gz')):
        input=item
        #slice off the extension; strip('.nii.gz') removes characters, not a suffix
        output_path=item[:-len('.nii.gz')]
        output=output_path+'_brain.nii.gz'
        print(output)
        pdb.set_trace() #pause so we can inspect the variables
        if os.path.exists(output):
            print(output_path+' is already stripped')
        else:
            os.system("/usr/local/fsl/bin/bet %s %s -F"%(input, output))
def main():
    basedir='/Users/gracer/Desktop/data'
    prepro(basedir)

In [None]:
main()
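
Because the wildcard `sub-*.nii.gz` also matches files that BET has already written, it can help to skip those as well as scans whose output already exists. A sketch under stated assumptions: `needs_stripping` is a hypothetical helper, the `_brain` and `_brain_mask` endings are taken from the earlier listing, and the files are made up in a temporary folder so the example runs without real data.

```python
import os
import tempfile


def needs_stripping(path, suffix='.nii.gz', tag='_brain'):
    # Hypothetical helper: decide whether BET should run on this file
    stem = path[:-len(suffix)]
    if stem.endswith(tag) or stem.endswith(tag + '_mask'):
        return False  # this file is itself a BET output
    # Run only if the stripped version is not on disk yet
    return not os.path.exists(stem + tag + suffix)


# Made-up files: sub-02 has already been stripped, sub-01 has not
folder = tempfile.mkdtemp()
names = ['sub-01_bold.nii.gz', 'sub-02_bold.nii.gz', 'sub-02_bold_brain.nii.gz']
for name in names:
    open(os.path.join(folder, name), 'w').close()

todo = [n for n in names if needs_stripping(os.path.join(folder, n))]
print(todo)
```

Inside prepro( ), a check like this could replace the plain os.path.exists( ) test before the call to bet.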

## Now that we know how to:
1. Make a set of functions
2. Set our global variables
3. Wrap our Linux commands
4. Use glob to get all our subjects through wildcard matching
5. Loop through our list of subjects (from glob)
6. Use string methods to format file names
7. Use if/else statements to check for existing data
### Try writing a function to skull strip a T1w scan