# Writing Scripts

## Learning Objectives

*   Save a Python program into a file
*   Run the Python program from the notebook or the command line
*   Make a change to the program and see the file change
*   Explain why we should write scripts

At this point, we've written a lot of code and run it interactively.
While this is a great use of the notebook, it doesn't translate well to tasks like running
your analysis on a cluster or reusing code across notebooks and projects.

What if we could run our functions and statements from the command line, just like we did in bash?


For this lesson, we're going to use some magic commands in the notebook to save our script to a file and run it.

1. `%%file` to save the contents of a cell into a file
2. `%ls` to list the files in a directory
3. `%%sh` to run the contents of a cell as shell commands

These can also be accomplished with a text editor and a bash shell, which we'll revisit tomorrow. For now, we'll simplify things by staying in one window.

In [None]:
%%file hello-v1.py

name = ''
print('hello ' + name)

In [None]:
%ls -l hello-*.py

In [None]:
%%sh
python hello-v1.py
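
Outside the notebook, the same save, list, and run steps can be sketched in plain Python. This is only an illustration of what the three magic commands accomplish, not a description of how the notebook implements them. The file name `hello-sketch.py` and the temporary directory are assumptions, chosen so that the sketch does not overwrite `hello-v1.py`.

```python
import os
import subprocess
import sys
import tempfile

# Work in a throwaway directory so nothing in the lesson folder is touched.
workdir = tempfile.mkdtemp()
script_path = os.path.join(workdir, 'hello-sketch.py')  # hypothetical file name

# Step 1 (like %%file): save the program text into a file.
source = "name = 'world'\nprint('hello ' + name)\n"
with open(script_path, 'w') as handle:
    handle.write(source)

# Step 2 (like %ls): list the directory to confirm the file exists.
listing = os.listdir(workdir)
print(listing)

# Step 3 (like %%sh with `python hello-sketch.py`): run the script in a
# separate Python process and capture what it prints.
output = subprocess.check_output([sys.executable, script_path]).decode()
print(output)
```

Using `sys.executable` runs the script with the same Python interpreter that is running the sketch, which avoids surprises when several Python versions are installed.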

## Challenge: Clean up script

Below is a copy of the `analyze` and `detect_problems` functions from the previous lesson. This script runs, but it is limited to 3 files and only looks in the local `data` directory.
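
Both limits come from the last few lines of the script: the pattern `'data/*.bed'` is resolved relative to the directory the script is run from, and the slice `filenames[:3]` keeps only the first three matches. Here is a minimal sketch of that behavior, using a temporary directory and made-up file names (both are assumptions for illustration, not part of the lesson data):

```python
import glob
import os
import tempfile

# Build a throwaway directory containing five empty, hypothetical .bed files.
workdir = tempfile.mkdtemp()
for i in range(5):
    open(os.path.join(workdir, 'sample-%d.bed' % i), 'w').close()

# glob.glob returns every path that matches the pattern. sorted() makes the
# order predictable, because glob itself does not guarantee one.
pattern = os.path.join(workdir, '*.bed')
filenames = sorted(glob.glob(pattern))

# Slicing with [:3] keeps only the first three matches ...
limited = filenames[:3]

# ... while the unsliced list still holds every matching file.
print(len(limited))
print(len(filenames))
```

Here `limited` holds three paths and `filenames` holds all five, so looping over `filenames` directly would visit every file.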

Create new versions (e.g. `analyze_detect-v2.py`) that:

- **v2**: Moves the `import glob` to the top of the file with `import pandas`
- **v3**: Removes the limit of 3 files and looks in our data directory
- **v4**: Adds a function called `read_bed(filename)` that performs the `pandas.read_table`, sets the `data.columns`, and `return`s the `data`

Now, open a bash prompt, `cd` to the directory where your scripts are, and run each one individually:

`python analyze_detect-v1.py`

What are the differences between the scripts?
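
One optional way to answer that question is to let Python compare two versions line by line with the standard library's `difflib` module. The sketch below uses two short, made-up stand-ins for a v1 and a v2 script (the contents and the labels `v1.py` and `v2.py` are assumptions, not the real challenge files):

```python
import difflib

# Two short, hypothetical versions of a script, held as lists of lines.
version_1 = [
    "import pandas\n",
    "\n",
    "def analyze(filename):\n",
    "    pass\n",
    "\n",
    "import glob\n",
]
version_2 = [
    "import glob\n",
    "import pandas\n",
    "\n",
    "def analyze(filename):\n",
    "    pass\n",
]

# unified_diff yields lines prefixed with '-' (only in the first version)
# or '+' (only in the second version), plus some unchanged context lines.
diff = list(difflib.unified_diff(version_1, version_2,
                                 fromfile='v1.py', tofile='v2.py'))
print(''.join(diff))
```

To compare real files, the two lists could be read with `open(path).readlines()` instead of being typed in by hand.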


In [None]:
%pwd

In [None]:
%%file analyze_detect-v1.py

import pandas

def analyze(filename):
    data = pandas.read_table(filename, header=None)
    data.columns = ['chrom','chromStart','chromEnd','name','score','strand','level','signif','score2']
    data.groupby('chrom').mean().plot(kind='bar',y='score', ylim=[0,250], title=filename, figsize=(10,5))

def detect_problems(filename):
    data = pandas.read_table(filename, header=None)
    data.columns = ['chrom','chromStart','chromEnd','name','score','strand','level','signif','score2']
    if data['score2'].min() < 1 and data['score'].min() > 0:
        print('Suspicious data!')
    elif data.loc[data['chrom'] == 'chrM']['score'].mean() > 200:
        print('High scores on chrM!')
    else:
        print('Seems OK!')

import glob
filenames = glob.glob('data/*.bed')
for f in filenames[:3]:
    print(f)
    analyze(f)
    detect_problems(f)


In [None]:
%ls -l analyze_detect-v*.py

In [None]:
%%sh
python analyze_detect-v1.py