
# <center>Python 3: Understanding Programming for Improved Workflows</center>
<p>
<center><i>Yale Center for Research Computing</i></center>
<p>

## What is the Yale Center for Research Computing?


- Independent center under the Provost's office
- Created to support your research computing needs
- Focus is on high performance computing and storage
- ~20 staff, including applications specialists and system engineers
- Available to consult with and educate users
- Manage compute clusters and support users
- Located at 160 St. Ronan St, at the corner of Edwards and St. Ronan
- [ycrc.yale.edu](https://research.computing.yale.edu/)



## What is Programming?

Programming is the process of writing instructions (scripts or programs) that a computer follows to automate or complete a task.



## How do I program?

Necessary requirements:
- access to a computer
- access to Google and/or some knowledge of a programming language (there are countless tutorials online for any language)
- <b>the ability to convert your issue/goal into computer logic </b>



## Convert to Computer Logic?

A computer only knows what you tell it, and at its core it operates on true-or-false answers.

Humans reason with many layers of logic shaped by factors such as observation, bias, and experience.
- A computer will only see what you tell it to see, and will only draw the conclusions you tell it to draw from the information provided.


## Example - Turning on the TV

Problem: We want to turn the TV on

Human steps:
- pick up remote
- press power button
- put down remote


Computer steps:
- Tell computer what room TV is in
- Tell computer where remote is
- Tell computer to pick up remote
- Tell computer what button to press
- Tell computer to press button
- Tell computer to put down remote in specific location



In basic programming, a computer cannot infer anything and will not perform any task beyond what it is explicitly told to do.



## Why Python?
- Free, portable, easy to learn
- Wildly popular, huge and growing community
- Intuitive, natural syntax
- Large ecosystem of libraries/packages (modules)
- Ideal for rapid prototyping but also for large applications
- Very efficient to write, reasonably efficient to run as is


## Benefits for C&T positions?
- Data management
    - Research support - visualization, easy searches, large data storage
    - Accounting - Search specific clients, organize clients via different labels, track purchases
    - Secretarial - easy lookups for benefits, organization of historical data such as meetings, clients, etc.
    - Curatorial/Library - complex inventory management (age of material, location, future tasks, etc)
- Python provides opportunities for improvement in job performance and efficiency in many C&T positions


## You can use Python to...
- Convert or filter files
- Automate repetitive tasks
- Compute statistics
- Build processing pipelines
- Build simple web applications
- Perform large numerical computations
- ...

You can use Python instead of bash, Java, or C

Python can be run interactively or as a program

## Different ways to run Python

1. Jupyter notebook

   ``` bash
   jupyter notebook notebook.ipynb
   ```

1. Run interpreter interactively

   ``` bash
   python
   ```

1. Create a file using editor, then:

   ``` bash
   python myscript.py
   ```





## Installing Python

We recommend Anaconda:
- easy to install
- easy to add additional packages
- allows creation of custom environments

## Installing Anaconda Python and getting tutorial notebook (later)

1. Install Anaconda Python (includes Jupyter notebook)

Go to the anaconda.com download page https://www.anaconda.com/products/individual and get Python 3.11 for your OS. Follow the instructions to install it (the steps depend on your OS).

2. Get tutorial files

The files are in a github repository.  If you are familiar with git, you can do:
     git clone https://github.com/ycrc/IntroPythonC-T.git

Alternatively, you can download the files as a zip, and unzip them:
https://github.com/ycrc/IntroPythonC-T/archive/master.zip

3.  Run the Jupyter notebook and select  PythonNotebook.ipynb from the tutorial folder.
Again, exactly how to do this will depend on your OS.  
In windows type anaconda into the search box and open 

Jupyter should connect to your web browser and open the notebook.

## Python 3 versions

- Python 3 releases range from 3.0 to 3.12+
- Most existing Python programs require at least Python 3.7
- The most commonly used version at this time is 3.11
- Minor-version changes (3.x to 3.y) usually consist of small changes
    - improve efficiency
    - correct bugs
    - add new language features and standard-library modules
- Best practice is to use the newest or second-newest version for new scripts
    - one caveat: the version may be restricted by the program you want to use your script with
        - a program from 2020 might only be able to use Python 3.7

## Python 2 versus 3

- Python 2 is no longer supported, but some old projects have not migrated
- Any new code should be Python 3
- Key changes in Python 3:
 - `print` is a function
 - dividing two integers with `/` returns a float (use `//` for integer division)
 - functions return iterators or views instead of lists (`range`, `dict.keys()`, ...)
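
As a rough illustration of these three changes in Python 3 (the variable names here are made up for the example):

```python
# print is a function, so its arguments go inside parentheses
print("hello", "world")

# / always gives a float; // gives the whole-number (floor) result
half = 7 / 2          # 3.5
floor_half = 7 // 2   # 3

# range() and dict.keys() produce lazy iterables rather than lists
r = range(3)
first_three = list(r)                    # wrap in list() to get a real list
keys = list({"a": 1, "b": 2}.keys())
print(half, floor_half, first_three, keys)
```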

## Basic Python types: _integers, floats, strings, booleans_

The building blocks: values you assign to variables for use while the program runs
- Integers: any whole number, including negatives (-1, 5, 9, etc.)
- Floats: any number with a decimal point (3.14, 2.5)
- Strings: any text in quotes, which may include digits ("cherry", "42"); each character is itself a string
- Booleans: True, False

In [2]:
radius=2
pi=3.14
diam=radius*2
area=pi*(radius**2)
title="fun with strings"
pi='cherry'
radius=2.5
delicious=True
print(radius)

2.5



- Variables do not need to be declared or typed
- Integers and floating points can be used together
- The same variable can hold different types
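
A small sketch of these points, using the built-in `type()` to show what a variable currently holds (the names `value` and `mixed` are only for illustration):

```python
value = 2
kind_before = type(value).__name__    # 'int'

value = 2.5                           # same name, now holding a float
kind_after = type(value).__name__     # 'float'

mixed = 2 * 2.5                       # int and float together give a float
print(kind_before, kind_after, mixed)
```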


### _strings_ ...

- are defined with `'` or `"`
- cannot be modified
- have lots of useful methods, e.g. `strip`, `split`, etc


In [4]:
s="hello"
s.split('l')

['he', '', 'o']
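
To illustrate the "cannot be modified" point, here is a minimal sketch (with made-up sample text) showing that string methods return new strings, and that assigning to a position fails:

```python
raw = "  hello world  "
clean = raw.strip()             # new string without the surrounding spaces
words = clean.split()           # split on whitespace

# Strings cannot be changed in place: item assignment raises TypeError
try:
    clean[0] = "H"
    modified = True
except TypeError:
    modified = False

capitalized = "H" + clean[1:]   # build a new string instead
print(words, modified, capitalized)
```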

## Data structures: _lists_

Like arrays in other languages. 



In [5]:
numbers=[1,2,3,4,5,6,7,8,9]



Each value is assigned a position in the list, STARTING from position 0



In [6]:
numbers[3]

4

To select a range of positions, use `:`

`start:stop` runs from `start` up to, but not including, `stop`, so 1:4 means positions 1, 2, and 3


In [7]:
numbers[5:7]

[6, 7]

A second `:` sets the step (increment)

e.g. 1:4:2 = positions 1 and 3

In [8]:
numbers[1:6:2]

[2, 4, 6]

Can replace values in list by indicating listname[position]=new value


In [9]:
numbers[2]=3.14
numbers

[1, 2, 3.14, 4, 5, 6, 7, 8, 9]

In [10]:
numbers.reverse()
numbers

[9, 8, 7, 6, 5, 4, 3.14, 2, 1]

## Lists are more flexible than arrays

You can ...

- insert or append new elements (positions)
- remove elements
- nest lists


In [11]:
numbers=[1,2,3,4,5,6,7,8,9]
numbers[2]=[11,12,13]
numbers[2][0]

11

Here, we have replaced element 2 with the list [11,12,13].

Since this is now a nested list (a list inside a list), it takes two indexes to pull out a value:
numbers[2][0] = the 3rd element of numbers (position 2), then the first element (position 0) of the nested list = 11
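
Inserting, appending, and removing elements might look like the following sketch (the list `letters` is sample data, not part of the tutorial files):

```python
letters = ["a", "b", "d"]
letters.append("e")         # add to the end
letters.insert(2, "c")      # insert at position 2, shifting later items right
letters.remove("a")         # remove the first item equal to "a"
last = letters.pop()        # remove and return the last item
print(letters, last)
```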

Additionally, you can
- combine values of different types into lists, (strings and integers for example)


In [15]:
numbers[3:6]=['four to six']
numbers

[1, 2, [11, 12, 13], 'four to six', 7, 8, 9]

## Data structures: _tuples_

Like lists, but not modifiable

Good for sharing data that shouldn't be edited by other users or other programs


In [16]:
tup=(1,2,3,4,5,6,7,8,9)
tup

(1, 2, 3, 4, 5, 6, 7, 8, 9)

In [17]:
print(tup[4])

5
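
A brief sketch of what "not modifiable" means in practice, using a hypothetical `point` tuple: reading and unpacking work, assignment does not.

```python
point = (3, 4)

# Assigning to a position in a tuple raises TypeError
try:
    point[0] = 10
    changed = True
except TypeError:
    changed = False

x, y = point     # unpacking copies the values into separate variables
print(changed, x, y)
```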


## Data structures: _dictionaries_

Dicts are Python's version of what other languages call "hash tables"

- dicts associate keys with values, which can be of (almost) any type
- dicts have a length; since Python 3.7 they keep insertion order, but values are looked up by key, not by position
- looking up values in dicts is very fast, even if the dict is BIG.



In [18]:
coins={'penny':1, 'nickle':5, 'dime':10, 'quarter':25}
coins

{'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25}

In [19]:
coins['dime']

10

In [20]:
coins['half-dollar'] = 50
coins

{'penny': 1, 'nickle': 5, 'dime': 10, 'quarter': 25, 'half-dollar': 50}
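
A few other common dict operations, sketched with made-up data (`prices` is not part of the tutorial):

```python
prices = {"apple": 3, "pear": 5}

has_apple = "apple" in prices        # membership tests look at the keys
missing = prices.get("kiwi", 0)      # .get() returns a default instead of raising KeyError
del prices["pear"]                   # remove a key and its value
count = len(prices)
print(has_apple, missing, count)
```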

## Control Flow Statements: _if_

- `if` statements run a test, then do something based on the result
- `else` is optional





In [21]:
import random
number=random.randint(0,100)
if number < 50:
    if number < 5: 
        print ("really small")
    print ("small", number)
    print ("another line")
else: 
    print ("big", number)  
print ("after else")

big 66
after else
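
The nested test above can also be written as a chain with `elif`, which checks each condition in order and stops at the first one that is true. A minimal sketch, with a hypothetical function name `describe` and fixed inputs so the result is predictable:

```python
def describe(number):
    if number < 5:
        return "really small"
    elif number < 50:        # only checked when the first test was false
        return "small"
    else:
        return "big"

print(describe(3), describe(20), describe(66))
```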


## Control Flow Statements: _while_

- While statements execute one or more statements repeatedly until the
test is false

In [22]:
import random
count=0
while count<100:
   count=count+random.randint(0,10)
   print (count)
print("done with loop")


8
16
23
29
35
44
48
55
63
72
74
82
86
86
93
101
done with loop


## Control Flow Statements: _for_

For statements take some sort of _iterable_ object and loop once for
every value.

In [23]:
for i in [0,1,2,3]:
    print(i)

0
1
2
3


In [24]:
for i in range(10):
   print(i)

0
1
2
3
4
5
6
7
8
9


In [25]:
for letter in "this string":
   print(letter)

t
h
i
s
 
s
t
r
i
n
g


## Using `for` loops and `dicts`

Do something for each key (Use `items()` for keys and values)

In [26]:
for val in coins:  
   print (val)

penny
nickle
dime
quarter
half-dollar


In [27]:
for val in coins.items():  
   print (val)

('penny', 1)
('nickle', 5)
('dime', 10)
('quarter', 25)
('half-dollar', 50)
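
Each item from `items()` is a `(key, value)` pair, so it can be unpacked into two loop variables. A short sketch with its own sample dict (`purse` is illustrative):

```python
purse = {"penny": 1, "dime": 10, "quarter": 25}
total = 0
for name, cents in purse.items():     # unpack key and value in one step
    print(f"{name} is worth {cents} cents")
    total += cents
print(total)
```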


## Control Flow Statements: altering loops
While and For loops can skip steps (`continue`) or terminate early (`break`).

In [28]:
for i in range(10):
   if i%2 != 0: continue # is i odd?
   print (i)

0
2
4
6
8


In [29]:
for i in range(10):
   if i>5: break
   print (i)


0
1
2
3
4
5


## Note on code blocks

In the previous example:


In [30]:
for i in range(10):
   if i>5: break
   print(i)
 

0
1
2
3
4
5


How did we know that `print(i)` was part of the loop?  What defines a loop?

- Many programming languages use `{ }` or Begin End to delineate blocks of code to treat as a single unit.

- Python uses white space.  Code indented to the same level is one block.

- By convention and for readability, indent by a consistent number of spaces; four is the usual choice (many editors will do this for you).
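
To see how indentation alone decides what belongs to a loop, consider this small sketch (the list names are made up):

```python
inside = []
after = []
for i in range(3):
    inside.append(i)    # indented: runs on every pass through the loop
after.append(i)         # not indented: runs once, after the loop has finished
print(inside, after)
```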

## List comprehensions

can replace simple loops, are fast and concise

In [33]:
numbers=[1,2,3,4,5,6,7,8,9]
new_numbers=[]
for number in numbers:
    new_numbers.append(number*5) 
new_numbers

[5, 10, 15, 20, 25, 30, 35, 40, 45]

In [34]:
numbers=[1,2,3,4,5,6,7,8,9]
new_numbers = [number * 5 for number in numbers]
new_numbers

[5, 10, 15, 20, 25, 30, 35, 40, 45]
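
A comprehension can also filter as it goes by adding an `if` at the end. As a sketch, keeping only the even numbers before multiplying:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
even_times_five = [n * 5 for n in numbers if n % 2 == 0]
print(even_times_five)
```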

## Functions

allow you to write code once and use it many times

compartmentalize detail so code is more understandable


In [35]:
def area(w, h):
   return w*h

The function `area` is now defined; call it by writing NAME_OF_FUNCTION(input values)


In [37]:
area(3, 10) 

30
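
Functions can also give some inputs a default value, so callers only supply them when needed. A minimal sketch, using a hypothetical `box_volume` function:

```python
def box_volume(w, h, depth=1):
    """Return width * height * depth; depth defaults to 1."""
    return w * h * depth

print(box_volume(3, 10))               # depth left at its default
print(box_volume(3, 10, 2))            # depth given by position
print(box_volume(w=2, h=2, depth=2))   # inputs given by name
```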

## Summary of basic elements of Python

- Basics: int, float, string, boolean
- More complex: list, dict, tuple
- Control constructs: if, while, for, list comprehension, def


## Printing & Formatting


In [34]:
i=5
print("Simple "+"Stuff "+str(i))

Simple Stuff 5


In [38]:
import math
x=16

# old-school way
print("The sqrt of %i is %f" % (x, math.sqrt(x)))

# format function
print("The sqrt of {} is {}".format(x, math.sqrt(x)))
print("{1:.0f} squared is {0}". format(x, math.sqrt(x)))

# f-strings (3.6+)
print(f"The sqrt of {x} is {math.sqrt(x):.2f}")

The sqrt of 16 is 4.000000
The sqrt of 16 is 4.0
4 squared is 16
The sqrt of 16 is 4.00
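
f-strings also accept width and alignment settings, which helps when lining up columns. A small sketch with made-up rows: `<10` left-aligns in 10 characters and `>5` right-aligns in 5.

```python
rows = [("penny", 1), ("quarter", 25)]
lines = [f"{name:<10}{cents:>5}" for name, cents in rows]
for line in lines:
    print(line)
```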


## Objects

- All python values are objects (lists, strings, dicts, etc.)
- Objects combine value(s) and methods (functions)
- Advanced users can create their own classes of objects
- All the usual OO stuff: inheritance, data hiding, etc.
- Use `dir()` to discover an object's methods & attributes


In [36]:

numbers=[1,2,3]
numbers.clear()
numbers

[]

In [52]:
# try: index, upper, isupper, startswith, replace
sobject="This is a String"
print(sobject.partition("i"))
print(sobject.split("s"))
print(sobject.upper())
print(sobject.index("s"))
print(sobject.startswith("t"))
print(sobject.replace("T","t"))
print(sobject.isupper())


('Th', 'i', 's is a String')
['Thi', ' i', ' a String']
THIS IS A STRING
3
False
this is a String
False


In [53]:
#dir will list all methods that can be used on an object
dir(sobject) 

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


## Converting logic to code - turning the tv on


- Tell computer what room TV is in - variable room="living_room"

- Tell computer where remote is - variable remote="right_couch_arm"

- Tell computer to pick up remote - function that uses room and remote to grab remote

- Tell computer what button to press - variable power="red_button"

- Tell computer to press power button - function using power variable

- Tell computer to put down remote in specific location - function using room and remote variables
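
The mapping above could be sketched in Python as follows. The functions `pick_up`, `press`, and `put_down` are hypothetical stand-ins that only describe each action as text; a real program would need code that actually controls a device.

```python
# Variables: the facts the computer has to be told
room = "living_room"
remote = "right_couch_arm"
power = "red_button"

# Functions: the actions, each using only the information it is given
def pick_up(room, remote):
    return f"picked up remote from {remote} in {room}"

def press(button):
    return f"pressed {button}"

def put_down(room, remote):
    return f"put remote back on {remote} in {room}"

steps = [pick_up(room, remote), press(power), put_down(room, remote)]
for step in steps:
    print(step)
```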

## Example 1: File Reformatter

#### Task: given a file of hundreds or thousands of lines

```
FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...
160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10
...
```

#### Remove the last 3 letters from the 5th column

```
FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,...
160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10
...
```

`TAAGGCGA-TAGATCGC` -> `TAAGGCGA-TAGAT` and so on

In this example, we'll show:
- reading lines of a file
- parsing and modifying the lines
- writing them back out
- creating a script to do the above and running it
- passing the script the file to modify

## In Pseudocode: writing logic prior to programming

- open the input file
- read the first header line, and print it out
- for each remaining line in the file:
    - read the line
    - find the value in the 5th column
    - truncate it by removing the last three letters
    - put the line back together
    - print it out


## Step 1: open the input file

In [1]:
file_pointer=open('badfile.txt')

## Step 2: read the first header line, and print it out

In [61]:
file_pointer=open('badfile.txt')
print (file_pointer.readline().rstrip())

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project


- Call `readline()` on the file pointer to get a single line from the file
(the header line)

- `rstrip()` removes the newline character (and any other trailing whitespace) at the end of the line

- Then print it

## Step 3: for each remaining line in the file, read the line

In [62]:
file_pointer=open('badfile.txt')
print (file_pointer.readline().rstrip())
for line in file_pointer:
  line=line.rstrip()
  print(line)

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTCTAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCCTCT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGTAGA,None,N,Eland-rna,Mei,Jon_mix10


A file pointer is an example of an iterator.

Instead of explicitly calling `readline()` for each line, we can just loop on the file
pointer, getting one line each time.

Since we already read the header, we won't get that line.

## Step 4: find the value in the 5th column, and remove last 3 letters


In [63]:
file_pointer=open('badfile.txt')
print (file_pointer.readline().strip())  
for line in file_pointer:
    fields=line.strip().split(',')
    fields[4]=fields[4][:-3]
    print(fields)

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
['160212', '1', 'A1', 'human', 'TAAGGCGA-TAGAT', 'None', 'N', 'Eland-rna', 'Mei', 'Jon_mix10']
['160212', '1', 'A2', 'human', 'CGTACTAG-CTCTC', 'None', 'N', 'Eland-rna', 'Mei', 'Jon_mix10']
['160212', '1', 'A3', 'human', 'AGGCAGAA-TATCC', 'None', 'N', 'Eland-rna', 'Mei', 'Jon_mix10']
['160212', '1', 'A4', 'human', 'TCCTGAGC-AGAGT', 'None', 'N', 'Eland-rna', 'Mei', 'Jon_mix10']


As before, we strip the newline from the line.

We split it into individual elements wherever we find a comma.

The 5th field is referenced by fields[4], since Python starts indexing at 0. [:-3] takes all characters of the string except the last 3.

## Brief detour: string splitting and joining

In [64]:
s='this,is,a,string'
l=s.split(',')
print(l)
print(':'.join(l))

['this', 'is', 'a', 'string']
this:is:a:string


## Step 5: put the line back together, and print it

In [65]:
file_pointer=open("badfile.txt")
print (file_pointer.readline().strip())
for line in file_pointer:
    fields=line.strip().split(',')
    fields[4]=fields[4][:-3]
    print (','.join(fields))

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10


 
`join` takes a list of strings and combines them into one string, using the string it is called on (here `','`) as the separator. Then we just print that string.

 
Saved as a script named Ex1.py that takes the file name as its argument, we would invoke it like this:
```
$ python Ex1.py badfile.txt

$ python Ex1.py badfile.txt > fixedfile.txt
```
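
The notebook cells above hard-code `badfile.txt`, so here is one possible sketch of what a script like Ex1.py could contain so that it reads the file name from the command line. The function names `fix_line` and `reformat` are assumptions made for this example, not the contents of the actual tutorial script.

```python
import sys

def fix_line(line):
    """Trim the last 3 characters from the 5th comma-separated field."""
    fields = line.strip().split(",")
    fields[4] = fields[4][:-3]
    return ",".join(fields)

def reformat(path, out=sys.stdout):
    """Print the header unchanged, then every other line with column 5 trimmed."""
    with open(path) as file_pointer:       # 'with' closes the file when done
        print(file_pointer.readline().strip(), file=out)
        for line in file_pointer:
            print(fix_line(line), file=out)

# Only run when started as a script with exactly one argument,
# e.g. python Ex1.py badfile.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    reformat(sys.argv[1])
```

Keeping the per-line logic in its own function makes it easy to try on a single line before running it on a whole file.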

## Example 2: directory walk with file ops

Imagine you have a directory tree with many subdirectories.

In those directories are files named *.fastq.  You want to:

- find them
- compress them to fastq.gz using a program
- delete them if the conversion was successful

In this example, we'll demonstrate:

- traversing an entire directory tree
- executing a program on files in that tree
- testing for successful program execution



## In pseudocode
   
- for each directory
    - get a list of files in that directory
    - for each file in that directory
        - if that file's name ends with .fastq
            - create a new file name with .gz added
            - create a command to do the compression
            - run that command and check for success
                - if success
                    - delete the original
                - else
                    - stop

The conversion command is: 
```gzip -c file.fastq > file.fastq.gz```


## Step 1: directory traversal

We need a way to traverse all the files and directories.
```os.walk(dir)``` starts at dir and visits every subdirectory below it.
For each directory it visits, it yields the directory's path, a list of its subdirectories, and a list of its files.

For example, imagine we have the following dirs and files:

```
Ex2dir
Ex2dir/d1
Ex2dir/d1/d2
Ex2dir/d1/d2/f2.fastq
Ex2dir/d1/f1.fastq
```



In [66]:
import os
for path , dirs, files in os.walk('Ex2dir'):
   print (path, dirs, files)

Python-Bootcamp/Ex2dir ['d1'] []
Python-Bootcamp/Ex2dir/d1 ['d2'] ['f1.fastq.gz']
Python-Bootcamp/Ex2dir/d1/d2 [] ['f2.fastq.gz']


## Step 2: Invoking other programs from python

The [subprocess module](https://docs.python.org/3/library/subprocess.html) has a variety of ways to do this. A simple one:

```
import subprocess

ret=subprocess.call(cmd, shell=True)

```

ret is 0 on success, non-zero error code on failure.



In [67]:
import subprocess
ret_code=subprocess.call('gzip -c myfile.fastq > myfile.fastq.gz', shell=True)
ret_code

0
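
A related option is `subprocess.run`, which takes the command as a list of arguments (no shell needed) and returns an object carrying the return code and any captured output. In this sketch a tiny Python command stands in for an external program, purely so the example runs anywhere:

```python
import subprocess
import sys

# sys.executable is the running Python; it plays the role of the external program
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,    # collect stdout/stderr instead of printing them
    text=True,              # return strings rather than bytes
)
ok = result.returncode == 0          # 0 means success, as with subprocess.call
output = result.stdout.strip()
print(ok, output)
```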

## Put it all together

In [68]:
import os, sys, subprocess
sys.argv=['Ex2.py', 'Ex2dir'] # for Jupyter we'll cheat
start=sys.argv[1]
for path, subdirs, files in os.walk(start):
    for file in files:
        if file.endswith('.fastq'):
            file_name=f'{path}/{file}'
            cmpress_file=file_name.replace('.fastq', '.fastq.gz')
            cmd=f'gzip -c {file_name} > {cmpress_file}'
            print (f"running {cmd}")
            ret_code=subprocess.call(cmd, shell=True)
            if ret_code==0:
                if os.path.exists(cmpress_file):
                    os.remove(file_name)
            else:
                print ("Failed on ", file_name)
                sys.exit(1)
print("Done")

Done



We would invoke it like this:
```
$ python Ex2.py Ex2dir
```
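
As an alternative sketch, the same job can be done without calling an external program by using Python's built-in `gzip` module. This swaps in a different technique from the `gzip` command used above; the function name `compress_fastq` is an assumption for this example.

```python
import gzip
import os
import shutil

def compress_fastq(start):
    """Gzip every .fastq file under start, then delete the original.

    Returns the list of .fastq.gz files that were created.
    """
    created = []
    for path, subdirs, files in os.walk(start):
        for file in files:
            if file.endswith(".fastq"):
                file_name = os.path.join(path, file)
                compressed = file_name + ".gz"
                with open(file_name, "rb") as src, gzip.open(compressed, "wb") as dst:
                    shutil.copyfileobj(src, dst)   # stream the data in chunks
                os.remove(file_name)   # reached only if no exception was raised
                created.append(compressed)
    return created
```

Because a failed write raises an exception, the original file is removed only after its compressed copy has been written.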


## Example 3: Nested Dictionaries

Dictionaries associate names with data, and allow quick retrieval by name.

By nesting dictionaries, powerful lookups are fast and easy.

In this example, we'll:
- create a dict containing objects
- load the objects with search data
- use the dict to retrieve the appropriate object for a search
- perform the search




genes.txt describes the locations of genes:

(name, chrom, strand, start, end)

```
uc001aaa.3      chr1    +       11873   14409 
uc010nxr.1      chr1    +       11873   14409 
uc010nxq.1      chr1    +       11873   14409  
uc009vis.3      chr1    -       14361   16765  
uc009vit.3      chr1    -       14361   19759  
...
```


mappedreads.txt describes mapped dna sequences

(name, chrom, position, sequence)

```
seq1 chr1  674540   ATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGT
seq2 chr19 575000   AGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCG
seq3 chr5  441682   TCTGCATCTGCTCTGGTGTCTTCTGCCATATCACTGC
...
```

We'd like to be able to quickly determine the genes overlapped by a dna sequence.

First, we need a simple but efficient way to determine if two intervals overlap.

intervaltree is a python module that makes that easy.

In [53]:
from intervaltree import IntervalTree
it=IntervalTree()
it[4:7]='gene1'
it[5:10]='gene2'
it[1:11]='gene3'
it

IntervalTree([Interval(1, 11, 'gene3'), Interval(4, 7, 'gene1'), Interval(5, 10, 'gene2')])

In [54]:
it[3]

{Interval(1, 11, 'gene3')}
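
To make the overlap test itself concrete, here is a plain-Python sketch of the check an interval lookup has to answer. It assumes half-open intervals, where the start is included and the end is not, which to my understanding is also how intervaltree treats its intervals. `overlaps` is a hypothetical helper, not part of intervaltree, and it checks every gene one by one, so it is much slower than a tree on large data.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if half-open intervals [start_a, end_a) and [start_b, end_b) share a point."""
    return start_a < end_b and start_b < end_a

genes = {"gene1": (4, 7), "gene2": (5, 10), "gene3": (1, 11)}

# Which genes contain position 3? Treat the point as the interval [3, 4)
hits = [name for name, (start, end) in genes.items() if overlaps(start, end, 3, 4)]
print(hits)
```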

## General plan

- create an interval tree for each chromosome
- organize the trees in a dictionary by chromosome
- store an interval for each gene in the tree for its chromosome

```
{'chr1': IntervalTree([Interval(1000, 1100, 'GeneA'), 
                       Interval(2000, 2100, 'GeneB'), ...
 'chr2': IntervalTree([Interval(4000, 5100, 'GeneC'), 
                       Interval(7000, 8100, 'GeneD'), ...
 'chr3':
 ...
```

## In pseudocode
 ### Step 1: Set up the lookup table
 
- create empty dict
- open the gene file
- for each line in the file
    - get gene name, chrom, start, end
    - initialize an intervaltree for the chrom, if needed, and add to dict
    - add the interval and gene name to the interval tree


In [56]:
import sys
from intervaltree import IntervalTree

print("initializing table")
table={}
sys.argv=['Ex3.py', 'genes.txt', 'mappedreads.txt', 'results.txt'] # for Jupyter
for line in open(sys.argv[1]):
    genename, chrm, strand, start, end = line.split()
    if chrm not in table:
        table[chrm]=IntervalTree()
    table[chrm][int(start):int(end)]=genename
print("done")


initializing table
done


In [57]:
table['chr1'][770000:780000]

{Interval(763063, 788902, 'uc009vjn.1'),
 Interval(763063, 788997, 'uc001abp.1'),
 Interval(763063, 788997, 'uc001abq.1'),
 Interval(763063, 788997, 'uc009vjo.1'),
 Interval(763063, 789740, 'uc001abr.1')}
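
The "create the entry if it does not exist yet" step can also be handled by `collections.defaultdict`. In this sketch plain lists of tuples stand in for the interval trees so that it needs only the standard library; the same pattern should work with `defaultdict(IntervalTree)`. The name `genes_by_chrom` and the sample rows are made up for the example.

```python
from collections import defaultdict

# Sample rows in the genes.txt layout: name, chrom, strand, start, end
rows = [
    "uc001aaa.3 chr1 + 11873 14409",
    "uc009vis.3 chr1 - 14361 16765",
]

genes_by_chrom = defaultdict(list)    # a missing chromosome starts as an empty list
for row in rows:
    genename, chrm, strand, start, end = row.split()
    genes_by_chrom[chrm].append((int(start), int(end), genename))

print(genes_by_chrom["chr1"])
```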

## Step 2: Use the interval trees to find overlapped genes

- open the dna sequence file
- for each line in the file:
    - get chrom, mapped position, and dna sequence
    - look up the interval tree for that chrom in the dict
    - search the interval tree for overlaps [pos, pos+len]
    - print out the gene names


In [58]:
print("reading sequences")

outfp=open(sys.argv[3], 'w')
for line in open(sys.argv[2]):
    name, chrm, pos, seq = line.strip().split()
    genes=table[chrm][int(pos):int(pos)+len(seq)]
    if genes:
        # build the report once, then show it and save it to the results file
        report=["\t".join([name, chrm, pos, seq])] + [f'\t{gene.data}' for gene in genes]
        print("\n".join(report))
        outfp.write("\n".join(report)+"\n")
print("done")
outfp.close()


reading sequences
seq1	chr1	674540	ATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGT
	uc001abm.2
	uc002khh.3
seq2	chr19	575000	AGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCG
	uc002loy.3
	uc002loz.3
	uc002lpa.3
	uc021ulx.1
seq3	chr5	441682	TCTGCATCTGCTCTGGTGTCTTCTGCCATATCACTGC
	uc010ita.3
done


## Using Packages

Packages are add-ons to Python that provide additional functions
- matplotlib: lets you create plots
- os: lets you work with files and folders on your computer (included with Python)
- pandas: lets you work with tables of data

You can download and install packages using Anaconda

How do I know what packages I need?
- Google is your friend: "Best way to visualize graph in python?"
    - returns a number of possible packages, can use any and decide on your favorite
 
Why do I have to find and download packages?
- Python together with every available package would be an enormous installation, far more than any one user needs.
- Letting users choose their own packages keeps the installation small
    - less disk space and faster setup
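
Once a package is installed, it is brought into a script with `import`. The sketch below uses modules that ship with Python so it runs without installing anything; packages installed through Anaconda are imported the same way.

```python
import math                     # import a whole module
from statistics import mean     # import a single name from a module
import os.path as osp           # import under a shorter alias

root = math.sqrt(16)
average = mean([1, 2, 3, 4])
ext = osp.splitext("reads.fastq")[1]    # file extension, including the dot
print(root, average, ext)
```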

## Important Packages 
- numpy: high performance arrays
- scipy: stats, linear alg, etc.
- multiprocessing: easy parallelism
- matplotlib: plotting
- pandas: stats with R-like dataframes
- flask, web2py: web applications

## Python Resources we like

- anaconda python: www.continuum.io
- Jupyter notebook
- pycharm debugger: www.jetbrains.com
- _Introducing Python_, Bill Lubanovic, O'Reilly
- _Python in a Nutshell_, Alex Martelli, O'Reilly
- _Python Cookbook_, Alex Martelli, O'Reilly
- Google's python class: https://www.youtube.com/watch?v=tKTZoB2Vjukxo
- https://docs.python.org/3/tutorial
- codecademy https://www.codecademy.com/learn/learn-python
