# Introduction to Python

## Stephen Weston and Robert Bjornson  
## Yale Center for Research Computing  
## Jan 2017

## What is the Yale Center for Research Computing?


- Independent center under the Provost's office
- Created to support your research computing needs
- Focus is on high performance computing and storage
- ~15 staff, including applications specialists and system engineers
- Available to consult with and educate users
- Manage compute clusters and support users
- Located at 160 St. Ronan St., at the corner of Edwards and St. Ronan
- http://research.computing.yale.edu



## Why Python?
- Free, portable, easy to learn
- Wildly popular, huge and growing community
- Intuitive, natural syntax
- Ideal for rapid prototyping but also for large applications
- Very efficient to write, reasonably efficient to run as is
- Can be made very fast with tools such as NumPy and Cython
- Huge number of packages (modules)


## You can use Python to...
- Convert or filter files
- Automate repetitive tasks
- Compute statistics
- Build processing pipelines
- Build simple web applications
- Perform large numerical computations
- ...

For many tasks, you can use Python instead of bash, Java, or C

Python can be run interactively or as a program



## Basic Python Types

In [1]:
radius=2
pi=3.14
diam=radius*2
area=pi*(radius**2)
title="fun with strings"
pi="cherry"
# a trailing \ continues a statement, but it cannot split a number literal
longnum=31415926535897932384626433832795028841971693993751058*10**58 + \
        2097494459230781640628620899862803482534211706798214808651
delicious=True


- Variables do not need to be declared or given a type
- Integers and floating-point numbers can be mixed in arithmetic
- The same variable can hold values of different types over time (see pi above)
- A statement can be continued on the next line with a trailing \
- Python integers have arbitrary precision, so they never overflow
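A minimal sketch that checks the points above with the built-in type(); the names x, big and digits are illustrative and not from the slides.

```python
# Variables are not declared; a name simply refers to a value of some type.
radius = 2             # an int
pi = 3.14              # a float
area = pi * radius**2  # int and float mix freely; the result is a float

# Rebinding a name to a value of a different type is allowed.
x = 10
x_type_before = type(x)   # int
x = "ten"
x_type_after = type(x)    # str

# Integers have no fixed size: this value needs far more than 64 bits.
big = 2**200
digits = len(str(big))    # number of decimal digits in big
```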


In [2]:
area*58



728.48

## Other Python Types: _lists_

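Lists are like arrays in other languages: they are ordered, indexed from 0, and can be sliced with l[start:stop], where stop itself is excluded.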



In [3]:
l=[1,2,3,4,5,6,7,8,9,10]
l[5]

6

In [4]:
l[3:5]


[4, 5]

In [5]:
l[5:]

[6, 7, 8, 9, 10]

In [6]:
l[5:-3]

[6, 7]

In [34]:
l[2]=3.14
l[3]="pi"
l

[1, 2, 3.14, 'pi', 5, 6, 7, 8, 9, 10]

In [8]:
len(l)

10

In [36]:
l=range(1,10)
print l
print l[4:6]
print l[-6:-3]

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[5, 6]
[4, 5, 6]
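The indexing rules behind the cells above, collected in one short sketch (wrapping range in list() makes it behave the same in Python 2 and Python 3):

```python
l = list(range(1, 10))   # [1, 2, ..., 9]

# Indexing starts at 0, so l[0] is the first element.
first = l[0]

# A slice l[a:b] starts at index a and stops *before* index b,
# so it contains b - a elements.
middle = l[4:6]          # indices 4 and 5

# Negative indices count from the end: l[-1] is the last element.
last = l[-1]
tail = l[-3:]            # the last three elements

# Omitting a bound means "from the start" or "to the end".
head = l[:3]
```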


## Lists are more flexible than arrays
- Insert or append new elements
- Remove elements
- Nest lists inside lists
- Combine values of different types in one list
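The cells that follow show concatenation and slice assignment; this short sketch covers the other operations in the list using the standard list methods append, insert, remove and pop.

```python
l = [1, 2, 3]
l.append(4)           # add one element at the end        -> [1, 2, 3, 4]
l.insert(0, 0)        # insert 0 before index 0             -> [0, 1, 2, 3, 4]
l.remove(2)           # remove the first element equal to 2 -> [0, 1, 3, 4]
last = l.pop()        # remove and return the last element  -> 4, l is [0, 1, 3]

nested = [[1, 2], [3, 4]]   # lists can contain lists
cell = nested[1][0]         # row 1, column 0 -> 3

mixed = [1, "two", 3.0, [4]]  # values of different types in one list
```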


In [32]:
l=[1,2,3,4,5,6,7,8,9]
l+[11,12,13]

[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13]

In [33]:
l[3:6]=['four to six']
l


[1, 2, 3, 'four to six', 7, 8, 9]


In [11]:
# Lists
l=[1,2,3,4,"a", '"b"', "c"]
l

[1, 2, 3, 4, 'a', '"b"', 'c']

In [12]:
# Tuples
t=(1,2,3,4,5,6)
t

(1, 2, 3, 4, 5, 6)
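Tuples index and slice like lists, but they are immutable. A small sketch of both properties, plus tuple unpacking (width and height are illustrative names):

```python
t = (1, 2, 3, 4, 5, 6)
second = t[1]       # indexing works as for lists
part = t[2:4]       # slicing a tuple gives a tuple: (3, 4)

# Tuples are immutable: item assignment raises a TypeError.
try:
    t[0] = 99
    changed = True
except TypeError:
    changed = False

# Tuples are handy for unpacking several values at once.
width, height = (4, 3)
width, height = height, width   # swap without a temporary variable
```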

In [13]:
# Strings
s="Donald"
s

'Donald'
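Strings are sequences too, so len() and slicing work on them; string methods return new strings instead of modifying the original. A brief sketch:

```python
s = "Donald"
n = len(s)             # 6
first3 = s[:3]         # strings slice like lists: 'Don'
upper = s.upper()      # 'DONALD'; s itself is unchanged
greeting = "Hello, " + s + "!"   # concatenation with +
found = "nal" in s     # substring test -> True

# Strings are immutable: s[0] = "R" would raise a TypeError,
# so build a new string instead.
renamed = "R" + s[1:]  # 'Ronald'
```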

In [14]:
# Dictionary (hash table) example
coins={"penny":1, "nickle":5, "dime":10, "quarter":25}
coins

{'dime': 10, 'nickle': 5, 'penny': 1, 'quarter': 25}
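The everyday dictionary operations, sketched on the coins dictionary from the slide (the "half" and "dollar" keys are made up for illustration):

```python
coins = {"penny": 1, "nickle": 5, "dime": 10, "quarter": 25}

dime = coins["dime"]             # look up a value by key -> 10
coins["half"] = 50               # add a new key/value pair
has_penny = "penny" in coins     # membership tests keys, not values -> True
dollar = coins.get("dollar", 0)  # .get returns a default instead of raising KeyError
total = sum(coins.values())      # 1 + 5 + 10 + 25 + 50 = 91
del coins["penny"]               # remove a key
```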

In [15]:
import random
random.randint(88, 100)

91

In [16]:
# Example of if statement
import random
v=random.randint(0,100)
if v < 50:
    print "small", v
    print "something else"
    print "yet more"
else:
    print "big", v
    print "yada yada"
print "after else"

small 16
something else
yet more
after else
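An if statement can test more than two cases with elif. In this sketch, size_label and its thresholds are hypothetical and only illustrate the syntax.

```python
def size_label(v):
    """Hypothetical helper: classify v into one of three ranges."""
    if v < 50:
        return "small"
    elif v < 90:          # only tested when v >= 50
        return "medium"
    else:
        return "big"

labels = [size_label(v) for v in (16, 50, 89, 90)]
```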


In [17]:
# Example of while statement
import random
count=0
while count<100:
    count=count+random.randint(0,10)
    print count,
    count=count-4
print "here"
print "\nall done"

8 13 19 15 19 22 25 30 35 41 43 42 42 43 48 49 47 51 55 56 57 55 54 58 64 70 72 69 72 68 65 66 66 69 67 70 72 73 78 75 75 75 81 79 81 86 86 85 89 86 87 90 92 88 86 86 84 83 86 87 87 91 87 90 88 92 97 95 100 98 103 101 102 107 here

all done


In [18]:
# Example of for statement
for fruit in ['apple', 'orange','banana']:
    print fruit,
print
for i in range(5):
    print i,

apple orange banana
0 1 2 3 4
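Two common companions of the for loop, sketched here: enumerate, which yields (index, item) pairs, and range with a step argument.

```python
fruits = ['apple', 'orange', 'banana']

# enumerate yields (index, item) pairs, avoiding a manual counter.
numbered = []
for i, fruit in enumerate(fruits):
    numbered.append("%d:%s" % (i, fruit))

# range(start, stop, step) also stops before 'stop'.
evens = list(range(0, 10, 2))
```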


In [19]:
range(1,5)

[1, 2, 3, 4]

In [20]:
# Example of looping over a dictionary (iteritems is Python 2; use items in Python 3)
for denom, val in coins.iteritems():
    print denom, val

quarter 25
nickle 5
penny 1
dime 10
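The loop above prints the coins in an order that matches neither the definition nor the values, because dictionaries in Python 2 are unordered. A minimal sketch of the usual fix is to sort the items explicitly, here by value:

```python
coins = {"penny": 1, "nickle": 5, "dime": 10, "quarter": 25}

# Sort the (key, value) pairs by value, the second element of each pair.
by_value = sorted(coins.items(), key=lambda kv: kv[1])
lines = ["%s %d" % (denom, val) for denom, val in by_value]
```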


In [21]:
# Example of function definition
def area(w, h):
    return w*h
    
print area(4,4)

16
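Functions can also take default and keyword arguments and return several values as a tuple. A sketch, where the default h=None and the helper perimeter_and_area are hypothetical extensions of the area function above:

```python
def area(w, h=None):
    """Area of a w-by-h rectangle; if h is omitted, assume a square."""
    if h is None:
        h = w
    return w * h

def perimeter_and_area(w, h):
    # Hypothetical helper: return two values as a tuple.
    return 2 * (w + h), w * h

sq = area(4)            # h defaults to w -> 16
rect = area(w=2, h=5)   # arguments can be passed by name -> 10
p, a = perimeter_and_area(2, 5)   # unpack the returned tuple
```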


In [22]:
# Take the 5th comma-separated field (index 4) and drop its last 3 characters
s="160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10"
s.split(',')[4][:-3]


'TAAGGCGA-TAGAT'

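In [23]:
# File formatter example: copy the header line, then trim the last
# 3 characters from the index field (5th column) of every data line
fp=open('badfile.txt')
print fp.readline().strip()
for line in fp:
    flds=line.strip().split(',')
    flds[4]=flds[4][:-3]
    print ','.join(flds)
fp.close()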

FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project
160212,1,A1,human,TAAGGCGA-TAGAT,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A2,human,CGTACTAG-CTCTC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A3,human,AGGCAGAA-TATCC,None,N,Eland-rna,Mei,Jon_mix10
160212,1,A4,human,TCCTGAGC-AGAGT,None,N,Eland-rna,Mei,Jon_mix10
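The same reformatting can be written with the standard csv module, which also copes with quoted fields that contain commas (plain split(',') does not). In this sketch the generator trim_index and its column and ntrim parameters are hypothetical names; it accepts any iterable of lines, so an open file works as well as a list.

```python
import csv

def trim_index(lines, column=4, ntrim=3):
    """Hypothetical helper: yield CSV rows with the last `ntrim`
    characters cut from field `column`; the first row is a header
    and is passed through unchanged."""
    reader = csv.reader(lines)
    yield next(reader)                    # header row
    for row in reader:
        value = row[column]
        row[column] = value[:len(value) - ntrim]
        yield row

sample = [
    "FCID,Lane,Sample_ID,SampleRef,index,Description,Control,Recipe,Operator,Project",
    "160212,1,A1,human,TAAGGCGA-TAGATCGC,None,N,Eland-rna,Mei,Jon_mix10",
]
rows = list(trim_index(sample))
```

With a real file, something like `for row in trim_index(open('badfile.txt')): print(','.join(row))` should reproduce the output above.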


In [24]:
# os.walk example: visit every directory under d1, listing its subdirectories and files
import os
for d, dirs, files in os.walk('d1'):
    print d, dirs, files

d1 ['d2'] ['f1.txt']
d1/d2 [] ['f2.txt']
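A typical use of os.walk is collecting every file with a given suffix anywhere below a directory. A sketch; find_files is a hypothetical helper name.

```python
import os

def find_files(top, suffix):
    """Hypothetical helper: return paths of all files under `top`
    whose names end with `suffix`, sorted."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.endswith(suffix):
                # os.path.join builds the full path portably.
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

For the d1 tree above, find_files('d1', '.txt') should return the paths of both f1.txt and f2.txt.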


In [25]:
# Interval trees (third-party package: pip install intervaltree)
from intervaltree import IntervalTree
it = IntervalTree()
it[4:7]='I1'
it[5:10]='I2'
it[1:11]='I3'

print it[8]

set([Interval(1, 11, 'I3'), Interval(5, 10, 'I2')])
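To see what the query it[8] computes, here is a brute-force version with no library. It assumes half-open intervals [begin, end), which is how intervaltree treats them; intervals_containing is a hypothetical name. The linear scan is fine for small lists, while an interval tree answers the same query in O(log n + m) time for n intervals and m matches.

```python
def intervals_containing(intervals, point):
    """Hypothetical helper: return the (begin, end, data) tuples whose
    half-open range [begin, end) contains `point`, sorted."""
    return sorted(iv for iv in intervals if iv[0] <= point < iv[1])

intervals = [(4, 7, 'I1'), (5, 10, 'I2'), (1, 11, 'I3')]
hits = intervals_containing(intervals, 8)   # same two intervals as it[8]
```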


In [26]:
# Map sequencing reads (SAM file) onto known genes (UCSC knownGene table)
from intervaltree import IntervalTree

print "initializing"
genefinder={}    # one interval tree of gene spans per chromosome
for line in open('knownGene.txt'):
    genename, chrm, strand, start, end = line.split()[0:5]
    if chrm not in genefinder:
        genefinder[chrm]=IntervalTree()
    genefinder[chrm][int(start):int(end)]=genename

print "reading sequences"
for line in open('sample_hits.sam'):
    if line.startswith('@'):     # skip SAM header lines
        continue
    tag, flag, chrm, pos, mapq, cigar, rnext, \
        pnext, tlen, seq, qual = line.split()[0:11]
    if chrm not in genefinder:   # e.g. unmapped reads ('*') or chromosomes without genes
        continue
    genes=genefinder[chrm][int(pos):int(pos)+len(seq)]
    if genes:
        print tag
        for gene in genes:
            print '\t',gene.data


initializing
reading sequences
HWI-ST0831:196:C1YCJACXX:2:2211:2571:23347
	uc004cqm.3
	uc010nda.3
	uc004cqn.3
HWI-ST0831:196:C1YCJACXX:2:2114:9661:90395
	uc003zbm.3
HWI-ST0831:196:C1YCJACXX:2:2302:16215:62515
	uc003pvj.3
	uc003pvh.3
	uc010kdy.1
	uc003pvk.3
	uc003pvi.3
HWI-ST0831:196:C1YCJACXX:2:2316:2140:71837
	uc003sxr.4
	uc003sxs.4
HWI-ST0831:196:C1YCJACXX:2:1309:6299:31215
	uc001pha.3
	uc009yxc.3
HWI-ST0831:196:C1YCJACXX:2:1106:2548:78910
	uc022bqs.1
	uc004cov.4
	uc022bqt.1
HWI-ST0831:196:C1YCJACXX:2:2111:18134:4152
	uc002ace.1
	uc002acd.1
HWI-ST0831:196:C1YCJACXX:2:2311:13286:6227
	uc001ouw.3
	uc009yty.3
	uc001ouy.4
	uc001oux.3
HWI-ST0831:196:C1YCJACXX:2:2309:8997:17893
	uc003mda.2
	uc003mdb.1
HWI-ST0831:196:C1YCJACXX:2:1304:11911:24449
	uc002dpf.4
	uc002dpi.4
	uc010byb.3
	uc002dph.4
	uc002dpg.4
	uc010vcs.2
	uc010byc.3
HWI-ST0831:196:C1YCJACXX:2:1301:6375:39747
	uc003kja.3
	uc003kjb.3
HWI-ST0831:196:C1YCJACXX:2:2301:20407:92954
	uc003hsx.3
	uc003hsv.4
	uc010ikv.2
HWI-ST0831:196:C1Y

In [27]:
genefinder['chr22'][16242753]


{Interval(16242753, 16242785, 'uc021wke.1')}

In [28]:
# Plot a histogram stored as a dictionary of {bin: count}
import matplotlib.pyplot as plt
d={0: 8633, 1: 951, 2: 1166, 3: 2085, 4: 1916, 5: 8518, 6: 10255, 7: 10697, 8: 55921, 9: 25955, 10: 44636, 11: 55644, 12: 56152, 13: 51422, 14: 36350, 15: 19657, 16: 11452, 17: 5670, 18: 4922, 19: 2292, 20: 1652, 21: 1411, 22: 650, 23: 744, 24: 459, 25: 226, 26: 322, 27: 109, 28: 26, 29: 37, 30: 45, 31: 10, 32: 8, 33: 3, 34: 4}
bins=sorted(d.keys())
vals=[d[k] for k in bins]
plt.bar(bins, vals)
plt.show()
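A histogram stored as {bin: count} can also be inspected without any plotting library. This sketch prints scaled text bars; text_histogram and its width parameter are hypothetical, and bars are scaled so the largest count spans width characters.

```python
def text_histogram(counts, width=40):
    """Hypothetical helper: render {bin: count} as lines of '#' bars,
    scaled so the largest count is `width` characters long."""
    top = max(counts.values())
    lines = []
    for b in sorted(counts):
        bar = "#" * int(round(width * counts[b] / float(top)))
        lines.append("%3d %s %d" % (b, bar, counts[b]))
    return lines

rows = text_histogram({0: 5, 1: 20, 2: 10}, width=8)
```

Printing each entry of text_histogram(d) gives a quick sideways view of the distribution in d.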