# Research Programming in the Life Sciences
## functions, modules and files 

- David L. Bernick, PhD
- Biomolecular Engineering
- Baskin School of Engineering
- UCSC

# Homework
 
## Reading
 - Functions (and methods) - Model Ch 2. pp 24-29
 - Modules - Model Ch 2.  pp 34-41, 44
 - Namespaces - Model Ch 2. pp 21-22, 27, 34-37
 
## Lab
 - Lab 4 due next Monday
 - submit in “assignments” section of Canvas

# Questions:
## Feedback
 - Class pace
 - Keeping up with reading
 - Textbook
 - More...

# Overview
 - Functions
 - Namespaces
 - Modules
 - Resources

# Functions: General Concept
A function is a compound statement block 
![Functions](Lecture7Functions.png)
 - Name the function (def) using the def header line
 - Code for the function is provided as an indented suite
 - Results of the function calculation can be __*return*__ed
 - Arguments are passed to the function by the caller, enclosed in ( )
 - Arguments are seen by the function as named parameters

# Functions: advantages
 - Allows you to reuse code
 - Easier to test your code
 - Organizes your code
 - More reliable/robust code
 - Speeds up development time
 - Break up into smaller problems
 - Caller is only concerned with inputs and outputs

# Functions: Definition
A function consists of:
 - def
 - name
 - input parameters (zero or more)
 - docstrings
 - set of statements in a code suite (indentation)
 - return value(s) - optionally

In [None]:
def addTwo (x, y):
    """Return the sum of two values."""
    s = x + y
    return s

def undecided ():
    pass
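
A brief sketch of how these two definitions behave when called. The definitions are repeated so the cell runs on its own; the names `result` and `nothing` are illustrative only.

```python
def addTwo (x, y):
    """Return the sum of two values."""
    s = x + y
    return s

def undecided ():
    pass

result = addTwo(3, 4)       # positional arguments bind to the parameters x and y
print (result)
print (addTwo.__doc__)      # the docstring is stored on the function object

nothing = undecided()       # a function with no return statement returns None
print (nothing)
```

The caller never sees `s`; it only receives the returned value.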

# ProteinParam __init__

In [None]:
class ProteinParam :
    def __init__ (self, protein):
        '''Build initial AA composition.'''
  
        self.aaComp = {
            'A': 0, 'G': 0, 'M': 0, 'S': 0, 'C': 0,
            'H': 0, 'N': 0, 'T': 0, 'D': 0, 'I': 0,
            'P': 0, 'V': 0, 'E': 0, 'K': 0, 'Q': 0,
            'W': 0, 'F': 0, 'L': 0, 'R': 0, 'Y': 0
            }
        # count symbols in protein
        # ignoring any bad characters

        for aa in protein.upper():
            if aa in self.aaComp: # count valid AA
                self.aaComp[aa] += 1

# molecular Weight

In [None]:
def molecularWeight (self):
    ''' Determine the molecular weight of the protein, in daltons.'''
        
    aa2mw = {
        'A':  89.093,'G':  75.067,'M': 149.211,'S': 105.093,'C': 121.158,
        'H': 155.155,'N': 132.118,'T': 119.119,'D': 133.103,'I': 131.173,
        'P': 115.131,'V': 117.146,'E': 147.129,'K': 146.188,'Q': 146.145,
        'W': 204.225,'F': 165.189,'L': 131.173,'R': 174.201,'Y': 181.189
        }
    mwH2O = 18.015 # the molecular weight of water
       
    # for each AA in the composition, find their MW and add to the total
    totalMW = 0.0
    # iterate over the previously calculated aa Composition
    
            
    # subtract the weight of water 
    # removed by peptide bond formation
    # make sure to deal with cases where there are fewer than 2 valid AAs
    

    return totalMW

# Functions: Input Parameters
 - A function has zero or more input parameters
 - optional parameters have "default" values

In [None]:
def cutDNA (seq, RE='GAATTC', offset=1):
    '''Split a DNA sequence based on a RE match.'''
    newSeq = []
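    start = 0                          # beginning of the current fragment
    matchPos = seq.find(RE)
    while matchPos != -1:              # find returns -1 when no match remains
        cutPos = matchPos + offset
        newSeq.append(seq[start:cutPos])
        start = cutPos
        matchPos = seq.find(RE, matchPos + 1)   # also catches a site at position 0
    newSeq.append(seq[start:])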
    return newSeq
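
Default values let a caller leave arguments out or set them by keyword. The following is a minimal sketch using a hypothetical helper, `countBase`, which is not part of the lab code; its parameter names are assumptions chosen for illustration.

```python
def countBase (seq, base='G', ignoreCase=True):
    '''Count occurrences of base in seq (hypothetical helper).'''
    if ignoreCase:
        seq = seq.upper()
        base = base.upper()
    return seq.count(base)

dna = 'ggATCCgaattc'
print (countBase(dna))                              # both defaults are used
print (countBase(dna, 'C'))                         # base supplied by position
print (countBase(dna, ignoreCase=False))            # keyword skips over base
print (countBase(dna, base='t', ignoreCase=False))  # both supplied by keyword
```

Parameters with defaults must follow the parameters without defaults in the def header line.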

# Nested Functions
 - functions can call functions, 
 - even themselves (recursive)
 - and have distinct namespaces
![Call Return Tree](Lecture7CallReturn.png)
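
A small sketch of both ideas, assuming two illustrative functions that are not part of the lab code: `reverseSeq` calls itself, and `reverseComplement` calls `reverseSeq`. Each call gets its own namespace, so every recursive call has its own `seq` and `first`.

```python
def reverseSeq (seq):
    '''Return seq reversed, using recursion (for illustration only).'''
    if len(seq) <= 1:             # base case: nothing left to reverse
        return seq
    first = seq[0]                # this 'first' belongs to this call only
    return reverseSeq(seq[1:]) + first

def reverseComplement (seq):
    '''Return the reverse complement by calling reverseSeq.'''
    pairs = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join(pairs[base] for base in reverseSeq(seq))

print (reverseSeq('ATGC'))
print (reverseComplement('ATGC'))
print (reverseComplement('GAATTC'))   # an EcoRI site is its own reverse complement
```

In practice `seq[::-1]` reverses a string far more efficiently; the recursion here only shows the call/return pattern.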

# Function Summary
 - Perform a specific task 
 - Define their own namespace
 - Components of a function:

     - def, name, parameters (defaults), return
     - docstrings
     - pass (useful for initial design)
     - assert* and error handling
     - nested function calls
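
Assertions and error handling have not been shown yet, so here is a minimal sketch. The function `percentIdentity` is hypothetical; it raises `ValueError` for input it cannot use, uses `assert` as an internal sanity check, and the caller handles the error with `try`/`except`.

```python
def percentIdentity (seqA, seqB):
    '''Return the percent of positions where two equal-length sequences agree.'''
    if len(seqA) != len(seqB) or len(seqA) == 0:
        raise ValueError('sequences must be non-empty and the same length')
    matches = sum(1 for a, b in zip(seqA, seqB) if a == b)
    percent = 100.0 * matches / len(seqA)
    assert 0.0 <= percent <= 100.0     # sanity check on our own arithmetic
    return percent

try:
    print (percentIdentity('GATTACA', 'GATCACA'))
    print (percentIdentity('GATTACA', 'GAT'))      # raises ValueError
except ValueError as err:
    print ('bad input:', err)
```

One common convention, assumed here, is to reserve `assert` for conditions that should never fail if the code is correct, and to raise an exception for problems caused by the caller's data.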

# Namespace
Python names are unique only within a “Namespace”
![Namespaces](Lecture7Namespaces.png)
 - pH in the .pI() namespace 
 - pH in the .charge() namespace

# Namespaces Continued
 - names in different namespaces are unrelated

In [1]:
a = 0
def nsExample(a):
    c = 5
    while a < 3:
        print (a)
        a += 1
    return c

b = nsExample(a)
print ("a equals {}".format(a)) 
print ("b equals {}".format(b)) 
print ("c equals {}".format(c))

0
1
2
a equals 0
b equals 5


NameError: name 'c' is not defined

# namespaces of functions
 - methods and functions have their own namespaces
 - for example:

In [2]:
def plusplus (y):
    y += 1
    return y

someOtherX = 4
z = plusplus (someOtherX)
print (someOtherX,z)



4 5


# namespace in Functions
 - Namespace defines the scope (use) of a name
 - A name’s namespace is established by its defining block
 - All blocks contained in the defining block have access to that name
 - A function defines its own block, 
 - recursive functions have recursively defined namespaces
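
The rules above can be sketched with one name at each level. All names here (`validBases`, `countValid`, `nValid`, `isValid`) are made up for illustration.

```python
validBases = 'ACGT'                    # defined in the outermost (module) block

def countValid (seq):
    '''Count the characters of seq that are valid DNA bases.'''
    nValid = 0                         # local to countValid

    def isValid (base):
        # contained block: can read validBases from the enclosing module block
        return base in validBases

    for base in seq.upper():
        if isValid(base):
            nValid += 1
    return nValid

print (countValid('acgtNNxa'))
print ('nValid' in globals())          # nValid never leaves the function's namespace
```

Access flows inward only: `isValid` can see `validBases`, but code outside `countValid` cannot see `nValid` or `isValid`.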

# Modularity
decomposing a system into smaller “modules”

# Code modularity
Suppose that your work includes physical and chemical characterization of biological sequences
 - Count bases
 - Count amino acids
 - Calculate hydrophobic amino acids
 - Calculate hydrophilic amino acids
 - Calculate GC-richness of a sequence
 - Calculate molecular weight
 - Calculate codon usage
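
A rough sketch of how three of these tasks might become small, reusable functions. The function names and the choice to read codons in frame from position 0 are assumptions for illustration, not the lab's required design.

```python
def countBases (seq):
    '''Return a dictionary of base counts for a DNA sequence.'''
    seq = seq.upper()
    return {base: seq.count(base) for base in 'ACGT'}

def gcRichness (seq):
    '''Return the fraction of counted bases that are G or C.'''
    counts = countBases(seq)           # reuse countBases instead of recounting
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts['G'] + counts['C']) / total

def codonUsage (seq):
    '''Return a dictionary of codon counts, reading in frame from position 0.'''
    usage = {}
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i+3]
        usage[codon] = usage.get(codon, 0) + 1
    return usage

dna = 'ATGAAACCCGGGTAG'
print (countBases(dna))
print (gcRichness(dna))
print (codonUsage(dna))
```

Each function does one job, so each can be tested alone and combined freely.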

# Python Modules
contain classes, methods, functions, and constant data that allow you to:
- reuse functions that you often need
- organize your code so it's easier to read and write
- reduce the time to develop new functions
- hold data (usually constants)

# Module: Import
 - To get access to a module's contents, use the import statement
 ![Importing](Lecture7Importing.png)
 

# Some Useful Modules
 - os: 
     - collection of > 150 functions and > 50 different data definitions specific to the operating system you're running on
     - http://docs.python.org/library/os
 - math: 
     - collection of more than 40 mathematical functions plus constants such as e and pi
     - http://docs.python.org/library/math
 - sys: 
     - collection of > 20 functions and > 40 different data definitions for interacting with the interpreter
     - http://docs.python.org/library/sys
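
A few representative calls from `os` and `sys`, as a quick sketch. The file name `seqs.fa` and the directory `data` are made up, and nothing is read from or written to disk.

```python
import os
import sys

print (os.getcwd())                        # current working directory
path = os.path.join('data', 'seqs.fa')     # builds a path with the right separator
print (path)
print (os.path.basename(path))             # file name without the directory
print (os.path.splitext('seqs.fa'))        # splits off the extension

print (sys.argv)                           # command-line arguments as a list
print (sys.version_info.major)             # major version of the interpreter
print ('warning: example only', file=sys.stderr)   # write to standard error
```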

# Importing and Using Modules

In [None]:
import math
# calculate sqrt
math.sqrt(9)

# determine natural log – ln(x)
math.log(1)

# determine log base 2
math.log(8, 2)

# calculate sine of π/2
math.sin(math.pi/2)

# Module: Selective Import
 - 3 variations of selective import
 - __from modulename import * should be avoided__
 ![Selective Import](Lecture7SelectiveImport.png)

# Selective Import of a Module

In [3]:
from math import sqrt, log
# calculate sqrt
sqrt(9)        # don't need math.

# determine natural log – ln(x)
log(1)         # don't need math.

# determine log base 2
log(8, 2)      # don't need math.

# calculate sin of π/2
math.sin(math.pi/2)   # produces an error (wasn't imported and math not defined)

NameError: name 'math' is not defined

# Create Your Own Module
Different programs often need access to the same functions or constant data tables
 - Examples:
     - codon to AA tables (lab 2, 3 and 4)
     - DNAstring class
 - Future ideas:
     - aaComposition (lab 3 and 4) if we could figure out how to use the codon table
     - reading fastA files (lab4 and nice for lab 3 ?)
     - calculateMW.py and calculateEC.py from lab 3, if only they read an aaComposition
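
A module is just a Python file. As a sketch, assume a hypothetical file named `dnaTools.py` (not one of the lab files) with the contents below: a constant, two functions, and a test section that runs only when the file is executed directly.

```python
# contents of a hypothetical file named dnaTools.py

stopCodons = ('TAA', 'TAG', 'TGA')     # constant data shared by every importer

def splitCodons (seq):
    '''Return the in-frame codons of seq as a list of 3-letter strings.'''
    seq = seq.upper()
    return [seq[i:i+3] for i in range(0, len(seq) - 2, 3)]

def endsWithStop (seq):
    '''Return True if the last complete codon of seq is a stop codon.'''
    codons = splitCodons(seq)
    return len(codons) > 0 and codons[-1] in stopCodons

if __name__ == '__main__':
    # runs only when the file is executed directly, not when it is imported
    print (splitCodons('ATGAAACCCGGGTAG'))
    print (endsWithStop('ATGAAACCCGGGTAG'))
```

Another program in the same directory could then write `import dnaTools` and call `dnaTools.splitCodons(...)`, or read `dnaTools.stopCodons`.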

# Including Our New Module
Write modules to reuse common data and functions whenever possible

In [None]:
import sequenceAnalysis
sequenceAnalysis.NucParams.rnaCodonTable

# or

from sequenceAnalysis import NucParams
myNucParams = NucParams('ATGAAACCCGGGTAG')

# or

from sequenceAnalysis import NucParams as NucStuf
myNucParams = NucStuf('ATGAAACCCGGGTAG')

# Modules Summary
Modules allow us to collect related classes, functions and constants into a single file. These can then be shared using import.

In [None]:
import mod1 # makes all of mod1 available as mod1.func1
from mod1 import func1 # makes func1 available as func1

from mod1 import func1 as newname # makes func1 available as newname

 - Allows you to reuse and share components 
 - Avoid:

In [None]:
from mod1 import *
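
One reason to avoid it, sketched with two standard modules: `math` and `cmath` define many of the same names. The example imports `sqrt` explicitly so the collision is visible; `import *` would bring in every shared name at once, and the same replacement would happen without `sqrt` ever appearing in the import line.

```python
import math
import cmath

# public names that both modules define
shared = sorted(name for name in dir(math)
                if name in dir(cmath) and not name.startswith('_'))
print (shared)

from math import sqrt
print (sqrt(4))           # 2.0, a float
from cmath import sqrt    # silently replaces the earlier sqrt
print (sqrt(4))           # (2+0j), now a complex number
```

With `import math` and `import cmath`, the two functions stay separate as `math.sqrt` and `cmath.sqrt`.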

# Codon resources
 - Codon usage database
     - http://www.kazusa.or.jp/codon/readme_codon.html
 - cusp
     - search for “cusp” at http://mobyle.pasteur.fr/cgi-bin/portal.py

##### Final Project: Proposal
February 5, abstracts due