# Procedural programming in Python

## Topics

* Flow control, part 2
 * Functions
 * In class exercise:
   * Functionalize this!
 * From nothing to something:
   * Pairwise correlation between rows in a pandas dataframe
   * Sketch of the process
   * In class exercise:
     * Write the code!
   * Rejoining, sharing ideas, problems, thoughts

<hr>
## Flow control

<img src="https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/lnpls008.gif">Flow control figure</img>

Flow control refers to how programs handle loops, conditional execution, and the order of functional operations.  

### If
`if` statements can be used to execute a line or block of code when a particular condition is satisfied.  E.g., let's print something based on the entries in a list.

In [None]:
instructors = ['Dave', 'Jim', 'Dorkus the Clown']

if 'Dorkus the Clown' in instructors:
    print('#fakeinstructor')

There is a special do-nothing keyword, `pass`, that lets you leave an arm of a conditional empty, e.g.

In [None]:
if 'Jim' in instructors:
    print("Congratulations!  Jim is teaching, your class won't stink!")
else:
    pass

### For

For loops are the standard loop, though `while` is also common.  For has the general form:
```
for items in list:
    do stuff
```

For loops and collections like tuples, lists and dictionaries are natural friends.

In [None]:
for instructor in instructors:
    print(instructor)

You can combine loops and conditionals:

In [None]:
for instructor in instructors:
    if instructor.endswith('Clown'):
        print(instructor + " doesn't sound like a real instructor name!")
    else:
        print(instructor + " is so smart... all those gooey brains!")

### range()

Since `for` iterates over the items of a collection, it is common to want the equivalent of a counting loop like:
```
NOTE: C-like
for (i = 0; i < 3; ++i) {
    print(i);
}
```

The Python equivalent is:

```
for i in [0, 1, 2]:
    do something with i
```

What happens when the range you want to sample is big, e.g.
```
NOTE: C-like
for (i = 0; i < 1000000000; ++i) {
    print(i);
}
```

It would be a real pain to have to write out the entire list from 0 to 999999999.

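Enter the `range()` function.  E.g. `list(range(3))` is `[0, 1, 2]`.  In Python 3, `range()` produces its values on demand rather than building the whole list in memory, so even a very large range is cheap to create.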

In [1]:
# named total rather than sum so the built-in sum() stays usable
total = 0
for i in range(10):
    total += i
print(total)

45
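As a quick illustration of the other ways to call `range()`, here is a small sketch.  The variable names are made up for this example; the three call forms (`range(stop)`, `range(start, stop)`, `range(start, stop, step)`) are standard Python.

```python
# range(stop): counts from 0 up to, but not including, stop
first_three = list(range(3))

# range(start, stop): counts from start up to, but not including, stop
two_to_five = list(range(2, 6))

# range(start, stop, step): step can skip values or count backwards
evens_below_ten = list(range(0, 10, 2))
countdown = list(range(3, 0, -1))

# a range knows its length without ever building the full list
big_length = len(range(1000000000))

print(first_three, two_to_five, evens_below_ten, countdown, big_length)
```

Note that the stop value is never included, which is why `range(3)` ends at 2.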


<hr>

### Functions

For loops let you repeat some code for every item in a list.  Functions are similar in that they run the same lines of code for new values of some variable.  They are different in that functions are not limited to looping over items.

Functions are a critical part of writing easy to read, reusable code.

Create a function like:
```
def function_name (parameters):
    """
    docstring
    """
    function expressions
    return [variable]
```

_Note:_ Sometimes I use the word argument in place of parameter.

Here is a simple example.  It prints the string that was passed in, then prints its characters one at a time until it reaches an `r`, and returns nothing.

In [20]:
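def print_string(text):
    """Print a string, then its characters one per line up to the first 'r'."""
    # the parameter is named text so it does not shadow the built-in str
    print(text)
    for c in text:
        print(c)
        if c == 'r':
            # stop looping at the first 'r'
            break
    print("done")
    return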


In [21]:
print_string("string")

string
s
t
r
done


To call the function, use:
```
print_string("Dave is awesome!")
```

_Note:_ The function has to be defined before you can call it!

In [None]:
print_string("Dave is awesome!")

If you provide no argument, or too many, you get an error (a `TypeError`).

In [7]:
#print_string()

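Parameters (or arguments) in Python are passed by object reference: the function receives a reference to the same object the caller has, not a copy.  This means that if you modify a mutable parameter (such as a list) in place inside the function, the change is visible outside of the function.  Assigning a new value to the parameter name inside the function, however, does not affect the caller's variable.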

See the following example:

```
def change_list(my_list):
    """Append an item to the list that was passed into this function."""
    my_list.append('four')
    print('list inside the function: ', my_list)
    return

my_list = [1, 2, 3]
print('list before the function: ', my_list)
change_list(my_list)
print('list after the function: ', my_list)
```

In [23]:
def change_list(my_list):
    """Append an item to the list that was passed into this function."""
    my_list.append('four')
    print('list inside the function: ', my_list)
    return

my_list = [1, 2, 3]
print('list before the function: ', my_list)
change_list(my_list)
print('list after the function: ', my_list)

list before the function:  [1, 2, 3]
list inside the function:  [1, 2, 3, 'four']
list after the function:  [1, 2, 3, 'four']
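To see the difference between changing an object in place and pointing a name at a new object, here is a minimal sketch.  The function names `append_item` and `rebind_name` are invented for this illustration.

```python
def append_item(items):
    """Mutate the list object that the caller passed in."""
    items.append('four')


def rebind_name(items):
    """Point the local name at a brand new list; the caller's list is untouched."""
    items = ['something', 'else']
    return items


my_list = [1, 2, 3]

append_item(my_list)
print('after append_item: ', my_list)    # the caller sees the appended item

returned = rebind_name(my_list)
print('after rebind_name: ', my_list)    # unchanged by the reassignment
print('returned by rebind_name: ', returned)
```

If you want a function to leave the caller's list alone, one option is to work on a copy, e.g. `list(items)`.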


Variables have scope: `global` and `local`

In a function, new variables that you create are not saved when the function returns; these are `local` variables.  Variables defined outside of the function can be read inside it but not reassigned; these are `global` variables.  _Note:_ there is a way to reassign them from inside a function, using the `global` keyword.  Generally, the use of `global` variables is not encouraged; use parameters instead.

```
my_global_1 = 'bad idea'
my_global_2 = 'another bad one'
my_global_3 = 'better idea'

def my_function():
    print(my_global_1)
    my_global_2 = 'broke your global, man!'
    global my_global_3
    my_global_3 = 'still a better idea'
    return
    
my_function()
print(my_global_2)
print(my_global_3)
```

In [25]:
my_global_1 = 'bad idea'
my_global_2 = 'another bad one'
my_global_3 = 'better idea'

def my_function():
    print(my_global_1)
    my_global_2 = 'broke your global, man!'
    print(my_global_2)
    global my_global_3
    my_global_3 = 'still a better idea'
    return

my_function()
print(my_global_2)
print(my_global_3)

bad idea
broke your global, man!
another bad one
still a better idea


In general, you want to use parameters to provide data to a function and hand back a result with `return`. E.g.

```
def add(x, y):
    my_sum = x + y
    return my_sum
```

If you are going to return multiple objects, which of the data structures that we talked about can be used?  Give an example below.

In [30]:
def a_function(parameter):
    return None



In [31]:
foo = a_function('bar')
print(foo)

None
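One possible answer, sketched with a tuple.  The function `min_and_max` is a made-up example, not part of the class code; a list or dictionary would also work.

```python
def min_and_max(values):
    """Return the smallest and largest items of a non-empty collection."""
    smallest = min(values)
    largest = max(values)
    # separating values with a comma packs them into a tuple
    return smallest, largest


result = min_and_max([3, 1, 4, 1, 5])
print(type(result).__name__, result)

# the tuple can be unpacked into separate variables at the call site
low, high = min_and_max([3, 1, 4, 1, 5])
print(low, high)
```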


### Parameters have three different types:

| type | behavior |
|------|----------|
| required | positional, must be present or error, e.g. `my_func(first_name, last_name)` |
| keyword | position independent, e.g. `my_func(first_name, last_name)` can be called `my_func(first_name='Dave', last_name='Beck')` or `my_func(last_name='Beck', first_name='Dave')` |
| default | keyword params that default to a value if not provided |


In [32]:
def print_name(first, last='the Clown'):
    print('Your name is %s %s' % (first, last))
    return

Take a minute and play around with the above function.  Which are required?  Keyword?  Default?
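If you want something to compare your experiments against, here is a sketch.  It uses a hypothetical variant, `format_name`, with the same parameters as `print_name` but returning the string instead of printing it, so the results are easy to inspect.

```python
def format_name(first, last='the Clown'):
    """Same parameters as print_name, but return the string instead of printing."""
    return 'Your name is %s %s' % (first, last)


# first is required; last falls back to its default value
only_first = format_name('Dorkus')

# both given by position
by_position = format_name('Dave', 'Beck')

# both given by keyword, so the order does not matter
by_keyword = format_name(last='Beck', first='Dave')

# leaving out the required parameter raises a TypeError
try:
    format_name()
except TypeError as err:
    missing_message = str(err)

print(only_first)
print(by_position)
print(by_keyword)
print(missing_message)
```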

In [34]:
def massive_correlation_analysis(data, method='pearson'):
    pass
    return

Functions can contain any code that you put anywhere else including:
* if...elif...else
* for...else
* while
* other function calls

In [39]:
def print_name_age(first, last, age):
    print_name(first, last)
    print('Your age is %d' % (age))
    print('Your age is ' + str(age))
    if age > 35:
        print('You are really old.')
    return

In [40]:
print_name_age(age=40, last='Beck', first='Dave')


Your name is Dave Beck
Your age is 40
Your age is 40
You are really old.


Once you have some code that is functionalized and not going to change, you can move it to a file that ends in `.py`, check it into version control, import it into your notebook and use it!

Let's do this now for the above two functions.

...


    See you after the break!

Import the function...

Call them!
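The steps might look something like the sketch below.  The module name `name_tools` is made up, and the file is written to a temporary directory only so that the example is self-contained; in class you would save the `.py` file next to the notebook and a plain `import` would be enough.

```python
import importlib
import os
import sys
import tempfile

# the source we would normally type into name_tools.py in an editor
module_source = '''
def print_name(first, last='the Clown'):
    print('Your name is %s %s' % (first, last))
    return
'''

# write the module to disk (a temporary directory for this sketch)
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'name_tools.py'), 'w') as f:
    f.write(module_source)

# make the directory importable; not needed when the file sits beside the notebook
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import name_tools

name_tools.print_name('Dave', 'Beck')
```

If you edit the `.py` file after importing it, the notebook keeps the old version until you restart the kernel or call `importlib.reload(name_tools)`.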

<hr>
## Hacky Hack Time with Functions!

Notes from last class:
* The `os` package has tools for checking if a file exists: ``os.path.exists``
```
import os
filename = 'HCEPDB_moldata.zip'
if os.path.exists(filename):
    print("wahoo!")
```
* Use the `requests` package to get the file given a url (got this from the requests docs)
```
import requests
url = 'http://faculty.washington.edu/dacb/HCEPDB_moldata.zip'
req = requests.get(url)
assert req.status_code == 200 # if the download failed, this line will generate an error
with open(filename, 'wb') as f:
    f.write(req.content)
```
* Use the `zipfile` package to decompress the file while reading it into `pandas`
```
import pandas as pd
import zipfile
csv_filename = 'HCEPDB_moldata.csv'
zf = zipfile.ZipFile(filename)
data = pd.read_csv(zf.open(csv_filename))
```

Here was my solution
```
import os
import requests
import pandas as pd
import zipfile

filename = 'HCEPDB_moldata.zip'
url = 'http://faculty.washington.edu/dacb/HCEPDB_moldata.zip'
csv_filename = 'HCEPDB_moldata.csv'

if os.path.exists(filename):
    pass
else:
    req = requests.get(url)
    assert req.status_code == 200 # if the download failed, this line will generate an error
    with open(filename, 'wb') as f:
        f.write(req.content)

zf = zipfile.ZipFile(filename)
data = pd.read_csv(zf.open(csv_filename))
```



My solution:

In [4]:
def download_if_not_exists(url, filename):
    if os.path.exists(filename):
        pass
    else:
        req = requests.get(url)
        assert req.status_code == 200 # if the download failed, this line will generate an error
        with open(filename, 'wb') as f:
            f.write(req.content)

In [5]:
def load_HCEPDB_data(url, zip_filename, csv_filename):
    download_if_not_exists(url, zip_filename)
    zf = zipfile.ZipFile(zip_filename)
    data = pd.read_csv(zf.open(csv_filename))
    return data

In [6]:
import os
import requests
import pandas as pd
import zipfile

load_HCEPDB_data('http://faculty.washington.edu/dacb/HCEPDB_moldata_set1.zip', 'HCEPDB_moldata_set1.zip', 'HCEPDB_moldata_set1.csv')

Unnamed: 0,id,SMILES_str,stoich_str,mass,pce,voc,jsc,e_homo_alpha,e_gap_alpha,e_lumo_alpha,tmp_smiles_str
0,655365,C1C=CC=C1c1cc2[se]c3c4occc4c4nsnc4c3c2cn1,C18H9N3OSSe,394.3151,5.161953,0.867601,91.567575,-5.467601,2.022944,-3.444656,C1=CC=C(C1)c1cc2[se]c3c4occc4c4nsnc4c3c2cn1
1,1245190,C1C=CC=C1c1cc2[se]c3c(ncc4ccccc34)c2c2=C[SiH2]...,C22H15NSeSi,400.4135,5.261398,0.504824,160.401549,-5.104824,1.630750,-3.474074,C1=CC=C(C1)c1cc2[se]c3c(ncc4ccccc34)c2c2=C[SiH...
2,65553,[SiH2]1C=CC2=C1C=C([SiH2]2)C1=Cc2[se]ccc2[SiH2]1,C12H12SeSi3,319.4448,6.138294,0.630274,149.887545,-5.230274,1.682250,-3.548025,C1=CC2=C([SiH2]1)C=C([SiH2]2)C1=Cc2[se]ccc2[Si...
3,720918,C1C=c2c3ccsc3c3[se]c4cc(oc4c3c2=C1)C1=CC=CC1,C20H12OSSe,379.3398,1.991366,0.242119,126.581347,-4.842119,1.809439,-3.032680,C1=CC=C(C1)c1cc2[se]c3c4sccc4c4=CCC=c4c3c2o1
4,1310744,C1C=CC=C1c1cc2[se]c3c(c4nsnc4c4ccncc34)c2c2ccc...,C24H13N3SSe,454.4137,5.605135,0.951911,90.622776,-5.551911,2.029717,-3.522194,C1=CC=C(C1)c1cc2[se]c3c(c4nsnc4c4ccncc34)c2c2c...
5,196637,C1C=CC=C1c1cc2[se]c3cc4ccsc4cc3c2[se]1,C17H10SSe2,404.2520,2.644436,0.587932,69.223461,-5.187932,2.201106,-2.986827,C1=CC=C(C1)c1cc2[se]c3cc4ccsc4cc3c2[se]1
6,262174,C1C=CC=C1c1cc2[se]c3c4occc4c4cscc4c3c2[se]1,C19H10OSSe2,444.2730,2.523057,0.397670,97.645325,-4.997670,1.982122,-3.015548,C1=CC=C(C1)c1cc2[se]c3c4occc4c4cscc4c3c2[se]1
7,393249,C1C=CC=C1c1cc2[se]c3cc4cccnc4cc3c2c2ccccc12,C24H15NSe,396.3495,3.115895,0.869140,55.174815,-5.469140,2.331815,-3.137325,C1=CC=C(C1)c1cc2[se]c3cc4cccnc4cc3c2c2ccccc12
8,35,C1C2=C([SiH2]C=C2)C=C1c1cc2occc2c2cscc12,C17H12OSSi,292.4328,2.743214,0.387106,109.062905,-4.987106,1.909966,-3.077141,C1=CC2=C([SiH2]1)C=C(C2)c1cc2occc2c2cscc12
9,1048612,C1C=CC=C1C1=Cc2sc3cc4C=C[SiH2]c4cc3c2C1,C18H14SSi,290.4606,2.408411,0.431315,85.937708,-5.031315,2.065850,-2.965465,C1=CC=C(C1)C1=Cc2sc3cc4C=C[SiH2]c4cc3c2C1


How many functions did you use?

Why did you choose to use functions for these pieces?

<hr>
## From nothing to something

### Task: Compute the pairwise Pearson correlation between rows in a dataframe.

Let's say we have three molecules (A, B, C) with three measurements each (v1, v2, v3).  So for each molecule we have a vector of measurements:

$$X=\begin{bmatrix}
         X_{v_{1}} \\
         X_{v_{2}} \\
         X_{v_{3}} \\
        \end{bmatrix} $$
        
Where X is a molecule and the components are the values for each of the measurements.  These make up the rows in our matrix.

Often, we want to compare molecules to determine how similar or different they are.  One measure is the Pearson correlation.

Pearson correlation: <img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/01d103c10e6d4f477953a9b48c69d19a954d978a"/>

Expressed graphically, when you plot the paired measurements for two samples (in this case molecules) against each other, you can see positive correlation, no correlation, or negative correlation.  E.g.
<img src="http://www.statisticshowto.com/wp-content/uploads/2012/10/pearson-2-small.png"/>


Simple input dataframe (_note_ when you are writing code it is always a good idea to have a simple test case where you can readily compute by hand or know the output):

| index | v1 | v2 | v3 |
|-------|----|----|----|
| A     | -1 | 0  | 1  |
| B     | 1  | 0  | -1 |
| C     | .5 | 0  | .5 |

* If the above is a dataframe what shape and size is the output?

* What are some unique features of the output?

For our test case, what will the output be?

|   | A | B | C |
|---|---|---|---|
| A | 1 | -1 | 0 |
| B | -1 | 1 | 0 |
| C | 0 | 0 | 1 |
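
As a check on the table above, here is the calculation worked by hand.  It assumes the standard sample form of the Pearson correlation, written with the same measurement notation used for the vector $X$ earlier:

$$r_{XY} = \frac{\sum_{i=1}^{3} (X_{v_{i}} - \bar{X})(Y_{v_{i}} - \bar{Y})}{\sqrt{\sum_{i=1}^{3} (X_{v_{i}} - \bar{X})^{2}}\,\sqrt{\sum_{i=1}^{3} (Y_{v_{i}} - \bar{Y})^{2}}}$$

For A and B, both means are 0, so the deviations are just the values themselves:

$$r_{AB} = \frac{(-1)(1) + (0)(0) + (1)(-1)}{\sqrt{(-1)^{2} + 0^{2} + 1^{2}}\,\sqrt{1^{2} + 0^{2} + (-1)^{2}}} = \frac{-2}{\sqrt{2}\,\sqrt{2}} = -1$$

For A and C, the mean of C is $\tfrac{1}{3}$, so the deviations of C are $(\tfrac{1}{6}, -\tfrac{1}{3}, \tfrac{1}{6})$ and the numerator vanishes:

$$(-1)(\tfrac{1}{6}) + (0)(-\tfrac{1}{3}) + (1)(\tfrac{1}{6}) = 0 \quad\Rightarrow\quad r_{AC} = 0$$

The same argument with the signs flipped gives $r_{BC} = 0$, and any row correlated with itself gives 1, which fills in the diagonal.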

### Let's sketch the idea...

In [35]:
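import pandas as pd
import numpy as np
import os
import zipfile
import math
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
%matplotlib inline

df = pd.DataFrame([[-1, 0, 1], [1, 0, -1], [0.5, 0, 0.5]])

def pairwise_correlation(df):
    """Print the Pearson correlation (and p-value) for every pair of rows."""
    for index1, row1 in df.iterrows():
        for index2, row2 in df.iterrows():
            # pearsonr returns the correlation coefficient and its p-value
            cor = pearsonr(row1, row2)
            print(cor)
    # hand back the result for the last pair of rows
    return cor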

In [37]:
pairwise_correlation(df)

(1.0, 0.0)
(-1.0, 0.0)
(0.0, 1.0)
(-1.0, 0.0)
(1.0, 0.0)
(0.0, 1.0)
(0.0, 1.0)
(0.0, 1.0)
(1.0, 0.0)


(1.0, 0.0)

## In class exercise
### 20-30 minutes
#### Objectives: 
1. Write code using functions to compute the pairwise Pearson correlation between rows in a pandas dataframe.  You will have to use ``for`` and possibly ``if``.
2. Use a cell to test each function with an input that yields an expected output.  Think about the shape and values of the outputs.
3. Put the code in a ``.py`` file in the directory with the Jupyter notebook, import and run!


#### To help you get started...
To create the sample dataframe:
```
df = pd.DataFrame([[-1, 0, 1], [1, 0, -1], [.5, 0, .5]])
```

To loop over rows in a dataframe, check out (Google is your friend):
```
DataFrame.iterrows
```

In [22]:
df = pd.DataFrame([[-1, 0, 1], [1, 0, -1], [.5, 0, .5]])

In [23]:
df

Unnamed: 0,0,1,2
0,-1.0,0,1.0
1,1.0,0,-1.0
2,0.5,0,0.5


In [None]:
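# iterrows() yields an (index, row) pair for each row of the dataframe
for index, row in df.iterrows():
    print(index, row.values)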

<hr>
## How do we know it is working?


#### Use the test case!
Our three row example is a useful tool for checking that our code is working.  We can write some tests that compare the output of our functions to our expectations.

E.g. The diagonals should be 1, and corr(A, B) = -1, ...

#### But first, let's talk ``assert`` and ``raise``

We've already briefly been exposed to assert in this code:
```
if os.path.exists(filename):
    pass
else:
    req = requests.get(url)
    # if the download failed, next line will raise an error
    assert req.status_code == 200
    with open(filename, 'wb') as f:
        f.write(req.content)
```

What is the assert doing there?

Let's play with ``assert``.  What should the following asserts do?
```
assert True == False, "You assert wrongly, sir!"
assert 'Dave' in instructors
assert function_that_returns_True_or_False(parameters)
```

So when an assert statement is true, the code keeps executing and when it is false, it ``raises`` an exception (also known as an error).
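
Here is a small sketch of both outcomes.  It borrows `try` and `except`, which we have not covered yet, only so that the example can keep running after the failed assert and show the message.

```python
instructors = ['Dave', 'Jim', 'Dorkus the Clown']

# the condition is true, so nothing happens and execution continues
assert 'Dave' in instructors

# the condition is false, so an AssertionError carrying our message is raised
try:
    assert True == False, "You assert wrongly, sir!"
except AssertionError as err:
    message = str(err)

print(message)
```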

We've all probably seen lots of exceptions.  E.g.

```
def some_function(parameter):
    return

some_function()
```

```
some_dict = { }
print(some_dict['invalid key'])
```

```
'forty' + 2
```
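
To see which exception each of these three snippets produces without stopping at the first one, here is a sketch that records the exception names in a list.  The list name `raised` is made up, and `try`/`except` is used ahead of its proper introduction.

```python
def some_function(parameter):
    return


raised = []

try:
    some_function()                  # the required argument is missing
except TypeError as err:
    raised.append(type(err).__name__)

try:
    some_dict = {}
    print(some_dict['invalid key'])  # the key is not in the dictionary
except KeyError as err:
    raised.append(type(err).__name__)

try:
    'forty' + 2                      # a string and an integer cannot be added
except TypeError as err:
    raised.append(type(err).__name__)

print(raised)
```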

Like C++ and other languages, Python lets you ``raise`` your own exception.  You can do it with ``raise`` (surprise!).  Exceptions are special objects and you can create your own type of exceptions.  For now, we are going to look at the simplest ``Exception``.

We create an ``Exception`` object by calling the constructor:
```
Exception()
```

This isn't very helpful.  We really want to supply a description.  The ``Exception`` constructor accepts any number of arguments, typically strings.  One good form if you are using the generic exception object is:
```
Exception('Short description', 'Long description')
```


Creating an exception object isn't useful alone, however.  We need to send it down the software stack to the Python interpreter so that it can handle the exception condition.  We do this with ``raise``.

```
raise Exception("An error has occurred.")
```
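
As a rough sketch of what travels with a raised exception, the example below raises the two-string form and then inspects the object.  The `try`/`except` is there only so the example can look at the exception instead of stopping; the variable names are invented.

```python
try:
    raise Exception('Short description', 'Long description')
except Exception as err:
    # everything passed to the constructor is stored in the args tuple
    caught_args = err.args
    caught_type = type(err).__name__

print(caught_type, caught_args)
```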

Now you can create your own error messages like a pro!

#### DETOUR!

There are lots of types of exceptions beyond the generic class ``Exception``.  You can use them in your own code if they make sense.  E.g.  
```
import math
my_variable = math.inf
if my_variable == math.inf:
    raise ValueError('my_variable cannot be infinity')
```

<p>List of standard exceptions:</p>
<table class="table table-bordered">
<tr>
<th><b>EXCEPTION NAME</b></th>
<th><b>DESCRIPTION</b></th>
</tr>
<tr>
<td>Exception</td>
<td>Base class for all built-in, non-system-exiting exceptions; user-defined exceptions should also derive from it.</td>
</tr>
<tr>
<td>StopIteration</td>
<td>Raised when the next() method of an iterator does not point to any object.</td>
</tr>
<tr>
<td>SystemExit</td>
<td>Raised by the sys.exit() function.</td>
</tr>
<tr>
<td>StandardError</td>
<td>Base class for all built-in exceptions except StopIteration and SystemExit (Python 2 only; removed in Python 3).</td>
</tr>
<tr>
<td>ArithmeticError</td>
<td>Base class for all errors that occur for numeric calculation.</td>
</tr>
<tr>
<td>OverflowError</td>
<td>Raised when a calculation exceeds maximum limit for a numeric type.</td>
</tr>
<tr>
<td>FloatingPointError</td>
<td>Raised when a floating point calculation fails.</td>
</tr>
<tr>
<td>ZeroDivisionError</td>
<td>Raised when division or modulo by zero takes place for all numeric types.</td>
</tr>
<tr>
<td>AssertionError</td>
<td>Raised in case of failure of the Assert statement.</td>
</tr>
<tr>
<td>AttributeError</td>
<td>Raised in case of failure of attribute reference or assignment.</td>
</tr>
<tr>
<td>EOFError</td>
<td>Raised when the input() function (raw_input() in Python 2) reaches the end of file without reading any input.</td>
</tr>
<tr>
<td>ImportError</td>
<td>Raised when an import statement fails.</td>
</tr>
<tr>
<td>KeyboardInterrupt</td>
<td>Raised when the user interrupts program execution, usually by pressing Ctrl+c.</td>
</tr>
<tr>
<td>LookupError</td>
<td>Base class for all lookup errors.</td>
</tr>
<tr>
<td><p>IndexError</p><p>KeyError</p></td>
<td><p>Raised when an index is not found in a sequence.</p><p>Raised when the specified key is not found in the dictionary.</p></td>
</tr>
<tr>
<td>NameError</td>
<td>Raised when an identifier is not found in the local or global namespace.</td>
</tr>
<tr>
<td><p>UnboundLocalError</p><p>EnvironmentError</p></td>
<td><p>Raised when trying to access a local variable in a function or method but no value has been assigned to it.</p><p>Base class for all exceptions that occur outside the Python environment.</p></td>
</tr>
<tr>
<td><p>IOError</p><p>OSError</p></td>
<td><p>Raised when an input/output operation fails, such as the print() function or the open() function when trying to open a file that does not exist.</p><p>Raised for operating system-related errors.</p></td>
</tr>
<tr>
<td><p>SyntaxError</p><p>IndentationError</p></td>
<td><p>Raised when there is an error in Python syntax.</p><p>Raised when indentation is not specified properly.</p></td>
</tr>
<tr>
<td>SystemError</td>
<td>Raised when the interpreter finds an internal problem, but when this error is encountered the Python interpreter does not exit.</td>
</tr>
<tr>
<td>TypeError</td>
<td>Raised when an operation or function is attempted that is invalid for the specified data type.</td>
</tr>
<tr>
<td>ValueError</td>
<td>Raised when the built-in function for a data type has the valid type of arguments, but the arguments have invalid values specified.</td>
</tr>
<tr>
<td>RuntimeError</td>
<td>Raised when a generated error does not fall into any category.</td>
</tr>
<tr>
<td>NotImplementedError</td>
<td>Raised when an abstract method that needs to be implemented in an inherited class is not actually implemented.</td>
</tr>
</table>

#### Put it all together... ``assert`` and ``raise``

Breaking assert down, it is really just an if test followed by a raise.  So the code below:
```
assert <some_test>, <message>
```
is shorthand for:
```
if not <some_test>:
    raise AssertionError(<message>)
```

Prove it?  OK.

```
instructors = ['Dorkus the Clown', 'Jim']
assert 'Dave' in instructors, "Dave isn't in the instructor list!"
```

```
instructors = ['Dorkus the Clown', 'Jim']
if 'Dave' not in instructors:
    raise AssertionError("Dave isn't in the instructor list!")
```

#### Questions?


### All of this was in preparation for some testing...

Can we write some quick tests that make sure our code is doing what we think it is?  Something of the form:

```
corr_matrix = pairwise_row_correlations(my_sample_dataframe)
assert corr_matrix looks like what we expect, "The function is broken!"
```

What are the smallest units of code that we can test?

What asserts can we make for these pieces of code?
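
One way this could look is sketched below.  The `pearson` helper and this implementation of `pairwise_row_correlations` are assumptions made for the illustration (your own solution from the exercise may be structured differently); the sample dataframe and expected values come from the test case above.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


def pairwise_row_correlations(df):
    """Return a square dataframe of correlations between every pair of rows."""
    n = len(df)
    result = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            result[i, j] = pearson(df.iloc[i], df.iloc[j])
    return pd.DataFrame(result, index=df.index, columns=df.index)


# smallest unit first: the helper on a single pair of rows
assert np.isclose(pearson([-1, 0, 1], [1, 0, -1]), -1.0)

# then the whole function on the three-row test case
sample = pd.DataFrame([[-1, 0, 1], [1, 0, -1], [.5, 0, .5]],
                      index=['A', 'B', 'C'])
corr_matrix = pairwise_row_correlations(sample)
expected = np.array([[1, -1, 0], [-1, 1, 0], [0, 0, 1]])

assert corr_matrix.shape == (3, 3), "Output should be rows x rows"
assert np.allclose(corr_matrix.values, corr_matrix.values.T), "Should be symmetric"
assert np.allclose(corr_matrix.values, expected), "The function is broken!"
```

The comparisons use `np.isclose` and `np.allclose` rather than `==` because the results are floating point numbers.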

#### Remember, in computers, 1.0 does not necessarily = 1

Put the following in an empty cell:
```
.99999999999999999999
```

How can we test for two floating point numbers being (almost) equal? Pro tip:  [Google!](http://lmgtfy.com/?q=python+assert+almost+equal)
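
One answer from the standard library is `math.isclose`; the sketch below shows it next to a plain `==` comparison.  The tolerance value is an arbitrary choice for illustration.

```python
import math

total = 0.1 + 0.2

# exact comparison fails because of floating point rounding
exactly_equal = (total == 0.3)

# isclose compares within a relative tolerance instead
close_enough = math.isclose(total, 0.3, rel_tol=1e-9)

print(total, exactly_equal, close_enough)

# usable directly in an assert
assert math.isclose(total, 0.3, rel_tol=1e-9), "Values differ by more than the tolerance"
```

For arrays and dataframes, `numpy.isclose` and `numpy.allclose` do the same job element by element.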


## From nothing to something wrap up

Here we created some functions from just a short description of our needs.  
* Before we wrote any code, we walked through the flow control and decided on the parts that were necessary.
* Before we wrote any code, we created a simple test example with simple predictable output.
* We wrote some code according to our specifications.
* We wrote tests using ``assert`` to verify our code against the simple test example.

Next: errors, part 2; unit tests; debugging;


### QUESTIONS?