ISRC Python Workshop: Basics II

__Functions, File I/O and External Libraries__

<hr>

@author: Zhiya Zuo

@email: zhiya-zuo@uiowa.edu

---

### Functions

We have already used many built-in functions to make programming easier. A function is a block of code that takes input arguments (and, optionally, returns values) to perform a specific task. In Python (and many other languages), a function call looks like this:

```python
output = function(input_argument)
```

For example:

In [1]:
list(range(5)) # in Python 3, range() is lazy; list() displays its elements

[0, 1, 2, 3, 4]
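The call above passes a single argument. Many built-in functions accept several arguments, some optional and some given by keyword. A few examples using standard built-ins (results are wrapped in `list(...)` so they print the same way in Python 2 and 3):

```python
# range(stop): one positional argument
stop_only = list(range(5))
print(stop_only)           # [0, 1, 2, 3, 4]

# range(start, stop, step): three positional arguments
with_step = list(range(2, 10, 3))
print(with_step)           # [2, 5, 8]

# sorted(iterable, reverse=...): a keyword argument changes the behavior
descending = sorted([3, 1, 2], reverse=True)
print(descending)          # [3, 2, 1]
```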

Note that we are not limited to built-in functions. Let's now try to make our own. Before that, we need to be clear on the structure of a function:
```python
def func_name(arg1, arg2, arg3, ...):
    #####################
    # Do something here #
    #####################
    return output
```

\* *`return output` is NOT required*
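To make this note concrete: a function that reaches the end of its body without a `return` statement implicitly returns `None`. The names `add_two` and `greet` below are hypothetical, chosen only for illustration.

```python
def add_two(a, b):
    # Explicit return: the caller receives the computed value
    return a + b

def greet(name):
    # No return statement: Python returns None implicitly
    print("Hello, " + name)

result = add_two(2, 3)
print(result)             # 5

nothing = greet("ISRC")   # prints: Hello, ISRC
print(nothing)            # None
```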

In the following example, we make use of `sum`, a built-in function to sum up numeric iterables.

In [2]:
def mySum(list_to_sum):
    return sum(list_to_sum)

In [3]:
mySum(range(5))

10

A more involved version that does not use the `sum` function:

In [4]:
def mySumUsingLoop(list_to_sum):
    sum_ = 0 # start from 0 so an empty input returns 0 instead of raising IndexError
    for item in list_to_sum: # add every item to the running total
        sum_ += item
    return sum_

In [5]:
mySumUsingLoop(range(5))

10

*The two example functions do nothing interesting; they only serve as illustrations of how to build customized functions.*

---

### File I/O

This section covers the basics of reading and writing data.

#### Write data to a file

In [6]:
f = open("tmp1.csv", "w") # f is a file handle (file object), while "w" is the mode (w for write)
data = range(10)
for item in data:
    f.write(str(item))
    f.write("\n") # add newline character
f.close() # close the file handle so buffered data is written to disk and the resource is released

Note that `write` only accepts strings: without converting each `int` to `str`, a `TypeError` will be raised.
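The `TypeError` can be reproduced without touching the disk. This sketch uses `io.StringIO`, an in-memory text stream whose `write` method, like that of a file opened in text mode, accepts only `str` (the exact error message differs slightly from a real file's).

```python
import io

# In-memory text stream: write() only accepts str, like a text-mode file
buffer = io.StringIO()

try:
    buffer.write(5)              # an int, not a str
except TypeError as err:
    print("TypeError:", err)

buffer.write(str(5) + "\n")      # convert first, then write
print(repr(buffer.getvalue()))   # '5\n'
```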

A more common way is to use a `with` statement:

In [7]:
data = range(10)
with open("tmp2.csv", "w") as f: # f is a file handle, while "w" is the mode (w for write)
    for item in data:
        f.write(str(item))
        f.write("\n") # add newline character

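There is no need to call `f.close()`: the `with` statement closes the file automatically when the block exits, even if an exception is raised inside it.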
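A quick way to confirm this behavior is to inspect the file object's `closed` attribute inside and after the block. This sketch writes into a temporary directory (an assumption, to avoid cluttering the working folder) rather than the current one.

```python
import os
import tempfile

# Use a throwaway directory so the example leaves no files behind in the working folder
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "tmp2.csv")

with open(path, "w") as f:
    for item in range(3):
        f.write(str(item) + "\n")
    print(f.closed)  # False: still inside the with block

print(f.closed)      # True: closed automatically when the block exited
```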

See more here:
1. https://stackoverflow.com/questions/3012488/what-is-the-python-with-statement-designed-for
2. https://docs.python.org/3/whatsnew/2.6.html#pep-343-the-with-statement

#### Read data from a file

In [8]:
f = open("tmp1.csv", "r") # this time, use read mode
contents = [item for item in f] # list comprehension. This is the same as for-loop but more concise
print(contents)

['0\n', '1\n', '2\n', '3\n', '4\n', '5\n', '6\n', '7\n', '8\n', '9\n']


In [9]:
contents = [item.strip("\n") for item in contents] # strip the newline
print(contents)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


In [10]:
int_values = list(map(int, contents)) # map the values into integer type; list() is needed in Python 3, where map is lazy
print(int_values)
f.close() # always remember to close the file handle

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


The same steps using `with`:

In [11]:
with open("tmp2.csv", "r") as f: # read back the file written with `with` above
    contents = [item for item in f] # list comprehension. This is the same as for-loop but more concise
    contents = [item.strip("\n") for item in contents] # strip the newline
    print(contents)
    int_values = list(map(int, contents)) # map the values into integer type
    print(int_values)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
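Putting the write and read steps together, a minimal round-trip sketch might look like the following. The helper names `write_ints` and `read_ints` and the temporary file location are assumptions for illustration; `line.strip()` with no argument removes all surrounding whitespace, including the trailing newline.

```python
import os
import tempfile

def write_ints(path, values):
    # One integer per line; str() is required because write() only accepts strings
    with open(path, "w") as f:
        for value in values:
            f.write(str(value) + "\n")

def read_ints(path):
    # strip() drops the newline; list(...) materializes map's result in Python 3
    with open(path, "r") as f:
        return list(map(int, (line.strip() for line in f)))

path = os.path.join(tempfile.mkdtemp(), "tmp_roundtrip.csv")
write_ints(path, range(10))
print(read_ints(path))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```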


---

### Libraries

#### Built-in Libraries

Python ships with many built-in modules (the standard library) that save you from re-implementing common and useful functionality.

We will use __math__ as an example.

In [12]:
import math # use import to load a library

In [13]:
x = 3
print("e^x = e^3 = %f"%math.exp(x))
print("log(x) = log(3) = %f"%math.log(x))

e^x = e^3 = 20.085537
log(x) = log(3) = 1.098612
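A couple more details of the same `math` functions: `log` is the inverse of `exp`, and `math.log` accepts an optional second argument giving the base. Because floating-point results may differ from the exact value in the last digit, this sketch compares them with `math.isclose` instead of `==`.

```python
import math

x = 3
y = math.exp(x)

# log undoes exp; compare floats with a tolerance rather than ==
print(math.isclose(math.log(y), x))  # True

# Optional second argument: the logarithm base
print(math.log(8, 2))
# Dedicated base-10 logarithm
print(math.log10(1000))
```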


You can also import one specific function:

In [14]:
from math import exp # You can import a specific function
print(exp(x)) # This way, you don't need to use math.exp but just exp

20.0855369232


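Or import everything from a module at once (convenient, but use it sparingly, since it can silently overwrite names you already have):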

In [15]:
from math import * # Import all functions

In [16]:
print(exp(x))
print(log(x)) # without an import that provides `log`, this line would raise a NameError

20.0855369232
1.09861228867
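One concrete way a star import can surprise you: `math` defines its own `pow`, so `from math import *` replaces the built-in `pow` in the current namespace. The built-in returns an `int` for integer inputs and accepts a third (modulus) argument; `math.pow` always returns a `float` and takes exactly two arguments. This sketch must run at module level, since `import *` is not allowed inside a function.

```python
import builtins

# Built-in pow: integer result, optional modulus argument
builtin_result = pow(2, 3)
print(builtin_result)      # 8
print(pow(2, 10, 7))       # 1024 % 7 = 2

from math import *         # also brings in math.pow, shadowing the built-in

print(pow(2, 3))           # 8.0: math.pow always returns a float

try:
    pow(2, 10, 7)          # math.pow accepts exactly two arguments
except TypeError as err:
    print("TypeError:", err)

# The original built-in is still reachable through the builtins module
print(builtins.pow(2, 10, 7))  # 2
```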


#### External Libraries

Sometimes you will need more advanced functionality than Python itself provides. Fortunately, developers have published many useful third-party packages.

We'll use __numpy__ as an example. (__numpy__, __scipy__, __matplotlib__, and probably __pandas__ will be the most important packages for your data analyses.)

The easiest way to install Python packages is with <a href="https://packaging.python.org/installing/" target="_blank">pip</a>:

```bash
~$ pip install numpy
```

Once installed, external libraries are loaded in the same way as built-in ones.

In [17]:
# After you install numpy, load it
import numpy as np # you can use np instead of numpy to call the functions in numpy package

In [18]:
x = np.array([[1,2,3], [4,5,6]], dtype=float) # create a numpy array object with float data type (the old np.float alias was removed in NumPy 1.24)
print(x)
print(type(x))

[[ 1.  2.  3.]
 [ 4.  5.  6.]]
<type 'numpy.ndarray'>
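Since the correlation example that follows indexes rows with `x[1, :]`, it may help to look at a few basic `ndarray` attributes and slices on the same array. This sketch assumes NumPy is installed and re-creates `x` so it runs on its own.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)

print(x.shape)          # (2, 3): 2 rows, 3 columns
print(x.dtype)          # float64
print(x[0, :])          # row 0 (":" means "all columns")
print(x[:, 1])          # column 1 across all rows
print(x.mean(axis=0))   # mean of each column
```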


__SciPy__ and __NumPy__ provide extensive utilities for manipulating data and running simple analyses:

In [19]:
from scipy.stats import pearsonr, spearmanr # correlation functions

In [20]:
print(pearsonr(x[1, :], x[0, :]))
print(spearmanr(x[1, :], x[0, :]))

(1.0, 0.0)
SpearmanrResult(correlation=1.0, pvalue=0.0)
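To see where the Pearson coefficient comes from, this sketch computes it directly from its definition, the sum of products of centered values divided by the square root of the product of the sums of squares, and compares the result with NumPy's `np.corrcoef`. It re-creates the two rows of `x` so the block is self-contained; the rows are perfectly linearly related, so r should be 1.

```python
import numpy as np

a = np.array([4.0, 5.0, 6.0])  # same values as x[1, :]
b = np.array([1.0, 2.0, 3.0])  # same values as x[0, :]

# Pearson r: covariance over the product of standard deviations
# (written with sums, so the 1/n or 1/(n-1) factors cancel)
a_centered = a - a.mean()
b_centered = b - b.mean()
r_manual = (a_centered * b_centered).sum() / np.sqrt(
    (a_centered ** 2).sum() * (b_centered ** 2).sum()
)

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r
r_numpy = np.corrcoef(a, b)[0, 1]

print(r_manual, r_numpy)
```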


__Pandas__ (Python Data Analysis Library) provides convenient data structures, most notably the `DataFrame`:

In [21]:
import pandas as pd

In [22]:
x_df = pd.DataFrame(x)
x_df

|   | 0 | 1 | 2 |
|---|---|---|---|
| **0** | 1.0 | 2.0 | 3.0 |
| **1** | 4.0 | 5.0 | 6.0 |


Import and export are easy as well:

In [23]:
x_df.to_csv('tmp_pd.csv', index=False) # `index=False`: do not write row indices to file

In [24]:
df = pd.read_csv('tmp_pd.csv')

In [25]:
df

|   | 0 | 1 | 2 |
|---|---|---|---|
| **0** | 1.0 | 2.0 | 3.0 |
| **1** | 4.0 | 5.0 | 6.0 |
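One subtlety of this round trip: the header row written by `to_csv` is plain text, so after `read_csv` the column labels are the strings `'0'`, `'1'`, `'2'` rather than the integers of the original `DataFrame`, even though the table displays identically. A small sketch, assuming pandas is installed and using an in-memory buffer (`io.StringIO`) in place of a file:

```python
import io

import numpy as np
import pandas as pd

x_df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]], dtype=float))

# to_csv() with no path returns the CSV text as a string
csv_text = x_df.to_csv(index=False)
df = pd.read_csv(io.StringIO(csv_text))

print(list(x_df.columns))  # [0, 1, 2]
print(list(df.columns))    # ['0', '1', '2']

# Convert the labels back to integers if the original labels matter
df.columns = df.columns.astype(int)
print(df.equals(x_df))
```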
