# Python session - 3.1

## Functions and modules

`Pandas` cheat sheet: https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf

Software Carpentry reference files: http://tobyhodges.github.io/python-novice-gapminder/

## Functions

Functions are reusable blocks of code that you can name and execute any number of times from different parts of your script(s). Executing a function is known as "calling" it. Functions are important building blocks of software.

Python has several built-in functions, which can be called anywhere (and any number of times) in your program. You have already been using built-in functions, for example `len()`, `range()`, `sorted()`, `max()`, `min()` and `sum()`.

#### Structure of writing a function:

- `def` (keyword) + function name (you choose) + `()` + `:`.
- newline with 4 spaces or a tab + block of code. Note: code at indentation level 0 (outside any function) is always executed when the script runs.
- Call your function using its name followed by `()`.
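
The structure described in the list above can be sketched as follows. The function name `greet` and its message are illustrative only and are not part of the exercises.

```python
# `def` keyword, a name of your choice, parentheses, and a colon
def greet():
    # The indented block (4 spaces) is the function body;
    # it runs only when the function is called
    print("Hello from inside the function")

# Code at indentation level 0 is executed as soon as Python reaches it
greet()   # call the function by its name
```

Defining the function does not print anything by itself; the message only appears when `greet()` is called.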

In [None]:
# Non-parametric function
# Define a function that prints the sum of number1 and number2 defined inside the function

get_sum()

In [None]:
# Parametric function
# Define a function that prints a sum of number1 and number2 provided by the user
# Hint: get_sum_param(number1, number2)


In [None]:
# Returning values
# Define a function that 'returns' a sum of number1 and number2 provided by the user
# Hint: print(get_sum_param(number1, number2))


In [None]:
# Local vs. global variables

# Define a function that returns a sum of number1 and number2 to a variable
# and print it after calling the function
# Hint: returned_value = get_sum_param(number1, number2)
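
A minimal sketch of the difference between local and global variables, assuming a hypothetical function `add_numbers` (any name would do). A variable assigned inside a function is local to it, so a global variable with the same name is left untouched.

```python
total = 0  # global variable, defined at indentation level 0

def add_numbers(number1, number2):
    # This `total` is a new local variable; it hides the global one
    # and disappears when the function finishes
    total = number1 + number2
    return total

returned_value = add_numbers(2, 3)
print(returned_value)  # 5
print(total)           # 0, the global variable is unchanged
```

Returning the value and assigning it to `returned_value` is how the result leaves the function.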


### Exercises: write old code into a function

In [None]:
# Optional exercise
# Let’s take one of our older code blocks and write it in a function


### Libraries and Modules

One of the great things about Python is the free availability of a _huge_ number of libraries (also called packages) that can be imported into your code and (re)used.

Modules contain functions for use by other programs and are developed with the aim of solving some particular problem or providing particular, often domain-specific, capabilities. A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries only consist of a single module (so don’t worry if you mix them). 

In order to import a library, it must already be available on your system or be installed first.

A large number of libraries are already available for import in the standard distribution of Python: this is known as the standard library. If you installed the Anaconda distribution of Python, you have even more libraries already installed - mostly aimed at data science.

Importing a library is easy:

- `import` (keyword) + library name, for example:
    - `import os    # contains functions for interacting with the operating system`
    - `import sys   # contains utilities to process command line arguments`

More at: https://pypi.python.org/pypi
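
As a small sketch of what an imported library offers, the example below calls one function from `os` and reads one attribute from `sys`. The variable names are illustrative.

```python
import os   # functions for interacting with the operating system
import sys  # interpreter settings and command line arguments

# Absolute path of the directory Python is currently working in
current_directory = os.getcwd()

# List of command line arguments; the first item is normally the script name
script_arguments = sys.argv

print(current_directory)
print(script_arguments)
```

Names from a library are reached with the `library.name` pattern, for example `os.getcwd()`.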

In [None]:
import os

# Get current directory

# Make new directory

help(os)                    # manual page created from the module's docstrings

In [None]:
import sys

# sys.argv

### Using loops to iterate through files in a directory

In [None]:
# define a function that lists all the files in the folder called data

import os

def read_each_filename(pathname):
    ...
    
pathname = 'data' # name of path with multiple files

In [None]:
# define a function that reads and prints each line of each file in the folder called data

import os

def read_each_line_of_each_file(pathname): # name of path with multiple files
    ...
pathname = 'data' # name of path with multiple files

# Hints:
# Options for opening files
# option-1: with open("{}/{}".format(pathname, filename)) as in_fh:
# option-2: with open('%s/%s' % (pathname, filename)) as in_fh:
# option-3: with open(pathname + '/' + filename) as in_fh:
# option-4: with open(os.path.join(pathname, filename)) as in_fh:
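
A minimal sketch of the loop pattern, using option 4 from the hints above. To keep it self-contained it builds a temporary folder with two made-up files (`a.txt` and `b.txt`) instead of using the `data` folder; these names and their contents are assumptions for illustration only.

```python
import os
import tempfile

# Create a throwaway folder with two small files so the sketch runs anywhere
with tempfile.TemporaryDirectory() as pathname:
    for name, text in [("a.txt", "first line\nsecond line\n"), ("b.txt", "only line\n")]:
        with open(os.path.join(pathname, name), "w") as out_fh:
            out_fh.write(text)

    lines_seen = []
    # os.listdir returns the names in the folder; sorted() gives a stable order
    for filename in sorted(os.listdir(pathname)):
        # os.path.join inserts the correct path separator for the operating system
        with open(os.path.join(pathname, filename)) as in_fh:
            for line in in_fh:
                lines_seen.append((filename, line.rstrip("\n")))

print(lines_seen)
```

All four options open the same file on Linux and macOS; `os.path.join` is the one that does not hard-code the `/` separator.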

In [None]:
# Exercise: Go through each filename in the directory 'data'
# Print the names of the files that contain the keyword 'asia'

# Open each file containing the keyword 'asia' and print all the entries

# Print entries containing GDP information on 'Japan', 'Korea', 'China' and 'Taiwan'


### Examples of importing basic modules.

#### Questions
- How can I read tabular data?

#### Objectives
- Import the Pandas library.
- Use Pandas to load a simple CSV data set.
- Get some basic information about a Pandas DataFrame.

In [None]:
import pandas

In [None]:
# Use Oceania data here

#### Aside: Namespaces
Python uses namespaces a lot, to ensure appropriate separation of functions, attributes, methods etc. between modules and objects. When you import an entire module, the functions and classes available within that module are loaded in under the module's namespace - `pandas` in the example above.  
It is possible to customise the namespace at the point of import, allowing you to e.g. shorten/abbreviate the module name to save some typing:

In [None]:
# import pandas as pd

Also, if you need only a single function from a module, you can import it directly into your main namespace (where you don't need to specify the module before the name of the function):

In [None]:
# from pandas import read_csv
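
The three import styles can be compared side by side. This sketch uses the standard library module `math` rather than `pandas` so that it runs without any extra installation; the same pattern applies to `import pandas as pd` and `from pandas import read_csv`.

```python
import math             # names are reached through the module's namespace
import math as m        # same module under a shorter alias
from math import sqrt   # a single function placed in the main namespace

a = math.sqrt(16)
b = m.sqrt(16)
c = sqrt(16)

print(a, b, c)
```

All three calls run the same function; only the name used to reach it differs.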

#### Conventions
- You should perform all of your imports at the beginning of your program. This ensures that
  - users can easily identify the dependencies of a program, and
  - any missing dependencies (causing fatal `ImportError` exceptions) are caught early in execution.
- Shortening `numpy` to `np` and `pandas` to `pd` is very common, and there are other such abbreviations too - watch out for them when e.g. reading docs and guides/Stack Overflow answers online.

### Exercises - Importing

Use this link to follow further exercises: http://tobyhodges.github.io/python-novice-gapminder/37-reading-tabular/

In [None]:
# Use index_col to specify that a column’s values should be used as row headings.


In [None]:
# Use DataFrame.info to find out more about a dataframe.


In [None]:
# The DataFrame.columns variable stores information about the dataframe’s columns.


In [None]:
# Use DataFrame.T to transpose a dataframe.


In [None]:
# Use DataFrame.describe to get summary statistics about data.


### Reading other data

Read the data in `gapminder_gdp_americas.csv` (which should be in the same directory as `gapminder_gdp_oceania.csv`) into a variable called `americas` and display its summary statistics.

### Inspecting Data

After reading the data for the Americas, use `help(americas.head)` and `help(americas.tail)` to find out what `DataFrame.head` and `DataFrame.tail` do.

- What method call will display the first three rows of this data?
- What method call will display the last three columns of this data? (Hint: you may need to change your view of the data.)

### Reading Files in Other Directories

The data for your current project is stored in a file called `microbes.csv`, which is located in a folder called `field_data`. You are doing analysis in a notebook called `analysis.ipynb` in a sibling folder called `thesis`:

```
your_home_directory
+-- field_data/
|   +-- microbes.csv
+-- thesis/
    +-- analysis.ipynb
```

What value(s) should you pass to `read_csv` to read `microbes.csv` in `analysis.ipynb`?

### Writing data

As well as the `read_csv` function for reading data from a file, Pandas provides a `to_csv` method to write dataframes to files. Applying what you’ve learned about reading from files, write one of your dataframes to a file called `processed.csv`. You can use `help` to get information on how to use `to_csv`.

#### Aside: Your Own Modules
Whenever you write some Python code and save it as a script, with the `.py` file extension, you are creating your own module. If you define functions within that module, you can load them into other scripts and sessions.
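
A rough sketch of this idea is shown below. The module name `my_tools` and the temporary folder are assumptions made so the example is self-contained; in practice you would save `my_tools.py` next to your script or notebook and simply write `import my_tools`.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module to a temporary folder (a stand-in for a script you saved yourself)
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "my_tools.py"), "w") as out_fh:
    out_fh.write("def get_sum_param(number1, number2):\n    return number1 + number2\n")

# Python looks for modules in the folders listed in sys.path
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# Has the same effect as writing `import my_tools`
my_tools = importlib.import_module("my_tools")

print(my_tools.get_sum_param(2, 3))  # 5
```

The function defined in the file is reached through the module's namespace, just like functions from `os` or `pandas`.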

### Some Interesting Modules and Libraries to Investigate
- os
- sys
- shutil
- random
- collections
- math
- argparse
- time
- datetime
- numpy
- scipy
- matplotlib
- pandas
- scikit-learn
- requests
- biopython
- openpyxl