## Working with libraries



A Python library is simply a file that contains function
definitions. The key here is that the file contains only function
definitions and no other program code. The file must also be written
in plain Python; it cannot be in notebook format.
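
As a sketch of what such a file could look like, the block below defines two functions. The names `square` and `hello_world` are borrowed from the examples later in this section; the function bodies are assumptions, and the actual `mylib.py` may differ.

```python
# mylib.py: hypothetical contents, for illustration only

def square(x):
    """Return x multiplied by itself."""
    return x * x


def hello_world():
    """Print a short greeting."""
    print("Hello world")
```

Note that the file holds nothing but function definitions, so importing it defines the functions without running any other code.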

So why would we want that?

1.  Most of the time, we only want to use the functions in the file,
    rather than change them. Library development is best done with
    a full-fledged Python IDE (Integrated Development Environment).
2.  Moving frequently used functions into a library declutters your code.
3.  Python has a myriad of libraries that greatly extend the
    functionality of the language.

Why you might not want to use a library:

1.  Unless it's your own library, you depend on someone else. Imagine
    you found a great library, but it has a nasty bug, and the
    author is no longer responding to
    e-mail. It might be easier to implement the function
    yourself.
2.  Libraries can introduce considerable complexity, which may be
    overkill in your situation.



### Using a library in your code



Consider the following simple library [mylib.py](mylib.py). It only provides two
functions. In order to access these functions, we first need to import
the library, and then we can inspect its contents.



In [1]:
import mylib
dir(mylib)

However, this won't let you call the functions by the bare names used in the library file:



In [1]:
import mylib
print(square(5))  # fails with NameError: name 'square' is not defined

Instead, you have to write:



In [1]:
import mylib
print(mylib.square(5))

Why is that? Most Python programs import quite a few
libraries. However, the people writing those libraries do not know which
function names have been used by other library developers. So, in
order to avoid naming conflicts, Python keeps the functions of each
library in a separate namespace, and you reach them by putting the
library name in front of the function name. While this clever
mechanism avoids naming conflicts, it also adds a lot of
typing. There are two ways around it:

1.  Often you will only need one or two functions from a library (also
    called a module). In this case, you can import each function explicitly.



In [1]:
from mylib import hello_world
hello_world()

2.  Or, we can create our own library alias.



In [1]:
import mylib as ml
ml.hello_world()
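
The naming conflicts mentioned above are easy to reproduce with two modules from the Python standard library, `math` and `cmath`, which both provide a function called `sqrt`. The sketch below is only an illustration of the idea and does not involve `mylib`.

```python
import math
import cmath

# Both modules define sqrt, but the module prefix keeps them apart
real_root = math.sqrt(4)        # real-valued square root
complex_root = cmath.sqrt(-4)   # complex-valued square root

print(real_root)
print(complex_root)

# Importing the bare name twice silently replaces the first function
from math import sqrt
from cmath import sqrt          # sqrt now refers to cmath.sqrt

print(sqrt is cmath.sqrt)
```

This is why importing bare function names is best kept to the one or two functions you really need.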

### Using pandas to read data from an Excel file



Pandas is one of the most widely used Python libraries and provides powerful
data analysis tools. It also provides an easy way to read data
from files that contain comma separated values (CSV), or from Excel
spreadsheets. Here we will use isotope data from a recent paper by one
of my graduate students.

In the following code snippet, we import two libraries using the above
syntax. The first line imports `TypeVar` from the `typing`
library. This library supports type hinting beyond the standard
variable types like integer and float. You may notice that we have
used it before. `TypeVar` allows us to create our own
type names, and we use it in line 5 to create a type hint for the
`DataFrame` variable type. For the context of this course, there is no
need to understand this in depth, but you should know what it is
for.

Line 2 imports the entire pandas library with the alias `pd`. So
all of the functions provided by pandas are available as
`pd.functionname()`. We will explore how to use some of the functions
provided by pandas below.

In order to read the Excel file, we need to know its name, and we
need to know the name of the data sheet we want to read. If the read
operation succeeds, the data will be stored as a pandas dataframe.



In [1]:
from typing import TypeVar # this is used to declare a new type hint
import pandas as pd # import pandas under the alias pd

# declare a dataframe type for type hinting
pdf = TypeVar('pandas.core.frame.DataFrame')

# define the file and sheet name we want to read. Note that the file
# has to be present in the local working directory!
fn: str = "Yao_2018.xlsx" # file name
sn: str = "outside_peak"  # sheet name

# read the Excel sheet with the pandas read_excel function and store
# the result in a pandas dataframe
os_peak: pdf = pd.read_excel(fn, sheet_name=sn)
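
If the Excel file is not at hand, the same reading pattern can be tried on comma separated text. The sketch below is an illustration only: the `sample` column and all numbers are invented placeholders (they are not taken from `Yao_2018.xlsx`), and `io.StringIO` merely stands in for a file on disk. It also uses `pd.DataFrame` directly as the type hint, which is one possible alternative to declaring a separate type name.

```python
import io
import pandas as pd

# Invented placeholder data standing in for a CSV file on disk
csv_text = """sample,d34S
A,1.5
B,2.5
C,3.5
"""

# read_csv accepts a file name or any file-like object
toy_df: pd.DataFrame = pd.read_csv(io.StringIO(csv_text))

print(type(toy_df))
print(toy_df.shape)  # (number of rows, number of columns)
```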

### Working with the pandas dataframe object



In most cases, your datasets will contain many lines. The dataframe
`head()` and `tail()` methods show only the first (or last) few
lines of your dataset. Give this a try:



In [1]:
os_peak.head()

If you are really on the ball, you may have noticed that the first
column is not present in the actual Excel file (you did check that
`pd.read_excel()` actually read the correct file and data, didn't you?).

Those numbers are called the index. Think of them as line numbers. All
pandas objects show them, but they are ignored when you do
computations with the data.
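
The behaviour of `head()`, `tail()`, and the index can be tried on a small dataframe built by hand. The values below are invented placeholders, chosen only to keep the output short.

```python
import pandas as pd

toy_df = pd.DataFrame({"d34S": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

first_rows = toy_df.head(2)   # first two rows
last_rows = toy_df.tail(2)    # last two rows
print(first_rows)
print(last_rows)

# the index is created automatically and starts at 0
print(list(toy_df.index))

# computations use the column values, not the index
print(toy_df["d34S"].sum())
```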



#### Selecting specific rows



In order to select a specific row from a pandas dataframe, we can use
the `iloc` indexer (short for integer location), which takes square
brackets and counts from zero. In other words, if you want
to select the 4th row, you can write



In [1]:
os_peak.iloc[3]  # get the fourth row (counting starts at 0)

and you can use the normal slicing operators to get more than one row



In [1]:
os_peak.iloc[3:5]  # get the fourth and fifth rows
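
The same two selections can be checked on a small hand-made dataframe, where it is easy to see which rows come back. The numbers are invented placeholders.

```python
import pandas as pd

toy_df = pd.DataFrame({"d34S": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})

fourth_row = toy_df.iloc[3]      # a single row comes back as a Series
rows_4_and_5 = toy_df.iloc[3:5]  # positions 3 and 4; the stop value 5 is excluded

print(fourth_row)
print(rows_4_and_5)
```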

#### Selecting specific columns by index



`iloc` can also be used to select a specific column. In this
case, we have to give the row and column index we want to see. You remember
the slicing syntax (if not, review the slicing module). Since counting
starts at zero, column index 2 refers to the third column, so in order to
get all data from that column you can write



In [1]:
os_peak.iloc[:,2]

or, if you only want to see the first two rows of that column:



In [1]:
os_peak.iloc[0:2,2]
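
A small sketch of column selection by position, using an invented three-column dataframe (the column names `a`, `b`, `c` and the numbers are placeholders):

```python
import pandas as pd

toy_df = pd.DataFrame({
    "a": [10, 11, 12, 13, 14],
    "b": [20, 21, 22, 23, 24],
    "c": [30, 31, 32, 33, 34],
})

# the part before the comma selects rows, the part after it selects columns
third_column = toy_df.iloc[:, 2]     # all rows of the column at position 2 ("c")
top_of_third = toy_df.iloc[0:2, 2]   # only the first two rows of that column

print(third_column)
print(top_of_third)
```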

#### Selecting rows by Label



Pandas also supports selection by label, rather than by position. This
is done with the `.loc` indexer. However, this one requires some
thinking. The example below does not, in fact, mix `iloc` and `loc` syntax;
rather, `loc` treats the index values as labels. So if your
index started at 100, this code would yield no result,
since there is no label called "2". As a side note, the index does not
even have to be numeric; it could very well be a date-time value, or
even a letter code.



In [1]:
os_peak.loc[2:4,'d34S'] # extract the d34S data from index label 2 to 4 (both included)
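
The difference between position and label only shows up once the index no longer starts at zero. The sketch below uses an invented toy dataframe (the numbers are placeholders, not values from the paper) whose index starts at 100.

```python
import pandas as pd

toy_df = pd.DataFrame(
    {"d34S": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=range(100, 105),   # index labels 100 to 104
)

by_position = toy_df.iloc[2:4, 0]           # third and fourth row, first column
by_label = toy_df.loc[2:4, "d34S"]          # no labels between 2 and 4: empty result
by_label_ok = toy_df.loc[101:103, "d34S"]   # labels 101 to 103, both ends included

print(by_position)
print(by_label)
print(by_label_ok)
```

Note that a label slice includes its end value, whereas a position slice excludes it.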

#### Getting statistical coefficients



Pandas supports a large number of statistical methods, and the
`describe()` method will give you a quick overview of your data.



In [1]:
os_peak.describe()
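
To see what `describe()` reports, it helps to run it on a handful of numbers where the answers are easy to verify by hand. The values below are invented placeholders.

```python
import pandas as pd

toy_df = pd.DataFrame({"d34S": [1.0, 2.0, 3.0, 4.0, 5.0]})

summary = toy_df.describe()   # count, mean, std, min, quartiles, max
print(summary)

# single statistics are also available as methods
mean_value = toy_df["d34S"].mean()
median_value = toy_df["d34S"].median()
std_value = toy_df["d34S"].std()   # sample standard deviation

print(mean_value, median_value, std_value)
```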

#### What else can you do?



The short answer is: lots. The dataframe can act as a database: you can
add or remove values, columns, and rows, and much more. If these cases arise,
please have a look at the excellent online documentation and
tutorials.
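
As a small taste, the sketch below adds a column, removes a column and a row, and changes a single value. The dataframe and its column names are invented for illustration.

```python
import pandas as pd

toy_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

toy_df["total"] = toy_df["a"] + toy_df["b"]   # add a column
without_b = toy_df.drop(columns=["b"])        # new dataframe without column b
without_first = toy_df.drop(index=0)          # new dataframe without the row labelled 0
toy_df.loc[1, "a"] = 20                       # change a single value

print(toy_df)
print(without_b)
print(without_first)
```

Note that `drop()` returns a new dataframe and leaves the original untouched.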



### Assignments



#### Marking Scheme



-   functions do work as requested 3 \* 2pts
-   forward and backward conversion results in the same numbers 12pts
-   proper docstrings for each function 6pts

Notes: There is no need to use the coding template for this
assignment. As usual, create a notebook in your submissions folder
named `FirstName_LastName_funcctions.ipynb`, and submit the PDF and
notebook on Quercus.

