## Working with the pandas series data object



The pandas dataframe is composed of smaller units, the so-called
pandas series. You already used this data type implicitly in our last
module, but here we will explore it more fully. Let's create a
dataframe first. Since I am lazy, I use a function from the numpy
library to generate an array of randomly distributed integer values
from 0 up to, but not including, 10. And yes, I simply googled "pandas
generate list of random integers".



In [1]:
from typing import TypeVar
import pandas as pd
import numpy as np

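# Names used for the type hints throughout this module
pdf = TypeVar('pandas.core.frame.DataFrame')
pds = TypeVar('pandas.core.series.Series')

# 4x4 table of random integers from 0 to 9 (the upper bound 10 is excluded)
df: pdf = pd.DataFrame(np.random.randint(0, 10, size=(4, 4)),
                       columns=list('ABCD'))
print(df)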
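
Because the values above are random, every run prints a different table. The following is a minimal sketch of a reproducible variant, assuming numpy's `default_rng` generator with an arbitrarily chosen seed. The seed value 42 and the names `rng`, `df_seeded`, and `col_a` are illustrative and not part of the original example. The sketch also shows that selecting a single column of the dataframe returns a pandas series.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (42 is arbitrary) so that every run gives the same numbers
rng = np.random.default_rng(42)

# integers() draws from 0 up to, but not including, 10
df_seeded = pd.DataFrame(rng.integers(0, 10, size=(4, 4)), columns=list("ABCD"))
print(df_seeded)

# Selecting a single column returns a pandas series
col_a = df_seeded["A"]
print(type(col_a))
print(col_a.dtype)
```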

Now let's extract the data from column A and execute the following:



In [1]:
A = df['A']
print(A * 2)

You probably remember that this was not possible with a list, because
lists can contain numbers, letters, other lists, tuples, etc. A pandas
series, on the other hand, has a single data type for all of its
elements. In other words, a column should contain either text or
numbers, not a mix of both. Since A consists only of numbers, Python
can directly multiply each element by 2. What happens if you multiply
`A*B` (with `B = df['B']`)? Is the multiplication done element by
element? What happens if you write `A**B`? Hurray! No more loops!
(kinda)
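
To answer the two questions above, here is a small sketch with hand-picked numbers instead of random ones, so that the results are easy to verify by eye. The series names `a` and `b` and their values are made up for illustration.

```python
import pandas as pd

# Two small series with made-up values (same length, same default index)
a = pd.Series([1, 2, 3, 4])
b = pd.Series([2, 0, 1, 3])

# Multiplication is element by element: 1*2, 2*0, 3*1, 4*3
product = a * b
print(product.tolist())   # [2, 0, 3, 12]

# The power operator is also element by element: 1**2, 2**0, 3**1, 4**3
power = a ** b
print(power.tolist())     # [1, 1, 3, 64]
```

Note that pandas pairs the elements by index label rather than by position. Here both series share the default index 0 to 3, so the two coincide.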

In a way, Python treats a pandas series object like a vector, not
unlike MATLAB. The numpy library even provides an array data type
(`ndarray`) which behaves similarly to MATLAB vectors. There is some
cool stuff we can do with this. For example, we can apply a comparison
operator to a pandas series:



In [1]:
print(A>2)

Now why would this be useful? Remember that `False` equals 0,
whereas `True` equals 1. So if we want to count the number of values
in A which are larger than 2, we can simply write:



In [1]:
n: int = sum(A > 2)
print(n)
# or even shorter
print(sum(A > 2))
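
The assignment further down asks for pandas methods, so it is worth knowing that the same count can be obtained with the series method `.sum()` instead of the Python builtin `sum()`. A minimal sketch with made-up values (the names `s`, `mask`, and `count` are illustrative):

```python
import pandas as pd

s = pd.Series([1, 5, 2, 7, 3])   # made-up values

# The comparison yields a boolean series: one True/False per element
mask = s > 2
print(mask.tolist())             # [False, True, False, True, True]

# True counts as 1 and False as 0, so summing the mask counts the True entries
count: int = int(mask.sum())
print(count)                     # 3

# The Python builtin gives the same answer
print(sum(mask))                 # 3
```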

## Assignment



In the following exercises, we will practice some of the
above. However, some tasks require techniques that have not been
explained yet. I recommend that you refer to
[https://pandas.pydata.org/pandas-docs/stable/user\_guide/indexing.html](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html)
or
[https://pandas.pydata.org/pandas-docs/stable/getting\_started/basics.html#basics](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics)
or [https://www.tutorialspoint.com/python\_pandas/python\_pandas\_dataframe.htm](https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm)

Notes: 

-   Create a notebook in your submissions folder with this name:
    "pandas-1-FirstName-LastName". In order to submit your
    assignment, you need to download it and submit it on Quercus
    (ipynb and pdf format). Please include the usual header with date, name, etc.

-   All questions should be solved with pandas methods. You may have
    to look up syntax or options either through the help system, or
    by using the above links.

-   Marking Scheme (per question):
    -   All variables declared and type hinting used throughout: 1pt. No partial marks.
    -   Code produces correct output: 2pt. 1pt if the code is sort of correct.
    -   Proper use of comments: 1pt. There is no need for doc-strings,
        though.
    -   Code is self-contained: 1pt.
    -   Max points per question: 5pts, for a total of 14\*5 = 70 pts.



### Exercises:



For each answer, please write self-contained code, that is, code that
imports all libraries, declares all variables, imports all data, etc.,
rather than code which relies on data imported in a previous
cell. Cut, copy, and paste are your friends here.

1.  Use the above methods to find a way to set all values in A which
    are smaller than, say, 4 to 0. Write the result into a new variable
    `X`, rather than replacing the values in A. Note: do not use
    builtin methods like `replace` for this operation, and do not use a
    loop. Rather, use a comparison operator.



In [1]:
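keep = A >= 4   # True where the value is kept, False where it is smaller than 4
X = A * keep    # False counts as 0, so values smaller than 4 become 0
print(X)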

2.  Write a short function and code snippet which calculates the
    mean of a pandas series containing numbers. Do not use the builtin
    functions or the pandas method for mean. Rather, use your own code
    to compute the mean as

    \begin{equation}
    \mu = \frac{\sum\limits_{i=0}^{N-1} X_i}{N}
    \end{equation}

    where $N$ is the number of elements in the series, and
    `X_i` denotes the individual elements (i.e., `X[i]`). Use this
    template, and add the missing code statements to make this a
    self-contained example.



In [1]:
pds = TypeVar('pandas.core.series.Series')

def my_mean(X: pds) -> float:
    """
    Arguments: X, a pandas series
    """
    mu: float = 0.0  # this is the mean value of the pandas series X

    
    return mu

print(A)
print(f"The mean value of A using my_mean = {my_mean(A)}")
print(f"The mean value of A using A.mean() = {A.mean()}")

3.  As before, but now we will compute the population standard deviation, which is defined as

    \begin{equation}
    \sigma = \sqrt{\frac{\sum\limits_{i=0}^{N-1} (X_i - \mu)^2}{N}}
    \end{equation}
    
    where $\mu$ is the mean value. Note that since your code regenerates the
    dataframe from random numbers each time, you need to compute $\mu$
    each time. Thus, this code also needs to define `my_mean()` (you can
    obviously cut/copy/paste). Again, your code should be self-contained,
    and you should compare your result against the builtin pandas series
    method `.std()`. Keep in mind that `.std()` computes the sample
    standard deviation (`ddof=1`) by default, so use `.std(ddof=0)` to get
    the population value. Again, do not use loops.
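
    As a quick sanity check for both functions, consider the hand-picked
    values $X = (2, 4, 4, 4, 5, 5, 7, 9)$ with $N = 8$ (these numbers are
    illustrative and not part of the assignment data):

    \begin{equation}
    \mu = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5
    \end{equation}

    \begin{equation}
    \sigma = \sqrt{\frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8}} = \sqrt{\frac{32}{8}} = 2
    \end{equation}

    Dividing by $N - 1$ instead of $N$ would give $\sqrt{32/7} \approx 2.14$,
    so a mismatch of this size points to the sample formula rather than
    the population formula.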
    
    1.  Write a program which imports the isotope data from the last
        lecture as a dataframe. Extract the delta values as a pandas
        series. Next, use the following two equations to split the
        delta values into $^{32}S$ and $^{34}S$, and append the results to the
        dataframe. Then compute the delta values from $^{32}S$ and $^{34}S$
        using the third equation, and append the results to the dataframe
        as `delta-new`. Compute the difference between the original and
        the new delta values (it should be very small or zero). Export the
        dataframe to a csv file.
        
        \begin{equation}
        ^{32}S = \frac{1000}{(\delta +1000) \times R + 1000}
        \end{equation}
        
        \begin{equation}
        ^{34}S = \frac{(\delta + 1000) \times R}{(\delta + 1000) \times R + 1000}    	
        \end{equation}

        \begin{equation}
        \delta^{34}S = \left(
          \frac{
            \left(\frac{^{34}S}{^{32}S}\right)_{Sample}}
          {
            \left(\frac{^{34}S}{^{32}S}\right)_{VCDT}}
          -1
          \right) \times 1000 \quad [^0/_{00}]
        \end{equation}
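
        The two splitting equations follow from the definition of
        $\delta^{34}S$ if one assumes, as the formulas imply, that $R$ is
        the $^{34}S/^{32}S$ ratio of the VCDT standard and that $^{32}S$ and
        $^{34}S$ denote fractional abundances which sum to one (other
        sulfur isotopes are neglected). A short derivation under those
        assumptions, writing $r$ for the sample ratio (a symbol introduced
        here for convenience):

        \begin{equation}
        r = \left(\frac{^{34}S}{^{32}S}\right)_{Sample} = \frac{(\delta + 1000) \times R}{1000}
        \end{equation}

        \begin{equation}
        ^{32}S = \frac{1}{1 + r} = \frac{1000}{(\delta + 1000) \times R + 1000},
        \qquad
        ^{34}S = \frac{r}{1 + r} = \frac{(\delta + 1000) \times R}{(\delta + 1000) \times R + 1000}
        \end{equation}

        Dividing $^{34}S$ by $^{32}S$ recovers $r$, which is why `delta-new`
        should agree with the original delta values up to floating point
        rounding.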

