# 1 Pandas and Plotting

Pandas is a Python library with fairly advanced spreadsheet-like methods. This notebook explores some basic syntax.

In [None]:
import pandas as pd

Read data from a spreadsheet file (.csv or .xlsx) into a `pandas` data frame. Each data frame works much like a "tab" in Excel or Google Sheets: it is a two-dimensional data structure of rows and columns.

In [None]:
df = pd.read_csv('AU_COURSE_LIST.csv')
df

If you think about it, using individual "cells" or blocks of "cells" is rarely needed. Most often we need to manipulate a whole column: 

In [None]:
df['Subject']

But we can get individual "cells" using `.loc`: `df.loc[row, column]`. Here `column` is the column name and `row` is the row label. By default the label is the integer row index, but it may be a row name, provided one exists.

In [None]:
df.loc[1,"Subject"]

Of course, we can get a row, too.

In [None]:
df.loc[1]
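To make the difference between row names and row positions concrete, here is a minimal sketch on a made-up table. The data and the row names `a`, `b`, `c` are assumptions for illustration only and are not taken from `AU_COURSE_LIST.csv`. It uses `.loc` for label-based lookup and `.iloc` for position-based lookup.

```python
import pandas as pd

# Hypothetical toy table: column names mimic the course list, the data is made up.
toy = pd.DataFrame(
    {"Subject": ["CRKT", "SWRD", "CRKT"], "Tot Enrl": [12, 30, 7]},
    index=["a", "b", "c"],  # explicit row names instead of the default 0, 1, 2
)

by_label = toy.loc["b", "Subject"]  # look up by row name and column name
by_position = toy.iloc[1, 0]        # look up by row position and column position
row_b = toy.loc["b"]                # a whole row comes back as a Series
```

Both lookups reach the same cell here. With the default integer index the label and the position coincide, which is why `df.loc[1]` works in the cell above.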

Much more interesting: Defining patterns and filtering out all matching rows.

In [None]:
pattern=(df['Subject']=='CRKT')
df[pattern]
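A pattern like the one above is a column of `True`/`False` values, one per row, so patterns can be combined. The sketch below uses a made-up table (the subject code `ARCH` and all numbers are hypothetical) to show `&` for "and" and `~` for "not".

```python
import pandas as pd

# Hypothetical toy table, not the real course list.
toy = pd.DataFrame({
    "Subject": ["CRKT", "SWRD", "CRKT", "ARCH"],
    "Tot Enrl": [12, 30, 7, 25],
})

is_cricket = toy["Subject"] == "CRKT"  # boolean Series: True where the row matches
is_large = toy["Tot Enrl"] >= 10       # a second, independent pattern

large_cricket = toy[is_cricket & is_large]  # rows matching both patterns
not_cricket = toy[~is_cricket]              # rows that do not match the first pattern
```

Each comparison needs its own parentheses when written inline, for example `toy[(toy["Subject"] == "CRKT") & (toy["Tot Enrl"] >= 10)]`, because `&` binds more tightly than `==`.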

### Task
Modify the box above to find
* all sword-classes: 'SWRD'.
* all classes offered in the discipline 'Fencing'.

## Let's investigate enrollment at AU

We can plot
* The raw data.
* Histograms of the raw data.
* The cumulative distribution function of the raw data.
* A bar graph showing how often each enrollment value appears: `value_counts`.

In [None]:
# Get the value count:
df['Tot Enrl'].value_counts()

In [None]:
# Sort it - not by the count, but by enrollment, its value.
counts_unsorted = df['Tot Enrl'].value_counts()
counts = counts_unsorted.sort_index()
counts
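The difference between the two orderings is easier to see on a handful of numbers. This is a minimal sketch with made-up enrollment values: `value_counts()` orders by how often a value occurs, and `sort_index()` reorders the result by the value itself.

```python
import pandas as pd

# Made-up enrollment numbers for six hypothetical classes.
enrollment = pd.Series([20, 5, 20, 12, 5, 20], name="Tot Enrl")

by_count = enrollment.value_counts()  # most frequent value first: 20, then 5, then 12
by_value = by_count.sort_index()      # same counts, ordered by enrollment: 5, 12, 20
```

In both results the index holds the enrollment value and the data holds the number of classes with that enrollment.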

## Plotting the data

***If you use floating plots, close each plot before executing the next cell.*** 

In [None]:
# by default each bar has a label, let's suppress that in this plot 
df['Tot Enrl'].plot.bar(xticks=[], title='Raw enrollment unsorted list')

There is more than one way to skin a cat, and there are at least four good ways to draw histograms. The `df[...].hist()` method is nice because it works with the DataFrame, but still communicates with matplotlib rather directly.

In [None]:
#df['Tot Enrl'].plot.hist(title='default histogram')
df['Tot Enrl'].hist()

In [None]:
#df['Tot Enrl'].plot.hist(bins=51, title='51 bin histogram')
df['Tot Enrl'].hist(bins=51)

In [None]:
#df['Tot Enrl'].plot.hist(bins=51, cumulative=True, density=True, title='cumulative distribution function')
df['Tot Enrl'].hist(bins=51, cumulative=True, density=True)

In [None]:
counts.plot.bar(xticks=[10,20,30,40,50,60,70,80,90], title='bar graph of sorted value-counts')
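The pandas plotting calls used in this section return a matplotlib `Axes` object, which can be kept in a variable and adjusted afterwards. The sketch below uses made-up numbers, and the label texts are placeholders rather than suggested wording.

```python
import pandas as pd

# Made-up enrollment numbers for six hypothetical classes.
enrollment = pd.Series([20, 5, 20, 12, 5, 20], name="Tot Enrl")

# Keep the returned Axes so the plot can be modified after it is drawn.
ax = enrollment.plot.hist(bins=5, title="Toy enrollment histogram")
ax.set_xlabel("Enrollment per class")  # placeholder label text
ax.set_ylabel("Number of classes")     # replaces the default y label
```

The same pattern applies to `.hist()` and `.plot.bar()` on a single column, since they also hand back an `Axes`.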

### Task

For each of the five plots:
* Add labels to the *x* and *y* axes.
* Write a brief description: What specifically does this plot show?
* Does the plot have a clear message or at least important highlights?