# Pandas tutorial

This notebook uses examples from the official [Pandas: Getting started tutorials](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html).

## Load the libraries that you plan to use

It's good practice to load every library you plan to use at the top of a notebook. This tells the reader what commands and outputs to expect. It also ensures that the required functions are available to every later code cell, because the imports run as the notebook's first instructions.

This notebook uses [`pandas`](https://pandas.pydata.org/) and [`matplotlib`](https://matplotlib.org/), two popular libraries for analyzing and visualizing data, respectively.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

## Load your data

You should store your data on the same server as your notebook whenever possible. Comma Separated Values (or `csv`) files are a very common format for storing table-like information. The `read_csv` function in the `pandas` library can read a `csv` file if you pass it a string that contains the location of the file. Because the import above shortened `pandas` to `pd`, the function is called as `pd.read_csv`.

For example, the `pd.read_csv` call below looks for the file named `titanic.csv` in the `data` folder, which sits in the same location as this notebook.

In [None]:
titanic = pd.read_csv("data/titanic.csv")

In [None]:
ages = titanic["Age"]
ages

In [None]:
age_sex = titanic[["Age", "Sex"]]
age_sex
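The two cells above differ only in their brackets, but they return different kinds of objects. The sketch below illustrates this on a small made-up table; the `sample` DataFrame and its values are assumptions for illustration, not rows from `titanic.csv`.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Titanic table (made-up values)
sample = pd.DataFrame(
    {
        "Name": ["Passenger A", "Passenger B", "Passenger C"],
        "Age": [22.0, 38.0, None],
        "Sex": ["male", "female", "female"],
    }
)

# A single column label returns a one-dimensional Series
one_column = sample["Age"]

# A list of column labels returns a two-dimensional DataFrame
two_columns = sample[["Age", "Sex"]]

print(type(one_column).__name__, one_column.shape)
print(type(two_columns).__name__, two_columns.shape)
```

The inner brackets in `sample[["Age", "Sex"]]` build an ordinary Python list; the outer brackets do the selection.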

### I’m interested in the passengers older than 35 years.

In [None]:
titanic[titanic["Age"] > 35]
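The filter above works in two steps: the comparison produces a Series of `True`/`False` values, and indexing with that Series keeps only the `True` rows. A minimal sketch on made-up data (the `sample` table is hypothetical) makes the intermediate mask visible:

```python
import pandas as pd

# Hypothetical stand-in data, including one missing age
sample = pd.DataFrame(
    {
        "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
        "Age": [22.0, 38.0, None, 54.0],
    }
)

# Step 1: the comparison yields one boolean per row.
# A missing age compares as False, so that row is dropped by the filter.
mask = sample["Age"] > 35

# Step 2: indexing with the mask keeps only the rows marked True
older = sample[mask]

print(mask.tolist())
print(older["Name"].tolist())
```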

### I’m interested in the Titanic passengers from cabin class 2 and 3.

In [None]:
class_23 = titanic[titanic["Pclass"].isin([2, 3])]
class_23

In [None]:
# Same as above, just written differently
class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)]
class_23
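To check that the two spellings really select the same rows, one option is `DataFrame.equals`. The sketch below uses a small hypothetical table rather than the Titanic data. Note that each comparison needs its own parentheses, because Python's `|` operator binds more tightly than `==`.

```python
import pandas as pd

# Hypothetical stand-in data
sample = pd.DataFrame(
    {
        "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
        "Pclass": [1, 2, 3, 2],
    }
)

# Membership test against a list of allowed values
with_isin = sample[sample["Pclass"].isin([2, 3])]

# The same condition spelled out as two comparisons joined by "or"
with_or = sample[(sample["Pclass"] == 2) | (sample["Pclass"] == 3)]

# True when both results have identical labels and values
same = with_isin.equals(with_or)
print(same)
```

`isin` tends to read better as the list of allowed values grows.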

### I want to work with passenger data for which the age is known.

In [None]:
age_no_na = titanic[titanic["Age"].notna()]
age_no_na
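As an aside, `dropna` with the `subset` argument should give the same result as the `notna()` mask. This is a sketch on hypothetical data, offered as an alternative spelling rather than the tutorial's method:

```python
import pandas as pd

# Hypothetical stand-in data with one missing age
sample = pd.DataFrame(
    {
        "Name": ["Passenger A", "Passenger B", "Passenger C"],
        "Age": [22.0, None, 54.0],
    }
)

# Keep rows whose Age is present, using a boolean mask
kept_by_mask = sample[sample["Age"].notna()]

# Drop rows whose Age is missing, looking only at the Age column
kept_by_dropna = sample.dropna(subset=["Age"])

print(kept_by_mask.equals(kept_by_dropna))
print(kept_by_mask.index.tolist())
```

Both versions keep the original row labels, so the index of the result has a gap where the row was removed.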

### I’m interested in the names of the passengers older than 35 years.

In [None]:
adult_names = titanic.loc[titanic["Age"] > 35, "Name"]
adult_names

### I’m interested in rows 10 till 25 and columns 3 to 5.

In [None]:
titanic.iloc[9:25, 2:5]
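The slice `9:25` can look off by one next to "rows 10 till 25". `iloc` counts positions from 0 and excludes the end of the slice, so positions 9 through 24 are the 10th through 25th rows. The sketch below checks this on a hypothetical 30-row table with columns named `col1` to `col6`, and compares it with label-based `loc`, which includes both ends of a slice.

```python
import pandas as pd

# Hypothetical table: 30 rows, six columns named col1..col6
sample = pd.DataFrame({f"col{i}": range(30) for i in range(1, 7)})

# Position-based: start included, end excluded
by_position = sample.iloc[9:25, 2:5]

# Label-based: both ends included (row labels here are the default 0..29)
by_label = sample.loc[9:24, "col3":"col5"]

print(by_position.shape)
print(list(by_position.columns))
print(by_position.equals(by_label))
```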

## Visualizing Data

Load the `csv` file named `air_quality_no2.csv` using the command below. Notice a few other options are included when calling the function.

`index_col=0` tells `pandas` to use the first column as the row index (Python, like most programming languages, starts counting at 0). `parse_dates=True` tells `pandas` to try to parse that index as dates, which makes the data easier to view and to plot over time.

In [None]:
air_quality = pd.read_csv("data/air_quality_no2.csv", index_col=0, parse_dates=True)
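To see what these two options do without needing the data file, the sketch below reads a tiny in-memory CSV. The column names and values are made up for illustration and are not taken from `air_quality_no2.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """datetime,station_a,station_b
2019-05-07 02:00:00,10.0,20.0
2019-05-07 03:00:00,11.5,19.0
"""

# Without the options: default integer index, dates left as a regular column
plain = pd.read_csv(io.StringIO(csv_text))

# With the options: the first column becomes the index and is parsed as dates
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(list(plain.columns))
print(list(indexed.columns))
print(type(indexed.index).__name__)
```

With a date-based index, the plotting calls later in this notebook can place time on the x-axis automatically.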

In [None]:
air_quality.head()

In [None]:
air_quality.plot()
plt.show()

In [None]:
air_quality["station_paris"].plot()
plt.show()

In [None]:
air_quality.plot.scatter(x="station_london", y="station_paris", alpha=0.5)
plt.show()

In [None]:
air_quality.plot.box()
plt.show()

In [None]:
air_quality.plot.area(figsize=(12, 4), subplots=True)
plt.show()

In [None]:
fig, axs = plt.subplots(figsize=(12, 4))
air_quality.plot.area(ax=axs)
axs.set_ylabel("NO$_2$ concentration")
fig.savefig("no2_concentrations.png")
plt.show()