# Lab 4. Plotting

## Data visualization is as much a part of data processing as it is of data presentation. Values are much easier to compare when they are plotted than when they are listed as numbers. By visualizing data we get a better intuitive sense of it than tables of values alone can give. Visualizations can also bring hidden patterns to light, which you, the analyst, can exploit for model selection.

## This session will cover:
1. matplotlib
2. seaborn
3. Plotting in Pandas

# 0. Let's import some libraries

In [None]:
import pandas as pd

In [None]:
import matplotlib.pyplot as plt


In [None]:
import seaborn as sns

# 1. Let's load some data

In [None]:
# tips is a dataset included in the seaborn library
tips = sns.load_dataset("tips")

In [None]:
tips.head()

# 2. Let's do some plotting (univariate, bivariate)

## 2.1. Histograms

In [None]:
tips['total_bill'].describe()

In [None]:
# histplot with kde=True replaces the deprecated distplot
ax = sns.histplot(tips['total_bill'], kde=True)
ax.set_title('Total Bill Histogram with Density Plot');

## The histogram shows that a typical total_bill is close to 20 dollars (mean 19.79)
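One way to check a visual impression like this numerically is to compare the mean with the median: a mean well above the median suggests a right-skewed distribution. The sketch below is only an illustration. The helper name `center_summary` and the toy values are assumptions made for this example, not part of the lab or the tips dataset.

```python
import pandas as pd


def center_summary(values: pd.Series) -> dict:
    """Return the mean, the median, and their difference for a numeric Series."""
    mean = values.mean()
    median = values.median()
    return {"mean": mean, "median": median, "mean_minus_median": mean - median}


# Toy values for illustration only (NOT taken from the tips dataset).
toy_bills = pd.Series([10.0, 12.0, 15.0, 18.0, 45.0])
summary = center_summary(toy_bills)
print(summary)
```

In the notebook the same helper could be called as `center_summary(tips['total_bill'])`.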

## 2.2. Bar Plots

In [None]:
ax = sns.countplot(x='day', data=tips)  # pass the column by keyword; recent seaborn versions reject it positionally
ax.set_title('Count of days');
ax.set_xlabel('Day of the Week');
ax.set_ylabel('Frequency');

## We observe from the previous figure that people in this dataset dine out mostly on weekends
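The same kind of count plot can be sketched with pandas' own plotting interface, which this session lists as a topic. The example below assumes a small made-up `day` column (not the tips data) and uses `value_counts()` followed by `plot.bar()`.

```python
import pandas as pd

# Toy data for illustration only (NOT the tips dataset).
toy = pd.DataFrame({"day": ["Sat", "Sun", "Sat", "Thur", "Sat", "Sun"]})

# Count how often each day occurs, most frequent first.
day_counts = toy["day"].value_counts()

# pandas delegates the drawing to matplotlib and returns an Axes object.
ax = day_counts.plot.bar()
ax.set_title("Count of days")
ax.set_xlabel("Day of the Week")
ax.set_ylabel("Frequency")
```

With the real data, `tips['day'].value_counts().plot.bar()` would follow the same pattern.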

## 2.3. ScatterPlot

In [None]:
ax = sns.regplot(x='total_bill', y='tip', data=tips)  # keep the returned Axes so the labels below apply to this plot
ax.set_title('Scatterplot of Total Bill and Tip');
ax.set_xlabel('Total Bill');
ax.set_ylabel('Tip');

## We observe a positive relationship between the total amount paid and the tip received, but the variability of the tip also increases with the total
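A scatterplot gives a visual sense of the relationship; a correlation coefficient puts a number on it. The following is a minimal sketch using `Series.corr` (Pearson by default) on made-up values that stand in for the two columns.

```python
import pandas as pd

# Toy data for illustration only (NOT the tips dataset).
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 3.5, 6.0],
})

# Pearson correlation between the two columns (value between -1 and 1).
r = toy["total_bill"].corr(toy["tip"])
print(f"Pearson r = {r:.3f}")
```

In the notebook, `tips['total_bill'].corr(tips['tip'])` would apply the same idea to the real data.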

## 2.4. Density Plot

In [None]:
# bivariate KDE: x/y replace the removed data2 argument, fill replaces shade
ax = sns.kdeplot(data=tips, x='total_bill', y='tip', fill=True) # fill will shade the contours
ax.set_title('Kernel Density Plot of Total Bill and Tip');
ax.set_xlabel('Total Bill');
ax.set_ylabel('Tip');

## 2.5. Box Plot

In [None]:
ax = sns.boxplot(x='time', y='total_bill', data=tips)
ax.set_title('Boxplot of total bill by time of day');
ax.set_xlabel('Time of day');
ax.set_ylabel('Total Bill');

## Dinner bills tend to be higher than lunch bills
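The center line of each box is the group median, so the comparison can also be read off a grouped summary. The sketch below assumes made-up lunch and dinner bills and uses `groupby(...).median()`.

```python
import pandas as pd

# Toy data for illustration only (NOT the tips dataset).
toy = pd.DataFrame({
    "time": ["Lunch", "Lunch", "Lunch", "Dinner", "Dinner", "Dinner"],
    "total_bill": [10.0, 14.0, 12.0, 18.0, 22.0, 30.0],
})

# Median total bill per time of day, i.e. the line inside each box.
medians = toy.groupby("time")["total_bill"].median()
print(medians)
```

On the real data the equivalent call would be `tips.groupby('time')['total_bill'].median()`.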

## 2.6. Violin Plot

In [None]:
ax = sns.violinplot(x='time', y='total_bill', data=tips)
ax.set_title('Violin plot of total bill by time of day');
ax.set_xlabel('Time of day');
ax.set_ylabel('Total Bill');

## 2.7. Pair Plot

In [None]:
fig = sns.pairplot(tips)

## 2.8. Pair Plot (fancy)

In [None]:
pair_grid = sns.PairGrid(tips)
pair_grid = pair_grid.map_upper(sns.regplot)
pair_grid = pair_grid.map_lower(sns.kdeplot)
pair_grid = pair_grid.map_diag(sns.histplot, kde=True)  # histplot replaces the deprecated distplot
plt.show()
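pandas offers a simpler counterpart to a pair plot through `pandas.plotting.scatter_matrix`. The sketch below runs it on a small made-up numeric frame; the column names mirror the tips dataset, but the values are assumptions made for illustration.

```python
import pandas as pd
from pandas.plotting import scatter_matrix

# Toy numeric data for illustration only (NOT the tips dataset).
toy = pd.DataFrame({
    "total_bill": [10.0, 15.5, 21.0, 30.2, 44.3],
    "tip": [1.5, 2.0, 3.5, 4.0, 5.6],
    "size": [1, 2, 2, 3, 4],
})

# One scatterplot per pair of columns, with histograms on the diagonal.
# The result is a 2D array of matplotlib Axes, one per cell of the grid.
axes = scatter_matrix(toy, diagonal="hist", figsize=(6, 6))
```

It gives less control than `PairGrid`, but it needs only pandas and matplotlib.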

# 3. Multivariate plotting

## Multivariate plotting is hard, and there is no single best practice for it; the right choice depends on the purpose of the visualization

## 3.1. Violin Plots

In [None]:
ax = sns.violinplot(x='time', y='total_bill',hue='sex', data=tips,split=True)

In [None]:
ax = sns.violinplot(x='time', y='total_bill',hue='smoker', data=tips,split=True)

## 3.2. Pair Plots

In [None]:
fig = sns.pairplot(tips, hue='sex')

## 3.3. Facet Plots

In [None]:
facet = sns.FacetGrid(tips, col='day', hue='sex')
facet = facet.map(plt.scatter, 'total_bill', 'tip')
facet = facet.add_legend()


In [None]:
facet = sns.FacetGrid(tips, col='day', hue='smoker')
facet = facet.map(plt.scatter, 'total_bill', 'tip')
facet = facet.add_legend()

In [None]:
facet = sns.FacetGrid(tips, col='size', hue='time')
facet = facet.map(plt.scatter, 'total_bill', 'tip')
facet = facet.add_legend()
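To see what a facet grid does under the hood, the same idea can be sketched with plain matplotlib: one subplot per category, with shared axes so the panels are comparable. The data below is made up for illustration, and the layout choices are assumptions rather than a reproduction of seaborn's output.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data for illustration only (NOT the tips dataset).
toy = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun"],
    "total_bill": [12.0, 30.0, 18.0, 25.0],
    "tip": [2.0, 4.5, 3.0, 3.5],
})

# One column of the grid per distinct day, in order of appearance.
days = list(toy["day"].unique())
fig, axes = plt.subplots(1, len(days), sharex=True, sharey=True, squeeze=False)

for ax, day in zip(axes[0], days):
    subset = toy[toy["day"] == day]  # rows belonging to this panel
    ax.scatter(subset["total_bill"], subset["tip"])
    ax.set_title(f"day = {day}")
    ax.set_xlabel("total_bill")

axes[0][0].set_ylabel("tip")
```

`FacetGrid` automates this loop and adds the `hue` coloring and the legend.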

# 4. Challenge yourself!

In [None]:
### Let's load the gapminder dataset
gapminder = pd.read_csv('https://raw.githubusercontent.com/thousandoaks/BEMM458/master/data/gapminder.tsv', sep='\t')

In [None]:
gapminder.head(50)

## 4.1. Is there any relationship between GDP per capita and life expectancy?

### Tip: consider a regplot visualization

## 4.2. Has the relationship between GDP per capita and life expectancy changed over time?

### Tip: consider a FacetGrid visualization (year, gdpPercap, lifeExp)