# Data Visualization

_September 18, 2020_

By the end of the lecture you will be able to:

- explain why data visualization matters
- create a single plot with matplotlib
- create a figure with multiple plots with matplotlib
- plot with Seaborn


## Why is Data Visualization Important?

In [None]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

<h2> Men lie, women lie, numbers don't - Jay-Z</h2>
<h3> But sometimes they do </h3> 

In [None]:
# Load the example dataset for Anscombe's quartet
df = sns.load_dataset("anscombe")
df

In [None]:
# use groupby, get the mean and variance of each data set
df.groupby(['dataset']).agg(['mean','var'])

In [None]:
# we can also examine the correlation between x and y within each dataset
df.groupby(['dataset']).corr()

In [None]:
sns.set(style="ticks")

# Show the results of a linear regression within each dataset
sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df,
           col_wrap=2, ci=None, palette="muted", height=4,
           scatter_kws={"s": 50, "alpha": 1})

## Matplotlib

In [None]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

Matplotlib is a large, complex library (70,000+ lines of code!). To understand how it creates graphs, it helps to first understand how its objects are structured.
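As a rough illustration of that object structure, the sketch below assumes the usual containment: a Figure holds one or more Axes, and each Axes owns an x-axis, a y-axis and the artists drawn on it. The variable names are illustrative only.

```python
import matplotlib.pyplot as plt

# plt.subplots() creates a Figure and, by default, a single Axes inside it
fig, ax = plt.subplots()

# The Figure keeps a list of the Axes it contains
print(type(fig).__name__)       # Figure
print(fig.axes == [ax])         # True

# Each Axes owns two Axis objects (ticks, tick labels, axis label)
print(type(ax.xaxis).__name__)  # XAxis
print(type(ax.yaxis).__name__)  # YAxis

# Plotting methods add "artists" (here a Line2D) to the Axes
line, = ax.plot([1, 2], [3, 4])
print(line in ax.lines)         # True
```

Keeping this hierarchy in mind makes it easier to guess where a setting lives: titles and labels belong to an Axes, while overall size belongs to the Figure.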

### Two Ways to Generate a Single Plot

**Method 1**<br>
Plot your Xs vs your Ys

In [None]:
X = [1,2]
Y = [3,4]
plt.scatter(X,Y);
plt.title('This is an example title')

**Method 2**<br>
Using subplots to set up your figure and axes
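The two methods are not separate systems. As a minimal sketch (reusing the small `X` and `Y` lists from Method 1), the code below suggests that `plt.` calls simply act on whichever Figure and Axes are "current", while Method 2 keeps explicit references to them.

```python
import matplotlib.pyplot as plt

X = [1, 2]
Y = [3, 4]

# Method 2: hold explicit references to the Figure and Axes
fig, ax = plt.subplots()
ax.scatter(X, Y)
ax.set_title('This is an example title')

# Method 1 calls such as plt.title() act on the "current" Axes,
# which at this point is the one we just created
print(plt.gca() is ax)   # True
print(plt.gcf() is fig)  # True

# So this pyplot call changes the same title that ax.set_title() set
plt.title('Changed through pyplot')
print(ax.get_title())    # Changed through pyplot
```

Holding on to `fig` and `ax` tends to pay off once a figure has more than one plot, because it is always clear which plot a call modifies.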

<img src="figure-axes.png" width="400">

In [None]:
fig, ax = plt.subplots()
ax.plot(X, Y);

In [None]:
ax.set_title('This is an example title');
ax.set_xlabel('x label')
ax.set_xticks([1,2])
ax.set_xticklabels(['one','two'])
fig
# examine the ax object

In [None]:
# examine the figure


#### How do we add a title to the above plot? 

In [None]:
ax.set_title('Line')
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis');
ax.set_xticks([1,2]);

In [None]:
# examine the figure
fig

### Create a Figure with 2 Axes (Plots)

In stages
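Before building the figure, it may help to see what `plt.subplots` hands back when more than one plot is requested. The sketch below (variable names are illustrative) shows that the second return value becomes a NumPy array of Axes, which is why individual plots are reached by indexing.

```python
import matplotlib.pyplot as plt

# One row, two columns: a 1-D array holding two Axes
fig_row, ax_row = plt.subplots(1, 2)
print(type(ax_row).__name__, ax_row.shape)  # ndarray (2,)

# Two rows, two columns: a 2-D array indexed as [row][column]
fig_grid, ax_grid = plt.subplots(2, 2)
print(ax_grid.shape)                        # (2, 2)
print(ax_grid[0][1] is ax_grid[0, 1])       # True

# .flat visits every Axes in row-major order, handy for loops
for i, a in enumerate(ax_grid.flat):
    a.set_title(f'plot {i}')
print(ax_grid[1][0].get_title())            # plot 2
```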

In [None]:
fig, ax = plt.subplots(1,2)

In [None]:
ax[0].bar([1,2], [3,4],color = 'pink', alpha = 0.5)
ax[1].scatter([1,2], [3,4], color = 'blue', alpha = 0.5)

In [None]:
# use help() to read the documentation for the bar method

In [None]:
# check out the figure

### Another example

In [None]:
#generate some data
x = np.linspace(0, 10, 100)
y = np.sin(x)

#### Create a figure with 2 x 2 plots

In [None]:
figure_sin, axes = plt.subplots(2,2, figsize = (10,10))

In [None]:
axes[0][0].scatter(x,y)
axes[0][1].plot(x,y)
axes[1][0].hist(y)
axes[1][1].fill(x,y);

In [None]:
figure_sin

#### How can we create a title for the scatter plot? 

In [None]:
axes[0][0].set_title('Scatter')
axes[0][0].set_xlabel('Independent variable name')
axes[0][0].set_ylabel('Dependent variable name');
axes[0][1].set_title('Line plot');

In [None]:
figure_sin

### Your turn

1 - Create a 2x2 figure with matplotlib<br>
2 - Use 4 different types of plots for the following dataset

In [None]:
aq = pd.read_csv('aq.csv')

In [None]:
aq

In [None]:
aq1 = aq[['x123','y1']]
aq1

In [None]:
fig , axes = plt.subplots(2,2, figsize = (10,10))

In [None]:
# Take the opportunity to read the documentation of these 4 methods
# Replace each None with the data you want to plot before running this cell
x = None
y = None
axes[0][0].scatter(x,y)
#axes[_][_].plot(_,_)
#axes[_][_].hist(_)
#axes[_][_].bar(_,_)

In [None]:
# Add titles to each of the axes objects
axes[0][0].set_title('Scatter plot')
#axes[_][_]._('Line plot')
#axes[_][_]._('Histogram')
#axes[_][_]._('Bar chart')

In [None]:
# And finally label your axes
axes[_][_]._('Independent variable name')
axes[_][_]._('Independent variable name')
axes[_][_]._('Independent variable name')
axes[_][_]._('Independent variable name')

axes[_][_]._('Dependent variable name')
axes[_][_]._('Dependent variable name')
axes[_][_]._('Dependent variable name')
axes[_][_]._('Dependent variable name')

In [None]:
# Now show the plot
fig

## Data Analysis Example & Using Seaborn

Seaborn is built on top of Matplotlib. It adds default styling to existing matplotlib graphs and provides additional, higher-level plot types.
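One practical consequence of this layering, sketched below with a small made-up DataFrame (`demo` is hypothetical and not part of the lecture data): axes-level seaborn functions such as `sns.scatterplot` draw onto a matplotlib Axes and return it, so the matplotlib methods from the previous section still apply.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, used only for illustration
demo = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [2, 4, 3, 5]})

# Axes-level seaborn functions draw onto a matplotlib Axes and return it
fig, ax = plt.subplots()
returned = sns.scatterplot(data=demo, x='x', y='y', ax=ax)
print(returned is ax)  # True

# Because it is an ordinary Axes, the usual matplotlib methods work on it
ax.set_title('Seaborn plot, matplotlib title')
ax.set_xlabel('X Axis')
```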

In [None]:
import seaborn as sns
sns.set()

In [None]:
plt.style.use('fivethirtyeight')

In [None]:
plt.plot([1,2], [3,4])

One of the most useful aspects of seaborn is that it works directly with pandas DataFrame objects.
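As a small sketch of what working directly with a DataFrame looks like, the example below uses a made-up table (`orders` and its column names are hypothetical, not the tips dataset). Columns are passed by name, and seaborn takes care of the axis labels.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data, not the tips dataset
orders = pd.DataFrame({
    'bill_amount': [10.0, 20.0, 30.0, 40.0],
    'tip_amount': [1.5, 3.0, 4.0, 7.0],
    'meal': ['lunch', 'lunch', 'dinner', 'dinner'],
})

# Columns are referred to by name; hue colors the points by a third column
ax = sns.scatterplot(data=orders, x='bill_amount', y='tip_amount', hue='meal')

# seaborn labels the axes with the column names automatically
print(ax.get_xlabel())  # bill_amount
print(ax.get_ylabel())  # tip_amount
```

With plain matplotlib, the same plot would need the columns pulled out by hand and the labels and colors set explicitly.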

In [None]:
#loads tips dataset
tips = sns.load_dataset("tips")

In [None]:
tips.head()

In [None]:
# visualize the relationship between bill and tip
sns.scatterplot(x='total_bill', y='tip', data=tips);

In [None]:
tips.sex

In [None]:
# calculate average tips amount by gender
y = None
x = None

In [None]:
# check to see if x and y are correct

In [None]:
# visualize the average tips men vs women pay - what's the appropriate plot?
#x = None
#average_tips_amount = None
fig, ax = plt.subplots()
ax.bar(x,y)
ax.set_title('average tip amount for male and female')
ax.set_xticks([0,1])
ax.set_xticklabels(['male','female'])

In [None]:
# visualize the average amount of total bill for time (lunch or dinner)
y = None
x = None
fig, ax = plt.subplots()
ax.bar(x,y, color = 'pink')
ax.set_title('average total bill amount for time')
ax.set_xticks([0,1])
ax.set_xticklabels(['lunch','dinner'])

## Resources

- https://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization#t-503926
- https://realpython.com/python-matplotlib-guide/ 


- https://pudding.cool/
- http://setosa.io/#/