# Topic 5a: Data Visualization

## Why do we make visualizations?

- Can we really discern patterns just by looking at DataFrames?
- "Same Stats, Different Graphs" (Autodesk Research): https://www.autodeskresearch.com/publications/samestats
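
To make the point concrete, here is a minimal sketch in the spirit of the link above. The three series are made up for illustration (they are not the datasets from that page), and `standardize` is a hypothetical helper. After rescaling, all three have the same mean and standard deviation, yet they look nothing alike once plotted.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (hypothetical helper)."""
    return (values - values.mean()) / values.std()


x = np.linspace(-3, 3, 200)

# Three made-up series with very different shapes
shapes = {
    "linear + noise": x + rng.normal(scale=0.5, size=x.size),
    "parabola": x ** 2,
    "sine wave": np.sin(3 * x),
}

# After standardizing, every series shares the same mean and std
series = {name: standardize(y) for name, y in shapes.items()}

for name, y in series.items():
    print(f"{name:15s} mean={y.mean():.2f}  std={y.std():.2f}")

# The summary statistics match, but the plots clearly do not
fig, axes = plt.subplots(1, 3, figsize=(10, 3), sharey=True)
for ax, (name, y) in zip(axes, series.items()):
    ax.scatter(x, y, s=5)
    ax.set(title=name, xlabel="x")
# In a notebook the figure renders at the end of the cell; call plt.show() in a script.
```

The printed summary statistics cannot tell the three series apart; the plots can.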

## How Data Scientists use visualizations

### 1. Exploratory Data Analysis (EDA)
- To EXPLORE the data that we have, to find out more about what we're working with
- We're trying to look for patterns and relationships

### 2. Explanatory Data Visualizations
- For stakeholders! 
- Convey our narratives/takeaways/insights

In [None]:
# what do we import?
import matplotlib.pyplot as plt
# IPython magic: render plots inline (only works in Jupyter/IPython notebooks)
%matplotlib inline

import numpy as np
import pandas as pd

In [None]:
car_df = pd.read_csv('auto-mpg.csv')

In [None]:
car_df.head()

## Documentation:

- https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.html


### Most common visualizations:


    .plot()         Line plot
    .scatter()      Scatter plot
    .bar()          Vertical bar graph
    .barh()         Horizontal bar graph
    .axhline()      Horizontal line across axes
    .axvline()      Vertical line across axes
    .stackplot()    Stack plot
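
A few of the functions above can be sketched on made-up data as follows. The labels and values are hypothetical, chosen only for illustration. The sketch calls the methods on Axes objects; the `plt.*` functions of the same name behave the same way on the current Axes.

```python
import matplotlib.pyplot as plt

# Hypothetical data, made up for illustration
labels = ["A", "B", "C", "D"]
values = [3, 7, 5, 9]
mean_value = sum(values) / len(values)

fig, (left, right) = plt.subplots(1, 2, figsize=(9, 3))

# Vertical bars with a horizontal reference line at the mean
left.bar(labels, values)
left.axhline(mean_value, color="red", linestyle="--", label="mean")
left.set(title="bar + axhline", ylabel="Value")
left.legend()

# Horizontal bars with a vertical reference line at the mean
right.barh(labels, values)
right.axvline(mean_value, color="red", linestyle="--", label="mean")
right.set(title="barh + axvline", xlabel="Value")
right.legend()
# In a notebook the figure renders at the end of the cell; call plt.show() in a script.
```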


In [None]:
plt.scatter(car_df['displacement'], car_df['horsepower'], label='car')
# plt.title("Relationship between Displacement and Horsepower")
# plt.xlabel("Displacement")
# plt.ylabel("Horsepower")
# plt.legend()
# plt.show()

## Figure vs Axes vs Subplot
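- A Figure is the overall image space (the canvas) that holds everything we draw
- An Axes is a single plotting area inside the Figure, where the data is drawn; `plt.subplots()` returns the Figure plus an array of Axes when it creates more than one
- A subplot is simply an Axes placed on a regular grid inside the Figure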
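
A quick way to see the relationship is to inspect what `plt.subplots()` returns. This is a minimal sketch; the 2x2 grid size is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

# A 2x2 grid: one Figure, four Axes returned as a NumPy array
fig, axes = plt.subplots(2, 2)
print(type(fig).__name__)   # the overall canvas
print(axes.shape)           # one entry per subplot in the grid
print(len(fig.axes))        # the Figure also keeps a flat list of its Axes

# Each element of the array is an individual Axes we can plot on
axes[0, 0].plot([0, 1, 2], [0, 1, 4])
axes[0, 0].set(title="Top-left subplot")

# With no grid arguments we get a single Axes, not an array
fig_single, ax_single = plt.subplots()
print(isinstance(ax_single, Axes))
```

Indexing follows the grid: `axes[row, col]` for a 2D grid, or `axes[i]` when there is only one row or one column, as in the histogram cell that follows.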

In [None]:
fig, axes = plt.subplots(1, 2, sharey=True)
fig.suptitle("Some Histograms")

axes[0].hist(car_df['weight'])
axes[0].set(xlabel="Weight", ylabel="Frequency")

axes[1].hist(car_df['model year'])
axes[1].set(xlabel="Year", ylabel="Frequency")

plt.show()

# Seaborn

In [None]:
import seaborn as sns

In [None]:
data = np.random.normal(size=(20, 10)) + np.arange(10) / 2

In [None]:
data.shape

In [None]:
boxplot = sns.boxplot(data=data)

In [None]:
boxplot.set(xlabel='xlabel', ylabel='ylabel', title='Boxplot')
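
`sns.boxplot` returns a regular matplotlib Axes, which is why `.set(...)` works on it. As a hedged sketch of how this ties back to Figure and Axes, a seaborn plot can also be drawn into an existing subplot through the `ax` argument. The violin plot and the seeded random data here are illustrative additions, not part of the original cells.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
data = rng.normal(size=(20, 10)) + np.arange(10) / 2  # 10 columns with shifted means

fig, axes = plt.subplots(1, 2, figsize=(10, 3), sharey=True)

# Draw each seaborn plot into a specific matplotlib Axes
box_ax = sns.boxplot(data=data, ax=axes[0])
box_ax.set(xlabel="Column", ylabel="Value", title="Boxplot")

violin_ax = sns.violinplot(data=data, ax=axes[1])
violin_ax.set(xlabel="Column", title="Violin plot")
# In a notebook the figure renders at the end of the cell; call plt.show() in a script.
```

Because the returned object is the same Axes that was passed in, everything from the matplotlib section (titles, labels, reference lines) applies to seaborn plots as well.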

# Plotly

In [None]:
import plotly.express as px

In [None]:
car_df

Documentation: https://plotly.com/python-api-reference/generated/plotly.express.scatter

In [None]:
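# Interactive scatter plot; hover over a point to see the car name and model year
# Note: trendline='ols' requires the statsmodels package to be installed
px.scatter(car_df, x='displacement', y='mpg',
           title='Relationship between Displacement and MPG',
           hover_data=['car name', 'model year'],
           trendline='ols')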