<center> <img src="res/ds3000.png"> </center>

<center> <h1> Week 7 - Day 1 </h1> </center>

<center> <h2> Part 1: Data Visualization </h2></center>

## Outline
1. <a href='#1'>Data Visualization</a>
2. <a href='#2'>Creating A Simple Bar Graph</a>
3. <a href='#3'>Adding a Title</a>
4. <a href='#4'>Adding Labels to Axes</a>
5. <a href='#5'>Specifying the Color of the Bars</a>
6. <a href='#6'>Specifying the Color of the Edges</a>
7. <a href='#7'>Specifying the Width of the Edges</a>
8. <a href='#8'>Specifying the Color Palette</a>
9. <a href='#9'>Creating A Bar Graph from a DataFrame</a>
10. <a href='#10'>Saving a Seaborn Visualization</a>

<a id="1"></a>

## 1. Data Visualization
* Visualizations help you “get to know” your data. 
* Give you a powerful way to understand data that goes beyond simply looking at raw data.
* Make it easier to identify patterns in data
* Usually an integral step of Exploratory Data Analysis (EDA)

### 1.1. Matplotlib
* Common data visualization library for Python
* Built on NumPy arrays 
* Used to produce publication quality visualizations
* https://matplotlib.org/

### Installing the Matplotlib Jupyter Extension 
> pip install ipympl

### Setting up Matplotlib
* Set up Matplotlib in Jupyter Notebook before importing the library
* Common setups:
    * **%matplotlib**: visualizations are displayed in a new window (Matplotlib's default GUI backend)
    * **%matplotlib notebook**: visualizations are displayed inline as Notebook cells (interactive backend)
    * **%matplotlib widget**: interactive inline visualizations, provided by the ipympl extension installed above

In [None]:
%matplotlib notebook

In [None]:
import matplotlib.pyplot as plt

### 1.2. Seaborn
* A statistical graphics library
    * https://seaborn.pydata.org/index.html
* Built over Matplotlib
* Simplifies many of Matplotlib's complex procedures for producing common visualization types
* Prettifies Matplotlib visualizations

In [None]:
import seaborn as sns

<a id="2"></a>

## 2. Creating A Simple Bar Graph
* Commonly used to show comparisons among groups or categories on a numeric value
    * x-axis shows the categories being compared
    * y-axis shows a numeric value for these categories



* Use the **barplot()** function in the Seaborn library

https://www.youtube.com/watch?v=7iGTlfmjtQU

In [None]:
#Hogwarts House Points 1991-1992
house_names = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
house_points = [482, 352, 426, 472]

In [None]:
import seaborn as sns

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points)

> ```python
> graph = sns.barplot(x=house_names, y=house_points)
> ```

* **x** corresponds to the labels to display on the x-axis
* **y** corresponds to values to display on the y-axis
* **graph** is the Matplotlib Axes object that **barplot()** returns; it holds the visualization you requested

In [None]:
import seaborn as sns
# create and display the bar plot
graph = sns.barplot(x=["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"], y=[482, 352, 426, 472])

In [None]:
type(graph)
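
Because `graph` is a Matplotlib Axes object, ordinary Axes methods can be called on it. The following is a minimal sketch using the Hogwarts data from above. `set_ylim()` is a standard Matplotlib Axes method that this notebook does not otherwise use, and the upper limit of 500 is an arbitrary value chosen for illustration.

```python
import matplotlib.axes
import seaborn as sns

house_names = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
house_points = [482, 352, 426, 472]

graph = sns.barplot(x=house_names, y=house_points)

# barplot() hands back the Matplotlib Axes it drew on
is_axes = isinstance(graph, matplotlib.axes.Axes)
print(is_axes)

# any Axes method is available, e.g. fixing the y-axis range
# (500 is an arbitrary limit picked for this sketch)
graph.set_ylim(0, 500)
```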

<a id="3"></a>

## 3. Adding a Title
* Use the **set_title()** method of the Axes object returned by **barplot()**

In [None]:
import seaborn as sns

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points)

#specify the title
title = "House Points"

#set the title of the plot
graph.set_title(title)

<a id="4"></a>

## 4. Adding Labels to Axes
* Use **set_xlabel("labelname")** and **set_ylabel("labelname")**
    * Alternatively, use the **set()** method and specify the **xlabel** and **ylabel** keyword arguments

In [None]:
import seaborn as sns

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points)

#specify the title
title = "House Points"

#set the title of the plot
graph.set_title(title, size = 16)

#add labels to the axes  
graph.set_xlabel("House", size = 16)
graph.set_ylabel("Points", size = 16)
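
The **set()** alternative mentioned above can be sketched as follows. It sets several Axes properties in one call, here the title and both axis labels. Note that `set()` does not take the `size` argument used in the cell above, so this version keeps the default font sizes.

```python
import seaborn as sns

house_names = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
house_points = [482, 352, 426, 472]

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points)

# set the title and both axis labels in a single call
graph.set(title="House Points", xlabel="House", ylabel="Points")
```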

<a id="5"></a>

## 5. Specifying the Color of the Bars
* Specify the **color** keyword argument in the **sns.barplot()** call
* Accepts Matplotlib color letters:
    * b: blue
    * g: green
    * r: red
    * c: cyan
    * m: magenta
    * y: yellow
    * k: black
    * w: white

In [None]:
import seaborn as sns

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points, color = "r")

#specify the title
title = "House Points"

#set the title of the plot
graph.set_title(title, size = 16)

#add labels to the axes  
graph.set_xlabel("House", size = 16)
graph.set_ylabel("Points", size = 16)

### 5.1. Passing in Hex color codes
* Can use the hex or rgb code of the color as well:
    * HEX code for Northeastern's red color is #d4202f
    * Its RGB values are (212, 32, 47)

In [None]:
import seaborn as sns

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points, color = "#d4202f")

#specify the title
title = "House Points"

#set the title of the plot
graph.set_title(title, size = 16)

#add labels to the axes  
graph.set_xlabel("House", size = 16)
graph.set_ylabel("Points", size = 16)

### 5.2. Passing in an RGB color
* RGB values are passed in as a tuple
* Seaborn and Matplotlib use float values between 0 and 1 for RGB color codes
* Divide each of your integer RGB values (0 to 255) by 255 to get the float values for the same color

In [None]:
import seaborn as sns

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points, color = (212/255, 32/255, 47/255))

#specify the title
title = "House Points"

#set the title of the plot
graph.set_title(title, size = 16)

#add labels to the axes  
graph.set_xlabel("House", size = 16)
graph.set_ylabel("Points", size = 16)
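
The conversion between integer RGB values, float RGB values, and hex codes can be checked with Matplotlib's own helpers. This is a small sketch using `to_hex()` and `to_rgb()` from `matplotlib.colors`. The variable names are chosen for illustration only.

```python
from matplotlib.colors import to_hex, to_rgb

# Northeastern's red as integer RGB values (0 to 255)
red_int = (212, 32, 47)

# divide by 255 to get the float form that Seaborn and Matplotlib expect
red_float = tuple(value / 255 for value in red_int)

# convert the float tuple to a hex code
red_hex = to_hex(red_float)
print(red_hex)

# and go the other way: hex code -> floats -> integers
round_trip = tuple(round(value * 255) for value in to_rgb("#d4202f"))
print(round_trip)
```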

<a id="6"></a>

## 6. Specifying the Color of the Edges
* Use the **edgecolor** keyword argument in the **sns.barplot()** call

In [None]:
import seaborn as sns

sns.set_context(rc = {'patch.linewidth': 1})
# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points, color = "#d4202f", edgecolor = "k")

#specify the title
title = "House Points"

#set the title of the plot
graph.set_title(title, size = 16)

#add labels to the axes  
graph.set_xlabel("House", size = 16)
graph.set_ylabel("Points", size = 16)


<a id="7"></a>

## 7. Specifying the Width of the Edges
* Use the **sns.set_context()** function with the following argument, before creating the plot:
    * rc = {'patch.linewidth': 1}

In [None]:
import seaborn as sns

#set the width of the edges of the bar
#set the width before producing the graph
sns.set_context(rc = {'patch.linewidth': 2})

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points, color = "#d4202f", edgecolor = "k")


#specify the title
title = "House Points"

#set the title of the plot
graph.set_title(title, size = 16)

#add labels to the axes  
graph.set_xlabel("House", size = 16)
graph.set_ylabel("Points", size = 16)
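
`set_context()` stores the value in Matplotlib's global `rcParams`, so it stays in effect for later plots until it is changed again. A minimal sketch for reading the setting back:

```python
import matplotlib as mpl
import seaborn as sns

house_names = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
house_points = [482, 352, 426, 472]

# set the edge width before producing the graph
sns.set_context(rc={"patch.linewidth": 2})

# the value is now part of Matplotlib's global settings
current_width = mpl.rcParams["patch.linewidth"]
print(current_width)

graph = sns.barplot(x=house_names, y=house_points, color="#d4202f", edgecolor="k")
```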


<a id="8"></a>

## 8. Specifying the Color Palette
* Use **sns.set_palette(palette_name)**
* Or specify the palette keyword argument in **sns.barplot()** method call
* Built-in palettes:
    * pastel
    * muted
    * bright
    * deep
    * dark
    * colorblind
* https://seaborn.pydata.org/tutorial/color_palettes.html

#### Built-in Color Palettes
<img src = "res/color_palettes.png" />


In [None]:
#displays current palette colors
sns.palplot(sns.color_palette())

In [None]:
sns.set_palette("colorblind")

In [None]:
#displays current palette colors
sns.palplot(sns.color_palette())
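
Besides viewing a palette with `palplot()`, its colors can be inspected as values. A short sketch: `color_palette()` returns a list-like object of RGB float tuples, and its `as_hex()` method gives the matching hex codes. The variable names are illustrative.

```python
import seaborn as sns

# look up a built-in palette by name without changing the current default
colorblind = sns.color_palette("colorblind")

# each entry is an (r, g, b) tuple of floats between 0 and 1
first_color = colorblind[0]
print(first_color)

# the same colors as hex codes
colorblind_hex = colorblind.as_hex()
print(list(colorblind_hex))
```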

In [None]:
import seaborn as sns

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points, palette="colorblind")

#specify the title
title = "House Points"

#set the title of the plot
graph.set_title(title, size = 16)

#add labels to the axes  
graph.set_xlabel("House", size = 16)
graph.set_ylabel("Points", size = 16)

### 8.1. Specific Palette Colors
* Can pass in a list of specific palette colors to use for the bars
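* Named colors such as "darkred" are Matplotlib's CSS4 color names
* xkcd color names are listed at https://xkcd.com/color/rgb/ (in Matplotlib, prefix them with **xkcd:**, e.g. "xkcd:dark red")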

In [None]:
import seaborn as sns

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points, palette=["darkred", "gold", "darkblue", "darkgreen"])

#specify the title
title = "House Points"

#set the title of the plot
graph.set_title(title, size = 16)

#add labels to the axes  
graph.set_xlabel("House", size = 16)
graph.set_ylabel("Points", size = 16)
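
The xkcd color names are written with spaces and differ from plain Matplotlib names such as "darkred". One way to use them is Seaborn's `xkcd_palette()` function, sketched below. The four xkcd names are an assumption, picked here as rough stand-ins for the house colors.

```python
import seaborn as sns
from matplotlib.colors import to_rgb

house_names = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
house_points = [482, 352, 426, 472]

# xkcd names chosen for illustration; any name from the xkcd list works
xkcd_names = ["dark red", "gold", "dark blue", "dark green"]
xkcd_colors = sns.xkcd_palette(xkcd_names)

# Matplotlib accepts the same names with an "xkcd:" prefix
dark_red = to_rgb("xkcd:dark red")

# pass the palette to barplot() just like a list of color names
graph = sns.barplot(x=house_names, y=house_points, palette=xkcd_colors)
```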

* Can use a list of hex color codes too

In [None]:
hogwarts_colors = ["#8b0000", "#FFD700", "#00316e", "#013220"]

sns.set_palette(sns.color_palette(hogwarts_colors))

In [None]:
import seaborn as sns

# create and display the bar plot
graph = sns.barplot(x=house_names, y=house_points, palette= hogwarts_colors)

#specify the title
title = "House Points"

#set the title of the plot
graph.set_title(title, size = 16)

#add labels to the axes  
graph.set_xlabel("House", size = 16)
graph.set_ylabel("Points", size = 16)

<a id="9"></a>

## 9. Creating A Bar Graph from a DataFrame
* In the **barplot()** call, specify the name of the DataFrame in the **data** keyword argument
    * x= the DataFrame column name for x-axis labels
    * y= the DataFrame column name for y-axis values

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("res/ave_grades.csv")

In [None]:
# average each numeric column per house
df_grades = df.groupby("House").mean(numeric_only=True)

In [None]:
df_grades = df_grades.reset_index()

In [None]:
df_grades

In [None]:
import seaborn as sns

# create and display the bar plot
graph = sns.barplot(x="House", y="Potion_Ave", data = df_grades, palette= hogwarts_colors)

#specify the title
title = "Potion Grades"

#set the title of the plot
graph.set_title(title, size = 16)

#add labels to the axes  
graph.set_xlabel("House", size = 16)
graph.set_ylabel("Potion_Ave", size = 16)
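
For trying the DataFrame workflow without the course file, the sketch below builds a small DataFrame in memory. The rows are made-up values standing in for `res/ave_grades.csv`, whose real contents are not shown here. It also uses `as_index=False`, a `groupby()` option that keeps "House" as a regular column and so replaces the separate `reset_index()` step.

```python
import pandas as pd
import seaborn as sns

# hypothetical rows standing in for res/ave_grades.csv
df = pd.DataFrame({
    "House": ["Gryffindor", "Gryffindor", "Hufflepuff", "Hufflepuff",
              "Ravenclaw", "Ravenclaw", "Slytherin", "Slytherin"],
    "Potion_Ave": [70, 80, 75, 85, 90, 80, 85, 95],
})

# one row per house with the mean grade; "House" stays a column
df_grades = df.groupby("House", as_index=False).mean()

# x and y are column names looked up in the DataFrame passed as data
graph = sns.barplot(x="House", y="Potion_Ave", data=df_grades)
graph.set_title("Potion Grades", size=16)
```

Seaborn uses the column names as the default axis labels, so `set_xlabel()` and `set_ylabel()` are only needed to change the text or its size.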

<a id="10"></a>

## 10. Saving a Seaborn Visualization

In [None]:
graph.figure.savefig("house_points.png")
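
`savefig()` is a Matplotlib Figure method, reached through the `figure` attribute of the Axes. It accepts further options; the sketch below uses two of them, `dpi` for resolution and `bbox_inches="tight"` to trim surrounding whitespace. The value 300 is an illustrative choice, not a requirement.

```python
from pathlib import Path

import seaborn as sns

house_names = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
house_points = [482, 352, 426, 472]

graph = sns.barplot(x=house_names, y=house_points)
graph.set_title("House Points")

# the file format is inferred from the extension (.png here)
output_path = Path("house_points.png")
graph.figure.savefig(output_path, dpi=300, bbox_inches="tight")

# confirm the file was written
print(output_path.exists())
```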