# Worked Examples 03
In this exercise, you will practice creating data visualizations using Seaborn.

## Update seaborn library

Run the following cell to install the seaborn library if it is missing. After restarting the kernel, run the cell again: if seaborn is installed correctly, it prints the version number.

In [None]:
# Run but do not modify this code
try:
    import seaborn as sns
    print(sns.__version__, "# If you are running a version < 0.11, the next cell will not work! Make sure to update your seaborn library")
except Exception as e:
    !pip install seaborn

In [None]:
# Run but do not modify this code
import pandas as pd
import seaborn as sns

sns.set_theme()
sns.set_context('talk')

## Autograder Instructions
The first time you run the following cell, it installs the `Otter` autograder package. Running it again afterwards prints the version number if `Otter` is correctly installed.

In [None]:
try:
    import otter
    grader = otter.Notebook("worked_example03.ipynb")
    print(otter.__version__)
    if (otter.__version__ != '6.1.6'): #update for latest otter version
        !pip install -q -U --user otter-grader
except Exception as e:
    !pip install -q otter-grader

To grade your own work, run the cell starting with `grader.check` immediately after each question. The cell calls Otter to run the tests for all subquestions and reports which tests you pass or fail, in the same style as on Gradescope.

### Question 1
In the "Why Visualize?" video, we showed Anscombe's Quartet. It contains four different sets of data, each of which has similar summary statistics but looks very different when plotted. We import and preview the dataset below.

In [None]:
# Run but do not modify this code
quartet = sns.load_dataset('anscombe')
quartet.head()

The cell below groups the data by `dataset` and computes the mean of the `x` and `y` values for each. Notice that the means are nearly identical across the four datasets.

In [None]:
quartet.groupby("dataset").mean()
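
The same `groupby` pattern can report more than one statistic at a time. Below is a minimal sketch on a small, made-up table; the `toy` DataFrame and its values are assumptions for illustration and are not the Anscombe data.

```python
import pandas as pd

# Hypothetical toy data (invented values): two groups with equal means
toy = pd.DataFrame({
    "dataset": ["A", "A", "A", "B", "B", "B"],
    "x": [1.0, 2.0, 3.0, 0.0, 2.0, 4.0],
    "y": [2.0, 4.0, 6.0, 4.0, 4.0, 4.0],
})

# Compute the mean and the sample standard deviation per group in one call;
# the result has one column per (variable, statistic) pair
summary = toy.groupby("dataset").agg(["mean", "std"])
print(summary)
```

In this toy table the group means agree while the spreads differ, which is the kind of detail a single statistic can hide.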

Answer the following questions:

1. With a single Seaborn plot function call, plot the scatterplot of `x` on the horizontal and `y` on the vertical with the `dataset`s distinguished by color (i.e., using the `hue` parameter).
1. With a single Seaborn plot function call, plot the scatterplot of `x` on the horizontal and `y` on the vertical, but this time have each of the four datasets as their own subplot (i.e., plot each as a separate subfigure using, for example, the `col` parameter).

In [None]:
# Put your code to answer the question here
# feel free to add more cells as needed
...

_Type your answer here, replacing this text._

In [None]:
grader.check("q1_manual")

<!-- END QUESTION -->

### Question 2
Below we import a dataset `cars` containing information about automobiles manufactured in the 1970s. We will use this for the next several questions.

In [None]:
# Run but do not modify this code
cars = sns.load_dataset('mpg')
cars.head()

The cell below groups the data by `origin` and computes the maximum `mpg` for each group.

In [None]:
cars.groupby("origin")["mpg"].max()

Answer the following questions:
1. Use a single **bar** plot to visualize the average mpg of the three different origins: `usa`, `japan`, and `europe`.
2. Use a single **box** plot to visualize the distribution of mpg for each of the three origins of car (`usa`, `japan`, and `europe`).
3. Notice that the result above contains 3 values, since there are only 3 origins in the dataset. Do these values match the top horizontal line of the box plot for each group? Why or why not? Put your answer in the "Answer Q2" cell.

In [None]:
# Put your code to answer the question here
# feel free to add more cells as needed
...

### Answer Q2

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### Question 3
1. Use a scatter plot to visualize the relationship between `horsepower` and `mpg`. Plot `horsepower` along the horizontal axis and `mpg` along the vertical axis. Color the points in the scatter plot according to the place of `origin`.
2. Use a line plot to visualize the change in `mpg` with respect to `model_year` for each of the three origins (`usa`, `japan`, and `europe`). Plot three separate lines on the same visualization, one for each of the origins, colored or otherwise labeled to distinguish them.
3. Make a line plot just like in part 2, but show the change in `weight` with respect to `model_year` instead of `mpg`.

<!-- BEGIN QUESTION -->



In [None]:
# Put your code to answer the question here
# feel free to add more cells as needed
...

<!-- END QUESTION -->

### Question 4
1. Create a histogram showing the distribution of just `horsepower`, i.e., showing how many cars had different amounts of `horsepower` (there is no need to distinguish between origin).
2. With a single Seaborn plot function call, plot the histogram for `horsepower` for each possible value of the number of `cylinders` using the `col` parameter (this should result in a subfigure for each value in `cylinders`).
3. Plot a heatmap with `horsepower` on the horizontal axis and `weight` on the vertical axis. Make sure to display a color bar for interpreting the heatmap (e.g., by setting `cbar=True`).

<!-- BEGIN QUESTION -->



In [None]:
# Put your code to answer the question here
# feel free to add more cells as needed
...

<!-- END QUESTION -->

### Question 5
Below we import the `covid-19.csv` dataset. In the same figure, for each of the `Province/State`s in the list `states = ["Washington", "North Carolina", "New York", "Ohio", "Hawaii"]`, plot the change in the `Confirmed` cases over time (i.e., `ObservationDate`). Distinguish the states with different colors (i.e., using the `hue` parameter).

Hint: To filter for only those states, use `.isin(list_of_states_you_want)` on the `Province/State` column.
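
The filtering hint can be sketched on a small stand-in table. Everything in this example is an assumption for illustration: the CSV text, its values, and the names `sample`, `wanted`, and `subset` are invented, and only the column names are borrowed from the description above.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk; all values are invented
csv_text = """ObservationDate,Province/State,Confirmed
03/01/2020,Ohio,1
03/01/2020,Texas,5
03/02/2020,Ohio,3
03/02/2020,Hawaii,2
"""

# parse_dates converts the named column to a datetime type while reading
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["ObservationDate"])

wanted = ["Ohio", "Hawaii"]
# .isin returns a boolean Series; indexing with it keeps only the matching rows
subset = sample[sample["Province/State"].isin(wanted)]
print(subset)
```

Because the date column is parsed as datetimes rather than left as strings, plotting functions can place it on a proper time axis.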

In [None]:
# Run but do not modify this code
covid = pd.read_csv("covid-19.csv", parse_dates=["ObservationDate"])
states = ["Washington", "North Carolina", "New York", "Ohio", "Hawaii"]
sns.set_context("paper")
covid.head()

<!-- BEGIN QUESTION -->



In [None]:
# Put your code to answer the question here
# feel free to add more cells as needed
...

<!-- END QUESTION -->



---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()