# Exercises for Data Visualization

## Setup

Run the next cell to import and configure the Python libraries that you need to complete the exercises.

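```python
import pandas as pd
from pandas.plotting import register_matplotlib_converters
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Let matplotlib plot pandas datetime values (used by the line charts below)
register_matplotlib_converters()
```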

## Exercise 1: Line Charts

#### Scenario

You have recently been hired to manage the museums in the City of Los Angeles. Your first project focuses on the four museums pictured in the images below.

![ex1_museums](https://i.imgur.com/pFYL8J1.png)

You will leverage data from the Los Angeles [Data Portal](https://data.lacity.org/) that tracks monthly visitors to each museum.  

![ex1_xlsx](https://i.imgur.com/mGWYlym.png)

#### Step 1: Load your data

Your first assignment is to read the LA Museum Visitors data file "museum_visitors.csv" into `museum_data`.  Note that:
- The filepath to the dataset is stored as `museum_filepath`.  
- The name of the column to use as row labels is `"Date"`.  
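As a minimal sketch of the loading step, the snippet below reads a small made-up CSV with `pd.read_csv`. It uses `io.StringIO` only so the example runs without a file on disk. The `index_col` parameter picks the column used as row labels, and `parse_dates=True` converts those labels to datetimes. The column names and numbers are hypothetical, not taken from the museum dataset.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk; columns and values are hypothetical
csv_text = """Date,Museum A,Museum B
2014-01-01,100,250
2014-02-01,120,240
2014-03-01,90,260
"""

# index_col="Date" uses that column as row labels; parse_dates=True parses them as datetimes
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
print(df)
```

With a real file you would pass its path (for example `museum_filepath`) in place of the `StringIO` object.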

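```python
# Path of the file to read (assumes museum_visitors.csv is in the working directory)
museum_filepath = "museum_visitors.csv"

# Read the file into a variable museum_data (replace ... with your code)
museum_data = ...
```
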
#### Step 2: Review the data

Use a Python command to print the last 5 rows of the data.
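For reference, `DataFrame.tail()` returns the last rows of a DataFrame (five by default). The sketch below applies it to a small made-up DataFrame; the column name and values are illustrative only.

```python
import pandas as pd

# Made-up DataFrame with 8 monthly rows (values are illustrative only)
dates = pd.date_range("2018-01-01", periods=8, freq="MS")
df = pd.DataFrame({"Visitors": range(8)}, index=dates)

# tail() returns the last 5 rows by default; tail(n) returns the last n rows
last_five = df.tail()
print(last_five)
```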

```python
# Print the last five rows of the data
```

#### Step 3: Convince the museum board 

The Firehouse Museum claims they ran an event in 2014 that brought an incredible number of visitors, and 
that they should get extra budget to run a similar event again.  The other museums think these types of 
events aren't that important, and budgets should be split purely based on recent visitors on an average day.  

To show the museum board how the event compared to regular traffic at each museum, create a line chart 
that shows how the number of visitors to each museum evolved over time.  Your figure should have four 
lines (one for each museum).

```python
# Line chart showing the number of visitors to each museum over time
```

#### Step 4: Assess seasonality

When meeting with the employees at Avila Adobe, you hear that one major pain point is that the number of 
museum visitors varies greatly with the seasons, with low seasons (when the employees are perfectly 
staffed and happy) and also high seasons (when the employees are understaffed and stressed).  You realize 
that if you can predict these high and low seasons, you can plan ahead to hire some additional seasonal 
employees to help out with the extra work.

#### Part A
Create a line chart that shows how the number of visitors to Avila Adobe has evolved over time.  
(_If your code returns an error, the first thing that you should check is that you've spelled the 
name of the column correctly!  You must write the name of the column exactly as it appears 
in the dataset._)

```python
# Line plot showing the number of visitors to Avila Adobe over time
```

#### Part B

Does Avila Adobe get more visitors:
- in September-February (in LA, the fall and winter months), or 
- in March-August (in LA, the spring and summer)?  

Using this information, when should the museum hire additional seasonal employees?
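One optional way to back up the visual impression with numbers (an addition, not part of the original exercise) is to compare the average number of visitors in the two halves of the year, using the month number of each date. The Series below is made up, so its averages say nothing about Avila Adobe.

```python
import pandas as pd

# Two years of made-up monthly counts; the values are arbitrary, not real museum data
dates = pd.date_range("2016-01-01", periods=24, freq="MS")
visitors = pd.Series([10 * (i % 12 + 1) for i in range(24)], index=dates)

# Month numbers 3-8 are March through August; all other months are September through February
spring_summer = visitors.index.month.isin([3, 4, 5, 6, 7, 8])

print("Mar-Aug average:", visitors[spring_summer].mean())
print("Sep-Feb average:", visitors[~spring_summer].mean())
```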

## Exercise 2: Bar Charts and Heatmaps

In this exercise, you will use your new knowledge to propose a solution to a real-world scenario.  
To succeed, you will need to import data into Python, answer questions using the data, and 
generate **bar charts** and **heatmaps** to understand patterns in the data.

#### Scenario

You've recently decided to create your very own video game!  As an avid reader 
of [IGN Game Reviews](https://www.ign.com/reviews/games), you hear about all of the most recent 
game releases, along with the ratings they've received from experts, ranging 
from 0 (_Disaster_) to 10 (_Masterpiece_).

![ex2_ign](https://i.imgur.com/Oh06Fu1.png)

You're interested in using [IGN reviews](https://www.ign.com/reviews/games) to guide the 
design of your upcoming game.  Thankfully, someone has summarized the ratings in a 
really useful CSV file that you can use to guide your analysis.

#### Step 1: Load the data

Read the IGN data file "ign_scores.csv" into `ign_data`.  Use the `"Platform"` column to label the rows.
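Once a column is used as the row index, its values can be used to look up rows by name with `.loc`. The sketch below assumes, as the later steps suggest, a layout with platforms as rows and genres as columns; the platform names, genres, and scores are hypothetical.

```python
import io

import pandas as pd

# Made-up rows in a platform-by-genre layout; platforms, genres, and scores are hypothetical
csv_text = """Platform,Action,Racing
Console X,7.5,6.8
Handheld Y,6.9,7.4
"""

# index_col="Platform" turns the platform names into row labels
df = pd.read_csv(io.StringIO(csv_text), index_col="Platform")

# Row labels can then be used to look up a single cell with .loc[row, column]
print(df.loc["Handheld Y", "Racing"])
```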

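```python
# Path of the file to read (assumes ign_scores.csv is in the working directory)
ign_filepath = "ign_scores.csv"

# Read the file into a variable ign_data (replace ... with your code)
ign_data = ...
```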

#### Step 2: Review the data

Use a Python command to print the entire dataset.

```python
# Print the data
```

#### Step 3: Which platform is best?

For as long as you can remember, your favorite video game has 
been [**Mario Kart Wii**](https://www.ign.com/games/mario-kart-wii), a racing game 
released for the Wii platform in 2008.  IGN agrees with you that it is a 
great game: their rating for it is a whopping 8.9!  Inspired by the 
success of this game, you're considering creating your very own racing game 
for the Wii platform.

#### Part A

Create a bar chart that shows the average score for **racing** games, for each platform.  
Your chart should have one bar for each platform. 
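A sketch of one bar per row label: pass the index as `x` and one column as `y` to `sns.barplot`. The platforms and average scores below are made up; only the column name "Racing" mirrors the exercise.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up average racing scores for three hypothetical platforms
df = pd.DataFrame(
    {"Racing": [6.8, 7.4, 5.9]},
    index=pd.Index(["Console X", "Handheld Y", "PC Z"], name="Platform"),
)

plt.figure(figsize=(8, 5))
plt.title("Average score for racing games, by platform (illustrative data)")
# The index supplies the categories and the column supplies the bar heights
ax = sns.barplot(x=df.index, y=df["Racing"])
ax.set_ylabel("Average score")
```

With many platforms, long tick labels can overlap; swapping `x` and `y` gives horizontal bars that are easier to read.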

```python
# Bar chart showing average score for racing games by platform
```

#### Part B

Based on the bar chart, do you expect a racing game for the **Wii** platform to receive a high 
rating?  If not, what gaming platform seems to be the best alternative?
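To double-check what the chart shows, `Series.idxmax()` returns the label of the largest value and `sort_values` ranks every platform. The Series below is made up and stands in for a single genre column.

```python
import pandas as pd

# Made-up average racing scores (hypothetical platforms and values)
racing = pd.Series([6.8, 7.4, 5.9], index=["Console X", "Handheld Y", "PC Z"], name="Racing")

# idxmax returns the label with the highest value; sort_values ranks all platforms
print(racing.idxmax())
print(racing.sort_values(ascending=False))
```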

#### Step 4: All possible combinations!

Eventually, you decide against creating a racing game for Wii, but you're still committed to 
creating your own video game!  Since your gaming interests are pretty 
broad (_... you generally love most video games_), you decide to use the IGN data to 
inform your new choice of genre and platform.

#### Part A

Use the data to create a heatmap of average score by genre and platform.  
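`sns.heatmap` colors each cell of a 2-D table; with rows as platforms and columns as genres, every square is one genre/platform combination. `annot=True` writes the value in each cell. The table below is made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up platform-by-genre average scores (rows: platforms, columns: genres)
df = pd.DataFrame(
    {"Action": [7.5, 6.9], "Racing": [6.8, 7.4], "Puzzle": [8.1, 5.2]},
    index=["Console X", "Handheld Y"],
)

plt.figure(figsize=(8, 4))
plt.title("Average score by platform and genre (illustrative data)")
# annot=True writes each cell's value inside its square
ax = sns.heatmap(data=df, annot=True)
ax.set_xlabel("Genre")
```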

```python
# Heatmap showing average game score by platform and genre
```

#### Part B

Which combination of genre and platform receives the highest average ratings?  Which combination 
receives the lowest average ratings?
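Reading extremes off a large heatmap is error-prone, so one option (an addition to the exercise) is to flatten the table and ask pandas directly. `DataFrame.unstack()` on a table with a plain index returns a Series indexed by (column, row) pairs, and `idxmax`/`idxmin` return the pair with the highest and lowest value. The table is made up.

```python
import pandas as pd

# Made-up platform-by-genre table (rows: platforms, columns: genres)
df = pd.DataFrame(
    {"Action": [7.5, 6.9], "Racing": [6.8, 7.4], "Puzzle": [8.1, 5.2]},
    index=["Console X", "Handheld Y"],
)

# unstack() turns the table into a Series indexed by (genre, platform) pairs
scores = df.unstack()

# idxmax/idxmin return the (genre, platform) pair with the highest/lowest value
print("Highest:", scores.idxmax(), scores.max())
print("Lowest:", scores.idxmin(), scores.min())
```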

## Exercise 3: Scatter Plots

In this exercise, you will use your new knowledge to propose a solution to a real-world scenario.  
To succeed, you will need to import data into Python, answer questions using the data, and 
generate **scatter plots** to understand patterns in the data.

#### Scenario

You work for a major candy producer, and your goal is to write a report that your company can 
use to guide the design of its next product.  Soon after starting your research, you stumble 
across this [very interesting dataset](https://fivethirtyeight.com/features/the-ultimate-halloween-candy-power-ranking/) containing results from a fun survey to crowdsource favorite candies.


#### Step 1: Load the Data

Read the candy data file "candy.csv" into `candy_data`.  Use the `"id"` column to label the rows.

```python
# Path of the file to read (assumes candy.csv is in the working directory)
candy_filepath = "candy.csv"

# Read the file into a variable candy_data (replace ... with your code)
candy_data = ...
```

#### Step 2: Review the data

Use a Python command to print the first five rows of the data.

```python
# Print the first five rows of the data
```

#### Step 3: The role of sugar

Do people tend to prefer candies with higher sugar content?  

#### Part A

Create a scatter plot that shows the relationship between `'sugarpercent'` (on the horizontal x-axis) 
and `'winpercent'` (on the vertical y-axis).  _Don't add a regression line just yet -- you'll do that 
in the next step!_
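`sns.scatterplot` draws one point per row, with `x` on the horizontal axis and `y` on the vertical axis. The column names below mirror the exercise, but the five rows are invented for illustration.

```python
import pandas as pd
import seaborn as sns

# Made-up candy rows; the column names mirror the exercise, the values are invented
df = pd.DataFrame(
    {
        "sugarpercent": [0.10, 0.30, 0.45, 0.60, 0.90],
        "winpercent": [35.0, 48.0, 42.0, 55.0, 60.0],
    }
)

# x goes on the horizontal axis and y on the vertical axis
ax = sns.scatterplot(x=df["sugarpercent"], y=df["winpercent"])
```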

```python
# Scatter plot showing the relationship between 'sugarpercent' and 'winpercent'
```

#### Part B

Does the scatter plot show a **strong** correlation between the two variables?  If so, are candies with 
more sugar relatively more or less popular with the survey respondents?
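To put a number on "strong", one optional check (not required by the exercise) is the Pearson correlation coefficient from `Series.corr`. It ranges from -1 (perfect negative linear relationship) through 0 (none) to +1 (perfect positive). The values below are invented and lie on a straight line, so they give a coefficient of essentially 1.

```python
import pandas as pd

# Made-up values; the column names mirror the exercise, the numbers are invented
df = pd.DataFrame(
    {
        "sugarpercent": [0.1, 0.2, 0.3, 0.4, 0.5],
        "winpercent": [30.0, 35.0, 40.0, 45.0, 50.0],
    }
)

# Pearson correlation: +1 perfect positive, 0 no linear relationship, -1 perfect negative
r = df["sugarpercent"].corr(df["winpercent"])
print(round(r, 3))
```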

```python
# Write your answer to Part B here.
```

#### Step 4: Take a closer look

#### Part A

Create the same scatter plot you created in **Step 3**, but now with a regression line!

```python
# Scatter plot with regression line showing the relationship between 'sugarpercent' and 'winpercent'
sns.regplot(x=candy_data['sugarpercent'], y=candy_data['winpercent'])
```

#### Part B

According to the plot above, is there a **slight** correlation between `'winpercent'` and `'sugarpercent'`?  
What does this tell you about the candy that people tend to prefer?
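By default, `sns.regplot` draws a straight-line least-squares fit. As an optional cross-check, `np.polyfit` with degree 1 fits the same kind of line and returns its slope and intercept. The sign of the slope gives the direction of the relationship, and its size shows how steep it is. The points below are invented and lie exactly on a known line.

```python
import numpy as np
import pandas as pd

# Made-up points that lie exactly on winpercent = 20 * sugarpercent + 40
df = pd.DataFrame({"sugarpercent": [0.0, 0.25, 0.5, 0.75, 1.0]})
df["winpercent"] = 20 * df["sugarpercent"] + 40

# Degree-1 polyfit returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(df["sugarpercent"], df["winpercent"], 1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

On real survey data the points scatter around the line, so a small positive slope alongside a wide spread of points suggests at most a weak relationship.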