In [None]:
# initializing otter-grader
import otter
grader = otter.Notebook()

# Homework 3: Bike Sharing
## Exploratory Data Analysis (EDA) and Visualization
## Due Date: Sunday 5/3, 11:59 PM

**Collaboration Policy**

Data science is a collaborative activity. While you may talk with others about
the homework, we ask that you **write your solutions individually**. If you do
discuss the assignments with others please **include their names** below.

**Collaborators**: *list collaborators here*

## Introduction

Bike sharing systems are a new generation of traditional bike rentals where the process of signing up, renting, and returning is automated. Through these systems, users can easily rent a bike from one location and return it to another. We will be analyzing bike sharing data from Washington D.C. 

In this assignment, you will perform tasks to clean, visualize, and explore the bike sharing data. You will also investigate open-ended questions. These open-ended questions ask you to think critically about how the plots you have created provide insight into the data.

After completing this assignment, you should be comfortable with:

* reading plaintext delimited data into `pandas`
* wrangling data for analysis
* using EDA to learn about your data 
* making informative plots with Altair

We recommend reading through the entire assignment first to get an idea of your goals.

## Grading
Grading is broken down into autograded answers and free response. 

For autograded answers, the results of your code are compared to provided and/or hidden tests.

For free response, readers will evaluate how well you answered the question and/or fulfilled the requirements of the question.

For plots, your plots should be *similar* to the given examples. We will tolerate small variations such as color differences or slight variations in scale. However, it is in your best interest to make the plots as similar as possible, since similarity is judged by the readers.

**Note that for ALL plotting questions from here on out, we will expect appropriate titles, axis labels, legends, etc. The following question serves as a good guideline on what is "enough": If I directly downloaded the plot and viewed it, would I be able to tell what was being visualized without knowing the question?** 



In [1]:
# Run this cell to set up your notebook.  
# Make sure ds100_utils.py is in this assignment's folder
import csv
import numpy as np
import pandas as pd
import altair as alt
alt.data_transformers.disable_max_rows()

import zipfile
from pathlib import Path
import ds100_utils # custom file to add helpful routines

# Default plot configurations

from IPython.display import display, Latex, Markdown

## Loading Bike Sharing Data
The data we are exploring is collected from a bike sharing system in Washington D.C.

The variables in this data frame are defined as:

Variable       | Description
-------------- | ------------------------------------------------------------------
instant | record index
dteday | date
season | 1. spring <br> 2. summer <br> 3. fall <br> 4. winter
yr | year (0: 2011, 1:2012)
mnth | month ( 1 to 12)
hr | hour (0 to 23)
holiday | whether day is holiday or not
weekday | day of the week
workingday | if day is neither weekend nor holiday
weathersit | 1. clear or partly cloudy <br> 2. mist and clouds <br> 3. light snow or rain <br> 4. heavy rain or snow
temp | normalized temperature in Celsius (divided by 41)
atemp | normalized "feels-like" temperature in Celsius (divided by 50)
hum | normalized percent humidity (divided by 100)
windspeed| normalized wind speed (divided by 67)
casual | count of casual users
registered | count of registered users
cnt | count of total rental bikes including casual and registered  

### Download the Data

In [2]:
data_dir = Path('data')

In [3]:
# Run this cell to look at the top of the file.  No further action is needed
for line in ds100_utils.head(data_dir/'bikeshare.txt'):
    print(line,end="")

### Size
Is the file big?  How many records do we expect to find? 

Let's find out.

In [4]:
# Run this cell to view some metadata.  No further action is needed
print("Size:", (data_dir/"bikeshare.txt").stat().st_size, "bytes")
print("Line Count:", ds100_utils.line_count(data_dir/"bikeshare.txt"), "lines")

### Loading the data

The following code loads the data into a Pandas DataFrame.

In [5]:
# Run this cell to load the data.  No further action is needed
bike = pd.read_csv(data_dir/'bikeshare.txt')
bike.head()

Below, we show the shape of the DataFrame. You should see that the number of rows in the DataFrame matches the number of lines in the file, minus the header row.
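
As a small illustration of that check, the sketch below uses a made-up three-record CSV held in memory (it is not `bikeshare.txt`, and the column names are hypothetical) and compares the number of text lines with the number of rows `pandas` reads.

```python
import io
import pandas as pd

# Toy CSV text standing in for a file on disk (hypothetical data).
toy_csv = "id,value\n1,10\n2,20\n3,30\n"

# Count the lines of text, including the header line.
n_lines = len(toy_csv.splitlines())

# Parse the same text; the header becomes column names, not a row.
toy = pd.read_csv(io.StringIO(toy_csv))

print(n_lines, toy.shape)
```

The row count of the toy frame is one less than the line count because the header line is not a record.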

In [6]:
bike.shape

## 0: Examining the Data

Before we start working with the data, let's examine its granularity.


### Question 0A
Look at the data stored in the dataframe `bike`. 
What is the granularity of these data (i.e. what does each row represent)?

<!--
BEGIN QUESTION
name: q0a
points: 1
manual: true
-->

<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

In [7]:
# Use this cell for scratch work. 
# If you need to add more cells for scratch work, add them BELOW this cell.

### Question 0B
For this assignment, we'll be using this data to study bike usage in Washington D.C. Based on the granularity and the variables present in the data, what might some limitations of using these data be? What would be two additional data categories/variables that you can collect to address some of these limitations?

<!--
BEGIN QUESTION
name: q0b
points: 1
manual: true
-->
<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

In [8]:
# Use this cell for scratch work. If you need to add more cells for scratch work, add them BELOW this cell.

---
## 1: Data Preparation
A few of the variables that are numeric/integer actually encode categorical data. These include `holiday`, `weekday`, `workingday`, and `weathersit`. In the following problem, we will convert these four variables to strings specifying the categories. In particular, use 3-letter labels (`Sun`, `Mon`, `Tue`, `Wed`, `Thu`, `Fri`, and `Sat`) for `weekday`. You may simply use `yes`/`no` for `holiday` and `workingday`. 

In this exercise we will *mutate* the data frame, **overwriting the corresponding variables in the data frame.** However, our notebook will effectively document this in-place data transformation for future readers. Make sure to leave the underlying datafile `bikeshare.txt` unmodified.

### Question 1a (Decoding `holiday`, `weekday`, `workingday`, and `weathersit`)


Decode the `holiday`, `weekday`, `workingday`, and `weathersit` fields:

1. `holiday`: Convert to `yes` and `no`.  Hint: There are fewer holidays...
1. `weekday`: It turns out that Monday is the day with the most holidays.  Mutate the `'weekday'` column to use the 3-letter labels (`'Sun'`, `'Mon'`, `'Tue'`, `'Wed'`, `'Thu'`, `'Fri'`, and `'Sat'`) instead of its current numerical values. Assume `0` corresponds to `Sun`, `1` to `Mon`, and so on.
1. `workingday`: Convert to `yes` and `no`.
1. `weathersit`: You should replace each value with one of `Clear`, `Mist`, `Light`, or `Heavy`. Hint: Use the `'hum'` (humidity) column to determine whether your value should be `Clear` (less humid), `Mist` (slightly more humid), `Light` (more humid), or `Heavy` (very humid).

**Note:** In this exercise you need to *mutate* the data frame, **overwriting the corresponding variables in the data frame**. If you want to revert changes, run the cell that reloads the csv.

**Hint:**  One approach is to use the [replace](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html) method of the pandas DataFrame class. We haven't discussed how to do this, so you'll need to look at the documentation. The most concise way is to use the approach described in the documentation as "nested dictionaries" (which you can write out manually to describe the mapping of the values), though there are many possible solutions.
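
To make the nested-dictionary form concrete, here is a minimal sketch on a made-up frame. The columns `tshirt` and `in_stock` and their codes are hypothetical and unrelated to the bike data; only the shape of the mapping carries over.

```python
import pandas as pd

# Toy frame with integer-coded categories (hypothetical columns).
toy = pd.DataFrame({"tshirt": [0, 1, 2, 1], "in_stock": [1, 0, 1, 1]})

# Outer keys are column names; inner dicts map old values to new ones.
mapping = {
    "tshirt": {0: "S", 1: "M", 2: "L"},
    "in_stock": {0: "no", 1: "yes"},
}

# replace returns a new frame, so assign it back to overwrite the columns.
toy = toy.replace(mapping)
print(toy)
```

Because each inner dictionary is scoped to one column, the code `1` can mean `M` in one column and `yes` in another without conflict.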
<!--
BEGIN QUESTION
name: q1a
points: 2
gradescope: show
-->

In [9]:
# Modify holiday, weekday, workingday, and weathersit here
...

### Question 1b (Holidays)

How many entries in the data correspond to holidays?  Set the variable `num_holidays` to this value.
<!--
BEGIN QUESTION
name: q1b
points: 1
gradescope: show
-->

In [18]:
num_holidays = ...

### Question 1c (Computing Daily Total Counts)
In the next few questions we will be analyzing the daily number of registered and unregistered users.

Construct a data frame named `daily_counts` indexed by `dteday` with the following columns:
* `casual`: total number of casual riders for each day
* `registered`: total number of registered riders for each day
* `workingday`: whether that day is a working day or not (`yes` or `no`)
* `weathersit`: what was the weather situation like (`Clear`, `Mist`, `Light`, or `Heavy`)
* `season`: what season the day falls on

**Hint**: `groupby` and `agg`. For the `agg` method, please check the [documentation](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.core.groupby.DataFrameGroupBy.agg.html) for examples on applying different aggregations per column. If you use the capability to do different aggregations by column, you can do this task with a single call to `groupby` and `agg`. For the `workingday` column we can take any of the values since we are grouping by the day, thus the value will be the same within each group. Take a look at the `'first'` or `'last'` aggregation functions.
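
The per-column aggregation mentioned in the hint can be sketched on toy data as follows. The frame and its columns (`day`, `sales`, `open`) are invented for illustration.

```python
import pandas as pd

# Toy hourly records (hypothetical data): two days, two rows each.
toy = pd.DataFrame({
    "day": ["d1", "d1", "d2", "d2"],
    "sales": [3, 4, 10, 1],
    "open": ["yes", "yes", "no", "no"],
})

# One aggregation per column: sum the counts, keep the first label.
per_day = toy.groupby("day").agg({"sales": "sum", "open": "first"})
print(per_day)
```

The result is indexed by the grouping column, with one row per group.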

<!--
BEGIN QUESTION
name: q1c
points: 4
gradescope: show
-->

In [20]:
daily_counts = ...

---
# 2: Exploring the Distribution of Riders

In this question we'll begin by comparing the distribution of the daily counts of casual and registered riders.  

### Question 2a

Usually there are multiple ways to solve a problem. In this and the next question, create the same visualization using `pandas` methods and the built-in Altair transformations.


For this question, create a long table using the pandas function [melt](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html) that has one column indicating the rider type (`casual` vs `registered`) and another column with the number of riders. 
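
As a rough illustration of what `melt` does, the sketch below reshapes a tiny made-up wide table. The names `apples`, `pears`, `fruit`, and `n_sold` are hypothetical.

```python
import pandas as pd

# Toy wide table: one column per category (hypothetical data).
toy = pd.DataFrame(
    {"apples": [1, 2], "pears": [5, 6]},
    index=pd.Index(["mon", "tue"], name="day"),
)

# Stack the two columns into a label column and a value column.
toy_long = pd.melt(
    toy,
    value_vars=["apples", "pears"],
    var_name="fruit",
    value_name="n_sold",
)
print(toy_long)
```

Note that `melt` discards the original index by default, so the long table here has only the two new columns.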

The temporal granularity of the records should be *daily counts*, which you should have after completing question 1c. 

As always, include a properly-labeled legend, xlabel, ylabel, and a title. 

After creating the plot, look at it and make sure you understand what the plot is actually telling us, e.g., on a given day, the most likely number of registered riders we expect is ~4000, but it could be anywhere from nearly 0 to 7000. (Note that our example uses a `bin=alt.Bin(maxbins=100)` parameter.)

<img src='images/casual_vs_registered.svg' width="600px" />

<!--
BEGIN QUESTION
name: q2c
points: 2
manual: true
-->
<!-- EXPORT TO PDF -->

In [24]:
## A long table with 2 columns
daily_counts_long = ...

## Call alt.Chart on daily_counts_long to make the plot look like the provided figure
...

### Question 2b

Use the [layered histogram](https://altair-viz.github.io/gallery/layered_histogram.html) gallery example as a guide to create a plot that overlays the histograms of the daily counts of bike users, using one color to represent `casual` riders and another to represent `registered` riders. The temporal granularity of the records should be _daily counts_, which you should have after completing question 1c. The example on the [layered_histogram](https://altair-viz.github.io/gallery/layered_histogram.html) page should be very helpful for this question.

As always, include a properly-labeled legend, xlabel, ylabel, and a title. 

After creating the plot, look at it and make sure you understand what the plot is actually telling us, e.g., on a given day, the most likely number of registered riders we expect is ~4000, but it could be anywhere from nearly 0 to 7000. (Note that our example uses a `bin=alt.Bin(maxbins=100)` parameter.)

Your figure should look identical to the figure in 2a.


**Hint**: You will need to use the [`transform_fold`](https://altair-viz.github.io/user_guide/transform/fold.html) function shown in the [layered_histogram](https://altair-viz.github.io/gallery/layered_histogram.html) example.  This function is similar to `pd.melt` discussed in class, in that it takes a wide data frame and converts it to a long format.  [Altair works best with long format data](https://altair-viz.github.io/user_guide/transform/fold.html).

<!--
BEGIN QUESTION
name: q2a
points: 4
manual: true
-->

<!-- EXPORT TO PDF -->

In [26]:
...

### Question 2c

In the cell below, describe the differences you notice between the histograms for casual and registered riders.  Consider concepts such as modes, symmetry, skewness, tails, gaps and outliers.  Include a comment on the spread of the distributions. 
<!--
BEGIN QUESTION
name: q2b
points: 2
manual: true
-->
<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

### Question 2d

The density plots do not show us how the counts for registered and casual riders vary together. Use [mark_point]() to make a scatter plot to investigate the relationship between casual and registered counts. This time, let's use the `bike` DataFrame to plot hourly counts instead of daily counts.  Color the points in the scatterplot according to whether or not the day is a working day (your colors do not have to match ours exactly, but they should be different based on whether the day is a working day). In addition, there are many points in the scatter plot, so make them small to help reduce [overplotting](https://www.data-to-viz.com/caveat/overplotting.html).

The [transform_regression](https://altair-viz.github.io/user_guide/transform/regression.html) function will also try to draw a linear regression line. In order to add two regression lines, set the `groupby` argument of the `transform_regression` to the working day variable.  

Your image should look something like this:

<img src='images/workingday.svg' width="600px" />

<!--
BEGIN QUESTION
name: q2d
points: 2
manual: true
-->

<!-- EXPORT TO PDF -->

In [27]:

...

### Question 2e

What does this scatterplot reveal about the relationship (if any) between casual and registered riders and whether or not the day they ride is on the weekend? What effect does [overplotting](https://www.data-to-viz.com/caveat/overplotting.html) have on your ability to describe this relationship?

<!--
BEGIN QUESTION
name: q2e
points: 2
manual: true
-->
<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

# 3: Interactive Visualizations of Seasonal Patterns


### Question 3a

In this question we'll try visualizing the relationship between registered and casual riders using a heatmap. In Altair, heatmaps are drawn with the [mark_rect](https://altair-viz.github.io/gallery/simple_heatmap.html#gallery-simple-heatmap) method.  Create a heatmap with casual riders on the x-axis and registered riders on the y-axis.  Use any color scheme you like.  

Save your chart in a variable named `rect` (which will be used in the next question) and display it.

[This example](https://altair-viz.github.io/gallery/binned_heatmap.html) may be particularly helpful.

    
<!--
BEGIN QUESTION
name: q3a
points: 2
manual: true
-->
<!-- EXPORT TO PDF -->

In [28]:
rect = ...
...

rect

### Question 3b

Now, let's see if we can create an interactive visualization around the heatmap we just drew.  To do that we'll follow the [interactive cross highlight](https://altair-viz.github.io/gallery/interactive_cross_highlight.html) example in the documentation.  In our example we'll make a bar plot indicating the total number of riders per season.

When complete, if you hold the shift key and click on the bars, you can select which observations appear in the heat map.

1. Add a column to `daily_counts` called `total` which is the total number of riders each day (sum of `registered` and `casual`).
2. Make a barplot called `bar_plot`, which has the total riders in each season. 
    - Hint: the y encoding will need to be the sum of total daily counts.  See the documentation on useful [encoding shorthands](https://altair-viz.github.io/user_guide/encoding.html#encoding-shorthands)
    - Follow the interactive cross highlight example to set the color of the bars to change when you click on them.
    

Use the interactive visualization to quickly detect which season had the day with the most casual riders.  Save your answer as either "spring", "summer", "winter" or "fall" in a variable called `most_casual`.  
    
<!--
BEGIN QUESTION
name: q3b
points: 2
manual: true
-->
<!-- EXPORT TO PDF -->

In [32]:
# This creates a selection tool
pts = alt.selection(type="multi", encodings=['x'])

## This is a point plot where the size corresponds to the number of riders in each rectangle
## We'll only show the circle if the relevant bar has been selected
## This is done with `transform_filter` on our `pts` selection tool
circ = rect.mark_point().encode(
    alt.ColorValue('grey'),
    alt.Size('count()',
        legend=alt.Legend(title='Records in Selection')
    )
).transform_filter(
    pts
)

## Make a bar plot of total riders in each season and add the selection tool
...



## Put all of the plots together.
interactive_plot = alt.vconcat(
    rect + circ,
    bar_plot
).resolve_legend(
    color="independent",
    size="independent"
)

# Use the interactive visualization to quickly see which season had the day with most casual riders
# Your response should be either "spring", "summer", "winter" or "fall"
most_casual = ...

interactive_plot

---
# 4: Understanding Daily Patterns

### Question 4a
Let's examine the behavior of riders by plotting the average number of riders for each hour of the day over the **entire dataset**, stratified by rider type.  

Your plot should look like the plot below. While we don't expect your plot's colors to match ours exactly, your plot should have different colored lines for different kinds of riders.

<img src="images/diurnal_bikes.svg" width="400px"/>

From the `bike` data frame, compute a new data frame called `hourly_means` with the average number of riders at each hour of the day.  

**Hint:** By default Altair cannot include indices in the channel encodings.  If `hr` is an index in your data frame, make it a column using the `reset_index` function.  See the note on [including index data in Altair](https://altair-viz.github.io/user_guide/data.html#including-index-data).
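
The index issue in the hint can be seen on a toy frame. The columns `slot` and `visits` below are hypothetical.

```python
import pandas as pd

# Toy records with a grouping column (hypothetical data).
toy = pd.DataFrame({"slot": [0, 0, 1, 1], "visits": [2, 4, 10, 20]})

# groupby moves "slot" into the index of the result.
means = toy.groupby("slot").mean()

# reset_index turns the index back into an ordinary column.
means_flat = means.reset_index()
print(means_flat)
```

After `reset_index`, the grouping variable is a regular column that a chart encoding can refer to by name.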

<!--
BEGIN QUESTION
name: q4a
points: 2
manual: true
-->
<!-- EXPORT TO PDF -->

In [34]:
hourly_means = ...

...

### Question 4b

What can you observe from the plot?  Hypothesize about the meaning of the peaks in the registered riders' distribution.

<!--
BEGIN QUESTION
name: q4b
points: 2
manual: true
-->
<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

---
# 5: Exploring Ride Sharing and Weather
In this question we'll start examining how the weather affects riders' behavior. 

### Question 5a

We can quickly view relationships between many variables using what is often called a "pairs plot".  Altair calls these [repeated charts](https://altair-viz.github.io/user_guide/compound_charts.html#repeated-charts). Make a 4x4 grid of plots showing the relationship between the `casual`, `hum`, `windspeed`, and `temp` variables.  Set the color of the points to the `season` variable.  Does the number of casual riders seem related to weather? Are any of the weather variables related?

To limit the computation time and memory required to make the plot, make the plot using `bike_sample`, which is a random sample of 1000 rows of `bike`.  You may want to first define `row_sample` as 1000 random row indices that you can use to make the selection.
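
One way to draw random row positions and then select them is sketched below on a made-up 50-row frame. The seed, the sample size of 10, and the column names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with 50 rows (hypothetical data).
toy = pd.DataFrame({"x": np.arange(50), "y": np.arange(50) ** 2})

# Draw 10 distinct row positions; the seed is arbitrary.
rng = np.random.default_rng(0)
rows = rng.choice(len(toy), size=10, replace=False)

# Positional selection keeps each sampled row intact.
toy_sample = toy.iloc[rows]
print(toy_sample.shape)
```

Sampling without replacement avoids duplicated rows in the sample.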

<!--
BEGIN QUESTION
name: q5a
points: 4
-->


In [36]:
import altair as alt

row_sample = ...
bike_sample = ...

...

### Question 5b
Next, let's look at how the proportion of casual riders changes as weather changes.
Create a new column in the `bike` DataFrame called `prop_casual` which represents the proportion of casual riders out of all riders for each record.

<!--
BEGIN QUESTION
name: q5b
points: 1
-->

In [38]:
...

### Question 5c
In order to examine the relationship between proportion of casual riders and temperature, we can again create a scatterplot using `mark_point`. We can even use color/hue to encode the information about day of week. Run the cell below, and you'll see we end up with a big mess that is impossible to interpret.
<!--
BEGIN QUESTION
name: q5c
points: 2
-->

In [40]:
alt.Chart(bike).mark_point().encode(
    x='temp',
    y='prop_casual',
    color='weekday'
).configure_mark(
    opacity=0.5,
)

### Question 5d

A better approach is to use local smoothing. The basic idea is that for each x value, we compute some sort of representative y value that captures the data close to that x value. One technique for local smoothing is the LOESS transform (LOcally Estimated Scatterplot Smoothing), which is also known as "Locally Weighted Scatterplot Smoothing" (LOWESS). 
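
To make the basic idea concrete, here is a deliberately simplified sketch: a tricube-weighted local *average*. It is not LOESS itself (LOESS fits a weighted regression in each neighborhood), and the function name `local_mean` and its `bandwidth` argument, measured here as a distance on the x-axis, are assumptions of this sketch.

```python
import numpy as np

def local_mean(x, y, bandwidth):
    """Smooth y with a tricube-weighted average of points near each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    smoothed = np.empty_like(y)
    for i, x0 in enumerate(x):
        # Scaled distance of every point from the current x value.
        d = np.abs(x - x0) / bandwidth
        # Tricube weights: 1 at x0, falling to 0 at the bandwidth.
        w = np.where(d < 1, (1 - d**3) ** 3, 0.0)
        smoothed[i] = np.sum(w * y) / np.sum(w)
    return smoothed

# Noisy toy data around an exponential curve (hypothetical).
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
y = np.exp(x) + rng.normal(scale=0.5, size=x.size)
y_smooth = local_mean(x, y, bandwidth=0.5)
```

A larger `bandwidth` averages over more neighbors and gives a smoother curve; a very small one reproduces the raw points.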

An example is shown below. The red curve is a smoothed version of the blue scatterplot points. To add a LOESS fit to a scatter plot, we can use the [transform_loess](https://altair-viz.github.io/user_guide/transform/loess.html) function in Altair.


In [41]:

# Make noisy data
xobs = np.sort(np.random.rand(100)*4.0 - 2)
noisy_data = pd.DataFrame({
    'xobs' : xobs,    
    'yobs' : np.exp(xobs) + np.random.randn(100) / 2.0
})

chart = alt.Chart(noisy_data).transform_loess('xobs', 'yobs').mark_line(size=2, color="red").encode(
    x='xobs',
    y='yobs'
)

chart + alt.Chart(noisy_data).mark_point(color="blue").encode(
    x = "xobs",
    y = "yobs"
)

In our case with the bike ridership data, we want 7 curves, one for each day of the week. The x-axis will be the temperature and the y-axis will be a smoothed version of the proportion of casual riders.

<img src="images/curveplot_temp_prop_casual.svg" width="600px" />

You should use [transform_loess](https://altair-viz.github.io/user_guide/transform/loess.html) just like the example above. Unlike the example above, plot ONLY the lowess curve. Do not plot the actual data points, which would result in a mess of points covering the smooth line fits. 

You do not need to match the colors on our sample plot as long as the colors in your plot make it easy to distinguish which day they represent.

Set the plot title to "Temperature vs Casual Rider Proportion by Weekday".

**Hints:** 
* Start by creating a new column in the `bike` data frame, called `temp_f`.  Look at the top of this homework notebook for a description of the temperature field to know how to convert to Fahrenheit. By default, the temperature field ranges from 0.0 to 1.0. In case you need it, $\text{Fahrenheit} = \text{Celsius} * \frac{9}{5} + 32$.
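
The arithmetic in the hint can be sketched as a small helper. The function name `normalized_to_fahrenheit` is hypothetical; the default scale of 41 comes from the description of `temp` in the variable table.

```python
def normalized_to_fahrenheit(temp_norm, scale=41):
    """Undo the division by `scale` to get Celsius, then convert to Fahrenheit."""
    celsius = temp_norm * scale
    return celsius * 9 / 5 + 32

# A normalized value of 0.0 is 0 degrees Celsius, i.e. 32 degrees Fahrenheit.
print(normalized_to_fahrenheit(0.0))
# A normalized value of 1.0 is 41 degrees Celsius.
print(normalized_to_fahrenheit(1.0))
```

Because it uses only arithmetic operators, the same function also works elementwise on a `pandas` Series.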

<!--
BEGIN QUESTION
name: q5d
points: 4
-->


In [42]:
...

...


### Question 5e

Repeat the above plot, but this time change the `bandwidth` argument of the `transform_loess` function.  The bandwidth parameter controls the smoothness of the interpolating line, with smaller numbers leading to less smoothness and larger numbers to more smoothness.  Create one chart with `bandwidth=0.1` (call it `chart1`) and another with `bandwidth=0.8` (call it `chart8`).  [Use the `|` operator to plot `chart1` and `chart8` side by side.](https://altair-viz.github.io/user_guide/compound_charts.html#horizontal-concatenation)

<!--
BEGIN QUESTION
name: q5e
points: 2
-->


In [43]:
...

chart1 | chart8


### Question 5f

The default bandwidth for `transform_loess` is 0.3.  Among the bandwidths 0.1, 0.3, and 0.8, which amount of smoothing was most informative about the relationship between casual riders and temperature? Why? In your answer, you should try to address which patterns in the lowess fits you think are robust or likely to be repeated in future years.


*Write your answer here, replacing this text.*


### Question 5g
Remake the above plot, this time grouping by `workingday` rather than `weekday`.  Use a bandwidth of your choice.

<!--
BEGIN QUESTION
name: q5g
points: 2
-->


In [44]:
...


### Question 5h
What do you see from the plots above? How is `prop_casual` changing as a function of temperature? Do you notice anything else interesting? Which plot do you think most clearly tells the story about bike share use and temperature and days of the week?  Your answer should reference the plots from the previous three parts.  
<!--
BEGIN QUESTION
name: q5h
points: 2
manual: true
-->
<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

# 6: Expanding our Analysis
### Question 6a

Imagine you are working for a Bike Sharing Company that collaborates with city planners, transportation agencies, and policy makers in order to implement bike sharing in a city. These stakeholders would like to reduce congestion and lower transportation costs. They also want to ensure the bike sharing program is implemented equitably. In this sense, equity is a social value that is informing the deployment and assessment of your bike sharing technology. 

Equity in transportation includes: improving the ability of people of different socio-economic classes, genders, races, and neighborhoods to access and afford the transportation services, and assessing how inclusive transportation systems are over time. 

Do you think the `bike` data as it is can help you assess equity? If so, please explain. If not, how would you change the dataset? You may discuss how you would change the granularity, what other kinds of variables you'd introduce to it, or anything else that might help you answer this question.

<!--
BEGIN QUESTION
name: q6a
points: 2
manual: true
-->

<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

In [45]:
# Use this cell for scratch work. If you need to add more cells for scratch work, add them BELOW this cell.

### Question 6b
[Bike sharing is growing in popularity](https://www.bts.gov/newsroom/bike-share-stations-us) and new cities and regions are making efforts to implement bike sharing systems that complement their other transportation offerings. The [goals of these efforts](https://www.wired.com/story/americans-falling-in-love-bike-share/) are to have bike sharing serve as an alternate form of transportation in order to alleviate congestion, provide geographic connectivity, reduce carbon emissions, and promote inclusion among communities.

Bike sharing systems have spread to many cities across the country. The company you work for asks you to determine the feasibility of expanding bike sharing to additional cities in the U.S. 

Based on your plots in this assignment, what would you recommend and why? Please list at least two reasons why, and mention which plot(s) you drew your analysis from. 

**Note**: There isn't a set right or wrong answer for this question. Feel free to come up with your own conclusions based on evidence from your plots! 

<!--
BEGIN QUESTION
name: q6b
points: 2
manual: true
-->

<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

In [46]:
# Use this cell for scratch work. If you need to add more cells for scratch work, add them BELOW this cell.

# Submit
Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output.
**Please save before submitting!**

<!-- EXPECT 15 EXPORTED QUESTIONS -->

# Running Built-in Tests
1. All tests are in `tests` directory
1. Each python file in `tests` is a test
1. `grader.check('testname')` runs test `'testname'`, e.g. `'q1'`
1. `grader.check_all()` runs all visible tests

In [None]:
# Run built-in checks
grader.check_all()

In [None]:
# Generate pdf in classic notebook (does not work in JupyterLab)
import nb2pdf
nb2pdf.convert('hw3.ipynb')

# To generate pdf using command-line, run in terminal,
# nb2pdf hw3.ipynb

# Submission Checklist
1. Check filename is 'hw3.ipynb'
1. Save file to confirm all changes are on disk
1. Run *Kernel > Restart & Run All* to execute all code from top to bottom
1. Check `grader.check_all()` output
1. Save file again to write any new output to disk
1. Check generated pdf that all responses are displayed correctly
1. Submit to Gradescope