In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab2.ipynb")

# Lab 2 - Bar charts

The questions in each lab are autograded, and the tests give you immediate feedback directly in the notebook. This way you can quickly iterate on your visualizations during the lab time. After you upload your lab to Gradescope, the same tests run again and you can see your score right away.

## Submission instructions

<div class="alert alert-info" style="color:black">
<ul>
  <li>Download this lab from the Jupyter Book web page by clicking the download symbol at the top right of the page and selecting the <code>.ipynb</code> format.</li>
  <li>To submit this lab, answer all the questions and then upload the completed lab to Gradescope.
    <ul>
      <li>Before submitting, make sure you restart the kernel and rerun all cells (click the ▶▶ button).</li>
    </ul>
  </li>
  <li>Don't change any variable names that are given to you, don't move cells around, and don't include any code to install packages in the notebook.</li>
</ul>
</div>

## The Gapminder dataset

We will continue working with the Gapminder dataset in this lab,
so here we repeat the description of what each column contains
so that you can refer back to it throughout the lab.

| Column                | Description                                                                                  |
|-----------------------|----------------------------------------------------------------------------------------------|
| country               | Country name                                                                                 |
| year                  | Year of observation                                                                          |
| population            | Population in the country at each year                                                       |
| region                | Continent the country belongs to                                                             |
| sub_region            | Sub-region the country belongs to                                                            |
| income_group          | Income group [as specified by the World Bank in 2018]                                        |
| life_expectancy       | The mean number of years a newborn would <br>live if mortality patterns remained constant    |
| income                | GDP per capita (in USD) <em>adjusted <br>for differences in purchasing power</em>            |
| children_per_woman    | Average number of children born per woman                                                    |
| child_mortality       | Deaths of children under 5 years <br>of age per 1000 live births                             |
| pop_density           | Average number of people per km<sup>2</sup>                                                  |
| co2_per_capita        | CO2 emissions from fossil fuels (tonnes per capita)                                          |
| years_in_school_men   | Mean number of years in primary, secondary,<br>and tertiary school for 25-36 years old men   |
| years_in_school_women | Mean number of years in primary, secondary,<br>and tertiary school for 25-36 years old women |

[as specified by the World Bank in 2018]: https://datahelpdesk.worldbank.org/knowledgebase/articles/378833-how-are-the-income-group-thresholds-determined

In [None]:
# Run this cell to ensure that Altair plots show up on Gradescope
# We will talk more about what these lines do later in the course
import altair as alt

# Handle large data sets without embedding them in the notebook
alt.data_transformers.enable('data_server')
# Include an image for each plot since Gradescope only supports displaying plots as images
alt.renderers.enable('mimetype')

### Question 1

<div class="alert alert-info" style="color:black">

I have uploaded the <a href="https://raw.githubusercontent.com/UofTCoders/workshops-dc-py/master/data/processed/world-data-gapminder.csv">2018 Gapminder data to this URL</a>. Use <code>read_csv</code> from <code>pandas</code> to load the data directly from the URL and assign it to a suitable variable name. Set the <code>parse_dates</code> parameter to <code>['year']</code> to ensure that Altair recognizes this column as time data.

Filter the dataframe to only keep observations from a single year, 1982, and assign this to a new variable named `gm_1982`. Create a bar chart of this filtered dataframe that encodes the number of countries in each region on the `x` channel and is sorted by count, with the longest bar closest to the x-axis line. <a href="https://altair-viz.github.io/gallery/bar_chart_sorted.html">Here is an example of how to sort in Altair</a>.
</div>

_Points:_ 3
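A minimal sketch of the loading and filtering steps, using a small made-up CSV held in memory instead of the real Gapminder file (the three rows and their values are invented for illustration). It shows that `parse_dates` turns the `year` column into datetimes, so filtering on a single year can go through the `.dt.year` accessor.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Gapminder CSV: a few invented rows
csv_text = """country,year,region
Norway,1982,Europe
Norway,1987,Europe
Kenya,1982,Africa
"""

# parse_dates converts the 'year' strings (e.g. "1982") into datetime64 values
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=['year'])
print(toy.dtypes)

# Because 'year' now holds datetimes, compare on the extracted year number
toy_1982 = toy[toy['year'].dt.year == 1982]
print(toy_1982)
```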

In [None]:
import pandas as pd


url = 'https://raw.githubusercontent.com/UofTCoders/workshops-dc-py/master/data/processed/world-data-gapminder.csv'
# Read in the data using pandas
gm = ...

# Only keep data from the year 1982
gm_1982 = ...

bar_num_countries = ...

# Show the chart
bar_num_countries

In [None]:
grader.check("q1")

### Question 2

<div class="alert alert-info" style="color:black">

Filter the `gm_1982` dataframe to only keep observations from a single continent, "Europe", and assign this to a new variable named `gm_1982_europe`.

</div>

_Points:_ 1

In [None]:
gm_1982_europe = ...
gm_1982_europe 

In [None]:
grader.check("q2")

<!-- BEGIN QUESTION -->

### Question 3

<div class="alert alert-info" style="color:black">

Create a sorted bar chart of all the European nations
showing their life expectancy in 1982.
The countries should be spread out on the y-axis,
with the nation with the highest life expectancy closest to the x-axis.

</div>

_Points:_ 3

In [None]:
bar_life_exp = ...

# Show the chart
bar_life_exp

In [None]:
grader.check("q3")

<!-- END QUESTION -->

### Question 4

<div class="alert alert-info" style="color:black">

<ol type="1">
<li>Start with the entire Gapminder data set and filter it to include only the most recent year when <code>'co2_per_capita'</code> was measured (it is up to you how you find out which year this is).</li>
<li>Use the DataFrame method <code>nlargest</code> to select the top 40 countries in CO2 emissions per capita for that year.</li>
<li>Since we have only one value per country per year, let’s create a bar chart to visualize it. Encode CO2 per capita on the x-axis and the country on the y-axis.</li>
<li>Sort your bar chart so that the highest CO2 per capita is the closest to the x-axis (the bottom of the chart).</li>
<li>Finally, encode the income group with color. You can use the default color palette provided.</li>

</ol>
</div>

_Points:_ 3
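One possible approach to the first two steps, sketched on an invented six-row DataFrame in which the later year has no CO2 values (the data are made up to mimic that situation, and `top_2` stands in for the top 40). Here `year` holds plain integers to keep the example short; the same comparison also works on datetime values parsed with `parse_dates`.

```python
import pandas as pd

# Invented toy data; the real lab uses the full Gapminder frame
toy = pd.DataFrame({
    'country': ['A', 'B', 'C', 'A', 'B', 'C'],
    'year': [2015, 2015, 2015, 2016, 2016, 2016],
    'co2_per_capita': [1.0, 4.0, 2.5, None, None, None],
})

# One way to find the most recent year with any CO2 measurement:
# drop rows where the column is missing, then take the max year
latest_year = toy.dropna(subset=['co2_per_capita'])['year'].max()

# Keep only that year, then pick the n rows with the largest values
top_2 = toy[toy['year'] == latest_year].nlargest(2, 'co2_per_capita')
print(latest_year)
print(top_2)
```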

In [None]:
gm_40_largest_co2 = ...

bar_co2 = ...

# Show the chart
bar_co2

In [None]:
grader.check("q4")

### Question 5

<div class="alert alert-info" style="color:black">

<ol type="1">
<li>From the full Gapminder dataset, select only the observations from the years 1934, 1954, 1974, 1994, and 2014.</li>
<li>In addition to the CO2 per capita, the total population also matters for a country’s overall CO2 emissions. Compute a new column in the filtered data set called <code>'co2_total'</code> which contains the total CO2 emissions for each country in each year.</li>
<li>Plot this new column over time in a bar chart with time on the x-axis. Instead of plotting one bar for each country, plot one stacked bar for each year colored by the region. Each colored segment of a stacked bar should represent the sum of the CO2 emissions of all countries in that region.</li>
<li>Decide whether plotting the year as a temporal, nominal, or ordinal data type makes the most sense for this plot.</li>
</ol>
</div>

_Points:_ 3
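A minimal sketch of the data preparation on invented rows (countries `A` to `C` and all numbers are made up). It shows that when `year` holds datetimes, `isin` needs the integer year from `.dt.year`, and that calling `.copy()` is one common way to avoid the `SettingWithCopyWarning` mentioned in the code cell. The final `groupby` only makes visible the sums that Altair computes itself when the y channel uses an aggregate such as `'sum(co2_total)'`.

```python
import pandas as pd

# Invented toy rows; values are made up for illustration
toy = pd.DataFrame({
    'country': ['A', 'B', 'C', 'A', 'B', 'C'],
    'region': ['Europe', 'Europe', 'Asia', 'Europe', 'Europe', 'Asia'],
    'year': pd.to_datetime(['1934', '1934', '1934', '1935', '1935', '1935']),
    'population': [10, 20, 30, 10, 20, 30],
    'co2_per_capita': [1.0, 0.5, 2.0, 1.0, 0.5, 2.0],
})

# With datetime years, isin needs the integer year from the .dt accessor;
# .copy() makes an independent frame, so adding a column below does not
# trigger a SettingWithCopyWarning
toy_selected = toy[toy['year'].dt.year.isin([1934, 2014])].copy()

# Total emissions = per-capita emissions times population
toy_selected['co2_total'] = toy_selected['co2_per_capita'] * toy_selected['population']

# Each colored segment of a stacked bar corresponds to one (year, region) sum
per_region = toy_selected.groupby([toy_selected['year'].dt.year, 'region'])['co2_total'].sum()
print(per_region)
```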

In [None]:
# You might see a `SettingWithCopyWarning` here, which you can ignore for this exercise
gm_five_years = ...
gm_five_years['co2_total'] = ...

                   
bar_co2_total = ...

bar_co2_total

In [None]:
grader.check("q5")

### Question 6

<div class="alert alert-info" style="color:black">

<ol type="1">
<li>Create the same bar chart as in the previous question, but normalize the stacked bars so that it is easier to see how each continent's share of the emissions has changed over time.</li>
<li>Since the 1930s, which continents have increased their share of the world's CO2 emissions? Enter your answer as a Python list, e.g. <code>['Africa', 'Americas']</code>.</li>
</ol>
</div>

_Points:_ 1
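In Altair, normalized stacking is switched on through the `stack` parameter of the quantitative channel, for example `alt.Y('sum(co2_total)', stack='normalize')`. The sketch below, on invented per-year, per-region totals, shows the arithmetic such a chart displays: every segment divided by the total of its bar, so each bar sums to 1 (100%).

```python
import pandas as pd

# Invented per-year, per-region totals (same shape as a grouped sum)
totals = pd.DataFrame({
    'year': [1934, 1934, 2014, 2014],
    'region': ['Asia', 'Europe', 'Asia', 'Europe'],
    'co2_total': [20.0, 80.0, 300.0, 100.0],
})

# Divide each region's total by the total of its year; these shares are
# the segment heights of a normalized stacked bar
year_sum = totals.groupby('year')['co2_total'].transform('sum')
totals['share'] = totals['co2_total'] / year_sum
print(totals)
```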

In [None]:
bar_co2_total_norm = ...
...

# Show the chart
bar_co2_total_norm

In [None]:
# Enter your answer to part 2 of this question as a list, e.g. ['Africa', 'Americas']
increased_continents = ...

In [None]:
grader.check("q6")

<div class="alert alert-danger" style="color:black">
    
**Restart and run all cells before submitting**
    
Before submitting,
don't forget to run all cells in your notebook
to make sure there are no errors.
You can do this by clicking the ▶▶ button
or going to `Kernel -> Restart Kernel and Run All Cells...` in the menu.
This is not only important for this course,
but a good habit you should get into before ever committing a notebook to GitHub,
so that your collaborators can run it from top to bottom
without issues.
</div>