# Lab 01: CSS 120

In this self-paced lab, you'll get hands-on practice with the following **concepts**:

- Opening a data file
- Performing exploratory data analysis

### Note on hidden tests

Some of the questions in this lab will have **hidden tests**. This means that in addition to the `assert` statements you see in the code, there are *hidden* tests. The point is to check whether your code **generalizes** to other cases.

The lab will notify you when/where there is a hidden test, even if you can't see it.
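
To make "generalizes" concrete, here is a minimal sketch. Both function names and both test inputs are hypothetical and are not part of the lab's grader. One function is hard-coded to satisfy a visible test; the other computes the answer from its input.

```python
def mean_hardcoded(values):
    # Returns the right answer for [1, 2, 3] only; it ignores its input.
    return 2.0


def mean_general(values):
    # Computes the mean from whatever values are passed in.
    return sum(values) / len(values)


visible_case = [1, 2, 3]   # stands in for a visible assert
hidden_case = [10, 20]     # stands in for a hidden test

# Both versions pass the visible check...
assert mean_hardcoded(visible_case) == 2.0
assert mean_general(visible_case) == 2.0

# ...but only the general version also passes the hidden one.
passes_hidden_hardcoded = mean_hardcoded(hidden_case) == 15.0
passes_hidden_general = mean_general(hidden_case) == 15.0
```

In practice, this means computing each answer from `dat` rather than typing in a number you read off the screen.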

### Motivation

One issue with climate change mitigation policies is that people often support them in principle, but not when the projects are built close to their homes. For instance, imagine that SDGE plans to build a large solar farm near your home, or on the mountain where you love to hike. Your support for the project might drop quickly.

In this lab, we will investigate a dataset of wind turbine proposals and the governing party's vote share in Canada.

## Question 01

Load pandas and numpy using their common aliases.

In [1]:
### BEGIN SOLUTION
import pandas as pd
import numpy as np
### END SOLUTION

In [2]:
assert pd and np

## Question 02

The dataset is available at <https://raw.githubusercontent.com/umbertomig/CSS120/main/stokes_electoral_2015.csv>. Load it with pandas and save it as `dat`.

In [3]:
### BEGIN SOLUTION
dat = pd.read_csv("https://raw.githubusercontent.com/umbertomig/CSS120/main/stokes_electoral_2015.csv")
### END SOLUTION

In [4]:
assert dat.shape[0] == 1500 and isinstance(dat, pd.DataFrame)
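
Once a dataset is loaded, a first exploratory pass usually looks at its shape, its first rows, summary statistics, and missing values. The sketch below runs on a small made-up DataFrame so that it works offline: the column names are borrowed from this lab, but every value in `toy` is invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: column names from the lab, values invented.
toy = pd.DataFrame({
    "precinct": [1, 1, 2, 2],
    "elecyear": [2003, 2007, 2003, 2007],
    "propturbine": [0, 1, 0, 0],
    "perc_lib": [0.40, 0.30, 0.50, 0.45],
})

n_rows, n_cols = toy.shape   # number of rows and columns
first_rows = toy.head(2)     # first two rows
summary = toy.describe()     # count, mean, std, min, quartiles, max
missing = toy.isna().sum()   # missing values per column
```

The same calls can be applied to `dat` to get a feel for the real data before answering the questions.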

## Question 03

What is the average vote share of the Liberal Party (the ruling party at the time)?

1. Save the results as `avg_libvs`.
2. The `perc_lib` variable holds this information.

In [5]:
avg_libvs = ...

In [6]:
### BEGIN SOLUTION
avg_libvs = dat.perc_lib.mean()
### END SOLUTION

In [7]:
assert float(str(avg_libvs)[:4]) < 1/3 and float(str(avg_libvs)[:5]) > 1/3

## Question 04

How many turbine projects were proposed?

1. Save the results as `nprojs`.
2. The `propturbine` variable holds this information.

In [8]:
nprojs = ...

In [9]:
### BEGIN SOLUTION
nprojs = dat.propturbine.sum()
### END SOLUTION

In [10]:
assert nprojs % 42 == nprojs // 42

## Question 05

How many projects were proposed in 2003?

1. Save the results as `nprojs2003`.
2. The `propturbine` and `elecyear` variables hold this information.

In [11]:
nprojs2003 = ...

In [12]:
### BEGIN SOLUTION
nprojs2003 = dat.loc[dat.elecyear == 2003].propturbine.sum()
### END SOLUTION

In [13]:
assert not nprojs2003 % 2

## Question 06

How many projects were proposed in 2007?

1. Save the results as `nprojs2007`.
2. The `propturbine` and `elecyear` variables hold this information.

In [14]:
nprojs2007 = ...

In [15]:
### BEGIN SOLUTION
nprojs2007 = dat.loc[dat.elecyear == 2007].propturbine.sum()
### END SOLUTION

In [16]:
assert (nprojs2007 % 7) / (nprojs2007 // 3) <= 4/3

## Question 07

How many projects were proposed in 2011?

1. Save the results as `nprojs2011`.
2. The `propturbine` and `elecyear` variables hold this information.

In [17]:
nprojs2011 = ...

In [18]:
### BEGIN SOLUTION
nprojs2011 = dat.loc[dat.elecyear == 2011].propturbine.sum()
### END SOLUTION

In [19]:
assert not nprojs2011 % 2 and nprojs2011 > nprojs2007
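
Questions 05 to 07 repeat the same filter for three different years. As an alternative sketch, a single `groupby` can produce all the yearly counts at once. The data below is a made-up toy frame, not the lab dataset, and `props_by_year` is a name chosen only for this illustration.

```python
import pandas as pd

# Hypothetical toy data: two rows per election year, values invented.
toy = pd.DataFrame({
    "elecyear": [2003, 2003, 2007, 2007, 2011, 2011],
    "propturbine": [0, 0, 1, 0, 1, 1],
})

# Sum the proposal indicator within each election year.
props_by_year = toy.groupby("elecyear")["propturbine"].sum()
```

The result is a Series indexed by year, so `props_by_year.loc[2007]` returns the count for 2007 in the toy data.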

## Question 08

What was the average Liberal vote share in the places with turbine proposals?

1. Save the results as `vs_prop`.
2. The `propturbine` and `perc_lib` variables hold this information.

In [20]:
vs_prop = ...

In [21]:
### BEGIN SOLUTION
# Compare with 1 so the mask is boolean whether the column stores 0/1 or True/False
vs_prop = dat.loc[dat.propturbine == 1].perc_lib.mean()
### END SOLUTION

In [22]:
assert vs_prop / avg_libvs < 1
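
The comparison in this question relies on boolean masking: select the rows where a condition holds, then average a column over those rows. A minimal sketch on invented data follows (the values in `toy` are hypothetical). Comparing with `== 1` gives a boolean mask whether the indicator is stored as 0/1 integers or as booleans.

```python
import pandas as pd

# Hypothetical toy data: values invented for illustration.
toy = pd.DataFrame({
    "propturbine": [0, 1, 0, 1],
    "perc_lib": [0.5, 0.3, 0.4, 0.2],
})

has_proposal = toy["propturbine"] == 1                  # boolean mask
mean_with = toy.loc[has_proposal, "perc_lib"].mean()    # mean over masked rows
mean_all = toy["perc_lib"].mean()                       # mean over all rows
ratio = mean_with / mean_all                            # below 1 if masked rows are lower
```

Passing a column of 0/1 integers directly to `.loc` would be read as row labels rather than as a mask, which is why the explicit comparison is the safer habit.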

## Question 09

We might think that places with wind turbine proposals later had a lower vote share because of the turbines. However, it could be that the government proposed wind turbines in places where voters already disliked the government. Is that true?

To answer that, look at the 2003 vote share in places that were later assigned wind turbines versus places that were not assigned wind turbines.

1. Save the results as `mean_diff_vshare_before`.
2. The `propturbine`, `elecyear`, `precinct`, and `perc_lib` variables hold this information.

In [23]:
mean_diff_vshare_before = ...

In [24]:
### BEGIN SOLUTION
dat2003 = dat.loc[dat.elecyear == 2003]
prop = dat.loc[dat.propturbine == 1].precinct  # precincts with a proposal in some year
noprop = dat.loc[dat.propturbine == 0].precinct  # precincts with a no-proposal row in some year
mean_diff_vshare_before = (
    dat2003.loc[dat2003.precinct.isin(prop)].perc_lib.mean()
    - dat2003.loc[dat2003.precinct.isin(noprop)].perc_lib.mean()
)
### END SOLUTION

In [25]:
assert abs(mean_diff_vshare_before) > 1/29 and abs(mean_diff_vshare_before) < 1/28
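
A before-treatment comparison like this one can also be sketched by flagging each precinct as "ever proposed" and then comparing group means in the baseline year. This is a variation on invented data, under one assumption that differs from the lab: the comparison group is defined as precincts that never had a proposal. Because of that, it may not reproduce the number the lab's assert expects. The `ever_proposed` column and all values in `toy` are hypothetical.

```python
import pandas as pd

# Hypothetical toy panel: three precincts observed in two elections.
toy = pd.DataFrame({
    "precinct": [1, 1, 2, 2, 3, 3],
    "elecyear": [2003, 2007, 2003, 2007, 2003, 2007],
    "propturbine": [0, 1, 0, 0, 0, 1],
    "perc_lib": [0.40, 0.30, 0.50, 0.45, 0.44, 0.35],
})

# Flag every row of a precinct with 1 if that precinct had a proposal in any year.
toy["ever_proposed"] = toy.groupby("precinct")["propturbine"].transform("max")

# Keep only the baseline election, then compare group means.
base = toy.loc[toy["elecyear"] == 2003]
means = base.groupby("ever_proposed")["perc_lib"].mean()
diff = means.loc[1] - means.loc[0]
```

In this toy example the precincts that later received proposals start out with a lower baseline vote share, so `diff` is negative; the real data may of course point the other way.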

## Question 10

True or False:

1. The vote share in places with turbine proposals is smaller than the overall vote share (0.25pt)
1. There were more turbine projects in 2003 than in 2007 (0.25pt)
1. There were more turbine projects in 2007 than in 2011 (0.25pt)
1. Looking at 2003 vote shares, places that later had turbine proposals had a higher average vote share than places without turbine proposals (0.25pt)

Hint: To answer, set each `q10_i` to `True` or `False`, where `i` is the statement number.

In [26]:
q10_1 = ...
q10_2 = ...
q10_3 = ...
q10_4 = ...

In [27]:
### BEGIN SOLUTION
q10_1 = True
q10_2 = False
q10_3 = False
q10_4 = True
### END SOLUTION

In [28]:
assert q10_1 or q10_2

In [29]:
assert q10_3 or q10_4

In [30]:
assert q10_1 == q10_4 or q10_2 == q10_3

In [31]:
assert q10_1 + q10_2 == q10_4 + q10_3

### Great work!