# DS Fellows Project: A 2016 Election Analysis

# By: Leron Reznikov

In this lab, we will analyze a dataset of polling data from the 2016 Presidential Election. If you follow politics, you may know that many polls strongly underestimated Donald Trump's performance in several key states, and he won the election even though many models predicted a Clinton victory. We will investigate whether these misses look like ordinary polling error or point to more serious miscalculations.

This data was taken from FiveThirtyEight, a respected organization that focuses on statistical and social analysis to predict election outcomes. You can find the raw source of the data, as well as more information about the dataset, [here](https://www.kaggle.com/fivethirtyeight/2016-election-polls).

Let's import the required libraries and load our dataset into a variable called original_data.

In [None]:
from datascience import *
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


```
BEGIN QUESTION
name: q0
manual: false
```


In [None]:
original_data = Table.read_table('presidential_polls.csv') # SOLUTION

Looking at the columns, there are many different features of this data. Luckily, we are only interested in a few of them:
* forecastdate: The date the forecast was uploaded to FiveThirtyEight
* adjpoll_clinton / adjpoll_trump: the adjusted/calculated percentage of people voting for each candidate. Because each poll surveys only a limited sample, it may over-represent some demographics - for example, 80% of people asked could have been members of Party A, when it is known that the true population proportion is 50%. Adjusting the average reduces this bias and gives a more accurate prediction.
* state: the state where the entry was polled
* grade: The grade that the pollster has on FiveThirtyEight, ranging from A to F with +/-
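
To make the adjustment idea concrete, here is a minimal sketch of reweighting a poll so that each group counts according to its population share rather than its sample share. The group names, support numbers, and the `weighted_support` helper are all made up for illustration; this is not necessarily how FiveThirtyEight computes the `adjpoll_*` columns.

```python
# Hypothetical raw poll: 80% of respondents belong to Party A, 20% to Party B.
sample_share = {"Party A": 0.80, "Party B": 0.20}
# Population shares: the 50% figure from the text; Party B is assumed to be the rest.
population_share = {"Party A": 0.50, "Party B": 0.50}
# Made-up support for one candidate within each group, in percent.
support = {"Party A": 90.0, "Party B": 10.0}


def weighted_support(shares, support_by_group):
    """Average the group-level support, weighting each group by its share."""
    return sum(shares[group] * support_by_group[group] for group in shares)


raw = weighted_support(sample_share, support)           # weights from the sample
adjusted = weighted_support(population_share, support)  # weights from the population
print(f"raw: {raw:.1f}%, adjusted: {adjusted:.1f}%")
```

With these invented numbers, the raw figure overstates the candidate's support because Party A is over-represented in the sample; reweighting pulls the estimate back toward the population mix.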

In [None]:
original_data.labels

## Question 1 - A General Focus on Wisconsin



### 1.1 - Load the data
Let's spend some time looking at data from Wisconsin - one of the more surprising results of the election. Let's load all polls that take place in Wisconsin into a variable called wisconsin.

```
BEGIN QUESTION
name: q1.1
manual: false
```


In [None]:
wisconsin = original_data.where('state', 'Wisconsin') # SOLUTION

wisconsin



### 1.2 - Plotting polling averages

There are clearly many polls that take place in Wisconsin - luckily, they are in order chronologically, so we don't have to worry about sorting them. However, we should probably visualize the results in a more meaningful way. Plot the adjpoll_clinton and adjpoll_trump columns and comment on what you notice.

```
BEGIN QUESTION
name: q1.2
manual: true
```


In [None]:
wisconsin.plot(select=['adjpoll_clinton', 'adjpoll_trump']) # SOLUTION NO PROMPT


### 1.3 - Array of difference between Clinton, Trump

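To further understand this relationship, create an array that contains the difference between the adjusted polling percentage for Clinton and the adjusted polling percentage for Trump in each poll. A positive value means Clinton leads in that poll; a negative value means Trump leads. Then, run the cell below to plot this relationship.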

```
BEGIN QUESTION
name: q1.3
manual: false
```


In [None]:
adj_diff_wisconsin = wisconsin['adjpoll_clinton'] - wisconsin['adjpoll_trump'] # SOLUTION


In [None]:
plt.title('Difference between Clinton Vote Percentage and Trump Vote Percentage')
plt.plot(adj_diff_wisconsin)
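
One way to turn the plot into numbers is to summarize the difference array. The sketch below uses a handful of made-up percentages (not values from the dataset) and plain Python lists; on the real array, `np.mean(adj_diff_wisconsin)` and `np.mean(adj_diff_wisconsin > 0)` should give the same two summaries.

```python
# Hypothetical adjusted poll percentages, one entry per poll (not real data).
clinton = [46.0, 44.5, 47.0, 45.0]
trump = [41.0, 45.5, 42.0, 43.0]

# Positive margin: Clinton leads in that poll; negative margin: Trump leads.
margins = [c - t for c, t in zip(clinton, trump)]

# Average lead across polls, in percentage points.
mean_margin = sum(margins) / len(margins)
# Fraction of polls in which Clinton is ahead.
clinton_lead_share = sum(m > 0 for m in margins) / len(margins)

print(margins, mean_margin, clinton_lead_share)
```

A large average lead that shows up in most polls suggests a consistent signal, but it says nothing about whether the polls share a common bias.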

### 1.4

Based on the above analysis, who do you expect to win the election? How confident are you in your analysis?

```
BEGIN QUESTION
name: q1.4
manual: true
```


**SOLUTION**

### Question 2.1 - A Look at high ranking pollsters

Trump won Wisconsin by just under 1% in 2016. This may be surprising! FiveThirtyEight also ranks pollsters, based on factors such as historical reliability and inherent bias. Let's work on selecting only the A+, A, and A- rated pollsters, to see if they had a better take on the result. Note: you will need to use the [are.contained_in](http://data8.org/datascience/predicates.html) predicate from the datascience library. Then, plot the adjusted averages again.

```
BEGIN QUESTION
name: q2.1
manual: true
```


In [None]:
high_rankings = wisconsin.where('grade', are.contained_in(['A+', 'A', 'A-'])) # SOLUTION

high_rankings.plot(select=['adjpoll_clinton', 'adjpoll_trump']) # SOLUTION
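
To compare the two groups of pollsters with a number rather than by eye, one option is to measure how far each group's average margin sits from the actual result. The sketch below uses invented polls and a hypothetical `mean_error` helper; `actual_margin` is an approximate stand-in for the "just under 1%" Trump win mentioned above, so look up the certified result before relying on it.

```python
# Hypothetical polls: (pollster grade, Clinton minus Trump margin in points). Not real data.
polls = [("A+", 6.0), ("B", 7.5), ("A-", 4.0), ("C+", 8.0), ("A", 5.0)]
top_grades = {"A+", "A", "A-"}

# Approximate final Wisconsin margin (Clinton minus Trump); an assumed value
# standing in for the "just under 1%" Trump win quoted in the text.
actual_margin = -0.8


def mean_error(margins, actual):
    """Average of (polled margin - actual margin); positive means Clinton was overestimated."""
    return sum(m - actual for m in margins) / len(margins)


all_margins = [margin for _, margin in polls]
top_margins = [margin for grade, margin in polls if grade in top_grades]

error_all = mean_error(all_margins, actual_margin)
error_top = mean_error(top_margins, actual_margin)
print(f"all pollsters: {error_all:+.1f}, top-graded only: {error_top:+.1f}")
```

In the notebook, the same comparison could be made with the difference arrays computed from `wisconsin` and `high_rankings`.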

### Question 2.2

Do you notice a significant difference depending on the rankings? Did they do a better job at predicting the true outcome of the election?

```
BEGIN QUESTION
name: q2.2
manual: true
```


**SOLUTION**


# Question 3: Open Ended Analysis

1. Pick one of the following states: Pennsylvania, Ohio, Florida, Michigan
2. Perform an analysis, similar to what we did for Wisconsin, on the state. Then, look up and mark the actual result of the state.
3. Answer the following questions in some depth (feel free to use the resources below as a starting point for your research): 
  - What are some possible explanations for the discrepancies between the polls and the results of the election?
  - To what extent should we be trusting the polls' predictions?

```
BEGIN QUESTION
name: q3
manual: true
```


In [None]:
# Code here - feel free to add as many cells as you need

# Interested in learning more?

The subject of elections and polling has only gotten more relevant since 2016. Here are some resources to help you dive deeper into the topic!

[Why 2016 Election Polls Missed Their Mark](https://www.pewresearch.org/fact-tank/2016/11/09/why-2016-election-polls-missed-their-mark/): a brief but informative article from Pew Research

FiveThirtyEight issued a [rebuttal](https://fivethirtyeight.com/features/the-polls-are-all-right/) defending their record, including the 2016 election. In fact, their predictions in 2018 would go on to be extremely accurate. Feel free to follow the hyperlinks throughout the article, as many lead to interesting articles and papers.


Finally, if you're interested in a more academic approach, see [this paper](https://eprints.soton.ac.uk/413658/1/JenningsWlezienPollingErrors.pdf), which conducted an analysis of thousands of polls over the last 60+ years and came to some interesting conclusions.
