
In [2]:
%load_ext rpy2.ipython

# Exercise 1: Election Markets and Ballot Ordering

## Exercise 1

This problem is based on Rothschild, D. 2009. "[Forecasting Elections:
Comparing Prediction Markets, Polls, and Their
Biases.](http://dx.doi.org/10.1093/poq/nfp082)" *Public Opinion
Quarterly* 73(5): 895--916.

In this problem, we will analyze the Intrade betting market data from
the 2008 presidential election.

The 2008 Intrade data are available as `intrade08.csv`. The variable
names and descriptions in this data set are:

- `day`: date of the trading session
- `statename`: full state name
- `state`: state abbreviation
- `PriceD`: closing price (0 to 100) of the contract on the Democratic nominee winning the state
- `PriceR`: closing price (0 to 100) of the contract on the Republican nominee winning the state
- `VolumeD`: total trades in the session for the Democratic nominee's contract
- `VolumeR`: total trades in the session for the Republican nominee's contract

Each row of the data set holds one day of trading in one state, with information on the contracts for both the Democratic and the Republican nominee’s victory in that state.

The 2008 election results data are available as `pres08.csv`. The variables in the data set are:

- `state.name`: full state name
- `state`: state abbreviation
- `Obama`: Obama actual vote share
- `McCain`: McCain actual vote share 
- `EV`: number of electoral votes in the state

We analyze the contract on the Democratic Party nominee winning a given state $j$. The data set contains the market's contract price for each state on each day $i$ leading up to the election. We will interpret `PriceD` as the probability $p_{ij}$ that the Democrat would win state $j$ if the election were held on day $i$. To treat `PriceD` as a probability, divide it by 100 so that it ranges from 0 to 1. Our research question is: how accurate are these probabilities?
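The conversion from `PriceD` to $p_{ij}$ can be sketched in plain Python as below. This is a minimal illustration, not part of the exercise's provided code: the helper name `price_to_prob` and the toy rows are hypothetical, and the rows mimic what `csv.DictReader` returns (every value a string), using the `day`, `state`, and `PriceD` columns described above.

```python
import csv
import io

def price_to_prob(rows):
    """Map (day, state) -> p_ij = PriceD / 100 for each trading-session row.

    Hypothetical helper: expects dict rows with 'day', 'state', and 'PriceD'
    keys, as produced by csv.DictReader (so PriceD arrives as a string).
    """
    probs = {}
    for r in rows:
        price = float(r["PriceD"])
        # Prices are on a 0-100 scale; anything outside it signals bad data.
        if not 0 <= price <= 100:
            raise ValueError(f"PriceD out of range on {r['day']} in {r['state']}: {price}")
        probs[(r["day"], r["state"])] = price / 100
    return probs

# Toy CSV text with hypothetical values, standing in for intrade08.csv.
toy_csv = "day,state,PriceD\n2008-11-03,AL,6\n2008-11-03,AK,25\n"
p = price_to_prob(csv.DictReader(io.StringIO(toy_csv)))
print(p[("2008-11-03", "AK")])  # 0.25
```

Keying the result by `(day, state)` mirrors the two indices in $p_{ij}$, so selecting a single day later is a simple filter on the first element of the key.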

To assess how accurate these probabilities are, use only each state's data from the day before Election Day (that is, November 3, 2008, since Election Day was November 4, 2008) to compute the expected number of electoral votes Obama is predicted to win, and compare it with the actual number of electoral votes Obama won.
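Under the probability interpretation, the two quantities to compare can be written out as follows. Here $i^{*}$ is shorthand introduced for this illustration (not notation from the original exercise) for November 3, 2008; $\mathrm{EV}_j$ is state $j$'s `EV` value; and $\mathrm{Obama}_j$, $\mathrm{McCain}_j$ are its actual vote shares:

$$
\mathbb{E}\left[\mathrm{EV}^{\mathrm{Obama}}\right] = \sum_{j} p_{i^{*}j}\,\mathrm{EV}_j,
\qquad
\mathrm{EV}^{\mathrm{Obama}}_{\mathrm{statewide}} = \sum_{j} \mathbf{1}\{\mathrm{Obama}_j > \mathrm{McCain}_j\}\,\mathrm{EV}_j .
$$

Each state contributes $\mathrm{EV}_j$ with probability $p_{i^{*}j}$ and $0$ otherwise, so its expected contribution is $p_{i^{*}j}\,\mathrm{EV}_j$; by linearity of expectation these contributions add across states whether or not the state outcomes are independent. The statewide sum treats every state as winner-take-all, so it misses split electoral votes such as Nebraska's.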

Briefly interpret the result.

(Note: Obama actually won 365 electoral votes, not the 364 you get by summing the electoral votes of the states he won in the results data. The total of 365 includes a single electoral vote that Obama won in Nebraska’s 2nd Congressional District; McCain won Nebraska’s other four electoral votes because he carried the state overall. You will need R's `merge()` function to combine the two datasets.)
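The note's arithmetic can be sketched as follows: sum `EV` over the states where Obama's vote share exceeds McCain's, then add the one Nebraska 2nd District vote that statewide results cannot capture. The helper `actual_obama_ev`, the constant `NE2_ADJUSTMENT`, and the rows below are hypothetical stand-ins for `pres08.csv`, not real results.

```python
# Hypothetical sketch: actual Obama electoral votes from statewide results.
NE2_ADJUSTMENT = 1  # the single EV from Nebraska's 2nd Congressional District

def actual_obama_ev(results, adjustment=NE2_ADJUSTMENT):
    """Sum EV over states Obama carried, plus a split-vote adjustment."""
    statewide = sum(r["EV"] for r in results if r["Obama"] > r["McCain"])
    return statewide + adjustment

# Toy rows with made-up states and numbers, shaped like pres08.csv.
toy_results = [
    {"state": "State A", "Obama": 55, "McCain": 44, "EV": 10},
    {"state": "State B", "Obama": 40, "McCain": 58, "EV": 5},
    {"state": "State C", "Obama": 51, "McCain": 48, "EV": 3},
]
print(actual_obama_ev(toy_results))  # 10 + 3 + 1 = 14
```

Because the comparison is strict, a tie in the rounded vote shares counts as a state Obama did not win; any such state is worth checking against the official outcome.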

In [3]:
%%R
# (Uncomment the following line as needed for a more interactive coding feel; see us with questions.)
#setwd("location/of/intrade-prob.Rmd/file")
## Load prediction market data
intrade08 <- read.csv("/content/intrade08.csv") 
## Load ground truth data containing EV info
pres08 <- read.csv("/content/pres08.csv")
## ...
# expected_total <- 300.3434 # Defining this as so for illustration purposes!

In [4]:
%%R
# Inspect the datasets: the state-name column is `statename` in intrade08 but `state.name` in pres08
#head(intrade08,1)
#head(pres08,1)
names(pres08)[names(pres08) == 'state.name'] <- 'statename'

In [5]:
%%R
# Merge the two datasets by state name (an inner join, R's default)
library(dplyr)
df <- merge(x = intrade08, y = pres08, by = "statename")
head(df,3)

  statename     X        day
1   Alabama  8772 2007-05-04
2   Alabama  3366 2007-01-18
3   Alabama 20145 2007-12-13
                                                                             MarketD
1 Democratic Party Nominee to win Alabama's Electoral College Votes in 2008 Election
2 Democratic Party Nominee to win Alabama's Electoral College Votes in 2008 Election
3 Democratic Party Nominee to win Alabama's Electoral College Votes in 2008 Election
  PriceD VolumeD
1      6       0
2      8       0
3      6       0
                                                                             MarketR
1 Republican Party Nominee to win Alabama's Electoral College Votes in 2008 Election
2 Republican Party Nominee to win Alabama's Electoral College Votes in 2008 Election
3 Republican Party Nominee to win Alabama's Electoral College Votes in 2008 Election
  PriceR VolumeR state.x state.y Obama McCain EV
1      0       0      AL      AL    39     60  9
2     90       0      AL      AL    39
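Before merging, it can help to check which state names appear in only one of the two files: an inner join (R's `merge()` default) silently drops such rows, while an outer join keeps them with missing values. A minimal stdlib sketch; the helper name `key_mismatches` and the toy name lists are hypothetical.

```python
def key_mismatches(left_keys, right_keys):
    """Return (keys only in left, keys only in right), each sorted."""
    left, right = set(left_keys), set(right_keys)
    return sorted(left - right), sorted(right - left)

# Toy key lists; "Example State" is a made-up unmatched name.
only_intrade, only_pres = key_mismatches(
    ["Alabama", "Alaska", "Alabama"],
    ["Alabama", "Alaska", "Example State"],
)
print(only_intrade)  # []
print(only_pres)     # ['Example State']
```

In the pandas cells, the same helper could be called on `intrade08["statename"]` and `pres08["statename"]` after the rename, since a pandas Series is iterable.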

In [21]:
import numpy as np
import pandas as pd
pd.set_option('display.max_colwidth', 0)
intrade08 = pd.read_csv('/content/intrade08.csv')
pres08 = pd.read_csv("/content/pres08.csv")

In [33]:
intrade08.head(2)

Unnamed: 0.1,Unnamed: 0,day,statename,MarketD,PriceD,VolumeD,MarketR,PriceR,VolumeR,state
0,1,2006-11-12,Alabama,Democratic Party Nominee to win Alabama's Electoral College Votes in 2008 Election,40.0,0,Republican Party Nominee to win Alabama's Electoral College Votes in 2008 Election,40.0,0,AL
1,2,2006-11-12,Alaska,Democratic Party Nominee to win Alaska's Electoral College Votes in 2008 Election,40.0,0,Republican Party Nominee to win Alaska's Electoral College Votes in 2008 Election,40.0,0,AK


In [23]:
# Rename the column to match intrade08; rename() does an exact match, unlike the regex-based str.replace
pres08 = pres08.rename(columns={'state.name': 'statename'})


In [26]:
pres08.head(2)

Unnamed: 0,statename,state,Obama,McCain,EV
0,Alabama,AL,39,60,9
1,Alaska,AK,38,59,3


In [34]:
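# Outer join on state name: states found in only one file are kept, with NaN filling the other file's columns
fulldata = pd.merge(intrade08, pres08, how="outer", on=["statename"])
fulldata.head(3)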

Unnamed: 0.1,Unnamed: 0,day,statename,MarketD,PriceD,VolumeD,MarketR,PriceR,VolumeR,state_x,state_y,Obama,McCain,EV
0,1.0,2006-11-12,Alabama,Democratic Party Nominee to win Alabama's Electoral College Votes in 2008 Election,40.0,0.0,Republican Party Nominee to win Alabama's Electoral College Votes in 2008 Election,40.0,0.0,AL,AL,39.0,60.0,9.0
1,51.0,2006-11-13,Alabama,Democratic Party Nominee to win Alabama's Electoral College Votes in 2008 Election,40.0,0.0,Republican Party Nominee to win Alabama's Electoral College Votes in 2008 Election,40.0,0.0,AL,AL,39.0,60.0,9.0
2,102.0,2006-11-14,Alabama,Democratic Party Nominee to win Alabama's Electoral College Votes in 2008 Election,40.0,0.0,Republican Party Nominee to win Alabama's Electoral College Votes in 2008 Election,40.0,0.0,AL,AL,39.0,60.0,9.0
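The remaining step, restricting to November 3, 2008 and summing $p_{ij}\,\mathrm{EV}_j$, can be sketched in plain Python over merged rows (for example, the dictionaries returned by `fulldata.to_dict("records")`). This is a hypothetical sketch, not the exercise's official solution: the function name, the `ELECTION_EVE` constant, and the toy rows are illustrative assumptions.

```python
# Hypothetical sketch of the expected-EV calculation; not the official solution.
ELECTION_EVE = "2008-11-03"  # day before Election Day (November 4, 2008)

def expected_obama_ev(merged_rows, day=ELECTION_EVE):
    """Return the sum over states of (PriceD / 100) * EV on the given day.

    Expects at most one merged row per state per day, with 'day',
    'statename', 'PriceD', and 'EV' keys.
    """
    total = 0.0
    seen = set()
    for r in merged_rows:
        if r["day"] != day:
            continue  # keep only the chosen trading session
        if r["statename"] in seen:
            raise ValueError(f"duplicate rows for {r['statename']} on {day}")
        seen.add(r["statename"])
        total += (r["PriceD"] / 100) * r["EV"]
    return total

# Hypothetical merged rows; only the 2008-11-03 rows are used.
toy_rows = [
    {"day": "2008-11-03", "statename": "State A", "PriceD": 90.0, "EV": 10.0},
    {"day": "2008-11-02", "statename": "State A", "PriceD": 80.0, "EV": 10.0},
    {"day": "2008-11-03", "statename": "State B", "PriceD": 20.0, "EV": 5.0},
]
print(round(expected_obama_ev(toy_rows), 6))  # 0.9*10 + 0.2*5 = 10.0
```

On the real data, the result is compared with Obama's 365 actual electoral votes. Two caveats with the outer-merged table: a state with no trading row on that day contributes nothing, and a state with trading data but no results row carries `NaN` for `EV`, which would make the total `NaN`.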
