## Exercise

- In this exercise, we work on a classification task: predicting the vote in the Brexit referendum
- The data originally come from the British Election Study Online Panel
  - codebook: https://www.britishelectionstudy.com/wp-content/uploads/2020/05/Bes_wave19Documentation_V2.pdf
- The outcome is `LeaveVote` (1: Leave, 0: otherwise)
- The input variables come from the following article:
  - Hobolt, Sara (2016) The Brexit vote: a divided nation, a divided continent. _Journal of European Public Policy_, 23 (9) (https://doi.org/10.1080/13501763.2016.1225785)

In [None]:
!wget https://www.dropbox.com/s/up1zpkozgscaty1/brexit_bes_sampled_data.csv

## Import packages

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

## Load data

In [None]:
df_bes = pd.read_csv("brexit_bes_sampled_data.csv")

## Model

- There are four models in the article. We will use the identity model (Model 2 in Table 2)
- List of input variables:
  `gender`, `age`, `edlevel`, `hhincome`, `EuropeanIdentity`, `EnglishIdentity`, `BritishIdentity`

In [None]:
df_bes_sub = df_bes[['gender', 'age', 'edlevel', 'hhincome', 'EuropeanIdentity', 'EnglishIdentity', 'BritishIdentity', 'LeaveVote']]

## Train-test split

In [None]:
from sklearn.model_selection import train_test_split
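
The scaling step further down expects `X_train` and `X_test` to exist already. One possible way to create them is sketched below. The data frame `df_demo` is a hypothetical random stand-in with the same column names as `df_bes_sub`, and the 25% test share, the `random_state` value, and the stratified split are assumptions rather than choices fixed by the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df_bes_sub: same column names, random values.
rng = np.random.default_rng(0)
cols = ['gender', 'age', 'edlevel', 'hhincome',
        'EuropeanIdentity', 'EnglishIdentity', 'BritishIdentity']
df_demo = pd.DataFrame(rng.normal(size=(200, len(cols))), columns=cols)
df_demo['LeaveVote'] = np.tile([0, 1], 100)  # balanced fake outcome

# Separate the inputs from the outcome.
X = df_demo.drop(columns='LeaveVote')
y = df_demo['LeaveVote']

# Hold out 25% of the rows; stratify keeps the Leave share similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)
```

In the notebook itself, the same call would take `df_bes_sub` in place of `df_demo`.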

## Data wrangling

In [None]:
from sklearn.preprocessing import StandardScaler
st_scaler = StandardScaler()

In [None]:
X_train = st_scaler.fit_transform(X_train)
X_test = st_scaler.transform(X_test)

In [None]:
X_test[:3]

## Fit logistic model

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
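
As a rough illustration of how the two imports above fit together, the sketch below fits a logistic regression and evaluates it on held-out data. It runs on synthetic data from `make_classification` (an assumption, so that the block is self-contained), and all names ending in `_demo` are hypothetical. The default `LogisticRegression` settings are used, not necessarily the specification in the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic binary data with 7 inputs, standing in for the BES variables.
X_demo, y_demo = make_classification(n_samples=300, n_features=7, random_state=0)
Xtr_demo, Xte_demo, ytr_demo, yte_demo = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo)

# Fit the scaler on the training part only, then apply it to both parts.
scaler_demo = StandardScaler()
Xtr_demo = scaler_demo.fit_transform(Xtr_demo)
Xte_demo = scaler_demo.transform(Xte_demo)

# Fit on the training data and predict the held-out data.
logit_demo = LogisticRegression()
logit_demo.fit(Xtr_demo, ytr_demo)
pred_demo = logit_demo.predict(Xte_demo)

# Rows of the confusion matrix are true classes, columns are predicted classes.
cm_demo = confusion_matrix(yte_demo, pred_demo)
print(cm_demo)
print(classification_report(yte_demo, pred_demo))
```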

## KNN classifier

In [None]:
from sklearn.neighbors import KNeighborsClassifier

### Parameter tuning for KNN



In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score, make_scorer
f1 = make_scorer(f1_score, average = 'binary', pos_label = 1)
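
A minimal sketch of how a scorer like the one above can drive a grid search over the number of neighbours. The candidate values in `param_grid_demo`, the 5-fold cross-validation, and the synthetic data are all assumptions for illustration; names ending in `_demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data with 7 inputs, standing in for the scaled training set.
X_demo, y_demo = make_classification(n_samples=300, n_features=7, random_state=0)

# F1 score for the positive class (label 1), as in the cell above.
f1_demo = make_scorer(f1_score, average='binary', pos_label=1)

# Candidate values for k; odd numbers avoid ties in binary voting.
param_grid_demo = {'n_neighbors': [1, 3, 5, 7, 9, 11]}

# 5-fold cross-validated search, scored by F1.
grid_demo = GridSearchCV(KNeighborsClassifier(), param_grid_demo,
                         scoring=f1_demo, cv=5)
grid_demo.fit(X_demo, y_demo)

print(grid_demo.best_params_, grid_demo.best_score_)

# With the default refit=True, best_estimator_ is refit on all the data passed to fit.
knn_best_demo = grid_demo.best_estimator_
```

The refit `best_estimator_` can then serve as the final model and be evaluated on the test set.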

### Final model

## Support Vector Classifier

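- We try a support vector classifier (SVC) here
- With scikit-learn's default RBF kernel, this is a non-linear classifier, usually described as non-parametric
- Much more flexible than logistic regression
- For more information, see James et al., _An Introduction to Statistical Learning_, Chapter 9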



In [None]:
from sklearn.svm import SVC
svcmod = SVC(gamma='auto')
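
One way the classifier defined above might be fitted and evaluated is sketched here on synthetic data (an assumption, so that the block runs on its own). Names ending in `_demo` are hypothetical. Scaling is kept in the sketch because the RBF kernel is sensitive to the scale of the inputs.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary data with 7 inputs, standing in for the BES variables.
X_demo, y_demo = make_classification(n_samples=300, n_features=7, random_state=0)
Xtr_demo, Xte_demo, ytr_demo, yte_demo = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo)

# Standardize using statistics from the training part only.
scaler_demo = StandardScaler()
Xtr_demo = scaler_demo.fit_transform(Xtr_demo)
Xte_demo = scaler_demo.transform(Xte_demo)

# Default kernel is RBF; gamma='auto' sets gamma to 1 / n_features.
svc_demo = SVC(gamma='auto')
svc_demo.fit(Xtr_demo, ytr_demo)

pred_svc_demo = svc_demo.predict(Xte_demo)
print(classification_report(yte_demo, pred_svc_demo))
```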