## Train a linear SVM

You are working for an online real estate marketplace. Each listing includes a "walkability" score that estimates how walkable the address is. Until now, a human expert assigned this score manually for each address. You are going to train a model to predict whether the human expert would have scored an address as "low walkability" (0) or "high walkability" (1).

To train your model, you have a dataset of already-scored properties. Each sample includes the following columns:

- Street address of the property
- Walkability score (assigned by a human expert, 0 or 1)
- Number of parks nearby (within two miles)
- Number of grocery stores nearby (within two miles)
- Number of schools nearby (within two miles)
- Number of public transit lines nearby (within two miles)

In the attached workspace, you will read this data from a file, and split it into training and test sets. Then, you will fit an `SVC` (using the `sklearn` implementation) on the training set, and evaluate its accuracy in predicting the walkability class on the test set.

You'll need to specify this random state in your notebook:

> random_state = 5

The following items will be graded:

| Name | Type | Description |
| ---- | ---- | ---- |
|`Xtr`	|pandas dataframe	|Training data - features used as input to model.|
|`Xts`	|pandas dataframe	|Test data - features used as input to model.|
|`ytr`	|pandas series OR pandas data frame OR 1d numpy array	|Training data - target variable.|
|`yts`	|pandas series OR pandas data frame OR 1d numpy array	|Test data - target variable.|
|`yts_hat`	|1d numpy array	|Model prediction for test data.|
|`acc`	|float	|Accuracy of model on test data.|

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

First, we'll load the dataset:

In [2]:
df = pd.read_csv('data.csv')

You can add some code here to inspect the data, see the names of features, and see the data types - the cell below will not be graded.

In [3]:
df.head()

| Unnamed: 0 | Address | Walkability_Score | Parks_Nearby | Grocery_Stores_Nearby | Schools_Nearby | Public_Transit_Nearby |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| 0 | 939 Maple Ave, Bloomington, TX | 0 | 2 | 2 | 1 | 2 |
| 1 | 986 Cedar Blvd, Greenville, CA | 0 | 2 | 2 | 1 | 1 |
| 2 | 795 Cedar Blvd, Madison, NC | 1 | 1 | 3 | 3 | 2 |
| 3 | 742 Main St, Riverside, OR | 0 | 2 | 2 | 2 | 0 |
| 4 | 515 Maple Ave, Brooklyn, NJ | 0 | 2 | 2 | 1 | 3 |


Note that your code will be evaluated on *different* data, organized in a data frame with the same columns, so your solution should not hard-code anything specific to this data.
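As one possible way to inspect the data, the sketch below looks at column types, class balance, and numeric columns. It is not part of the graded solution. To stay self-contained it rebuilds the five rows printed by `df.head()` in a stand-in frame called `sample` (a name chosen here for illustration); in the notebook you would run the same calls on `df`.

```python
import pandas as pd

# Illustrative stand-in for data.csv: the five rows shown by df.head() above.
sample = pd.DataFrame({
    "Address": [
        "939 Maple Ave, Bloomington, TX",
        "986 Cedar Blvd, Greenville, CA",
        "795 Cedar Blvd, Madison, NC",
        "742 Main St, Riverside, OR",
        "515 Maple Ave, Brooklyn, NJ",
    ],
    "Walkability_Score": [0, 0, 1, 0, 0],
    "Parks_Nearby": [2, 2, 1, 2, 2],
    "Grocery_Stores_Nearby": [2, 2, 3, 2, 2],
    "Schools_Nearby": [1, 1, 3, 2, 1],
    "Public_Transit_Nearby": [2, 1, 2, 0, 3],
})

# Column names and data types: Address is text, the rest are integers.
print(sample.dtypes)

# Class balance of the target variable.
class_counts = sample["Walkability_Score"].value_counts()
print(class_counts)

# Numeric columns, selected by dtype rather than by position.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Selecting by dtype shows quickly that `Address` is the only text column, which is why it cannot be passed to the model as is.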

Now we will split the data into training and test sets using `train_test_split`:

* Reserve 20% of the data for testing.
* Use the random state specified on the question page.

The following cell should create:

* `Xtr` and `Xts` as pandas data frames including *only* the features used to train the model
* `ytr` and `yts` as either pandas data series or 1d numpy arrays containing the target variable

(For pandas data frames or data series, don't change the names of any columns.)

In [7]:
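#grade (write your code in this cell and DO NOT DELETE THIS LINE)
# Drop non-feature columns: the street address and, if present, the saved row index ('Unnamed: 0')
df = df.drop(columns=['Unnamed: 0', 'Address'], errors='ignore')
# Features are every remaining column except the target
X = df.drop(columns='Walkability_Score')
y = df['Walkability_Score']
random_state = 5
# Hold out 20% of the samples for testing
Xtr, Xts, ytr, yts = train_test_split(X, y, test_size=0.2, random_state=random_state)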
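A quick sanity check on the split can be sketched as follows. The data here is made up (random integers in a frame called `demo`, with the same feature column names as the assignment), so only the shapes and the reproducibility behaviour carry over, not the values.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data with the assignment's feature column names (illustration only).
rng = np.random.default_rng(0)
n = 50
demo = pd.DataFrame({
    "Parks_Nearby": rng.integers(0, 4, n),
    "Grocery_Stores_Nearby": rng.integers(0, 4, n),
    "Schools_Nearby": rng.integers(0, 4, n),
    "Public_Transit_Nearby": rng.integers(0, 4, n),
})
demo["Walkability_Score"] = rng.integers(0, 2, n)

X_demo = demo.drop(columns="Walkability_Score")
y_demo = demo["Walkability_Score"]

# Two splits with the same random_state select the same rows.
X_a, X_b, y_a, y_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=5)
X_a2, X_b2, y_a2, y_b2 = train_test_split(X_demo, y_demo, test_size=0.2, random_state=5)

# 20% of 50 rows are held out for testing.
print(len(X_a), len(X_b))

same_split = X_b.index.equals(X_b2.index)
no_overlap = set(X_a.index).isdisjoint(X_b.index)
print(same_split, no_overlap)
```

Fixing `random_state` is what makes the split repeatable, so a grader running the same code on the same data sees the same `Xtr` and `Xts`.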

Now we are ready to fit the `SVC`. Use:

* a linear kernel
* the random state specified on the question page
* default settings for everything else

Fit the model on the training data. Then use it to make predictions for the test samples, and save these predictions in `yts_hat`. Finally, compute the accuracy of the model on the test data, and save it in `acc`.

In [8]:
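#grade (write your code in this cell and DO NOT DELETE THIS LINE)
# Linear-kernel SVC with default settings for everything else
model = SVC(kernel='linear', random_state=random_state)
model.fit(Xtr, ytr)
# Predict the walkability class for the held-out test samples
yts_hat = model.predict(Xts)
# Fraction of test samples classified correctly
acc = accuracy_score(yts, yts_hat)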
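To make the last two steps concrete, here is a minimal sketch on made-up data. The frame `grid` and the labelling rule (total nearby amenities of at least 6 counts as "high") are assumptions invented for this example and are not the expert's scoring rule. It shows that `accuracy_score` is the fraction of matching predictions, and that a linear-kernel `SVC` exposes one weight per feature in `coef_`.

```python
import itertools

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Made-up data: every combination of 0..3 for the four feature columns.
cols = ["Parks_Nearby", "Grocery_Stores_Nearby", "Schools_Nearby", "Public_Transit_Nearby"]
grid = pd.DataFrame(list(itertools.product(range(4), repeat=4)), columns=cols)

# Hypothetical labelling rule, used only so the example has two classes.
labels = (grid.sum(axis=1) >= 6).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(grid, labels, test_size=0.2, random_state=5)

clf = SVC(kernel="linear", random_state=5)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

# accuracy_score is the fraction of test samples whose prediction matches the label.
acc_demo = accuracy_score(y_te, pred)
manual_acc = np.mean(y_te.to_numpy() == pred)
print(acc_demo, manual_acc)

# With a linear kernel, coef_ holds one weight per feature (one row for a binary problem).
print(dict(zip(cols, clf.coef_[0])))
```

The sign and size of each weight give a rough indication of how strongly a feature pushes a prediction toward class 1, which can be a useful check once the graded cells are complete.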