<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Practice Gridsearch and Multinomial Models with SF Crime Data

_Authors: Joseph Nelson (DC), Sam Stack (DC)_

---

### Multinomial logistic regression models

So far, we have been using logistic regression for binary problems where there are only two class labels. Logistic regression can be extended to dependent variables with multiple classes.

There are two ways sklearn solves multiple-class problems with logistic regression: a multinomial loss or a "one vs. rest" (OvR) process where a model is fit for each target class vs. all the other classes. 

**Multinomial vs. OvR**
- (both) 'k' classes
- (M) 'k-1' models with 1 reference category
- (OvR) 'k*(k-1)/2' models

You will use the gridsearch in conjunction with multinomial logistic to optimize a model that predicts the category (type) of crime based on various features captured by San Francisco police departments.

**Necessary lab imports**

In [44]:
import numpy as np
import pandas as pd
import patsy

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV


import seaborn as sns

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

### 1. Read in the data

In [45]:
crime_csv = './datasets/sf_crime_train.csv'

In [46]:
# A:
crime = pd.read_csv(crime_csv)

### 2. Create column for hour, month, and year from 'Dates' column.

> *Hint: `pd.to_datetime` may or may not be helpful.*


In [47]:
# A:
crime.Dates = pd.to_datetime(crime.Dates)

In [48]:
crime['Year'] = [i.year for i in crime.Dates]
crime['Month'] = [i.month for i in crime.Dates]
crime['Hour'] = [i.hour for i in crime.Dates]

In [49]:
crime.head()

Unnamed: 0,Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y,Year,Month,Hour
0,2015-05-13 23:53:00,WARRANTS,WARRANT ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425892,37.774599,2015,5,23
1,2015-05-13 23:53:00,OTHER OFFENSES,TRAFFIC VIOLATION ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425892,37.774599,2015,5,23
2,2015-05-13 23:33:00,OTHER OFFENSES,TRAFFIC VIOLATION ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",VANNESS AV / GREENWICH ST,-122.424363,37.800414,2015,5,23
3,2015-05-13 23:30:00,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,NORTHERN,NONE,1500 Block of LOMBARD ST,-122.426995,37.800873,2015,5,23
4,2015-05-13 23:30:00,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,PARK,NONE,100 Block of BRODERICK ST,-122.438738,37.771541,2015,5,23


### 3. Validate and clean the data.

In [50]:
# A:
crime.columns  = [i.lower() for i in crime.columns]

In [51]:
for i in ['category','descript','dayofweek','pddistrict']:
    crime[i] = crime[i].map(lambda x: x.lower())

In [52]:
crime.head()

Unnamed: 0,dates,category,descript,dayofweek,pddistrict,resolution,address,x,y,year,month,hour
0,2015-05-13 23:53:00,warrants,warrant arrest,wednesday,northern,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425892,37.774599,2015,5,23
1,2015-05-13 23:53:00,other offenses,traffic violation arrest,wednesday,northern,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425892,37.774599,2015,5,23
2,2015-05-13 23:33:00,other offenses,traffic violation arrest,wednesday,northern,"ARREST, BOOKED",VANNESS AV / GREENWICH ST,-122.424363,37.800414,2015,5,23
3,2015-05-13 23:30:00,larceny/theft,grand theft from locked auto,wednesday,northern,NONE,1500 Block of LOMBARD ST,-122.426995,37.800873,2015,5,23
4,2015-05-13 23:30:00,larceny/theft,grand theft from locked auto,wednesday,park,NONE,100 Block of BRODERICK ST,-122.438738,37.771541,2015,5,23


### 4. Set up a target and predictor matrix for predicting violent crime vs. non-violent crime vs. non-crimes.

**Non-Violent Crimes:**
- bad checks
- bribery
- drug/narcotic
- drunkenness
- embezzlement
- forgery/counterfeiting
- fraud
- gambling
- liquor
- loitering 
- trespass

**Non-Crimes:**
- non-criminal
- runaway
- secondary codes
- suspicious occ
- warrants

**Violent Crimes:**
- everything else



**What type of model do you need here? What should your "baseline" category be?**

In [53]:
# A:
nonviolent_crime = ['bad checks', 'bribery', 'drug/nacrotics', 'drunkenness','embezzlement','forgery/counterfeiting','fraud','gambling','liquor','loitering','trepass']
non_crime = ['non-criminal','runaway','secondary codes','suspicious occ','warrants']
violent_crime = [i for i in crime.category if i not in (nonviolent_crime + non_crime)]
crime['category_class'] = [0 if i in nonviolent_crime else 1 if i in non_crime else 2 for i in crime.category]

In [54]:
crime_days = pd.get_dummies(crime.dayofweek, drop_first = True)
crime_descript = pd.get_dummies(crime.descript, drop_first = True)
crime_pddistrict = pd.get_dummies(crime.pddistrict, drop_first = True)
crime = pd.concat([crime, crime_days, crime_descript, crime_pddistrict], axis = 1)
crime.drop('dates', inplace = True, axis = 1)
crime.drop('address', inplace = True, axis = 1)
crime.drop('resolution', inplace = True, axis = 1)
crime.drop('dayofweek', inplace = True, axis = 1)
crime.drop('descript', inplace = True, axis = 1)
crime.drop('pddistrict', inplace = True, axis = 1)
crime.drop('category', inplace = True, axis = 1)

In [55]:
baseline = float(np.max(crime.category_class.value_counts()))/np.sum(crime.category_class.value_counts())
baseline

0.7552222222222222

In [56]:
crime.head()

Unnamed: 0,x,y,year,month,hour,category_class,monday,saturday,sunday,thursday,...,willful cruelty to child,central,ingleside,mission,northern,park,richmond,southern,taraval,tenderloin
0,-122.425892,37.774599,2015,5,23,1,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
1,-122.425892,37.774599,2015,5,23,2,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2,-122.424363,37.800414,2015,5,23,2,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
3,-122.426995,37.800873,2015,5,23,2,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
4,-122.438738,37.771541,2015,5,23,2,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0


### 5. Standardize the predictor matrix

In [57]:
from sklearn.preprocessing import StandardScaler

In [58]:
# A:
X = crime[[i for i in crime.columns if i not in ['non_violent_crime', 'violent_crime']]]
ss = StandardScaler()
Xs = ss.fit_transform(X)
y = crime.category_class

In [70]:
Xs

array([[-0.08490254,  0.25145919,  0.        , ..., -0.47266042,
        -0.29332298, -0.28211656],
       [-0.08490254,  0.25145919,  0.        , ..., -0.47266042,
        -0.29332298, -0.28211656],
       [-0.0272831 ,  1.30992073,  0.        , ..., -0.47266042,
        -0.29332298, -0.28211656],
       ..., 
       [-0.5449949 ,  0.49911383,  0.        , ..., -0.47266042,
        -0.29332298, -0.28211656],
       [-0.0335324 ,  0.1383847 ,  0.        , ..., -0.47266042,
        -0.29332298, -0.28211656],
       [ 0.13694756, -0.92165168,  0.        , ..., -0.47266042,
        -0.29332298, -0.28211656]])

#### 6. Find the optimal hyperparameters (optimal regularization) to predict your crime categories.

> **Note:** Gridsearching can be done with `GridSearchCV` or `LogisticRegressionCV`. They operate differently - the gridsearch object is more general and can be applied to any model. The `LogisticRegressionCV` is specific to tuning the logistic regression hyperparameters. I recommend the logistic regression one, but the downside is that lasso and ridge must be searched separately.

**Reference for logistic regression regularization hyperparameters:**
- `solver`: algorithm used for optimization (relevant for multiclass)
    - Newton-cg - Handles Multinomial Loss, L2 only
    - Sag - Handles Multinomial Loss, Large Datasets, L2 Only, Works best on sclaed data
    - lbfgs - Handles Multinomial Loss, L2 Only
    - Liblinear - Small Datasets, no Warm Starts
- `Cs`: Regularization strengths (smaller values are stronger penalties)
- `cv`: vross-validations or number of folds
- `penalty`: `'l1'` - LASSO, `'l2'` - Ridge 

In [59]:
# Example:
# fit model with five folds and lasso regularization
# use Cs=15 to test a grid of 15 distinct parameters
# remember: Cs describes the inverse of regularization strength

# logreg_cv = LogisticRegressionCV(solver='liblinear', 
#                                  Cs=[1,5,10], 
#                                  cv=5, penalty='l1')

**Split data into training and testing with 50% in testing.**

In [60]:
# A:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(Xs, y, test_size = 0.5, random_state = 1)

**Gridsearch hyperparameters for the training data.**

In [82]:
lr_cv = LogisticRegression()

In [85]:
# A:
gs_para = {
    'solver': ['liblinear'],
    'penalty': ['l1'],
    'C': np.logspace(-5,0,100)}

lr_gs = GridSearchCV(lr_cv, gs_para, cv = 5, verbose = 1, n_jobs = -1)

In [86]:
lr_gs.fit(X_train, y_train)

Fitting 5 folds for each of 100 candidates, totalling 500 fits


[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    5.9s
[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:   22.6s
[Parallel(n_jobs=-1)]: Done 442 tasks      | elapsed:  1.6min
[Parallel(n_jobs=-1)]: Done 500 out of 500 | elapsed:  3.1min finished


GridSearchCV(cv=5, error_score='raise',
       estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'penalty': ['l1'], 'C': array([  1.00000e-05,   1.12332e-05, ...,   8.90215e-01,   1.00000e+00]), 'solver': ['liblinear']},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=1)

**Find the best parameters for each target class.**

In [None]:
# A:

**Build three logisitic regression models using the best parameters for each target class.**

In [None]:
# A:

### 7. Build confusion matrices for the models above
- Use the holdout test data from the train-test split

In [None]:
# A:

### 8. Print classification reports for your three models.

In [None]:
# A:

**Describe the metrics in the classification report.**

In [None]:
# A: