# Code Sample: LogisticRegression, Grid and Random Search CV

We use the breast cancer dataset from sklearn.datasets (load_breast_cancer):

http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html

In [1]:
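from sklearn.model_selection import train_test_split
# GridSearchCV lives in sklearn.model_selection; sklearn.grid_search was removed.
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_breast_cancer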
from sklearn.linear_model import LogisticRegression



In [2]:
data  = load_breast_cancer()

In [3]:
type(data)

sklearn.utils.Bunch

### About Data

Classes	2<br>
Samples per class	212(M),357(B)<br>
Samples total	569<br>
Dimensionality	30<br>
Features	real, positive<br>
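
The figures above can be checked directly on the loaded object. This is a small sketch that uses only the data, target and target_names attributes of the returned Bunch; the variable names are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Feature matrix: one row per sample, one column per feature.
n_samples, n_features = data.data.shape

# target holds the class codes 0/1; target_names maps them to labels.
class_counts = dict(zip(data.target_names, np.bincount(data.target)))

print(n_samples, n_features)
print(class_counts)
```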

### About LogisticRegression
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)

<br>

#### Explanation of parameters:<br>

penalty -> the regularization term: 'l1' or 'l2'.<br>

C -> the inverse of the regularization strength, C = 1/lambda, where lambda is the hyperparameter (check notebook for explanation). A smaller C means stronger regularization.<br>

fit_intercept -> whether to include the intercept term. Without it the decision function is W.T*X, which passes through the origin; with it, the function is W.T*X + b.<br>

class_weight -> used when the data is imbalanced. It does not resample the data; it weights each class's contribution to the loss, which has an effect similar to upsampling. For example, if DTrain has 10% positive points (label 1) and 90% negative points (label 0), class_weight={0: 1, 1: 9} or class_weight='balanced' gives each positive point nine times the weight of a negative point.<br>

solver -> the algorithm used to solve the optimization problem.<br>

max_iter -> the maximum number of iterations. The optimization problem behind LR is solved iteratively, so this caps how long the solver runs.<br>
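
To make class_weight concrete, the sketch below computes the weights that 'balanced' would assign on a toy label vector with a 90/10 split. The toy labels and variable names are made up for illustration; compute_class_weight is the scikit-learn helper that implements the 'balanced' heuristic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Toy labels (made up for illustration): 90 negatives, 10 positives.
y_toy = np.array([0] * 90 + [1] * 10)

# 'balanced' uses n_samples / (n_classes * count_per_class).
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y_toy)
weight_by_class = dict(zip([0, 1], weights))
print(weight_by_class)

# Two ways to request reweighting. Both give a 1:9 ratio between the
# classes, although the overall scale of the weights differs.
clf_auto = LogisticRegression(class_weight="balanced", solver="liblinear")
clf_manual = LogisticRegression(class_weight={0: 1, 1: 9}, solver="liblinear")
```

Because the overall scale of the weights interacts with C, the two estimators are not guaranteed to produce identical fits even though the class ratio is the same.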


### GridSearch CV

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

GridSearchCV(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')<br>

#### Explanation of parameters<br>

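estimator -> the model on which we perform the grid search; in our case it is LR.<br>

param_grid -> a dictionary mapping each hyperparameter name to the list of values to try, e.g. {'C': [0.01, 1, 100]} for lambda. Several hyperparameters can go in the same dictionary, and every combination is tried. A list of dictionaries defines several separate grids, each explored on its own.<br>

scoring -> the metric used to compare candidates: accuracy, F1 score, etc.<br>

n_jobs -> the number of jobs to run in parallel (multi-core processing).<br>

pre_dispatch -> the number of jobs that get dispatched for parallel execution. Keeping this value small is recommended to avoid excessive memory use.<br>

cv -> the cross-validation strategy. In the version shown here the default is 3-fold CV (scikit-learn 0.22 and later default to 5-fold); an integer sets the number of folds.<br>

refit -> once the best lambda is found, the estimator is refit on the whole training set with that value, so the fitted search object can be used directly for prediction.<br>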
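
The difference between a single dictionary and a list of dictionaries in param_grid can be seen with ParameterGrid, the scikit-learn helper that enumerates the candidates. The grids below are made-up examples, not the ones used later in this notebook.

```python
from sklearn.model_selection import ParameterGrid

# One dictionary: every combination of the listed values is tried.
single_grid = {"C": [0.01, 1, 100], "penalty": ["l1", "l2"]}

# List of dictionaries: each dictionary is expanded on its own.
multi_grid = [
    {"penalty": ["l1"], "C": [0.01, 1]},
    {"penalty": ["l2"], "C": [1, 100, 10000]},
]

n_single = len(ParameterGrid(single_grid))  # 3 values of C x 2 penalties
n_multi = len(ParameterGrid(multi_grid))    # 2 candidates + 3 candidates

print(n_single, n_multi)
for params in ParameterGrid(multi_grid):
    print(params)
```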


### RandomSearch CV

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

RandomizedSearchCV(estimator, param_distributions, n_iter=10, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score='raise', return_train_score='warn')<br>

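##### Most parameters are the same as in GridSearchCV. The differences are the second parameter, param_distributions, plus n_iter and random_state.
<br>
param_distributions -> a dictionary mapping each hyperparameter name to either a list of values or a distribution such as those in scipy.stats.distributions; each candidate value is drawn at random from that list or distribution.<br>

n_iter -> the number of parameter settings that are sampled.<br>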
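
As a sketch of how a distribution is passed in, the example below samples C log-uniformly over the same range as the grid used later in this notebook. The choice of scipy.stats.loguniform, the fixed random_state=0 and the explicit solver='liblinear' are assumptions made for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, train_size=0.9, random_state=0
)

# C is drawn log-uniformly from [1e-4, 1e4] instead of from a fixed list.
param_distributions = {"C": loguniform(1e-4, 1e4)}

search = RandomizedSearchCV(
    LogisticRegression(solver="liblinear"),
    param_distributions,
    n_iter=10,        # number of sampled candidates
    scoring="f1",
    cv=5,
    random_state=0,   # makes the sampled candidates reproducible
)
search.fit(X_train, y_train)

print(search.best_params_)
print("Score: ", search.score(X_test, y_test))
```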

# Code Example

In [4]:
data  = load_breast_cancer()

In [18]:
# Candidate values for the hyperparameter lambda.
# Here 'C' is nothing but 1/lambda.
tuned_parameters = [{'C': [10**-4, 10**-2, 10**0, 10**2, 10**4]}]

X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, train_size=0.9)



In [19]:
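# Using GridSearchCV. solver='liblinear' is set explicitly; it was the
# default in the release that produced the printed estimator.
model = GridSearchCV(LogisticRegression(solver='liblinear'), tuned_parameters,
                     scoring='f1', cv=5)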

model.fit(X_train,y_train)

print(model.best_estimator_)
print("Score: ",model.score(X_test,y_test))


LogisticRegression(C=100, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Score:  0.9647058823529412
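
Beyond the single best estimator, the fitted search object also records the cross-validated score of every candidate in cv_results_. The sketch below repeats the search so that it runs on its own; the fixed random_state=0 for the split and the explicit solver='liblinear' are assumptions, so the numbers will not match the run above exactly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, train_size=0.9, random_state=0
)

param_grid = {"C": [10**-4, 10**-2, 10**0, 10**2, 10**4]}
search = GridSearchCV(
    LogisticRegression(solver="liblinear"), param_grid, scoring="f1", cv=5
)
search.fit(X_train, y_train)

# mean_test_score holds the cross-validated F1 of each candidate, in grid order.
for params, mean_f1 in zip(
    search.cv_results_["params"], search.cv_results_["mean_test_score"]
):
    print(params, round(mean_f1, 4))

print("Best C:", search.best_params_["C"])
print("Best CV F1:", round(search.best_score_, 4))
```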


## Checking How Sparsity Works with L1 Regularization

In [20]:
import numpy as np

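# liblinear supports the L1 penalty; newer releases default to lbfgs, which does not.
clf = LogisticRegression(C=0.1, penalty='l1', solver='liblinear')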
clf.fit(X_train,y_train)

w = clf.coef_
w

array([[ 0.4402271 ,  0.        ,  0.26669648, -0.00345869,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        , -0.05038257,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.46141917, -0.13497397, -0.13210411, -0.02055487,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

In [21]:
print(np.count_nonzero(w))

8


#### With C=0.1, i.e. lambda = 1/0.1 = 10, L1 regularization leaves 8 non-zero weights, so the model relies on only these 8 features.

In [30]:
import numpy as np

clf = LogisticRegression(C=0.01, penalty='l1', solver='liblinear')
clf.fit(X_train,y_train)

w = clf.coef_
w

array([[ 0.        ,  0.        ,  0.14808882,  0.00792713,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        , -0.01887019,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        , -0.02188808,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

In [31]:
print(np.count_nonzero(w))

4


#### With C=0.01, i.e. lambda = 1/0.01 = 100, L1 regularization leaves 4 non-zero weights, so the model relies on only these 4 features.

### Note: as C decreases (lambda increases), sparsity increases. If sparsity is pushed too far, the model will underfit.
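
The trend in this note can be checked by sweeping C and counting the non-zero weights at each value. This is a sketch that fits on the full dataset with random_state=0, both chosen here for illustration, so the counts will differ from the runs above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X, y = data.data, data.target

# Map each C to the number of non-zero weights after L1-regularized fitting.
nonzero_by_C = {}
for C in [0.001, 0.01, 0.1, 1, 10, 100]:
    clf = LogisticRegression(C=C, penalty="l1", solver="liblinear", random_state=0)
    clf.fit(X, y)
    nonzero_by_C[C] = int(np.count_nonzero(clf.coef_))

for C, count in nonzero_by_C.items():
    print(f"C={C:<6} lambda={1 / C:<8g} non-zero weights={count}")
```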