# Activity 1
## Adding regularization to the model

In this activity we will use the same logistic regression model from the scikit-learn package. This time, however, we will add regularization to the model and search for the optimum regularization parameter, a process often called hyperparameter tuning.

After training the models we will test the predictions and compare the model evaluation metrics to those produced by the baseline model and the model without regularization.

First, let's load the feature data from the first exercise and the target data from the second activity. The feature data from the second activity can also be used.

In [1]:
import pandas as pd
feats = pd.read_csv('data/bank_data_feats_e3.csv', index_col=0)
target = pd.read_csv('data/bank_data_target_e2.csv', index_col=0)

We will again create test and training datasets. We will train the model on the training dataset; this time, however, we will use part of the training dataset for validation in order to choose the most appropriate hyperparameter.

We will again use test_size = 0.2, which means that 20% of the data will be reserved for testing. The size of our validation set is determined by the number of validation folds. With 10-fold cross-validation, 10% of the training dataset is held out in each fold to validate the model. Each fold uses a different 10% of the training dataset, and the average error across all folds is used to compare models with different hyperparameters.
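
The fold arithmetic can be sketched in plain Python. This is only an illustration: `fold_sizes` is a hypothetical helper, and 3616 is the number of training rows this activity reports after the split. `LogisticRegressionCV` uses stratified folds by default, so the actual fold sizes may differ by a row or two.

```python
def fold_sizes(n_samples, n_folds):
    """Split n_samples into n_folds nearly equal validation folds."""
    base, extra = divmod(n_samples, n_folds)
    # The first `extra` folds receive one additional row.
    return [base + 1 if i < extra else base for i in range(n_folds)]


n_train = 3616  # training rows reported by this activity after the split
sizes = fold_sizes(n_train, 10)

for i, n_val in enumerate(sizes, start=1):
    # Each fold validates on n_val rows and fits on the remaining rows.
    print(f'Fold {i}: validate on {n_val} rows, fit on {n_train - n_val} rows')
```

Every row is used for validation exactly once, so the averaged validation error is based on the whole training dataset.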

In [2]:
from sklearn.model_selection import train_test_split
test_size = 0.2
random_state = 13
X_train, X_test, y_train, y_test = train_test_split(feats, target, test_size=test_size, random_state=random_state)

Let's make sure our dimensions are correct.

In [3]:
print(f'Shape of X_train: {X_train.shape}')
print(f'Shape of y_train: {y_train.shape}')
print(f'Shape of X_test: {X_test.shape}')
print(f'Shape of y_test: {y_test.shape}')

Shape of X_train: (3616, 32)
Shape of y_train: (3616, 1)
Shape of X_test: (905, 32)
Shape of y_test: (905, 1)
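
As a quick sanity check, the row counts above can be reproduced by hand. The sketch below assumes that the test share is rounded up to a whole row, which matches the printed shapes; the total of 4521 rows is simply the sum of the two row counts shown.

```python
import math

n_samples = 4521  # 3616 + 905, the row counts printed above
test_size = 0.2

# Assumption: the test share is rounded up to a whole number of rows.
n_test = math.ceil(test_size * n_samples)
# Everything that is not in the test set is training data.
n_train = n_samples - n_test

print(f'Expected training rows: {n_train}')
print(f'Expected test rows: {n_test}')
```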


We fit our model by first instantiating it and then fitting it to the training data.
We also add a penalty, denoted by 'l1' or 'l2'. Our goal is to find the penalty type and penalty value that give us the best results.
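
To make the two penalty types concrete, here is a minimal sketch of how each one is computed from a set of coefficients. The function names and the example weights are hypothetical. The combined objective follows the form used in the scikit-learn documentation, where `C` multiplies the data loss, so a smaller `C` means a relatively stronger penalty.

```python
def l1_penalty(weights):
    """Sum of absolute values of the coefficients."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Half the sum of squared coefficients."""
    return 0.5 * sum(w * w for w in weights)


def penalized_loss(data_loss, weights, C, penalty):
    """Penalty term plus C times the data loss (C is the inverse strength)."""
    reg = l1_penalty(weights) if penalty == 'l1' else l2_penalty(weights)
    return reg + C * data_loss


example_weights = [0.5, -2.0, 0.0]  # hypothetical coefficients
print(f'l1 penalty: {l1_penalty(example_weights)}')
print(f'l2 penalty: {l2_penalty(example_weights)}')
```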

To get a reminder of how to use a function or class, we can always use the help function to look at the details.

In [4]:
from sklearn.linear_model import LogisticRegressionCV
help(LogisticRegressionCV)

Help on class LogisticRegressionCV in module sklearn.linear_model.logistic:

class LogisticRegressionCV(LogisticRegression, sklearn.base.BaseEstimator, sklearn.linear_model.base.LinearClassifierMixin, sklearn.feature_selection.from_model._LearntSelectorMixin)
 |  Logistic Regression CV (aka logit, MaxEnt) classifier.
 |  
 |  This class implements logistic regression using liblinear, newton-cg, sag
 |  of lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2
 |  regularization with primal formulation. The liblinear solver supports both
 |  L1 and L2 regularization, with a dual formulation only for the L2 penalty.
 |  
 |  For the grid of Cs values (that are set by default to be ten values in
 |  a logarithmic scale between 1e-4 and 1e4), the best hyperparameter is
 |  selected by the cross-validator StratifiedKFold, but it can be changed
 |  using the cv parameter. In the case of newton-cg and lbfgs solvers,
 |  we warm start along the path i.e guess the initial coeffic

In [5]:
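import numpy as np
from sklearn.linear_model import LogisticRegressionCV
# Candidate values of C: nine powers of ten from 1e-2 to 1e6
Cs = np.logspace(-2, 6, 9)
# liblinear is specified here because it supports the l1 penalty
model_l1 = LogisticRegressionCV(Cs=Cs, penalty='l1', cv=10, solver='liblinear', random_state=42)
model_l2 = LogisticRegressionCV(Cs=Cs, penalty='l2', cv=10, random_state=42)

# Fit both models; the target is passed as a single column (Series)
model_l1.fit(X_train, y_train['y'])
model_l2.fit(X_train, y_train['y'])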

LogisticRegressionCV(Cs=array([1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05,
       1.e+06]),
           class_weight=None, cv=10, dual=False, fit_intercept=True,
           intercept_scaling=1.0, max_iter=100, multi_class='ovr',
           n_jobs=1, penalty='l2', random_state=42, refit=True,
           scoring=None, solver='lbfgs', tol=0.0001, verbose=0)

To test the model performance we will predict the outcome on the test features (X_test), and compare those outcomes to real values (y_test) for each of the models.

With the 'LogisticRegressionCV' model, once the model is fit, the hyperparameter that resulted in the lowest error is automatically chosen for any subsequent predictions.

We can also look at what the hyperparameters are.

In [6]:
print(f'Best hyperparameter for l1 regularization model: {model_l1.C_[0]}')
print(f'Best hyperparameter for l2 regularization model: {model_l2.C_[0]}')

Best hyperparameter for l1 regularization model: 1.0
Best hyperparameter for l2 regularization model: 100000.0
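
When reading these values it helps to remember that `C` is the inverse of the regularization strength, so a large `C` means a weak penalty. The following sketch rebuilds the same grid as `np.logspace(-2, 6, 9)` in plain Python and prints the corresponding strengths; it is an illustration only and is not part of the original notebook.

```python
# Same grid as np.logspace(-2, 6, 9): nine powers of ten from 1e-2 to 1e6.
Cs = [10.0 ** exponent for exponent in range(-2, 7)]

for C in Cs:
    # A smaller C corresponds to a stronger penalty on the coefficients.
    print(f'C = {C:g} -> penalty strength 1/C = {1 / C:g}')
```

On this reading, the value of 100000 selected for the l2 model corresponds to very light regularization, while the value of 1.0 selected for the l1 model is a much stronger penalty.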


In [7]:
y_pred_l1 = model_l1.predict(X_test)
y_pred_l2 = model_l2.predict(X_test)

We can again compare the predictions against the true values to measure accuracy. Remember that our baseline model (predicting 'no' or False for all) gave us an accuracy of 88.476%, so we want to try and beat that.

In [8]:
from sklearn import metrics
accuracy_l1 = metrics.accuracy_score(y_pred=y_pred_l1, y_true=y_test)
accuracy_l2 = metrics.accuracy_score(y_pred=y_pred_l2, y_true=y_test)
print(f'Accuracy of the model with l1 regularization is {accuracy_l1*100:.4f}%')
print(f'Accuracy of the model with l2 regularization is {accuracy_l2*100:.4f}%')

Accuracy of the model with l1 regularization is 90.0552%
Accuracy of the model with l2 regularization is 89.6133%


Both models performed roughly the same, and neither did much better than the model without regularization, which had an overall accuracy of 89.9448%, or even than the baseline model.
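
The baseline is hard to beat on accuracy because the classes are imbalanced: always predicting the most common label is already right most of the time. A minimal sketch of that calculation follows. The helper name and the toy labels are hypothetical, chosen only to mimic an imbalance of roughly the size seen here.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical labels: 88 'no' and 12 'yes'.
toy_labels = ['no'] * 88 + ['yes'] * 12
print(f'Baseline accuracy: {majority_baseline_accuracy(toy_labels):.2%}')
```

This is why accuracy alone says little about how well the positive class is predicted.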

### Other evaluation metrics

Let's test again with the other evaluation metrics: precision, recall, and F-score.

In [9]:
precision_l1, recall_l1, fscore_l1, _ = metrics.precision_recall_fscore_support(y_pred=y_pred_l1, y_true=y_test, average='binary')
precision_l2, recall_l2, fscore_l2, _ = metrics.precision_recall_fscore_support(y_pred=y_pred_l2, y_true=y_test, average='binary')
print(f'l1\nPrecision: {precision_l1:.4f}\nRecall: {recall_l1:.4f}\nfscore: {fscore_l1:.4f}\n\n')
print(f'l2\nPrecision: {precision_l2:.4f}\nRecall: {recall_l2:.4f}\nfscore: {fscore_l2:.4f}')

l1
Precision: 0.6200
Recall: 0.3039
fscore: 0.4079


l2
Precision: 0.5769
Recall: 0.2941
fscore: 0.3896


Here the model with l1 regularization outperforms the l2 model on all three evaluation metrics.
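
To see where these numbers come from, the metrics can be computed by hand from confusion-matrix counts. The counts below are not printed anywhere in the notebook; they are an assumption, chosen because they are consistent with the reported l1 results on the 905-row test set.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for the positive class."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Assumed counts (inferred from the reported metrics, not taken from the notebook).
tp, fp, fn, tn = 31, 19, 71, 784

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'fscore: {f1:.4f}')
print(f'Accuracy: {accuracy * 100:.4f}%')
```

With these counts the model finds fewer than a third of the positive cases, even though its accuracy is about 90%.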

### Feature importances
   
Examining the feature importances can show us how the regularization affected the values of the coefficients.

In [10]:
coef_list = [f'{feature}: {coef}' for coef, feature in sorted(zip(model_l1.coef_[0], X_train.columns.values.tolist()))]
for item in coef_list:
    print(item)

is_loan: -0.8474572703740183
is_housing: -0.579000838262856
marital_married: -0.44532254073443145
job_blue-collar: -0.39555311928947184
marital_single: -0.17325311462431392
job_unemployed: -0.13411430783658618
job_entrepreneur: -0.10034667623333418
job_services: -0.09134387963624574
campaign: -0.06871503998769422
month: -0.03574802854504004
education_primary: -0.022463329457676342
job_technician: -0.01712422471985195
poutcome_failure: -0.011295531879278774
day: -0.0016886947998909876
education_secondary: 0.0
job_housemaid: 0.0
job_self-employed: 0.0
balance: 5.440788435851004e-06
age: 0.0008144244598940912
duration: 0.0040479163813614384
previous: 0.005406556796155721
job_management: 0.08746056447020656
education_tertiary: 0.1798507542548706
job_admin.: 0.2119734305727548
was_contacted: 0.29490708422824413
job_student: 0.3480117206677412
poutcome_other: 0.36111119058043945
is_default: 0.3973330185343257
job_retired: 0.5771288331964628
contact_telephone: 0.9444897771683772
contact_cellu

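l1 regularization tends to send coefficients all the way down to zero, and is useful for reducing the total number of features in a training dataset. Here we can see that the coefficients for education_secondary, job_housemaid and job_self-employed are exactly zero, so these features have no effect on the predictions of this model. The coefficients for the day contacted and the age of the customer are very close to zero, but not exactly zero.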
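
Picking out the features that l1 regularization removed amounts to filtering for coefficients that are exactly zero. The sketch below does this for a handful of the coefficients printed above, rounded to four decimals; the dictionary is a hand-copied subset for illustration, not the full model.

```python
# A few of the l1 coefficients printed above, rounded for readability.
l1_coefs = {
    'is_loan': -0.8475,
    'education_secondary': 0.0,
    'job_housemaid': 0.0,
    'job_self-employed': 0.0,
    'balance': 5.4408e-06,
    'age': 0.0008,
    'contact_telephone': 0.9445,
}

# Features whose coefficient is exactly zero do not influence the predictions.
dropped = [name for name, coef in l1_coefs.items() if coef == 0.0]
kept = [name for name, coef in l1_coefs.items() if coef != 0.0]

print(f'Dropped by l1: {dropped}')
print(f'Still in use: {kept}')
```

Note that a tiny coefficient such as the one for balance is not the same as zero: the feature is still in the model.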

Let's take a look at the model coefficients for the model with l2 regularization.

In [11]:
coef_list = [f'{feature}: {coef}' for coef, feature in sorted(zip(model_l2.coef_[0], X_train.columns.values.tolist()))]
for item in coef_list:
    print(item)

is_loan: -0.8767703153503535
is_housing: -0.7930806291374497
marital_married: -0.6897874882412887
job_blue-collar: -0.6747849882734318
poutcome_failure: -0.5958898230191121
marital_single: -0.5928029863445086
education_primary: -0.3798399892849926
education_secondary: -0.37640141075644
job_services: -0.3492866847219953
job_unemployed: -0.2707250453950951
education_tertiary: -0.2512553701812126
job_entrepreneur: -0.2304804427179799
job_technician: -0.2267524219316696
job_self-employed: -0.15865824179785096
poutcome_other: -0.09679842143414642
job_management: -0.09232401077565568
job_housemaid: -0.08333763002349377
campaign: -0.0678304115272927
month: -0.04702006770586307
age: -0.01621962629985596
day: -0.004860182187899232
previous: -0.00459151071217967
balance: 5.053822999119466e-06
duration: 0.004007402837601134
job_admin.: 0.04293807187555001
job_student: 0.0486787801752278
is_default: 0.1506809961818302
contact_telephone: 0.5737347195029631
job_retired: 0.6824781839115689
contact_ce

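Here we can see that none of the coefficients go right down to zero. This is because the l2 penalty grows with the square of each coefficient: coefficients are penalized very little when they are small and much more heavily when they are large, so there is little pressure to push a small coefficient all the way to zero.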
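
The difference between the two penalties can be illustrated on a simplified one-dimensional problem. The sketch below is not the logistic regression objective: it assumes a single coefficient with a quadratic loss around an unpenalized value `w`, for which both penalized solutions have a closed form. The function names and the penalty strength `lam` are hypothetical.

```python
def l1_shrink(w, lam):
    """Minimizer of 0.5*(v - w)**2 + lam*abs(v): soft-thresholding."""
    magnitude = max(abs(w) - lam, 0.0)
    return magnitude if w >= 0 else -magnitude


def l2_shrink(w, lam):
    """Minimizer of 0.5*(v - w)**2 + 0.5*lam*v**2: proportional shrinkage."""
    return w / (1.0 + lam)


lam = 0.1  # hypothetical penalty strength
for w in [0.05, -0.3, 2.0]:
    print(f'w = {w}: l1 -> {l1_shrink(w, lam):.4f}, l2 -> {l2_shrink(w, lam):.4f}')
```

In this simplified setting the l1 penalty subtracts a fixed amount and sets small coefficients to exactly zero, whereas the l2 penalty scales every coefficient by the same factor and never reaches zero.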

In this activity we have seen how to create models that include regularization. While the regularization added little to model performance in this dataset, regularization is an important technique with which to prevent your models from overfitting to the training dataset.

In the following lesson we will apply many of the same techniques learned in this lesson, namely creating test and training datasets, performing cross-validation, and using model evaluation metrics to score our models. However, we will apply them to deep learning models using the Keras library.