In [3]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import GridSearchCV


## 0. Load Data

Here, I'll use the data preprocessed with the "Largest bounding circle" method.

In [4]:
X = np.load("../data/preproccessed/circle/X_train.npy")
y = np.load("../data/preproccessed/circle/y_train.npy")

Right now the pictures are stored as 28x28 matrices. Unrolling each one into a 784-element vector gives the flat (n_samples, n_features) layout that logistic regression expects.

In [9]:
X = X.reshape(-1,28*28)
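A minimal sketch of what this unrolling does, on five hypothetical random "images" standing in for the real data: `-1` lets NumPy infer the number of rows, and each row is one image read row by row, so the step can be undone.

```python
import numpy as np

# Hypothetical stand-in for the real data: 5 random 28x28 "images"
images = np.random.default_rng(0).random((5, 28, 28))

# -1 tells NumPy to infer the number of rows (here 5)
flat = images.reshape(-1, 28 * 28)
print(flat.shape)  # (5, 784)

# Each row is one image read row by row (C order), so the reshape is reversible
assert np.array_equal(flat[2], images[2].ravel())
assert np.array_equal(flat.reshape(-1, 28, 28), images)
```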

In [12]:
y = y.reshape(-1,)
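scikit-learn expects `y` as a 1-D array of shape (n_samples,); passing a column vector makes estimators such as `LogisticRegression` emit a `DataConversionWarning`. A small sketch, assuming (hypothetically) that the labels were saved as an (n, 1) column:

```python
import numpy as np

# Hypothetical labels stored as a column vector, shape (4, 1)
labels = np.array([[3], [1], [4], [1]])

# Flatten to shape (4,); equivalent to labels.ravel() here
flat_labels = labels.reshape(-1,)
print(labels.shape, "->", flat_labels.shape)  # (4, 1) -> (4,)
```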

In [10]:
X.shape

(50000, 784)

In [13]:
y.shape

(50000,)

Split into training and validation sets (80/20):

In [14]:
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=1)
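With `test_size=0.2`, the 50,000 samples become 40,000 for training and 10,000 for validation. A quick sketch on 50 hypothetical samples shows the same proportions; `random_state` fixes the shuffle so the split is reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50 hypothetical samples with 4 features each, and 10 made-up classes
X_demo = np.arange(50 * 4).reshape(50, 4)
y_demo = np.arange(50) % 10

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1)
print(X_tr.shape, X_va.shape)  # (40, 4) (10, 4)
```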

## 1. Creating a hyperparameter grid to search through

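The most important hyperparameter for regularized logistic regression is the regularization coefficient `C`, which in scikit-learn is the inverse of the regularization strength: smaller values of `C` mean stronger regularization. I'll also try both the l1 and l2 penalties.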
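For reference, the scikit-learn user guide writes the binary-case objectives roughly as below, with labels $y_i \in \{-1, 1\}$, weights $w$ and intercept $c$. `C` multiplies the data-fit term rather than the penalty, which is why a larger `C` means weaker regularization:

$$
\text{l2:}\quad \min_{w, c}\ \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (x_i^\top w + c)\right) + 1\right)
$$

$$
\text{l1:}\quad \min_{w, c}\ \lVert w \rVert_1 + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (x_i^\top w + c)\right) + 1\right)
$$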

In [17]:
grid = {"penalty" : ["l1", "l2"],
        "C" : [.01 * 3**i for i in range(8)]}
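A quick check of what this grid expands to: `C` grows geometrically by a factor of 3 from 0.01 to 21.87, and every (penalty, `C`) pair is one candidate, giving 2 x 8 = 16 candidates. `ParameterGrid` is the scikit-learn helper that enumerates them:

```python
from sklearn.model_selection import ParameterGrid

grid = {"penalty": ["l1", "l2"],
        "C": [.01 * 3**i for i in range(8)]}

# C values, rounded for display
print([round(c, 2) for c in grid["C"]])
# [0.01, 0.03, 0.09, 0.27, 0.81, 2.43, 7.29, 21.87]

# Every combination of penalty and C is one candidate
candidates = list(ParameterGrid(grid))
print(len(candidates))  # 16
```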

## 2. Grid search through hyperparameters

In [18]:
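# liblinear handles both the l1 and l2 penalties (it was the default solver before scikit-learn 0.22)
clf = LogisticRegression(solver="liblinear", random_state=1)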

In [20]:
# cv=3 made explicit: the default fold count was 3 before scikit-learn 0.22 (5 since)
gs = GridSearchCV(clf, grid, cv=3, n_jobs=-1, verbose=3)
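Under the hood, `GridSearchCV` cross-validates every candidate (16 candidates x 3 folds = 48 fits here, each trained on roughly two thirds of the 40,000 training samples) and then refits the best one on the whole training set. The sketch below reproduces the scoring loop by hand with `ParameterGrid` and `cross_val_score`. It is only an illustration: it swaps in scikit-learn's small `load_digits` dataset and a `C`-only grid with the default solver, to sidestep solver/penalty compatibility differences between scikit-learn versions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score

# Stand-in dataset: 8x8 digit images, already flattened to 64 features
X_demo, y_demo = load_digits(return_X_y=True)
X_demo = X_demo / 16.0  # scale pixel values to [0, 1]

base = LogisticRegression(max_iter=1000)
demo_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Manual version: mean 3-fold cross-validated accuracy for every candidate
manual_scores = []
for params in ParameterGrid(demo_grid):
    model = clone(base).set_params(**params)
    scores = cross_val_score(model, X_demo, y_demo, cv=3)
    manual_scores.append(scores.mean())
    print(params, round(scores.mean(), 4))

# GridSearchCV runs the same candidates x folds fits (4 x 3 = 12 here)
gs_demo = GridSearchCV(base, demo_grid, cv=3)
gs_demo.fit(X_demo, y_demo)
print(gs_demo.best_params_)
```

Both paths use stratified 3-fold splits for a classifier, so the manual means should match `gs_demo.cv_results_["mean_test_score"]`.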

In [None]:
gs.fit(X_train, y_train)

Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV] C=0.01, penalty=l1 ..............................................
[CV] C=0.01, penalty=l1 ..............................................
[CV] C=0.01, penalty=l1 ..............................................
[CV] C=0.01, penalty=l2 ..............................................
[CV] ..... C=0.01, penalty=l1, score=0.7302444144549408, total= 2.4min
[CV] C=0.01, penalty=l2 ..............................................
[CV] ..... C=0.01, penalty=l1, score=0.7249062265566392, total= 2.5min
[CV] C=0.01, penalty=l2 ..............................................
[CV] ..... C=0.01, penalty=l1, score=0.7368736873687368, total= 2.5min
[CV] C=0.03, penalty=l1 ..............................................
[CV] ..... C=0.03, penalty=l1, score=0.7303943619733093, total= 3.8min
[CV] C=0.03, penalty=l1 ..............................................
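Once the search finishes, the fitted object can be inspected. A minimal sketch, again on the `load_digits` stand-in with a hypothetical `C`-only grid rather than the notebook's own data: `cv_results_` loads directly into a pandas DataFrame, and because `refit=True` by default, the best model is retrained on all of the training split, so scoring it on the held-out validation split gives an estimate that played no part in choosing `C`.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in data; the notebook's X_train/X_valid would take its place
X_all, y_all = load_digits(return_X_y=True)
X_all = X_all / 16.0
X_tr, X_va, y_tr, y_va = train_test_split(
    X_all, y_all, test_size=0.2, random_state=1)

search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
search.fit(X_tr, y_tr)

# One row per candidate, sorted by rank of mean cross-validated accuracy
results = pd.DataFrame(search.cv_results_)
print(results[["param_C", "mean_test_score", "std_test_score", "rank_test_score"]]
      .sort_values("rank_test_score"))

# best_estimator_ was refit on all of X_tr; X_va was never seen during the search
print("best params:", search.best_params_)
print("CV accuracy:", round(search.best_score_, 4))
print("validation accuracy:", round(search.score(X_va, y_va), 4))
```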
