# Logistic Regression
There are two ways to train a logistic regression classifier in sklearn:
1. by using sklearn.linear_model.SGDClassifier(loss='log_loss')
   - trains with stochastic gradient descent, which makes it fast, flexible and well suited to large datasets
   - relatively more complicated, since it has more hyperparameters to tune
2. by using sklearn.linear_model.LogisticRegression()
   - simpler, easier-to-use estimator
   - less flexible, with less control over the optimisation
  
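There are two approaches to implementing a model in sklearn:
- Conventional approach: apply each preprocessing step and the model separately
- Pipeline: chain the preprocessing steps and the model into a single estimator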

## Dependencies

In [3]:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


## Data

### Generate data

In [6]:
X, y = make_classification(n_samples=500, n_features=5, random_state=42)

print(X.shape)
print(f"Range: [{np.min(X)}, {np.max(X)}]")
print(np.unique(y))

(500, 5)
Range: [-4.887584279073824, 3.778559708620231]
[0 1]


### Split data

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

print(X_train.shape)
print(X_test.shape)

(400, 5)
(100, 5)


### Scale data

In [10]:
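# define a scaler that maps each feature to the [0, 1] range
sc = MinMaxScaler()

# fit the scaler on the training set only, then transform the training set
X_train_scaled = sc.fit_transform(X_train)

# transform the test set with the minimum and maximum learned from the training set
X_test_scaled = sc.transform(X_test)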
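
To make the effect of this step concrete, the following self-contained sketch repeats the split and scaling and then inspects the result. It relies on the data_min_ and data_max_ attributes that MinMaxScaler stores after fitting. Because the scaler only sees the training set, the scaled test features are not guaranteed to stay inside [0, 1].

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

sc = MinMaxScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)

# per-feature minimum and maximum learned from the training set
print(sc.data_min_)
print(sc.data_max_)

# every training feature now spans [0, 1]
print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))

# test features are scaled with the training statistics,
# so they may fall slightly outside [0, 1]
print(X_test_scaled.min(), X_test_scaled.max())
```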

## Model

In [12]:
# define the logistic regression model
logreg = LogisticRegression()

# train the model
logreg.fit(X_train_scaled, y_train)

# evaluate the model
print(f"Training accuracy: {logreg.score(X_train_scaled, y_train):.4f}")
print(f"Test accuracy: {logreg.score(X_test_scaled, y_test):.4f}")

Training accuracy: 0.8975
Test accuracy: 0.8600
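
As a side note on what the fitted model computes, the sketch below rebuilds the predicted probability of class 1 by hand: a linear score from coef_ and intercept_, passed through the sigmoid function. It assumes a binary problem and the default LogisticRegression() settings, and it fits on unscaled data only to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=42)
logreg = LogisticRegression().fit(X, y)

# linear score z = X @ w + b for every sample
z = X @ logreg.coef_.ravel() + logreg.intercept_[0]

# the sigmoid maps the score to a probability of class 1
p_manual = 1.0 / (1.0 + np.exp(-z))

# probability of class 1 as reported by sklearn
p_sklearn = logreg.predict_proba(X)[:, 1]

# check whether the two agree up to floating-point tolerance
print(np.allclose(p_manual, p_sklearn))
```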


## Pipeline

### Pipeline definition

In [15]:
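# The pipeline can be used like any other estimator. The scaler is fitted
# only on the data passed to fit(), so test-set statistics never leak into
# the preprocessing. Note that this pipeline uses StandardScaler, whereas
# the conventional approach above used MinMaxScaler.
pipe = Pipeline([('scaler', StandardScaler()), ('clf', LogisticRegression())])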
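
One place where this property matters is cross-validation. The sketch below is a minimal example, assuming 5 folds (an arbitrary choice): cross_val_score clones and refits the whole pipeline for each fold, so the scaler is fitted on that fold's training part only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=5, random_state=42)
pipe = Pipeline([('scaler', StandardScaler()), ('clf', LogisticRegression())])

# the scaler and the classifier are refitted together on each training fold
scores = cross_val_score(pipe, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.4f}")

# after fitting, individual steps remain accessible by name
pipe.fit(X, y)
print(pipe.named_steps['scaler'].mean_)
print(pipe.named_steps['clf'].coef_)
```

Scaling the full dataset before cross-validating would instead let each validation fold influence the scaler, which is the leakage the pipeline is meant to prevent.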

### Training

In [17]:
pipe.fit(X_train, y_train)


### Evaluation

In [19]:
print(f"Training accuracy: {pipe.score(X_train, y_train):.4f}")
print(f"Test accuracy: {pipe.score(X_test, y_test):.4f}")

Training accuracy: 0.8975
Test accuracy: 0.8600
