# Logistic Regression II Code Appendix

## Logistic Regression with Different Hyperparameters

When building a logistic regression model, several hyperparameters control how the model is fit and can improve its performance. The most commonly tuned ones are listed below with a brief description of each.

Hyperparameters:

- penalty: Type of regularization applied to the model. Note that regularization is applied by default ('l2'), and the penalties available depend on the chosen solver.
- tol: Tolerance for the stopping criterion, with default = 1e-4. The solver stops once its convergence measure falls below this value.
- C: Inverse of regularization strength, with default = 1.0. Smaller values specify stronger regularization.
- solver: Algorithm to use in the optimization problem, with default = 'lbfgs'.
- max_iter: Maximum number of iterations allowed for the solver to converge, with default = 100.

Resource: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
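
To make the role of C concrete, here is a minimal sketch that fits the same model at three values of C and compares the size of the learned coefficients. The synthetic dataset and the names `X_demo`, `y_demo`, and `coef_norms` are assumptions for illustration only and do not come from the original material.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (hypothetical, for illustration only)
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     n_informative=5, random_state=0)

# Fit the default L2-penalized model with strong, default, and weak regularization
coef_norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, solver='lbfgs', max_iter=1000)
    model.fit(X_demo, y_demo)
    coef_norms[C] = np.linalg.norm(model.coef_)

# Smaller C means stronger regularization, so the coefficient norm shrinks
for C, norm in coef_norms.items():
    print(f"C={C:<6} coefficient norm = {norm:.3f}")
```

The coefficient norm should grow as C increases, which is one quick way to check that C behaves as the inverse of regularization strength.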

Python Code:
    
``` Python
# Import dependencies
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, confusion_matrix, accuracy_score

# Setting hyperparameters for the model
penalty = 'l1'         # regularization penalty
tolerance = 1e-2       # tolerance for the stopping criterion
strength = 10          # C: inverse of regularization strength (larger = weaker)
solver = 'liblinear'   # optimization algorithm; supports the 'l1' penalty
max_iter = 100         # max number of iterations
multi_class = 'ovr'    # one-vs-rest; optional for a binary problem (deprecated since scikit-learn 1.5)

# Creating the classifier object
clf = LogisticRegression(penalty=penalty,
                         tol=tolerance,
                         C=strength,
                         solver=solver,
                         max_iter=max_iter,
                         multi_class=multi_class)

# Fitting the data to the model
clf.fit(X_train, y_train)

# Predict the positive class when its probability exceeds 0.6
# (a custom threshold; clf.predict would use 0.5)
prob_hat = clf.predict_proba(X_train)
y_hat = (prob_hat[:, 1] > 0.6).astype(int)

# Log loss of the predicted probabilities on the training data
log_loss(y_train, prob_hat)

# Accuracy score on the training data
accuracy_score(y_train, y_hat)

# Confusion matrix on the training data
confusion_matrix(y_train, y_hat)
```
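
The snippet above assumes `X_train` and `y_train` already exist and applies a 0.6 probability threshold. As a rough, self-contained sketch of what that threshold changes, the example below compares predictions at 0.5 and 0.6 on synthetic data. The dataset and the names `clf_demo`, `pos_prob`, `pred_default`, and `pred_strict` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data with labels 0 and 1 (hypothetical, for illustration only)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
clf_demo = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Probability of the positive class (column 1 of predict_proba)
pos_prob = clf_demo.predict_proba(X_demo)[:, 1]

# Same probabilities, two different decision thresholds
pred_default = (pos_prob > 0.5).astype(int)
pred_strict = (pos_prob > 0.6).astype(int)

print("Positives predicted at threshold 0.5:", pred_default.sum())
print("Positives predicted at threshold 0.6:", pred_strict.sum())
print("Agreement between threshold 0.5 and clf.predict:",
      np.mean(pred_default == clf_demo.predict(X_demo)))
```

Raising the threshold can only reduce the number of positive predictions, which typically trades recall for precision.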

## Evaluate the Model with Cross-Validation

Resource: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html

Python Code:

``` Python
# Import dependencies
from sklearn.model_selection import cross_validate

# Run 5-fold cross-validation on the classifier
cv_results = cross_validate(estimator=clf,
                           X=X_train,
                           y=y_train,
                           cv=5,
                           scoring=['accuracy'],
                           return_train_score=True)

# Extract train accuracy from cv_results
cv_results['train_accuracy']

# Extract test (held-out fold) accuracy from cv_results;
# with scoring passed as a list, the key is 'test_accuracy', not 'test_score'
cv_results['test_accuracy']
```
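
When `scoring` is passed as a list, the keys of the returned dictionary follow the pattern `train_<metric>` and `test_<metric>`. The sketch below illustrates this on synthetic data with two metrics; the dataset and the name `cv_demo` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic binary data (hypothetical, for illustration only)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# 5-fold cross-validation with two metrics and train scores included
cv_demo = cross_validate(estimator=LogisticRegression(max_iter=1000),
                         X=X_demo,
                         y=y_demo,
                         cv=5,
                         scoring=['accuracy', 'neg_log_loss'],
                         return_train_score=True)

# List the available keys, then summarize accuracy across the folds
print(sorted(cv_demo.keys()))
for name in ('train_accuracy', 'test_accuracy'):
    print(f"{name}: mean={cv_demo[name].mean():.3f}, std={cv_demo[name].std():.3f}")
```

A large gap between the mean train and test accuracy is a common sign of overfitting, which stronger regularization (a smaller C) may reduce.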

## Logistic Regression Model for Multi-Class Problem

Python Code:

``` Python
# Import dependencies
from sklearn.linear_model import LogisticRegression

# Multi-class classifier
clf = LogisticRegression(solver='saga',
                         multi_class='multinomial')

# Fitting the data into the model
clf.fit(X_train, y_train)
```

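Note: When solving multi-class problems with logistic regression, select a solver that supports the multinomial loss. According to the scikit-learn documentation, 'lbfgs', 'newton-cg', 'sag', and 'saga' handle the multinomial loss, while 'liblinear' is limited to one-versus-rest schemes. For 'sag' and 'saga', fast convergence is only guaranteed when the features are on approximately the same scale.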
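
As a minimal end-to-end sketch of a multi-class fit, the example below trains on the three-class iris dataset bundled with scikit-learn. The dataset choice and the names `multi_clf`, `X_tr`, `X_te`, `y_tr`, `y_te`, and `proba` are assumptions for illustration. The `multi_class` argument is left out here on the assumption that a recent scikit-learn version is installed, where 'lbfgs' uses the multinomial loss for multi-class targets by default.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Three-class dataset bundled with scikit-learn (chosen for illustration)
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_iris, y_iris, test_size=0.3,
                                          random_state=0, stratify=y_iris)

# 'lbfgs' supports the multinomial loss; max_iter is raised to help convergence
multi_clf = LogisticRegression(solver='lbfgs', max_iter=1000)
multi_clf.fit(X_tr, y_tr)

# predict_proba returns one column per class, and each row sums to 1
proba = multi_clf.predict_proba(X_te)
print("Probability matrix shape:", proba.shape)
print("Rows sum to 1:", np.allclose(proba.sum(axis=1), 1.0))
print("Test accuracy:", multi_clf.score(X_te, y_te))
```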