# Logistic Regression
Logistic Regression is a statistical method for binary classification. It models the log-odds of the positive class as a linear combination of one or more predictor variables, then passes that value through the sigmoid function to estimate the probability that an instance belongs to the class.
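
As a minimal sketch of how a prediction is formed, the snippet below computes a linear score and maps it to a probability with the sigmoid function. The weights, bias, and feature values are hypothetical and chosen only for illustration; they are not fitted to any dataset.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds value z to a probability in (0, 1)."""
    # Two branches keep exp() from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one instance."""
    # Linear part: the log-odds of the positive class.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients and one instance (illustration only).
weights = [0.8, -0.5]
bias = 0.1
p = predict_proba(weights, bias, [1.0, 2.0])

# Classify as positive when the probability reaches 0.5.
label = int(p >= 0.5)
print(f"probability={p:.3f}, label={label}")
```

Here the log-odds are 0.1 + 0.8 - 1.0 = -0.1, so the probability falls just below 0.5 and the instance is assigned to the negative class.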

## Advantages
- Simple to Implement: Easy to understand and implement.
- Efficient: Fast to train and predict, and performs well when the classes are approximately linearly separable.
- Probabilistic Predictions: Outputs probabilities, which can be useful in many applications.
- Feature Importance: Coefficients can be interpreted to understand the impact of features.
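
To illustrate how coefficients can be read, the sketch below converts them into odds ratios: exponentiating a coefficient gives the factor by which the odds of the positive class change when that feature increases by one unit, holding the others fixed. The feature names and coefficient values are hypothetical, not taken from a fitted model.

```python
import math

# Hypothetical coefficients, assumed to come from a model fitted on
# standardized features (illustration only).
coefficients = {
    "radius_mean": 1.2,
    "texture_mean": 0.4,
    "smoothness_mean": -0.3,
}

# exp(coefficient) is the multiplicative change in the odds per unit increase.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

# List features from the strongest to the weakest effect (by |coefficient|).
for name in sorted(coefficients, key=lambda n: abs(coefficients[n]), reverse=True):
    print(f"{name}: coefficient={coefficients[name]:+.2f}, odds ratio={odds_ratios[name]:.2f}")
```

An odds ratio above 1 means the feature raises the odds of the positive class, and a value below 1 means it lowers them. Comparing magnitudes across features is only meaningful when the features share a scale.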

## Disadvantages
- Assumes Linearity: Assumes a linear relationship between the independent variables and the log-odds of the dependent variable.
- Not Suitable for Complex Relationships: Struggles with non-linear relationships unless feature engineering is applied.
- Overfitting: Can overfit high-dimensional data, especially when features outnumber samples, unless regularization is applied.

## Use Cases
- Medical Diagnosis: Predicting the probability of a patient having a particular disease.
- Credit Scoring: Assessing the likelihood of a borrower defaulting on a loan.
- Marketing: Predicting whether a customer will purchase a product.
- Fraud Detection: Identifying potentially fraudulent transactions.

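## Scaling (necessary)
scikit-learn's LogisticRegression applies regularization by default, so features should be scaled. Without scaling, the penalty weighs coefficients unevenly across features measured on different scales, and gradient-based solvers converge more slowly.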
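
One way to apply scaling safely during cross-validation is to place the scaler inside a `Pipeline`, so it is refit on the training part of each fold only. The sketch below assumes scikit-learn's built-in breast cancer dataset as a stand-in for the CSV file used later. The step names `scaler` and `clf` are arbitrary labels.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in dataset stands in for 'Breast_Cancer.csv'.
X, y = load_breast_cancer(return_X_y=True)

# The scaler is fitted inside each cross-validation fold, so statistics
# from the validation part never influence the scaling.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"Mean 5-fold accuracy: {mean_accuracy:.3f}")
```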

## Encoding (necessary)
Categorical features must be encoded as numerical values (for example, one-hot encoded) before the model is fitted.
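
As a small illustration of encoding, the sketch below one-hot encodes a categorical column and scales a numeric one with a `ColumnTransformer`. The toy DataFrame and its column names (`age`, `city`) are hypothetical and unrelated to the breast cancer data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with one numeric and one categorical feature.
toy = pd.DataFrame({
    "age": [25, 40, 33, 51],
    "city": ["Paris", "Rome", "Paris", "Oslo"],
})

# Scale the numeric column and one-hot encode the categorical column.
# sparse_threshold=0 forces a dense array so the result is easy to inspect.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,
)

encoded = preprocess.fit_transform(toy)
print(encoded.shape)  # one scaled column plus one column per distinct city
```

Setting `handle_unknown="ignore"` makes the encoder emit all zeros for categories that were not seen during fitting, rather than raising an error at prediction time.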

# Import libraries

In [12]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from scipy.stats import uniform

# Read data

In [13]:
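# Load the dataset and separate the features from the target column
df = pd.read_csv('Breast_Cancer.csv')
x = df.drop('diagnosis', axis=1)  # feature matrix
y = df['diagnosis']               # target labels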

In [14]:
# Hold out 20% of the rows as a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Scale data

In [15]:
# Fit the scaler on the training set only, then apply the same transform to the test set
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Train

## Grid Search

In [16]:
# One estimator per penalty, each with a solver that supports it
# (the default 'lbfgs' solver handles only 'l2'); l1_ratio=0.5 is an example value
logReg_l1 = LogisticRegression(penalty='l1', solver='liblinear')
logReg_l2 = LogisticRegression(penalty='l2')
logReg_elasticnet = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5)

# Grid used below; every listed solver supports the L2 penalty
params = {
    'C': [0.1, 1, 10, 100],
    'solver': ['newton-cg', 'lbfgs', 'liblinear']
}

# Wider alternative grid (not used below)
param_grid = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
    'fit_intercept': [True, False],
    'class_weight': [None, 'balanced']
}

# 5-fold cross-validated grid search over `params` with the L2 model
grid_search = GridSearchCV(logReg_l2, params, scoring='accuracy', cv=5, n_jobs=-1)

# Train the grid search
grid_search.fit(x_train, y_train)

In [17]:
print("Best Hyperparameter Index:", grid_search.best_index_)
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Score:", grid_search.best_score_)

Best Hyperparameter Index: 2
Best Hyperparameters: {'C': 0.1, 'solver': 'liblinear'}
Best Cross-Validated Score: 0.9802197802197803


In [18]:
# Get the model with best hyperparameters
model = grid_search.best_estimator_
y_pred = model.predict(x_test)
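
The grid search in this section varies only `C` and `solver` for an L2-penalized model. If the penalty is to be searched as well, one option is to pass a list of dictionaries so that each penalty is paired only with solvers that support it. The sketch below assumes scikit-learn's built-in breast cancer dataset in place of the CSV, a scikit-learn version that accepts the `penalty` argument, and a deliberately small grid. The name `compatible_grid` is illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in dataset stands in for 'Breast_Cancer.csv'.
X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    # A generous iteration budget helps the 'saga' solver converge.
    ("clf", LogisticRegression(max_iter=5000)),
])

# Each dictionary is expanded separately, so invalid penalty/solver
# combinations are never generated.
compatible_grid = [
    {"clf__penalty": ["l2"], "clf__solver": ["lbfgs", "liblinear"], "clf__C": [0.1, 1, 10]},
    {"clf__penalty": ["l1"], "clf__solver": ["liblinear", "saga"], "clf__C": [0.1, 1, 10]},
]

search = GridSearchCV(pipe, compatible_grid, scoring="accuracy", cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```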

## Randomized Search

In [19]:
# One estimator per penalty, each with a solver that supports it
# (the default 'lbfgs' solver handles only 'l2'); l1_ratio=0.5 is an example value
logReg_l1 = LogisticRegression(penalty='l1', solver='liblinear')
logReg_l2 = LogisticRegression(penalty='l2')
logReg_elasticnet = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5)

# Distributions used below: C is drawn uniformly from [0, 4]
params = {
    'C': uniform(loc=0, scale=4),
    'solver': ['newton-cg', 'lbfgs', 'liblinear']
}

# Wider alternative (not used below): C is drawn uniformly from [0, 10]
param_dist = {
    'C': uniform(loc=0, scale=10),
    'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
    'fit_intercept': [True, False],
    'class_weight': [None, 'balanced']
}

# Sample 10 candidates from `params` and score each with 5-fold cross-validation
random_search = RandomizedSearchCV(logReg_l2, params, scoring='accuracy', n_iter=10, cv=5, n_jobs=-1, random_state=42)

# Train the random search
random_search.fit(x_train, y_train)

In [22]:
print("Best Hyperparameter Index:", random_search.best_index_)
print("Best Hyperparameters:", random_search.best_params_)
print("Best Cross-Validated Score:", random_search.best_score_)

Best Hyperparameter Index: 0
Best Hyperparameters: {'C': 1.49816047538945, 'solver': 'newton-cg'}
Best Cross-Validated Score: 0.9780219780219781
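
A uniform distribution spends most of its samples on larger values of `C`. Since regularization strength is usually explored across orders of magnitude, a log-uniform distribution is a common alternative. The sketch below is one possible variant, not the search run above: it assumes the built-in breast cancer dataset in place of the CSV, and the range from 1e-3 to 1e3 is an arbitrary example.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in dataset stands in for 'Breast_Cancer.csv'.
X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Draw C so that each order of magnitude between 1e-3 and 1e3 is equally likely.
param_dist = {"clf__C": loguniform(1e-3, 1e3)}

search = RandomizedSearchCV(
    pipe, param_dist, n_iter=10, cv=5, scoring="accuracy", random_state=42
)
search.fit(X, y)

best_C = search.best_params_["clf__C"]
print(f"Best C: {best_C:.4g}, CV accuracy: {search.best_score_:.3f}")
```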


In [23]:
model = random_search.best_estimator_
y_pred = model.predict(x_test)
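
The predictions in `y_pred` are not yet compared with the true labels. A minimal evaluation sketch follows. It rebuilds the split, scaling, and model on scikit-learn's built-in breast cancer dataset, which is an assumption standing in for the CSV, so the numbers it prints are not the notebook's results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in dataset stands in for 'Breast_Cancer.csv'.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale with statistics learned from the training set only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(C=0.1, solver="liblinear").fit(X_train, y_train)
y_pred = model.predict(X_test)

# Compare predictions with the held-out labels.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

print(f"Test accuracy: {accuracy:.3f}")
print("Confusion matrix:")
print(cm)
print(classification_report(y_test, y_pred))
```

For tasks such as medical diagnosis, the per-class recall in the classification report is often more informative than overall accuracy, because the two kinds of error carry different costs.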

## Train LogisticRegression without search

In [21]:
# Fit a single model with the best hyperparameters found by the grid search
model = LogisticRegression(C=0.1, solver='liblinear')
model.fit(x_train, y_train)
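
Once a model is fitted, `predict_proba` returns class probabilities, which allows a decision threshold other than the default of 0.5. The sketch below again assumes the built-in breast cancer dataset in place of the CSV. The threshold of 0.3 is a hypothetical value chosen only to show the mechanics.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in dataset stands in for 'Breast_Cancer.csv'.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(C=0.1, solver="liblinear").fit(X_train, y_train)

# One row per sample, one column per class, ordered as in model.classes_.
proba = model.predict_proba(X_test)
positive_proba = proba[:, 1]  # probability of class 1

# Hypothetical threshold: label a sample as class 1 at 0.3 instead of 0.5.
threshold = 0.3
custom_pred = (positive_proba >= threshold).astype(int)

print("Classes:", model.classes_)
print("First five probabilities of class 1:", np.round(positive_proba[:5], 3))
print("Samples labelled 1 at threshold 0.3:", int(custom_pred.sum()))
```

Lowering the threshold labels more samples as class 1, trading precision for recall on that class.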