# **Logistic regression** is a supervised machine learning algorithm used mainly for classification tasks, where the goal is to predict the probability that an instance belongs to a given class.


*   It is a powerful tool for decision-making.
*   It is a statistical algorithm that analyzes the relationship between a set of independent variables and a binary dependent variable.
*   It is referred to as regression because it takes the output of a linear regression function as input and applies a sigmoid function to estimate the probability of the given class.
*   For example, it can predict whether an email is spam or not.


*Note:* The difference between linear regression and logistic regression is that linear regression outputs a continuous value that can be any real number, while logistic regression predicts the probability that an instance belongs to a given class.
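
To make the note above concrete, here is a minimal sketch that fits both models on the same data and compares their outputs. The toy dataset (hours studied versus pass/fail) and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: hours studied (feature) and pass/fail outcome (label).
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5],
                  [3.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

linear = LinearRegression().fit(hours, passed)
logistic = LogisticRegression().fit(hours, passed)

# Evaluate both models on inputs inside and outside the training range.
new_hours = np.array([[0.0], [2.75], [10.0]])

linear_out = linear.predict(new_hours)                   # unbounded real values
logistic_out = logistic.predict_proba(new_hours)[:, 1]   # estimated P(passed = 1)

print("Linear regression output:   ", linear_out)
print("Logistic regression P(pass):", logistic_out)
```

On this data the linear model returns a value below 0 for 0 hours and a value above 1 for 10 hours, so its output cannot be read as a probability. The logistic model's output always stays between 0 and 1.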

**Types of Logistic Regression:**

Based on the categories of the dependent variable, logistic regression can be classified into three types:

*Binomial:* In binomial logistic regression, the dependent variable can take only two possible values, such as 0 or 1, Pass or Fail, etc.

*Multinomial:* In multinomial logistic regression, the dependent variable can take 3 or more possible unordered values, such as “cat”, “dog”, or “sheep”.

*Ordinal:* In ordinal logistic regression, the dependent variable can take 3 or more possible ordered values, such as “Low”, “Medium”, or “High”.
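
As an illustration of the multinomial case, the sketch below fits scikit-learn's `LogisticRegression` on the three-class iris dataset. The dataset choice, the split parameters, and `max_iter=1000` are assumptions made for this example. Ordinal logistic regression is not covered by this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Iris has three unordered classes (0, 1, 2), so this is a multinomial problem.
iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0, stratify=iris.target
)

# max_iter is raised so the solver has room to converge on unscaled features.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# One probability per class for each sample; each row sums to 1.
probs = clf.predict_proba(X_te[:3])
row_sums = probs.sum(axis=1)

print("Classes:", clf.classes_)
print("Class probabilities:\n", probs.round(3))
print("Row sums:", row_sums)
```

The predicted class for each sample is the one with the highest probability in its row.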

**Logistic Function (Sigmoid Function):**


*   The sigmoid function is a mathematical function used to map predicted values to probabilities.
*   It maps any real value to a value between 0 and 1.
*   The output of logistic regression must stay between 0 and 1 and cannot go beyond these limits, so the function forms an S-shaped curve.
*   This S-shaped curve is called the sigmoid function or the logistic function.

In logistic regression, we use a threshold value to convert the predicted probability into a class label of 0 or 1: probabilities above the threshold are assigned to class 1, and probabilities below it are assigned to class 0.
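
The sigmoid function can be written as `sigmoid(z) = 1 / (1 + exp(-z))`. A minimal NumPy sketch of the function and the thresholding step follows. The input values and the threshold of 0.5 are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real value to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Example inputs: large negative values map near 0, large positive values near 1.
z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
probabilities = sigmoid(z)

# Assumed threshold; probabilities at or above it are labeled 1, the rest 0.
threshold = 0.5
labels = (probabilities >= threshold).astype(int)

print("Probabilities:", probabilities.round(4))
print("Labels:", labels)  # [0 0 1 1 1]
```

An input of 0 maps to exactly 0.5, the midpoint of the S-shaped curve. Whether a probability equal to the threshold counts as class 1 or class 0 is a convention. This sketch assigns it to class 1.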


**Use cases of logistic regression**

Logistic regression is commonly used for prediction and classification problems. Some of these use cases include:

*Fraud detection:* Logistic regression models can help teams identify data anomalies that are predictive of fraud. Certain behaviors or characteristics may have a higher association with fraudulent activities, which is particularly helpful to banks and other financial institutions in protecting their clients. SaaS-based companies have also started to adopt these practices to eliminate fake user accounts from their datasets when conducting data analysis around business performance.

*Disease prediction:* In medicine, this analytics approach can be used to predict the likelihood of disease or illness for a given population. Healthcare organizations can set up preventative care for individuals who show a higher propensity for specific illnesses.

*Churn prediction:* Specific behaviors may be indicative of churn in different functions of an organization. For example, human resources and management teams may want to know if there are high performers within the company who are at risk of leaving the organization; this type of insight can prompt conversations to understand problem areas within the company, such as culture or compensation. Alternatively, the sales organization may want to learn which of their clients are at risk of taking their business elsewhere. This can prompt teams to set up a retention strategy to avoid lost revenue.

```python
# Import libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

# Load the breast cancer dataset (binary target: 0 = malignant, 1 = benign)
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X, y = data.data, data.target

# Split the data into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the model
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)

# Predict class labels for the test set
y_pred = model.predict(X_test)

# Evaluate the predictions
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Display results
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)
```

```
Confusion Matrix:
 [[41  2]
 [ 1 70]]
Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
```
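
The example above reports only hard class labels. The sketch below refits the same pipeline so that it runs on its own. It then shows that the predicted probabilities are the sigmoid of the model's linear output, and that the threshold can be changed. The stricter threshold of 0.7 is a hypothetical value chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the same pipeline as above so this block is self-contained.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LogisticRegression(random_state=42).fit(X_train, y_train)

# Linear part of the model: weighted sum of the features plus the intercept.
z = X_test @ model.coef_.ravel() + model.intercept_[0]

# Applying the sigmoid by hand reproduces the probability of class 1.
manual_proba = 1.0 / (1.0 + np.exp(-z))
sklearn_proba = model.predict_proba(X_test)[:, 1]
print("Manual sigmoid matches predict_proba:", np.allclose(manual_proba, sklearn_proba))

# Compare the usual 0.5 threshold with a stricter, hypothetical one.
strict_threshold = 0.7
y_pred_default = (sklearn_proba >= 0.5).astype(int)
y_pred_strict = (sklearn_proba >= strict_threshold).astype(int)
print("Predicted as class 1 at 0.5:", y_pred_default.sum())
print("Predicted as class 1 at 0.7:", y_pred_strict.sum())
```

Raising the threshold can only keep or reduce the number of samples assigned to class 1. Which threshold is appropriate depends on the relative cost of false positives and false negatives in the application.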

