# Logistic Regression Definition
Logistic regression is a statistical method used for binary classification tasks. It models the probability of a binary outcome (1/0, True/False, Yes/No) based on one or more independent variables (features). The logistic regression model uses the logistic function (also known as the sigmoid function) to map a linear combination of features to a value between 0 and 1, which represents the probability of the binary outcome.

The logistic function can be expressed as:

\[
P(Y=1) = \frac{1}{1 + e^{-(\theta^T X)}}
\]

Where:
- \(P(Y=1)\) is the probability of the positive class.
- \(\theta\) is the parameter vector.
- \(X\) is the feature vector.

During training, the model learns the optimal values of \(\theta\) by minimizing a cost function, typically the log loss (cross-entropy), to make accurate predictions. Logistic regression is widely used in various applications, including spam detection, medical diagnosis, and sentiment analysis.


In [210]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(context='notebook', style='darkgrid',palette='dark', font_scale=1.2)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression as LogReg
from sklearn.metrics import classification_report
%matplotlib inline

In [211]:
df=pd.read_csv("./logistic-reg-sampleDataset.csv")
df.head()

Unnamed: 0,Feature1,Feature2,Target
0,2.5,1.2,0
1,3.9,2.8,0
2,1.4,1.3,1
3,4.6,3.5,0
4,3.1,2.4,1


In [212]:
# Extracting features and target-varialbes from the dataset
m=df.shape[0]
X=df.iloc[:,:-1].values
y=df.iloc[:,-1].values.reshape((m,1))
X[:5],y[:5]

(array([[2.5, 1.2],
        [3.9, 2.8],
        [1.4, 1.3],
        [4.6, 3.5],
        [3.1, 2.4]]),
 array([[0],
        [0],
        [1],
        [0],
        [1]], dtype=int64))

In [213]:
# Spliting Data for Training and testing
X_train, X_test,y_train,y_test=train_test_split(X,y,test_size=0.2)
X_train.shape,X_test.shape,y_train.shape,y_test.shape

((40, 2), (10, 2), (40, 1), (10, 1))

In [214]:
class LogisticRegression:
    def __init__(self, learning_rate=0.1, num_iters=100):
        self.learning_rate = learning_rate
        self.num_iters = num_iters

    def fit(self, X, y):
        def updateThetas(X,y):
            y_pred = self.predict(X)
            error = y_pred-y

            # Calculating Gradient
            gradient = (1/m)*(X.T.dot(error))

            # Updating thetas
            self.theta -= self.learning_rate*gradient

        m, n = X.shape
        if y.shape[0] != m:
            raise ValueError(
                f"Training and testing data should have same shape\nX-shape{x.shape} & y-shape{y.shape}")

        # Initializing thetas
        self.theta = np.random.rand(n+1).reshape((n+1, 1))

        # Inserting vector with all 1's for intercept
        X = np.append(np.ones((m, 1)), X, axis=1)

        # storing no of features as model parameter as it would be helpfull
        self.__no_of_features = (n+1)
        for _ in range(self.num_iters):
            updateThetas(X,y)

    def predict(self, X):
        m, n = X.shape
        # Checking if the dataset and training dataset are same or not as we have already appended this vector in training dataset and thus we have saved no of features also
        if n != self.__no_of_features:
            X = np.append(np.ones((m, 1)), X, axis=1)

        sig = np.ones((m, 1))/(1+np.exp(-(X @ self.theta)))
        y_pred = np.zeros((m, 1))
        y_pred[sig >= 0.5] = 1
        return y_pred


In [215]:
# My Models prediction
model=LogisticRegression()
model.fit(X_train,y_train)
y_pred=model.predict(X_test)
y_pred.ravel(),model.theta

(array([0., 1., 1., 1., 1., 1., 0., 1., 1., 1.]),
 array([[ 0.48881796],
        [ 0.09160497],
        [-0.30064862]]))

In [216]:
# sk-learns Model Prediction->
sk_mod=LogReg()
sk_mod.fit(X_train,y_train.ravel())
sk_pred=sk_mod.predict(X_test)
sk_pred

array([0, 0, 0, 0, 1, 1, 0, 1, 1, 1], dtype=int64)

In [217]:
# Classification-reports for both models
my_report=classification_report(y_test,y_pred)
sk_report=classification_report(y_test,sk_pred)

print(f"My Report:\n{my_report}\n")
print(f"sk-learns Report:\n{sk_report}")

My Report:
              precision    recall  f1-score   support

           0       1.00      0.50      0.67         4
           1       0.75      1.00      0.86         6

    accuracy                           0.80        10
   macro avg       0.88      0.75      0.76        10
weighted avg       0.85      0.80      0.78        10


sk-learns Report:
              precision    recall  f1-score   support

           0       0.60      0.75      0.67         4
           1       0.80      0.67      0.73         6

    accuracy                           0.70        10
   macro avg       0.70      0.71      0.70        10
weighted avg       0.72      0.70      0.70        10



`Since this is not a really good-dataset to test logistic regression the values and acurracy fluctuate in every run. :(`