# Logistic Regression

In [None]:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

## Logistic Regression

**Logistic Regression** is a statistical and machine learning method used for **binary classification**.  
It models the probability that an observation belongs to a particular class using the **logistic (sigmoid) function**.

### 1. Hypothesis (Sigmoid function)

The model predicts the probability that the target variable $y$ equals 1 given the input vector $x$:

$$
P(y = 1 \mid x) = \frac{1}{1 + e^{-(w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n)}}
$$

where  

* $w_0$ — bias (intercept),  
* $w_i$ — coefficient for feature $x_i$,  
* $e$ — base of the natural logarithm.

The expression inside the exponent is the **linear combination**:

$$
z = w_0 + \sum_{i=1}^{n} w_i x_i
$$

and the logistic (sigmoid) transformation maps $z$ to a probability in the open interval $(0, 1)$:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$
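
The two formulas above can be sketched in a few lines of NumPy. The weights and the observation below are made-up values chosen for illustration, not parameters of a fitted model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and one two-feature observation (illustrative only).
w0 = -1.0                    # bias (intercept)
w = np.array([2.0, -0.5])    # coefficients w_1, w_2
x = np.array([1.5, 2.0])     # features x_1, x_2

z = w0 + w @ x               # linear combination: -1 + 3 - 1 = 1
p = sigmoid(z)               # P(y = 1 | x)
print(f"z = {z:.2f}, P(y=1|x) = {p:.4f}")
```

This naive form can emit overflow warnings for very negative `z`; a library routine such as `scipy.special.expit` is the usual choice when that matters.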

### 2. Decision rule

The predicted class $\hat{y}$ is determined by thresholding the probability:

$$
\hat{y} =
\begin{cases}
1, & \text{if } P(y = 1 \mid x) \geq 0.5 \\
0, & \text{otherwise}
\end{cases}
$$
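
As a small illustration of the rule, the snippet below thresholds a handful of made-up probabilities. Because $\sigma(0) = 0.5$, the test $P(y = 1 \mid x) \geq 0.5$ is equivalent to $z \geq 0$, so the decision boundary is the hyperplane $z = 0$.

```python
import numpy as np

# Hypothetical predicted probabilities P(y = 1 | x) for five observations.
proba = np.array([0.10, 0.49, 0.50, 0.73, 0.95])

threshold = 0.5
# Class 1 when the probability reaches the threshold, class 0 otherwise.
y_hat = (proba >= threshold).astype(int)
print(y_hat)
```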

### 3. Cost function (Log-loss)

To train the model, we minimize the **logistic loss** (also known as cross-entropy loss):

$$
J(w) = -\frac{1}{N} \sum_{i=1}^{N}
\left[
y_i \ln(\hat{y}_i) + (1 - y_i)\ln(1 - \hat{y}_i)
\right]
$$

where  

* $N$ — number of training examples,  
* $\hat{y}_i = P(y_i = 1 \mid x_i)$ — model’s predicted probability.
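
To make the formula concrete, here is a minimal NumPy sketch of $J(w)$ evaluated on made-up labels and probabilities. The function name `binary_log_loss` and the clipping constant `eps` are assumptions added here to avoid $\ln(0)$; they are not part of the definition above.

```python
import numpy as np

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log() never receives exactly 0 or 1 (an added safeguard).
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Hypothetical labels and two sets of predictions.
y_demo = np.array([1, 0, 1, 0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])

loss_right = binary_log_loss(y_demo, mostly_right)
loss_wrong = binary_log_loss(y_demo, mostly_wrong)
print(f"loss (mostly right): {loss_right:.4f}")
print(f"loss (mostly wrong): {loss_wrong:.4f}")
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, which is what drives the model toward well-calibrated probabilities.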

### 4. Parameter optimization (Gradient Descent)

In theory, the model parameters $w$ can be found by minimizing $J(w)$ through **gradient descent**:

$$
w_j := w_j - \eta \, \frac{\partial J(w)}{\partial w_j}
$$

where  

* $\eta$ — learning rate,  
* $\frac{\partial J(w)}{\partial w_j}$ — gradient of the cost function w.r.t. parameter $w_j$.
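
For the log-loss with a sigmoid hypothesis, the gradient has the closed form $\frac{\partial J(w)}{\partial w_j} = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)\,x_{ij}$, with $x_{i0} = 1$ for the bias term. The sketch below applies the update rule with that gradient on synthetic data. The function name `fit_logistic_gd`, the learning rate, the iteration count, and the toy data are all illustrative assumptions, and no regularization is applied.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, eta=0.1, n_iters=2000):
    """Batch gradient descent on the mean log-loss (no regularization)."""
    n_samples = X.shape[0]
    Xb = np.hstack([np.ones((n_samples, 1)), X])  # column of ones for w_0
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        y_hat = sigmoid(Xb @ w)                   # current probabilities
        grad = Xb.T @ (y_hat - y) / n_samples     # dJ/dw for every w_j
        w -= eta * grad                           # w_j := w_j - eta * dJ/dw_j
    return w

# Synthetic, noisy two-feature data (illustrative only).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.5, size=200)
y_toy = (X_toy[:, 0] + X_toy[:, 1] + noise > 0).astype(float)

w_fit = fit_logistic_gd(X_toy, y_toy)
Xb_toy = np.hstack([np.ones((len(X_toy), 1)), X_toy])
accuracy = np.mean((sigmoid(Xb_toy @ w_fit) >= 0.5) == y_toy)
print("weights:", np.round(w_fit, 3))
print(f"training accuracy: {accuracy:.2f}")
```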

### 5. Optimization in `scikit-learn`

Rather than plain gradient descent, `scikit-learn`’s `LogisticRegression` offers several **optimization solvers**, selected with the `solver` parameter,  
that minimize the (by default L2-regularized) cost function. The main options are:

| **Solver** | **Method** | **Description** |
| ----------- | ----------- | --------------- |
| `liblinear` | Coordinate Descent | Works well for small datasets; supports L1 and L2 regularization. |
| `lbfgs` | Limited-memory quasi-Newton (L-BFGS) | Default solver; approximates curvature from recent gradients, so memory use stays modest; handles multiclass problems. |
| `newton-cg` | Newton conjugate gradient | Uses second-order (Hessian) information through Hessian-vector products. |
| `saga` | Stochastic average gradient variant (SAGA) | Scales to very large datasets; supports L1, L2, and ElasticNet penalties. |
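
As a rough illustration, the solvers in the table can be swapped through the `solver` argument. The sketch below fits each one on a small synthetic dataset and compares training accuracy; the dataset settings and `max_iter=1000` are assumptions chosen so that every solver converges comfortably, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem (illustrative settings).
X_demo, y_demo = make_classification(
    n_samples=200,
    n_features=2,
    n_redundant=0,
    n_informative=2,
    n_clusters_per_class=1,
    random_state=42,
)

scores = {}
for solver in ["liblinear", "lbfgs", "newton-cg", "saga"]:
    clf = LogisticRegression(solver=solver, max_iter=1000)
    clf.fit(X_demo, y_demo)
    scores[solver] = clf.score(X_demo, y_demo)  # training accuracy
    print(f"{solver:>10}: accuracy = {scores[solver]:.3f}")
```

On a problem this small the solvers should land on nearly the same solution; the differences in the table matter mainly for large datasets and for which penalties are available.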

### Summary

Logistic Regression transforms a **linear combination of inputs** into a **probability** via the sigmoid function.  
It is simple, interpretable, and effective when the classes are approximately **linearly separable** in the feature space.  
With advanced solvers, `scikit-learn` implementations are **numerically stable**, **efficient**, and support **regularization** for better generalization.

In [None]:
X, y = make_classification(
    n_samples=200,
    n_features=2,
    n_redundant=0,
    n_informative=2,
    n_clusters_per_class=1,
    random_state=42
)

plt.scatter(X[:,0], X[:,1], c=y)
plt.show()

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

In [None]:
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print("Classification report:")
print(classification_report(y_test, y_pred))

In [None]:
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt="d", cmap="crest")  # integer counts
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()

## Logistic Regression on the Two Moons Dataset

In [None]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.datasets import make_moons

In [None]:
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

plt.scatter(X[:,0], X[:,1], c=y)
plt.show()

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

poly = PolynomialFeatures(degree=3)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

In [None]:
model = LogisticRegression()
model.fit(X_train_poly, y_train)
y_pred = model.predict(X_test_poly)

In [None]:
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print("Classification report:")
print(classification_report(y_test, y_pred))

In [None]:
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt="d", cmap="crest")  # integer counts
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()