<a href="https://colab.research.google.com/github/werowe/HypatiaAcademy/blob/master/ml/logistic_regression_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Log Odds

The **log odds** is the natural logarithm of the odds. It expresses a probability on a continuous scale that runs from $-\infty$ to $+\infty$. Let’s break this down using the example of the horse with odds of 9 to 1 against winning:

---

### **Step 1: Recall the Odds**
Odds are expressed as the ratio of the probability of an event happening to the probability of it not happening. Betting odds of 9 to 1 are quoted *against* the horse: it is expected to lose 9 times for every 1 win, so there are 10 outcomes in total:
- Probability of winning: $P(\text{Win}) = \frac{1}{10} = 0.1$
- Probability of losing: $P(\text{Lose}) = 1 - P(\text{Win}) = 0.9$

The odds of winning are calculated as:
$$
\text{Odds} = \frac{P(\text{Win})}{P(\text{Lose})} = \frac{0.1}{0.9} = \frac{1}{9}.
$$

---

### **Step 2: Log Odds**
The **log odds** is simply the natural logarithm ($\ln$) of the odds:
$$
\text{Log Odds} = \ln(\text{Odds}) = \ln\left(\frac{1}{9}\right).
$$

Using a calculator:
$$
\ln\left(\frac{1}{9}\right) = \ln(1) - \ln(9) \approx 0 - 2.197 = -2.197.
$$
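The arithmetic in Steps 1 and 2 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names (`p_win`, `p_lose`, `odds`, `log_odds`) are illustrative and do not appear elsewhere in this notebook.

```python
import math

# Odds of 9 to 1 against: 1 win for every 9 losses, 10 outcomes in total
p_win = 1 / 10
p_lose = 1 - p_win

odds = p_win / p_lose       # odds of winning, 1/9
log_odds = math.log(odds)   # math.log is the natural logarithm

print(round(odds, 4))       # 0.1111
print(round(log_odds, 3))   # -2.197
```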

---

### **Interpretation**
- A **negative log odds** ($-2.197$) indicates that the event (the horse winning) is less likely than its complement (the horse losing). The more negative the log odds, the less likely the event is.
- If the log odds were **positive**, the event would be more likely than its complement. A log odds of exactly 0 corresponds to odds of 1, meaning the event and its complement are equally likely.
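To see how the sign of the log odds tracks the probability, the sketch below evaluates the log odds at a few probabilities. The helper name `log_odds` and the chosen probabilities are assumptions made for illustration only.

```python
import math

def log_odds(p):
    """Natural log of the odds p / (1 - p), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

# Probabilities below 0.5 give negative log odds, above 0.5 positive
for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(log_odds(p), 3))
```

The values are symmetric around $p = 0.5$: the log odds for $p = 0.9$ is the negative of the log odds for $p = 0.1$.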

---

### **In Summary**
- Odds quantify how likely an event is relative to its complement.
- Log odds provide a continuous scale for measuring likelihood, which is useful in statistical modeling (e.g., logistic regression).


If the linear function in logistic regression is expressed as $mx + b$ (where $m$ is the slope and $b$ is the intercept), we can derive the **logistic regression function** step by step:

---

### **Step 1: Log Odds Formula**
The log odds (logit function) is defined as:
$$
\ln\left(\frac{p}{1-p}\right),
$$
where:
- $p$ is the probability of success,
- $1-p$ is the probability of failure.

In logistic regression, we assume that the log odds is a linear function of the predictor variable $x$:
$$
\ln\left(\frac{p}{1-p}\right) = mx + b,
$$
where:
- $m$ is the slope of the linear relationship,
- $b$ is the intercept.

---

### **Step 2: Solve for Odds**
Exponentiate both sides to remove the logarithm:
$$
\frac{p}{1-p} = e^{mx + b}.
$$

Here, $\frac{p}{1-p}$ represents the odds, now expressed as an exponential function of $x$.

---

### **Step 3: Solve for Probability $p$**
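Next, solve for $p$ (the probability of success). Writing $\text{Odds} = e^{mx + b}$, multiply both sides of $\frac{p}{1-p} = \text{Odds}$ by $1-p$ and collect the terms in $p$:
$$
p = \text{Odds}\,(1 - p) \;\Longrightarrow\; p\,(1 + \text{Odds}) = \text{Odds} \;\Longrightarrow\; p = \frac{\text{Odds}}{1 + \text{Odds}} = \frac{e^{mx + b}}{1 + e^{mx + b}}.
$$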
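Steps 1 to 3 can be traced numerically. The sketch below uses hypothetical values for $m$, $b$, and $x$ (they are not fitted to any data) and checks that converting $p$ back to log odds recovers $mx + b$.

```python
import math

# Hypothetical slope, intercept, and input, chosen only for illustration
m, b, x = 0.8, -2.0, 3.0

log_odds = m * x + b        # Step 1: log odds as a linear function of x
odds = math.exp(log_odds)   # Step 2: exponentiate to get the odds
p = odds / (1 + odds)       # Step 3: convert the odds to a probability

# Going back from p to the log odds should recover m*x + b
recovered = math.log(p / (1 - p))
print(p, log_odds, recovered)
```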

---

### **Step 4: Logistic Regression Function**
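Dividing the numerator and the denominator of the expression from Step 3 by $e^{mx + b}$ gives the usual form of the equation for the probability: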
$$
p = \frac{1}{1 + e^{-(mx + b)}}.
$$

This is the **logistic regression function** (also known as the logistic or sigmoid function), which maps any linear combination of predictors (in this case, $mx + b$) to a probability value strictly between 0 and 1.
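As a quick numerical check, the sketch below compares the Step 3 and Step 4 forms of the equation over a range of inputs. The function name `sigmoid` and the values of `m` and `b` are assumptions for illustration; only NumPy is required.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical slope and intercept, not fitted to any data
m, b = 0.8, -2.0
x = np.linspace(-10, 10, 9)
z = m * x + b

p_step3 = np.exp(z) / (1 + np.exp(z))   # form derived in Step 3
p_step4 = sigmoid(z)                    # form given in Step 4

# The two forms should agree, and every value should lie in (0, 1)
print(np.allclose(p_step3, p_step4))
print(np.all((p_step4 > 0) & (p_step4 < 1)))
```

For inputs of moderate size both checks should print `True`. For very large positive $z$ the Step 3 form can overflow in floating point, which is one practical reason the Step 4 form is usually preferred.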

---

### **Key Points**
- The log odds ($\ln(p/(1-p))$) are modeled as a linear function, here given by $mx + b$.
- Solving for $p$ gives us the logistic regression equation:
  $$
  p = \frac{1}{1 + e^{-(mx + b)}}.
  $$
- The sigmoid function keeps the predicted probabilities strictly between 0 and 1, which makes it well suited to binary classification problems.


## Google Sheets
All of this is calculated by hand in [this Google spreadsheet](https://docs.google.com/spreadsheets/d/1IVI6bVe33BRu9KdbJKwIw_k6IfD4_XjjBXfxNhBjiYw/edit?usp=sharing).

In [None]:
import numpy as np
from sklearn import linear_model

# Feature values 1..12; scikit-learn expects shape (n_samples, n_features)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
x1 = x.reshape(-1, 1)
# Binary labels: 0 for the two smallest x values, 1 for the rest
y = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# Fit a logistic regression model with an intercept (the b in mx + b)
reg = linear_model.LogisticRegression(fit_intercept=True)
reg.fit(x1, y)

# Predict the class of each training point and compare it with the actual label
for i in range(x.size):
    p = np.array(x[i]).reshape(1, -1)
    print("x[i]=", x[i], ", actual value=", y[i], ", predicted value=", reg.predict(p))


x[i]= 1 , actual value= 0 , predicted value= [0]
x[i]= 2 , actual value= 0 , predicted value= [0]
x[i]= 3 , actual value= 1 , predicted value= [1]
x[i]= 4 , actual value= 1 , predicted value= [1]
x[i]= 5 , actual value= 1 , predicted value= [1]
x[i]= 6 , actual value= 1 , predicted value= [1]
x[i]= 7 , actual value= 1 , predicted value= [1]
x[i]= 8 , actual value= 1 , predicted value= [1]
x[i]= 9 , actual value= 1 , predicted value= [1]
x[i]= 10 , actual value= 1 , predicted value= [1]
x[i]= 11 , actual value= 1 , predicted value= [1]
x[i]= 12 , actual value= 1 , predicted value= [1]
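To connect the fitted model back to the derivation, the sketch below reads the slope and intercept from the fitted estimator and evaluates $p = \frac{1}{1 + e^{-(mx + b)}}$ by hand. It refits the same data so that it runs on its own; the names `p_manual` and `p_sklearn` are illustrative, and the comparison assumes the binary case, where scikit-learn's `predict_proba` applies the logistic function to the linear score.

```python
import numpy as np
from sklearn import linear_model

# Same data as in the cell above
x = np.arange(1, 13)
x1 = x.reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

reg = linear_model.LogisticRegression(fit_intercept=True)
reg.fit(x1, y)

m = reg.coef_[0, 0]     # fitted slope
b = reg.intercept_[0]   # fitted intercept

# Probability of class 1 computed by hand from the logistic function
p_manual = 1.0 / (1.0 + np.exp(-(m * x + b)))

# Probability of class 1 as reported by scikit-learn
p_sklearn = reg.predict_proba(x1)[:, 1]
print(np.allclose(p_manual, p_sklearn))

# Thresholding the probability at 0.5 reproduces the predicted labels
labels_manual = (p_manual > 0.5).astype(int)
print(np.array_equal(labels_manual, reg.predict(x1)))
```

The exact values of `m` and `b` depend on the estimator's default regularization, so they are not quoted here.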
