<a href="https://colab.research.google.com/github/werowe/HypatiaAcademy/blob/master/ml/logistic_regression_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gambling Odds

[Europa League Betting Odds & Fixtures](https://www.oddsportal.com/football/europe/europa-league/)


| Column Heading      | Example Value   | Meaning                                                                 |
|---------------------|----------------|-------------------------------------------------------------------------|
| 01 May 2025 - Play Offs | —          | The date and round of the match (Play Offs on 01 May 2025).             |
| 22:00               | 22:00          | Kick-off time of the match.                                             |
| Ath Bilbao          | Ath Bilbao     | Home team name.                                                         |
| –                   | –              | Separator between home and away teams.                                  |
| Manchester Utd      | Manchester Utd | Away team name.                                                         |
| 1                   | 1.98           | Odds for a home win (Ath Bilbao to win in regular time).                |
| X                   | 3.45           | Odds for a draw (the match ends level after regular time).              |
| 2                   | 3.85           | Odds for an away win (Manchester Utd to win in regular time).           |
| B's                 | 20             | Number of bookmakers currently offering odds for this match.             |



If the decimal odds are **1.98**, the implied probability is calculated as:

$$
\text{Implied Probability} = \frac{1}{\text{Decimal Odds}}
$$

So,

$$
\text{Implied Probability} = \frac{1}{1.98} \approx 0.505
$$

To express this as a percentage:

$$
0.505 \times 100 = 50.5\%
$$

**Therefore, decimal odds of 1.98 imply a probability of approximately 50.5%.**
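
The conversion above can be sketched in a few lines of Python. The function name `implied_probability` is an illustrative choice, not something taken from a betting library, and the result still includes the bookmaker's margin.

```python
def implied_probability(decimal_odds: float) -> float:
    """Return the probability implied by decimal odds (bookmaker margin included)."""
    # Decimal odds include the stake, so meaningful values are above 1.
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


p = implied_probability(1.98)
print(f"{p:.3f}")   # 0.505
print(f"{p:.1%}")   # 50.5%
```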

To use **log odds** in a gambling context, first convert the bookmaker's decimal odds to an implied probability $p$ (removing the bookmaker's margin if you want the "fair" probability):

$$p = \frac{1}{\text{Decimal Odds}}$$

Then apply the statistical definition of log odds, $\ln\left(\frac{p}{1-p}\right)$.
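
As a rough sketch of that procedure, the snippet below uses the three odds from the example row (1.98, 3.45, 3.85). It assumes the margin is removed by proportional normalisation, that is, dividing each implied probability by their sum. That is only one of several possible methods and is not necessarily what any particular bookmaker or analyst uses.

```python
import math

# Decimal odds from the example row: home win (1), draw (X), away win (2).
odds = {"home": 1.98, "draw": 3.45, "away": 3.85}

# Raw implied probabilities; they sum to more than 1 because of the bookmaker's margin.
raw = {outcome: 1.0 / o for outcome, o in odds.items()}
overround = sum(raw.values())

# Assumption: remove the margin by proportional normalisation (divide by the sum).
fair = {outcome: p / overround for outcome, p in raw.items()}

# Log odds (logit) of each "fair" probability.
log_odds = {outcome: math.log(p / (1.0 - p)) for outcome, p in fair.items()}

print(f"sum of raw implied probabilities: {overround:.4f}")
for outcome in odds:
    print(
        f"{outcome}: raw={raw[outcome]:.3f} "
        f"fair={fair[outcome]:.3f} log odds={log_odds[outcome]:+.3f}"
    )
```

After normalisation the three probabilities sum to 1, and any outcome with a fair probability below 0.5 has negative log odds.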


# How Odds Define Payout

Odds in gambling directly determine your potential payout if your bet wins.

**Decimal odds formula**

$$\text{Payout}=\text{Stake} \times \text{Decimal Odds}$$

**Explanation**

Decimal odds show the total return for every unit wagered, including your original stake.


## Example

If you bet €20 at odds of 6.00, your total payout is €20 × 6.00 = €120. This includes your €20 stake, so the profit is €100.
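
The same arithmetic as a minimal Python sketch. The helper name `payout` is hypothetical and only serves this example.

```python
def payout(stake: float, decimal_odds: float) -> float:
    """Total return of a winning bet, including the original stake."""
    return stake * decimal_odds


stake = 20.0
total = payout(stake, 6.00)
profit = total - stake  # the stake is part of the payout, so subtract it

print(f"total payout: €{total:.2f}")   # total payout: €120.00
print(f"profit:       €{profit:.2f}")  # profit:       €100.00
```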


# Take Away

Gambling odds are calculated by bookmakers using statistical distributions. They are not linear equations.

## Log Odds


- **Log odds (logit):**  
  The log odds is the natural logarithm of the odds:
  
   $$ \ln\left(\frac{p}{1-p}\right) $$
   
   It maps probabilities (0 to 1) to the entire real number line ($-\infty$ to $+\infty$).  

  - Example: If $p = 0.9$, log odds =
  
  $$\ln\left(\frac{0.9}{0.1}\right) \approx 2.2$$

- **Sigmoid function (logistic function):**  
  The sigmoid function maps real numbers to probabilities (0 to 1). It is the **inverse** of the log odds.  
  
  - Formula:
  
  $$\sigma(x) = \frac{1}{1 + e^{-x}}$$
  
  - Example: if $x = 2.2$, the sigmoid is:

  $$\sigma(2.2) \approx 0.9$$

**Key difference:**  
- Log odds converts probabilities to real numbers.  
- Sigmoid converts real numbers to probabilities.
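
A short sketch of the two functions, showing that each undoes the other. The names `logit` and `sigmoid` are chosen here for illustration and use only the standard `math` module.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


x = logit(0.9)
print(round(x, 2))             # 2.2
print(round(sigmoid(x), 2))    # 0.9
print(round(sigmoid(2.2), 2))  # 0.9
```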


## Logistic Regression

If the linear function in logistic regression is expressed as $mx + b$ (where $m$ is the slope and $b$ is the intercept), we can derive the **logistic regression function** step by step. Here's the explanation:

---

### **Step 1: Log Odds Formula**
The log odds (logit function) is defined as:
$$
\ln\left(\frac{p}{1-p}\right),
$$
where:
- $p$ is the probability of success,
- $1-p$ is the probability of failure.

In logistic regression, we assume that the log odds is a linear function of the predictor variable $x$:
$$
\ln\left(\frac{p}{1-p}\right) = mx + b,
$$
where:
- $m$ is the slope of the linear relationship,
- $b$ is the intercept.

---

### **Step 2: Solve for Odds**
Exponentiate both sides to remove the logarithm:
$$
\frac{p}{1-p} = e^{mx + b}.
$$

Here, $\frac{p}{1-p}$ represents the odds, which are now expressed as an exponential function of $x$.

---

### **Step 3: Solve for Probability $p$**
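Next, solve for $p$ (the probability of success). Writing $\text{Odds} = e^{mx + b}$, multiply both sides of $\frac{p}{1-p} = \text{Odds}$ by $1-p$ and collect the terms in $p$:
$$
p = \text{Odds}\,(1-p) \quad\Longrightarrow\quad p\,(1 + \text{Odds}) = \text{Odds}.
$$
Dividing by $1 + \text{Odds}$ isolates $p$:
$$
p = \frac{\text{Odds}}{1 + \text{Odds}} = \frac{e^{mx + b}}{1 + e^{mx + b}}.
$$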

---

### **Step 4: Logistic Regression Function**
Dividing the numerator and denominator of the Step 3 result by $e^{mx + b}$ gives the final equation for the probability:
$$
p = \frac{1}{1 + e^{-(mx + b)}}.
$$

This is the **logistic regression function**, which maps any linear combination of predictors (in this case, $mx + b$) to a probability value between 0 and 1.
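
The derivation can be checked numerically. In the sketch below the slope `m = 0.8` and intercept `b = -2.0` are hypothetical values picked only for illustration. The Step 3 and Step 4 forms should give the same probability, and taking the log odds of that probability should recover $mx + b$.

```python
import math

# Hypothetical slope and intercept, chosen only for illustration.
m, b = 0.8, -2.0


def p_from_odds(x: float) -> float:
    """Step 3 form: p = e^(mx+b) / (1 + e^(mx+b))."""
    odds = math.exp(m * x + b)
    return odds / (1.0 + odds)


def p_logistic(x: float) -> float:
    """Step 4 form: p = 1 / (1 + e^-(mx+b))."""
    return 1.0 / (1.0 + math.exp(-(m * x + b)))


for x in [-5.0, 0.0, 2.5, 10.0]:
    p3, p4 = p_from_odds(x), p_logistic(x)
    recovered = math.log(p4 / (1.0 - p4))  # log odds of p, should equal mx + b
    print(
        f"x={x:5.1f}  p={p4:.4f}  forms agree={math.isclose(p3, p4)}  "
        f"log odds={recovered:.2f}  mx+b={m * x + b:.2f}"
    )
```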

---

### **Key Points**
- The log odds ($\ln(p/(1-p))$) are modeled as a linear function, here given by $mx + b$.
- Solving for $p$ gives us the logistic regression equation:
  $$
  p = \frac{1}{1 + e^{-(mx + b)}}.
  $$
- The sigmoid function ensures that predicted probabilities stay between 0 and 1, making it well suited to classification problems.


## Google Sheets
All of this is calculated by hand in [this Google spreadsheet](https://docs.google.com/spreadsheets/d/1IVI6bVe33BRu9KdbJKwIw_k6IfD4_XjjBXfxNhBjiYw/edit?usp=sharing).

```python
import numpy as np
from sklearn import linear_model

# Toy data: a single feature x and a binary label y.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
x1 = x.reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)
y = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# Fit a logistic regression model with an intercept term.
reg = linear_model.LogisticRegression(fit_intercept=True)
reg.fit(x1, y)

# Compare each actual label with the model's predicted label.
for i in range(x.size):
    p = np.array(x[i]).reshape(1, -1)  # a single sample as a 1x1 array
    print("x[i]=", x[i], ", actual value=", y[i], ", predicted value=", reg.predict(p))
```


```
x[i]= 1 , actual value= 0 , predicted value= [0]
x[i]= 2 , actual value= 0 , predicted value= [0]
x[i]= 3 , actual value= 1 , predicted value= [1]
x[i]= 4 , actual value= 1 , predicted value= [1]
x[i]= 5 , actual value= 1 , predicted value= [1]
x[i]= 6 , actual value= 1 , predicted value= [1]
x[i]= 7 , actual value= 1 , predicted value= [1]
x[i]= 8 , actual value= 1 , predicted value= [1]
x[i]= 9 , actual value= 1 , predicted value= [1]
x[i]= 10 , actual value= 1 , predicted value= [1]
x[i]= 11 , actual value= 1 , predicted value= [1]
x[i]= 12 , actual value= 1 , predicted value= [1]
```
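
To connect the fitted model back to the formula $p = \frac{1}{1 + e^{-(mx + b)}}$, the sketch below refits the same toy data and reads the slope and intercept from scikit-learn's `coef_` and `intercept_` attributes. It then compares hand-computed probabilities with `predict_proba`. The variable names `manual` and `from_model` are illustrative, and the exact fitted values depend on the scikit-learn version and its default regularization, so none are quoted here.

```python
import numpy as np
from sklearn import linear_model

# Same toy data as in the cell above.
x = np.arange(1, 13)
X = x.reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

reg = linear_model.LogisticRegression(fit_intercept=True)
reg.fit(X, y)

m = reg.coef_[0, 0]    # fitted slope
b = reg.intercept_[0]  # fitted intercept

# Probability of class 1 computed by hand from the logistic function ...
manual = 1.0 / (1.0 + np.exp(-(m * x + b)))
# ... and as reported by scikit-learn (column 1 holds the probability of class 1).
from_model = reg.predict_proba(X)[:, 1]

print("slope m =", m, ", intercept b =", b)
print("hand-computed and model probabilities agree:", np.allclose(manual, from_model))

# Thresholding the probability at 0.5 reproduces the labels from predict().
print((manual > 0.5).astype(int))
```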


# Homework

Pick one of these datasets, or find another, and fit a logistic regression model to it.

1. **[Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic)**
   - Predict passenger survival on the Titanic using demographic and ticket information. Classic introductory dataset.

2. **[Iris Species](https://www.kaggle.com/datasets/uciml/iris)**
   - Classify iris flowers into species based on petal and sepal measurements. Can be adapted for binary or multiclass logistic regression.

3. **[Pima Indians Diabetes Database](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database)**
   - Predict diabetes occurrence based on health attributes. Widely used for binary classification tasks.

4. **[Heart Disease UCI](https://www.kaggle.com/datasets/ronitf/heart-disease-uci)**
   - Predict the presence of heart disease using patient data. Well-suited for binary logistic regression.

5. **[Wine Quality Data Set (Red & White Wine)](https://www.kaggle.com/datasets/ruthgn/wine-quality-data-set-red-white-wine)**
   - Predict wine quality from physicochemical tests. Can be converted to binary classification (e.g., good vs. bad wine).

6. **[Student Performance in Exams](https://www.kaggle.com/datasets/spscientist/students-performance-in-exams)**
   - Predict student pass/fail outcomes using demographic and test score features.

7. **[Credit Card Transactions Fraud Detection](https://www.kaggle.com/datasets/kartik2112/fraud-detection)**
   - Classify credit card transactions as fraudulent or not. Real-world dataset for binary classification.

8. **[Polycystic Ovary Syndrome (PCOS) Diagnostic](https://www.kaggle.com/datasets/ambujtripathi/pcos-data)**
   - Predict PCOS diagnosis from health data. Useful for binary logistic regression in medical applications.
