# Logistic Regression

## Review: Linear Regression

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
%matplotlib inline

In [None]:
fpath = './data/glass.csv'
colnames = ['ri', 'na', 'mg', 'al', 'si', 'k', 'ca', 'ba', 'fe', 'glass_type']
df = pd.read_csv(fpath, names=colnames, skiprows=1)
df.head()

**Data Dictionary**

- `Id`: number: 1 to 214
- `RI`: refractive index  
- `Na`: Sodium (unit measurement: weight percent in corresponding oxide, as are attributes 4-10)
- `Mg`: Magnesium
- `Al`: Aluminum
- `Si`: Silicon
- `K` : Potassium
- `Ca`: Calcium
- `Ba`: Barium
- `Fe`: Iron
- `Type` : Type of glass:

**Let's build a regression model for refractive index against aluminum content.**

In [None]:
#scatter with regression line


**Exercise.**

- Instantiate and fit a linear regression model predicting `ri` from `al` (and an intercept).

In [None]:
# Fit a linear regression model (name the model "linreg").
linreg = LinearRegression()

- Add a column `y_pred` to `df` that stores the model's fitted values for the refractive index.

In [None]:
# Make predictions for all values of X and add back to the original DataFrame.


- What do these coefficients mean?

- Manually compute the predicted value of `ri` when `al=2.0` using the regression equation.

In [None]:
# Compute prediction for al=2 using the equation.


- Confirm that this is the same value we would get when using the built-in `.predict()` method of the `LinearRegression` object.

In [None]:
# Compute prediction for al=2 using the predict method.


---

<a id="predicting-a-categorical-response"></a>
## Predicting a Single Categorical Response
---

Linear regression is appropriate when we want to predict the value of a continuous target/response variable, but what about when we want to predict membership in a class or category?

**Examine the glass type column in the data set. What are the counts in each category?**

In [None]:
# Examine glass_type.
df.glass_type.value_counts().sort_index()

Say these types are subdivisions of broader glass types:

> **Window glass:** types 1, 2, and 3

> **Household glass:** types 5, 6, and 7

**Create a new `household` column that indicates whether or not a row is household glass, coded as 1 or 0, respectively.**

In [None]:
# Types 1, 2, 3 are window glass.
# Types 5, 6, 7 are household glass.
df['household'] = df.glass_type.apply(lambda x: 0 if x in [1, 2, 3] else 1)

In [None]:
df.household.value_counts()

Let's change our task, so that we're predicting the `household` category using `al`. Let's visualize the relationship to figure out how to do this.

**Make a scatter plot comparing `al` and `household`.**

In [None]:
fig, ax = plt.subplots()
df.plot(kind='scatter', x='al', y='household', ax=ax, grid=True)

**Fit a new `LinearRegression` predicting `household` from `al`.**

Let's draw a regression line like we did before:

In [None]:
# Fit a linear regression model and store the predictions.
feature_cols = ['al']
X = df[feature_cols] 
y = df.loc[:, 'household'] 
linreg.fit(X, y)
df['household_pred'] = linreg.predict(X)

In [None]:
# Scatter plot that includes the regression line
fig, ax = plt.subplots()
df.plot(kind='scatter', x='al', y='household', ax=ax)
df.plot(x='al', y='household_pred', color='red', ax=ax, grid=True)

If **al=3**, what class do we predict for household?

If **al=1.5**, what class do we predict for household?

We predict the 0 class for **lower** values of al, and the 1 class for **higher** values of al. What's our cutoff value? Around **al=2**, because that's where the linear regression line crosses the midpoint between predicting class 0 and class 1.

Therefore, we'll say that if **household_pred >= 0.5**, we predict a class of **1**, else we predict a class of **0**.
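
The cutoff can also be found algebraically, by solving $\beta_0 + \beta_1 \cdot al = 0.5$ for `al`. The sketch below assumes hypothetical coefficients (`intercept` and `slope` are illustrative and are not the values fitted on the glass data):

```python
# Hypothetical coefficients for illustration only; they are NOT the values
# fitted on the glass data.
intercept = -0.5
slope = 0.5

def predict_value(al):
    """Raw linear regression output for a given aluminum content."""
    return intercept + slope * al

# The line crosses 0.5 where intercept + slope * al == 0.5.
cutoff = (0.5 - intercept) / slope

def predict_class(al, threshold=0.5):
    """Map the raw output to a hard class label (1 = household, 0 = window)."""
    return 1 if predict_value(al) >= threshold else 0

print(cutoff)
print(predict_class(1.5), predict_class(3.0))
```

With the fitted model, the same calculation would use `linreg.intercept_` and `linreg.coef_[0]` in place of the hypothetical numbers.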

**Using this threshold, create a new column of our predictions for whether a row is household glass.**

In [None]:
# Transform household_pred to 1 or 0.
df['household_pred_class'] = np.where(df.loc[:, 'household_pred'] >= 0.5, 1, 0)
df.head()

**Plot a line that shows our predictions for class membership in household vs. not.**

In [None]:
df.sort_values('al', inplace=True)

In [None]:
# Plot the class predictions.
fig, ax = plt.subplots()
df.plot(kind='scatter', x='al', y='household', ax=ax)
df.plot(x='al', y='household_pred_class', color='red', ax=ax, grid=True)

Linear regression yields a reasonable binary classifier in this case when we map values above 0.5 to 1 and values below 0.5 to 0.

It would be nice if we could also interpret the raw numbers it gives us, for example as probabilities. The problem is that linear regression is **unbounded**: it can output values below 0 and above 1, which cannot be probabilities.

This is where logistic regression comes in: it effectively bends the ends of that regression line into an S-shape that always stays between 0 and 1, so we can interpret its outputs as probabilities.
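
To make the contrast concrete, here is a minimal sketch that evaluates a straight line and the same linear score passed through an S-shaped (sigmoid) function. The coefficients `b0` and `b1` are hypothetical, and this only illustrates boundedness; it is not how a logistic regression model is fitted.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -4.0, 2.0

def linear_output(x):
    """Straight-line prediction: unbounded in both directions."""
    return b0 + b1 * x

def sigmoid_output(x):
    """The same linear score passed through the sigmoid: always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-linear_output(x)))

for x in [-1.0, 0.0, 2.0, 4.0, 5.0]:
    print(f"x={x:4.1f}  linear={linear_output(x):6.2f}  sigmoid={sigmoid_output(x):.4f}")
```

The linear column leaves the interval from 0 to 1 at both ends, while the sigmoid column never does.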

<a id="using-logistic-regression-for-classification"></a>
## Using Logistic Regression for Classification
---

**Import the `LogisticRegression` class from `linear_model` below and fit the same regression model predicting `household` from `al`.**

In [None]:
# Fit a logistic regression model and store the class predictions.
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()

feature_cols = ['al']
X = df.loc[:, feature_cols]
y = df.loc[:, 'household']

logreg.fit(X,y)
pred = logreg.predict(X)

**Plot the predicted class using the logistic regression as we did for the linear regression predictions above.**

As you will see, the class predictions are nearly the same.

In [None]:
fig, ax = plt.subplots()
df.plot(kind='scatter', x='al', y='household', ax=ax, grid=True)
ax.plot(df.loc[:, 'al'], np.array(pred), c='r')

<center>This is what we got just now but less work??!!!</center>

<img src="https://static1.squarespace.com/static/58751ef6db29d6ff4a8f84b6/58751fc09de4bbe17f629c6d/59131a568419c2f417285bc2/1556913316191/catshock.jpg?format=750w">

What if we wanted the predicted probabilities instead of just the class predictions, to understand how confident we are in a given prediction?

**Using the built-in `.predict_proba()` method, examine the predicted probabilities for the first handful of rows of `X`.**

In [None]:
logreg.predict_proba(X)[0:10]

Sklearn orders the columns according to our class labels. The two-column output of `predict_proba` returns a column for each class of our `household` variable. The first column is the probability of `household=0` for a given row, and the second column is the probability of `household=1`.

**Store the predicted probabilities of class=1 in its own column in the data set.**

In [None]:
# Store the predicted probabilities of class 1.
df['household_pred_prob'] = logreg.predict_proba(X)[:, 1]

In [None]:
df.isnull().sum()

In [None]:
df.head(10)

**Plot the predicted probabilities as a line on our plot (probability of `household=1` as `al` changes).**

In [None]:
# Plot the predicted probabilities.
fig, ax = plt.subplots()
df.plot(kind='scatter', x='al', y='household', ax=ax, grid=True)
df.plot(x='al', y='household_pred_prob', c='r', ax=ax, grid=True)

In [None]:
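# Examine an example prediction for al=1.
# Passing a DataFrame with the training column name avoids the feature-name warning in newer scikit-learn versions.
print(logreg.predict_proba(pd.DataFrame({'al': [1]})))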

In [None]:
# Compute the accuracy of the model

accuracy_score(df.household, pred)
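
For reference, `accuracy_score` with its default settings returns the fraction of predictions that match the true labels. A minimal sketch of the same calculation by hand, using hypothetical labels (`y_true` and `y_pred` below are made up and not taken from the glass data):

```python
# Hypothetical true labels and predicted labels (not taken from the glass data).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 0, 1, 0, 0, 1, 1, 1]

# Accuracy is the share of positions where prediction and truth agree.
n_correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = n_correct / len(y_true)

print(accuracy)
```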

**Exercise**

- Build a logistic regression model for `household` using two features of your choice.

In [None]:
df.columns

In [None]:
# Fit a logistic regression model and store the class predictions.


In [None]:
# Compute the accuracy of the model



<a id="probability-odds-ratio-e-log-and-log-odds"></a>
## Understanding Logistic Regression
---

**Recall:** A coefficient in a *linear regression* model tells you how the *number* predicted by the model changes when the associated variable increases by one and all other variables remain the same.

**Similarly**, a coefficient in a *logistic regression* model tells you how the *log odds* predicted by the model changes when the associated variable increases by one and all other variables remain the same.

Let's try to develop some intuitions about log odds to help us reason about our logistic regression models.

#### Odds

$$probability = \frac {one\ outcome} {all\ outcomes}$$

$$odds = \frac {one\ outcome} {all\ other\ outcomes}$$

It is often useful to think of the numeric odds as a ratio. For example, 5/1 = 5 odds is "5 to 1" -- five wins for every one loss (e.g. of six total plays). 2/3 odds means "2 to 3" -- two wins for every three losses (e.g. of five total plays).

Examples:

- Dice roll of 1: probability = 1/6, odds = 1/5
- Even dice roll: probability = 3/6, odds = 3/3 = 1
- Dice roll less than 5: probability = 4/6, odds = 4/2 = 2

$$odds = \frac {probability} {1 - probability}$$
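
The formula can be checked against the dice examples with exact fractions. This is a small sketch; the helper names `odds_from_probability` and `probability_from_odds` are illustrative, and the inverse formula $p = odds / (1 + odds)$ follows from rearranging the equation above.

```python
from fractions import Fraction

def odds_from_probability(p):
    """odds = p / (1 - p); undefined when p == 1."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1 + odds)

# The three dice examples from the list above, as exact fractions.
examples = {
    "roll of 1": Fraction(1, 6),
    "even roll": Fraction(3, 6),
    "roll less than 5": Fraction(4, 6),
}
for name, p in examples.items():
    o = odds_from_probability(p)
    print(f"{name}: probability={p}, odds={o}, back to probability={probability_from_odds(o)}")
```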

**As an example we can create a table of probabilities vs. odds, as seen below.**

In [None]:
# Create a table of probability versus odds.
table = pd.DataFrame({'probability':[0.1, 0.2, 0.25, 0.5, 0.6, 0.8, 0.9]})
table['odds'] = table.probability / (1 - table.probability)
table

**Exercise.**

Convert the following probabilities to odds:

1. .25
1. 1/3
1. 2/3
1. .95

<a id="understanding-e-and-the-natural-logarithm"></a>
### Understanding the Natural Logarithm

A logarithm tells you the *order of magnitude* of a number. The base-10 logarithm is a continuous version of "the number of times you would need to multiply 1 by 10 to get that number."

| number | number as a power of 10 | $\log_{10}$(number) |
| ------ | --- | --- |
| $1$ | $10^0$ | 0 |
| $10$ | $10^1$ | 1 |
| $100$ | $10^2$ | 2 |
| $1000$ | $10^3$ | 3 |

It also works in the other direction:

| number | number as a power of 10 | $\log_{10}$(number) |
| ------ | --- | --- |
| $.001$ | $10^{-3}$ | -3 |
| $.01$ | $10^{-2}$ | -2 |
| $.1$ | $10^{-1}$ | -1 |
| $1$ | $10^0$ | 0 |

And for numbers in between exact powers of 10:

| number | number as a power of 10 | $\log_{10}$(number) |
| ------ | --- | --- |
| $1$ | $10^{0}$ | 0 |
| $2$ | $10^{.301}$ | .301 |
| $5$ | $10^{.699}$ | .699 |
| $10$ | $10^1$ | 1 |
| $20$ | $10^{1.301}$ | 1.301 |
| $50$ | $10^{1.699}$ | 1.699 |
| $100$ | $10^2$ | 2 |

**Base $e$.** It is often convenient to use the special number $e$ (approximately 2.718) as a base instead of 10. The interpretation is analogous: the base-$e$ logarithm of a number is a continuous version of "the number of times you would have to multiply 1 by $e$ to get that number."

For instance:

| number | number as a power of $e$ | $\log_{e}$(number) |
| ------ | --- | --- |
| $1$ | $e^0$ | 0 |
| $2.718$ | $e^1$ | 1 |
| $7.39$ | $e^2$ | 2 |
| $20.09$ | $e^3$ | 3 |

It also works in the other direction:

| number | number as a power of $e$ | $\log_{e}$(number) |
| ------ | --- | --- |
| $.050$ | $e^{-3}$ | -3 |
| $.135$ | $e^{-2}$ | -2 |
| $.368$ | $e^{-1}$ | -1 |
| $1$ | $e^0$ | 0 |

And for numbers in between exact powers of $e$:

| number | number as a power of $e$ | $\log_{e}$(number) |
| ------ | --- | --- |
| $1$ | $e^{0}$ | 0 |
| $1.35$ | $e^{.301}$ | .301 |
| $2.01$ | $e^{.699}$ | .699 |
| $2.718$ | $e^1$ | 1 |
| $3.67$ | $e^{1.301}$ | 1.301 |
| $5.47$ | $e^{1.699}$ | 1.699 |
| $7.39$ | $e^2$ | 2 |

When we take the **logarithm** of an **odds** we get the **log odds**.

The most common convention is to use base-$e$ logarithms unless otherwise specified.
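
Python follows the same convention: `math.log` with a single argument (like `np.log`) is the natural logarithm, while `math.log10` is base 10. A quick sketch that spot-checks a few entries from the tables above:

```python
import math

# Base-10 entries from the tables above.
print(round(math.log10(2), 3), round(math.log10(50), 3))

# Base-e entries: math.log with one argument is the natural logarithm.
print(round(math.exp(2), 2), round(math.log(7.39), 2))

# Logs turn multiplication into addition: log10(20) = log10(2) + log10(10).
print(math.isclose(math.log10(20), math.log10(2) + math.log10(10)))
```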

In [None]:
# Add log odds to the table.
table['logodds'] = np.log(table['odds'])
table

**Notice:** log odds goes to $-\infty$ as probability goes to 0, and goes to $\infty$ as probability goes to 1.

**Consequence:** The fact that a linear model is unbounded is fine if we use it to model *log odds* rather than *probability*.
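
A short sketch of this behavior, using only the standard library (the helper name `log_odds` is illustrative): the log odds is 0 at a probability of 0.5, grows without bound in either direction as the probability approaches 0 or 1, and is symmetric around 0.5.

```python
import math

def log_odds(p):
    """Natural log of the odds p / (1 - p); defined for 0 < p < 1."""
    return math.log(p / (1 - p))

for p in [0.001, 0.01, 0.5, 0.99, 0.999]:
    print(f"p={p:<6} log odds={log_odds(p):8.3f}")
```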

<a id="what-is-logistic-regression"></a>
### What Is Logistic Regression?
---

**Linear regression:** *Continuous response* is modeled as a linear combination of the features.

$$y = \beta_0 + \beta_1x$$

**Logistic regression:** *Log odds* is modeled as a linear combination of the features.

$$\log \left(\frac{p}{1-p}\right) = \beta_0 + \beta_1x$$

This equation can be rearranged to get the predicted probability:

$$\hat{p} = \frac{e^{\beta_0 + \beta_1x}} {1 + e^{\beta_0 + \beta_1x}}$$
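
The rearrangement can be written out step by step. Exponentiate both sides of the log-odds equation, then solve for $p$:

$$\frac{p}{1-p} = e^{\beta_0 + \beta_1x}$$

$$p = (1-p)\,e^{\beta_0 + \beta_1x}$$

$$p\left(1 + e^{\beta_0 + \beta_1x}\right) = e^{\beta_0 + \beta_1x}$$

$$p = \frac{e^{\beta_0 + \beta_1x}} {1 + e^{\beta_0 + \beta_1x}} = \frac{1} {1 + e^{-(\beta_0 + \beta_1x)}}$$

The last form, obtained by dividing the numerator and denominator by $e^{\beta_0 + \beta_1x}$, is the sigmoid function applied to the linear combination $\beta_0 + \beta_1x$.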

This equation gives us the "S" (sigmoid) shape for the predicted probability as a function of $x$.

### How do we interpret the regression parameters?

**Linear regression:**

$$y = \beta_0 + \beta_1x$$

- $\beta_0$ tells you the model's prediction for $y$ when all input features are zero.
- $\beta_1$ tells you how the model's prediction for $y$ changes with a one-unit increase in $x$ when all other variables remain the same.

**Logistic regression:**

$$\log \left({p\over 1-p}\right) = \beta_0 + \beta_1x$$

- $\beta_0$ tells you the model's prediction for the *log odds of $y$* when all input features are zero.
- $\beta_1$ tells you how the model's prediction for *the log odds of* $y$ changes with a one-unit increase in $x$ when all other variables remain the same.

**Bottom line:** A positive coefficient means that the predicted log odds of the response (and thus the predicted probability) increases with the associated variable, while a negative coefficient means that it decreases.
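
The sketch below illustrates this with hypothetical coefficients (`b0` and `b1` are made up for illustration). A one-unit increase in $x$ always adds $\beta_1$ to the log odds, which is the same as multiplying the odds by $e^{\beta_1}$, but the resulting change in probability depends on where you start.

```python
import math

# Hypothetical coefficients, for illustration only.
b0, b1 = -3.0, 1.2

def log_odds(x):
    """Linear predictor: the model's log odds at x."""
    return b0 + b1 * x

def odds(x):
    """Odds implied by the log odds."""
    return math.exp(log_odds(x))

def probability(x):
    """Probability implied by the log odds (sigmoid)."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))

for x in [0.0, 1.0, 2.0, 3.0]:
    d_log_odds = log_odds(x + 1) - log_odds(x)      # always b1
    odds_ratio = odds(x + 1) / odds(x)              # always exp(b1)
    d_prob = probability(x + 1) - probability(x)    # varies with x
    print(f"x={x}: change in log odds={d_log_odds:.2f}, "
          f"odds ratio={odds_ratio:.3f}, change in probability={d_prob:.3f}")
```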

![Logistic regression beta values](./img/logistic_betas.png)

Changing the $\beta_0$ value shifts the curve horizontally, whereas changing the $\beta_1$ value changes the slope of the curve.
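
One way to quantify this is a small sketch with hypothetical coefficient pairs. The curve passes through a probability of 0.5 where the log odds is zero, that is, at $x = -\beta_0 / \beta_1$, and its slope at that point is $\beta_1 / 4$. The helper names below are illustrative.

```python
import math

def probability(x, b0, b1):
    """Predicted probability for a single feature x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def midpoint(b0, b1):
    """x at which the predicted probability is exactly 0.5 (log odds = 0)."""
    return -b0 / b1

def slope_at(x, b0, b1, h=1e-6):
    """Numerical slope of the probability curve at x (central difference)."""
    return (probability(x + h, b0, b1) - probability(x - h, b0, b1)) / (2 * h)

# Hypothetical (b0, b1) pairs, for illustration only.
for b0, b1 in [(-2.0, 1.0), (-4.0, 1.0), (-4.0, 2.0)]:
    m = midpoint(b0, b1)
    print(f"b0={b0}, b1={b1}: midpoint x={m}, slope there={slope_at(m, b0, b1):.3f}")
```

Changing only `b0` moves the midpoint and leaves the steepness unchanged. Changing `b1` while holding `b0` fixed changes the steepness and also moves the midpoint, since the midpoint is $-\beta_0 / \beta_1$.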

### Summary

- Logistic regression addresses a binary classification problem by modeling the *log odds* that an individual is in the class as a linear function of the model features.
- A coefficient in a logistic regression model tells you *how the log odds that the model predicts changes* with a one-unit increase in the associated input feature, while other features remain unchanged.
- The model's log-odds predictions can be transformed into *probabilities*.
- Those predicted probabilities follow an "S" (sigmoid) shape that is bounded by 0 and 1, as a function of the input features.
- Those predicted probabilities can be converted into "hard" class predictions by mapping everything above a threshold to 1 and everything below it to 0.

<a id="comparing-logistic-regression-to-other-models"></a>
## Comparing Logistic Regression to Other Models
---

Advantages of logistic regression:

- Somewhat interpretable.
- Training and prediction are fast.
- Outputs probabilities.
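- Features don't need scaling when the model is unregularized (scikit-learn's `LogisticRegression` applies L2 regularization by default, in which case feature scale can affect the fit).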
- Can perform well with a small number of observations.

Disadvantages of logistic regression:

- Presumes a linear relationship between the features and the log odds of the response.
- Performance is (generally) not competitive with the best supervised learning methods.
- Can't automatically learn feature interactions.