## Logistic Regression with Python

The Python code in this Notebook is provided as part of a [Dave on Data](https://www.daveondata.com) crash course on logistic regression with Python.

The code is built using the mighty [statsmodels](https://www.statsmodels.org/) library. Instructions for installing statsmodels are available [here](https://www.statsmodels.org/stable/install.html).

This code is provided **as-is** for your use, with no warranty expressed or implied.

### Load the *Heart* Dataset

The webinar uses the [Statlog (Heart) Data Set](https://archive.ics.uci.edu/dataset/145/statlog+heart), available from the UCI Machine Learning Repository.

In [1]:
import pandas as pd

# Load the Heart dataset
heart = pd.read_csv('Heart.csv')
heart.head()

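| Unnamed: 0 | HeartDisease | Age | Male | ChestPainType | BloodPressure | Cholesterol | BloodSugar | EEG | MaxHR | Angina | OldPeak | PeakST | Flourosopy | Thal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 70 | 1 | 4 | 130 | 322 | 0 | 2 | 109 | 0 | 2.4 | 2 | 3 | 3 |
| 1 | 0 | 67 | 0 | 3 | 115 | 564 | 0 | 2 | 160 | 0 | 1.6 | 2 | 0 | 7 |
| 2 | 1 | 57 | 1 | 2 | 124 | 261 | 0 | 0 | 141 | 0 | 0.3 | 1 | 0 | 7 |
| 3 | 0 | 64 | 1 | 4 | 128 | 263 | 0 | 0 | 105 | 1 | 0.2 | 2 | 1 | 7 |
| 4 | 0 | 74 | 0 | 2 | 120 | 269 | 0 | 2 | 121 | 1 | 0.2 | 1 | 1 | 3 |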


### Your First Logistic Regression Model

Because both the *HeartDisease* label and the *Male* feature are already binary (i.e., every value is either 0 or 1), they can be used directly in a logistic regression model. The code below specifies the model with the statsmodels formula interface, which is based on the R programming language's [formula syntax](https://www.statsmodels.org/dev/example_formulas.html).

In [2]:
import statsmodels.formula.api as smf

# Craft a logistic regression model to predict HeartDisease based on being Male
heart_model_1 = smf.logit(formula = 'HeartDisease ~ Male', data = heart)

# Train the model from the data
model_1_results = heart_model_1.fit()

# What are the model results?
print(model_1_results.summary())

Optimization terminated successfully.
         Current function value: 0.640593
         Iterations 5
                           Logit Regression Results                           
Dep. Variable:           HeartDisease   No. Observations:                  270
Model:                          Logit   Df Residuals:                      268
Method:                           MLE   Df Model:                            1
Date:                Fri, 01 Dec 2023   Pseudo R-squ.:                 0.06750
Time:                        08:14:01   Log-Likelihood:                -172.96
converged:                       True   LL-Null:                       -185.48
Covariance Type:            nonrobust   LLR p-value:                 5.618e-07
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.2090      0.255     -4.745      0.000      -1.708      -0.710
Male           1.3953      0.
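
The coefficients in the summary are on the log-odds scale, so they are easier to read once converted to probabilities. As a minimal sketch, the snippet below plugs the two coefficients printed above (`Intercept` = -1.2090, `Male` = 1.3953) into the logistic function. The helper name `logistic` and the hard-coded values are illustrative assumptions; with the fitted object, the coefficients would normally be read from `model_1_results.params`.

```python
from math import exp


def logistic(log_odds):
    """Convert a log-odds value to a probability (hypothetical helper)."""
    return 1.0 / (1.0 + exp(-log_odds))


# Coefficients copied by hand from the model summary (an assumption of this sketch)
intercept = -1.2090
male_coef = 1.3953

# Male = 0: only the intercept contributes to the log-odds
p_female = logistic(intercept)

# Male = 1: the Male coefficient is added to the intercept
p_male = logistic(intercept + male_coef)

print(f"P(HeartDisease | Male = 0): {p_female:.3f}")
print(f"P(HeartDisease | Male = 1): {p_male:.3f}")
```

Under these assumptions the predicted probability is roughly 0.23 for `Male = 0` and roughly 0.55 for `Male = 1`.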

### Your Second Logistic Regression Model

In [3]:
# A logistic regression model to predict HeartDisease using Male and Age
heart_model_2 = smf.logit(formula = 'HeartDisease ~ Male + Age', data = heart)

# Train the model from the data
model_2_results = heart_model_2.fit()

# What are the model results?
print(model_2_results.summary())

Optimization terminated successfully.
         Current function value: 0.607039
         Iterations 5
                           Logit Regression Results                           
Dep. Variable:           HeartDisease   No. Observations:                  270
Model:                          Logit   Df Residuals:                      267
Method:                           MLE   Df Model:                            2
Date:                Fri, 01 Dec 2023   Pseudo R-squ.:                  0.1163
Time:                        08:14:01   Log-Likelihood:                -163.90
converged:                       True   LL-Null:                       -185.48
Covariance Type:            nonrobust   LLR p-value:                 4.249e-10
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -4.8637      0.959     -5.071      0.000      -6.744      -2.984
Male           1.6222      0.
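
The `Pseudo R-squ.` value in these summaries is McFadden's pseudo R-squared, which compares the model's log-likelihood to that of an intercept-only model (`LL-Null`). The sketch below recomputes it from the rounded log-likelihoods printed in the first two summaries, so the results may differ slightly in the last digit. The function name `mcfadden_r2` is an illustrative assumption; on a fitted model the same quantity is available as the `prsquared` attribute.

```python
# Log-likelihoods copied by hand from the summaries (rounded values)
ll_null = -185.48
log_likelihoods = {
    'HeartDisease ~ Male': -172.96,
    'HeartDisease ~ Male + Age': -163.90,
}


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(null). Hypothetical helper."""
    return 1.0 - ll_model / ll_null


pseudo_r2 = {
    formula: mcfadden_r2(ll, ll_null)
    for formula, ll in log_likelihoods.items()
}

for formula, r2 in pseudo_r2.items():
    print(f"{formula}: {r2:.4f}")
```

Adding *Age* raises the pseudo R-squared from about 0.068 to about 0.116, in line with the summaries.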

### Your Third Logistic Regression Model

In [4]:
# A logistic regression model to predict HeartDisease using Male, Age, & Angina
heart_model_3 = smf.logit(formula = 'HeartDisease ~ Male + Age + Angina', data = heart)

# Train the model from the data
model_3_results = heart_model_3.fit()

# What are the model results?
print(model_3_results.summary())

Optimization terminated successfully.
         Current function value: 0.538839
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:           HeartDisease   No. Observations:                  270
Model:                          Logit   Df Residuals:                      266
Method:                           MLE   Df Model:                            3
Date:                Fri, 01 Dec 2023   Pseudo R-squ.:                  0.2156
Time:                        08:14:01   Log-Likelihood:                -145.49
converged:                       True   LL-Null:                       -185.48
Covariance Type:            nonrobust   LLR p-value:                 3.090e-17
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -5.2011      1.030     -5.051      0.000      -7.219      -3.183
Male           1.4648      0.
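
One way to check whether adding *Angina* improves on the second model is a likelihood-ratio test between the two nested models. The sketch below is not part of the original notebook. It uses the rounded `Log-Likelihood` values from the two summaries and only the standard library, relying on the fact that a chi-squared survival function with 1 degree of freedom equals `erfc(sqrt(x / 2))`. With the fitted objects, the log-likelihoods would normally come from `model_2_results.llf` and `model_3_results.llf`.

```python
from math import erfc, sqrt

# Log-likelihoods copied by hand from the summaries (rounded values)
ll_model_2 = -163.90   # HeartDisease ~ Male + Age
ll_model_3 = -145.49   # HeartDisease ~ Male + Age + Angina

# Likelihood-ratio statistic for the one extra parameter (Angina)
lr_stat = 2.0 * (ll_model_3 - ll_model_2)

# Chi-squared survival function for exactly 1 degree of freedom
p_value = erfc(sqrt(lr_stat / 2.0))

print(f"LR statistic: {lr_stat:.2f}")
print(f"p-value: {p_value:.3g}")
```

The statistic is about 36.82 on 1 degree of freedom, which gives a very small p-value. Note that this differs from the `LLR p-value` in the summary, which compares each model against the intercept-only model rather than against the previous model.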

### Interpreting Coefficients

In [5]:
from math import exp

# Get the odds ratio for the Male coefficient
print(exp(model_3_results.params['Male']))

4.326528261966836


In [6]:
# Get the odds ratio for the Age coefficient
print(exp(model_3_results.params['Age']))

1.0633171823751426


In [7]:
# Get the odds ratio for the Angina coefficient
print(exp(model_3_results.params['Angina']))

6.020938842080951
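
An odds ratio describes the multiplicative change in the odds of *HeartDisease* for a one-unit increase in a feature, holding the other features fixed. As a hedged illustration, the sketch below reuses the three odds ratios printed above to express them as percent changes and to scale the *Age* effect to a 10-year difference. The variable names are assumptions made for this example.

```python
from math import log

# Odds ratios copied from the output of the cells above (model 3)
odds_ratios = {
    'Male': 4.326528261966836,
    'Age': 1.0633171823751426,
    'Angina': 6.020938842080951,
}

# Percent change in the odds for a one-unit increase in each feature
pct_change = {name: (ratio - 1.0) * 100.0 for name, ratio in odds_ratios.items()}

# Odds ratios multiply, so a 10-year age difference scales the odds by OR ** 10
age_or_10_years = odds_ratios['Age'] ** 10

# Taking the natural log recovers the coefficient on the log-odds scale
male_coef = log(odds_ratios['Male'])

for name, pct in pct_change.items():
    print(f"{name}: odds change by {pct:.1f}% per unit")
print(f"Age, 10-year difference: odds ratio of {age_or_10_years:.2f}")
print(f"Male coefficient recovered from its odds ratio: {male_coef:.4f}")
```

Each additional year of age raises the odds by about 6.3%, which compounds to an odds ratio of roughly 1.85 over 10 years. With the fitted results object, `np.exp(model_3_results.params)` should return all three odds ratios (plus the intercept) at once, and `np.exp(model_3_results.conf_int())` the matching 95% confidence intervals, assuming NumPy is imported as `np`.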
