# Generalized Linear Models (GLMs)

In ordinary linear regression, we typically assume that the response variable is normally distributed around its mean.

Generalized linear models (GLMs) are a class of models that generalize linear regression by allowing the response variable to follow a distribution other than the normal distribution. A GLM has three components: a *random component* (the distribution of the response variable), a *systematic component* (the linear predictor), and a *link function* that connects the expected value of the response to the linear predictor.
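
Written out for a single observation $i$ with predictors $x_{i1}, \dots, x_{ik}$, the three components fit together as below. This is the standard textbook formulation; the symbols $\mu_i$, $\eta_i$, and $g$ are notation introduced here for illustration.

$$
\begin{aligned}
Y_i &\sim \text{a distribution with mean } \mu_i = \mathbb{E}[Y_i] && \text{(random component)} \\
\eta_i &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} && \text{(systematic component)} \\
g(\mu_i) &= \eta_i && \text{(link function)}
\end{aligned}
$$

Predictions on the scale of the response are obtained by inverting the link, $\mu_i = g^{-1}(\eta_i)$.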

Examples of measurements that are not normally distributed:
- Death: a binary outcome (0 or 1), so it cannot be normally distributed.
- The number of times a person goes to therapy: a non-negative count that is heavily skewed, because most people go zero times.

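```python
import statsmodels.api as sm
import numpy as np

# Sample data: 4 observations, 2 predictor columns.
# Note: the second column equals the first column plus 1, so once a constant
# is added the design matrix is rank-deficient (rank 2 instead of 3). This is
# why the summary below reports Df Model = 1 and Df Residuals = 2.
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])  # predictors
y = np.array([0, 1, 0, 1])  # binary response variable

# Add an intercept (constant) column to the predictors
X = sm.add_constant(X)

# Fit a logistic regression model: a GLM with a Binomial family,
# which uses the logit link by default
model = sm.GLM(y, X, family=sm.families.Binomial())
result = model.fit()

# Print the model summary
print(result.summary())
```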


```text
                 Generalized Linear Model Regression Results
Dep. Variable:                      y   No. Observations:                    4
Model:                            GLM   Df Residuals:                        2
Model Family:                Binomial   Df Model:                            1
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -2.3475
Date:                Sun, 16 Jun 2024   Deviance:                       4.6950
Time:                        20:52:43   Pearson chi2:                     3.66
No. Iterations:                     4   Pseudo R-squ. (CS):             0.1915
Covariance Type:            nonrobust
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -1.8164      2.295     -0.791      0.4
```

Linear regression can be viewed as a special case of a GLM:

- Systematic component: $\beta_0 + \beta_1 x$.
- Link function: the identity, $g(\mu) = \mu$.
- Random component: the response is normally distributed.

## Common Link Functions

Each distribution has a link function that is conventionally paired with it:

| Model               | Link function                            | Distribution |
|---------------------|------------------------------------------|--------------|
| Logistic regression | $\log\left(\frac{p}{1-p}\right)$ (logit) | Binomial     |
| Poisson regression  | $\log(\lambda)$ (log)                    | Poisson      |
| Gamma regression    | $\frac{1}{\mu}$ (inverse)                | Gamma        |

When to use each regression type:
* Logistic: the default choice whenever the outcome is binary.
* Poisson: for count data (non-negative integers), which is often right-skewed (e.g. the number of times you go to the doctor). The Poisson distribution assumes the variance equals the mean, which real count data often violate.
* Negative binomial: similar to Poisson, but it allows the variance to exceed the mean (overdispersion).
* Gamma: for continuous, right-skewed, positive outcomes. The response must be strictly greater than zero.
