<a href="https://colab.research.google.com/github/zia207/01_Generalized_Linear_Models_R/blob/main/Note/02_01_09_00_glm_gamlss_introduction_r.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![alt text](http://drive.google.com/uc?export=view&id=1bLQ3nhDbZrCCqy_WCxxckOne2lgVvn3l)

# 9. Generalized Additive Model for Location, Scale, and Shape (GAMLSS)

**GAMLSS (Generalized Additive Models for Location Scale and Shape)** are distributional regression models that allow explanatory variables (X) to affect all aspects of the response variable's distribution (y), not just the expected value. This is useful for analyzing shifts in not only the mean but also variance, skewness, kurtosis, and quantiles.

GAMLSS can be fitted in R with the `gamlss` package and its companions `gamlss.dist` and `gamlss.data`, which provide the fitting algorithms, the distributions, and example data sets, respectively. The `gamboostLSS` package fits the same class of models by boosting, while `bamlss` uses Bayesian MCMC. Other packages connect GAMLSS to machine learning methods and to visualization tools such as ggplot2. An enhanced version, `gamlss2`, will be available on CRAN soon. GAMLSS offers over 100 continuous, discrete, and mixed distributions for response modeling, including truncated, censored, log-transformed, and logit-transformed types. The book *Distributions for Modeling Location, Scale, and Shape: Using GAMLSS in R* serves as a comprehensive guide to these distributions.

Within GAMLSS, machine learning techniques can be used to model any of the distribution parameters. Options for additive smoothing include P-splines and cubic smoothing splines, while regularization (shrinkage) methods include ridge regression and LASSO. Interactions can be modeled with regression trees and neural networks, and random effects can also be incorporated.

GAMLSS is particularly beneficial for practitioners handling skewed and kurtotic data, common in fields like medicine and finance. Notable users include the World Health Organization (WHO), the Global Lung Function Initiative, the International Monetary Fund (IMF), and the European Parliament, among others.

## Overview

In GAMLSS, the response variable $Y$ is assumed to follow a parametric distribution with probability density function (pdf) or probability mass function (pmf):
$$
Y \sim D(\mu, \sigma, \nu, \tau, \dots)
$$
where the distribution $D$ has up to **four parameters**:
- $\mu$: **Location** (e.g., mean)
- $\sigma$: **Scale** (e.g., standard deviation)
- $\nu$: **Shape** (e.g., skewness)
- $\tau$: **Shape** (e.g., kurtosis)

Each parameter is modeled via a **link function** as an additive function of explanatory variables:
$$
\begin{align*}
g_1(\mu) &= \eta_1 = \beta_{10} + f_{11}(x_1) + f_{12}(x_2) + \dots + f_{1p}(x_p) \\
g_2(\sigma) &= \eta_2 = \beta_{20} + f_{21}(x_1) + f_{22}(x_2) + \dots + f_{2p}(x_p) \\
g_3(\nu) &= \eta_3 = \beta_{30} + f_{31}(x_1) + f_{32}(x_2) + \dots + f_{3p}(x_p) \\
g_4(\tau) &= \eta_4 = \beta_{40} + f_{41}(x_1) + f_{42}(x_2) + \dots + f_{4p}(x_p)
\end{align*}
$$
where:
- $g_k(\cdot)$ are **link functions** (e.g., identity, log, logit)
- $\eta_k$ is the **additive predictor** for the $k$-th parameter ($k = 1, \dots, 4$)
- $f_{kj}(\cdot)$ are **smooth functions** (splines, polynomials, etc.) of covariates
- $\beta_{k0}$ are intercepts

This framework allows **nonlinear, nonparametric relationships** between predictors and each distribution parameter.
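The two-parameter case of this structure can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, plain linear terms stand in for the smooth functions $f_{kj}$, and the likelihood is maximized directly with `optim()` rather than with the algorithms used by the `gamlss` package. The names `negloglik`, `start`, and `est` are made up for this sketch.

```r
# Simulated normal response whose mean AND standard deviation depend on x
set.seed(1)
n <- 2000
x <- runif(n)
y <- rnorm(n, mean = 1 + 2 * x, sd = exp(-1 + 1.5 * x))

# Average negative log-likelihood with one predictor per parameter:
#   g1(mu)    = mu          (identity link)
#   g2(sigma) = log(sigma)  (log link keeps sigma positive)
negloglik <- function(par) {
  eta_mu    <- par[1] + par[2] * x   # predictor for mu
  eta_sigma <- par[3] + par[4] * x   # predictor for sigma
  -mean(dnorm(y, mean = eta_mu, sd = exp(eta_sigma), log = TRUE))
}

# Start from a constant-mean, constant-variance model
start <- c(mean(y), 0, log(sd(y)), 0)
fit   <- optim(start, negloglik, method = "BFGS")

# Estimated coefficients, named after the notation used above
est <- setNames(fit$par, c("beta10", "beta11", "beta20", "beta21"))
round(est, 2)
```

The estimates should land close to the simulation values (1, 2, -1, 1.5). A constant-variance regression would recover only the first two coefficients and would have no way to describe how the spread changes with `x`.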

### Key Features of GAMLSS

1. **Flexible Distribution Assumptions**
    - Can model responses using a wide range of distributions, including highly skewed, heavy-tailed, zero-inflated, or bounded distributions (e.g., Student's t, Johnson's SU, and Beta, all available in `gamlss.dist`).
    - Not limited to exponential family (unlike GLMs/GAMs).

2. **Modeling All Distribution Parameters**
    - Unlike standard models that only model the mean, GAMLSS can model **variance, skewness, and kurtosis** as functions of covariates.
    - Enables **heteroscedasticity modeling** and **distributional regression**.

3. **Additive Predictors**
    - Each parameter is modeled as an **additive function** of covariates, similar to GAMs.
    - Can include smooth terms, random effects, spatial effects, and interactions (e.g., via the extra additive terms in `gamlss.add`).

4. **Robustness to Non-Normality**
    - Handles data that violate normality, constant variance, or other classical assumptions.

5. **Estimation via Maximum Likelihood**
    - Parameters are estimated using **(penalized) maximum likelihood**, fitted in `gamlss` with the RS (Rigby and Stasinopoulos) and CG (Cole and Green) algorithms.
    - Smoothing parameters can be selected by local maximum likelihood or by criteria such as GAIC (which covers AIC and BIC/SBC) or GCV.

6. **Software Support**
    - Implemented in R via the `gamlss` package and its add-on packages (Rigby and Stasinopoulos), with `gamlss2` under ongoing development.
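
The first two features can be inspected directly on the family objects in `gamlss.dist`. The sketch below assumes that package is installed; `describe_family()` is a hypothetical helper written for this note, not part of the package. It prints, for a few families, how many parameters can be modeled and the default link function of each.

```r
library(gamlss.dist)  # distribution families used by gamlss

# Hypothetical helper: summarize a gamlss.family object in one line
describe_family <- function(fam) {
  # c() silently drops parameters the family does not have (NULL entries)
  links <- c(mu = fam$mu.link, sigma = fam$sigma.link,
             nu = fam$nu.link, tau = fam$tau.link)
  cat(sprintf("%-4s %-10s %d parameter(s): %s\n",
              fam$family[1], fam$type, fam$nopar,
              paste(names(links), links, sep = " = ", collapse = ", ")))
}

# Normal, t family, Johnson's SU, Beta, Poisson
for (fam in list(NO(), TF(), JSU(), BE(), PO())) describe_family(fam)

# Every family also comes with d/p/q/r functions, e.g. random gamma draws
set.seed(1)
draws <- rGA(5, mu = 2, sigma = 0.5)
draws
```

Any parameter listed for a family can be given its own formula when fitting, which is what separates this setup from a mean-only GLM or GAM.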

### Applications of GAMLSS

GAMLSS is widely used in fields where data exhibit complex distributional behavior:

1. **Health and Medicine**
    - Modeling growth charts (e.g., WHO child growth standards), where percentiles (via $\mu$ and $\sigma$) vary with age.
    - Blood pressure, BMI, or biomarker modeling with age, sex, and lifestyle factors.

2. **Insurance and Risk Modeling**
    - Modeling claim sizes with heavy-tailed distributions (e.g., Pareto, GB2).
    - Actuarial risk assessment with variable scale and shape.

3. **Environmental Science**
    - Rainfall, temperature, or pollution levels with skewness and heteroscedasticity.

4. **Economics and Finance**
    - Income distribution modeling (skewed, heavy-tailed).
    - Volatility modeling (scale as a function of predictors).

5. **Ecology**
    - Species abundance with zero-inflation and overdispersion.

### How GAMLSS Differs from Standard GAM

| Feature                  | **Standard GAM**                                                                 | **GAMLSS**                                                                 |
|--------------------------|----------------------------------------------------------------------------------|----------------------------------------------------------------------------|
| **Distribution**         | Typically exponential family (Normal, Poisson, Binomial)                         | Any parametric distribution (including non-exponential family)              |
| **Parameters Modeled**   | Only the **mean** (location)                                                     | **All parameters**: mean, scale, skewness, kurtosis                         |
| **Variance Assumption**  | Often assumes constant variance or variance linked to mean                       | **Variance (and shape) can depend on covariates**                           |
| **Flexibility**          | Flexible mean function                                                           | Flexible **entire distribution**                                            |
| **Modeling Goal**        | Estimate conditional mean                                                        | Estimate full conditional distribution                                      |
| **Example Use Case**     | Predicting average house price (e.g., using `mgcv`)                              | Predicting full price distribution (mean, spread, skew) using `gamlss`      |
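
The "Parameters Modeled" and "Variance Assumption" rows can be made concrete with a small comparison. This is a sketch, assuming the `gamlss` package is installed, that uses the `abdom` data set shipped with `gamlss.data` (abdominal circumference `y` against gestational age `x`). The normal family and the object names are choices made for illustration only.

```r
library(gamlss)   # also attaches gamlss.dist and gamlss.data
data(abdom)

quiet <- gamlss.control(trace = FALSE)  # suppress iteration output

# GAM-like model: smooth mean, sigma is a single constant
m_mean <- gamlss(y ~ pb(x), family = NO, data = abdom, control = quiet)

# GAMLSS model: sigma gets its own smooth predictor
m_dist <- gamlss(y ~ pb(x), sigma.formula = ~ pb(x), family = NO,
                 data = abdom, control = quiet)

# Compare the two fits (lower GAIC is better)
GAIC(m_mean, m_dist)

# Fitted sigma: one value in the first model, a range in the second
sigma_mean <- fitted(m_mean, what = "sigma")
sigma_dist <- fitted(m_dist, what = "sigma")
range(sigma_mean)
range(sigma_dist)
```

Both models share the same mean structure; only the second lets the spread of the response change with `x`.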

## The {gamlss} packages

The GAMLSS framework comprises several packages written in the free software R: the original `gamlss` package and a set of add-on packages:

1.  the original {gamlss} package for fitting a GAMLSS model.
2.  the {gamlss.add} package for extra additive terms.
3.  the {gamlss.boot} package for bootstrapping centiles.
4.  the {gamlss.cens} package for fitting censored (left, right or interval) response variables.
5.  the {gamlss.data} package for data used for demonstration.
6.  the {gamlss.demo} package for teaching-purpose demos.
7.  the {gamlss.dist} package for gamlss.family distributions.
8.  the {gamlss.mx} package for fitting finite mixture distributions.
9.  the {gamlss.nl} package for fitting non-linear models.
10. the {gamlss.tr} package for fitting truncated distributions.
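
Before working through the examples, it may help to check which of these packages are already installed. The snippet below is a small sketch using only base R; the vector name `pkgs` is illustrative, and the commented-out install line assumes the missing packages are currently obtainable from your configured repository, which may not hold for every add-on.

```r
# Packages listed above
pkgs <- c("gamlss", "gamlss.add", "gamlss.boot", "gamlss.cens",
          "gamlss.data", "gamlss.demo", "gamlss.dist", "gamlss.mx",
          "gamlss.nl", "gamlss.tr")

# TRUE where the package can be loaded from the current library paths
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
available

# Install whatever is missing (uncomment to run)
# install.packages(pkgs[!available])
```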

### Main Functions

| Function | Description |
|------------------------------------|------------------------------------|
| `gamlss()` | Fits GAMLSS models, similar to `gam()` but for all parameters. |
| `fitted()` | Extracts fitted values for parameters (e.g., μ, σ). |
| `predict()` | Predicts parameters for new data. |
| `residuals()` | Computes normalized quantile residuals. |
| `summary()` | Summarizes model fit, coefficients, and diagnostics. |
| `plot()` | Generates residual diagnostic plots (e.g., Q-Q, density). |
| `wp()` | Creates worm plots for residual diagnostics. |
| `centiles()` | Computes and plots centile curves. |
| `GAIC()` / `AIC()` | Calculates Generalized Akaike Information Criterion for model comparison. |
| `stepGAIC()` | Performs stepwise model selection. |
| `histDist()` | Fits and plots parametric distributions to histograms. |
| `pdf.plot()` | Plots probability density functions. |
| `rqres.plot()` | Plots randomized quantile residuals. |
| `find.hyper()` | Optimizes hyperparameters (e.g., smoothing degrees of freedom). |
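
A typical session strings several of these functions together. The following is a minimal sketch rather than a recommended analysis: it assumes `gamlss` is installed, uses the `abdom` data from `gamlss.data`, and picks the normal family and the two prediction points purely for illustration.

```r
library(gamlss)   # also attaches gamlss.dist and gamlss.data
data(abdom)       # y: abdominal circumference, x: gestational age

# Fit: smooth terms for both mu and sigma
m <- gamlss(y ~ pb(x), sigma.formula = ~ pb(x), family = NO, data = abdom,
            control = gamlss.control(trace = FALSE))

summary(m)                       # coefficients and fit statistics
head(fitted(m, what = "mu"))     # fitted values of mu
head(fitted(m, what = "sigma"))  # fitted values of sigma

# Predict mu on the response scale at two illustrative ages
new_x  <- data.frame(x = c(20, 30))
mu_new <- predict(m, what = "mu", newdata = new_x,
                  type = "response", data = abdom)
mu_new

# Diagnostics based on normalized quantile residuals
res <- residuals(m)
plot(m)   # residual diagnostic plots
wp(m)     # worm plot

# Centile curves against the explanatory variable
centiles(m, xvar = abdom$x)

# Information criterion with a BIC-type penalty, k = log(n)
GAIC(m, k = log(nrow(abdom)))
```

If the model is adequate, the residuals should look like a sample from a standard normal distribution, which is what `plot()` and `wp()` help to check.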

## Resources

1.  Rigby, R. A., & Stasinopoulos, D. M. (2005). *Generalized additive models for location, scale and shape*. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(3), 507–554.

2.  Stasinopoulos, D. M., & Rigby, R. A. (2007). *Generalized additive models for location scale and shape (GAMLSS) in R*. Journal of Statistical Software, 23(7), 1–46.

3.  [gamlss](https://www.gamlss.com/) - Official GAMLSS website with documentation and resources.

4.  [Flexible Regression and Smoothing: Using GAMLSS in R](https://www.taylorfrancis.com/books/mono/10.1201/b21973/flexible-regression-smoothing-mikis-stasinopoulos-robert-rigby-gillian-heller-vlasios-voudouris-fernanda-de-bastiani)

5.  [Distributions for Modeling Location, Scale, and Shape: Using GAMLSS in R](https://www.taylorfrancis.com/books/mono/10.1201/9780429298547/distributions-modeling-location-scale-shape-robert-rigby-mikis-stasinopoulos-gillian-heller-fernanda-de-bastiani)

6.  [Generalized Additive Models for Location, Scale and Shape: A Distributional Regression Approach, with Applications](https://www.cambridge.org/core/books/generalized-additive-models-for-location-scale-and-shape/3499AA3827369F287B604639D1127FE3/)