<a href="https://colab.research.google.com/github/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-00-glm-introduction-r.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![alt text](http://drive.google.com/uc?export=view&id=1bLQ3nhDbZrCCqy_WCxxckOne2lgVvn3l)

# Generalized Linear Models

Generalized Linear Models (GLMs) are a versatile class of models that extend linear regression to handle a variety of response variable distributions and relationships. In R, GLMs are implemented through the glm() function and related packages, providing a powerful framework for analyzing both continuous and categorical data across a wide range of contexts. This tutorial introduces several types of GLMs, as well as related models, and demonstrates how to implement each in R.


1. [Generalized Linear Regression (Gaussian)](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-01-glm-regression-r.ipynb)

2. [Logistic Regression (Binary)](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-02-glm-logistic-r.ipynb)

3. [Probit Regression Model](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-03-glm-probit-r.ipynb)

4. [Ordinal Regression](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-04-glm-ordinal-r.ipynb)

5. [Multinomial Logistic Regression](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-05-glm-multinomial-logistic-r.ipynb)

6. [Poisson Regression](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-06-00-poisson-regression-introduction-r.ipynb)

  6.1. [Standard Poisson Regression (count data)](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-06-01-poisson-regression-standard-r.ipynb)

  6.2. [Poisson Regression Model with Offset (rate data)](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-06-02-poisson-regression-offset-r.ipynb)

  6.3. [Poisson Regression Models for Overdispersed Data](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-06-03-poisson-regression-overdispersion-r.ipynb)

  6.4. [Zero-Inflated Models](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-06-04-poisson-regression-zeroinflated-r.ipynb)

  6.5. [Hurdle Model](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-06-05-poisson-regression-hurdle-r.ipynb)


7. [Gamma Regression](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-07-glm-gamma-regression-r.ipynb)

8. [Beta Regression](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-08-glm-gamma-regression-r.ipynb)

9. [Generalized Additive Model (GAM) ](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-02-09-glm-gam-regression.ipynb)



## Introduction to Generalized Linear Models






The Generalized Linear Model (GLM) extends linear regression to model relationships between a dependent variable and independent variables when the assumptions of ordinary linear regression, such as normally distributed errors with constant variance, are not met. GLMs were introduced by the statisticians John Nelder and Robert Wedderburn in 1972.

The GLM is an essential tool in modern data analysis because it can model a wide range of data types that do not conform to the assumptions of traditional linear regression. It allows for non-normal response distributions, such as binary outcomes and counts, and for a non-linear relationship between the mean of the response and the predictors through a link function. Parameters are estimated by **maximum likelihood estimation (MLE)**. Note that a standard GLM still assumes independent observations, and R's `glm()` drops rows with missing values by default (`na.omit` behaviour); correlated observations and missing data call for extensions such as mixed models, generalized estimating equations, or imputation.

This flexibility has made the GLM a standard tool in both business and academia.

**Maximum Likelihood Estimation (MLE)** is a statistical technique for estimating the parameters of a model from the observed data. It finds the parameter values that maximize the likelihood function, which measures how probable the observed data are under the model for a given set of parameter values. The higher the likelihood, the better those parameter values explain the data. MLE is widely used in fields such as finance, economics, and engineering to fit models that predict future outcomes from the available data.
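To make the idea concrete, here is a minimal sketch that maximizes a Poisson log-likelihood numerically with `optim()` and compares the result with `glm()`. The simulated data and the names `x`, `y`, and `neg_loglik` are assumptions made for illustration only; they are not part of the tutorial's datasets.

``` r
# Simulated count data (hypothetical example)
set.seed(123)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))

# Negative Poisson log-likelihood for beta = (intercept, slope)
neg_loglik <- function(beta) {
  eta <- beta[1] + beta[2] * x            # linear predictor
  -sum(dpois(y, lambda = exp(eta), log = TRUE))
}

# Minimizing the negative log-likelihood maximizes the likelihood
fit_optim <- optim(par = c(0, 0), fn = neg_loglik, method = "BFGS")

# glm() reaches the same estimates via iteratively reweighted least squares
fit_glm <- glm(y ~ x, family = poisson(link = "log"))

# Side-by-side comparison of the two sets of estimates
rbind(optim = fit_optim$par, glm = coef(fit_glm))
```

The two rows should agree to several decimal places, which illustrates that `glm()` is computing maximum likelihood estimates rather than least-squares estimates.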








## Key features of Generalized Linear Models

1.  **Link Function:** GLMs are characterized by a **link function** that connects the linear predictor, a combination of independent variables, to the mean of the dependent variable. This connection enables the estimation of the relationship between independent and dependent variables in a non-linear fashion.

The selection of a link function in GLMs is contingent upon the nature of the data and the distribution of the response variable. The `identity` link function is utilized when the continuous response variable follows a normal distribution. The `logit` link function is employed when the response variable is binary, meaning it can only take on two values and follows a binomial distribution. The `log` link function is utilized when the response variable is count data and follows a Poisson distribution.

Choosing an appropriate link function is a crucial aspect of modeling, as it impacts the interpretation of the estimated coefficients for independent variables. Therefore, a thorough understanding of the nature of the data and the response variable's distribution is necessary when selecting a link function.

2.  **Distribution Family:** Unlike linear regression, which assumes a normal distribution for the residuals, GLMs allow for a variety of probability distributions for the response variable. The choice of distribution is based on the characteristics of the data. Commonly used distributions include:

    -   **Normal distribution (Gaussian):** For continuous data.

    -   **Binomial distribution:** For binary or dichotomous data.

    -   **Poisson distribution:** For count data.

    -   **Gamma distribution:** For continuous, positive, skewed data.

3.  **Variance Function:** GLMs accommodate heteroscedasticity (unequal variances across levels of the independent variables) by allowing the variance of the response variable to be a function of the mean.

4.  **Deviance:** Instead of using the sum of squared residuals as in linear regression, GLMs use deviance to measure lack of fit. Deviance compares the fit of the model to a saturated model (a model that perfectly fits the data).
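The features above can be inspected directly in R, because a family object stores the link, its inverse, the variance function, and the unit deviances. The following is a short sketch using the built-in `warpbreaks` dataset; the object names (`fam`, `fam_bin`, `dev_manual`) are chosen here for illustration.

``` r
# A family object bundles the link, its inverse, and the variance function
fam <- poisson(link = "log")

mu  <- c(0.5, 2, 10)           # example means
eta <- fam$linkfun(mu)         # link: g(mu) = log(mu)
fam$linkinv(eta)               # inverse link recovers mu
fam$variance(mu)               # Poisson variance function: V(mu) = mu

# Binomial family: logit link and V(mu) = mu * (1 - mu)
fam_bin <- binomial(link = "logit")
p <- c(0.1, 0.5, 0.9)
fam_bin$linkfun(p)             # log(p / (1 - p))
fam_bin$variance(p)

# Deviance is the sum of the family's unit deviances over all observations
fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)
dev_manual <- sum(fam$dev.resids(warpbreaks$breaks, fitted(fit),
                                 wt = rep(1, nrow(warpbreaks))))
c(manual = dev_manual, glm = deviance(fit))
```

The manual sum and `deviance(fit)` should match, which shows that the reported residual deviance is built from the chosen family rather than from squared residuals.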

The **mathematical expression** of a Generalized Linear Model (GLM) involves the linear predictor, the link function, and the probability distribution of the response variable.

Here's the general form of a GLM:

1.  **Linear Predictor (η):**

    $$ \eta = \beta_0 + \beta_1x_1 + \beta_2x_2 + \ldots + \beta_kx_k $$

    where:

-   $\eta$ is the linear predictor,

-   $\beta_0, \beta_1, \ldots, \beta_k$ are the coefficients,

-   $x_1, x_2, \ldots, x_k$ are the independent variables.

2.  **Link Function (g):**

$$ g(\mu) = \eta $$

The link function connects the linear predictor to the mean of the response variable. It transforms the mean (μ) to the linear predictor (η). Common link functions include:

-   Identity link (for normal distribution):

$$ g(\mu) = \mu $$

-   Logit link (for binary data in logistic regression):

$$ g(\mu) = \log\left(\frac{\mu}{1-\mu}\right) $$

-   Log link (for Poisson regression):

$$ g(\mu) = \log(\mu) $$

3.  **Probability Distribution:** The response variable follows a probability distribution from the exponential family. The distribution is chosen based on the nature of the data. Common choices include:

    -   Normal distribution (Gaussian) for continuous data.

    -   Binomial distribution for binary or dichotomous data.

    -   Poisson distribution for count data.

    -   Gamma distribution for continuous, positive, skewed data.

Putting it all together, the probability mass function (PMF) or probability density function (PDF) for the response variable (Y) is expressed as:

$$ f(y;\theta,\phi) = \exp\left(\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)\right) $$

where:

-   $f(y;\theta,\phi)$ is the PMF or PDF,

-   $\theta$ is the natural parameter,

-   $\phi$ is the dispersion parameter,

-   $a(\phi)$, $b(\theta)$, and $c(y,\phi)$ are known functions.
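As a worked example of this form, consider the Poisson distribution with mean $\mu$. Rewriting its probability mass function gives:

$$ f(y;\mu) = \frac{\mu^{y} e^{-\mu}}{y!} = \exp\left( y\log\mu - \mu - \log y! \right) $$

Matching terms with the general expression yields:

$$ \theta = \log\mu, \qquad b(\theta) = e^{\theta}, \qquad a(\phi) = 1, \qquad c(y,\phi) = -\log y! $$

The mean and variance then follow from the derivatives of $b(\theta)$:

$$ E(Y) = b'(\theta) = e^{\theta} = \mu, \qquad \mathrm{Var}(Y) = a(\phi)\, b''(\theta) = e^{\theta} = \mu $$

The natural parameter $\theta = \log\mu$ is exactly the log link listed earlier, which is why the log link is called the canonical link for Poisson regression.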

## Linear Regression vs Generalized Linear Models

The primary difference between linear models (LM) and generalized linear models (GLM) is in their flexibility to handle different types of response variables and error distributions. Here’s a breakdown of the key distinctions:

### 1. **Type of Response Variable**

-   **LM (Linear Model)**: Assumes that the response variable is continuous and, conditional on the predictors, normally distributed. For example, predicting a continuous variable like height or weight.
-   **GLM (Generalized Linear Model)**: Extends linear models to accommodate response variables that are not normally distributed, such as binary outcomes (0 or 1), counts, or proportions. GLMs can handle a variety of distributions (e.g., binomial, Poisson).

### 2. **Link Function**

-   **LM**: The relationship between the predictor variables and the mean of the response is assumed to be linear, with an identity link function (i.e., $Y = X \beta + \epsilon$, where $\epsilon$ is normally distributed).
-   **GLM**: Uses a link function to relate the mean of the response to the linear predictor, which accommodates different types of response variables. Common link functions include:
    -   **Logit link** for binary data (logistic regression)
    -   **Log link** for count data (Poisson regression)
    -   **Identity link** for normal data (same as in LM)

### 3. **Error Distribution**

-   **LM**: Assumes errors are normally distributed with constant variance (homoscedasticity).
-   **GLM**: Allows for different error distributions (e.g., binomial, Poisson, gamma) to better suit the data.

### 4. **Use Cases**

-   **LM**: Used when the response variable is continuous, normally distributed, and has a linear relationship with predictors.
-   **GLM**: Used when the response variable does not fit these assumptions, such as binary outcomes (yes/no), counts, or proportions.

### 5. **Examples**

-   **LM**: Simple linear regression, multiple linear regression
-   **GLM**: Logistic regression, Poisson regression, negative binomial regression, etc.

In summary, GLMs generalize LMs by allowing for non-normal distributions and providing flexibility with link functions, making them suitable for a wider range of data types and applications. A GLM combines the linear predictor, link function, and probability distribution to model the relationship between the mean of the response variable and the predictors. The specific form of the GLM depends on the chosen link function and distribution.
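The statement that a linear model is a special case of a GLM can be checked directly. The sketch below fits the same model with `lm()` and with `glm()` using the Gaussian family and identity link on the built-in `mtcars` data; the choice of variables is arbitrary and only for illustration.

``` r
# Same model fitted two ways on the built-in mtcars data
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(mpg ~ wt + hp, data = mtcars,
               family = gaussian(link = "identity"))

# The coefficient estimates agree
cbind(lm = coef(fit_lm), glm = coef(fit_glm))

# For the Gaussian family, the deviance is the residual sum of squares
c(deviance = deviance(fit_glm), rss = sum(residuals(fit_lm)^2))
```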


## GLM Models in R

Before starting, ensure you have R and the necessary packages installed. The `glm()` function lives in the `stats` package, which ships with base R and needs no installation. The examples below also use `MASS` (ordinal models), `nnet` (multinomial models), `betareg` (Beta regression), and `mgcv` (Generalized Additive Models).

``` r
# Install necessary packages if you haven't already
install.packages(c("MASS", "nnet", "mgcv", "betareg"))
```

The basic form of the `glm()` function is:

> `glm(formula, family = familytype(link = linkfunction), data = )`

Family objects are a convenient way to specify the models used by functions like `glm()`. See `help(family)` for other allowable `link` functions for each family.

`binomial(link = "logit")`

`gaussian(link = "identity")`

`Gamma(link = "inverse")`

`inverse.gaussian(link = "1/mu^2")`

`poisson(link = "log")`

`quasi(link = "identity", variance = "constant")`

`quasibinomial(link = "logit")`

`quasipoisson(link = "log")`

There are several GLM model families depending on the make-up of the response variable.

### 1. [Generalized Linear Regression (Gaussian)](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-01-glm-regression-r.ipynb)

A Gaussian GLM is essentially linear regression and is useful when the response variable is continuous and normally distributed.

``` r
# Example of linear regression
model_gaussian <- glm(y ~ x1 + x2, data = data, family = gaussian)
summary(model_gaussian)
```

### 2. [Logistic Regression (Binary)](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-02-glm-logistic-r.ipynb)

Logistic regression models binary outcomes (0 or 1) and uses the logit link function to model probabilities.

``` r
# Example of logistic regression
model_logistic <- glm(y ~ x1 + x2, data = data, family = binomial)
summary(model_logistic)
```
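The template above uses placeholder names (`data`, `y`, `x1`, `x2`). For a version that runs as-is, here is a sketch on simulated data; the data frame `sim` and the true coefficients are assumptions chosen for illustration.

``` r
# Simulate a binary outcome from a known logistic model (hypothetical data)
set.seed(42)
n <- 400
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
p_true <- plogis(-0.5 + 1.2 * sim$x1 - 0.7 * sim$x2)
sim$y <- rbinom(n, size = 1, prob = p_true)

fit_logit <- glm(y ~ x1 + x2, data = sim, family = binomial(link = "logit"))

# Exponentiated coefficients are odds ratios
exp(coef(fit_logit))

# type = "response" returns fitted probabilities instead of log-odds
new_obs <- data.frame(x1 = c(-1, 0, 1), x2 = 0)
prob_hat <- predict(fit_logit, newdata = new_obs, type = "response")
prob_hat
```

Because the coefficient on `x1` is positive in the simulation, the predicted probability should rise as `x1` moves from -1 to 1.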

### 3. [Probit Regression Model](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-03-glm-probit-r.ipynb)

Probit regression is similar to logistic regression but uses the probit link function. It's useful for modeling binary outcomes when the probit function is a better fit than the logit.

``` r
# Example of probit regression
model_probit <- glm(y ~ x1 + x2, data = data, family = binomial(link = "probit"))
summary(model_probit)
```
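To see how the two links compare in practice, the following sketch fits both to the same binary response from the built-in `mtcars` data (`am`, transmission type, against `wt`, weight). The model names `m_logit` and `m_probit` are chosen here for illustration.

``` r
# Same binary response, two different links
m_logit  <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
m_probit <- glm(am ~ wt, data = mtcars, family = binomial(link = "probit"))

# Coefficients are on different scales (log-odds vs. standard normal quantiles)
cbind(logit = coef(m_logit), probit = coef(m_probit))

# Fitted probabilities from the two models are typically very similar
p_logit  <- fitted(m_logit)
p_probit <- fitted(m_probit)
cor(p_logit, p_probit)

# AIC gives a rough comparison of the two links on the same data
AIC(m_logit, m_probit)
```

In most applications the choice between logit and probit has little effect on the fitted probabilities, so it is often driven by interpretability (odds ratios for logit) or by convention in the field.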

### 4. [Ordinal Regression](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-04-glm-ordinal-r.ipynb)

Ordinal regression models ordered categorical outcomes. The `polr` function from the `MASS` package can be used for this purpose.

``` r
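# Example of ordinal regression
# polr() requires the response to be a factor, preferably an ordered one,
# e.g. data$y <- factor(data$y, ordered = TRUE)
library(MASS)
model_ordinal <- polr(y ~ x1 + x2, data = data, Hess = TRUE)
summary(model_ordinal)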
```
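A runnable sketch of the same idea on simulated data follows. The latent-variable construction, the cut-points, and the names `sim_ord` and `fit_ord` are assumptions made for this illustration.

``` r
library(MASS)

# Simulate an ordered outcome by cutting a latent continuous score
set.seed(7)
n <- 300
sim_ord <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
latent <- 1.0 * sim_ord$x1 - 0.5 * sim_ord$x2 + rlogis(n)
sim_ord$y <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
                 labels = c("low", "medium", "high"),
                 ordered_result = TRUE)

# Proportional-odds logistic regression
fit_ord <- polr(y ~ x1 + x2, data = sim_ord, Hess = TRUE)

coef(fit_ord)   # slopes on the log-odds scale
fit_ord$zeta    # estimated cut-points between adjacent categories

# Predicted probability of each category for every observation
probs_ord <- predict(fit_ord, type = "probs")
head(probs_ord)
```

With three categories the model estimates two cut-points, and each row of the predicted probabilities sums to one.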

### 5. [Multinomial Logistic Regression](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-05-glm-multinomial-logistic-r.ipynb)

For nominal categorical responses with more than two levels, multinomial logistic regression can be used. The `nnet` package provides `multinom` for fitting such models.

``` r
# Example of multinomial logistic regression
library(nnet)
model_multinom <- multinom(y ~ x1 + x2, data = data)
summary(model_multinom)
```
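The sketch below shows the same call on simulated three-class data. The class labels, true coefficients, and object names (`sim_mn`, `fit_mn`) are hypothetical and chosen only to make the example self-contained.

``` r
library(nnet)

# Simulate a three-level nominal outcome with baseline category "A"
set.seed(11)
n <- 300
sim_mn <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
eta_b <- 0.5 + 1.0 * sim_mn$x1      # log-odds of B versus A
eta_c <- -0.5 + 1.0 * sim_mn$x2     # log-odds of C versus A
denom <- 1 + exp(eta_b) + exp(eta_c)
probs <- cbind(A = 1 / denom, B = exp(eta_b) / denom, C = exp(eta_c) / denom)
sim_mn$y <- factor(apply(probs, 1, function(p) sample(c("A", "B", "C"), 1, prob = p)))

# trace = FALSE suppresses the optimizer's iteration log
fit_mn <- multinom(y ~ x1 + x2, data = sim_mn, trace = FALSE)

# One row of coefficients per non-baseline category
coef(fit_mn)

# Predicted class probabilities
probs_mn <- predict(fit_mn, type = "probs")
head(probs_mn)
```

Each coefficient row describes the log-odds of that category relative to the baseline level, which is the first level of the response factor.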

### 6. [Poisson Regression](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-06-00-poisson-regression-introduction-r.ipynb)

Poisson regression is used for count data and models the log of the expected counts.

``` r
# Example of Poisson regression
model_poisson <- glm(y ~ x1 + x2, data = data, family = poisson)
summary(model_poisson)
```
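Here is a self-contained sketch on simulated counts, including a simple check for overdispersion. The data frame `sim_pois` and the true coefficients are assumptions for illustration.

``` r
# Simulate counts from a known log-linear model (hypothetical data)
set.seed(2024)
n <- 500
sim_pois <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim_pois$y <- rpois(n, lambda = exp(0.3 + 0.6 * sim_pois$x1 - 0.4 * sim_pois$x2))

fit_pois <- glm(y ~ x1 + x2, data = sim_pois, family = poisson(link = "log"))

# Exponentiated coefficients are multiplicative effects on the expected count
exp(coef(fit_pois))

# Pearson dispersion statistic: values well above 1 suggest overdispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
dispersion
```

Because these data are generated from a true Poisson model, the dispersion statistic should be close to 1. With real count data a value well above 1 is common and points toward quasi-Poisson or negative binomial models.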

### 7. [Gamma Regression](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-07-glm-gamma-regression-r.ipynb)

Gamma regression is useful for modeling positive continuous outcomes with skewness.

``` r
# Example of Gamma regression
model_gamma <- glm(y ~ x1 + x2, data = data, family = Gamma(link = "log"))
summary(model_gamma)
```
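The following sketch simulates a positive, right-skewed response and fits a Gamma GLM with a log link. The shape parameter, coefficients, and names (`sim_gamma`, `fit_gamma`) are assumptions for illustration.

``` r
# Simulate Gamma-distributed responses with mean mu_true (hypothetical data)
set.seed(99)
n <- 500
sim_gamma <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
mu_true <- exp(1 + 0.5 * sim_gamma$x1 - 0.3 * sim_gamma$x2)
shape <- 2
sim_gamma$y <- rgamma(n, shape = shape, rate = shape / mu_true)

fit_gamma <- glm(y ~ x1 + x2, data = sim_gamma, family = Gamma(link = "log"))

# With a log link, coefficients act multiplicatively on the mean
coef(fit_gamma)

# The estimated dispersion approximates 1 / shape
disp_gamma <- summary(fit_gamma)$dispersion
disp_gamma
```

The log link is used here instead of the family's default inverse link because it keeps fitted means positive and gives coefficients a multiplicative interpretation.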

### 8. [Beta Regression](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-01-08-glm-gamma-regression-r.ipynb)

Beta regression is used for modeling continuous data bounded between 0 and 1. The `betareg` package provides the `betareg` function for this purpose.

``` r
# Example of Beta regression
library(betareg)
model_beta <- betareg(y ~ x1 + x2, data = data)
summary(model_beta)
```

### 9. [Generalized Additive Model (GAM) ](https://github.com/zia207/r-colab/blob/main/NoteBook/Advance_Regression/02-02-09-glm-gam-regression.ipynb)

GAMs allow for flexible relationships between predictors and the response by using smooth functions. The `mgcv` package’s `gam` function is used for GAMs.

``` r
# Example of GAM
library(mgcv)
model_gam <- gam(y ~ s(x1) + s(x2), data = data, family = gaussian)
summary(model_gam)
```
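To illustrate what the smooth terms add, this sketch simulates a response that depends non-linearly on one predictor and linearly on another, then compares a GAM with a purely linear fit. The simulated data and object names (`sim_s`, `fit_gam`, `fit_lin`) are assumptions for illustration.

``` r
library(mgcv)

# y depends on x1 through a sine curve and on x2 linearly (hypothetical data)
set.seed(5)
n <- 300
sim_s <- data.frame(x1 = runif(n, 0, 1), x2 = runif(n, 0, 1))
sim_s$y <- sin(2 * pi * sim_s$x1) + 2 * sim_s$x2 + rnorm(n, sd = 0.3)

# Smooth terms for both predictors, smoothness selected by REML
fit_gam <- gam(y ~ s(x1) + s(x2), data = sim_s, family = gaussian, method = "REML")

# Purely linear model for comparison
fit_lin <- gam(y ~ x1 + x2, data = sim_s, family = gaussian, method = "REML")

# Effective degrees of freedom (edf): values near 1 indicate a nearly linear term
smooth_tab <- summary(fit_gam)$s.table
smooth_tab

# Lower AIC indicates the better trade-off between fit and complexity
AIC(fit_lin, fit_gam)
```

The smooth for `x1` should use several effective degrees of freedom to follow the sine curve, while the smooth for `x2` should shrink toward a straight line.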


## Required R Packages

The code examples in this tutorial series use the following R packages, grouped by primary use:

---

### **Data Wrangling & Visualization**
- **`tidyverse`** (Wickham): *Meta-package for data science workflows (includes `dplyr`, `ggplot2`, `tidyr`). Streamlines data manipulation and visualization.*  
  [tidyverse.org](https://www.tidyverse.org/) | [CRAN](https://cran.r-project.org/web/packages/tidyverse)  
- **`plyr`** (Wickham): *Split-apply-combine workflows (predecessor to `dplyr`).*  
  [CRAN](https://cran.r-project.org/web/packages/plyr)  
- **`patchwork`** (Pedersen): *Combine multiple `ggplot2` plots into unified layouts.*  
  [CRAN](https://cran.r-project.org/web/packages/patchwork)  
- **`RColorBrewer`** (Neuwirth): *Color palettes for thematic maps and statistical graphics.*  
  [CRAN](https://cran.r-project.org/web/packages/RColorBrewer)  
- **`GGally`** (Schloerke): *Extend `ggplot2` with correlation matrices and multivariate plots.*  
  [CRAN](https://cran.r-project.org/web/packages/GGally)  

---

### **Exploratory Data Analysis (EDA)**
- **`dlookr`** (Ryu): *Automate data quality checks, outlier detection, and EDA reports.*  
  [CRAN](https://cran.r-project.org/web/packages/dlookr)  
- **`DataExplorer`** (Cui): *Quickly profile datasets with automatic visualizations.*  
  [CRAN](https://cran.r-project.org/web/packages/DataExplorer)  

---

### **Statistical Tests & Diagnostics**
- **`rstatix`** (Kassambara): *Tidy-friendly interface for t-tests, ANOVA, and non-parametric tests.*  
  [CRAN](https://cran.r-project.org/web/packages/rstatix)  
- **`lmtest`** (Zeileis & Hothorn): *Diagnostic tests for linear models (e.g., Breusch-Pagan).*  
  [CRAN](https://cran.r-project.org/web/packages/lmtest)  
- **`generalhoslem`** (Jay): *Goodness-of-fit tests for logistic regression models.*  
  [CRAN](https://cran.r-project.org/web/packages/generalhoslem)  
- **`moments`** (Komsta & Novomestky): *Calculate skewness, kurtosis, and distribution moments.*  
  [CRAN](https://cran.r-project.org/web/packages/moments)  

---

### **Modeling & Regression**
- **`MASS`** (Venables & Ripley): *Robust regression, LDA, and negative binomial GLMs.*  
  [CRAN](https://cran.r-project.org/web/packages/MASS)  
- **`AER`** (Kleiber & Zeileis): *Applied econometric models (tobit, IV regression, etc.).*  
  [CRAN](https://cran.r-project.org/web/packages/AER)  
- **`VGAM`** (Yee): *Vector generalized linear/additive models for complex responses.*  
  [CRAN](https://cran.r-project.org/web/packages/VGAM)  
- **`pscl`** (Jackman): *Zero-inflated count models and hurdle regression.*  
  [CRAN](https://cran.r-project.org/web/packages/pscl)  
- **`betareg`** (Cribari-Neto & Zeileis): *Regression for proportional/rate data (0-1 outcomes).*  
  [CRAN](https://cran.r-project.org/web/packages/betareg)  
- **`glmnet`** (Friedman et al.): *Lasso, ridge, and elastic-net regularization for GLMs.*  
  [CRAN](https://cran.r-project.org/web/packages/glmnet)  
- **`nnet`** (Venables & Ripley): *Feed-forward neural networks and multinomial regression.*  
  [CRAN](https://cran.r-project.org/web/packages/nnet)  

---

### **Generalized Additive Models (GAMs)**
- **`mgcv`** (Wood): *GAMs with automatic smoothness selection via REML.*  
  [CRAN](https://cran.r-project.org/web/packages/mgcv)  
- **`gamlss`** (Rigby & Stasinopoulos): *Flexible GAMs for location, scale, and shape parameters.*  
  [CRAN](https://cran.r-project.org/web/packages/gamlss)  
- **`gam`** (Hastie & Tibshirani): *Original implementation of generalized additive models.*  
  [CRAN](https://cran.r-project.org/web/packages/gam)  
- **`gratia`** (Simpson): *Diagnostic plots and utilities for `mgcv` models.*  
  [CRAN](https://cran.r-project.org/web/packages/gratia)  
- **`gamair`** (Wood): *Datasets companion for GAM modeling.*  
  [CRAN](https://cran.r-project.org/web/packages/gamair)  

---

### **Model Evaluation & Interpretation**
- **`performance`** (easystats): *Model diagnostics (R², RMSE, multicollinearity checks).*  
  [CRAN](https://cran.r-project.org/web/packages/performance)  
- **`Metrics`** (Hamner): *Common ML metrics (AUC, RMSE, MAE).*  
  [CRAN](https://cran.r-project.org/web/packages/Metrics)  
- **`metrica`** (Correndo et al.): *Prediction performance metrics for regression and classification models.*  
  [CRAN](https://cran.r-project.org/web/packages/metrica)  
- **`margins`** (Leeper): *Calculate marginal effects for regression models.*  
  [CRAN](https://cran.r-project.org/web/packages/margins)  
- **`marginaleffects`** (Arel-Bundock): *Predictions, contrasts, and slopes for models.*  
  [CRAN](https://cran.r-project.org/web/packages/marginaleffects)  
- **`ggeffects`** (Lüdecke): *Tidy marginal effects for plotting with `ggplot2`.*  
  [CRAN](https://cran.r-project.org/web/packages/ggeffects)  
- **`report`** (easystats): *Automatically generate model interpretation reports.*  
  [Documentation](https://easystats.github.io/report/)  

---

### **Visualization & Reporting**
- **`ggstatsplot`** (Patil): *`ggplot2` extensions with statistical annotations.*  
  [CRAN](https://cran.r-project.org/web/packages/ggstatsplot)  
- **`sjPlot`** (Lüdecke): *Visualize model coefficients and diagnostic plots.*  
  [CRAN](https://cran.r-project.org/web/packages/sjPlot)  
- **`ggpmisc`** (Aphalo): *Add statistical tables and annotations to plots.*  
  [CRAN](https://cran.r-project.org/web/packages/ggpmisc)  
- **`jtools`** (Long): *Simplify regression workflows with `summ()` and effect plots.*  
  [CRAN](https://cran.r-project.org/web/packages/jtools)  
- **`pROC`** (Robin et al.): *Display and analyze ROC curves and AUC values.*  
  [CRAN](https://cran.r-project.org/web/packages/pROC)  
- **`ROCR`** (Sing et al.): *Evaluate and visualize classifier performance with ROC curves and AUC values.*  
  [CRAN](https://cran.r-project.org/web/packages/ROCR)  

---

### **Tables & Reporting**
- **`flextable`** (Gohel): *Create customizable tables for Word/HTML/PDF reports.*  
  [CRAN](https://cran.r-project.org/web/packages/flextable)  
- **`knitr`** (Xie): *Dynamic report generation; its `kable()` function creates simple tables in R Markdown.*  
  [CRAN](https://cran.r-project.org/web/packages/knitr)  
- **`kableExtra`** (Zhu): *Enhance `kable` tables with styling and interactivity.*  
  [CRAN](https://cran.r-project.org/web/packages/kableExtra)  
- **`gt`** (Iannone): *Build publication-ready tables with a tidy syntax.*  
  [Documentation](https://gt.rstudio.com/)  
- **`gtsummary`** (Sjoberg): *Create demographic and regression summary tables.*  
  [Documentation](https://www.danieldsjoberg.com/gtsummary/)  

---

### **Robust Statistics**
- **`sandwich`** (Zeileis): *Robust covariance matrix estimators for model diagnostics.*  
  [CRAN](https://cran.r-project.org/web/packages/sandwich)  

---

### **Miscellaneous**
- **`agridat`** (Wright): *Agricultural experiment datasets for statistical analysis.*  
  [CRAN](https://cran.r-project.org/web/packages/agridat)  
- **`epiDisplay`** (Chongsuvivatwong): *Tools for epidemiological data analysis and presentation.*  
  [CRAN](https://cran.r-project.org/web/packages/epiDisplay)  

---

### Key Notes:
1. Use `citation("package_name")` in R for academic references  
2. Packages like `tidyverse` and `gt` have dedicated documentation websites  
3. Conflict Alert: `plyr` and `dplyr` have overlapping functions (use `dplyr::` prefix if needed)

## Install Required Packages

``` r
# List of required packages
packages <- c(
  # Data Wrangling & Visualization
  "tidyverse", "plyr", "patchwork", "RColorBrewer", "GGally",
  
  # EDA
  "dlookr", "DataExplorer",
  
  # Statistical Tests
  "rstatix", "lmtest", "generalhoslem", "moments",
  
  # Modeling & Regression
  "MASS", "AER", "VGAM", "pscl", "betareg", "glmnet", "nnet",
  
  # GAMs
  "mgcv", "gamlss", "gam", "gratia", "gamair",
  
  # Model Evaluation
  "performance", "Metrics", "metrica", "margins", "marginaleffects", "ggeffects", "report",
  
  # Visualization & Reporting
  "ggstatsplot", "sjPlot", "ggpmisc", "jtools",
  
  # Tables
  "flextable", "knitr", "kableExtra", "gt", "gtsummary",
  
  # Robust Statistics
  "sandwich",
  
  # Miscellaneous
  "agridat", "epiDisplay"
)

# Install missing packages
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Verify installation
cat("Installed packages:\n")
print(sapply(packages, requireNamespace, quietly = TRUE))
```

## Load Packages

``` r
# List of packages
packages_to_load <- c(
  # Data Wrangling & Visualization
  # ("plyr" is loaded before "tidyverse" so that it does not mask dplyr functions)
  "plyr", "tidyverse", "patchwork", "RColorBrewer", "GGally",
  
  # EDA
  "dlookr", "DataExplorer",
  
  # Statistical Tests
  "rstatix", "lmtest", "generalhoslem", "moments",
  
  # Modeling & Regression
  "MASS", "AER", "VGAM", "pscl", "betareg", "glmnet", "nnet",
  
  # GAMs
  "mgcv", "gamlss", "gam", "gratia", "gamair",
  
  # Model Evaluation
  "performance", "Metrics", "metrica", "margins", "marginaleffects", "ggeffects", "report",
  
  # Visualization & Reporting
  "ggstatsplot", "sjPlot", "ggpmisc", "jtools",
  
  # Tables
  "flextable", "knitr", "kableExtra", "gt", "gtsummary",
  
  # Robust Statistics
  "sandwich",
  
  # Miscellaneous
  "agridat", "epiDisplay"
)

# Load packages with error handling
loaded_packages <- sapply(packages_to_load, function(pkg) {
  tryCatch({
    suppressPackageStartupMessages(library(pkg, character.only = TRUE))
    TRUE
  }, error = function(e) FALSE)
})

# Check loading status
cat("\nLoading Status:\n")
print(loaded_packages)

# Show conflicts
cat("\nKey Conflicts:\n")
conflict_list <- conflicts(detail = TRUE)
# Names look like "package:dplyr", so add the prefix before matching
print(conflict_list[names(conflict_list) %in% paste0("package:", packages_to_load)])

# Show final verification
cat("\nSuccessfully loaded", sum(loaded_packages), "of", length(loaded_packages), "packages\n")
if(any(!loaded_packages)) {
  cat("Failed to load:", names(loaded_packages)[!loaded_packages], "\n")
  cat("Install missing packages first using install.packages()\n")
}
```

## Summary

This tutorial covered various GLMs, each suited to different types of response data. Use `summary()` to inspect model results and diagnostics for each type of GLM. These models allow for a range of data structures and distributions, making them a versatile toolset in R for real-world applications.

## References

1. [An Introduction to Statistical Learning](https://www.stat.berkeley.edu/~rabbee/s154/ISLR_First_Printing.pdf)

2. [Generalized Linear Models With Examples in R](https://www.academia.edu/37886943/Springer_Texts_in_Statistics_Generalized_Linear_Models_With_Examples_in_R)

3. [6.1 - Introduction to GLMs](https://online.stat.psu.edu/stat504/lesson/6/6.1)

4. [4 Generalized Linear Models](https://entnemdept.ufl.edu/Hahn/generalized-linear-models.html)

5. [Generalized Linear Model](https://www.sciencedirect.com/topics/mathematics/generalized-linear-model)






## Books

### **Generalized Linear Models (GLMs)**
1. **"Generalized Linear Models"** – *P. McCullagh and J.A. Nelder*  
   - A classic, foundational text on GLMs.  

2. **"Generalized Linear Models with Examples in R"** – *Peter K. Dunn and Gordon K. Smyth*  
   - Practical applications of GLMs, including Gamma and Beta models, using R.  

3. **"Generalized Linear Models and Extensions"** – *James W. Hardin and Joseph M. Hilbe*  
   - Covers extensions of GLMs, with real-world applications.  

4. **"An Introduction to Generalized Linear Models"** – *Annette J. Dobson and Adrian G. Barnett*  
   - A beginner-friendly introduction with examples in R.  

5. **"Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models"** – *Julian J. Faraway*  
   - Covers GLMs, mixed models, and nonparametric regression with R.  

6. **"Applied Regression Analysis and Generalized Linear Models"** – *John Fox*  
   - A comprehensive introduction to regression analysis, including GLMs.  

7. **"Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis"** – *Frank E. Harrell Jr.*  
   - A detailed discussion of various regression models, including logistic and Poisson regression.  

8. **"Categorical Data Analysis"** – *Alan Agresti*  
   - A detailed look at categorical data analysis, including GLMs.  


### **Logistic Regression and Multinomial Models**
9. **"An Introduction to Statistical Learning: with Applications in R"** – *Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani*  
   - Covers logistic regression and other machine learning techniques with R.  

10. **"Applied Logistic Regression"** – *David W. Hosmer Jr., Stanley Lemeshow, and Rodney X. Sturdivant*  
   - A practical guide to logistic regression, including real-world examples.  

11. **"Logistic Regression Models"** – *Joseph M. Hilbe*  
   - An in-depth exploration of logistic regression with applications in R.  



### **Poisson and Count Data Models**
12. **"Modeling Count Data"** – *Joseph M. Hilbe*  
   - A focused discussion on modeling count data, including Poisson regression.  

13. **"Zero Inflated Models and Generalized Linear Mixed Models with R"** – *Alain F. Zuur, Anatoly A. Saveliev, and Elena N. Ieno*  
   - Covers Zero-Inflated Poisson models and generalized mixed models.  

14. **"Count Data Models with R"** – *John M. Hilbe*  
   - A detailed guide to count data models, including Hurdle and Zero-Inflated models.  

15. **"Statistical Methods for Rates and Proportions"** – *Joseph L. Fleiss, Bruce Levin, and Myunghee Cho Paik*  
   - Includes discussions on Poisson and binomial regression methods.  

16. **"Introduction to Probability Models"** – *Sheldon M. Ross*  
   - Covers the Poisson process and other probability models.  



### **Generalized Additive Models (GAMs)**
17. **"Generalized Additive Models"** – *Trevor Hastie and Robert Tibshirani*  
   - The foundational book introducing GAMs.  

18. **"Generalized Additive Models: An Introduction with R"** – *Simon N. Wood*  
   - Covers practical implementation of GAMs using the `mgcv` package in R.  

19. **"Introduction to Generalized Additive Models"** – *Gareth James*  
   - A beginner-friendly introduction to GAMs.  

