# Model Selection


------------------------------------------

__Fundamentals of Poisson Regression:__

---------------------------

Poisson regression models the expected count of events, $\lambda = E[Y]$, as a function of predictor variables $X$, assuming the count $Y$ follows a Poisson distribution:
$$
Y \sim \text{Poisson}(\lambda)
$$
where $\lambda$ is the expected count of events. The relationship between the predictor variables and $\lambda$ is modeled through the log link function:

$$
\log(\lambda) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k
$$

where $\beta_0, \beta_1, \dots, \beta_k$ are the coefficients estimated by the regression and $X_1, X_2, \dots, X_k$ are the predictors. Because the link is logarithmic, the coefficients are interpreted as relative (multiplicative) changes in the expected count of events, $\lambda$. For example, holding the other predictors constant, a one-unit increase in $X_1$ multiplies the expected count by a factor of $e^{\beta_1}$: the expected count increases when $\beta_1 > 0$ and decreases when $\beta_1 < 0$.
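Why the factor is $e^{\beta_1}$: compare the expected counts at $X_1 = x$ and $X_1 = x + 1$ with $X_2, \dots, X_k$ held fixed. Every term except $\beta_1$ cancels in the ratio:

$$
\frac{\lambda(x+1)}{\lambda(x)}
= \frac{\exp\left(\beta_0 + \beta_1 (x+1) + \beta_2 X_2 + \dots + \beta_k X_k\right)}
       {\exp\left(\beta_0 + \beta_1 x + \beta_2 X_2 + \dots + \beta_k X_k\right)}
= e^{\beta_1}
$$

For a dummy variable such as `year2023` in the model summary, the "one-unit increase" is the switch from the baseline year to 2023, so an estimate of $-0.29868$ corresponds to $e^{-0.29868} \approx 0.742$, i.e. roughly 26% fewer expected protests with month, province, and population held fixed.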

```r
library(MASS)
library(boot)

# Fit a Poisson regression (log link) of protest counts on the chosen predictors
m1 <- glm(protests ~ year + month + prov + pop,
          data = df, family = poisson(link = "log"))
summary(m1)
```
```
Call:
glm(formula = protests ~ year + month + prov + pop, family = poisson(link = "log"), 
    data = df)

Coefficients:
                                Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   -177.86192   74.54008  -2.386 0.017027 *  
year2023                        -0.29868    0.13766  -2.170 0.030034 *  
monthAugust                     -0.75206    0.09666  -7.780 7.23e-15 ***
monthDecember                   -0.73557    0.13446  -5.471 4.48e-08 ***
monthFebruary                    0.09703    0.07590   1.278 0.201140    
monthJanuary                    -0.30972    0.08397  -3.688 0.000226 ***
monthJuly                       -0.63486    0.09365  -6.779 1.21e-11 ***
monthJune                       -0.33059    0.08008  -4.128 3.65e-05 ***
monthMarch                      -0.05841    0.07870  -0.742 0.457968    
monthMay                        -0.08094    0.07475  -1.083 0.278928    
monthNovember                   -0.27110    0.11281  -2.403 0.016257 *  
mo
```
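The `Estimate` column is on the log scale, so exponentiating it gives the multiplicative effects (rate ratios) described earlier. The sketch below is self-contained and assumes simulated data in place of the notebook's `df`; the predictors `x1`, `x2` and their true coefficients are hypothetical. It uses `confint.default()`, which returns Wald intervals, exponentiated in the same way as the estimates.

```r
# Minimal sketch: express Poisson GLM coefficients as rate ratios.
# Assumption: simulated data stands in for the notebook's `df`;
# x1, x2 and the true coefficients (0.5, 0.3, -0.7) are hypothetical.
set.seed(42)
n  <- 5000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
lambda <- exp(0.5 + 0.3 * x1 - 0.7 * x2)  # true log-linear mean
y  <- rpois(n, lambda)

fit <- glm(y ~ x1 + x2, family = poisson(link = "log"))

# exp() maps log-scale estimates to multiplicative effects on E[Y];
# the Wald interval endpoints are exponentiated the same way.
rate_ratios <- exp(cbind(RR = coef(fit), confint.default(fit)))
print(round(rate_ratios, 3))
```

Applied to the model above, `exp(coef(m1))` would read each row of the summary as a factor on the expected protest count rather than as a log-scale shift.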

----------------------------------------
From the summary output, at a significance level of $\alpha = 0.05$ the coefficients for provBritish Columbia, monthSeptember, monthMay, monthMarch, and monthFebruary are not significant ($p > 0.05$). For each of these coefficients we therefore fail to reject the null hypothesis of the Wald $z$-test:

$$
H_0: \beta_k = 0 \text{   vs.   } H_1: \beta_k \neq 0
$$
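The `z value` and `Pr(>|z|)` columns come from the Wald statistic $z = \hat\beta_k / \widehat{SE}(\hat\beta_k)$ with a two-sided normal p-value. The sketch below reproduces those columns by hand and lists the terms with $p > \alpha$. It assumes simulated data; `x3` is a hypothetical predictor with no true effect, standing in for a non-significant term.

```r
# Minimal sketch: reproduce summary.glm()'s z test and flag terms with p > alpha.
# Assumption: simulated data; x3 is a hypothetical predictor with no true effect.
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x3 <- rnorm(n)
y  <- rpois(n, exp(0.2 + 0.4 * x1))  # x3 does not enter the true mean

fit <- glm(y ~ x1 + x3, family = poisson(link = "log"))
tab <- summary(fit)$coefficients     # Estimate, Std. Error, z value, Pr(>|z|)

# Wald statistic and two-sided p-value: p = 2 * P(Z > |z|)
z <- tab[, "Estimate"] / tab[, "Std. Error"]
p <- 2 * pnorm(-abs(z))

# Sanity check: the manual p-values match the ones summary() reports
stopifnot(isTRUE(all.equal(unname(p), unname(tab[, "Pr(>|z|)"]))))

alpha   <- 0.05
non_sig <- rownames(tab)[p > alpha]  # terms where we fail to reject H0: beta_k = 0
print(non_sig)
```

The same filter on `summary(m1)$coefficients` would return the terms listed above. Note that these are per-level dummy coefficients: a non-significant `monthMay` does not by itself show that `month` as a whole can be dropped; a likelihood-ratio test on the full factor, e.g. `drop1(m1, test = "Chisq")`, addresses that question.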