## Understanding the variance among houses

The variation among houses carries epidemiological meaning, as high variance leads to a higher R0. 

I hope to understand what the estimated variance really represents through the simplified example below. The setting is as follows:

Let us assume we have 100 houses from the same village. There is variation among the houses, and overdispersion exists. We visit each house on 8 occasions, 4 times in January and 4 times in July. Some houses also have a mosquito net installed, as a potential confounding factor. The mosquito count per visit follows a Poisson distribution whose log rate is a linear function of these explanatory variables. This model is a simplified version of our Burkina PSC dataset.

Because this is a simulation, we know the true values of the underlying parameters, so we can check how well the fitted model recovers them.
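Written out, the data-generating model used in the simulation below is roughly the following. The symbols ($y_{ij}$, $x_{ij}$, $u_i$, $e_{ij}$ and the $\beta$s) are notation introduced here for illustration; the parameter values are the ones assigned in the code cells.

$$
\begin{aligned}
y_{ij} &\sim \mathrm{Poisson}(\lambda_{ij}),\\
\log \lambda_{ij} &= \beta_{\mathrm{month}(j)} + \beta_{\mathrm{net}}\,x_{ij} + u_i + e_{ij},\\
u_i &\sim N(0, \sigma^2_{\mathrm{house}}), \qquad e_{ij} \sim N(0, \sigma^2_{\mathrm{obs}}).
\end{aligned}
$$

Here $i$ indexes houses, $j$ indexes visits, and $x_{ij}=1$ when a mosquito net is present at that visit. The simulation uses $\beta_{\mathrm{Jan}}=0.5$, $\beta_{\mathrm{July}}=4$, $\beta_{\mathrm{net}}=-0.5$, $\sigma^2_{\mathrm{house}}=0.6$ and $\sigma^2_{\mathrm{obs}}=1$.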

In [1]:
# ALL THE ESSENTIALS
require(compiler)
enableJIT(3)
require(lme4)
set.seed(111)

Loading required package: compiler


Loading required package: lme4
Loading required package: Matrix


### Simulating counts with known effects and parameters
#### Random effect 1: variation among houses
First we sample the intrinsic variation among our 100 houses. The house effects follow a normal distribution with mean 0 and variance 0.6. This variance is the epidemiologically meaningful parameter that we wish to estimate.

In [2]:
# SAMPLE VARIATION AMONG HOUSES
num_house<-100
house_effect<-rnorm(num_house, mean=0, sd=sqrt(0.6))
# ALSO house_ID
house_ID<-1:num_house
cbind(house_ID, house_effect)

house_ID,house_effect
1,0.18220118
2,-0.25618690
3,-0.24138278
4,-1.78338928
5,-0.13236002
6,0.10865905
7,-1.15990170
8,-0.78248858
9,-0.73468604
10,-0.38262149


Some houses have positive effects, driving the expected counts up. Because we visit each house 8 times, it is convenient to replicate these numbers 8 times and arrange them into a matrix in which each column holds the 8 visits to the same house.

In [3]:
# EACH COLUMN IS A HOUSE
house_effect<-matrix(house_effect, nc=num_house, nr=8, byrow=T)
house_ID<-matrix(house_ID, nc=num_house, nr=8, byrow=T)
house_effect
house_ID

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
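The `matrix()` calls above rely on R's fill order. A minimal sketch with three hypothetical houses (the names `effect`, `m`, `month_pattern`, `month_m` and all values are illustrative, not part of the dataset) shows why `byrow=T` puts one house in each column, how the default column-wise fill repeats a per-visit pattern in every column, and why `as.vector()` later returns each house's 8 visits as one consecutive block:

```r
# Illustrative values only: one effect per house, three hypothetical houses
effect <- c(10, 20, 30)

# byrow = TRUE fills row by row, recycling the 3 values,
# so every row is a copy of `effect` and column i holds house i
m <- matrix(effect, nrow = 8, ncol = 3, byrow = TRUE)

# Default column-wise fill: an 8-value visit pattern is recycled into each column
month_pattern <- c(rep("Jan", 4), rep("July", 4))
month_m <- matrix(month_pattern, nrow = 8, ncol = 3)

# as.vector() unrolls column by column: 8 visits of house 1, then house 2, ...
long_effect <- as.vector(m)
long_month  <- as.vector(month_m)
```

Because every matrix in the simulation shares the same 8-row layout, unrolling them with `as.vector()` keeps house, month and count aligned row by row in the final data frame.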


#### Random effect 2: overdispersion
Overdispersion can be introduced as an observation-level random effect: an independent normal deviate with mean 0 and variance 1 is added to the log rate of every visit.

In [4]:
overdispersion_effect<-rnorm(num_house*8, mean=0, sd=1)
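Why a normal effect on the log scale produces overdispersion can be checked numerically. The sketch below is a standalone illustration, not part of the analysis: the log mean `eta`, the sample size `n` and the seed are arbitrary assumptions, and only the variance of 1 comes from the text. It compares plain Poisson counts with counts whose log rate carries an observation-level effect, next to the known moments of the Poisson-lognormal mixture.

```r
# Standalone sketch: run in a separate session, since it resets the random seed
set.seed(1)          # arbitrary seed, for reproducibility only
n      <- 1e5        # number of simulated observations (illustrative)
eta    <- 1          # fixed log mean (illustrative value)
sigma2 <- 1          # observation-level variance, as in the text

y_pois <- rpois(n, exp(eta))                               # plain Poisson
y_olre <- rpois(n, exp(eta + rnorm(n, 0, sqrt(sigma2))))   # with observation-level effect

# Theoretical moments of the Poisson-lognormal mixture
mu_theory  <- exp(eta + sigma2 / 2)
var_theory <- mu_theory + mu_theory^2 * (exp(sigma2) - 1)

# Variance-to-mean ratios: about 1 for Poisson, well above 1 with the extra effect
disp_pois <- var(y_pois) / mean(y_pois)
disp_olre <- var(y_olre) / mean(y_olre)
```

The extra term `mu_theory^2 * (exp(sigma2) - 1)` is what pushes the variance above the mean, and it grows with the square of the mean, so overdispersion is most visible in the high-count July visits.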

Next we define the fixed effects.

#### Fixed effect 1: monthly variation
To further simplify the problem, we assume that sampling takes place in two months, January and July, representing the two seasons. The effect is 0.5 for January and 4 for July, a difference of 3.5.

In [5]:
month<-c(rep('Jan', 4), rep('July', 4))
month<-matrix(month, nc=num_house, nr=8)
month_effect<-c(rep(0.5, 4), rep(4, 4))
month_effect<-matrix(month_effect, nc=num_house, nr=8)
month
month_effect

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,...,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan
Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,...,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan
Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,...,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan
Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,...,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan
July,July,July,July,July,July,July,July,July,July,...,July,July,July,July,July,July,July,July,July,July
July,July,July,July,July,July,July,July,July,July,...,July,July,July,July,July,July,July,July,July,July
July,July,July,July,July,July,July,July,July,July,...,July,July,July,July,July,July,July,July,July,July
July,July,July,July,July,July,July,July,July,July,...,July,July,July,July,July,July,July,July,July,July


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,...,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5
0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,...,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5
0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,...,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5
0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,...,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5
4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,...,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0
4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,...,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0
4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,...,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0
4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,...,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0


#### Fixed effect 2: mosquito net
Finally, we have the mosquito net, which is a binary factor. Let us consider a completely randomised design in which a mosquito net can be installed or removed at each visit to each house. A mosquito net decreases the log mean count by 0.5.

In [6]:
#k<-round(num_house/2)
#mosquito_net<-sample(c(rep('Yes', k), rep('No', num_house-k)))
#mosquito_net<-matrix(mosquito_net, nc=num_house, nr=8, byrow=T)
mosquito_net<-matrix(sample(c('Yes', 'No'), size=num_house*8, replace=T), nr=8)
mosquito_net
mosquito_net_effect<-(mosquito_net=='Yes')*(-0.5) # EFFECT IS -0.5
mosquito_net_effect

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
Yes,Yes,Yes,No,Yes,Yes,Yes,No,No,No,...,No,Yes,No,Yes,No,Yes,Yes,Yes,No,No
Yes,Yes,No,No,Yes,Yes,No,No,Yes,No,...,Yes,No,Yes,Yes,No,Yes,Yes,Yes,No,Yes
Yes,No,No,No,No,Yes,Yes,Yes,No,Yes,...,Yes,No,No,Yes,Yes,No,No,No,Yes,No
Yes,No,No,No,Yes,Yes,Yes,Yes,No,No,...,No,No,Yes,Yes,No,No,No,Yes,No,No
No,No,No,No,No,No,No,Yes,Yes,Yes,...,No,No,Yes,Yes,No,No,No,Yes,Yes,No
No,No,Yes,Yes,Yes,No,No,No,No,No,...,Yes,No,No,No,Yes,No,Yes,No,Yes,No
No,No,Yes,No,No,Yes,Yes,No,No,Yes,...,No,No,Yes,Yes,No,Yes,No,No,Yes,No
No,Yes,No,No,No,No,Yes,Yes,Yes,No,...,Yes,Yes,Yes,No,Yes,Yes,Yes,Yes,No,No


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
-0.5,-0.5,-0.5,0.0,-0.5,-0.5,-0.5,0.0,0.0,0.0,...,0.0,-0.5,0.0,-0.5,0.0,-0.5,-0.5,-0.5,0.0,0.0
-0.5,-0.5,0.0,0.0,-0.5,-0.5,0.0,0.0,-0.5,0.0,...,-0.5,0.0,-0.5,-0.5,0.0,-0.5,-0.5,-0.5,0.0,-0.5
-0.5,0.0,0.0,0.0,0.0,-0.5,-0.5,-0.5,0.0,-0.5,...,-0.5,0.0,0.0,-0.5,-0.5,0.0,0.0,0.0,-0.5,0.0
-0.5,0.0,0.0,0.0,-0.5,-0.5,-0.5,-0.5,0.0,0.0,...,0.0,0.0,-0.5,-0.5,0.0,0.0,0.0,-0.5,0.0,0.0
0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.5,-0.5,-0.5,...,0.0,0.0,-0.5,-0.5,0.0,0.0,0.0,-0.5,-0.5,0.0
0.0,0.0,-0.5,-0.5,-0.5,0.0,0.0,0.0,0.0,0.0,...,-0.5,0.0,0.0,0.0,-0.5,0.0,-0.5,0.0,-0.5,0.0
0.0,0.0,-0.5,0.0,0.0,-0.5,-0.5,0.0,0.0,-0.5,...,0.0,0.0,-0.5,-0.5,0.0,-0.5,0.0,0.0,-0.5,0.0
0.0,-0.5,0.0,0.0,0.0,0.0,-0.5,-0.5,-0.5,0.0,...,-0.5,-0.5,-0.5,0.0,-0.5,-0.5,-0.5,-0.5,0.0,0.0


#### Sample the counts and put everything into a data frame
In a Poisson GLM, the log of lambda is a linear combination of our factors. Let us work out lambda for each house visit and then draw Poisson counts from these lambdas.

In [7]:
log_lambda<-house_effect+overdispersion_effect+month_effect+mosquito_net_effect
log_lambda
mosquito_count<-rpois(length(log_lambda), exp(log_lambda))

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
0.7818209,1.4782132,-1.5583796,0.2819368,-1.7292619,0.7831057,-1.2850257,-0.7098085,0.2370877,-0.2811265,...,0.9880065,1.5714329,-0.5161941,0.74065508,0.67839693,-0.8463077,-0.9546873,-0.7617622,0.6019834,-0.9806446
-0.9781283,-0.6060374,-1.1683179,-2.13050765,-0.2199486,0.7445796,0.1068987,0.1604907,-0.3451861,0.3925811,...,0.3167917,0.3925394,-1.1003602,0.15068141,-0.01231467,-2.3282912,-0.210153,-1.5323218,-1.2303508,0.386904
0.6212946,1.4429986,1.1380589,0.04427088,0.7284076,-0.5043113,-1.251422,-1.9702965,0.2872542,-1.6513403,...,-0.4816714,1.0479894,-1.4337662,0.14803982,-0.37278751,-1.2222982,1.0089023,1.7818815,-0.5524783,0.3000167
0.3870549,1.2677141,0.6526088,-0.80634295,-1.01232,0.5235504,-3.0342075,-0.1560631,0.6697668,0.1072972,...,2.4082128,2.1728712,-1.0778023,0.06367257,-1.08890537,-2.1386386,1.6215126,-1.0556069,-1.1446033,-0.5535393
3.4830198,3.6975379,4.4475427,2.40186166,0.544305,4.9860024,2.1759363,3.079158,4.4621049,3.7707802,...,5.8363414,5.2416996,0.2050719,3.15880275,4.40449132,2.5436932,5.3873421,2.3705292,3.1958533,2.8159049
3.2555755,5.1303215,5.6123337,1.96990561,2.9001245,4.1298166,3.0435111,2.7445508,2.3809765,3.7159585,...,5.781859,6.3756143,0.8711538,3.14200598,5.22712592,3.1053729,5.2559298,2.7259073,2.0697897,3.5299622
3.1687188,1.5485404,2.2435196,4.80883807,4.2991803,5.4190423,-0.2543451,4.1624051,4.1218037,4.0199955,...,4.4589167,3.6128137,1.1248611,4.27558168,3.74134505,2.8404798,4.8986723,4.0254016,3.193738,3.6646006
4.7871882,3.1162562,4.0495947,3.29095771,3.2636505,3.6565633,2.2463644,3.6239137,2.9402564,3.097481,...,6.1620013,3.1400845,0.8178246,5.94224514,4.92833401,3.7615401,4.6330418,3.2889985,4.531797,3.7935031


In [8]:
# PUT EVERYTHING INTO A DATA FRAME
dat<-data.frame(mosquito_count=as.vector(mosquito_count), 
                house_ID=as.vector(house_ID), 
                month=as.vector(month), mosquito_net=as.vector(mosquito_net))
# SET VARIABLES AS FACTORS
dat$house_ID<-factor(dat$house_ID)
dat$month<-factor(dat$month, levels=c('Jan', 'July'))
dat$mosquito_net<-factor(dat$mosquito_net, levels=c('No', 'Yes'))
dim(dat)
head(dat)

mosquito_count,house_ID,month,mosquito_net
0,1,Jan,Yes
1,1,Jan,Yes
1,1,Jan,Yes
2,1,Jan,Yes
38,1,July,No
21,1,July,No


### Re-estimating the parameters
Now we have our (simulated) dataset and let us re-estimate the parameters. 

The first model includes only the two random effects and no fixed effects.

In [9]:
# RANDOM EFFECTS ONLY
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion), 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion)
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  6869.9   6883.9  -3431.9   6863.9      797 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-0.75928 -0.29023  0.00023  0.04850  0.06084 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 4.9380   2.2222  
 house_ID       (Intercept) 0.1407   0.3751  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.96756    0.09063   21.71   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
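The observation-level variance in this fit is far above the simulated value of 1. A rough way to see where the excess comes from is to add up, on the log scale, the within-house variation that this model has no fixed effects to explain. This is a back-of-envelope check under the simulation's true values, not something `glmer` computes, and the variable names are illustrative.

```r
# True values from the simulation
month_effect_values <- c(Jan = 0.5, July = 4)
net_effect          <- -0.5
p_net               <- 0.5    # nets assigned with probability 1/2 at each visit

# Variance of a two-point variable taking each value with probability 1/2
var_month <- (unname(diff(month_effect_values)) / 2)^2

# Variance of the net term: effect^2 * p * (1 - p)
var_net <- net_effect^2 * p_net * (1 - p_net)

# Within-house variation the observation-level term has to absorb
var_obs_apparent <- 1 + var_month + var_net

# Average linear predictor, comparable to the fitted intercept
mean_eta <- mean(month_effect_values) + net_effect * p_net
```

This gives an apparent observation-level variance of about 4.1 and an average log rate of 2, the same order as the fitted 4.94 and 1.97. The match is only approximate, since the month effect is a two-point variable that the model has to approximate with a normal distribution.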

The model greatly over-estimates the overdispersion (observation-level) variance: 4.94 against a true value of 1. It also under-estimates the house variance (0.14 against 0.6). Next we introduce the first fixed effect:

In [10]:
# WITH MONTH
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion)+month, 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion) + month
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  5864.9   5883.6  -2928.4   5856.9      796 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.48230 -0.29804 -0.02508  0.08751  1.27156 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 0.9080   0.9529  
 house_ID       (Intercept) 0.7252   0.8516  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.2024     0.1092   1.854   0.0638 .  
monthJuly     3.5791     0.0833  42.967   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
          (Intr)
month

Now the estimates are more reasonable. The two variance estimates (0.91 and 0.73) are close to the true values (1 and 0.6), and the estimated difference between the two months (3.58) is close to the true difference of 3.5.

Let us add the second fixed effect, the mosquito net. 

In [11]:
# THE FULL MODEL
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion)+month+mosquito_net, 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion) + month +  
    mosquito_net
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  5840.1   5863.5  -2915.0   5830.1      795 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.44820 -0.31145 -0.03464  0.09025  1.11285 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 0.8695   0.9325  
 house_ID       (Intercept) 0.7028   0.8384  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      0.41520    0.11392   3.645 0.000268 ***
monthJuly        3.57218    0.08197  43.581  < 2e-16 ***
mosquito_netYes -0.42316    0.08115  -5.215 1.84e-07 ***
---
Signif. codes:  0 '***' 0.

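All parameters are recovered reasonably well: the mosquito net effect is estimated as -0.42 (true value -0.5), and the variance and month estimates barely change.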

### [Extra 1] Permanent mosquito nets
Previously I assumed that mosquito nets could be installed or removed at any time during the experiment. This gives a more balanced experimental design, but it is less realistic.

Below I consider a more realistic scenario in which the mosquito net is permanent: the presence or absence of a net in a house does not change over time. We randomly assign mosquito nets to half of the houses.

In [12]:
# PERMANENT MOSQUITO NET
k<-round(num_house/2)
mosquito_net<-sample(c(rep('Yes', k), rep('No', num_house-k)))
mosquito_net<-matrix(mosquito_net, nc=num_house, nr=8, byrow=T)
mosquito_net
mosquito_net_effect<-(mosquito_net=='Yes')*(-0.5) # EFFECT IS -0.5
mosquito_net_effect
# CALCULATE LOG MEAN COUNT
log_lambda<-house_effect+overdispersion_effect+month_effect+mosquito_net_effect
# SAMPLE POISSON COUNT
mosquito_count<-rpois(length(log_lambda), exp(log_lambda))
# PUT EVERYTHING INTO A DATA FRAME
dat<-data.frame(mosquito_count=as.vector(mosquito_count), 
                house_ID=as.vector(house_ID), 
                month=as.vector(month), mosquito_net=as.vector(mosquito_net))
# SET VARIABLES AS FACTORS
dat$house_ID<-factor(dat$house_ID)
dat$month<-factor(dat$month, levels=c('Jan', 'July'))
dat$mosquito_net<-factor(dat$mosquito_net, levels=c('No', 'Yes'))
dim(dat)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
Yes,No,No,Yes,No,No,No,Yes,Yes,No,...,No,No,No,No,No,Yes,Yes,No,Yes,No
Yes,No,No,Yes,No,No,No,Yes,Yes,No,...,No,No,No,No,No,Yes,Yes,No,Yes,No
Yes,No,No,Yes,No,No,No,Yes,Yes,No,...,No,No,No,No,No,Yes,Yes,No,Yes,No
Yes,No,No,Yes,No,No,No,Yes,Yes,No,...,No,No,No,No,No,Yes,Yes,No,Yes,No
Yes,No,No,Yes,No,No,No,Yes,Yes,No,...,No,No,No,No,No,Yes,Yes,No,Yes,No
Yes,No,No,Yes,No,No,No,Yes,Yes,No,...,No,No,No,No,No,Yes,Yes,No,Yes,No
Yes,No,No,Yes,No,No,No,Yes,Yes,No,...,No,No,No,No,No,Yes,Yes,No,Yes,No
Yes,No,No,Yes,No,No,No,Yes,Yes,No,...,No,No,No,No,No,Yes,Yes,No,Yes,No


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
-0.5,0,0,-0.5,0,0,0,-0.5,-0.5,0,...,0,0,0,0,0,-0.5,-0.5,0,-0.5,0
-0.5,0,0,-0.5,0,0,0,-0.5,-0.5,0,...,0,0,0,0,0,-0.5,-0.5,0,-0.5,0
-0.5,0,0,-0.5,0,0,0,-0.5,-0.5,0,...,0,0,0,0,0,-0.5,-0.5,0,-0.5,0
-0.5,0,0,-0.5,0,0,0,-0.5,-0.5,0,...,0,0,0,0,0,-0.5,-0.5,0,-0.5,0
-0.5,0,0,-0.5,0,0,0,-0.5,-0.5,0,...,0,0,0,0,0,-0.5,-0.5,0,-0.5,0
-0.5,0,0,-0.5,0,0,0,-0.5,-0.5,0,...,0,0,0,0,0,-0.5,-0.5,0,-0.5,0
-0.5,0,0,-0.5,0,0,0,-0.5,-0.5,0,...,0,0,0,0,0,-0.5,-0.5,0,-0.5,0
-0.5,0,0,-0.5,0,0,0,-0.5,-0.5,0,...,0,0,0,0,0,-0.5,-0.5,0,-0.5,0


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
0.7818209,1.9782132,-1.0583796,-0.2180632,-1.2292619,1.283105739,-0.7850257,-1.2098085,-0.2629123,-0.2811265,...,0.9880065,2.0714329,-0.5161941,1.2406551,0.67839693,-0.8463077,-0.9546873,-0.2617622,0.1019834,-0.9806446
-0.9781283,-0.1060374,-1.1683179,-2.6305076,0.2800514,1.244579582,0.1068987,-0.3395093,-0.3451861,0.3925811,...,0.8167917,0.3925394,-0.6003602,0.6506814,-0.01231467,-2.3282912,-0.210153,-1.0323218,-1.7303508,0.886904
0.6212946,1.4429986,1.1380589,-0.4557291,0.7284076,-0.004311342,-0.751422,-1.9702965,-0.2127458,-1.1513403,...,0.0183286,1.0479894,-1.4337662,0.6480398,0.12721249,-1.7222982,0.5089023,1.7818815,-0.5524783,0.3000167
0.3870549,1.2677141,0.6526088,-1.3063429,-0.51232,1.023550391,-2.5342075,-0.1560631,0.1697668,0.1072972,...,2.4082128,2.1728712,-0.5778023,0.5636726,-1.08890537,-2.6386386,1.1215126,-0.5556069,-1.6446033,-0.5535393
2.9830198,3.6975379,4.4475427,1.9018617,0.544305,4.986002419,2.1759363,3.079158,4.4621049,4.2707802,...,5.8363414,5.2416996,0.7050719,3.6588027,4.40449132,2.0436932,4.8873421,2.8705292,3.1958533,2.8159049
2.7555755,5.1303215,6.1123337,1.9699056,3.4001245,4.129816629,3.0435111,2.2445508,1.8809765,3.7159585,...,6.281859,6.3756143,0.8711538,3.142006,5.72712592,2.6053729,5.2559298,2.7259073,2.0697897,3.5299622
2.6687188,1.5485404,2.7435196,4.3088381,4.2991803,5.919042341,0.2456549,3.6624051,3.6218037,4.5199955,...,4.4589167,3.6128137,1.6248611,4.7755817,3.74134505,2.8404798,4.3986723,4.0254016,3.193738,3.6646006
4.2871882,3.6162562,4.0495947,2.7909577,3.2636505,3.656563342,2.7463644,3.6239137,2.9402564,3.097481,...,6.6620013,3.6400845,1.3178246,5.9422451,5.42833401,3.7615401,4.6330418,3.7889985,4.031797,3.7935031


In [13]:
# ESTIMATE PARAMETERS 
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion)+month+mosquito_net, 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion) + month +  
    mosquito_net
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  5865.8   5889.2  -2927.9   5855.8      795 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.16166 -0.32925 -0.01778  0.10307  1.14891 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 0.8899   0.9434  
 house_ID       (Intercept) 0.6162   0.7850  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      0.48643    0.13406   3.628 0.000285 ***
monthJuly        3.52748    0.08214  42.942  < 2e-16 ***
mosquito_netYes -0.47403    0.17524  -2.705 0.006829 ** 
---
Signif. codes:  0 '***' 0.

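The parameter estimates are still acceptable, although the standard error of the mosquito net effect roughly doubles (0.175 compared with 0.081) because the net now varies only between houses.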
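The larger standard error of the mosquito net effect under permanent nets (0.175, against 0.081 when nets were re-randomised at each visit) can be roughly anticipated from the design. The sketch below is an approximation under stated assumptions: it uses the true simulation variances, treats the analysis as a comparison of group means on the log scale, and ignores Poisson sampling noise, so it understates both standard errors. The variable names are illustrative.

```r
# True variances used in the simulation
sigma2_house <- 0.6   # between-house variance
sigma2_obs   <- 1     # observation-level (overdispersion) variance
n_house      <- 100
n_visit      <- 8

# Permanent nets: the comparison is between two groups of 50 houses.
# Variance of one house's mean log rate ~ house variance + averaged visit noise.
var_house_mean <- sigma2_house + sigma2_obs / n_visit
se_between <- sqrt(var_house_mean * (2 / (n_house / 2)))

# Nets re-randomised at each visit: the comparison is mostly within houses,
# between two groups of about 400 visits, against visit-level noise only.
se_within <- sqrt(sigma2_obs * (2 / (n_house * n_visit / 2)))

round(c(between = se_between, within = se_within), 3)
```

The approximation gives about 0.17 for permanent nets and about 0.07 for per-visit nets, in line with the fitted standard errors. The house variance enters only the between-house comparison, which is why that design is less precise for this effect.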

### [Extra 2] Confounding mosquito net
Below is an extreme case in which we intentionally install mosquito nets in high-risk houses. It is plausible for the government or local people to distribute mosquito nets to the houses with the highest exposure; it is a natural thing to do. To model this, I give mosquito nets to the houses with positive random effects.

In [14]:
# CONFOUNDING MOSQUITO NET
mosquito_net<-matrix('No', nc=num_house, nr=8)
mosquito_net[house_effect>=0]<-'Yes' # IF HOUSE EFFECT >= 0, GIVE MOSQUITO NET
mosquito_net_effect<-(mosquito_net=='Yes')*(-0.5) 
mosquito_net
mosquito_net_effect
# CALCULATE LOG MEAN COUNT
log_lambda<-house_effect+overdispersion_effect+month_effect+mosquito_net_effect
log_lambda
mosquito_count<-rpois(length(log_lambda), exp(log_lambda))
# PUT EVERYTHING INTO A DATA FRAME
dat<-data.frame(mosquito_count=as.vector(mosquito_count), 
                house_ID=as.vector(house_ID), 
                month=as.vector(month), mosquito_net=as.vector(mosquito_net))
# SET VARIABLES AS FACTORS
dat$house_ID<-factor(dat$house_ID)
dat$month<-factor(dat$month, levels=c('Jan', 'July'))
dat$mosquito_net<-factor(dat$mosquito_net, levels=c('No', 'Yes'))
dim(dat)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
0.7818209,1.9782132,-1.0583796,0.2819368,-1.2292619,0.7831057,-0.7850257,-0.7098085,0.2370877,-0.2811265,...,0.4880065,1.5714329,-0.5161941,0.74065508,0.1783969,-0.3463077,-0.9546873,-0.2617622,0.60198342,-0.9806446
-0.9781283,-0.1060374,-1.1683179,-2.13050765,0.2800514,0.7445796,0.1068987,0.1604907,0.1548139,0.3925811,...,0.3167917,-0.1074606,-0.6003602,0.15068141,-0.5123147,-1.8282912,-0.210153,-1.0323218,-1.23035082,0.886904
0.6212946,1.4429986,1.1380589,0.04427088,0.7284076,-0.5043113,-0.751422,-1.4702965,0.2872542,-1.1513403,...,-0.4816714,0.5479894,-1.4337662,0.14803982,-0.3727875,-1.2222982,0.5089023,1.7818815,-0.05247828,0.3000167
0.3870549,1.2677141,0.6526088,-0.80634295,-0.51232,0.5235504,-2.5342075,0.3439369,0.6697668,0.1072972,...,1.9082128,1.6728712,-0.5778023,0.06367257,-1.5889054,-2.1386386,1.1215126,-0.5556069,-1.14460331,-0.5535393
2.9830198,3.6975379,4.4475427,2.40186166,0.544305,4.4860024,2.1759363,3.579158,4.9621049,4.2707802,...,5.3363414,4.7416996,0.7050719,3.15880275,3.9044913,2.5436932,4.8873421,2.8705292,3.69585333,2.8159049
2.7555755,5.1303215,6.1123337,2.46990561,3.4001245,3.6298166,3.0435111,2.7445508,2.3809765,3.7159585,...,5.781859,5.8756143,0.8711538,2.64200598,5.2271259,3.1053729,5.2559298,2.7259073,2.56978966,3.5299622
2.6687188,1.5485404,2.7435196,4.80883807,4.2991803,5.4190423,0.2456549,4.1624051,4.1218037,4.5199955,...,3.9589167,3.1128137,1.6248611,4.27558168,3.2413451,3.3404798,4.3986723,4.0254016,3.69373804,3.6646006
4.2871882,3.6162562,4.0495947,3.29095771,3.2636505,3.1565633,2.7463644,4.1239137,3.4402564,3.097481,...,6.1620013,3.1400845,1.3178246,5.44224514,4.928334,4.2615401,4.6330418,3.7889985,4.53179697,3.7935031


In [15]:
# FIT MODEL
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion)+month+mosquito_net, 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion) + month +  
    mosquito_net
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  5757.5   5780.9  -2873.7   5747.5      795 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.18689 -0.31374 -0.00347  0.10315  1.11023 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 0.9114   0.9547  
 house_ID       (Intercept) 0.2341   0.4838  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -0.19579    0.10883  -1.799    0.072 .  
monthJuly        3.58777    0.08321  43.118  < 2e-16 ***
mosquito_netYes  0.72474    0.12499   5.798  6.7e-09 ***
---
Signif. codes:  0 '***' 0.

The house variance is severely under-estimated (0.23 against a true value of 0.6) because of the confounding factor. The mosquito net effect estimate is completely wrong as well: it is estimated as +0.72, whereas the true effect is -0.5.
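The size of these biases can be approximated with half-normal moments. Nets go exactly to the houses with a non-negative house effect, so the net indicator splits the normal house effects into a positive half and a negative half. The model then attributes the difference between the two half means to the net and sees only the spread within each half as house variance. The sketch below is a rough calculation under the true simulation values, with illustrative variable names; it is not part of the fitted model.

```r
# True values from the simulation
sigma2_house <- 0.6
sigma_house  <- sqrt(sigma2_house)
net_effect   <- -0.5
jan_effect   <- 0.5

# Mean of a N(0, sigma^2) variable given that it is positive (half-normal mean);
# the negative half has the same mean with opposite sign
mean_pos <- sigma_house * sqrt(2 / pi)

# Variance remaining within each half (half-normal variance)
var_half <- sigma2_house * (1 - 2 / pi)

# What the fit is expected to report, approximately
apparent_net_effect <- net_effect + 2 * mean_pos   # true effect + gap between halves
apparent_intercept  <- jan_effect - mean_pos       # January baseline of no-net houses

round(c(net = apparent_net_effect, house_var = var_half,
        intercept = apparent_intercept), 2)
```

These approximations (about 0.74, 0.22 and -0.12) are close to the fitted values of 0.72, 0.23 and -0.20. This suggests that the model is not failing numerically: it is estimating the association between nets and counts, which in this design is dominated by how the nets were allocated.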