## Understanding the variance among houses

### Settings
The variation among houses carries epidemiological meaning, as R0 increases with the variance. 

I hope to understand what the estimated variance really represents through the simplified example below. The setting is as follows: 

Let us assume we sample from houses within the same village. These houses have some intrinsic variation, and overdispersion exists. We visit each house on 8 occasions: 4 times in January (representing the dry season) and 4 times in July (the wet season). Some houses also have a mosquito net installed, which is a potential confounding factor. We will study several ways in which the mosquito nets are distributed (i.e. how net allocation correlates with the houses' intrinsic variation). The mosquito count per visit follows a Poisson distribution whose log rate is a linear function of these explanatory variables. This model is a simplified version of our Burkina PSC dataset, but still preserves most of its characteristics. 

Further, because it is a simulation, we know the true values of all the underlying parameters. 
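
In symbols, the setting described above can be written roughly as follows. The notation is introduced here only for illustration and is an assumption of this note rather than something used elsewhere in the notebook: $i$ indexes houses, $j$ indexes visits, $x_{ij}$ is 1 when a net is present and 0 otherwise, $u_i$ is the house effect and $e_{ij}$ is the observation-level (overdispersion) effect.

$$
\begin{aligned}
y_{ij} \mid \lambda_{ij} &\sim \mathrm{Poisson}(\lambda_{ij}),\\
\log \lambda_{ij} &= \beta_{\mathrm{month}(j)} + \beta_{\mathrm{net}}\, x_{ij} + u_i + e_{ij},\\
u_i &\sim N(0, \sigma^2_{\mathrm{house}}), \qquad e_{ij} \sim N(0, \sigma^2_{\mathrm{od}}).
\end{aligned}
$$

The question of this notebook is then what the fitted value of $\sigma^2_{\mathrm{house}}$ represents under different ways of assigning $x_{ij}$.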

### Which variance?


In [1]:
# ALL THE ESSENTIALS
require(compiler)
enableJIT(3)
require(lme4)
set.seed(111)

Loading required package: compiler


Loading required package: lme4
Loading required package: Matrix


### Simulating counts with known effects and parameters
#### Random effect 1: variation among houses
First we sample the intrinsic variation among our houses. The variation follows a normal distribution with mean 0 and variance of 0.6. 

A key question is whether this variance of 0.6 is the epidemiologically meaningful parameter that we wish to estimate. 

In [2]:
# SAMPLE VARIATION AMONG HOUSES
num_house<-100
house_effect<-rnorm(num_house, mean=0, sd=sqrt(0.6))
var(house_effect) # SAMPLE VARIANCE OF HOUSE EFFECT
# ALSO house_ID
house_ID<-1:num_house
cbind(house_ID, house_effect)[1:5,] # SHOW THE FIRST 5 ROWS

house_ID,house_effect
1,0.1822012
2,-0.2561869
3,-0.2413828
4,-1.7833893
5,-0.13236
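
To get a feel for what a log-scale variance of 0.6 means on the count scale, the sketch below uses the standard lognormal identities: if the house effect is normal with variance 0.6, each house multiplies its expected count by a lognormal factor. The seed and the sample size are arbitrary choices made for this illustration.

```r
# Log-scale variance of the house effect, taken from the simulation setup
sigma2_house <- 0.6

# If u ~ N(0, sigma2), then exp(u) is lognormal with
#   mean = exp(sigma2 / 2)  and  CV^2 = exp(sigma2) - 1
mean_multiplier <- exp(sigma2_house / 2)
cv2_multiplier  <- exp(sigma2_house) - 1

# Monte Carlo check (seed and sample size are arbitrary)
set.seed(1)
u <- rnorm(1e6, mean = 0, sd = sqrt(sigma2_house))
sim_mean <- mean(exp(u))
sim_cv2  <- var(exp(u)) / mean(exp(u))^2

round(c(mean_theory = mean_multiplier, mean_sim = sim_mean,
        cv2_theory = cv2_multiplier, cv2_sim = sim_cv2), 3)
```

The squared coefficient of variation of the multiplier depends only on the log-scale variance, which is one way to read the house variance on the scale of expected counts.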


Some houses have positive effects, meaning they will have higher expected counts. As we make multiple (8) visits to each house, it is convenient to replicate these numbers 8 times and arrange them into a matrix, such that each column corresponds to a house:

In [3]:
# EACH COLUMN IS A HOUSE
house_effect<-matrix(house_effect, nc=num_house, nr=8, byrow=T)
house_ID<-matrix(house_ID, nc=num_house, nr=8, byrow=T)
house_effect
house_ID

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165
0.1822012,-0.2561869,-0.2413828,-1.783389,-0.13236,0.108659,-1.159902,-0.7824886,-0.734686,-0.3826215,...,1.588504,0.3801784,-1.341198,0.5506481,0.01070718,-1.085242,0.975313,-0.09874366,-0.5649804,-0.9383165


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
1,2,3,4,5,6,7,8,9,10,...,91,92,93,94,95,96,97,98,99,100
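
The layout used above can be checked on a toy case. The sketch below uses 3 hypothetical houses and 2 visits (sizes and values chosen only for display) to show that `byrow=TRUE` repeats the house-level vector in every row, and that `as.vector()` later unrolls the matrix column by column, keeping the visits of one house together.

```r
# Toy version of the layout: 3 houses, 2 visits each (illustrative values)
toy_effect <- c(0.2, -0.3, 1.1)
toy_matrix <- matrix(toy_effect, nrow = 2, ncol = 3, byrow = TRUE)
toy_matrix

# Every row repeats the same house-level vector, so each column is one house
all_rows_equal <- all(apply(toy_matrix, 1, function(r) identical(r, toy_effect)))
all_rows_equal

# as.vector() reads the matrix column by column, so visits of one house stay adjacent
as.vector(toy_matrix)
```

Because every matrix in this notebook shares the same 8 x 100 shape, unrolling them all with `as.vector()` keeps house, month and net aligned row by row in the data frame.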


#### Random effect 2: overdispersion
Overdispersion can be introduced as an observation-level random effect. Its variance is 1. 

In [4]:
overdispersion_effect<-matrix(rnorm(num_house*8, mean=0, sd=1), nr=8)
head(overdispersion_effect)
var(as.vector(overdispersion_effect)) # SAMPLE VARIANCE

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
0.5996197,1.73440013,-1.3169968,1.5653261,-1.5969019,0.67444669,-0.12512399,-0.4273199,0.4717737,-0.39850496,...,-1.1004972,1.1912546,0.3250041,0.190007,0.1676897,0.2389344,-1.9300003,-0.6630185,0.66696378,-0.5423281
-1.1603295,-0.34985053,-1.4269351,-0.8471184,-0.0875886,0.63592054,0.76680042,0.4429793,0.3894999,0.27520256,...,-1.2717121,-0.487639,0.240838,-0.3999667,-0.5230218,-1.243049,-1.185466,-1.4335782,-1.16537046,1.3252205
0.4390934,1.19918551,0.8794416,1.3276602,0.3607676,-0.61297039,-0.09152026,-1.1878079,0.5219403,-1.2687188,...,-2.0701752,0.167811,-0.592568,-0.4026083,-0.3834947,-0.637056,-0.4664107,1.3806252,0.01250208,0.7383331
0.2048537,1.023901,0.3939916,0.4770463,-0.87996,0.41489135,-1.87430581,0.6264255,0.9044528,-0.01008133,...,0.319709,1.2926928,0.2633959,-0.4869755,-1.5996125,-1.5533964,0.1461996,-0.9568633,-1.07962295,-0.1152228
-0.6991813,-0.04627515,0.6889255,0.1852509,-3.323335,0.87734337,-0.66416202,0.3616466,1.6967909,0.65340172,...,0.2478376,0.8615212,-1.9537299,-0.8918454,0.3937841,-0.3710646,0.4120291,-1.0307271,0.26083369,-0.2457786
-0.9266257,1.38650843,2.3537165,0.2532949,-0.4675155,0.02115758,0.20341282,-0.4729606,-0.8843374,0.09857998,...,0.6933552,1.995436,-1.787648,-1.4086421,1.7164187,0.1906151,0.7806168,-1.1753491,-0.86522997,0.4682787
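
As a quick illustration of why an observation-level normal effect produces overdispersion, the sketch below compares the variance-to-mean ratio of plain Poisson counts with that of counts whose log rate carries an extra N(0, 1) term. The baseline log rate of 1, the seed and the sample size are assumptions made for this example, not values from the notebook.

```r
set.seed(2)    # arbitrary seed for this illustration
n <- 1e5
log_mean <- 1  # hypothetical baseline log rate

# Plain Poisson: variance equals the mean
y_pois <- rpois(n, exp(log_mean))

# Poisson with an observation-level normal effect (variance 1, as in the notebook)
obs_effect <- rnorm(n, mean = 0, sd = 1)
y_od <- rpois(n, exp(log_mean + obs_effect))

# Variance-to-mean ratio: close to 1 without the extra term, much larger with it
ratio_pois <- var(y_pois) / mean(y_pois)
ratio_od   <- var(y_od) / mean(y_od)
c(plain = ratio_pois, overdispersed = ratio_od)
```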


Let us also define the other fixed effects. 

#### Fixed effect 1: monthly variation
Just as in our real Burkina dataset, we have month as a fixed effect. To simplify the model we assume there are only two month levels, Jan and July, representing the two seasons. For Jan, the effect is 0.5. For July, the effect is 4. The difference between the two levels is 3.5. 

In [5]:
month<-c(rep('Jan', 4), rep('July', 4))
month<-matrix(month, nc=num_house, nr=8)
month_effect<-c(rep(0.5, 4), rep(4, 4))
month_effect<-matrix(month_effect, nc=num_house, nr=8)
month
month_effect

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,...,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan
Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,...,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan
Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,...,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan
Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,...,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan,Jan
July,July,July,July,July,July,July,July,July,July,...,July,July,July,July,July,July,July,July,July,July
July,July,July,July,July,July,July,July,July,July,...,July,July,July,July,July,July,July,July,July,July
July,July,July,July,July,July,July,July,July,July,...,July,July,July,July,July,July,July,July,July,July
July,July,July,July,July,July,July,July,July,July,...,July,July,July,July,July,July,July,July,July,July


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,...,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5
0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,...,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5
0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,...,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5
0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,...,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5
4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,...,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0
4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,...,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0
4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,...,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0
4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,...,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0


#### Fixed effect 2: mosquito net (completely randomised)
Finally, we have the mosquito net, which is a binary factor. Let us first consider a completely randomised design, in which a mosquito net can be installed in or removed from each house at any time during the experiment. Because net status is randomised per visit, the design is more balanced, but perhaps less realistic. A mosquito net lowers the log mean count by 0.5. 

We will discuss other distributions of nets below. 

In [6]:
k<-round(num_house*8/2)
mosquito_net<-sample(c(rep('Yes', k), rep('No', num_house*8-k)))
mosquito_net<-matrix(mosquito_net, nc=num_house, nr=8, byrow=T)
mosquito_net
mosquito_net_effect<-(mosquito_net=='Yes')*(-0.5) # EFFECT IS -0.5
mosquito_net_effect

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
Yes,Yes,Yes,Yes,No,No,No,No,Yes,Yes,...,No,Yes,No,No,No,No,No,No,Yes,Yes
Yes,No,No,No,Yes,No,Yes,No,No,No,...,No,Yes,Yes,Yes,No,Yes,Yes,No,Yes,Yes
Yes,No,No,Yes,Yes,No,Yes,No,No,No,...,No,Yes,No,No,Yes,Yes,Yes,Yes,Yes,No
Yes,No,No,Yes,Yes,No,No,Yes,No,No,...,No,No,Yes,Yes,No,No,No,Yes,No,Yes
Yes,No,Yes,No,No,No,Yes,No,Yes,No,...,Yes,No,Yes,No,Yes,No,No,Yes,No,Yes
Yes,Yes,No,Yes,Yes,Yes,Yes,No,Yes,Yes,...,Yes,Yes,No,Yes,No,No,Yes,Yes,Yes,Yes
No,Yes,Yes,No,Yes,Yes,No,Yes,Yes,Yes,...,No,No,No,Yes,No,No,No,Yes,No,No
Yes,Yes,No,Yes,No,No,No,No,No,No,...,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
-0.5,-0.5,-0.5,-0.5,0.0,0.0,0.0,0.0,-0.5,-0.5,...,0.0,-0.5,0.0,0.0,0.0,0.0,0.0,0.0,-0.5,-0.5
-0.5,0.0,0.0,0.0,-0.5,0.0,-0.5,0.0,0.0,0.0,...,0.0,-0.5,-0.5,-0.5,0.0,-0.5,-0.5,0.0,-0.5,-0.5
-0.5,0.0,0.0,-0.5,-0.5,0.0,-0.5,0.0,0.0,0.0,...,0.0,-0.5,0.0,0.0,-0.5,-0.5,-0.5,-0.5,-0.5,0.0
-0.5,0.0,0.0,-0.5,-0.5,0.0,0.0,-0.5,0.0,0.0,...,0.0,0.0,-0.5,-0.5,0.0,0.0,0.0,-0.5,0.0,-0.5
-0.5,0.0,-0.5,0.0,0.0,0.0,-0.5,0.0,-0.5,0.0,...,-0.5,0.0,-0.5,0.0,-0.5,0.0,0.0,-0.5,0.0,-0.5
-0.5,-0.5,0.0,-0.5,-0.5,-0.5,-0.5,0.0,-0.5,-0.5,...,-0.5,-0.5,0.0,-0.5,0.0,0.0,-0.5,-0.5,-0.5,-0.5
0.0,-0.5,-0.5,0.0,-0.5,-0.5,0.0,-0.5,-0.5,-0.5,...,0.0,0.0,0.0,-0.5,0.0,0.0,0.0,-0.5,0.0,0.0
-0.5,-0.5,0.0,-0.5,0.0,0.0,0.0,0.0,0.0,0.0,...,-0.5,0.0,-0.5,0.0,-0.5,-0.5,-0.5,-0.5,-0.5,-0.5
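
Under this completely randomised scheme the net status changes from visit to visit within a house, so the net behaves as an observation-level covariate. The sketch below repeats the allocation with an arbitrary seed (so the draw differs from the one shown above) and counts how many of the 8 visits each house spends with a net.

```r
set.seed(3)  # arbitrary seed; the allocation differs from the one in the notebook
num_house <- 100
visits <- 8
k <- round(num_house * visits / 2)

# Same allocation scheme as above: shuffle 400 'Yes' and 400 'No' over all visits
net <- matrix(sample(c(rep('Yes', k), rep('No', num_house * visits - k))),
              nrow = visits, ncol = num_house, byrow = TRUE)

# Number of visits with a net, per house (columns are houses)
nets_per_house <- colSums(net == 'Yes')
table(nets_per_house)

# Rate ratio implied by a log-scale effect of -0.5
exp(-0.5)
```

A log-scale effect of -0.5 corresponds to multiplying the expected count by about 0.61.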


#### Sample the counts and put everything into a data frame
In a Poisson GLM, the log of lambda is a linear combination of our factors. Let us work out the lambda for each house visit and then draw Poisson counts from these lambdas.

In [7]:
log_lambda<-house_effect+overdispersion_effect+month_effect+mosquito_net_effect
log_lambda
mosquito_count<-rpois(length(log_lambda), exp(log_lambda))

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
0.7818209,1.4782132,-1.5583796,-0.2180632,-1.2292619,1.283105739,-0.7850257,-0.7098085,-0.2629123,-0.7811265,...,0.9880065,1.5714329,-0.5161941,1.24065508,0.67839693,-0.3463077,-0.4546873,-0.2617622,0.1019834,-1.4806446
-0.9781283,-0.1060374,-1.1683179,-2.1305076,-0.2199486,1.244579582,-0.3931013,0.1604907,0.1548139,0.3925811,...,0.8167917,-0.1074606,-1.1003602,0.15068141,-0.01231467,-2.3282912,-0.210153,-1.0323218,-1.7303508,0.386904
0.6212946,1.4429986,1.1380589,-0.4557291,0.2284076,-0.004311342,-1.251422,-1.4702965,0.2872542,-1.1513403,...,0.0183286,0.5479894,-1.4337662,0.64803982,-0.37278751,-1.7222982,0.5089023,1.2818815,-0.5524783,0.3000167
0.3870549,1.2677141,0.6526088,-1.3063429,-1.01232,1.023550391,-2.5342075,-0.1560631,0.6697668,0.1072972,...,2.4082128,2.1728712,-1.0778023,0.06367257,-1.08890537,-2.1386386,1.6215126,-1.0556069,-1.1446033,-1.0535393
2.9830198,3.6975379,3.9475427,2.4018617,0.544305,4.986002419,1.6759363,3.579158,4.4621049,4.2707802,...,5.3363414,5.2416996,0.2050719,3.65880275,3.90449132,2.5436932,5.3873421,2.3705292,3.6958533,2.3159049
2.7555755,4.6303215,6.1123337,1.9699056,2.9001245,3.629816629,2.5435111,2.7445508,1.8809765,3.2159585,...,5.781859,5.8756143,0.8711538,2.64200598,5.72712592,3.1053729,5.2559298,2.2259073,2.0697897,3.0299622
3.1687188,1.0485404,2.2435196,4.8088381,3.7991803,5.419042341,0.2456549,3.6624051,3.6218037,4.0199955,...,4.4589167,3.6128137,1.6248611,4.27558168,3.74134505,3.3404798,4.8986723,3.5254016,3.693738,3.6646006
4.2871882,3.1162562,4.0495947,2.7909577,3.2636505,3.656563342,2.7463644,4.1239137,3.4402564,3.097481,...,6.1620013,3.6400845,0.8178246,5.94224514,4.92833401,3.7615401,4.6330418,3.2889985,4.031797,3.2935031
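
Since the log rate contains two normal effects, the expected count averaged over houses and visits is larger than `exp()` of the fixed part alone. The sketch below works this out for each month and net combination using the true simulation values; it relies on the lognormal mean identity and on the two effects being independent, as they are in this simulation.

```r
# True simulation parameters from the cells above
month_effect <- c(Jan = 0.5, July = 4)
net_effect   <- c(No = 0, Yes = -0.5)
sigma2_house <- 0.6
sigma2_od    <- 1

# Fixed part of the linear predictor for each month/net combination
eta <- outer(month_effect, net_effect, `+`)

# Averaging exp(eta + u + e) over the two normal effects multiplies the
# conditional mean exp(eta) by exp((sigma2_house + sigma2_od) / 2)
marginal_mean <- exp(eta + (sigma2_house + sigma2_od) / 2)
round(marginal_mean, 2)
```

These marginal means are what the raw average counts per month and net status should approach, up to sampling noise.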


In [8]:
# PUT EVERYTHING INTO A DATA FRAME
dat<-data.frame(mosquito_count=as.vector(mosquito_count), 
                house_ID=as.vector(house_ID), 
                month=as.vector(month), mosquito_net=as.vector(mosquito_net))
# SET VARIABLES AS FACTORS
dat$house_ID<-factor(dat$house_ID)
dat$month<-factor(dat$month, levels=c('Jan', 'July'))
dat$mosquito_net<-factor(dat$mosquito_net, levels=c('No', 'Yes'))
dim(dat)
head(dat)

mosquito_count,house_ID,month,mosquito_net
0,1,Jan,Yes
1,1,Jan,Yes
1,1,Jan,Yes
2,1,Jan,Yes
24,1,July,Yes
12,1,July,Yes


### Re-estimating the parameters
Now that we have our (simulated) dataset, let us re-estimate the parameters. 

This first (wrong) model has no fixed effects; we estimate only the two variances. 

In [9]:
# RANDOM EFFECTS ONLY, NO FIXED EFFECTS
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion), 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion)
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  6835.4   6849.5  -3414.7   6829.4      797 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-0.74775 -0.28372 -0.00246  0.04902  0.06058 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 4.9498   2.225   
 house_ID       (Intercept) 0.1096   0.331   
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.94625    0.08899   21.87   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
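
One rough way to read the fit above is to ask how much log-scale variance the omitted month effect contributes. Every house has the same 4 Jan and 4 July visits, so the month effect varies only within houses and cannot be absorbed by the house term. The sketch below computes that contribution from the true values; it is only an approximation of what the model sees, since a two-level effect is not normally distributed.

```r
# Month effect for the 8 visits of any one house, as defined in the simulation
month_effect <- c(rep(0.5, 4), rep(4, 4))

# Population variance of a balanced two-level effect: (difference / 2)^2
month_var <- mean((month_effect - mean(month_effect))^2)
month_var

# Rough log-scale variance that a model without month has to absorb at the
# observation level: true overdispersion (1) plus the month contribution
1 + month_var
```

This gives roughly 4, the same order of magnitude as the fitted observation-level variance of 4.9498.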

The model greatly over-estimated the overdispersion (observation-level) variance (4.95 against a true value of 1) and under-estimated the house variance (0.11 against 0.6). Next we introduce the first fixed effect, the monthly effect: 

In [10]:
# WITH ONE (MONTH) FIXED EFFECT
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion)+month, 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion) + month
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  5849.6   5868.4  -2920.8   5841.6      796 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.46918 -0.32109 -0.02052  0.09512  1.05961 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 0.9353   0.9671  
 house_ID       (Intercept) 0.6885   0.8298  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.18213    0.10790   1.688   0.0914 .  
monthJuly    3.57492    0.08426  42.425   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
          (Intr)
month

Now the estimates are more reasonable. The two variance estimates are close to the true values (0.94 against 1 for overdispersion, 0.69 against 0.6 for houses). The difference between the two months is estimated well too (3.57 against a true difference of 3.5). 

Let us add the second fixed effect, the mosquito net. 

In [11]:
# THE FULL MODEL, WITH BOTH FIXED EFFECTS
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion)+month+mosquito_net, 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion) + month +  
    mosquito_net
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  5823.5   5846.9  -2906.8   5813.5      795 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.44322 -0.33076 -0.02312  0.09482  1.20364 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 0.8957   0.9464  
 house_ID       (Intercept) 0.6660   0.8161  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      0.39289    0.11228   3.499 0.000467 ***
monthJuly        3.58763    0.08312  43.164  < 2e-16 ***
mosquito_netYes -0.43631    0.08182  -5.333 9.67e-08 ***
---
Signif. codes:  0 '***' 0.

All parameter estimates are close to the true values. 
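
To put numbers on this comparison, the sketch below lines up the true simulation values against the estimates copied from the summary above. The true intercept is taken to be 0.5, the Jan effect with no net, because Jan and No are the reference levels of the two factors.

```r
# True values used in the simulation and estimates copied from the summary above
truth <- c(var_overdispersion = 1, var_house = 0.6,
           intercept_jan_no_net = 0.5,
           month_july_vs_jan = 3.5, net_yes = -0.5)
estimate <- c(var_overdispersion = 0.8957, var_house = 0.6660,
              intercept_jan_no_net = 0.39289,
              month_july_vs_jan = 3.58763, net_yes = -0.43631)

comparison <- data.frame(truth = truth,
                         estimate = estimate,
                         difference = estimate - truth,
                         relative = (estimate - truth) / abs(truth))
round(comparison, 3)
```

This is a single simulated dataset, so the differences reflect sampling noise as much as any systematic bias.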

One can see that, under completely randomised nets, the effect of the mosquito net is confounded mostly with overdispersion. Each observation (a visit to a house) has its own overdispersion effect as well as its own mosquito net status. Hence, when mosquito net is included in the last model, the overdispersion variance is lowered (from 0.9353 to 0.8957). 

Now let us look at other ways to distribute mosquito nets. 

### [Extra 1] Permanent mosquito nets
Previously I assumed that mosquito nets could be installed or removed at any time during the experiment. 

Below I consider a more realistic scenario in which the mosquito net is permanent: the absence/presence of a mosquito net in each house does not change over time. We randomly assign mosquito nets to half of the houses. The effect of the mosquito net remains the same. 

In [12]:
# PERMANENT MOSQUITO NET
k<-round(num_house/2)
mosquito_net<-sample(c(rep('Yes', k), rep('No', num_house-k)))
mosquito_net<-matrix(mosquito_net, nc=num_house, nr=8, byrow=T)
mosquito_net
mosquito_net_effect<-(mosquito_net=='Yes')*(-0.5) # EFFECT IS -0.5
mosquito_net_effect
# CALCULATE LOG MEAN COUNT
log_lambda<-house_effect+overdispersion_effect+month_effect+mosquito_net_effect
# SAMPLE POISSON COUNT
mosquito_count<-rpois(length(log_lambda), exp(log_lambda))
# PUT EVERYTHING INTO A DATA FRAME
dat<-data.frame(mosquito_count=as.vector(mosquito_count), 
                house_ID=as.vector(house_ID), 
                month=as.vector(month), mosquito_net=as.vector(mosquito_net))
# SET VARIABLES AS FACTORS
dat$house_ID<-factor(dat$house_ID)
dat$month<-factor(dat$month, levels=c('Jan', 'July'))
dat$mosquito_net<-factor(dat$mosquito_net, levels=c('No', 'Yes'))
dim(dat)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
Yes,Yes,No,No,Yes,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,No,Yes
Yes,Yes,No,No,Yes,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,No,Yes
Yes,Yes,No,No,Yes,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,No,Yes
Yes,Yes,No,No,Yes,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,No,Yes
Yes,Yes,No,No,Yes,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,No,Yes
Yes,Yes,No,No,Yes,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,No,Yes
Yes,Yes,No,No,Yes,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,No,Yes
Yes,Yes,No,No,Yes,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,No,Yes


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
-0.5,-0.5,0,0,-0.5,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,-0.5,-0.5,-0.5,0,-0.5
-0.5,-0.5,0,0,-0.5,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,-0.5,-0.5,-0.5,0,-0.5
-0.5,-0.5,0,0,-0.5,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,-0.5,-0.5,-0.5,0,-0.5
-0.5,-0.5,0,0,-0.5,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,-0.5,-0.5,-0.5,0,-0.5
-0.5,-0.5,0,0,-0.5,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,-0.5,-0.5,-0.5,0,-0.5
-0.5,-0.5,0,0,-0.5,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,-0.5,-0.5,-0.5,0,-0.5
-0.5,-0.5,0,0,-0.5,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,-0.5,-0.5,-0.5,0,-0.5
-0.5,-0.5,0,0,-0.5,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,-0.5,-0.5,-0.5,0,-0.5
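
A small check of this allocation scheme is sketched below, with an arbitrary seed so the draw differs from the one shown above. It confirms that the net status is constant within each house and that exactly half of the houses receive a net, which is what makes the net a house-level covariate here.

```r
set.seed(4)  # arbitrary seed; the allocation differs from the one in the notebook
num_house <- 100
visits <- 8
k <- round(num_house / 2)

# One net status per house, repeated over all visits (each column is a house)
net_by_house <- sample(c(rep('Yes', k), rep('No', num_house - k)))
net <- matrix(net_by_house, nrow = visits, ncol = num_house, byrow = TRUE)

# Net status never changes within a house
constant_within_house <- all(apply(net, 2, function(col) length(unique(col)) == 1))
houses_with_net <- sum(net[1, ] == 'Yes')
c(constant = constant_within_house, houses_with_net = houses_with_net)
```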


As before, I try to recover the parameters. I fit two models, without and with mosquito net. 

In [13]:
# FIRST MODEL, WITH ONE FIXED EFFECT, MONTH
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion)+month, 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion) + month
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  5859.1   5877.9  -2925.6   5851.1      796 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.53058 -0.33282 -0.02672  0.09461  1.44283 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 0.9248   0.9617  
 house_ID       (Intercept) 0.6674   0.8169  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.21251    0.10678    1.99   0.0466 *  
monthJuly    3.55421    0.08393   42.35   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
          (Intr)
month

In [14]:
# SECOND MODEL, WITH BOTH FIXED EFFECTS
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion)+month+mosquito_net, 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion) + month +  
    mosquito_net
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  5856.3   5879.8  -2923.2   5846.3      795 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.52172 -0.31751 -0.02471  0.09318  1.38134 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 0.9259   0.9622  
 house_ID       (Intercept) 0.6268   0.7917  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      0.40872    0.13620   3.001  0.00269 ** 
monthJuly        3.55479    0.08397  42.334  < 2e-16 ***
mosquito_netYes -0.39329    0.17707  -2.221  0.02635 *  
---
Signif. codes:  0 '***' 0.

Some key observations. First, we are still able to estimate all fixed effects, in particular the negative effect of mosquito nets, although less precisely (standard error 0.177, against 0.082 under the completely randomised design). Second, unlike the previous example with completely randomised mosquito nets, here the overdispersion estimate is unaffected by whether mosquito net is included in the model (0.9248 without, 0.9259 with). Including mosquito net only reduces the house variance (from 0.6674 to 0.6268). This is expected, as mosquito net is now confounded with house: each house, rather than each observation, has its own net status. 

I guess you will say that 0.6268 is the epidemiologically meaningful parameter you wish to obtain. 
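
The size of these reductions can be compared with the variance that the net itself contributes on the log scale. The sketch below uses the standard variance of a binary covariate (effect squared times p(1 - p)) with the true effect of -0.5 and half of the units netted, and sets it beside the drops reported in the summaries above.

```r
# Variance of a binary covariate with effect b and prevalence p: b^2 * p * (1 - p)
net_effect <- -0.5
p_net <- 0.5
net_var <- net_effect^2 * p_net * (1 - p_net)
net_var

# Drops in the variance estimates when mosquito_net enters the model,
# using the values reported in the summaries above
drop_overdispersion <- 0.9353 - 0.8957  # completely randomised nets
drop_house          <- 0.6674 - 0.6268  # permanent nets
c(expected = net_var, overdispersion = drop_overdispersion, house = drop_house)
```

Both drops are of the same order as the expected 0.0625 but somewhat smaller. With a single simulated dataset it is hard to say how much of that gap is noise.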

### [Extra 2] Confounding mosquito net
Below is an extreme case in which we intentionally install mosquito nets in high-risk houses. It is plausible for the government or local officials to distribute mosquito nets to the houses with the highest exposure; it is a natural thing to do. To model this, I give mosquito nets to the houses with positive random effects. 

In [15]:
# CONFOUNDING MOSQUITO NET
mosquito_net<-matrix('No', nc=num_house, nr=8)
mosquito_net[house_effect>=0]<-'Yes' # IF HOUSE EFFECT >= 0, GIVE MOSQUITO NET
mosquito_net_effect<-(mosquito_net=='Yes')*(-0.5) 
mosquito_net
mosquito_net_effect
# CALCULATE LOG MEAN COUNT
log_lambda<-house_effect+overdispersion_effect+month_effect+mosquito_net_effect
log_lambda
mosquito_count<-rpois(length(log_lambda), exp(log_lambda))
# PUT EVERYTHING INTO A DATA FRAME
dat<-data.frame(mosquito_count=as.vector(mosquito_count), 
                house_ID=as.vector(house_ID), 
                month=as.vector(month), mosquito_net=as.vector(mosquito_net))
# SET VARIABLES AS FACTORS
dat$house_ID<-factor(dat$house_ID)
dat$month<-factor(dat$month, levels=c('Jan', 'July'))
dat$mosquito_net<-factor(dat$mosquito_net, levels=c('No', 'Yes'))
dim(dat)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No
Yes,No,No,No,No,Yes,No,No,No,No,...,Yes,Yes,No,Yes,Yes,No,Yes,No,No,No


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0
-0.5,0,0,0,0,-0.5,0,0,0,0,...,-0.5,-0.5,0,-0.5,-0.5,0,-0.5,0,0,0


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
0.7818209,1.9782132,-1.0583796,0.2819368,-1.2292619,0.7831057,-0.7850257,-0.7098085,0.2370877,-0.2811265,...,0.4880065,1.5714329,-0.5161941,0.74065508,0.1783969,-0.3463077,-0.9546873,-0.2617622,0.60198342,-0.9806446
-0.9781283,-0.1060374,-1.1683179,-2.13050765,0.2800514,0.7445796,0.1068987,0.1604907,0.1548139,0.3925811,...,0.3167917,-0.1074606,-0.6003602,0.15068141,-0.5123147,-1.8282912,-0.210153,-1.0323218,-1.23035082,0.886904
0.6212946,1.4429986,1.1380589,0.04427088,0.7284076,-0.5043113,-0.751422,-1.4702965,0.2872542,-1.1513403,...,-0.4816714,0.5479894,-1.4337662,0.14803982,-0.3727875,-1.2222982,0.5089023,1.7818815,-0.05247828,0.3000167
0.3870549,1.2677141,0.6526088,-0.80634295,-0.51232,0.5235504,-2.5342075,0.3439369,0.6697668,0.1072972,...,1.9082128,1.6728712,-0.5778023,0.06367257,-1.5889054,-2.1386386,1.1215126,-0.5556069,-1.14460331,-0.5535393
2.9830198,3.6975379,4.4475427,2.40186166,0.544305,4.4860024,2.1759363,3.579158,4.9621049,4.2707802,...,5.3363414,4.7416996,0.7050719,3.15880275,3.9044913,2.5436932,4.8873421,2.8705292,3.69585333,2.8159049
2.7555755,5.1303215,6.1123337,2.46990561,3.4001245,3.6298166,3.0435111,2.7445508,2.3809765,3.7159585,...,5.781859,5.8756143,0.8711538,2.64200598,5.2271259,3.1053729,5.2559298,2.7259073,2.56978966,3.5299622
2.6687188,1.5485404,2.7435196,4.80883807,4.2991803,5.4190423,0.2456549,4.1624051,4.1218037,4.5199955,...,3.9589167,3.1128137,1.6248611,4.27558168,3.2413451,3.3404798,4.3986723,4.0254016,3.69373804,3.6646006
4.2871882,3.6162562,4.0495947,3.29095771,3.2636505,3.1565633,2.7463644,4.1239137,3.4402564,3.097481,...,6.1620013,3.1400845,1.3178246,5.44224514,4.928334,4.2615401,4.6330418,3.7889985,4.53179697,3.7935031


As before, I fit two GLMMs, without and with the mosquito net:

In [16]:
# FIRST MODEL, ONE FIXED EFFECT, MONTH
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion)+month, 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion) + month
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  5772.6   5791.4  -2882.3   5764.6      796 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.50921 -0.29941 -0.01065  0.10740  1.16124 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 0.9127   0.9554  
 house_ID       (Intercept) 0.3694   0.6078  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.15958    0.09166   1.741   0.0817 .  
monthJuly    3.61986    0.08366  43.270   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
          (Intr)
month

In [17]:
# SECOND MODEL, BOTH FIXED EFFECTS
overdispersion<-1:nrow(dat)
m<-glmer(mosquito_count~(1|house_ID)+(1|overdispersion)+month+mosquito_net, 
        data=dat, family='poisson', 
                    control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: mosquito_count ~ (1 | house_ID) + (1 | overdispersion) + month +  
    mosquito_net
   Data: dat
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  5744.9   5768.3  -2867.5   5734.9      795 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.5000 -0.3176 -0.0015  0.1011  1.1292 

Random effects:
 Groups         Name        Variance Std.Dev.
 overdispersion (Intercept) 0.9159   0.9571  
 house_ID       (Intercept) 0.2347   0.4844  
Number of obs: 800, groups:  overdispersion, 800; house_ID, 100

Fixed effects:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -0.23142    0.10952  -2.113   0.0346 *  
monthJuly        3.62138    0.08375  43.238  < 2e-16 ***
mosquito_netYes  0.73492    0.12533   5.864 4.52e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0

Again, the overdispersion estimate is unaffected by mosquito net (0.9127 without, 0.9159 with). This is because mosquito net is confounded with houses but not with house visits. 

In the first model (without fitting mosquito net) the house variance is smaller than the true value (0.3694 against 0.6), because the nets' negative effect partly offsets the higher counts in houses with above-average exposure. But I guess you would argue that this is the number you want. 

In the second model we got both the house variance and the fixed effect of mosquito net wrong: the house variance is estimated as 0.2347, and the net effect as +0.73 although the true effect is -0.5. 

The good thing is that the monthly effect is unaffected by all of this. 
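
The three house-level numbers in this section can be approximated from the true values alone, using standard results for a normal variable split at zero. The sketch below assumes an infinite population of houses with N(0, 0.6) effects, whereas the notebook has one sample of 100 houses, so only rough agreement should be expected.

```r
# True simulation values
sigma2 <- 0.6   # house-effect variance
b_net  <- -0.5  # true net effect on the log scale
sigma  <- sqrt(sigma2)

# For u ~ N(0, sigma^2): E[u | u > 0] = sigma * sqrt(2 / pi), and by symmetry
# E[u | u < 0] = -sigma * sqrt(2 / pi)
mean_pos <- sigma * sqrt(2 / pi)

# Apparent net coefficient: gap in mean log rate between netted and un-netted houses
apparent_net <- (mean_pos + b_net) - (-mean_pos)

# House variance left within each group: Var(u | u > 0) = sigma^2 * (1 - 2 / pi)
within_group_var <- sigma2 * (1 - 2 / pi)

# House variance when the net is ignored: Var(u + b_net * 1[u > 0]),
# using Var(1[u > 0]) = 1/4 and Cov(u, 1[u > 0]) = sigma / sqrt(2 * pi)
ignored_net_var <- sigma2 + b_net^2 / 4 + 2 * b_net * sigma / sqrt(2 * pi)

round(c(apparent_net = apparent_net,
        within_group_var = within_group_var,
        ignored_net_var = ignored_net_var), 3)
```

These come out at roughly 0.74, 0.22 and 0.35, against fitted values of 0.73492 (net coefficient), 0.2347 (house variance with net) and 0.3694 (house variance without net). Under this reading, the positive net coefficient is the true -0.5 plus the gap in intrinsic house effects between the two groups.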

### Conclusion
1) If mosquito nets are completely randomised (per visit), including them in the model mainly affects the overdispersion estimate. 
2) If mosquito nets are permanent (but still randomised across houses), including them affects the house variance estimate. 
3) If mosquito nets are targeted at high-exposure houses, both the house variance estimate and the fixed-effect estimate of mosquito net may be affected. 

Ignore the bits below

In [18]:
var(as.vector(overdispersion_effect))
# ESTIMATE THE REDUCTION OF OVERDISPERSION
temp<-rep(NA, 1000)
k<-round(num_house*8/2)
for (i in 1:length(temp))
    {
    temp1<-as.vector(overdispersion_effect)+sample(c(rep(0, k), rep(-0.5, num_house*8-k)))
    temp[i]<-var(temp1)
}
mean(temp)
mean(temp)-var(as.vector(overdispersion_effect))

# ESTIMATE THE REDUCTION OF THE HOUSE EFFECT VARIANCE
var(house_effect[1,])
temp<-rep(NA, 1000)
k<-round(num_house/2)
for (i in 1:length(temp))
    {
    temp1<-house_effect[1,]+sample(c(rep(0, k), rep(-0.5, num_house-k)))
    temp[i]<-var(temp1)
}
mean(temp)
mean(temp)-var(house_effect[1,])
0.6674-0.6268