# Google Colab initialization

This section helps you mount Google Drive and clone the git repository where the code lives. These steps **aren't necessary if you are running locally**. First, make sure you have opened the notebook in Google Colab (use the button below if necessary) and are logged into your Google account.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/tanderson11/covid_households/blob/main/notebooks/ViolinsAndPowerCalc.ipynb)

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
# -p avoids an error if the directory already exists
%mkdir -p /content/gdrive/My\ Drive/github/
%cd /content/gdrive/My\ Drive/github/
# Thayer has his files located here instead
#%cd /content/gdrive/My\ Drive/github/paper_push


In [None]:
# If you've forked the repository, point to your own username and repository name (if different)
repo_owner="tanderson11"
repository="covid_households"

# Replace these with your own git identity before committing
!git config --global user.email "tanderson11@gmail.com"
!git config --global user.name "Thayer Anderson"

In [None]:
# Uses the repo_owner and repository variables set above
!git clone https://github.com/{repo_owner}/{repository}.git

In [None]:
%cd {repository}/
!ls -a

In [None]:
!git checkout main
!git pull

In [None]:
%cd ./notebooks

# Module initialization

In [10]:
%cd ../covid_households
import recipes
import interventions
import traits

/Users/thayer/develop/covid_households/covid_households


# Vaccine trial

This cell configures the parameters that are shared between the vaccinated arm and the control arm.

The `susceptibility` and `infectivity` traits describe natural variation between individuals in the population; they are unrelated to the vaccine. Set either one to a `ConstantTrait` for no variation, or to a `LognormalTrait` to introduce variation across the population.

In [29]:
susceptibility = traits.ConstantTrait()
#susceptibility = traits.LognormalTrait.from_natural_mean_variance(mean=1.0, variance=0.5)
infectivity = traits.ConstantTrait()
#infectivity = traits.LognormalTrait.from_natural_mean_variance(mean=1.0, variance=2.0)

# {household size: number of households of that size}
# Here: 10 households of 100 people each
sizes = {100:10}
trials = 100
household_beta = 0.0005
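
The commented-out `LognormalTrait.from_natural_mean_variance` lines specify a trait by its mean and variance on the natural (not log) scale. As a rough illustration of what such a parametrization involves, the sketch below converts a natural-scale mean and variance into the `mu` and `sigma` of the underlying normal distribution using the standard lognormal moment formulas. The helper name `lognormal_params` is hypothetical, and this is not the `traits` module's actual implementation, which may differ.

```python
import math
import random

def lognormal_params(mean, variance):
    """Hypothetical helper: (mu, sigma) of the underlying normal for a
    lognormal with the given natural-scale mean and variance."""
    sigma_sq = math.log(1.0 + variance / mean**2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)

mu, sigma = lognormal_params(mean=1.0, variance=0.5)

# Sanity check against the closed-form lognormal moments.
implied_mean = math.exp(mu + sigma**2 / 2.0)
implied_variance = (math.exp(sigma**2) - 1.0) * math.exp(2.0 * mu + sigma**2)

# Draw a few individual trait values (seeded only for reproducibility).
rng = random.Random(0)
samples = [rng.lognormvariate(mu, sigma) for _ in range(5)]
print(implied_mean, implied_variance, samples)
```

The implied mean and variance should recover the requested 1.0 and 0.5 up to floating-point error, which is a quick way to confirm the conversion.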

This cell configures the vaccine.

The `shape` determines how the vaccine is distributed across the population. `InterveneOnFirst` vaccinates the "first" individual in each household. Because household order is completely random, this is equivalent to vaccinating one random individual per household. Be careful not to break this symmetry between individuals by introducing something *else* that depends on an individual's position.

The `vaccine` determines the (relative per-contact) susceptibility and infectivity of vaccinated individuals. For example, `sus_factor=0.2` says that a vaccinated person is only $20\%$ as likely as an unvaccinated person to be infected per contact.

In [30]:
shape = interventions.InterveneOnFirst()
vaccine = interventions.ConstantFactorIntervention(shape, sus_factor=0.2, inf_factor=0.3)
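
To make the two factors concrete, the sketch below tabulates the relative per-contact transmission risk for each source/target pairing. It assumes the factors act multiplicatively and independently (infectivity factor of the source times susceptibility factor of the target). That multiplicative form is an assumption made for illustration, as is the helper `relative_contact_risk`; the `interventions` module may combine the factors differently.

```python
SUS_FACTOR = 0.2  # relative susceptibility of a vaccinated individual
INF_FACTOR = 0.3  # relative infectivity of a vaccinated individual

def relative_contact_risk(source_vaccinated, target_vaccinated):
    """Hypothetical helper: per-contact risk relative to an unvaccinated pair."""
    inf = INF_FACTOR if source_vaccinated else 1.0
    sus = SUS_FACTOR if target_vaccinated else 1.0
    return inf * sus

# Keys are (source vaccinated?, target vaccinated?).
risks = {
    (source, target): relative_contact_risk(source, target)
    for source in (False, True)
    for target in (False, True)
}
for (source, target), risk in risks.items():
    print(f"source vaccinated={source}, target vaccinated={target}: {risk:.2f}")
```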

These cells run the simulation forward in time. They are configured entirely by the cells above, so you shouldn't need to modify them except for advanced uses.

In [31]:
vax_model = recipes.Model(intervention=vaccine)
vax_df = vax_model.run_trials(household_beta=household_beta, sizes=sizes, trials=trials, sus=susceptibility, inf=infectivity, as_counts=False)

vax_df.groupby('trial').sum()

| trial | size | infections | intervention and infection | total interventions |
|---|---|---|---|---|
| 0 | 1000 | 107 | 0 | 10.0 |
| 1 | 1000 | 178 | 1 | 10.0 |
| 2 | 1000 | 11 | 0 | 10.0 |
| 3 | 1000 | 17 | 0 | 10.0 |
| 4 | 1000 | 49 | 0 | 10.0 |
| ... | ... | ... | ... | ... |
| 95 | 1000 | 18 | 0 | 10.0 |
| 96 | 1000 | 15 | 0 | 10.0 |
| 97 | 1000 | 97 | 1 | 10.0 |
| 98 | 1000 | 10 | 0 | 10.0 |


In [32]:
# 1.0 and 1.0 because the placebo has no effect
control_model = recipes.Model(intervention=interventions.ConstantFactorIntervention(shape, 1.0, 1.0))
control_df = control_model.run_trials(household_beta=household_beta, sizes=sizes, trials=trials, sus=susceptibility, inf=infectivity, as_counts=False)

control_df.groupby('trial').sum()

| trial | size | infections | intervention and infection | total interventions |
|---|---|---|---|---|
| 0 | 1000 | 155 | 1 | 10.0 |
| 1 | 1000 | 17 | 0 | 10.0 |
| 2 | 1000 | 13 | 0 | 10.0 |
| 3 | 1000 | 11 | 0 | 10.0 |
| 4 | 1000 | 106 | 1 | 10.0 |
| ... | ... | ... | ... | ... |
| 95 | 1000 | 10 | 0 | 10.0 |
| 96 | 1000 | 14 | 0 | 10.0 |
| 97 | 1000 | 11 | 0 | 10.0 |
| 98 | 1000 | 100 | 1 | 10.0 |


# Quantifying vaccine effects

Defining and calculating $\text{VE}_{\text{S}}$:

$$ \text{VE}_{\text{S}} = 1 - \frac{AR_v}{AR_p} = 1 - \frac{n_v^+ / (n_v^+ + n_v^-)}{n_p^+ / (n_p^+ + n_p^-)} $$

Defining and calculating $\text{VE}_{\text{contact}}$:

$$ \text{VE}_{\text{contact}} = 1 - \frac{AR_u}{AR_{np}} = 1 - \frac{n_u^+ / (n_u^+ + n_u^-)}{n_{np}^+ / (n_{np}^+ + n_{np}^-)} $$

where the superscripts $+$ and $-$ denote infected and uninfected individuals respectively, $n_v$ refers to vaccinated individuals (who received the real vaccine), $n_u$ refers to unvaccinated individuals (in households where the vaccine was administered), $n_p$ refers to placebo-receiving individuals, and $n_{np}$ refers to individuals who received no placebo (in households where the placebo was administered).

The term $AR$ (attack rate) is used as defined in the literature and should not be understood as a rate, but simply as the observed frequency of infections among a group such that $AR_v$, for example, is defined as the fraction of vaccinated individuals who were in fact infected.

Defining and calculating $\text{VE}_{\text{total}}$:

$$ \text{VE}_{\text{total}} = 1 - \frac{AR_{HV}}{AR_{HP}} = 1 - \frac{n_{HV}^+ / (n_{HV}^+ + n_{HV}^-)}{n_{HP}^+ / (n_{HP}^+ + n_{HP}^-)} $$

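Here the notation changes slightly: $n_{HV}$ refers to the total number of individuals in households that received a vaccine, i.e. $n_{HV} = n_{v} + n_{u}$. Likewise, $n_{HP}$ refers to the total number of individuals in households that received a placebo, i.e. $n_{HP} = n_{p} + n_{np}$.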
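
As a quick numerical check of the three definitions, the sketch below computes each efficacy from raw counts. The counts are made up for illustration (they are not simulation output), and the helper names `attack_rate` and `efficacy` are hypothetical.

```python
def attack_rate(infected, uninfected):
    """Observed fraction of a group that was infected: n+ / (n+ + n-)."""
    return infected / (infected + uninfected)

def efficacy(ar_treated, ar_control):
    """1 minus the ratio of attack rates (treated arm over control arm)."""
    return 1.0 - ar_treated / ar_control

# Made-up counts as (infected, uninfected) for each group.
n_v, n_u = (5, 95), (60, 240)      # vaccine households: vaccinated, unvaccinated
n_p, n_np = (20, 80), (90, 210)    # placebo households: placebo, no placebo

ve_s = efficacy(attack_rate(*n_v), attack_rate(*n_p))
ve_contact = efficacy(attack_rate(*n_u), attack_rate(*n_np))

# Household totals: n_HV = n_v + n_u and n_HP = n_p + n_np, componentwise.
n_hv = (n_v[0] + n_u[0], n_v[1] + n_u[1])
n_hp = (n_p[0] + n_np[0], n_p[1] + n_np[1])
ve_total = efficacy(attack_rate(*n_hv), attack_rate(*n_hp))

print(f"VE_S={ve_s:.3f}  VE_contact={ve_contact:.3f}  VE_total={ve_total:.3f}")
```

With these counts the three values come out to 0.750, 0.333 and 0.409. The total effect sits between the direct and indirect effects because it averages over both the vaccinated individuals and their unvaccinated housemates.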

In [27]:
import pandas as pd
import scipy.stats  # import the submodule explicitly so scipy.stats.fisher_exact is available

def ves(vax_df, control_df):
    """Vaccination effect on susceptibility using the placebo RR as baseline (equation 1/2 in Betz)"""
    print("Calculating VEs ...\n")

    vg = vax_df.groupby(["trial"])
    vgs = vg.sum()
    f_v = vg["intervention and infection"].sum() / vg["total interventions"].sum()

    cg = control_df.groupby(["trial"])
    cgs = cg.sum()
    f_c = cg["intervention and infection"].sum() / cg["total interventions"].sum()     

    # Fisher exact test on the actual event counts for each trial,
    # comparing the primary (placebo- or vaccine-receiving) participants.
    # Contingency table layout passed to fisher_exact:

    ##             placebo | vaccinated
    ##  uninfected
    ##  -----
    ##  infected

    fisher_df = pd.concat([cgs["total interventions"] - cgs["intervention and infection"], vgs["total interventions"] - vgs["intervention and infection"], cgs["intervention and infection"], vgs["intervention and infection"]], axis=1)
    fisher_df.columns =["cuinfected", "vuinfected", "cinfected", "vinfected"]
    p = fisher_df.apply(lambda row: (scipy.stats.fisher_exact([[row["cuinfected"], row["vuinfected"]], [row["cinfected"], row["vinfected"]]]))[1], axis=1) # index 1 to get p value
    p.name = "fisher p value"

    ve = 1. - f_v / f_c
    ve.name = "VE"

    return pd.concat([ve, p], axis=1)

def vecontact(vax_df, control_df):
    print("Calculating VEcontact ...\n")
    vax_df = vax_df.copy()
    vax_df["total unvaccinated"] = vax_df["size"] - vax_df["total interventions"]
    vax_df["unvaccinated and infected"] = vax_df["infections"] - vax_df["intervention and infection"]
    vg = vax_df.groupby(["trial"])
    vgs = vg.sum()
    f_v = vg["unvaccinated and infected"].sum() / vg["total unvaccinated"].sum()

    control_df = control_df.copy()
    control_df["total unvaccinated"] = control_df["size"] - control_df["total interventions"]
    control_df["unvaccinated and infected"] = control_df["infections"] - control_df["intervention and infection"]
    # Group only after the derived columns exist, mirroring the vaccine arm above
    cg = control_df.groupby(["trial"])
    cgs = cg.sum()
    f_c = cg["unvaccinated and infected"].sum() / cg["total unvaccinated"].sum()
    
    ve = 1. - f_v / f_c
    #ve.name = "VEcontact"
    ve.name = "VE"

    # fisher exact test : comparing households by type but only unvaccinated
    ##             control hh secondary (no placebo) | vaccinated hh secondary (no vax)
    ##  uninfected
    ##  -----
    ##  infected


    fisher_df = pd.concat([cgs["total unvaccinated"]-cgs["unvaccinated and infected"], vgs["total unvaccinated"]-vgs["unvaccinated and infected"], cgs["unvaccinated and infected"], vgs["unvaccinated and infected"]], axis=1)
    fisher_df.columns =["cuinfected", "vuinfected", "cinfected", "vinfected"]
    p = fisher_df.apply(lambda row: (scipy.stats.fisher_exact([[row["cuinfected"], row["vuinfected"]], [row["cinfected"], row["vinfected"]]]))[1], axis=1) # index 1 to get p value
    p.name = "fisher p value"

    return pd.concat([ve, p], axis=1)

def vetotal(vax_df, control_df):
    print("Calculating VEtotal ...\n")
    vg = vax_df.groupby(["trial"])
    vgs = vg.sum()
    f_v = vg["infections"].sum() / vg["size"].sum()

    cg = control_df.groupby(["trial"])
    cgs = cg.sum()
    f_c = cg["infections"].sum() / cg["size"].sum()

    ve = 1. - (f_v)/(f_c)
    #ve.name = "VEtotal"
    ve.name = "VE"

    # fisher exact test : comparing households by type

    ##             control hh | vaccinated hh
    ##  uninfected
    ##  -----
    ##  infected


    fisher_df = pd.concat([cgs["size"]-cgs["infections"], vgs["size"]-vgs["infections"], cgs["infections"], vgs["infections"]], axis=1)
    fisher_df.columns =["cuinfected", "vuinfected", "cinfected", "vinfected"]
    p = fisher_df.apply(lambda row: (scipy.stats.fisher_exact([[row["cuinfected"], row["vuinfected"]], [row["cinfected"], row["vinfected"]]]))[1], axis=1) # index 1 to get p value
    p.name = "fisher p value"

    return pd.concat([ve, p], axis=1)
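
Each function above passes a 2x2 table of counts to `scipy.stats.fisher_exact` and keeps the p-value. To show what that number represents, here is a minimal pure-Python sketch of a two-sided Fisher exact test: it sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one. The function `fisher_two_sided` is a hypothetical illustration, not SciPy's implementation, and is only meant for small counts. The example table uses the trial 0 counts from the simulation outputs above (10 primary participants per arm, 1 infected on placebo, 0 infected on vaccine).

```python
from math import comb

def fisher_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over tables at most as likely as the observed one
    # (small tolerance for floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))

# Rows: uninfected, infected. Columns: placebo, vaccinated.
trial_0 = [[9, 10], [1, 0]]
print(fisher_two_sided(trial_0))
```

With a single infection among 20 primary participants there are only two possible tables, each equally likely, so the test cannot distinguish the arms.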

In [33]:
ves(vax_df, control_df)

Calculating VEs ...



| trial | VE | fisher p value |
|---|---|---|
| 0 | 1.0 | 1.0 |
| 1 | -inf | 1.0 |
| 2 | NaN | 1.0 |
| 3 | NaN | 1.0 |
| 4 | 1.0 | 1.0 |
| ... | ... | ... |
| 95 | NaN | 1.0 |
| 96 | NaN | 1.0 |
| 97 | -inf | 1.0 |
| 98 | 1.0 | 1.0 |
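
The undefined and `-inf` entries follow directly from the small arm size: with only 10 primary participants per arm, many trials have zero infections among them. A trial with no infected placebo recipients has a placebo attack rate of zero, so the ratio of attack rates is `-inf` when a vaccinated participant was infected and undefined (`0/0`, shown by pandas as `NaN`) when none was. The sketch below reproduces that case analysis in plain Python for the first five trials of the simulation outputs above, and also pools those five trials as one possible way to get a finite estimate. The helper `ve_from_counts` is hypothetical, and pooling is an illustration rather than something this notebook does.

```python
import math

def ve_from_counts(vax_infected, vax_total, ctrl_infected, ctrl_total):
    """Hypothetical helper: 1 - AR_v / AR_p with pandas-style NaN/-inf edge cases."""
    ar_v = vax_infected / vax_total
    ar_p = ctrl_infected / ctrl_total
    if ar_p == 0:
        return math.nan if ar_v == 0 else -math.inf
    return 1.0 - ar_v / ar_p

# "intervention and infection" for trials 0-4, with 10 interventions per trial.
vax_infected = [0, 1, 0, 0, 0]
ctrl_infected = [1, 0, 0, 0, 1]

per_trial = [ve_from_counts(v, 10, c, 10) for v, c in zip(vax_infected, ctrl_infected)]

# Pool the five trials: 50 primary participants per arm in total.
pooled = ve_from_counts(sum(vax_infected), 50, sum(ctrl_infected), 50)
print(per_trial, pooled)
```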


In [34]:
vecontact(vax_df, control_df)

Calculating VEcontact ...



| trial | VE | fisher p value |
|---|---|---|
| 0 | 0.305195 | 2.198100e-03 |
| 1 | -9.411765 | 7.457616e-38 |
| 2 | 0.153846 | 8.378342e-01 |
| 3 | -0.545455 | 3.414689e-01 |
| 4 | 0.533333 | 3.155695e-06 |
| ... | ... | ... |
| 95 | -0.800000 | 1.818090e-01 |
| 96 | -0.071429 | 1.000000e+00 |
| 97 | -7.727273 | 6.359264e-19 |
| 98 | 0.898990 | 1.788685e-20 |


In [35]:
vetotal(vax_df, control_df)

Calculating VEtotal ...



| trial | VE | fisher p value |
|---|---|---|
| 0 | 0.309677 | 1.799571e-03 |
| 1 | -9.470588 | 4.027640e-38 |
| 2 | 0.153846 | 8.378441e-01 |
| 3 | -0.545455 | 3.415037e-01 |
| 4 | 0.537736 | 2.241682e-06 |
| ... | ... | ... |
| 95 | -0.800000 | 1.818403e-01 |
| 96 | -0.071429 | 1.000000e+00 |
| 97 | -7.818182 | 3.445862e-19 |
| 98 | 0.900000 | 9.574900e-21 |
