In [1]:
# Initialization
%cd ../covid_households/
import recipes
import utilities
import traits

import tqdm
from multiprocessing import Pool

/Users/thayer/covid_households/covid_households


# What is this notebook for?

We run our simulations over a range of values for each of three parameters: $s_{80}$, $p_{80}$, and $\text{SAR}$. These parameters are not direct inputs to our model. Instead, $s_{80}$ and $p_{80}$ are complex expressions of properties of the distributions of relative susceptibility and infectivity in the population, and $\text{SAR}$ is the average risk of infection from a household contact. For full information about these parameters, see the Methods and Supplemental Methods sections.

To convert these parameters to actual model parameters (the mean and variance of the trait distributions, or $\beta$, the probability/time of infection) we use numerical methods. For the overwhelming majority of parameter combinations, this works well. But when $p_{80}$ or $s_{80}$ is small and $\text{SAR}$ is high, we cannot solve for a $\beta$ that actually produces the desired $\text{SAR}$. There is so much heterogeneity (and thus so many people who are negligibly infectious or susceptible) that no $\beta$ satisfying the constraint $\beta < 1$ is high enough.

We want to drop these points from our 3D grid in parameter space so that the likelihood surface does not include points with an unrealistic $\beta$. To do that, we first have to find every point where the residual from the numerical fit is higher than our tolerance ($10^{-5}$).
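To illustrate why a bounded $\beta$ can fail to reach a target $\text{SAR}$, here is a minimal sketch using a toy model, not the model in `utilities`. The names `implied_sar` and `solve_beta`, the capped per-contact probability, and the two-point weight distribution are all assumptions made for illustration.

```python
def implied_sar(beta, weights):
    """Toy model (assumption): mean per-contact infection probability, capped at 1."""
    return sum(min(1.0, beta * w) for w in weights) / len(weights)


def solve_beta(target_sar, weights, iterations=60):
    """Bisect for beta on [0, 1]; return (beta, absolute residual)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if implied_sar(mid, weights) < target_sar:
            lo = mid
        else:
            hi = mid
    beta = hi
    return beta, abs(implied_sar(beta, weights) - target_sar)


# Both hypothetical populations have mean relative weight 1.
homogeneous = [1.0] * 10
heterogeneous = [5.0, 5.0] + [0.0] * 8  # 80% of people contribute nothing

beta_ok, res_ok = solve_beta(0.6, homogeneous)
beta_bad, res_bad = solve_beta(0.6, heterogeneous)

print(f"homogeneous:   beta={beta_ok:.4f}, residual={res_ok:.2e}")
print(f"heterogeneous: beta={beta_bad:.4f}, residual={res_bad:.2e}")
```

In the heterogeneous toy population the solver runs into the upper bound on $\beta$ and leaves a large residual. That is the signature we look for across the grid.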

We define the region over which we simulate by enumerating each of its axes. In order to compute in parallel, we also make a `coordinate_stream` generator that yields coordinate triples $(s_{80}, p_{80}, \text{SAR})$ for the entire region in sequence.

In [2]:
import numpy as np
s80_axis = np.linspace(0.02, 0.80, 40)
p80_axis = np.linspace(0.02, 0.80, 40)
sar_axis = np.linspace(0.01, 0.60, 60)

def coordinate_stream(axis1, axis2, axis3):
    for v1 in axis1:
        for v2 in axis2:
            for v3 in axis3:
                yield (v1, v2, v3)
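As a quick check on the ordering, the sketch below compares the nested-loop generator with `itertools.product` from the standard library on a small set of made-up axes. The axis values are illustrative only. Both produce the same sequence, with the last axis varying fastest.

```python
from itertools import product


def coordinate_stream(axis1, axis2, axis3):
    # Same nested-loop generator as above: the last axis varies fastest.
    for v1 in axis1:
        for v2 in axis2:
            for v3 in axis3:
                yield (v1, v2, v3)


# Small illustrative axes (not the real grid).
axes = ([0.2, 0.4], [0.3, 0.5], [0.1, 0.2, 0.3])

nested = list(coordinate_stream(*axes))
flat = list(product(*axes))

print(nested[:4])
print(nested == flat)
```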

Using Python's multiprocessing functionality, we iterate over each point in the region and apply the residual calculation from `utilities` (called below as `utilities.residual_wrapper`) to find the difference between the expected $\text{SAR}$ and the $\text{SAR}$ that is actually implied by $\beta$ and the traits.

In [4]:
with Pool(4) as p:
    total = len(s80_axis) * len(p80_axis) * len(sar_axis)
    residuals = list(tqdm.tqdm(
        p.imap(utilities.residual_wrapper, coordinate_stream(s80_axis, p80_axis, sar_axis)),
        total=total
    ))

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 96000/96000 [27:01<00:00, 59.21it/s]
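The results can only be matched back to grid points if they come back in the order the coordinates went in, and `imap` guarantees this. The sketch below demonstrates it with `multiprocessing.dummy.Pool`, a thread-backed pool with the same interface, so that it runs anywhere without a `__main__` guard. `toy_residual` is a hypothetical stand-in for `utilities.residual_wrapper`, and the `chunksize` value is arbitrary.

```python
from multiprocessing.dummy import Pool  # thread-backed pool, same API as multiprocessing.Pool


def toy_residual(coords):
    """Hypothetical stand-in: takes one (s80, p80, SAR) tuple, returns a number."""
    s80, p80, sar = coords
    return round(s80 * p80 * sar, 6)


coords = [(s, p, r) for s in (0.2, 0.4) for p in (0.3, 0.5) for r in (0.1, 0.2)]

with Pool(4) as pool:
    # chunksize batches several points per task; imap still yields in input order.
    parallel = list(pool.imap(toy_residual, coords, chunksize=2))

serial = [toy_residual(c) for c in coords]
print(parallel == serial)
```

With a process-based pool and a cheap per-point function, a larger `chunksize` may reduce inter-process overhead. Whether it helps here depends on how expensive each residual calculation is.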


A "crib" is a cheat sheet that we use to precalculate the mappings between our parameters and the model parameters we need to simulate forward in time. We want to mirror its structure but find the cells where the residual is greater than our tolerance.

In [22]:
beta_crib_copy = utilities.S80_P80_SAR_Inputs.beta_crib.copy()
beta_crib_copy['residuals'] = residuals

In [23]:
# Flag points whose residual exceeds the tolerance of 10^-5
beta_crib_copy['bad beta'] = beta_crib_copy['residuals'] > 1e-5
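The thresholding step can be sketched in plain Python as follows. The residual values are made up, and comparing the absolute value is an assumption that only matters if the residuals can be negative. The sketch also shows how scientific-notation literals are read, since the mantissa is easy to get wrong when writing a tolerance.

```python
TOLERANCE = 1e-5  # the 10^-5 tolerance stated in the text

# A literal reads as mantissa x 10^exponent, so 10e-5 is 1e-4, not 1e-5.
assert 10e-5 == 1e-4

# Hypothetical residuals keyed by (s80, p80, SAR); values are for illustration only.
toy_residuals = {
    (0.80, 0.80, 0.10): 2e-9,
    (0.10, 0.80, 0.55): 5e-5,
    (0.02, 0.02, 0.60): -0.31,
}

# Compare magnitudes so that a large negative residual is also flagged.
bad_points = sorted(c for c, r in toy_residuals.items() if abs(r) > TOLERANCE)
print(bad_points)
```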

In [24]:
beta_crib_copy[beta_crib_copy['bad beta']]

| s80 | p80 | SAR | beta | residuals | bad beta |
|-----|-----|-----|------|-----------|----------|


In [43]:
beta_crib_copy.to_csv('./problematic_parameter_combinations.csv')