# TA3 simulation workflow 

## Jeff Bezanson, Julia Computing team and Patrick Stokes, Algebraic Julia (5-10 minutes)

Wiring diagram execution of exploratory scientific workflows on models obtained from TA2 for Scenarios 1, 2 and 3.

## Sam Witty, CIEMSS team (15 minutes)

* Model selection and probabilistic model calibration from historical data from Scenario 2.
* Query exploration of realistic vaccination policy interventions
* Risk-aware optimization of vaccination policies that account for heavy-tailed infection distributions such as superspreader events
* Discussion of HMI integration of continuous validation tools with TA4

## Paul Cohen, University of Pittsburgh  (5-10 minutes)

Post-execution exploration of simulation results using the DeDri query language.

In [1]:
import pyro

def my_model():
    x = pyro.sample("x", pyro.distributions.Normal(0, 1))
    y = pyro.sample("y", pyro.distributions.Normal(x, 1))
    z = pyro.sample("z", pyro.distributions.Normal(y, 2))
    return x, y, z

def query_model_vars(model):
    model_vars = set()
    for name, value in pyro.get_param_store().items():
        if name.split(".")[0] in model.__name__:
            model_vars.add(name)
    return model_vars

# query the param store for entries prefixed with the model's name;
# my_model only declares pyro.sample sites (no pyro.param), so nothing matches
model_vars = query_model_vars(my_model)
print(model_vars)  # set()


set()


In [4]:
import pyro
import torch

def my_model():
    x = pyro.sample("x", pyro.distributions.Normal(0, 1))
    y = pyro.sample("y", pyro.distributions.Normal(x, 1))
    z = pyro.sample("z", pyro.distributions.Normal(y, 2))
    return x, y, z

# apply a trace to the model
traced_model = pyro.poutine.trace(my_model).get_trace()

# access the variables and their values
for name, node in traced_model.nodes.items():
    if 'value' in node:
        print(name)
        print(node)



x
{'type': 'sample', 'name': 'x', 'fn': Normal(loc: 0.0, scale: 1.0), 'is_observed': False, 'args': (), 'kwargs': {}, 'value': tensor(-0.0740), 'infer': {}, 'scale': 1.0, 'mask': None, 'cond_indep_stack': (), 'done': True, 'stop': False, 'continuation': None}
y
{'type': 'sample', 'name': 'y', 'fn': Normal(loc: -0.07404226809740067, scale: 1.0), 'is_observed': False, 'args': (), 'kwargs': {}, 'value': tensor(-0.6132), 'infer': {}, 'scale': 1.0, 'mask': None, 'cond_indep_stack': (), 'done': True, 'stop': False, 'continuation': None}
z
{'type': 'sample', 'name': 'z', 'fn': Normal(loc: -0.6132466197013855, scale: 2.0), 'is_observed': False, 'args': (), 'kwargs': {}, 'value': tensor(0.5962), 'infer': {}, 'scale': 1.0, 'mask': None, 'cond_indep_stack': (), 'done': True, 'stop': False, 'continuation': None}
_RETURN
{'name': '_RETURN', 'type': 'return', 'value': (tensor(-0.0740), tensor(-0.6132), tensor(0.5962))}


![thin-thread-w-Julia.png](images/thin-thread-w-Julia.png)

# Axes of Query Complexity

* **model transformation complexity** ranging from simple queries over Pearl’s causal hierarchy (association and intervention) to complex queries (multiple world counterfactuals)
* **simulation complexity**, ranging from unconditional (forward simulation) queries to conditional queries (inverse problem)
* **decision complexity**, ranging from simple decisions (compare A vs B) to sophisticated decisions (“optimize $f(x, u)$ subject to $g(x, u) = 0, h(x, u) \leq 0$”)
* **intervention complexity**. In general, interventions that modify Petri nets are guaranteed to generate ODEs that preserve mass balance. Interventions that modify ODEs directly have no such guarantee. As we explore additional queries in the ASKEM starter kit, we will likely need to represent interventions that directly modify the trajectories of the ODEs, which may result in further assumption violations.
* **data complexity**. Just as "no plan has survived contact with the enemy", it may also be the case that "no model has survived contact with data". Unobserved confounding, missing data, and selection bias all threaten the validity of causal effect estimates. Furthermore, when data come from different populations with different distributions over the same variables, care must be taken to avoid introducing bias. By specifying queries that acknowledge these threats to validity, we can generate models that take these factors into account and apply formal causal reasoning to recover from the biases these threats may cause.



## ASKEM Starter Kit Questions

| Question | model complexity | simulation complexity | decision complexity | intervention complexity | query expression |
|-----------|--------------|-----------------|---------------------------|-------------------|------------------|
| What is the probability of staying under ICU capacity? | simple (no interventions) | intermediate (many forward simulations by sampling from prior distribution of parameters) | simple (no decisions)| simple (no intervention)| $$P(ICU < capacity)$$ |
| What is the probability of staying under ICU capacity if we do intervention X? | intermediate (intervened model) | intermediate (many forward simulations by sampling from prior distribution of parameters) | simple (only one intervention) | unspecified | $$P(ICU_{do(X)} < capacity)$$ |
| I can only do one, is A or B better? | intermediate (two intervened models) | intermediate (many forward simulations by sampling from prior distribution of parameters) | simple (comparison of two alternatives) | depends on nature of A and B | $$E[Y_{do(A)} - Y_{do(B)}] > 0$$ |
| Is there an intervention that will keep us under ICU capacity with probability $p$? | intermediate (many interventions) | intermediate (many forward simulations by sampling from prior distribution of parameters) | sophisticated (search over decision space to find an intervention that satisfies the ICU capacity constraint) | unspecified | $$\texttt{satisfy}_{x\in X}\ P(ICU_{do(x)} < capacity) > p$$ |
| Is intervention $A$ or $B$ more likely to keep me under ICU capacity? (simpler version of 3b) | intermediate (two interventions) | intermediate (many forward simulations by sampling from prior distribution of parameters) | simple (comparison of two alternatives) | depends on nature of A and B | $$\frac{P(ICU_{do(A)} < capacity )}{P(ICU_{do(B)} < capacity )}$$ |
| When considering interventions $A,B,\ldots$, what is the minimum (maximum) “expense” that achieves goal $G$? (Is this underspecified? Probably: what is the cost function? Shouldn’t we iterate with the user about this? Are you specifying an optimization function or constraints? These are not the same thing. Many of these questions have implicit conditions that need to become explicit.) | intermediate (many interventions) | intermediate (many forward simulations by sampling from prior distribution of parameters) | sophisticated (optimization over decision space) | depends on nature of $A$, $B$, etc. | $$\begin{array}{rl}\texttt{minimize}_{x\in \{A,B,\ldots\}}& Cost(x) \\ \texttt{such that} & G_{do(x)} = 1\end{array}$$ |
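
The first two rows of the table can be illustrated with a minimal sketch. It assumes a toy SIR model integrated with daily Euler steps, a uniform prior over the transmission rate, and an intervention that simply scales that rate; the function names (`peak_icu`, `prob_under_capacity`) and all constants are hypothetical and are not taken from the starter-kit models.

```python
import random

def peak_icu(beta, gamma=0.1, icu_fraction=0.02, days=200, i0=0.001):
    """Euler-integrate a toy SIR model (daily steps) and return peak ICU
    occupancy as a fraction of the population. All constants are hypothetical."""
    s, i = 1.0 - i0, i0
    peak = i
    for _ in range(days):
        new_inf = beta * s * i
        s, i = s - new_inf, i + new_inf - gamma * i
        peak = max(peak, i)
    return icu_fraction * peak

def prob_under_capacity(capacity, beta_scale=1.0, n=2000, seed=0):
    """Monte Carlo estimate of P(ICU < capacity).
    beta_scale < 1 stands in for do(X): an intervention that scales beta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        beta = rng.uniform(0.15, 0.35)  # hypothetical prior over the transmission rate
        if peak_icu(beta_scale * beta) < capacity:
            hits += 1
    return hits / n

p_base = prob_under_capacity(capacity=0.002)                  # P(ICU < capacity)
p_do_x = prob_under_capacity(capacity=0.002, beta_scale=0.7)  # P(ICU_do(X) < capacity)
print(p_base, p_do_x)
```

Using the same seed for both calls means both estimates are computed on the same parameter draws, so the comparison is not blurred by sampling noise.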

## Decision-driven (DeDri) queries that exercise the axes of complexity.

| Question | model complexity | simulation complexity | decision complexity | intervention complexity | query expression |
|-----------|--------------|-----------------|---------------------------|-------------------|------------------|
| What is the probability of staying under ICU capacity? | simple (no interventions) | intermediate (many forward simulations by sampling from prior distribution of parameters) | simple (no decisions)| simple (no intervention)| $$P(ICU < capacity)$$ |
| What is the probability of staying under ICU capacity if we do intervention X? | intermediate (intervened model) | intermediate (many forward simulations by sampling from prior distribution of parameters) | simple (only one intervention) | depends on X | $$P(ICU_{do(X)} < capacity)$$ |
| Given that we exceeded ICU capacity, what is the probability of staying under ICU capacity if we do intervention X? | intermediate (intervened model) | complex (requires solving an inverse problem) | simple (only one intervention) | depends on X | $$P(ICU_{do(X)} < capacity \mid ICU)$$ |
| Given that we exceeded ICU capacity, is there an intervention that will keep us under ICU capacity with probability $p$? | intermediate (many interventions) | complex (requires solving an inverse problem) | sophisticated (search over decision space to find an intervention that satisfies the ICU capacity constraint) | unspecified | $$\texttt{satisfy}_{x\in X}\ P(ICU_{do(x)} < capacity \mid ICU) > p$$ |
| Given that we exceeded ICU capacity, when should we have intervened? | complex (requires stratification) | complex (requires solving an inverse problem) | complex (search over many interventions) | complex (intervening on a trajectory) | $$\texttt{satisfy}_{x(t)\in X(T)}\ P(ICU_{do(x(t))} < capacity \mid ICU) > p$$ |
| On average, how many fewer infections would there be if we imposed a mask mandate? | intermediate (two intervened models) | intermediate (many forward simulations by sampling from prior distribution of parameters) | simple (comparison of two alternatives) | depends on how the mask mandate is implemented | $$E[Infections_{do(mask\ mandate)} - Infections]$$ |
| Is there an intervention that will keep us under ICU capacity with probability $p$? | intermediate (many interventions) | intermediate (many forward simulations by sampling from prior distribution of parameters) | sophisticated (search over decision space to find an intervention that satisfies the ICU capacity constraint) | unspecified | $$\texttt{satisfy}_{x\in X}\ P(ICU_{do(x)} < capacity) > p$$ |
| Is intervention $A$ or $B$ more likely to keep me under ICU capacity? (simpler version of 3b) | intermediate (two interventions) | intermediate (many forward simulations by sampling from prior distribution of parameters) | simple (comparison of two alternatives) | depends on nature of A and B | $$\frac{P(ICU_{do(A)} < capacity )}{P(ICU_{do(B)} < capacity )}$$ |
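
The conditional rows ("given that we exceeded ICU capacity...") require an inverse problem. One simple way to sketch this, under strong assumptions, is rejection sampling: draw parameters from the prior, keep only the draws whose un-intervened trajectory reproduces the observation (here, exceeding capacity), and re-simulate the same draws under the intervention. The toy SIR model, the prior, and the names `peak_infected` and `counterfactual_prob` are hypothetical.

```python
import random

def peak_infected(beta, gamma=0.1, days=200, i0=0.001):
    """Peak infected fraction of a toy SIR model (daily Euler steps)."""
    s, i = 1.0 - i0, i0
    peak = i
    for _ in range(days):
        new_inf = beta * s * i
        s, i = s - new_inf, i + new_inf - gamma * i
        peak = max(peak, i)
    return peak

def counterfactual_prob(capacity, beta_scale, n=2000, seed=0):
    """Estimate P(peak_do(X) < capacity | peak >= capacity) by rejection sampling.
    Returns the estimate and the number of accepted prior draws."""
    rng = random.Random(seed)
    accepted = under = 0
    for _ in range(n):
        beta = rng.uniform(0.15, 0.35)        # hypothetical prior
        if peak_infected(beta) < capacity:
            continue                          # inconsistent with the observation: reject
        accepted += 1
        # same parameter draw, re-simulated in the intervened world
        if peak_infected(beta_scale * beta) < capacity:
            under += 1
    estimate = under / accepted if accepted else float("nan")
    return estimate, accepted

prob, accepted = counterfactual_prob(capacity=0.1, beta_scale=0.7)
print(prob, accepted)
```

Rejection sampling is only practical when the observation is a coarse event like this one; conditioning on a full observed time series would call for proper posterior inference instead.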

## December Demo queries and variants

| Question | model complexity | simulation complexity | decision complexity | intervention complexity | data complexity | query expression |
|-----------|--------------|-----------------|---------------------------|-------------------|------------------|------------|
| Compare different vaccination priority strategies. Which results in fewer total hospitalizations after a two-month span? Does either strategy disproportionately affect specific demographic groups, i.e., would health inequities be amplified by either approach? Strategy 1: Vaccinate by age group, starting with the oldest population and working younger. Strategy 2: Vaccinate by type of employment (exposure-based, i.e., essential workers) | intermediate (intervention) | simple (no data, possible priors) | simple (comparison of two interventions) | simple (modification of age and occupation-stratified rate parameters) | sophisticated (real-world data) | To represent health inequities, we have to generate a counterfactual query comparing the effect on protected demographics under each intervention. In this case, we need to stratify by these demographics in addition to age and occupation. Relative risk measures how disproportionately each demographic is hospitalized under each intervention compared to its proportion in the population. $$RelativeRisk_{x \in Demographics} \left( \sum_{t=0}^T hospitalizations(t, x)_{do(strategy_1)}\right) - RelativeRisk_{x\in Demographics}\left(\sum_{t=0}^T hospitalizations(t, x)_{do(strategy_2)}\right)$$ where strategy 1 vaccinates age groups at a rate proportional to their hospitalization rate parameter, $strategy_1 := do(\nu_{age} \propto \delta_{age})$, and strategy 2 vaccinates occupations at a rate proportional to their infection rate parameter, $strategy_2 := do(\nu_{occupation} \propto \beta_{occupation})$. |
| Question/Ask: Looking back to this time, which interventions could we have implemented to keep below a hospitalization threshold of 3k covid patients, over the winter 2020 season (Dec. 1st 2020 to March 1st 2021)? (Can this be stated probabilistically? How likely would an intervention have enabled us to reach our goal?) Very limited social distancing and masking policies (say this would only apply to healthcare settings, assume 5% decrease from normal contact/transmission levels) beginning right at the start of the period on Dec 1, 2020, through March 1, 2021. Stronger social distancing and masking policies, wait until after holidays and begin on Jan 1st, 2021 through end of 3 months (until March 1, 2021). What is the severity of intervention required for this option to have been successful? (for CHIME model this intervention maps to % decrease from baseline transmission levels). Modeling constraint/goal: keep covid hospitalizations < threshold over next 0<=t<3 months | sophisticated (counterfactual query) | sophisticated (inverse problem) | sophisticated (search over intervention space) | sophisticated (time-based interventions) | sophisticated (real-world data) | $$\begin{array}{rl}\texttt{satisfy}_{x\in \left\{\beta_{12/01/2020-3/01/2021} = 0.95\beta,\beta_{01/01/2021-3/01/2021}=0.75\beta\right\}}& P(Hospitalizations(t)_{do(x)} > 3k) < p \\ & \forall t\in \left[12/1/2020-3/1/2021\right] \end{array}$$ |
| Question/Ask: Looking back to this time, which interventions could we have implemented to keep below a hospitalization threshold of 3k covid patients, over the winter 2020 season (Dec. 1st 2020 to March 1st 2021)? (Can this be stated probabilistically? How likely would an intervention have enabled us to reach our goal?) Vaccination by age. Vaccination by occupation. What is the severity of intervention required for this option to have been successful? (for CHIME model this intervention maps to % decrease from baseline transmission levels). Modeling constraint/goal: keep covid hospitalizations < threshold over next 0<=t<3 months | sophisticated (counterfactual query) | sophisticated (inverse problem) | comparison of two alternatives | sophisticated (time-based interventions) | sophisticated (real-world data) | $$\begin{array}{rl}\texttt{satisfy}_{x\in \left\{\nu_{age}, \nu_{occupation}\right\}} & P(Hospitalizations(t)_{do(x)} > 3k) < p \\ & \forall t\in \left[12/1/2020-3/1/2021\right] \end{array}$$ |
| What vaccination rate(s) $\nu$ would these two groups need to have over the next 3 months, in order to lower the observed case rate for those age groups below 10 cases per 100k population? | intermediate (interventions) | intermediate (prior distribution) | sophisticated (2 choices) | simple (parameter change) | sophisticated (real-world data) | $$\begin{array}{rl}\texttt{minimize} & \nu \\ \texttt{such that} & P(Infected(t)_{do(\nu)} > 0.01\%) < p \\ & \forall t > 4/1/2021 \end{array}$$ |
| What vaccination rate(s) would these two groups need to have over the next 3 months, in order to lower the observed case rate for those age groups below 10 cases per 100k population? | intermediate (interventions) | intermediate (prior distribution) | simple (2 choices) | simple (parameter change) | sophisticated (real-world data) | $$\begin{array}{rl}\texttt{satisfy}_{x\in \left\{\nu_{age}, \nu_{occupation}\right\}} & P(Infected(t)_{do(x)} > 0.01\%) < p \\ & \forall t > 4/1/2021 \end{array}$$ |
| What mask mandate/social distancing strength would these two groups need to have over the next 3 months, in order to lower the observed case rate for those age groups below 10 cases per 100k population? | intermediate (interventions) | intermediate (prior distribution) | simple (2 choices) | simple (parameter change) | sophisticated (real-world data) | $$\begin{array}{rl}\texttt{satisfy}_{x\in \left\{\beta_{12/01/2020-3/01/2021} = 0.95\beta,\beta_{01/01/2021-3/01/2021}=0.75\beta\right\}}& P(Infected(t)_{do(x)} > 0.01\%) < p \\ & \forall t> 4/1/2021 \end{array}$$ |
| Which of these models works better under training conditions? Which one should I trust more under certain conditions? Which one performed better during $t$ time period? | simple (no interventions) | intermediate (fitting/calibration for different scenarios) | intermediate (range of different comparison criteria) | simple (parameters and initial conditions) | complex (real-world data) | $$\texttt{compare}_{x(t)\in Scenarios}(\left\{CHIME(x(t))\right\}_{t=0}^T, \left\{SIDARTHE(x(t))\right\}_{t=0}^T \mid \left\{x(t)\right\}_{t=0}^T)$$ |
| Which of these models works better under holdout conditions? Which one should I trust more under certain conditions? Which one performed better during $t$ time period? | simple (no interventions) | intermediate (possible calibration for different scenarios) | intermediate (range of different comparison criteria) | simple (parameters and initial conditions) | complex (real-world data) | $$\texttt{compare}_{x(t)\in Scenarios}(\left\{CHIME(x(t))\right\}_{t=T+1}^{T+k}, \left\{SIDARTHE(x(t))\right\}_{t=T+1}^{T+k} \mid \left\{x(t)\right\}_{t=0}^T)$$ |

# December Demo Scenarios
 
These scenarios are described in more detail in the Google doc [December Demo Example Scenarios](https://docs.google.com/document/d/1Obgelbwv8eceqAVKLaJjozpePFelZqGl/edit).

## Scenario 1:  Comparing the effect of early and late mask mandates on hospitalizations

Question/Ask: Looking back to this time, which interventions could we have implemented to keep below a hospitalization threshold of 3k covid patients, over the winter 2020 season (Dec. 1st 2020 to March 1st 2021)? 
* (Can this be stated probabilistically? How likely would an intervention have enabled us to reach our goal?) 
* Very limited social distancing and masking policies (say this would only apply to healthcare settings, assume 5% decrease from normal contact/transmission levels) beginning right at the start of the period on Dec 1, 2020, through March 1, 2021. 
* Stronger social distancing and masking policies, wait until after holidays and begin on Jan 1st, 2021 through end of 3 months (until March 1, 2021). 
* What is the severity of intervention required for this option to have been successful? (for CHIME model this intervention maps to % decrease from baseline transmission levels). 
* Modeling constraint/goal: keep covid hospitalizations < threshold over next 0<=t<3 months 

 $$\begin{array}{rl} \texttt{compare}_{\substack{x(t)\in \{\texttt{Intervention}(12/01/2020-3/01/2021) := 0.95\beta,\\ \texttt{Intervention}(01/01/2021-3/01/2021) := 0.75\beta\}}} & \texttt{Hospitalizations}(t)_{do(x(t))} < 3k, \\
  & \forall t\in \left[12/1/2020-3/1/2021\right]\end{array}$$ 

In [None]:
t = Interval("t", start=Date(12, 1, 2020), end=Date(3, 1, 2021))
Hospitalizations = Fluent(t, "Hospitalizations", concept="cemo:hospitalization_rate", units=[])

Beta = Parameter("beta", concept="askemo:0000005", units=["1/Day"])

early_weak_mask_intervention = Decision(
    "early_weak_mask_intervention",
    variable=[Beta],
    value=[0.95 * Beta],
    interval=Interval(start=Date(12, 1, 2020), end=Date(3, 1, 2021)),
)

late_strong_mask_intervention = Decision(
    "late_strong_mask_intervention",
    variable=Beta,
    value=0.75 * Beta,
    interval=Interval(start=Date(1, 1, 2021), end=Date(3, 1, 2021)),
)

compare(
    [
        (Hospitalizations[t] @ early_weak_mask_intervention)  <= 3e3,
    ],
    [
        (Hospitalizations[t] @ late_strong_mask_intervention) <= 3e3,
    ],
)



In [None]:
import pandas as pd

t = Interval("t", start=Date(12, 1, 2020), end=Date(3, 1, 2021))
observed_hospitalizations = pd.read_csv('https://coronavirus.health.ny.gov/daily-hospitalization-summary')
t0 = Interval("t0", start=Date(10,8,2020), end=Date(5,21,2021))
Hospitalizations = Fluent(t, "Hospitalizations", concept="cemo:hospitalization_rate", units=[])

Beta = Parameter("beta", concept="askemo:0000005", units=["1/Day"])

early_weak_mask_intervention = Decision(
    "early_weak_mask_intervention",
    variable=[Beta],
    value=[0.95 * Beta],
    interval=Interval(start=Date(12, 1, 2020), end=Date(3, 1, 2021)),
)

late_strong_mask_intervention = Decision(
    "late_strong_mask_intervention",
    variable=Beta,
    value=0.75 * Beta,
    interval=Interval(start=Date(1, 1, 2021), end=Date(3, 1, 2021)),
)

compare(
    [
        # comparisons are parenthesized because | binds tighter than >= and == in Python
        P((Hospitalizations[t] @ early_weak_mask_intervention >= 3e3) | (Hospitalizations[t0] == observed_hospitalizations[t0])) <= 0.05,
    ],
    [
        P((Hospitalizations[t] @ late_strong_mask_intervention >= 3e3) | (Hospitalizations[t0] == observed_hospitalizations[t0])) <= 0.05,
    ],
)
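
To make the Scenario 1 comparison concrete, here is a minimal sketch of what evaluating the two decisions could look like for a single parameter setting. It assumes a toy SIR model with a time-dependent transmission rate; `peak_hospitalized`, the hospitalization fraction, the population size, and the baseline `beta` are hypothetical placeholders, not CHIME values.

```python
from datetime import date

START, END = date(2020, 12, 1), date(2021, 3, 1)

def peak_hospitalized(beta, scale, intervention_start, gamma=0.1,
                      hosp_fraction=0.05, population=1_000_000, i0=0.002):
    """Peak hospitalizations over [START, END] for a toy SIR model in which
    beta is multiplied by `scale` from `intervention_start` onward."""
    s, i = 1.0 - i0, i0
    peak = 0.0
    start_day = (intervention_start - START).days
    for day in range((END - START).days + 1):
        b = beta * scale if day >= start_day else beta
        new_inf = b * s * i
        s, i = s - new_inf, i + new_inf - gamma * i
        peak = max(peak, hosp_fraction * i * population)
    return peak

early_weak = peak_hospitalized(beta=0.2, scale=0.95, intervention_start=date(2020, 12, 1))
late_strong = peak_hospitalized(beta=0.2, scale=0.75, intervention_start=date(2021, 1, 1))
for name, peak in [("early_weak", early_weak), ("late_strong", late_strong)]:
    print(name, round(peak), peak <= 3e3)  # does the decision keep us under the 3k threshold?
```

The probabilistic version of the query repeats this evaluation over parameter draws that are consistent with the observed hospitalization data and reports the fraction of draws that exceed the threshold.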

## Scenario 2: Finding the minimum vaccination rate to reduce infections below 10 per 100k after April 1

What vaccination rate(s) $\nu$ would these groups need to have over the next 3 months, in order to lower the observed case rate for those age groups below 10 cases per 100k population?

 $$\begin{array}{rl}\texttt{minimize}_{\nu(\tau)\in [0,1]} & \sum_{\tau=1/1/2021}^{3/1/2021}\nu(\tau)\\ \texttt{subject to} & P(Infected(t)_{do(\nu(\tau))} > 0.01\% \ |\  Infected(1/1/2021)=0.08 ) < 5\% \\
 & \forall t > 4/1/2021 \end{array}$$
 
 

In [None]:
t = Interval("t", start=Date(4, 1, 2021), end=Date(5, 1, 2021))
tau = Interval("tau", start=Date(1,1,2021), end=Date(3,1,2021))
vaccine_rate = Fluent(
    "vaccine_rate",
    concept="askemo:0000012",
    domain=tau,
    codomain=Interval("vaccinations_per_day", start=0, end=1),
)

Infected = Fluent("Infected", concept="ido:0000511", domain=t, codomain=Interval("cases_per_100k", start=0, end=1))
p = 0.05
minimize(
    objective=Sum[tau](vaccine_rate[tau]),
    # comparisons are parenthesized because | binds tighter than >= and == in Python
    subject_to=[P(((Infected[t] @ vaccine_rate[tau]) >= 0.01) | (Infected[Date(1, 1, 2021)] == 0.08)) < p],
)
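
The chance-constrained minimization above can be sketched with a grid search. This is only an illustration under simplifying assumptions: the vaccination rate is held constant over the vaccination window (so minimizing the sum reduces to minimizing a single number), the epidemic is a toy SIR model with vaccination, and the prior over the transmission rate is made up. The initial infected fraction (0.08), the threshold (0.01%), and the risk level (5%) come from the query; the function names are hypothetical.

```python
import random

def max_infected_after(beta, nu, gamma=0.1, i0=0.08,
                       vacc_days=59, check_from=90, check_to=120):
    """Largest infected fraction between day `check_from` (April 1) and day
    `check_to` (May 1), counting days from Jan 1, 2021. A fraction `nu` of the
    remaining susceptibles is vaccinated each day through March 1 (day 59)."""
    s, i = 1.0 - i0, i0
    worst = 0.0
    for day in range(1, check_to + 1):
        new_inf = beta * s * i
        s -= new_inf
        if day <= vacc_days:
            s -= nu * s
        i += new_inf - gamma * i
        if day >= check_from:
            worst = max(worst, i)
    return worst

def violation_prob(nu, threshold=1e-4, n=200, seed=0):
    """Monte Carlo estimate of P(Infected(t) > threshold for some t in the check window)."""
    rng = random.Random(seed)
    bad = sum(max_infected_after(rng.uniform(0.15, 0.3), nu) > threshold
              for _ in range(n))
    return bad / n

def minimal_rate(p=0.05, grid_size=21):
    """Smallest rate on a uniform grid over [0, 1] that meets the chance constraint."""
    for k in range(grid_size):
        nu = k / (grid_size - 1)
        if violation_prob(nu) < p:
            return nu
    return None

nu_star = minimal_rate()
print(nu_star)
```

A grid search is the bluntest possible optimizer; it is used here only because it makes the structure of the query (objective, constraint, risk level) easy to see.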

## Scenario 3: Comparison of CHIME and SIDARTHE models on a dataset

### Which of these models works better under training conditions?
* Which one should I trust more under certain conditions?
* Which one performed better during $t$ time period? 

$$\texttt{compare}_{x(t)\in Scenarios}(\left\{CHIME(x(t))\right\}_{t=0}^T, \left\{SIDARTHE(x(t))\right\}_{t=0}^T|\left\{x(t)\right\}_{t=0}^T)$$




In [None]:
t = Interval("t", metaid="wikidata:Q11471", units="wiki:Q573")
CHIME = Intervention(t, metaid="https://penn-chime.phl.io/")
SIDARTHE = Intervention(t, metaid="biomodels:BIOMD0000000955")
Infected = Fluent(t, "Infected", metaid="ido:0000511")
Context = Fluent(t, metaid="")
compare(
    [P(Infected[t] @ CHIME[t] | Context[t]), 0 <= t, t <= T],
    [P(Infected[t] @ SIDARTHE[t] | Context[t]), 0 <= t, t <= T],
)

### Which of these models works better under holdout conditions? 

* Which one should I trust more under certain conditions?
* Which one performed better during $t$ time period? 

$$\texttt{compare}_{x(t)\in Scenarios}(\left\{CHIME(x(t))\right\}_{t=T+1}^{T+k}, \left\{SIDARTHE(x(t))\right\}_{t=T+1}^{T+k}|\left\{x(t)\right\}_{t=0}^T)$$

In [None]:
t = Interval("t", metaid="wikidata:Q11471", units="wiki:Q573")
t0 = Interval("t0", metaid="wikidata:Q11471", units="wiki:Q573")
CHIME = Intervention(t, metaid="https://penn-chime.phl.io/")
SIDARTHE = Intervention(t, metaid="biomodels:BIOMD0000000955")
Infected = Fluent(t, "Infected", metaid="ido:0000511")
Context = Fluent(t0, metaid="")
compare(
    [P(Infected[t] @ CHIME[t] | Context[t0]), T + 1 <= t, t <= T + k, 0 <= t0, t0 <= T],
    [P(Infected[t] @ SIDARTHE[t] | Context[t0]), T + 1 <= t, t <= T + k, 0 <= t0, t0 <= T],
)
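
The difference between the training and holdout comparisons can be shown with a small, self-contained sketch. It assumes root-mean-square error as the comparison criterion, which is only one of many possible choices; the two "models" are made-up prediction series rather than CHIME or SIDARTHE output, and `compare_models` is a hypothetical helper.

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def compare_models(models, observed, T):
    """Score each model on the training window t = 0..T and on the holdout window t > T."""
    return {
        name: {
            "train": rmse(pred[: T + 1], observed[: T + 1]),
            "holdout": rmse(pred[T + 1:], observed[T + 1:]),
        }
        for name, pred in models.items()
    }

observed = [10, 12, 15, 19, 24, 30, 37, 45]       # made-up case counts
models = {
    "model_a": [10, 12, 15, 19, 25, 33, 43, 55],  # matches the training window, drifts afterwards
    "model_b": [11, 13, 16, 20, 25, 31, 38, 46],  # constant offset of +1 everywhere
}
scores = compare_models(models, observed, T=3)
print(scores)
```

In this example `model_a` wins on the training window and `model_b` wins on the holdout window, which is why the two questions are posed as separate queries.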

# DeDri syntax for causal and counterfactual queries

In [None]:
from y0.dsl import P, Sum, Variable, Product
from IPython.display import Latex

In [None]:
Infected = Variable("Infected")
Recovered = Variable("Recovered")
Vaccinated = Variable("Vaccinated")
Died = Variable("Died")

**What is the probability that an infected, vaccinated person recovers?**

In [None]:
P(Recovered | Infected, Vaccinated)

**What is the probability that an infected, unvaccinated person does not recover?**

In [None]:
P(~Recovered | Infected, ~Vaccinated)

**What is the probability that an infected person would recover if they were vaccinated, but die if they were not vaccinated?**

In [None]:
P(Recovered @ Vaccinated, ~Recovered @ ~Vaccinated | Infected)

In [None]:
import y0

y0.dsl.P

# Causal Validation of simple epidemiology queries with latent confounders

If there is latent confounding, then there exist spurious correlations in the data.  Identification algorithms can be used to determine whether the causal query can be estimated from the dataset.
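
A small simulation can make the effect of latent confounding visible. The sketch below assumes a hypothetical data-generating process in which an unobserved immunocompromised status raises both the chance of infection and the chance of death, while infection itself adds 5 percentage points to the death probability; none of these numbers come from the y0 examples.

```python
import random

def simulate(n=100_000, seed=0, do_infected=None):
    """Draw (infected, died) pairs. If do_infected is set, infection is forced
    (an intervention) instead of depending on the latent confounder."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.random() < 0.2                                # latent: immunocompromised
        if do_infected is None:
            infected = rng.random() < (0.6 if u else 0.2)     # confounded exposure
        else:
            infected = do_infected
        p_death = (0.3 if u else 0.02) + (0.05 if infected else 0.0)
        rows.append((infected, rng.random() < p_death))
    return rows

def death_rate(rows, infected):
    outcomes = [died for inf, died in rows if inf == infected]
    return sum(outcomes) / len(outcomes)

observational = simulate()
naive = death_rate(observational, True) - death_rate(observational, False)
causal = (death_rate(simulate(do_infected=True), True)
          - death_rate(simulate(do_infected=False), False))
print(naive, causal)
```

The naive contrast computed from observational data overstates the effect of infection because infected individuals are disproportionately immunocompromised; the interventional contrast recovers the effect built into the simulation. With real data the intervention is not available, which is why identifiability has to be checked from the graph.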

In [None]:
from y0.examples import id_sir_example, nonid_sir_example
from y0.algorithm.identify import Identification, identify, Unidentifiable

nonid_sir_example.graph.draw()

## Causal effect of infection on death with latent confounder (immunocompromised) is not identifiable.

In [None]:
nonid_sir = Identification.from_expression(
    query=P[Infected](Died), estimand=P(Infected, Died), graph=nonid_sir_example.graph
)
try:
    identify(nonid_sir)
except Unidentifiable:
    display(nonid_sir.query.expression)
    display(Latex("is not identifiable given the model"))

## Causal effect of infection on death with hospitalization mediator is identifiable

In [None]:
from y0.algorithm.identify import Identification, identify, Unidentifiable
from IPython.display import Latex, Markdown
from y0.dsl import P, Variable, Sum, Product

Infected, Died, Hospitalized = Variable("Infected"), Variable("Died"), Variable("Hospitalized")

id_sir = Identification.from_expression(
    query=P[Infected](Died), estimand=P(Infected, Hospitalized, Died), graph=id_sir_example.graph
)
id_sir.graph.draw()

In [None]:
estimand = identify(id_sir)
display(
    Latex("The query "),
    id_sir.query.expression,
    Latex("is identifiable and has estimand: "),
    estimand,
)

$$E[Y_{do(A)} - Y_{do(\lnot A)}]$$

$$\sum_{y_{do(A)}\in dom(Y_{do(A)})}y_{do(A)}P(y_{do(A)}) - \sum_{y_{do(\lnot A)}\in dom(Y_{do(\lnot A)})}y_{do(\lnot A)}P(y_{do(\lnot A)})$$ 

# Data cubes

In [9]:
import xarray as xr
import pickle
import numpy as np

with open('dimensions.pkl', 'rb') as d:
    dimensions = pickle.load(d)
for d in dimensions:
    print(d, len(dimensions[d]))

experimental conditions 5
replicates 500
attributes 4
timesteps 83


In [6]:
datacube = np.load("datacube.npy")
datacube.shape

(5, 500, 4, 83)

In [12]:
da = xr.DataArray(
    data=datacube,
    dims=["experimental conditions",
          "replicates",
          "attributes",
          "timesteps"],
    coords=dimensions)
da.to_netcdf("ciemss_datacube.nc")

    

In [13]:
da2 = xr.load_dataarray('ciemss_datacube.nc')
da2



# Hackathon 1

## Scenario 1

### Scenario Ask: 
While perusing publications on Covid-19 models, you come across a model from early 2020 that was developed to describe the first Covid wave in Lombardy, Italy. You’re interested in updating this model for 2022, to include vaccinations.


1. Before updating the model, you want to make sure you have a good understanding of the original model, can execute it, and reproduce the results in the publication describing the model. The paper doesn’t include code, but you think it’s feasible to create an executable version of the model and reproduce the results based on the model descriptions in the paper alone. The paper DOI is: [10.3389/fpubh.2020.00230](https://www.frontiersin.org/articles/10.3389/fpubh.2020.00230/full). There are three ‘unit tests’ to ensure that the model representation we want to execute is correct: reproduce the results in Figs. 2A, 3A, and 3B.

    1. [Challenge] Ingest model and pass unit tests from publication alone (do not start with any code as input)
    2. Ingest model and pass unit tests from publication and corresponding Code Version A 
    3. Ingest model and pass unit tests from publication and corresponding Code Version B 

2. Update the model from Question 1 to include vaccination. There are a number of ways to implement vaccination in an epidemiological model, but no matter the modeling approach, it should have an impact on one or more disease outcomes (e.g., infections or deaths). Ensure your updated model is not the same as the model referenced in question 3a. Aside from these guidelines, there are no restrictions on modeling choices. If it is not clear how to update the model, do a small literature review/search to understand how other published models account for vaccination.
3. Model Comparison: 
	In addition to your updated model from Question 2 you are aware of the following two specific models that include vaccination
	You find a publication that adds vaccination to the original model from question 1, at https://biomedres.us/pdfs/BJSTR.MS.ID.007413.pdf. (Please note the formatting error on pg. 4, where the first term in the equation for dS/dt should be μN)
	You are also aware of the CHIME SVIIvR model (which adds vaccination to the original CHIME model, and was part of the starter kit)
	Do a structural model comparison between the models in questions 2, 3.a.i, and 3.a.ii. The structural comparison should include a summary or diagram describing similarities and differences between the models, with respect to parameters, variables/states, pathways, etc.
	Compare simulation outputs between the three models, for the following two scenarios. Assume initial values and parameter values are consistent (to the extent possible) with Table 1 in https://biomedres.us/pdfs/BJSTR.MS.ID.007413.pdf. For initial values that are not specified, choose reasonable values and ensure they are the same between the three models being compared.
	Vaccine efficacy = 75%, population vaccinated = 10%
	Vaccine efficacy = 75%, population vaccinated = 80%
	Create an equally weighted ensemble model using the three models in 3b, and replicate the scenarios in 3.c.i and 3.c.ii. How does the ensemble model output compare to the output from the individual component models?
	For any of the models in question 3, conduct a sensitivity analysis to determine which intervention parameters should be prioritized in the model, for having the greatest impact on deaths – NPIs, or vaccine-related interventions?
	For any of the models in question 3, add age stratification to the model and leverage data from the provided contact matrix and following resources. You may ignore vital dynamics. Assume that vaccination status does not have an impact on contact rates between age groups. Assume age-specific vaccination, vaccine effectiveness, hospitalization, and mortality rates, if relevant to the model. For other parameters, you may find reasonable values from the literature (including any of the papers referenced in this scenario) and/or make simplifying assumptions about whether they have different values based on age group.
	For age-specific vaccine effectiveness parameters – you can utilize data compiled by the US CDC, available here (https://covid.cdc.gov/covid-data-tracker/#vaccine-effectiveness). You can assume that only mRNA vaccines are used and that efficacy data in Italy would be similar to that of the United States. For a search and discovery challenge, you can try to identify vaccine utilization by manufacturer in the target area and align this data with vaccine-specific efficacy data across age groups for the time window in question. This task should not, however, be a limiting factor in making progress on subsequent downstream TA tasks.
	You may find Italy population distribution data/information, or vaccination rates by age group, from any source, or make a simplifying assumption about similarities with data the United States
	See provided contact matrix – “Italy_contact_matrix.csv”. Matrix values represent mean number of contacts that an individual from an age group represented by each row, would encounter with age groups represented by each column. There are 16 five-year age groups from 0-80 years, with X1 representing the youngest age group, and X16 representing the oldest age group.
	With the age-stratified model, simulate the following situations. You may choose initial values that seem reasonable given the location and time, and you can reuse values from any of the publications referenced):
	High vaccination rate among older populations 65 years and older (e.g. 80%+), and low vaccination rate among all other age groups (e.g. below 15%)
	High vaccination rate among all age groups
	Repeat d.i and d.ii, but now add a social distancing policy at schools, that decreases contact rates by 20% for school-aged children only. 
	Compare and summarize simulation outputs for d.i-d.iii
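
A minimal sketch of how a contact matrix can enter an age-stratified model, assuming the per-group infection hazard is `transmissibility * sum_j C[i][j] * I_j / N_j`. The 3x3 matrix below is made up for illustration and is not the provided `Italy_contact_matrix.csv`. The function names, the `transmissibility` value, and the choice to reduce only contacts among school-aged groups are assumptions.

```python
def force_of_infection(contacts, infected_frac, transmissibility):
    """Per-group infection hazard: p * sum_j C[i][j] * (I_j / N_j)."""
    return [
        transmissibility * sum(c * frac for c, frac in zip(row, infected_frac))
        for row in contacts
    ]


def scale_school_contacts(contacts, school_groups, reduction=0.20):
    """Return a copy with contacts among school-aged groups cut by `reduction`."""
    return [
        [
            c * (1 - reduction) if i in school_groups and j in school_groups else c
            for j, c in enumerate(row)
        ]
        for i, row in enumerate(contacts)
    ]


# Made-up 3-group matrix (children, adults, seniors); NOT the Italy data.
toy_contacts = [
    [4.0, 1.0, 0.5],
    [1.0, 3.0, 1.0],
    [0.5, 1.0, 2.0],
]
infected_frac = [0.01, 0.02, 0.01]  # I_j / N_j, illustrative values only

baseline = force_of_infection(toy_contacts, infected_frac, transmissibility=0.05)
distanced = force_of_infection(
    scale_school_contacts(toy_contacts, school_groups={0}),
    infected_frac,
    transmissibility=0.05,
)
print(baseline[0], distanced[0])
```

With the real 16x16 matrix, `school_groups` would hold the indices of the five-year bins treated as school-aged.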


### 3c
Compare simulation outputs between the three models, for the following two scenarios. Assume initial values and parameter values are consistent (to the extent possible) with Table 1 in https://biomedres.us/pdfs/BJSTR.MS.ID.007413.pdf. For initial values that are not specified, choose reasonable values and ensure they are the same between the three models being compared.

1. Vaccine efficacy = 75%, population vaccinated = 10%
2. Vaccine efficacy = 75%, population vaccinated = 80%


In [1]:
M1 = "SVEIIvR"
M2 = "SEIRD+V"
M3 = "SVIIvR (CHIME)"
Compare([M1, M2, M3], 
        conditions={betaV: 0.75*beta, deltaV: 0.75*delta},
        interventions=[{V:0.8, S:0.2}, {V:0.1, S:0.9}])


NameError: name 'Compare' is not defined
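
Since `Compare` is not defined, the following is a minimal runnable sketch of the two vaccination scenarios on a single toy S-V-I-R model. It is not one of the three models named above. All parameter values (`beta`, `gamma`, `i0`) are placeholders rather than values from Table 1, and the sketch assumes vaccine efficacy scales the infection rate of vaccinated people by `(1 - efficacy)`.

```python
def simulate_svir(vaccinated, efficacy=0.75, beta=0.3, gamma=0.1,
                  i0=0.001, days=300, dt=0.1):
    """Euler-integrate a toy S-V-I-R model on population fractions."""
    s = (1 - i0) * (1 - vaccinated)  # unvaccinated susceptibles
    v = (1 - i0) * vaccinated        # vaccinated, partially protected
    i, r = i0, 0.0
    peak = i
    for _ in range(round(days / dt)):
        inf_s = beta * s * i
        inf_v = beta * (1 - efficacy) * v * i  # assumed reduced infection rate
        rec = gamma * i
        s -= dt * inf_s
        v -= dt * inf_v
        i += dt * (inf_s + inf_v - rec)
        r += dt * rec
        peak = max(peak, i)
    return {"peak_infected": peak, "ever_infected": i + r, "total": s + v + i + r}


low_coverage = simulate_svir(vaccinated=0.10)   # scenario 1
high_coverage = simulate_svir(vaccinated=0.80)  # scenario 2
for label, out in [("10% vaccinated", low_coverage), ("80% vaccinated", high_coverage)]:
    print(label, round(out["peak_infected"], 4), round(out["ever_infected"], 4))
```

Running the same function with identical initial values for each scenario mirrors the requirement that initial conditions match across the models being compared.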

### 4.
Create an equally weighted ensemble model using the three models in 3b, and replicate the scenarios in 3.c.i and 3.c.ii. How does the ensemble model output compare to the output from the individual component models?


In [None]:
Compare([(M1+M2+M3)/3, M1, M2, M3],
        conditions={betaV: 0.75*beta, deltaV: 0.75*delta},
        interventions=[{V:0.8, S:0.2}, {V:0.1, S:0.9}])
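
One way to read `(M1+M2+M3)/3` is as a pointwise average of the model outputs on a shared time grid. The sketch below assumes that reading. The three lists are placeholders standing in for model trajectories, not simulation results, and `equal_weight_ensemble` is a hypothetical helper.

```python
def equal_weight_ensemble(trajectories):
    """Average several same-length trajectories point by point."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    length = len(trajectories[0])
    if any(len(t) != length for t in trajectories):
        raise ValueError("trajectories must share the same time grid")
    return [sum(values) / len(trajectories) for values in zip(*trajectories)]


# Placeholder outputs standing in for M1, M2 and M3 (not real results).
m1 = [10.0, 20.0, 40.0]
m2 = [12.0, 26.0, 50.0]
m3 = [8.0, 14.0, 30.0]

ensemble = equal_weight_ensemble([m1, m2, m3])
print(ensemble)
```

Averaging outputs rather than parameters keeps the component models independent, so each can keep its own structure and state variables.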

### 5.
For any of the models in question 3, conduct a sensitivity analysis to determine which intervention parameters should be prioritized in the model, for having the greatest impact on deaths – NPIs, or vaccine-related interventions?



In [None]:
NPI = dict(beta="better_than_vaccine", delta="better_than_no_NPI")
vaccine_intervention = dict(beta="better_than_no_vaccine", delta="better_than_NPI")
Compare(no_intervention, NPI_only, Vaccine_only, vaccine_plus_intervention)
for model in [no_intervention, NPI_only, Vaccine_only, vaccine_plus_intervention]:
    local_sensitivity = LocalSensitivityAnalysis(model, intervention_parameters, params, initial_conditions)

    global_sensitivity = GlobalSensitivityAnalysis(model, intervention_parameters, param_ranges, initial_condition_ranges)
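
`LocalSensitivityAnalysis` is not defined here. As a hedged illustration of the local case, the sketch below estimates the derivative of cumulative deaths with respect to an NPI strength and a vaccination coverage by central finite differences. The toy model, its parameter values, and the names `cumulative_deaths` and `local_sensitivity` are all assumptions.

```python
def cumulative_deaths(npi_reduction, vaccinated, efficacy=0.75, beta=0.3,
                      gamma=0.1, fatality=0.01, i0=0.001, days=300, dt=0.1):
    """Deaths (as a population fraction) from a toy S-V-I-R-D model."""
    b = beta * (1 - npi_reduction)  # NPI scales transmission down
    s = (1 - i0) * (1 - vaccinated)
    v = (1 - i0) * vaccinated
    i, deaths = i0, 0.0
    for _ in range(round(days / dt)):
        inf_s = b * s * i
        inf_v = b * (1 - efficacy) * v * i
        removed = gamma * i
        s -= dt * inf_s
        v -= dt * inf_v
        i += dt * (inf_s + inf_v - removed)
        deaths += dt * fatality * removed
    return deaths


def local_sensitivity(f, baseline, name, step=1e-3):
    """Central finite-difference derivative of f with respect to one parameter."""
    up = dict(baseline)
    up[name] += step
    down = dict(baseline)
    down[name] -= step
    return (f(**up) - f(**down)) / (2 * step)


baseline = {"npi_reduction": 0.2, "vaccinated": 0.5}
sensitivities = {
    name: local_sensitivity(cumulative_deaths, baseline, name) for name in baseline
}
print(sensitivities)
```

The parameter with the larger absolute derivative has the greater local effect on deaths at this baseline. A global analysis would repeat the comparison across sampled parameter ranges.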


### 6d

d. With the age-stratified model, simulate the following situations (you may choose initial values that seem reasonable given the location and time, and you can reuse values from any of the publications referenced):
i. High vaccination rate among older populations 65 years and older (e.g. 80%+), and low vaccination rate among all other age groups (e.g. below 15%)
ii. High vaccination rate among all age groups
iii. Repeat d.i and d.ii, but now add a social distancing policy at schools, that decreases contact rates by 20% for school-aged children only. 
iv. Compare and summarize simulation outputs for d.i-d.iii


In [2]:
Scenario1 = {V[65:]: 0.8,
    S[65:]: 1 - V[65:],
    V[:65]: 0.15,
    S[:65]: 1 - V[:65]}

Scenario2 = {V[:]: 0.8}
Scenario3 = Scenario1 + {beta[:18]: 0.8*beta[:18] for model.beta in models}  # 20% decrease, so scale by 0.8
Scenario4 = Scenario2 + {beta[:18]: 0.8*beta[:18] for model.beta in models}  # 20% decrease, so scale by 0.8



NameError: name 'V' is not defined
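
The slicing syntax above is pseudocode. A runnable sketch of the same scenario definitions, using one entry per five-year age group, could look like the following. It assumes the 16 groups start at ages 0, 5, ..., 75, that "school-aged" means the 5-19 bins (five-year bins do not align with age 18), and that 15% stands in for "below 15%". The helper names are hypothetical.

```python
AGE_GROUP_STARTS = list(range(0, 80, 5))  # 0-4, 5-9, ..., 75-79: 16 groups


def coverage_by_age(older, younger, cutoff=65):
    """Vaccinated fraction per group: `older` at or above cutoff, else `younger`."""
    return [older if start >= cutoff else younger for start in AGE_GROUP_STARTS]


def school_multiplier(reduction=0.20, school_ages=(5, 20)):
    """Per-group contact multiplier; the 5-19 school-age range is an assumption."""
    low, high = school_ages
    return [1 - reduction if low <= start < high else 1.0 for start in AGE_GROUP_STARTS]


no_change = [1.0] * len(AGE_GROUP_STARTS)
scenarios = {
    "d.i": (coverage_by_age(0.80, 0.15), no_change),
    "d.ii": (coverage_by_age(0.80, 0.80), no_change),
    "d.iii (from d.i)": (coverage_by_age(0.80, 0.15), school_multiplier()),
    "d.iii (from d.ii)": (coverage_by_age(0.80, 0.80), school_multiplier()),
}

for name, (vaccinated, contact_multiplier) in scenarios.items():
    susceptible = [1 - v for v in vaccinated]  # ignores prior infection
    print(name, vaccinated[-1], susceptible[0], contact_multiplier[1])
```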

## Scenario 2

### Scenario Background:
You are a disease modeler supporting the Los Angeles County Department of Public Health, at the beginning of the original Omicron wave. The LA County Board of Supervisors is concerned about what the next few months will look like, and what level of intervention will be required to manage what is shaping up to be a large winter Covid-19 wave. Vaccines were broadly available during this time period and vaccination should be accounted for in the modeling.

### Scenario Setting/Situation:
Time = December 28th, 2021 (right around upswing of Omicron wave), Location = LA County

### Scenario Asks:

1. Find a model capable of forecasting Covid cases and hospitalizations (these don’t need to be broken down by vaccination status, but the model should account for vaccination in some way). Parameterize model either using data from the previous two months (October 28th – December 28th, 2021), or with relevant parameter values from the literature. Forecast Covid cases and hospitalizations over the next 3 months under no interventions.

In [None]:
t = Interval(start=Date(12,29,2021), end=Date(3,28,2022))  # 3-month forecast window
t0 = Interval(start=Date(10,28,2021), end=Date(12,28,2021))
P(theta[t] | S[t0], V[t0], I[t0], Iv[t0], H[t0], Hv[t0], R[t0] )
P(theta[t])

2. Based on the forecast, do we need interventions to keep total Covid hospitalizations under a threshold of 3000 on any given day? If there is uncertainty in the model parameters, express the answer probabilistically, i.e., what is the likelihood or probability that the number of Covid hospitalizations will stay under this threshold for the next 3 months without interventions?

In [None]:
P((H[t] + Hv[t]) > 3000) < 0.05
Risk_metric = "SuperQuantile"
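
Given posterior-predictive samples of the peak daily hospitalization count, both the exceedance probability and a superquantile (the mean of the worst tail, also called CVaR) can be estimated directly from the samples. The sketch below uses a simple empirical estimator. The `peaks` list is a placeholder standing in for simulator output, not a forecast.

```python
import math


def exceedance_probability(samples, threshold):
    """Fraction of samples strictly above the threshold."""
    return sum(1 for x in samples if x > threshold) / len(samples)


def superquantile(samples, alpha):
    """Mean of the worst (1 - alpha) share of samples (empirical CVaR)."""
    ordered = sorted(samples, reverse=True)
    tail = max(1, math.ceil((1 - alpha) * len(ordered) - 1e-9))
    return sum(ordered[:tail]) / tail


# Placeholder peak-hospitalization samples; a real run would draw these
# from the calibrated model's predictive distribution.
peaks = [1800, 2200, 2500, 2700, 2900, 3100, 3300, 2400, 2600, 2000]

p_exceed = exceedance_probability(peaks, threshold=3000)
tail_mean = superquantile(peaks, alpha=0.8)
print(p_exceed, tail_mean)
```

The exceedance probability answers "how likely is a breach", while the superquantile also reflects how severe the breaches are, which matters for heavy-tailed outcomes.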


3. Assume a consistent policy of social distancing/masking will be implemented, resulting in a 50% decrease from baseline transmission. Assume that we want to minimize the time that the policy is in place, and once it has been put in place and then ended, it can't be re-implemented. Looking forward from “today’s” date of Dec. 28, 2021, what are the optimal start and end dates for this policy, to keep projections below the hospitalization threshold over the entire 3-month period? How many fewer hospitalizations and cases does this policy result in?
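
A minimal sketch of the search in question 3, assuming a toy daily-step SIR model in which the hospital census is a fixed fraction of active infections. Every parameter value below (transmission, recovery, hospitalization fraction, initial infections, susceptible share, and a population rounded to 10 million) is a placeholder, not a calibrated estimate. The search is a coarse grid over start and end days rather than a true optimizer.

```python
from datetime import date, timedelta


def peak_hospitalized(start, end, days=90, beta=0.6, gamma=0.2, reduction=0.5,
                      hosp_fraction=0.005, population=10_000_000,
                      infected0=20_000, susceptible_share=0.6):
    """Peak hospital census when the policy is active on days [start, end)."""
    s = susceptible_share * population - infected0
    i = float(infected0)
    peak = hosp_fraction * i
    for day in range(days):
        b = beta * (1 - reduction) if start <= day < end else beta
        new_infections = b * s * i / population
        s -= new_infections
        i += new_infections - gamma * i
        peak = max(peak, hosp_fraction * i)
    return peak


def shortest_window(threshold=3000, days=90, step=5):
    """Grid-search the shortest single window keeping the peak under threshold."""
    best = None
    for start in range(0, days, step):
        for end in range(start + step, days + 1, step):
            if peak_hospitalized(start, end, days=days) <= threshold:
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
    return best


today = date(2021, 12, 28)
window = shortest_window()
if window is not None:
    print("start:", today + timedelta(days=window[0]),
          "end:", today + timedelta(days=window[1]))
```

Because the policy cannot be re-implemented, a single `[start, end)` interval is enough to describe every admissible policy, which keeps the search space two-dimensional.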

4. Independent from #3, assume there is a protocol to kick in mitigation policies when hospitalizations rise above 80% of the hospitalization threshold (i.e. 80% of 3000). When hospitalizations fall back below 80% of the threshold, these policies expire.
 
    1. When do we expect these policies to first kick in?
    2. What is the minimum impact on transmission rate these mitigation policies need to have the first time they kick in, to (1) ensure that we don't reach the hospitalization threshold at any time during the 3-month period, and (2) ensure that the policies only need to be implemented once, and potentially expired later, but never reimplemented? Express this in terms of change in baseline transmission levels (e.g. 10% decrease, 50% decrease, etc.).
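
The trigger protocol in question 4 can be sketched as a feedback rule evaluated each day. The example below assumes a toy daily-step SIR model with placeholder parameters, and a hospital census equal to a fixed fraction of active infections. It reports the first trigger day and scans candidate reductions in ascending order for the smallest one that keeps the peak under the threshold with at most one activation.

```python
def run_trigger_policy(reduction, threshold=3000, trigger_share=0.8, days=90,
                       beta=0.6, gamma=0.2, hosp_fraction=0.005,
                       population=10_000_000, infected0=20_000,
                       susceptible_share=0.6):
    """Simulate a policy that is on whenever the census exceeds the trigger."""
    trigger = trigger_share * threshold
    s = susceptible_share * population - infected0
    i = float(infected0)
    active, activations, first_day = False, 0, None
    peak = hosp_fraction * i
    for day in range(days):
        above = hosp_fraction * i > trigger
        if above and not active:
            activations += 1
            if first_day is None:
                first_day = day
        active = above
        b = beta * (1 - reduction) if active else beta
        new_infections = b * s * i / population
        s -= new_infections
        i += new_infections - gamma * i
        peak = max(peak, hosp_fraction * i)
    return {"first_day": first_day, "activations": activations, "peak": peak}


def minimum_reduction(threshold=3000, candidates=None):
    """Smallest candidate reduction meeting both conditions, else None."""
    candidates = candidates or [k / 20 for k in range(1, 20)]  # 5% ... 95%
    for reduction in candidates:  # ascending, so the first hit is the smallest
        out = run_trigger_policy(reduction, threshold=threshold)
        if out["peak"] < threshold and out["activations"] <= 1:
            return reduction
    return None


print(run_trigger_policy(0.5)["first_day"], minimum_reduction())
```

The first trigger day does not depend on `reduction`, since the dynamics are identical until the policy first switches on.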

5. Now assume that instead of NPIs, the Board wants to focus all their resources on an aggressive vaccination campaign to increase the fraction of the total population that is vaccinated. What is the minimum intervention with vaccinations required in order for this intervention to have the same impact on cases and hospitalizations, as your optimal answer from question 3? Depending on the model you use, this may be represented as an increase in total vaccinated population, or increase in daily vaccination rate (% of eligible people vaccinated each day), or some other representation. 