<a href="https://colab.research.google.com/github/wdempsey/AI4Health-Online-Experimentation/blob/main/part1_mrt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Online learning and experimentation algorithms in mobile health



In part 1, we will discuss the _micro-randomized trial design_ and the corresponding primary data analysis methods.  By the end of this section, you should be able to answer the following set of questions:
- What is a just-in-time adaptive intervention (JITAI)? 
- What is a micro-randomized trial?
- What is a causal excursion effect?  How does one estimate this effect?
- What are the main goals of early-stage optimization trials?

In addition, we will introduce the _HeartSteps simulator_, a suite of functions available in the GitHub repository (in folder ``./hs_simulator/``) that generates synthetic users in a synthetic MRT aimed at increasing physical activity. We will use this simulator throughout this practical session.

```python
## Import packages and check code cells run
import numpy as np
import scipy as sp
from sklearn.linear_model import LinearRegression
```

# Part 1: Just-in-time adaptive interventions


A JITAI "is an intervention design aiming to provide the right type/amount of support, at the right time, by adapting to an individual's changing internal and contextual state" (Nahum-Shani et al., 2018).

- The term “just-in-time support” is used to describe an attempt to provide the right type (or amount) of support, at the right time.
- Timing is largely event-based, e.g., "a moment of high vulnerability and high receptivity".  
  - Example: for a person attempting to quit smoking, a moment of high stress may carry a high likelihood of relapse. If the person is currently available (e.g., no meeting on Google Calendar) and not currently active (e.g., not out for a walk), then the person may be receptive to a brief prompt aimed at reducing proximal stress.
  - This contrasts with purely time-based support, e.g., a reminder to take pills at 4pm every day.



There are 5 key components of a JITAI:
- __Decision Points__: A decision point is a time at which an intervention decision is made. 
- __Intervention Options__: An array of possible treatments or actions that might be employed at any given decision point.
- __Tailoring Variable__: A tailoring variable is information concerning the individual that is used to decide when (i.e., under what conditions) to provide an intervention and which intervention to provide. 
- __Outcome__:
  - __Distal outcome__: Ultimate goal the intervention is intended to achieve; it is usually a primary clinical outcome, such as weight loss, drug/alcohol use reduction or increase in average activity level.  
  - __Proximal outcome__: Short-term goals the intervention options are intended to achieve. These are typically thought to lie on the causal pathway to the distal outcome (i.e., to act as mediators).
- __Decision rules__:  Operationalize the adaptation by specifying which intervention option to offer, for whom, and when. 


A JITAI is an _intervention design_. Behavioral scientists often have open questions about how best to design a JITAI for a particular behavioral health setting. Consider an mHealth smoking cessation setting. Scientists may wish to intervene by either sending a reminder to practice mindfulness (hopefully reducing proximal stress) or not; however, it is unknown whether sending the message when the individual is currently stressed (high vulnerability but low receptivity) is better than sending it when the individual is currently not stressed (low vulnerability but high receptivity).
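To make this design question concrete, the two candidate designs can be written as decision rules, i.e., functions from the tailoring variable to an intervention option, evaluated at each decision point. A minimal sketch; the binary stress indicator and all names below are hypothetical and chosen only for illustration:

```python
# Hypothetical encoding of the two competing decision rules from the smoking
# cessation example. The tailoring variable is a binary stress indicator and the
# intervention options are "send a mindfulness reminder" or "do nothing".
SEND_REMINDER = "send_mindfulness_reminder"
DO_NOTHING = "do_nothing"


def rule_when_stressed(stressed: bool) -> str:
    """Intervene at high-vulnerability moments (individual currently stressed)."""
    return SEND_REMINDER if stressed else DO_NOTHING


def rule_when_not_stressed(stressed: bool) -> str:
    """Intervene at high-receptivity moments (individual currently not stressed)."""
    return DO_NOTHING if stressed else SEND_REMINDER


# Evaluate both rules at four decision points with a given stress pattern.
stress_at_decision_points = [True, False, False, True]
plan_a = [rule_when_stressed(s) for s in stress_at_decision_points]
plan_b = [rule_when_not_stressed(s) for s in stress_at_decision_points]
print(plan_a)
print(plan_b)
```

Both rules share the same decision points, tailoring variable, and intervention options; they differ only in the decision rule, which is exactly the kind of open question an MRT is designed to inform.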

We will break for 15 minutes for small group intros and a short exercise.

- __Group Task 1__: Identify the 5 elements from the following decision rule in a recovery support services mHealth study (A-CHESS):
  - __``If``__ ``At High Risk Location``, __``Then``__ ``IO = Send Message``, __``Else``__ ``IO = Do Nothing``.
- __Group Task 2__: Construct a JITAI to be included in a smoking cessation mHealth intervention package aimed at reducing proximal stress.  Be sure to highlight the 5 key elements.   


# Part 2: Micro-randomized trials (MRTs)

MRTs are an experimental design to collect data to answer questions about the construction of JITAIs. 

- For each person in a study, let $t=1,\ldots, T$ denote a sequence of decision points.  
- At each decision time $t$, we observe a state variable $S_t \in \mathbb{R}^p$.
- After observing the state variable $S_t$, the _clinical trialist_ randomizes the action $A_t \in \mathcal{A}$ with probability $p_t (A_t \mid H_t)$, where $H_t = (S_1, A_1, Y_2, \ldots, A_{t-1}, Y_t, S_t)$ is the observed history up to decision point $t$ (i.e., the randomization probability may depend on the observed history).
- After the state $S_t$ is observed and the action $A_t$ is taken, the proximal response $Y_{t+1}$ is observed. The proximal response is a deterministic function of the current state, action, and next state (i.e., $Y_{t+1} = g(S_t, A_t, S_{t+1})$).
- The sequence of states, actions, and proximal responses across decision points, $\{ S_t, A_t, Y_{t+1} \}_{t=1}^T$, makes up each participant's data in a _micro-randomized trial_.
- Here, our goal is to collect data to optimize an intervention component:
  - Q1: Should we include this intervention component in an overall intervention package?
  - Q2: What should the decision rule be in the optimized JITAI?
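The data-collection loop above can be sketched for a single participant. The randomization rule (a lower probability right after a message, to limit burden), the state distribution, and the outcome model below are all hypothetical; the point is the order of operations and that $p_t(1 \mid H_t)$ is computed from the history before $A_t$ is drawn and is stored alongside the data, since analyses of MRT data use these probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10  # number of decision points for one participant


def randomization_prob(history):
    """Hypothetical history-dependent rule p_t(1 | H_t): randomize with a lower
    probability right after a message was sent, to limit burden."""
    if history and history[-1]["action"] == 1:
        return 0.3
    return 0.6


history = []  # H_t accumulates (S_t, p_t, A_t, Y_{t+1}) records
for t in range(1, T + 1):
    state = rng.normal(size=2)            # S_t in R^2 (hypothetical distribution)
    prob = randomization_prob(history)    # p_t(1 | H_t), computed before A_t
    action = int(rng.binomial(1, prob))   # A_t ~ Bern(p_t(1 | H_t))
    # Hypothetical proximal outcome model, for illustration only
    outcome = state[0] + 0.5 * action + rng.normal()
    history.append({"t": t, "state": state, "prob": prob,
                    "action": action, "outcome": outcome})

for record in history[:3]:
    print(record["t"], record["prob"], record["action"])
```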

## Part 2a: A simple MRT example ($n=1$)

- $T = 200$
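- $S_t = (S_{t1}, S_{t2})$, where $S_{t1} \sim N(0, 1)$ is a continuous state and $S_{t2} \sim \text{Bern}(0.7)$ is a binary state
- $A_t \sim \text{Bern}(0.6)$
- Define the proximal outcome as
$$
Y_{t+1} = S_t^\prime \alpha + (A_t - 0.6) S_t^\prime \beta + \epsilon_{t+1}, \quad \epsilon_{t+1} \sim N(0, 1),
$$
where the simulation below uses $\alpha = (1, 0.3)^\prime$ and $\beta = (0.5, -0.7)^\prime$.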

```python
# Simulation example
T = 200  # number of decision points
mrt_prob = 0.6  # time-constant MRT randomization probability

## Generate context (normal and binary states)
def generate_states(T):
  mu, sigma = 0, 1  # mean and standard deviation
  state1 = np.random.normal(mu, sigma, T)  # continuous state S_t1
  state2 = np.random.binomial(n=1, p=0.7, size=T)  # binary state S_t2
  state = np.stack((state1, state2), axis=1)  # complete state at each time (T x 2)
  return state

## Generate actions (MRT with time-fixed probability)
def generate_actions(mrt_prob, T):
  action = np.random.binomial(n=1, p=mrt_prob, size=T)  # binary action A_t
  return action

## Mean proximal outcome: baseline plus centered treatment effect
def proximaloutcome(state, action, mrt_prob=0.6):
  base_reward = 1.0*state[0] + 0.3*state[1]  # S_t' alpha
  advantage = 0.5*state[0] - 0.7*state[1]  # S_t' beta
  return base_reward + advantage * (action - mrt_prob)

## Generate single user MRT data
def generate_user(mrt_prob, T):
  state = generate_states(T)
  action = generate_actions(mrt_prob, T)
  y = np.zeros(T)
  for t in range(T):
    # mean outcome plus a scalar N(0, 1) noise term
    y[t] = proximaloutcome(state[t, :], action[t], mrt_prob) + np.random.normal(0, 1)
  ## Return the (state, action, outcome) triple
  return state, action, y

user_state, user_action, user_outcome = generate_user(mrt_prob, T)
user_data = np.column_stack((user_state, user_action, user_outcome))
print("First 10 entries of state (2D), action, and reward")
print(user_data[:10, :])
print("\n")
```

```
First 10 entries of state (2D), action, and reward
[[-1.56796807  1.          1.         -1.41292395]
 [-0.68038446  1.          0.          1.28851448]
 [ 0.56845368  1.          1.          0.99664277]
 [ 0.54490252  1.          1.          1.92281527]
 [-0.5903505   0.          0.         -0.00705039]
 [-1.15709988  1.          1.         -0.91053801]
 [ 0.93598785  1.          0.          1.10105061]
 [ 0.2721574   1.          1.         -0.43646135]
 [-1.25861393  0.          0.         -0.14104161]]
```




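```python
## Let's fit a model to the user data.

## Build the design matrix: [S_t1, S_t2, S_t1 * A_t, S_t2 * A_t]
X = user_state
for col in range(2):
  temp = np.multiply(user_state[:, col], user_action)
  X = np.column_stack((X, temp))

# Note: A_t enters uncentered here, so the first two fitted coefficients estimate
# alpha - 0.6 * beta = (0.7, 0.72) rather than alpha = (1, 0.3); the last two estimate beta.
reg = LinearRegression().fit(X, user_outcome)
print("True coefficients using linear model")
print(np.array([1, 0.3, 0.5, -0.7]))
print("Fitted coefficients using linear model")
print(reg.coef_)
```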

```
True coefficients using linear model
[ 1.   0.3  0.5 -0.7]
Fitted coefficients using linear model
[ 0.75319931  0.79561062  0.40836539 -0.85470583]
```


## Question 1: Why do we generate the proximal outcome in this way?

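Taking the expectation of the centered treatment given the state (here $p_t = 0.6$), we have

$$
\mathbb{E} \left[ (A_t - p_t) \mid S_t \right] = 0,
$$

so that $\mathbb{E}[Y_{t+1} \mid S_t] = S_t^\prime \alpha$. This means that the first term is the baseline reward (averaging over treatment) and the second term is the treatment effect, since $\mathbb{E}[Y_{t+1} \mid S_t, A_t = 1] - \mathbb{E}[Y_{t+1} \mid S_t, A_t = 0] = S_t^\prime \beta$. Centering effectively decouples the treatment effect model from the baseline model.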
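A quick numerical check of this point, as a sketch: we simulate many decision points from the Part 2a model and fit the same linear regression twice, once with the action centered at $p_t = 0.6$ and once uncentered (the sample size and seed are arbitrary choices). Algebraically, $S_t^\prime \alpha + (A_t - p)S_t^\prime\beta = S_t^\prime(\alpha - p\beta) + A_t S_t^\prime \beta$, so without centering the main-effect coefficients estimate $\alpha - p\beta = (0.7, 0.72)$ rather than $\alpha$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_obs, p = 50_000, 0.6            # large sample so estimates sit close to their targets
alpha = np.array([1.0, 0.3])      # baseline coefficients (as in Part 2a)
beta = np.array([0.5, -0.7])      # treatment-effect coefficients (as in Part 2a)

S = np.column_stack([rng.normal(size=n_obs), rng.binomial(1, 0.7, size=n_obs)])
A = rng.binomial(1, p, size=n_obs)
Y = S @ alpha + (A - p) * (S @ beta) + rng.normal(size=n_obs)


def fit_coefficients(action_column):
    """OLS (with intercept) of Y on [S, S * action_column]."""
    X = np.column_stack([S, S * action_column[:, None]])
    return LinearRegression().fit(X, Y).coef_


coef_centered = fit_coefficients(A - p)              # targets (alpha, beta)
coef_uncentered = fit_coefficients(A.astype(float))  # targets (alpha - p * beta, beta)
print(np.round(coef_centered, 2))
print(np.round(coef_uncentered, 2))
```

The two design matrices span the same column space (together with the intercept), so the fitted values and the interaction coefficients are identical; only the interpretation of the main-effect coefficients changes.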

## Question 2: How can we adapt the traditional RCT estimand to the current setting?

In an RCT, the __average treatment effect__ (ATE) is of interest.  We use potential outcomes to define this.  Let $Y(z)$ denote the potential outcome under treatment $z$.  Then, the ATE is defined as
$$
ATE = \mathbb{E} \left[ Y(1) - Y(0) \right] 
$$
where $Y(1)$ and $Y(0)$ are the potential outcomes for the participant under treatment $(z=1)$ and control $(z=0)$ respectively.  The expectation is with respect to the target population. 

Here, we instead have a sequence of proximal outcomes. How would you define contrasts similar to the ATE in this setting?

We will formally define this below, but thinking in terms of potential outcomes, we see that the proximal outcome $Y_{t+1}$ may depend not only on the current treatment, __but on all prior treatments as well__! So we need to consider contrasts of the form
$$
Y_{t+1}( \bar a_{t-1}, 1 ) - Y_{t+1}( \bar a_{t-1}, 0 ),
$$
where $\bar a_{t-1} = (a_1, \ldots, a_{t-1})$ is a fixed history of prior treatments. The main issue is that, with binary treatments, there are $2^{t-1}$ such contrasts, one per prior treatment history. Since this number grows exponentially in $t$, we instead typically average over the prior treatments $\bar A_{t-1}$ as assigned under the MRT:
$$
\mathbb{E} \left[ Y_{t+1}( \bar A_{t-1}, 1 ) - Y_{t+1}( \bar A_{t-1}, 0 ) \right]
$$
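To see how quickly the number of contrasts grows, a small sketch that enumerates every binary prior-treatment history $\bar a_{t-1}$ for a few decision points $t$ (explicit enumeration is only feasible for small $t$; the helper name is hypothetical):

```python
from itertools import product


def prior_treatment_histories(t):
    """All binary treatment histories (a_1, ..., a_{t-1}) before decision point t."""
    return list(product([0, 1], repeat=t - 1))


counts = {t: len(prior_treatment_histories(t)) for t in [1, 2, 3, 4, 11]}
print(counts)  # {1: 1, 2: 2, 3: 4, 4: 8, 11: 1024}
```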



## Question 3: What goes into choosing the randomization probabilities?

- Why might we not want to use a simple Bernoulli $p=1/2$ coin flip to collect data in every micro-randomized trial?


Some reasons include
  - __Burden__: users may not tolerate receiving many messages per day. Suppose there were 5 decision points per day. In an mHealth study aimed at increasing physical activity, sending too many messages on average may over-burden users. How do we determine an appropriate dosage?
  - __Availability__: sometimes it is not possible, for ethical or feasibility reasons, to provide treatment.
  - __At-risk times__: it may only be useful to provide interventions in certain states. In Sense2Stop, a smoking cessation study, investigators may only want to provide 
  - __Prior data__: 
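The burden consideration can be quantified with simple arithmetic. Assuming a constant randomization probability $p$ and independent randomizations at each of 5 daily decision points (and ignoring availability), the number of messages per day is $\text{Binomial}(5, p)$ with mean $5p$. A minimal sketch using `scipy.stats.binom`; the values of $p$ and the "at least 4 messages" threshold are illustrative choices:

```python
from scipy.stats import binom

decisions_per_day = 5  # decision points per day, as in the burden example
summary = {}
for p in [0.2, 0.4, 0.5]:
    expected_messages = decisions_per_day * p
    # P(at least 4 messages in a day) = P(X >= 4) = P(X > 3) = binom.sf(3, n, p)
    prob_heavy_day = binom.sf(3, decisions_per_day, p)
    summary[p] = (expected_messages, prob_heavy_day)
    print(f"p = {p}: {expected_messages:.1f} messages/day on average, "
          f"P(>= 4 messages) = {prob_heavy_day:.4f}")
```

For instance, with $p = 1/2$ a participant expects 2.5 messages per day and receives at least 4 messages on $6/32 = 0.1875$ of days.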



## Small Group Exercises (15 minutes)

- Discuss questions 1-3.
- Extend the code to generate multiple users.
- Try to generate individuals so that the population average is unchanged, but each individual has their own treatment effect and baseline reward.


```python
### Generate multiple users

### Refit the regression model
```
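One possible sketch for the exercise, assuming user-specific coefficients $\alpha_i = \alpha + u_i$ and $\beta_i = \beta + v_i$ with mean-zero normal deviations, so the population averages stay at the Part 2a values. The standard deviation `sd_user`, the number of users, and the function name are hypothetical choices; the block is self-contained rather than reusing the functions above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_users, T, mrt_prob = 100, 200, 0.6
alpha_pop = np.array([1.0, 0.3])   # population-average baseline coefficients
beta_pop = np.array([0.5, -0.7])   # population-average treatment-effect coefficients
sd_user = 0.2                      # hypothetical between-user SD of each coefficient


def generate_user_random_effects(rng):
    """One user's MRT data with user-specific alpha_i and beta_i (mean-zero deviations)."""
    alpha_i = alpha_pop + rng.normal(0, sd_user, size=2)
    beta_i = beta_pop + rng.normal(0, sd_user, size=2)
    state = np.column_stack([rng.normal(size=T), rng.binomial(1, 0.7, size=T)])
    action = rng.binomial(1, mrt_prob, size=T)
    y = state @ alpha_i + (action - mrt_prob) * (state @ beta_i) + rng.normal(size=T)
    return state, action, y


users = [generate_user_random_effects(rng) for _ in range(n_users)]
state = np.vstack([u[0] for u in users])
action = np.concatenate([u[1] for u in users])
y = np.concatenate([u[2] for u in users])

# Pooled regression with centered action; targets the population-average coefficients
X = np.column_stack([state, state * (action - mrt_prob)[:, None]])
reg = LinearRegression().fit(X, y)
print(np.round(reg.coef_, 2))
```

Because the deviations have mean zero and are independent of the states and actions, the pooled least squares fit targets the population-average coefficients. It ignores within-user correlation, which matters for standard errors but not for this point estimate.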

# Part 3a: Causal Excursion Effects

The fully marginal causal excursion effect is defined as

$$
\beta(t) = \mathbb{E} \left[ Y_{t+1}(\bar A_{t-1}, 1) - Y_{t+1} (\bar A_{t-1}, 0) \right]
$$

This effect is very similar to the ATE (i.e., it averages over the covariate distribution); however, in our current setting, we also average over __prior treatments__. Thus the effect is a single-decision-point _excursion_ from the MRT randomization probabilities. Under the following (standard) causal inference assumptions,

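- **Positivity**: $0 < p_t(1 \mid H_t) < 1$ for every history $H_t$ that can occur, i.e., both actions have positive probability at each decision point.
- **Sequential ignorability**: the potential outcomes are independent of $A_t$ given the observed history $H_t$; in an MRT this holds by design because $A_t$ is randomized with known probability $p_t(A_t \mid H_t)$.
- **Consistency**: the observed outcome equals the potential outcome under the treatments actually received, $Y_{t+1} = Y_{t+1}(\bar A_t)$.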

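the effect can be re-expressed in terms of the observed data as

$$
\beta(t) = \mathbb{E} \left[ \mathbb{E} \left[ Y_{t+1} \mid H_t, A_t = 1 \right] - \mathbb{E} \left[ Y_{t+1} \mid H_t, A_t = 0 \right] \right]
$$

When the randomization probability does not depend on $H_t$ (as in Part 2a, where $p_t = 0.6$), this simplifies to

$$
\beta(t) = \mathbb{E} \left[ Y_{t+1} \mid A_t = 1 \right] - \mathbb{E} \left[ Y_{t+1} \mid A_t = 0 \right]
$$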
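When the randomization probability depends on the state, a simple difference in means generally does not estimate $\beta(t)$, because treated and untreated decision points then have different state distributions. A sketch with a hypothetical state-dependent randomization probability, comparing the difference in means with an inverse probability weighted (IPW) estimate of the fully marginal effect. IPW is a textbook estimator chosen here only to illustrate why weights are needed; the coefficient $\alpha_2 = 2$ is deliberately larger than in Part 2a so the bias is easy to see.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
alpha = np.array([1.0, 2.0])   # hypothetical: baseline depends strongly on S_2
beta = np.array([0.5, -0.7])   # treatment-effect coefficients as in Part 2a

S = np.column_stack([rng.normal(size=n), rng.binomial(1, 0.7, size=n)])
p = np.where(S[:, 1] == 1, 0.8, 0.3)   # hypothetical state-dependent p_t(1 | H_t)
A = rng.binomial(1, p)
Y = S @ alpha + A * (S @ beta) + rng.normal(size=n)

# True marginal effect E[S' beta] = 0.5 * E[S_1] - 0.7 * E[S_2] = 0.5 * 0 - 0.7 * 0.7
true_effect = -0.49
naive = Y[A == 1].mean() - Y[A == 0].mean()
# Inverse probability weighting: E[A Y / p] - E[(1 - A) Y / (1 - p)]
ipw = np.mean(A * Y / p - (1 - A) * Y / (1 - p))
print(f"true {true_effect:.2f}, naive {naive:.2f}, IPW {ipw:.2f}")
```

Here the naive contrast is pulled toward a positive value because treated decision points are mostly those with $S_{t2} = 1$, which have a higher baseline outcome; weighting by the known randomization probabilities removes this imbalance.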


### Moderated effect

Often, we want to understand whether certain time-varying covariates _moderate_ the treatment effect, that is, whether the effect of treatment changes depending on the state the individual is in. To address this, we define the moderated causal excursion effect

$$
\beta(t;s) = \mathbb{E} \left[ Y_{t+1}(\bar A_{t-1}, 1) - Y_{t+1} (\bar A_{t-1}, 0) \mid S_t (\bar A_{t-1}) = s \right]
$$

This effect averages over prior treatments and over any part of the history not included in $S_t$, but conditions on the current state $S_t = s$. Under the same positivity, sequential ignorability, and consistency assumptions, the effect can be re-expressed as

$$
\beta(t;s) = \mathbb{E} \left[ \mathbb{E} \left[ Y_{t+1} \mid H_t, A_t = 1 \right] - \mathbb{E} \left[ Y_{t+1} \mid H_t, A_t = 0 \right] \mid S_t = s \right]
$$

When the randomization probability depends on the history only through $S_t$, this simplifies to

$$
\beta(t;s) = \mathbb{E} \left[ Y_{t+1} \mid S_t = s, A_t = 1 \right] - \mathbb{E} \left[ Y_{t+1} \mid S_t = s, A_t = 0 \right]
$$
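A sketch of the moderated effect with the binary state $S_{t2}$ as the moderator, under a hypothetical randomization probability that depends on the history only through $S_{t2}$. Within each level $s$, the unweighted difference in means estimates $\beta(t; s)$, averaging over the continuous state. The data-generating values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
alpha = np.array([1.0, 2.0])   # hypothetical baseline coefficients
beta = np.array([0.5, -0.7])   # treatment-effect coefficients as in Part 2a

S = np.column_stack([rng.normal(size=n), rng.binomial(1, 0.7, size=n)])
p = np.where(S[:, 1] == 1, 0.8, 0.3)   # p_t depends on the history only through S_2
A = rng.binomial(1, p)
Y = S @ alpha + A * (S @ beta) + rng.normal(size=n)

moderated = {}
for s in (0, 1):
    in_stratum = S[:, 1] == s
    # Unweighted difference in means within the stratum S_2 = s
    moderated[s] = (Y[in_stratum & (A == 1)].mean()
                    - Y[in_stratum & (A == 0)].mean())
    # True value: E[S' beta | S_2 = s] = 0.5 * E[S_1] - 0.7 * s = -0.7 * s
    print(f"s = {s}: estimate {moderated[s]:.2f}, truth {-0.7 * s:.2f}")
```

Within a stratum the randomization probability is constant, so treated and untreated decision points are comparable without weights.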


# Part 3b: Primary analysis method
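As a sketch of the weighted and centered least squares (WCLS) approach, the function below computes a point estimate of the fully marginal excursion effect $\beta(t)$, assumed constant over $t$. It regresses $Y_{t+1}$ on control variables $g(H_t)$ and the centered action $A_t - \tilde p$, with weights $W_t = \tilde p(A_t) / p_t(A_t \mid H_t)$ for a constant $\tilde p$ (where $\tilde p(1) = \tilde p$ and $\tilde p(0) = 1 - \tilde p$). This is a simplified sketch: it returns only the point estimate and omits the participant-clustered sandwich standard errors used in practice; the function name, the choice of $\tilde p$ (the average randomization probability), and the simulated data are assumptions for illustration.

```python
import numpy as np


def wcls_marginal(Y, A, p, G, p_tilde):
    """Point estimate of a fully marginal excursion effect by weighted, centered least squares.

    Y: proximal outcomes; A: binary actions; p: randomization probabilities P(A_t = 1 | H_t);
    G: control design matrix g(H_t), including an intercept column;
    p_tilde: constant numerator probability in (0, 1).
    """
    # Weights W_t = p_tilde(A_t) / p_t(A_t | H_t)
    W = np.where(A == 1, p_tilde / p, (1 - p_tilde) / (1 - p))
    # Design: control variables plus the action centered at p_tilde
    X = np.column_stack([G, A - p_tilde])
    sqrt_w = np.sqrt(W)
    coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], Y * sqrt_w, rcond=None)
    return coef[-1]  # coefficient on (A_t - p_tilde)


# Illustrative data: state-dependent randomization, true marginal effect -0.49
rng = np.random.default_rng(5)
n = 100_000
S = np.column_stack([rng.normal(size=n), rng.binomial(1, 0.7, size=n)])
p = np.where(S[:, 1] == 1, 0.8, 0.3)
A = rng.binomial(1, p)
Y = S @ np.array([1.0, 2.0]) + A * (S @ np.array([0.5, -0.7])) + rng.normal(size=n)

G = np.column_stack([np.ones(n), S])
beta_hat = wcls_marginal(Y, A, p, G, p_tilde=p.mean())
print(f"WCLS estimate of the marginal effect: {beta_hat:.2f}")
```

The weights make $A_t$ behave as if it were randomized with constant probability $\tilde p$, and centering at $\tilde p$ makes the action column orthogonal to any function of the history, so the coefficient on $A_t - \tilde p$ targets $\beta(t)$ even if the control model $g(H_t)^\prime \alpha$ is not correct.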



# Part 3c: Simple simulation practice

__Group exercise__ (15 minutes): 

- Use the simulated MRT data to estimate the time-varying treatment effect.
- Plot the effect as a function of day in study.
- Extend the basic simulation so that $A_t$ depends on the state.
- Show that weights are necessary to estimate the marginal causal excursion effect.
- Show that weights are not necessary to estimate the causal excursion effect conditional on current state.
- What do the results tell us about the intervention component? 
  - If this was a real s

```python
## Fitting the WCLS
```

In the next section, we will start working with synthetic data built 