<a href="https://colab.research.google.com/github/wdempsey/AI4Health-Online-Experimentation/blob/main/part1_mrt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Online learning and experimentation algorithms in mobile health

In part 1, we will discuss the _micro-randomized trial design_ and the corresponding primary data analysis methods.  By the end of this section, you should be able to answer the following set of questions:
- What is a just-in-time adaptive intervention (JITAI)? 
- What is a micro-randomized trial?
- What is causal excursion effect?  How does one estimate this effect?
- What are the  goals of early-stage optimization trials?

In addition, we will introduce the _HeartSteps simulator_.  This is a set of functions which allows us to generate synthetic users in a mock MRT.  We will use this simulator in additional sections as well. 

In [1]:
## Import necessary 
import numpy as np
import scipy as sp
from sklearn.linear_model import LinearRegression

# Part 1: Just-in-time adaptive interventions

A JITAI is ``is an intervention design aiming to provide the right type/amount of support, at the right time, by adapting to an individual's changing internal and contextual state'' (Nahum-Shani, 2018).  

- The term “just-in-time support” is used to describe an attempt to provide the right type (or amount) of support, at the right time.
- Timing is largely event-based, e.g., "a moment of high vulnerability and high receptivity".  
  - Ex. For a person attempting to quit smoking, a moment of high stress may lead to high likelihood of relapse.  If the person is currently available (e.g., no meeting on Google Calendar) and not currently active (e.g., not out for a walk), then the person may be receptive to a brief prompt aimed at reducing proximal stress.
- __Decision Points__: A decision point is a time at which an intervention decision is made. 
- __Intervention Options__: An array of possible treatments or actions that might be employed at any given decision point.
  - Ex. Options are ``Send Message`` or ``Do Nothing``
- __Tailoring Variable__: A tailoring variable is information concerning the individual that is used to decide when (i.e., under what conditions) to provide an intervention and which intervention to provide. 
  - Ex. An individual's distance from a high-risk location (A-CHESS)
- __Outcome__:
  - __Distal outcome__: Ultimate goal the intervention is intended to achieve; it is usually a primary clinical outcome, such as weight loss, drug/alcohol use reduction or increase in average activity level.  
  - __Proximal outcome__: Short-term goals the intervention options are intended to achieve.  Typically thought to be on the causal pathway (i.e., a mediator).
- __Decision rules__:  Operationalize the adaptation by specifying which intervention option to offer, for whom, and when. 
  - Ex. __``If``__ ``At High Risk Location``, __``Then``__ ``IO = Send Message``, __``Else``__ ``IO = Do Nothing``.

A JITAI is an _intervention design_.  Behavioral scientists often have questions in how to best design a JITAI for a particular behavioral health setting.  Consider an mHealth smoking cessation setting.  Scientists may wish to intervene by either sending a reminder to practice mindfulness (hopefully reducing proximal stress) or not; however, it is unknown whether sending the message when the individual is currently stressed (high vulnerability but low receptivity) is better than when the individual is current not stressed (low vulnerability but high receptivity).  

- __Group Task 1__: Construct a JITAI to be included in a smoking cessation mHealth intervention package based on the above.  Be sure to highlight the 5 key elements.   


#Part 2: Micro-randomized trials (MRTs)

MRTs are an experimental design to collect data to answer questions about the construction of JITAIs. 

- For each person in a study, let $t=1,\ldots, T$ denote a sequence of decision points.  
- At each decision time $t$,  we observe a state variable $S_t \in \mathbb{R}^p$.  
- After observing the state variable $S_t$, the _clinical trialist_ decides to take action $A_t \in \mathcal{A}$ with probability $p_t (A_t \mid H_t)$ (i.e., the randomization probability may depend on the observed history $H_t$).  
- After observing state $S_t$ and taking action $A_t$, the agent observes the proximal response $Y_{t+1}$.  The proximal response is a deterministic function of state, action, and next state (i.e., $Y_{t+1} = g(S_t, A_t, S_{t+1})$)
- The sequence of state, action, and reward at a sequence of decision points defines a _micro-randomized trial_, $\{ S_t, A_t, Y_{t+1} \}_{t=1}^T$.
- Here, our goal is to collect data to optimize an intervention component
  - Q1: Should we include this intervention component in an overall intervention package?
  - Q2: What should the decision rule be in the optimized JITAI?

In [None]:
# Simulation example
T = 200 # number of steps

## Generate context (normal and binary states)
mu, sigma = 0, 1 # mean and standard deviation
state1 = np.random.normal(mu, sigma, T) # Continuous state
state2 = np.random.binomial(n=1, p = 0.7,size=T) # Binary state
state = np.stack((state1,state2), axis = 1) # Compelte State at each time

## Generate actions (MRT with probability  )
action = np.random.binomial(n=1, p = 0.6,size=T) # Binary state

## Generate true reward
def proximaloutcome(state, action):
  base_reward = state[0] + 0.3*state[1] 
  advantage = 0.5*state[0] - 0.7*state[1]
  return base_reward + advantage * (action - 0.6)

y = np.repeat(0.,T)
for t in range(T):
  y[t] = reward(state[t,:], action[t]) + np.random.normal(0, 1, 1)


## Triple
triple = np.column_stack((state,action, y))
print("First 10 entries of state (2D), action, and reward")
print(triple[1:10,:])
print("\n")

## Build the design matrix
X = state
for col in range(2):
  temp = np.multiply(state[:,col],action)
  X = np.column_stack((X, temp))

reg = LinearRegression().fit(X,y)
print("True coefficients using linear model")
print(np.array([1,0.3,0.5,-0.7]))
print("Fitted coefficients using linear model")
print(reg.coef_)





First 10 entries of state (2D), action, and reward
[[ 0.4889732   0.          0.          0.0970272 ]
 [-1.65526392  1.          0.         -2.15776888]
 [ 0.88552154  0.          0.          1.29955221]
 [-1.21393632  1.          0.          0.97508805]
 [-1.79251348  1.          0.         -1.62145004]
 [ 1.42910115  0.          1.          1.68040017]
 [-2.05180643  1.          0.         -3.29626183]
 [ 0.2045409   1.          0.          1.70142421]
 [ 2.92629192  1.          0.          1.87006771]]


True coefficients using linear model
[ 1.   0.3  0.5 -0.7]
Fitted coefficients using linear model
[ 1.08134331  0.49970343  0.50450428 -0.64743794]


## Question 1: How can we adapt the traditional RCT estimand to the current setting?

In an RCT, the __average treatment effect__ (ATE) is of interest.  This is defined as
$$
\mathbb{E} \left[ Y(1) - Y(0) \right] 
$$
where $Y(1)$ and $Y(0)$ are the potential outcomes for the participant under treatment $(z=1)$ and control $(z=0)$ respectively.  The expectation is with respect to the population.  

- The only thing that impacts decision to choose $a = 1$ or $a=0$ is the _advantage function_:
$$
A(s) = r(s,1) - r(s,0)
$$
- In the example above
$$
A(s) = 0.5 s_{0} - 0.7 s_{1} > 0 \Rightarrow \frac{0.5}{0.7} s_0 > s_1
$$




## Question 2: What goes into choosing the randomization probabilities?

- Why may we not want to use a simple Bernoulli $p=1/2$ coin flip to collect data in all micro-randomized trials?


Some reasons include
  - __Burden__: users may not tolerate receiving many messages per day.  Suppose there were 5 decision points per day.  In an mHealth study aimed at increasing physical activity, too many messages sent on average may over-burden users?  How do we find out this dosage?
  - __Availability__:  sometimes it may not be possible due to ethical or feasibility issues to provide treatment.  
  - __At-risk times__: it may only be useful to provide interventions in certain states.  In Sense2Stop, a smoking cessation may only want to provide 
  - __Prior data__: 



# Part 2b: Running a synthetic MRT

The ``HeartSteps simulator`` is built upon HeartSteps V2, decision points are 6 times per day.  
Variables include
- __ID (string)__ is a unique identifier for the decision
Heartsteps ID (string) is the ID for the participant in the study
Test (boolean) is if the walking suggestion was a test explicitly sent to the participant, and shouldn’t be included in analysis
- __Decision time (datetime string YYYY-MM-DD HH:MM:SS)__ is the time the decision to treat or not treat the participant was made. This time might be slightly different than the time the participant was sent or received the walking suggestion push notification. The time of day is reported in a 24 hour format and is localized to the participant’s timezone at the decision time.
Sedentary (boolean) if the participant is sedentary, it means HeartSteps had recorded less than 250 steps in the last 60 minutes.
- __Treated (boolean)__ indicates that the participant was randomized to receive a walking suggestion at the time of the walking suggestion decision. This value is typically generated by the walking-suggestion-service.
- __Treatment Probability (decimal between 0 and 1)__ this is the probability that the participant would be sent a walking suggestion. 1 means the participant will be sent a walking suggestion, 0 means the participant will not be sent a walking suggestion. If Available is false, then the treatment probability should be zero. This value is typically generated by the walking-suggestion-service.
- __Notification Title (string)__ is the title of the push notification sent to the participant. If this field is missing, then a walking suggestion wasn’t sent to the participant.
- __Notification Message (string)__ is the message that was sent to the participant. If this field is missing, then a walking suggestion wasn’t sent.
Sent Time (datetime string YYYY-MM-DD HH:MM:SS) is the time that the walking suggestion decision was sent to the participant, which can be slightly different than the time the participant received the message. Hours are reported using a 24 hour clock, the timezone for this message is the participant’s current timezone. If this value is missing, it means the participant was randomized to not receive a message, or the decision was imputed (see below).
- __Engaged Time (datetime string YYYY-MM-DD HH:MM:SS)__ is when the participant clicked “Ok” at the bottom of the walking suggestion in the heartsteps-app. Hours are reported using a 24 hour clock, the timezone for this message is the participant’s current timezone. This value will be missing if the participant opened the push notification, but closed the heartsteps-app before clicking “Ok” -- this value could also be missing for the same reasons as sent time.
- __Location (string)__ is the participant’s location represented as a category at the walking suggestion decision time. The category is determined by comparing the participant’s last GPS location reported by the heartsteps-clock-face, to the list of places the participant entered during the onboarding process -- if the participant is within 500 meters of a defined place, they are determined to be at that place. Possible values are “home” “work” or “other”. It’s possible for this field to be empty, which indicates that we haven’t received a location record from the participant within the last 60 minutes.
- __Temperature (integer)__ the temperature in fahrenheit for the participant’s location at the walking suggestion decision time as reported by the DarkSky API. If there is no reported location for the walking suggestion decision, then the temperature reported is the average temperature for each place the participant defined in the onboarding process.
- __Precipitation Type (string)__ is the type of precipitation reported by the DarkSky API for the participant’s current location. Possible values are “None” “Rain” and “Snow”. If the participant’s location is missing at the walking suggestion time, this value represents the most extreme value from each place the participant defined during onboarding (eg Snow is more extreme than Rain, which is more extreme than None).
- __Precipitation Probability (decimal between 0 and 1)__ is the probability of precipitation reported by the DarkSky API for the participant’s current location. If the participant’s location is missing, then this value is the average precipitation probability at each place that a participant defined during onboarding.



In [None]:
## Reading in the data


## ADD CODE FOR EXPLORATORY DATA ANALYSIS HERE


# Part 3a: Causal Excursion

# Part 3a: Primary analysis method

# Part 3b: 