---
title: "Unit 2: Design of Experiments"
subtitle: "Optimal Experimental Design"
author: "Sean Sylvia, Ph.D."
date: February 18, 2025
format:
  html:
    toc: true
    toc-depth: 2
execute:
  echo: true
  warning: false
  message: false
draft: true
---


## Introduction

Welcome, dear students, to another joyous adventure in the realm of experimental design! In our previous sessions, we tackled the fundamentals of **statistical conclusion validity** and dipped our toes into **power calculations** and **simulation**. We even brushed up against the perils of **clustering**, which appears A LOT in health services research since this is how health care is organized -- think hospitals, clinics, communities, social networks, etc.

Now, we turn the tables. Instead of passively accepting whatever nature (or a large administrative database) throws at us, we’re in control. Unlike our secondary data analysis friends, we get to *design* our experiments to make the most of the resources we have. Specifically, we're interested in maximizing our ability to learn (i.e., statistical power) subject to our constraints (e.g., budgetary or logistical).

Of course, nothing is free; this control is both a blessing and a curse. Tradeoffs abound as usual. We need a framework for thinking about how to optimally weigh costs and benefits. If only there were an entire discipline devoted to this... OH, WAIT! (Yes, my friends, you are all economists now. You’re welcome.) As an experimentalist, you can optimize your design choices in ways our secondary-data-using colleagues can only dream of (or envy, or curse, depending on their temperament). So let’s dig in, shall we?

::: {.callout-note}
**Note:** Suboptimal design choices won’t necessarily ruin your study’s internal validity, but they will keep you from making the *best* inference possible. And that might cost you that sweet, sweet grant renewal next year.
:::

---

## 1. Optimal Experimental Design: Insights from Econ 101

### 1.1 Our Objective

Thinking back to Econ 101,^[If you were daydreaming in Econ 101, fear not. We’ll keep things simple.] recall that we can pose an optimization problem as maximizing (or minimizing) an objective function subject to constraints. In our case, we’ll use this to set up our experimental design problem, i.e.,

> **Objective function:** Statistical power,  
> **Subject to:** Budget constraints.

In other words, we want to choose our design to maximize power subject to our budget (or other) constraints. It turns out that there are loads of things in our control; usually the only things that aren't are feasibility and the budget we have to work with.

**Concept Map**



```{mermaid}
%%| label: concept-map
%%| code-fold: true
%%| code-summary: "Expand Code"

flowchart LR
    A(("To calculate optimal sample sizes, consider:")):::redBubble
    A --> B["Desired significance level"]:::greenBubble
    A --> C["Desired statistical power"]:::greenBubble
    A --> D["Minimum detectable effect size"]:::greenBubble
    A --> E["Experimental budget and treatment costs"]:::greenBubble
    A --> F["How the data will be analyzed"]:::greenBubble
    A --> G["Available pre-treatment covariates"]:::greenBubble
    
    F --> H["Unit of assignment"]:::greenOval
    F --> I["Number of distinct outcomes of interest"]:::greenOval
    
    H --> J(["Design with clustered assignment"]):::blueBlock
    I --> K(["Design for multiple hypothesis testing"]):::blueBlock

    classDef redBubble fill:#fbbbbb,stroke:#900,stroke-width:1px,color:#000,margin:8px
    classDef greenBubble fill:#c7ecc7,stroke:#080,stroke-width:1px,color:#000,margin:8px
    classDef greenOval fill:#c7ecc7,stroke:#080,stroke-width:2px,stroke-dasharray:3,margin:8px
    classDef blueBlock fill:#c1e1f9,stroke:#048,stroke-width:1px,color:#000,margin:8px
```



### 1.2 A Simple Setup

To make this concrete, imagine you’ve received a research grant (yay!) or have a wealthy aunt who’s willing to bankroll your next foray into experimental design. You have two arms in your study:

1. A **control** group (no intervention).  
2. A **treatment** group (some fancy new health intervention).

And you have a single, continuous outcome measure, say:  
$$
\text{Health and Happiness Index}
$$

Now, your big question: **How many participants do you need?** How do you split that precious sample between treatment and control?  

### 1.3 The Big Three Elements

When computing your required sample size (or deciding the “optimal” split), there are three main ingredients:

1. **Significance level** ($\alpha$): The probability of a false positive (rejecting the null when it’s actually true).  
2. **Minimum Detectable Effect (MDE)**: The smallest true effect size you want to be able to detect with high probability.  
3. **Power** ($1 - \beta$): The probability of detecting a true effect (i.e., rejecting the null when it’s false).

> **Important:** The MDE is *not* the effect you *expect* to see. It’s the smallest effect you *care* to rule in or rule out. People often mix these up, leading to underpowered studies, heartbreak, and wasted coffee budgets.
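
To see how the three ingredients interact, here is a minimal sketch of the textbook sample-size formula for a two-arm comparison of means, $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2/\text{MDE}^2$ per arm. It rests on assumptions not stated above: equal allocation, a common outcome standard deviation `sigma`, a two-sided test, and a normal approximation. The function name `n_per_arm` and the example values are purely illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(mde, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided, two-sample z-test.

    Assumes equal allocation, a common outcome SD `sigma` in both arms,
    and a normal approximation (illustrative assumptions).
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / mde ** 2)


# The MDE here is in SD units because sigma = 1.
for mde in (0.5, 0.2, 0.1):
    print(f"MDE = {mde:.1f} SD -> about {n_per_arm(mde, sigma=1.0)} per arm")
```

Because the MDE enters squared in the denominator, halving the MDE roughly quadruples the required sample. That is why confusing the effect you *expect* with the effect you *care about* can be so costly.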

---

## 2. The Variance of the Average Treatment Effect (ATE)

Before we get into fancy cost functions, we need the formula for the variance of the ATE estimator. Let’s do a quick recap:

$$
\widehat{\text{ATE}} = \overline{Y}_1 - \overline{Y}_0,
$$

where $\overline{Y}_1$ and $\overline{Y}_0$ are sample means in the treatment and control groups, respectively. The variance depends on:

- The underlying outcome variance(s).  
- The total number of units $N$.  
- The distribution of treatment indicators $D$.  

If we let $p$ be the fraction of participants assigned to treatment, then $\mathrm{Var}(D) = p(1 - p)$. The simpler the design, the easier the formula.
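
As a rough illustration for the simplest case, suppose both arms share a common outcome variance $\sigma^2$ (an assumption not made explicit above). The difference in means then has variance $\sigma^2/(Np) + \sigma^2/(N(1-p)) = \sigma^2/\big(N\,p(1-p)\big)$, so $\mathrm{Var}(D)$ sits in the denominator. The sketch below compares that closed form with a small Monte Carlo check. The function names, the normally distributed outcomes, and the true effect of zero are all illustrative choices.

```python
import random
from statistics import fmean, pvariance


def ate_variance(n_total, p, sigma=1.0):
    """Closed-form variance of the difference in means, common SD sigma."""
    return sigma ** 2 / (n_total * p * (1 - p))


def simulated_ate_variance(n_total, p, sigma=1.0, reps=2000, seed=1):
    """Monte Carlo check: variance of the estimate across simulated samples."""
    rng = random.Random(seed)
    n_treat = round(n_total * p)   # units assigned to treatment
    n_ctrl = n_total - n_treat     # remaining units go to control
    estimates = []
    for _ in range(reps):
        # True effect is zero here; only the spread of the estimator matters.
        treated = [rng.gauss(0.0, sigma) for _ in range(n_treat)]
        control = [rng.gauss(0.0, sigma) for _ in range(n_ctrl)]
        estimates.append(fmean(treated) - fmean(control))
    return pvariance(estimates)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}: formula = {ate_variance(200, p):.4f}, "
          f"simulated = {simulated_ate_variance(200, p):.4f}")
```

Under these assumptions the variance is smallest at $p = 0.5$, because that is where $p(1-p)$ peaks. An even split is therefore the natural benchmark when both arms have the same outcome variance.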
