# The Unappealingly Titled Project H2O

- R version (see folder for Python code)
- work in progress

## Objective

We are primarily interested in participants' ability to discriminate, on average, between tap water and potted water of a given brand. Secondly, we are interested in assessing individuals' ability to identify the variety of the sample conditional on their confidence in their own prediction.

## Methods

The sampling process will span up to two months owing to constraints. The aim is to collect the largest sample attainable in up to five rounds per participant. See Sample collection below.

The analysis will consist of four parts. We shall assess (i) the collective ability in each round using the relative risk and odds ratios based on (a) frequentist and (b) Bayesian estimates, (ii) the collective ability across rounds using generalised estimating equations, and (iii) the individual ability across rounds using a modified Brier score.

### Frequentist estimates

The frequentist estimates are simple proportions of counts $n_{k1}$ in the marginal group counts $n_k$.

|           | success (guessed potted water) $(g = 1)$ | failure (guessed tap water) $(g = 2)$ | sample margin |
| --------- | ------- | ------- | ----- |
| **treatment (sampled potted water)** $(s = 1)$ | $n_{11}$     | $n_{12}$     | $n_{s_p}=n_{11}+n_{12}$ |
| **control (sampled tap water)** $(s = 2)$ | $n_{21}$     | $n_{22}$     | $n_{s_t}=n_{21}+n_{22}$ |
| **guess margin** | $n_{g_p}=n_{11}+n_{21}$   | $n_{g_t}=n_{12}+n_{22}$   | $N=n_{11}+n_{12}+n_{21}+n_{22}$   |

Then the estimators $\hat{p}_1 = \frac{1}{n_{s_p}} \sum_{i=1}^{n_{s_p}} 1_{\{g_i = 1\}} = \frac{n_{11}}{n_{s_p}}$ and $\hat{p}_2 = \frac{1}{n_{s_t}} \sum_{i=1}^{n_{s_t}} 1_{\{g_i = 1\}} = \frac{n_{21}}{n_{s_t}}$ maximise the Bernoulli likelihood $\mathcal{L}(p_k;y_k)=\prod_{i=1}^{n_k}p_k^{y_{ki}}(1-p_k)^{1-y_{ki}}=p_k^{n_{k1}}(1-p_k)^{n_k-n_{k1}}$, where $n_{k1}$ is the count of successes in $n_{k}$ trials in sample group $k \in \{1,2\}=\{s_p,s_t\}$ and the individual outcomes are $y_{ki}\in \{0,1\}=\{t,p\}$.
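The closed form can be checked numerically. The sketch below maximises the Bernoulli log-likelihood with `optimize()` and compares the result with the sample proportion. The counts are illustrative assumptions, not study data.

```r
# Illustrative counts (assumed for this sketch): 11 successes in 18 trials
n_k1 <- 11
n_k <- 18

# Bernoulli log-likelihood in p for n_k1 successes out of n_k trials
log_lik <- function(p) n_k1 * log(p) + (n_k - n_k1) * log(1 - p)

# Numerical maximiser on the open interval (0, 1)
p_numeric <- optimize(
    log_lik,
    interval = c(1e-6, 1 - 1e-6),
    maximum = TRUE,
    tol = 1e-10
)$maximum

# Closed-form maximum likelihood estimate
p_closed <- n_k1 / n_k

# The two should agree up to numerical tolerance
print(c(numeric = p_numeric, closed_form = p_closed))
```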

#### Risk and odds ratios

| measure | estimator | null |
|---------|-----------|------|
| risk ratio | $\rho=\frac{p_1}{p_2}$ | $H_0:\rho=\rho_0=1$ |
| odds ratio | $\theta=\frac{p_1/(1-p_1)}{p_2/(1-p_2)}$ | $H_0:\theta=\theta_0=1$ |

##### Risk ratio confidence intervals

$$
\hat \rho = \frac{\hat p_1}{\hat p_2}, \qquad
\log \hat \rho = \log(\hat p_1) - \log(\hat p_2).
$$

$$
\widehat{\text{SE}}(\log \hat \rho)
= \sqrt{\frac{1 - \hat p_1}{\hat p_1 n_{s_p}} + \frac{1 - \hat p_2}{\hat p_2 n_{s_t}}}=\sqrt{\frac{1}{n_{11}} - \frac{1}{n_{11}+n_{12}} + \frac{1}{n_{21}} - \frac{1}{n_{21}+n_{22}}}
$$

$$
\rho \in \exp{\left( \ln{\hat{\rho}}\pm Z_{1-\alpha/2}\,\widehat{\text{SE}}(\ln{\hat{\rho}}) \right)}
$$
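A minimal sketch of the log-scale Wald interval for the risk ratio follows. The helper name `rr_wald_ci` and the counts are assumptions made for illustration; the cell labels follow the contingency table above (row 1 is treatment, column 1 is success).

```r
# Wald confidence interval for the risk ratio, built on the log scale
# n11, n12: successes and failures in the treatment row
# n21, n22: successes and failures in the control row
rr_wald_ci <- function(n11, n12, n21, n22, alpha = 0.05) {
    p1 <- n11 / (n11 + n12)
    p2 <- n21 / (n21 + n22)
    rho_hat <- p1 / p2
    # Standard error of log(rho_hat), count form
    se_log <- sqrt(1 / n11 - 1 / (n11 + n12) + 1 / n21 - 1 / (n21 + n22))
    z <- qnorm(1 - alpha / 2)
    c(
        estimate = rho_hat,
        se_log = se_log,
        lower = exp(log(rho_hat) - z * se_log),
        upper = exp(log(rho_hat) + z * se_log)
    )
}

# Hypothetical counts, for illustration only
rr_ci <- rr_wald_ci(n11 = 11, n12 = 7, n21 = 6, n22 = 6)
print(round(rr_ci, 4))
```

The interval is symmetric around $\ln\hat\rho$ on the log scale, so the product of the two bounds equals $\hat\rho^2$.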

##### Odds ratio confidence intervals

$$
\hat \theta = \frac{n_{11} n_{22}}{n_{21} n_{12}}, \quad
\log \hat \theta = \log n_{11} + \log n_{22} - \log n_{12} - \log n_{21},
$$

$$
\widehat{\text{SE}}(\ln{\hat{\theta}})
= \sqrt{\frac{1}{n_{11}}+\frac{1}{n_{12}}+\frac{1}{n_{21}}+\frac{1}{n_{22}}}.
$$

Note that we commonly add $0.5$ to each $n_{kj}$ to avoid division by zero.

$$
\theta \in \exp{\left( \ln{\hat \theta} \pm Z_{1-\alpha/2}\ \widehat{\mathrm{SE}}(\ln{\hat{\theta}}) \right)}
$$
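The odds ratio interval can be sketched in the same way. The helper `or_wald_ci` and its `correction` argument are hypothetical names for this illustration; `correction = 0.5` applies the continuity adjustment mentioned above. The first call uses the round 1 mock counts printed later in this notebook (6, 6, 7, 11) and should give an odds ratio of about 1.57.

```r
# Wald confidence interval for the odds ratio, built on the log scale
# correction: constant added to every cell (0.5 avoids division by zero)
or_wald_ci <- function(n11, n12, n21, n22, alpha = 0.05, correction = 0) {
    cells <- c(n11, n12, n21, n22) + correction
    theta_hat <- (cells[1] * cells[4]) / (cells[2] * cells[3])
    se_log <- sqrt(sum(1 / cells))
    z <- qnorm(1 - alpha / 2)
    c(
        estimate = theta_hat,
        lower = exp(log(theta_hat) - z * se_log),
        upper = exp(log(theta_hat) + z * se_log)
    )
}

# Round 1 mock counts from the notebook output
or_ci <- or_wald_ci(6, 6, 7, 11)
print(round(or_ci, 4))

# Hypothetical table with an empty cell: the correction keeps the result finite
or_ci_zero <- or_wald_ci(0, 5, 3, 4, correction = 0.5)
print(round(or_ci_zero, 4))
```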

### Bayesian estimates

We shall use a standard beta-binomial setup with conjugate priors. This is generally summarised with the continuous form of the Bayes rule like so,

$$
f(\theta \mid y) = \frac{g(y \mid \theta) f(\theta)}{\int g(y \mid \theta) f(\theta) \, d\theta}
$$

We set up the model like so, 

- $y_k \mid p_k \sim \text{binomial}(n_k, p_k), \text{for } k=1,2$ (likelihood)
- $p_k \sim \text{Beta}(\alpha, \beta)$, choose non-informative $p_k \sim \text{Beta}(\frac{1}{2}, \frac{1}{2})$ (Jeffreys prior) 
- $p_k \mid y_k \sim \text{Beta}(y_k+\alpha,n_k-y_k+\beta) = \text{Beta}(\frac{1}{2}+y_k,\frac{1}{2}+n_k-y_k)$ (implied posterior)

Hence the posterior,

$$
f(p_k \mid y_k) \propto \mathcal{L}(p_k;y_k) f(p_k) = \left(p_k^{y_k}(1-p_k)^{n_k-y_k}\right)\left(p_k^{\frac{1}{2}-1}(1-p_k)^{\frac{1}{2}-1}\right) = p_k^{y_k-\frac{1}{2}}(1-p_k)^{n_k-y_k-\frac{1}{2}}
$$

We shall run 16e4 sampling iterations with an additional 4e4 burn-in (warm-up) iterations in six chains, and we are interested in the posterior median and the 95% credible interval. We use the posterior draws to compute the risk and odds ratios and their respective credible intervals.
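Because the Beta prior is conjugate, the posterior of each $p_k$ is also available in closed form, which offers a cheap cross-check on the sampler output. The sketch below is not the Stan model; it draws directly from the implied Beta posteriors with base R. The counts `y` and `n` are hypothetical.

```r
set.seed(42)

# Jeffreys prior parameters
a0 <- 0.5
b0 <- 0.5

# Hypothetical successes and trials for groups k = 1, 2
y <- c(11, 6)
n <- c(18, 12)

# Conjugate update: Beta(a0 + y, b0 + n - y)
post_a <- a0 + y
post_b <- b0 + n - y

# Closed-form posterior medians and 95% credible intervals for p_1 and p_2
p_median <- qbeta(0.5, post_a, post_b)
p_ci <- rbind(
    lower = qbeta(0.025, post_a, post_b),
    upper = qbeta(0.975, post_a, post_b)
)

# Direct Monte Carlo draws for the derived ratios
n_draws <- 1e5
p1 <- rbeta(n_draws, post_a[1], post_b[1])
p2 <- rbeta(n_draws, post_a[2], post_b[2])

rr_q <- quantile(p1 / p2, probs = c(0.025, 0.5, 0.975))
or_q <- quantile((p1 / (1 - p1)) / (p2 / (1 - p2)), probs = c(0.025, 0.5, 0.975))

print(round(rbind(risk_ratio = rr_q, odds_ratio = or_q), 3))
```

With the same data, the medians and intervals from the MCMC fit should land close to these values, up to Monte Carlo error.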

### Generalised estimating equations

Generalised estimating equations approximate the (marginal) population-average effect for our clustered sample of $i=1,...,M$ individuals across $j=1,...,5$ rounds. The GEE algorithm internally adjusts for correlated data.

The results are interpreted as a binomial logit model like so,

$$
\text{logit}\left(\pi_{ij}\right)=\ln{\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right)}=\beta_0+\beta_1 x_{ij}
$$

Where

- $\pi_{ij}=P(Y_{ij}=1)=\mathbb{E}\left[Y_{ij} \mid X_{ij} \right]=\text{logit}^{-1}(X_{ij}'\beta)$ is the probability that participant $i$ predicts potted water ($g=1$) in round $j$
- $\frac{\pi_{ij}}{1-\pi_{ij}}$ is the odds of predicting potted water ($g=1$)
- $\beta_0$ is the log-odds when $x_{ij} = 0$ (intercept)
- $\beta_1$ is the effect of $x_{ij} = 1$ on the log-odds

GEE solve the following estimating equation for $\beta$,

$$
U(\beta)=\sum_{i=1}^M D_i'V_i^{-1}(y_i-\mu_i)=0
$$

Where $\mu_i=\pi_i$ is the vector of fitted probabilities for participant $i$, $D_i=\frac{\partial \mu_i}{\partial \beta'}=A_i X_i$ with $A_i = \text{diag}\left(v(\mu_{ij})\right)$, and the working covariance is

$$
V_i = \phi A_i^{1/2} R(\alpha) A_i^{1/2},\quad
A_i=\text{diag}\,\left(\pi_{ij}(1-\pi_{ij})\right)
$$

Where $R(\alpha)=\left[\begin{matrix} 1 & \alpha & \alpha & \dots & \alpha \\ \alpha & 1 & \alpha & \dots & \alpha \\
\vdots & \vdots & \vdots & \ddots & \vdots \\ \alpha & \alpha & \alpha & \dots & 1\end{matrix}\right]$ is the exchangeable working correlation structure. No closed form exists. The algorithm solves the system iteratively.
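To make the working covariance concrete, the sketch below builds $R(\alpha)$ and $V_i$ for one participant with five rounds. The values of `alpha`, `phi` and `pi_i` are hypothetical; in practice the fitting routine estimates them.

```r
# Hypothetical inputs for a single participant i
J <- 5                 # Rounds per participant
alpha <- 0.2           # Common within-participant correlation
phi <- 1               # Scale parameter
pi_i <- rep(0.5, J)    # Fitted probabilities pi_ij

# Exchangeable working correlation: 1 on the diagonal, alpha elsewhere
R <- matrix(alpha, nrow = J, ncol = J)
diag(R) <- 1

# A_i^{1/2}: square roots of the binomial variances pi_ij * (1 - pi_ij)
A_sqrt <- diag(sqrt(pi_i * (1 - pi_i)))

# Working covariance V_i = phi * A_i^{1/2} R(alpha) A_i^{1/2}
V_i <- phi * A_sqrt %*% R %*% A_sqrt

print(V_i)
```

With these inputs every diagonal entry is $0.25$ and every off-diagonal entry is $0.25 \times 0.2 = 0.05$.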

#### Retrieve probabilities

$$
\pi_0=\text{logit}^{-1}(\beta_0)=\frac{1}{1+e^{-\beta_0}}=1-\frac{1}{1+e^{\beta_0}}=\text{plogis}(\beta_0)
$$

$$
\pi_1=\text{logit}^{-1}(\beta_0 + \beta_1)=\frac{1}{1+e^{-(\beta_0+\beta_1)}}=1-\frac{1}{1+e^{\beta_0+\beta_1}}=\text{plogis}(\beta_0 + \beta_1)
$$

#### Relative risk

$$
\hat{\rho}=\frac{\hat{\pi}_1}{\hat{\pi}_0}
$$

$$
\ln{(\hat{\rho})}=\ln{\left(\frac{\pi_1(\beta)}{\pi_0(\beta)}\right)}=\ln{\left(\frac{\text{logit}^{-1}(\beta_0+\beta_1)}{\text{logit}^{-1}(\beta_0)}\right)}
$$

With confidence intervals,

$$
\rho \in \exp{\left(\ln{\hat{\rho}}\pm Z_{1-\alpha/2}\,\widehat{\text{SE}}(\ln{\hat{\rho}})\right)}, \quad
\text{Var}\left(\ln{\hat{\rho}}\right)\approx g'\widehat{\text{Var}}_{\text{gp}}(\hat{\beta})g, \quad
g=\left[\begin{matrix}\frac{\partial\ln{\hat{\rho}}}{\partial\beta_0} \\ \frac{\partial\ln{\hat{\rho}}}{\partial\beta_1}\end{matrix}\right]=\left[\begin{matrix}\hat{\pi}_0-\hat{\pi}_1 \\ 1-\hat{\pi}_1\end{matrix}\right]
$$
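The gradient used in the delta method can be verified numerically. In the sketch below the coefficients are the GEE estimates printed later in this notebook, while the covariance matrix `Vb` is a hypothetical stand-in (its off-diagonal entry is assumed), so the resulting interval is illustrative only.

```r
# Coefficients taken from the GEE output in this notebook
beta <- c(-0.1362, 0.3262)

# Hypothetical covariance matrix of the coefficients (assumed off-diagonal)
Vb <- matrix(c(0.0485, -0.04, -0.04, 0.1097), nrow = 2)

# Fitted probabilities
pi_0 <- plogis(beta[1])
pi_1 <- plogis(beta[1] + beta[2])

# log risk ratio as a function of the coefficients
log_rr <- function(b) log(plogis(b[1] + b[2])) - log(plogis(b[1]))

# Analytic gradient from the text
g_analytic <- c(pi_0 - pi_1, 1 - pi_1)

# Central-difference gradient as an independent check
num_grad <- function(f, b, h = 1e-6) {
    sapply(seq_along(b), function(k) {
        e <- replace(numeric(length(b)), k, h)
        (f(b + e) - f(b - e)) / (2 * h)
    })
}
g_numeric <- num_grad(log_rr, beta)

# Delta-method standard error and 95% interval for the risk ratio
se_log_rr <- sqrt(as.numeric(t(g_analytic) %*% Vb %*% g_analytic))
z <- qnorm(0.975)
rr_interval <- exp(log_rr(beta) + c(-1, 1) * z * se_log_rr)

print(rbind(analytic = g_analytic, numeric = g_numeric))
```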

#### Odds ratio

$$
\hat{\theta}=\frac{\hat{\pi}_1/(1-\hat{\pi}_1)}{\hat{\pi}_0/(1-\hat{\pi}_0)}=e^{\hat{\beta}_1}
$$

With confidence intervals,

$$
\theta \in \exp{\left(\ln{\hat{\theta}}\pm Z_{1-\alpha/2}\,\widehat{\text{SE}}(\ln{\hat{\theta}})\right)}, \quad \widehat{\text{SE}}(\ln{\hat{\theta}})=\widehat{\text{SE}}(\hat{\beta}_1) \ \text{since} \ \ln{\hat{\theta}}=\hat{\beta}_1
$$

### Brier score

The typical Brier score for a set of clustered outcomes and predictions for $j=1,...,J$ measures the accuracy of the participant's predictions as follows,

$$
\text{BS}=\frac{1}{J}\sum_{j=1}^J (P_j - 1_{\{X_j = 1\}})^2
$$

Where $P_j \in [0,1]$ is a measure of the participant's confidence in the outcome $X_j \in \{0,1\}$. In particular, $P_j \to 1$ implies higher confidence in $X_j = 1$ and $P_j \to 0$ implies lower confidence in $X_j = 1$. A lower $\text{BS}$ implies higher accuracy. See Brier (1950).

This can be reconceived as a measure of confidence like so,

$$
\text{BS}=\frac{1}{J}\sum_{j=1}^J (Z_j - 1_{\{S_j = 1\}})^2, \quad
S_j =
\begin{cases}
1, & X_j = Y_j,\\
0, & X_j \neq Y_j
\end{cases}
$$

Where $X_j \in {\{0,1\}}$ denotes the outcome, $Y_j \in {\{0,1\}}$ denotes the categorical prediction, $S_j \in {\{0,1\}}$ is the correctness indicator and $Z_j \in [0,1]$ is the confidence level. The participant predicts the outcome $Y_j$ with a level of confidence $Z_j$ on a continuous scale from 0% (not at all confident) to 100% (absolutely certain). Once more, a lower $\text{BS}$ implies higher accuracy. See Juslin (1994) for a similar setup.
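As a worked example, the sketch below computes the confidence-based score for one participant. The helper name `brier_confidence` is hypothetical; the inputs are the first row of the mock data shown later in this notebook (Victor).

```r
# Confidence-based Brier score for one participant
# outcome:    sample actually tasted, X_j (0 = tap, 1 = potted)
# prediction: participant's guess, Y_j (0 = tap, 1 = potted)
# confidence: stated confidence Z_j in [0, 1]
brier_confidence <- function(outcome, prediction, confidence) {
    stopifnot(
        length(outcome) == length(prediction),
        length(outcome) == length(confidence)
    )
    correct <- as.integer(outcome == prediction) # S_j
    mean((confidence - correct)^2)
}

# First row of the mock data (Victor)
bs_victor <- brier_confidence(
    outcome = c(0, 1, 0, 0, 0),
    prediction = c(0, 0, 1, 1, 0),
    confidence = c(0.00, 0.73, 0.03, 0.83, 0.86)
)
print(round(bs_victor, 4)) # 0.4485
```

The first round dominates the score: a correct guess stated with 0% confidence contributes the maximum penalty of $(0-1)^2 = 1$.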

## Sample collection

The samples will be collected in $k=1,...,5$ rounds for each individual $i$ of the $n$ participants. In each round $k$, each participant $i$ will be randomly assigned to group $T$ (tap water) or $P$ (potted water). This assignment will be carried out by a draw from the Bernoulli distribution with $p=\frac{1}{2}$. An ordered set of 100 values $0$ ($T$) and $1$ ($P$) will be generated every Monday and each participant will be allotted $0$ or $1$ on a first-come, first-served basis.
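The weekly allocation list could be generated along the following lines. This is a sketch only: the seed and the column names `slot`, `sample` and `label` are assumptions, and any recorded seed would serve for reproducibility.

```r
# Hypothetical seed, recorded so that the weekly list can be reproduced
set.seed(2025)

# 100 independent Bernoulli(0.5) draws: 0 = tap (T), 1 = potted (P)
allocation <- rbinom(100, size = 1, prob = 0.5)

# Ordered list: the participant arriving in slot s receives allocation[s]
allocation_list <- data.frame(
    slot = seq_along(allocation),
    sample = allocation,
    label = ifelse(allocation == 1, "P", "T")
)

print(head(allocation_list))
print(table(allocation_list$label))
```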

Each round will use a different brand of potted water (Evian, Volvic, Highland Spring, Buxton, Waitrose Essentials) and unfiltered tap water drawn from the same tap. Both will be cooled to the same temperature for one night at most. Samples will be poured from reusable vessels and tasted from disposable cups.

For each participant, we record (I) an identifier (first name), (II) the water sample tasted ($0$ or $1$), (III) the guess ($T$ or $P$), and (IV) the level of confidence ($[0\%, 100\%]$, from not at all confident to absolutely certain).

The participant with the most accurate score across all five rounds wins six bottles of the potted water that has done best against the tap water. In case of more than one winner, a draw from the uniform distribution decides.

The sample collection is expected to conclude by the end of November 2025. The results ought to follow by mid-December.

Caveats: Logistical difficulties preclude a double-blind test.

## Reference material

Agresti, A. (2019) *An Introduction to Categorical Data Analysis*, 3rd edn. Wiley.

Agresti, A. (2002) *Categorical Data Analysis*, 2nd edn. Wiley.

Brier, G.W. (1950) 'Verification of Forecasts Expressed in Terms of Probability'. *Monthly Weather Review*, Volume 78, Number 1. https://web.archive.org/web/20171023012737/https://docs.lib.noaa.gov/rescue/mwr/078/mwr-078-01-0001.pdf

Ford, C. (2023) 'Getting Started with Generalized Estimating Equations'. University of Virginia Library. https://library.virginia.edu/data/articles/getting-started-with-generalized-estimating-equations

Goldstein-Greenwood, J. (2021) 'A Brief on Brier Scores'. University of Virginia Library. https://library.virginia.edu/data/articles/a-brief-on-brier-scores

Halekoh, U., Højsgaard, S. and Yan, J. (2006) 'The R Package geepack for Generalized Estimating Equations'. *Journal of Statistical Software*, January 2006, Volume 15, Issue 2. DOI:10.18637/jss.v015.i02

Hardin, J.W. and Hilbe, J.M. (2013) *Generalised Estimating Equations*, 2nd edn. CRC Press.

Hoessly, L. (2025) 'On misconceptions about the Brier score in binary prediction models'. arXiv. https://arxiv.org/html/2504.04906v3

Hoff, P.D. (2009) *A First Course in Bayesian Statistical Methods*. Springer.

Højsgaard, S., Halekoh, U., Yan, J., Ekstrøm, C.T. (2025) 'geepack: Generalized Estimating Equation Package'. https://cran.r-project.org/web/packages/geepack/index.html

Juslin, P. (1994) 'The Overconfidence Phenomenon as a Consequence of Informal Experimenter-Guided Selection of Almanac Items'. *Organizational Behavior and Human Decision Processes* 57, 226-246.

Marin, J.M. and Robert, C. (2014) *Bayesian Essentials with R*, 2nd edn. Springer.

Matsuura, K. (2022) *Bayesian Statistical Modeling with Stan, R, and Python*. Springer.

Robert, D. (2020) 'Five Confidence Intervals for Proportions That You Should Know About'. Towards Data Science. https://towardsdatascience.com/five-confidence-intervals-for-proportions-that-you-should-know-about-7ff5484c024f/

Stan Development Team (2024) 'Documentation'. https://mc-stan.org/docs/

## Machinery check

In [1]:
# Libraries
library(tidyverse)
library(readxl)

set.seed(42)

── Attaching core tidyverse packages ──────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


### Data structure and mock data

In [2]:
# Construct a data frame
rounds <- 5

df <- tibble(
    participant_id = character()
)

for (i in seq(rounds)) {
    col1 <- paste0("round", i, "_sample")
    col2 <- paste0("round", i, "_prediction")
    col3 <- paste0("round", i, "_confidence")

    df <- df |>
        mutate(
            !!col1 := integer(),
            !!col2 := integer(),
            !!col3 := numeric()
        )
}

df |> colnames() |> print()


 [1] "participant_id"    "round1_sample"     "round1_prediction"
 [4] "round1_confidence" "round2_sample"     "round2_prediction"
 [7] "round2_confidence" "round3_sample"     "round3_prediction"
[10] "round3_confidence" "round4_sample"     "round4_prediction"
[13] "round4_confidence" "round5_sample"     "round5_prediction"
[16] "round5_confidence"


In [3]:
# Generate mock data
    # Load up some names
url <- r"(https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalestop100babynameshistoricaldata/1904to2024/historicalnames2024.xlsx)"
download.file(url, "historicalnames2024.xlsx", mode = "wb")

data_g <- read_excel("historicalnames2024.xlsx", sheet = "Table_1", skip = 3)
data_b <- read_excel("historicalnames2024.xlsx", sheet = "Table_2", skip = 3)

df_names <- data_g |>
    janitor::clean_names() |>
    add_row(
        data_b |> 
        janitor::clean_names()
    ) |> 
    select(-rank) |>
    pivot_longer(
        cols = everything(), 
        names_to = "var", 
        values_to = "val"
    ) |>
    filter(!(is.na(val)))

    # Construct a mock df
df_cols <- colnames(df)
no_samples <- 30 # Sample size

mock_df <- tibble(
    # Random ids (names)
    !!df_cols[1] := sample(df_names$val, no_samples, replace = TRUE)
)

for (i in seq(2, length(df_cols), 3)) {
    # Water sample ~ Bern(0.5) with 0 (tap), 1 (potted)
    mock_df[df_cols[i]] <- sample(c(0,1), no_samples, replace = TRUE)
    # Participant guess: 0 (tap), 1 (potted)
    mock_df[df_cols[i + 1]] <- sample(c(0,1), no_samples, replace = TRUE)
    # Participant confidence ~ U(0,1)
    mock_df[df_cols[i + 2]] <- round(runif(no_samples, 0, 1), 2)
}

mock_df |> head(n = 10)

participant_id,round1_sample,round1_prediction,round1_confidence,round2_sample,round2_prediction,round2_confidence,round3_sample,round3_prediction,round3_confidence,round4_sample,round4_prediction,round4_confidence,round5_sample,round5_prediction,round5_confidence
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Victor,0,0,0.0,1,0,0.73,0,1,0.03,0,1,0.83,0,0,0.86
Patricia,1,1,0.61,0,1,0.89,0,0,0.87,0,0,0.58,1,0,0.35
George,0,0,0.84,0,0,0.52,1,0,0.73,0,0,0.47,0,0,0.0
Stanley,1,1,0.75,1,1,0.85,1,0,0.32,0,1,0.37,0,1,0.91
Eileen,1,0,0.45,0,1,0.44,0,1,0.39,1,1,0.28,1,0,0.95
Lucas,1,0,0.54,0,0,0.16,0,1,0.33,0,1,0.6,1,0,0.49
Katie,1,1,0.54,0,1,0.44,0,0,0.09,1,0,0.82,0,1,0.46
Alec,0,1,0.0,1,0,0.97,0,1,0.76,0,0,0.1,0,1,0.6
Arthur,1,0,0.36,1,1,0.48,1,0,0.6,1,0,0.96,1,0,0.91
Hannah,0,0,0.61,0,1,0.25,1,1,0.15,1,1,0.17,0,0,0.17


In [4]:
# Plug in the data
data_df <- mock_df

#### Contingency tables

In [5]:
# Round dfs
df_list <- list()
df_counter <- 0

for (i in seq(2, length(df_cols), 3)) {
    df_counter <- df_counter + 1

    samp <- df_cols[i]
    pred <- df_cols[i + 1]
    df_name <- paste0("round", df_counter)
    
    df_list[[df_name]] <- tibble(
        guess_t = c(
            # True negatives
            sum((data_df[[samp]] == 0) & (data_df[[pred]] == 0)),
            # False negatives
            sum((data_df[[samp]] == 1) & (data_df[[pred]] == 0))
        ),
        guess_p = c(
            # False positives
            sum((data_df[[samp]] == 0) & (data_df[[pred]] == 1)),
            # True positives
            sum((data_df[[samp]] == 1) & (data_df[[pred]] == 1))
        )
    ) |>
    add_column(sample = c("sample_0", "sample_1"), .before = 1)

    df_list[[df_name]] |> print()
}

# A tibble: 2 × 3
  sample   guess_t guess_p
  <chr>      <int>   <int>
1 sample_0       6       6
2 sample_1       7      11
# A tibble: 2 × 3
  sample   guess_t guess_p
  <chr>      <int>   <int>
1 sample_0       7      10
2 sample_1       2      11
# A tibble: 2 × 3
  sample   guess_t guess_p
  <chr>      <int>   <int>
1 sample_0      11       7
2 sample_1       7       5
# A tibble: 2 × 3
  sample   guess_t guess_p
  <chr>      <int>   <int>
1 sample_0       8       7
2 sample_1       9       6
# A tibble: 2 × 3
  sample   guess_t guess_p
  <chr>      <int>   <int>
1 …

### Frequentist estimates

In [6]:
# Relative risk and odds ratio

    # Two-tailed z-score
p <- 0.05
z <- qnorm(1 - p / 2)

significance_levels <- list(
    "10pc" = "*",
    "5pc" = "**",
    "1pc" = "***"
)

    # Construct a data frame
ab_results <- tibble(
    round = character(),
    relative_risk = numeric(),
    rr_ci_lower = numeric(),
    rr_ci_upper = numeric(),
    rr_significance = character(),
    odds_ratio = numeric(),
    or_ci_lower = numeric(),
    or_ci_upper = numeric(),
    or_significance = character()
)

    # Compute the stats
for (i in seq_along(df_list)) {
    df_name <- names(df_list)[i]
    df_i <- df_list[[i]][,2:3]
    
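    # Relative risk
    # Note: rows are ordered sample_0 (tap), sample_1 (potted) and column 1 is guess_t,
    # so rho_hat here is P(guess tap | tap) / P(guess tap | potted)
    rho_hat <- (df_i[[1,1]] / sum(df_i[1,])) / (df_i[[2,1]] / sum(df_i[2,]))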
    rr <- round(rho_hat, 4)
    # RR confidence interval
    rr_se <- sqrt((1 / df_i[[1,1]]) - (1 / sum(df_i[1,])) + (1 / df_i[[2,1]]) - (1 / sum(df_i[2,])))
    rr_ci <- c(
        round(exp(log(rho_hat) - z * rr_se), 4),
        round(exp(log(rho_hat) + z * rr_se), 4)
    )
    # RR significance
    rr_s <- ifelse((rr_ci[1] > 1) | (rr_ci[2] < 1), significance_levels[[paste0(p * 100, "pc")]], NA_character_)
    
    # Odds ratio
    theta_hat <- (df_i[[1,1]] * df_i[[2,2]]) / (df_i[[2,1]] * df_i[[1,2]])
    or <- round(theta_hat, 4)
    # OR confidence interval
    or_se <- sqrt(sum(1 / unlist(df_i))) # Sum of reciprocal cell counts
    or_ci <- c(
        round(exp(log(theta_hat) - z * or_se), 4),
        round(exp(log(theta_hat) + z * or_se), 4)
    )
    # OR significance
    or_s <- ifelse((or_ci[1] > 1) | (or_ci[2] < 1), significance_levels[[paste0(p * 100, "pc")]], NA_character_)
                             
    # Add to results
    ab_results <- ab_results |> 
        add_row(
            round = df_name, 
            relative_risk = rr, 
            rr_ci_lower = rr_ci[1],
            rr_ci_upper = rr_ci[2],
            rr_significance = rr_s,
            odds_ratio = or, 
            or_ci_lower = or_ci[1],
            or_ci_upper = or_ci[2],
            or_significance = or_s
        )
}

ab_results |> head()

round,relative_risk,rr_ci_lower,rr_ci_upper,rr_significance,odds_ratio,or_ci_lower,or_ci_upper,or_significance
<chr>,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<chr>
round1,1.2857,0.5722,2.8891,,1.5714,0.3592,6.8754,
round2,2.6765,0.6628,10.8073,,3.85,0.643,23.0515,
round3,1.0476,0.5728,1.916,,1.1224,0.2534,4.972,
round4,0.8889,0.4742,1.6662,,0.7619,0.1791,3.2408,
round5,1.1624,0.6247,2.1628,,1.4222,0.3276,6.1742,


Discuss the findings here.

### Estimating equations model

In [7]:
# Estimating equations

gee_df <- data_df |>
    select(-ends_with("confidence")) |>
    pivot_longer(
        cols = -participant_id,
        names_to = c("round", ".value"),
        names_pattern = "^round(\\d+)_(.+)$"
    ) |>
    mutate(
        participant_id = as.factor(participant_id),
        round = as.factor(round)
    )

gee_df |> head()

participant_id,round,sample,prediction
<fct>,<fct>,<dbl>,<dbl>
Victor,1,0,0
Victor,2,1,0
Victor,3,0,1
Victor,4,0,1
Victor,5,0,0
Patricia,1,1,1


In [8]:
library(geepack)

gee_model <- geeglm(
    prediction ~ sample,
    id = participant_id, 
    family = binomial, 
    corstr = "exchangeable", 
    data = gee_df
)

# GEE results
gee_results <- summary(gee_model)
gee_results |> print()

"package 'geepack' was built under R version 4.5.2"



Call:
geeglm(formula = prediction ~ sample, family = binomial, data = gee_df, 
    id = participant_id, corstr = "exchangeable")

 Coefficients:
            Estimate Std.err  Wald Pr(>|W|)
(Intercept)  -0.1362  0.2203 0.382    0.536
sample        0.3262  0.3312 0.970    0.325

Correlation structure = exchangeable 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)        1 0.01507
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha  -0.1078 0.04458
Number of clusters:   30  Maximum cluster size: 5 


In [9]:
# Estimates (betas)
gee_estimates <- gee_results$geese$mean$estimate

In [10]:
# Baseline probability (s = 0)
beta_0 <- gee_estimates[[1]]
pi_0 <- 1 / (1 + exp(-beta_0)) # Or plogis(beta_0)

In [11]:
# Probability (s = 1)
beta_1 <- gee_estimates[[2]]
pi_1 <- 1 / (1 + exp(-(beta_0 + beta_1))) # Or plogis(beta_0 + beta_1)

In [12]:
# Population-averaged risk ratio
rr_hat <- pi_1 / pi_0

In [13]:
# RR CIs
Vb <- vcov(gee_model) # geepack vcov
g  <- c(pi_0 - pi_1, 1 - pi_1) # Gradient at fitted values

gee_rr_ses <- sqrt(as.numeric(t(g) %*% Vb %*% g))

gee_rr_cis <- c(
  exp(log(rr_hat) - z * gee_rr_ses),
  exp(log(rr_hat) + z * gee_rr_ses)
)

In [14]:
# Population-averaged odds ratio
or_hat <- exp(beta_1)

isTRUE(all.equal(
  or_hat,
  (pi_1/(1 - pi_1)) / (pi_0/(1 - pi_0)),
  tolerance = 1e-8
))

In [15]:
# OR CIs
gee_or_se <- gee_results$geese$mean$san.se[[2]]

gee_or_cis <- c(
    exp(log(or_hat) - (z * gee_or_se)),
    exp(log(or_hat) + (z * gee_or_se))
)

In [16]:
# Working correlation matrix
gee_wcm <- gee_results$corr

In [17]:
# WCM CIs
gee_wcm_cis <- c(
    gee_wcm$Estimate - (z * gee_wcm$Std.err),
    gee_wcm$Estimate + (z * gee_wcm$Std.err)
)

In [18]:
# Construct a df with GEE results
gee_res_df <- tribble(
    ~metric, ~value, ~significance,
    "beta_0", beta_0, NA,
    "pi_0", pi_0, NA,
    "beta_1", beta_1, NA,
    "pi_1", pi_1, NA,
    "risk_ratio", round(rr_hat, 3), ifelse((gee_rr_cis[[1]] > 1) | (gee_rr_cis[[2]] < 1), "**", "-"),
    "rr_ci_lo", round(gee_rr_cis[[1]], 3), NA,
    "rr_ci_hi", round(gee_rr_cis[[2]], 3), NA,
    "odds_ratio", round(or_hat, 3), ifelse((gee_or_cis[[1]] > 1) | (gee_or_cis[[2]] < 1), "**", "-"),
    "or_ci_lo", round(gee_or_cis[[1]], 3), NA,
    "or_ci_hi", round(gee_or_cis[[2]], 3), NA,
    "wcm_alpha", gee_wcm$Estimate, ifelse((gee_wcm_cis[[1]] > 0) | (gee_wcm_cis[[2]] < 0), "**", "-") # Flag if the CI excludes zero
    # "wcm_ci_lo", round(gee_wcm_cis[[1]], 3), NA,
    # "wcm_ci_hi", round(gee_wcm_cis[[2]], 3), NA
)

gee_res_df |> t() |> head(n=nrow(gee_res_df))

0,1,2,3,4,5,6,7,8,9,10,11
metric,beta_0,pi_0,beta_1,pi_1,risk_ratio,rr_ci_lo,rr_ci_hi,odds_ratio,or_ci_lo,or_ci_hi,wcm_alpha
value,-0.1362,0.4660,0.3262,0.5473,1.1750,0.8500,1.6230,1.3860,0.7240,2.6520,-0.1078
significance,,,,,-,,,-,,,**


Discuss the findings here.

### Brier score

In [19]:
# Brier score
bs_results <- data_df |>
    mutate(
        across(
            .cols = ends_with("sample"),
            .fns = ~ (as.integer(.x == data_df[[str_replace(cur_column(), "sample$", "prediction")]]) - data_df[[str_replace(cur_column(), "sample$", "confidence")]])^2,
            .names = "{str_replace(.col, 'sample$', 'brier')}"
        )
    ) |>
    rowwise() |>
    mutate(
        brier_score = mean(c_across(ends_with("brier")), na.rm = TRUE)
    ) |>
    ungroup() |>
    select(participant_id, brier_score) |>
    arrange(brier_score)

head(bs_results, n=10)

participant_id,brier_score
<chr>,<dbl>
Anna,0.1647
Deborah,0.1811
Mabel,0.2031
William,0.2236
Stanley,0.2305
James,0.2486
Patricia,0.252
Charlene,0.2782
Lily,0.2899
David,0.3008


Discuss the findings here.

### Bayesian estimates

In [20]:
# Bayesian estimates
    # Libraries
library(cmdstanr)
library(posterior)

    # Stan script
stan_script <- "
data {
    int<lower=1> K; // {treatment, control}
    array[K] int<lower=0> n; // Trials per group
    array[K] int<lower=0> y; // Successes per group
}
parameters {
    vector<lower=0, upper=1>[K] p;
}
model {
    // Jeffreys prior
    p ~ beta(0.5, 0.5);

    // Binomial likelihood
    y ~ binomial(n, p);
}
generated quantities {
    real risk_ratio = p[1] / p[2];
    real odds_ratio = (p[1] / (1 - p[1])) / (p[2] / (1 - p[2]));
}
"

    # Write the Stan model into a file
stan_file <- write_stan_file(stan_script)
    # Compile the model
mod <- cmdstan_model(stan_file)

    # df
bayes_df <- tibble(
    round = character(),
    p_1 = numeric(),
    p_2 = numeric(),
    risk_ratio = numeric(),
    rr_lower = numeric(),
    rr_upper = numeric(),
    rr_significance = character(),
    odds_ratio = numeric(),
    or_lower = numeric(),
    or_upper = numeric(),
    or_significance = character()
)

for (i in seq_along(df_list)) {
    # Contingency table
    name_i <- names(df_list)[i]
    contingency_table_i <- df_list[[name_i]][,2:3]
    ct <- as.matrix(contingency_table_i)
  
    # Stan data
    stan_data <- list(
        K = 2, # Groups in table row order: sample_0 (tap), sample_1 (potted)
        n = as.integer(c(sum(ct[1, ]), sum(ct[2, ]))), # Trials per group
        y = as.integer(c(ct[1, 1], ct[2, 1])) # Column 1 counts (guess_t) per group
    )
  
    # Sample
    fit <- mod$sample(
        data = stan_data,
        chains = 6, 
        parallel_chains = 6,
        iter_warmup = 4e2L, # 4e3L
        iter_sampling = 16e2L, # 16e3L
        seed = 42,
        refresh = 0, # Suppresses progress updates
        show_messages = FALSE # Suppresses Stan messages
    )

    draws <- fit$draws(c("p", "risk_ratio", "odds_ratio"))

    # Sampling diagnostics
    fit_summary <- fit$summary()
    # fit_diagnostics <- fit$cmdstan_diagnose() # NUTS diagnostics
    # fit_raw <- fit$save_output_files() # Raw chain files

    cat("\n#### Summary and diagnostics for ", name_i, ": \n")
    print(fit_summary)
    # print(fit_diagnostics)

    # Posterior summaries
    draws_mat <- as_draws_matrix(fit$draws(c("p[1]","p[2]","risk_ratio","odds_ratio")))
    rr_q <- quantile(draws_mat[, "risk_ratio"], probs = c(0.025, 0.5, 0.975))
    or_q <- quantile(draws_mat[, "odds_ratio"], probs = c(0.025, 0.5, 0.975))
  
    results_i <- tibble(
        round = name_i,
        p_1 = median(draws_mat[, "p[1]"]),
        p_2 = median(draws_mat[, "p[2]"]),
        risk_ratio = median(draws_mat[, "risk_ratio"]),
        rr_lower = rr_q[1], rr_upper = rr_q[3],
        rr_significance = ifelse((rr_q[[1]] > 1) | (rr_q[[3]] < 1), "**", NA_character_),
        odds_ratio = median(draws_mat[, "odds_ratio"]),
        or_lower = or_q[1], or_upper = or_q[3],
        or_significance = ifelse((or_q[[1]] > 1) | (or_q[[3]] < 1), "**", NA_character_)
    )
  
    bayes_df <- bayes_df |>
        add_row(results_i)
}

cat("\n#### Risk and odds ratios:\n")
bayes_df |> head()

This is cmdstanr version 0.9.0

- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr

- CmdStan path: C:/Users/rosec/.cmdstan/cmdstan-2.37.0

- CmdStan version: 2.37.0

This is posterior version 1.6.1


Attaching package: 'posterior'


The following objects are masked from 'package:stats':

    mad, sd, var


The following objects are masked from 'package:base':

    %in%, match





#### Summary and diagnostics for  round1 : 
# A tibble: 5 × 10
  variable      mean  median    sd   mad      q5     q95  rhat ess_bulk ess_tail
  <chr>        <dbl>   <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 lp__       -22.8   -22.5   1.03  0.724 -24.8   -21.8   1.00     3818.    5370.
2 p[1]         0.500   0.501 0.133 0.139   0.279   0.721 1.00     7078.    5776.
3 p[2]         0.394   0.392 0.108 0.112   0.221   0.580 1.000    8431.    5929.
4 risk_ratio   1.38    1.27  0.608 0.485   0.641   2.49  1.000    7547.    6248.
5 odds_r…

round,p_1,p_2,risk_ratio,rr_lower,rr_upper,rr_significance,odds_ratio,or_lower,or_upper,or_significance
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<chr>
round1,0.5006,0.3916,1.2707,0.5442,2.87,,1.5578,0.3664,6.843,
round2,0.4103,0.1642,2.5096,0.8052,12.385,,3.6176,0.727,24.215,
round3,0.6088,0.5834,1.0436,0.575,2.036,,1.1145,0.2349,4.809,
round4,0.5311,0.5974,0.8931,0.4618,1.664,,0.7651,0.1779,3.123,
round5,0.6114,0.5266,1.1526,0.6051,2.16,,1.4089,0.3355,6.148,


Discuss the findings here.