# Power

This notebook contains code to assess power and determine the optimal $n\_\{variable\}$ in an experiment. 

There are **two** ways of doing this:
1. [Simulation based approach](#simulation-based-approach)\
See the [description](#description-of-simulation-based-approach) below.

2. [Numerically estimation power](#numerical-power-estimation)\
See the [description](#description-of-numerical-power-estimation) below.

## Recommendations
1. Please ensure that you have install `wiscs`. 
2. Read the descriptions in the markdown cells carefully.

## Imports

In [None]:
# if you wish to run the simulation approach, please install the following packages
!pip install git+https://github.com/w-decker/wiscs.git --quiet # REQUIRED FOR THIS NOTEBOOK
!pip install git+https://github.com/w-decker/rinterface.git --quiet # REQUIRED FOR THIS NOTEBOOK

In [None]:
# if you wish to install `mixedpower`, run this cell _after_ activating the wiscs-stats conda environment
!pip install git+https://github.com/w-decker/mixedpower.git

In [3]:
import mixedpower as mp # type: ignore

import rinterface.rinterface as R
from src.power import grid
import numpy as np

import wiscs
from wiscs.utils import make_tasks
from wiscs.simulate import DataGenerator

from tqdm import tqdm

## Description of simulation based approach

A simulation based approach to calculating $n\_\{variable\}$ for an experiment is simple, but it involves a few steps.

1. First, a combination, $C$, of all possible values of parameters is generated (See sample slice below).

|   | n_items | n_participants | 
|---|---------|----------------|
| 0 | 10      | 10             | 
| 1 | 10      | 20             | 
| 2 | 10      | 30             | 
| 3 | 10      | 40             |
| 4 | 10      | 50             | 
| 5 | 20      | 10             | 
| 6 | 20      | 20             | 
| 7 | 20      | 30             | 
| 8 | 20      | 40             | 
| 9 | 20      | 50             | 

2. Then for $k$ iterations, a dataset is randomly generated for each row in $C$.

3. Two linear modeld are run with the generated data. The first being something like `Y ~ FE1 + FE 2 + (1 + FE2 | subject)` and the second `Y ~ FE1 + FE + (FE1 : FE2) + (1 + FE2 | subject)`. If the AIC of model one is lower or there is no statistical difference between the two models, then a counter, $T$ (which is initialized with $0$) is updated with $+1$.

7. Desired power is reached when $\frac{\text{sum}(T)}{k} = \text{desired power}$. 

See [here](https://cran.r-project.org/web/packages/SimEngine/vignettes/example_1.html#:~:text=The%20basic%20idea%20is%20that,this%20is%20your%20estimated%20power.) for more details.

## Let's generate some data

See [generate_data.ipynb](/notebooks/generate_data.ipynb) for details regarding how data are generated.

In [7]:
task = make_tasks(200, 300, 5)
params = {
    'word.perceptual': 100,
    'image.perceptual': 90,

    'word.conceptual': 100,
    'image.conceptual': 130,

    'word.task': task,
    'image.task': task,

    'sd.item': None, 
    'sd.question': None, 
    'sd.subject':30,
    "sd.error": 50, 

    'n.subject': 0,
    'n.question': 5,
    'n.item': 0
}

wiscs.set_params(params, verbose=False)

Below are some variables

In [8]:
n_iter = 1000 # how many interations to do?
desired_power = 0.8

In [9]:
# If you wish to define a possible range of n_{variable}, modify + run this cell

#################################################################
n_subjects_range = np.arange(10, 200, 10) # range of subjects to test
n_items_range = np.arange(10, 50, 5) # range of items to test
#################################################################

combinations = grid(subjects=n_subjects_range, items=n_items_range) # get all combinatinos

## Simulation based approach



`rinterface.rinterface()`, a (somewhat clumsy but convienient) way to interface with R in Python is how we will run linear models. Below is is the R code we will run

In [14]:
# code for model eval in R
code = """
suppressMessages(library(lme4))

# load data
df <- as.data.frame(read.csv("../data/power_data.csv"))

# model
control <- lmerControl(check.conv.singular = "ignore")
shared <- suppressWarnings(lmer(rt ~ modality + question + (1 + question | subject) , data = df, REML = FALSE, control = control)) # nolint
separate <- suppressWarnings(lmer(rt ~ modality * question + (1 + question |subject), data = df, REML = FALSE, control = control)) # nolint

# compare
aicvalues <- c("Shared" = AIC(shared), "Separate" = AIC(separate))

# @grab{str}
winner <- names(aicvalues)[which.min(aicvalues)]
"""

Below is the loop

In [None]:
iter = tqdm(combinations) # instantiate iter obj

results = {} # for storing results 

for j, (sub, items) in enumerate(iter):
    success = []
    for i in range(n_iter):
        
        DG = DataGenerator()
        DG.fit_transform(params={'n.subject': int(sub), 'n.item': int(items)}) # udpate subject and item values 
        DG.to_pandas().to_csv(f"../data/power_data.csv", index=False) # save data to csv

        winner = R(code, grab=True) # see which model wins
        success.append(1) if winner == 'Shared' else success.append(0)

        power = np.sum(success) / n_iter # calculate power
        iter.set_postfix({"Power": power, "Count": i, "# Winners": np.sum(success), "# Losers": success.count(0)}) # update tqdm

    if power >= desired_power:
        results[(sub, items)] = power
        break

  0%|          | 0/152 [00:27<?, ?it/s, Power=0.044, Count=49, # Winners=44, # Losers=6]

## Description of numerical power estimation

Here, $n\_\{variable\}$ is solved numerically. This makes use of [`mixedpower`](https://github.com/w-decker/mixedpower) a python library for estimating power in mixed effects models. In short, this library is simply a Python conversion of [Jake Westfall's code](https://github.com/jake-westfall/two_factor_power/tree/master). 

In short, power is estimatied using a noncentrality parameter, $t$ and various variance components to estimate degrees of freedom and in turn, calculate power.

You can also solve for $n\_\{variable\}$. This is done by determing which value of $n\_\{variable\}$ minimized the squared error between desired power and the empirical power using that value for $n\_\{variable\}$. 

## Numerical power estimation

In [35]:
# to solve for n_participants, modify + run this cell 

###############################
args = dict(
    p=0.8, # desired power
    cohens_d = 0.5,
    resid=0.3,
    target_intercept=0.2,
    participant_intercept=0.2,
    participant_x_target = 0.1,
    target_slope=0.1,
    participant_slope=0.1,
    n_targets = 30,
    code=1, 
    alpha=0.05,
)
###############################

###############################################################
n_participants, _ = mp.solve(variable='n_participants', **args) # <- CHANGE THIS AS NEEDED | variable='n_targets'
print(f'Number of participants: {n_participants}')
###############################################################

Number of participants: 25
