# Imports

In [1]:
# If you have not installed `wiscs` and `rinterface` locally, run this cell
!pip install git+https://github.com/w-decker/wiscs.git --quiet # REQUIRED FOR THIS NOTEBOOK
!pip install git+https://github.com/w-decker/rinterface.git --quiet # REQUIRED FOR THIS NOTEBOOK

In [2]:
# always run this cell
import wiscs
from wiscs.simulate import DataGenerator
from wiscs.utils import make_tasks

import rinterface.rinterface as R

from src.utils import fmt_script

import pandas as pd
import numpy as np 

# Generating data
Data are generated using the [`wiscs`](https://github.com/w-decker/wiscs) framework. For now, arbitrary variance parameters and design choices are set to help "build" the linear model. Once a final model has been chosen, additional power analyses can be run to determine the exact experimental design criteria.

>Please see [generate_data.ipynb](/notebooks/generate_data.ipynb) for information on how to use the `wiscs` module. 

In [3]:
task = make_tasks(low=100, high=200, n=5)
params = {
    # latents
    'word.perceptual': 100, 'image.perceptual': 95,
    'word.conceptual': 100, 'image.conceptual': 100,
    # tasks
    'word.task': task, 'image.task': task,
    # noise
    'sd.item': None, 'sd.question': None, 'sd.subject': 20, 'sd.error': 50,
    'sd.subject_question': 15, 'sd.item_question': None,
    # design
    'n.subject': 300, 'n.question': 5, 'n.item': 10,
}
wiscs.set_params(params)
# Generate data
DG = DataGenerator()
DG.fit_transform(seed=2025)
# Convert to a pandas DataFrame
df = DG.to_pandas()

Params set successfully
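To make the design concrete, here is a minimal numpy sketch of the kind of long-format data this cell produces. It is *not* the `wiscs` implementation: the additive structure, the hypothetical `simulate_rts` name, the assumption that perceptual and conceptual latents simply add, and the assumption that `make_tasks` yields one offset per question drawn between `low` and `high` are all illustrative guesses. What it does show reliably is where the 30,000 observations in the model output come from: 300 subjects × 5 questions × 10 items × 2 modalities.

```python
import numpy as np
import pandas as pd


def simulate_rts(n_subject=300, n_question=5, n_item=10,
                 sd_subject=20.0, sd_subject_question=15.0, sd_error=50.0,
                 seed=2025):
    """Hypothetical additive RT model; NOT how wiscs generates data."""
    rng = np.random.default_rng(seed)
    # Assumed per-modality means: perceptual + conceptual latents from `params`.
    modality_mean = {"word": 100 + 100, "image": 95 + 100}
    # Assumed task offsets: one per question, drawn uniformly in [100, 200).
    task = rng.uniform(100, 200, size=n_question)
    # Random effects: subject intercepts and subject-by-question deviations.
    subj = rng.normal(0, sd_subject, size=n_subject)
    subj_q = rng.normal(0, sd_subject_question, size=(n_subject, n_question))

    # Fully crossed design in long format: one row per trial.
    grid = pd.MultiIndex.from_product(
        [range(n_subject), range(n_question), range(n_item), ["word", "image"]],
        names=["subject", "question", "item", "modality"],
    ).to_frame(index=False)

    s = grid["subject"].to_numpy()
    q = grid["question"].to_numpy()
    grid["rt"] = (grid["modality"].map(modality_mean).to_numpy()
                  + task[q] + subj[s] + subj_q[s, q]
                  + rng.normal(0, sd_error, size=len(grid)))
    return grid


df_sketch = simulate_rts()
print(len(df_sketch))  # 300 * 5 * 10 * 2 = 30000
```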




# Evaluate in R with [`rinterface`](https://github.com/w-decker/rinterface)

I have built a small interface between Python and R; check out the repo [here](https://github.com/w-decker/rinterface). Essentially, it takes a multiline string containing a valid R script and runs it as a subprocess with the `Rscript` command. There's a lot more to it than that (see the repo for examples and additional functionality), but that's the gist. For brevity, I've wrapped the script we will be running regularly in a function, [`src.utils.fmt_script()`](/notebooks/src/utils.py) (see line 144), which keeps the notebook cleaner. A few things to note about `fmt_script()`:
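The core idea, handing an R script string to `Rscript` as a subprocess, can be sketched with the standard library alone. This is a hypothetical stand-in, not `rinterface`'s code: `run_r` and its `rscript` parameter are names made up for illustration, and writing the script to a temporary `.R` file is one reasonable choice among several (`Rscript -e` also works for short scripts).

```python
import subprocess
import tempfile
from pathlib import Path


def run_r(script: str, rscript: str = "Rscript") -> str:
    """Hypothetical sketch: run an R script string via Rscript and return stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        # Write the multiline script to a temporary .R file.
        path = Path(tmp) / "script.R"
        path.write_text(script)
        # check=True raises CalledProcessError if R exits with an error.
        result = subprocess.run(
            [rscript, str(path)], capture_output=True, text=True, check=True
        )
    return result.stdout


# Usage, assuming R is installed and Rscript is on your PATH:
# print(run_r('cat("hello from R\\n")'))
```

Capturing stdout as text lets the notebook print R's model summaries directly; a non-zero exit from R surfaces as a Python exception rather than silently returning empty output.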

### What's inside `fmt_script()`?
1. Loads the necessary packages, including `lme4` and `lmerTest`.
2. Converts the categorical variables to factors (this is hardcoded because we already know which variables are categorical).
3. Sets treatment coding for the categorical variables. You can print the script to see the details.
4. Fits both models and prints the summary of each.
5. Compares the two models with `anova()`.
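A rough Python sketch of how a script following these five steps could be assembled is below. It is not the contents of `src/utils.py`: the name `build_lmer_script`, the assumption that the data reach R as a CSV file, and the exact list of factor columns are all hypothetical. `REML = FALSE` is used because the model output below reports a maximum-likelihood fit, which is appropriate when comparing models that differ in their fixed effects.

```python
def build_lmer_script(shared_f, separate_f, csv_path,
                      factors=("subject", "question", "item", "modality")):
    """Hypothetical sketch of an fmt_script-style R script builder."""
    lines = [
        # 1. Load packages quietly
        "suppressPackageStartupMessages({library(lme4); library(lmerTest)})",
        # Read the data exported from pandas (assumed CSV hand-off)
        f'df <- read.csv("{csv_path}")',
    ]
    for col in factors:
        # 2. Convert categorical columns to factors
        lines.append(f"df${col} <- factor(df${col})")
    for col in ("question", "modality"):
        # 3. Treatment (dummy) coding; the first level is the baseline
        lines.append(f"contrasts(df${col}) <- contr.treatment(levels(df${col}))")
    lines += [
        # 4. Fit both models by maximum likelihood and print their summaries
        f"shared <- lmer({shared_f}, data = df, REML = FALSE)",
        f"separate <- lmer({separate_f}, data = df, REML = FALSE)",
        "print(summary(shared))",
        "print(summary(separate))",
        # 5. Likelihood-ratio comparison of the two models
        "print(anova(shared, separate))",
    ]
    return "\n".join(lines)
```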

### What do _you_ need to give `fmt_script()`?
1. The R formulas for the shared and separate models, each as a string
2. The pandas DataFrame containing the data
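The shared and separate formulas passed in the `In [4]` cell below differ only in `+` versus `*` between `modality` and `question`: `*` adds the `modality:question` interaction. Under treatment coding, the number of fixed-effect columns each formula implies can be counted with `pandas.get_dummies`. This is an illustrative sketch, not how R builds its model matrix; the toy levels `'image'`/`'word'` and `0` to `4` are assumptions.

```python
import pandas as pd

# Toy design: 2 modalities x 5 questions, one row per cell (assumed levels).
design = pd.MultiIndex.from_product(
    [["image", "word"], [0, 1, 2, 3, 4]], names=["modality", "question"]
).to_frame(index=False).astype("category")

# Treatment coding: drop the first (baseline) level of each factor.
main = pd.get_dummies(design, drop_first=True, dtype=int)
main.insert(0, "(Intercept)", 1)

# Interaction columns: each modality dummy times each question dummy.
mod_cols = [c for c in main.columns if c.startswith("modality_")]
q_cols = [c for c in main.columns if c.startswith("question_")]
interaction = pd.DataFrame(
    {f"{m}:{q}": main[m] * main[q] for m in mod_cols for q in q_cols}
)

shared_X = main                                      # rt ~ modality + question
separate_X = pd.concat([main, interaction], axis=1)  # rt ~ modality * question
print(shared_X.shape[1], separate_X.shape[1])        # 6 10
```

Because the shared model is nested in the separate one and both use the same random-effects structure, the `anova()` comparison is a likelihood-ratio test with 10 − 6 = 4 degrees of freedom.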

### What if you want to add some more code?
You can optionally pass a list of strings through the `add` argument. Each element of the list becomes a new line of R code, and these lines are inserted right before the models are fit:

```python
fmt_script(shared_f=..., separate_f=..., df=df, add=['cat("hello world")'])
```
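Inserting the `add` lines just before the models are fit amounts to a list splice. A minimal sketch, where `splice_before_fit` and the `"lmer("` marker used to locate the first model-fitting line are hypothetical choices rather than `fmt_script()`'s actual code:

```python
def splice_before_fit(lines, add=None, marker="lmer("):
    """Insert the extra R lines in `add` before the first line containing `marker`."""
    if not add:
        return list(lines)
    # Index of the first model-fitting line; append at the end if none is found.
    idx = next((i for i, line in enumerate(lines) if marker in line), len(lines))
    return list(lines[:idx]) + list(add) + list(lines[idx:])


script_lines = [
    "library(lme4)",
    "shared <- lmer(rt ~ modality + question + (1 + question | subject), data = df)",
]
print("\n".join(splice_before_fit(script_lines, add=['cat("hello world")'])))
```

Placing user lines before the fit means they can modify `df` (for example, recoding a variable) and the change will be reflected in both models.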

In [4]:
script = fmt_script(shared_f="rt ~ modality + question + (1 + question | subject)", 
                    separate_f="rt ~ modality * question + (1 + question | subject)",
                    df=df)
R(script)

Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: rt ~ modality + question + (1 + question | subject)
   Data: df

      AIC       BIC    logLik  deviance  df.resid 
 322073.8  322256.6 -161014.9  322029.8     29978 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.8379 -0.6611 -0.0035  0.6698  3.6442 

Random effects:
 Groups   Name        Variance Std.Dev. Corr                   
 subject  (Intercept)  639.1   25.28                           
          question1    464.7   21.56    -0.49                  
          question2    425.5   20.63    -0.53  0.49            
          question3    521.3   22.83    -0.52  0.47  0.55      
          question4    446.9   21.14    -0.52  0.58  0.52  0.57
 Residual             2509.7   50.10                           
Number of obs: 30000, groups:  subject, 300

Fixed effects:
               Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)   2.942e+02  1.62
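The information criteria in this summary can be checked by hand. With `k` estimated parameters and `n` observations, `deviance = -2 * logLik`, `AIC = deviance + 2k`, `BIC = deviance + k * ln(n)`, and `df.resid = n - k`. Here `k = 22`: 6 fixed effects (intercept, one `modality` term, four `question` terms), 15 parameters for the 5 × 5 subject-level covariance matrix of `(1 + question | subject)` (5 variances and 10 correlations, as printed under "Random effects"), and 1 residual variance. The inputs below are copied from the printed output, so agreement is only up to its rounding to one decimal.

```python
import math

# Values copied from the lmer summary above (rounded to 0.1).
log_lik = -161014.9
n_obs = 30000

n_fixed = 6                               # (Intercept), modality, question1..question4
n_re_dim = 5                              # intercept + 4 question terms per subject
n_cov = n_re_dim * (n_re_dim + 1) // 2    # 5 variances + 10 correlations = 15
n_resid = 1                               # residual variance
k = n_fixed + n_cov + n_resid             # 22 parameters

deviance = -2 * log_lik                   # 322029.8
aic = deviance + 2 * k                    # 322073.8
bic = deviance + k * math.log(n_obs)      # about 322256.6
df_resid = n_obs - k                      # 29978

print(k, round(deviance, 1), round(aic, 1), round(bic, 1), df_resid)
```

The same bookkeeping applies to the separate model: its 4 extra interaction terms raise `k` to 26, so its AIC is penalized by 8 more than the shared model's.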

# What does the rest of this notebook look like?

Much of this notebook evaluates different values for the data-generation parameters and different formulas for the linear model. The goal is to define a conceptually sound model that aligns as closely as possible with the hypotheses laid out in the project. As such, the two previous code cells will be rerun repeatedly (in new cells) below, so much of what follows will look similar to what you've already seen. Only the values given to the data generator and the model formulas change.