# Project template

This is the project template for implementing your own experimentalist and testing it against a random experimentalist in a closed loop.

Our closed loop consists of three parts:
1. The experiment runner, which executes the 2AFC experiment
2. The theorist, which is fitted on the conditions and the observations collected from the experiment
3. The experimentalist, which uses any useful information from the sampled conditions, observations, theorist predictions, and uncertainty to propose the next best condition

The 2AFC experiment includes a baseline noise, individual differences for each participant, and an additional non-uniform noise distribution that depends on the given conditions.
The non-uniform noise distribution is a simple linear function $f_{noise}(ratio, scatteredness)$ that augments our baseline noise $noise_{baseline}$ according to $noise = noise_{baseline} + f_{noise}(ratio, scatteredness)$.
This is a plausible assumption given that the parameters ratio and scatteredness are used to generate a random image with blue and orange tiles. Some random images yield faster response times while other images may yield slower response times. This assumption about non-uniform noise adds an interesting and realistic flavour to our experiment making some conditions worth exploring more (high ratio and high scatteredness) and others less (low ratio and low scatteredness).

The theorist will be the ensemble neural network regressor which you know from model disagreement sampling. Neural networks don't automatically come with an uncertainty estimate, but many experimentalists need some sort of uncertainty measure, so we approximate this uncertainty through the disagreement between the ensemble members.
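As a reminder of how disagreement can stand in for uncertainty, here is a minimal sketch. It assumes that each ensemble member exposes a scikit-learn style `predict` method. `ToyModel` and `ensemble_disagreement` are hypothetical names used only for illustration and are not part of the project resources.

```python
import numpy as np


class ToyModel:
    """Hypothetical stand-in for a fitted ensemble member (a linear model)."""

    def __init__(self, slope: float):
        self.slope = slope

    def predict(self, X: np.ndarray) -> np.ndarray:
        # One prediction per row of X
        return self.slope * X.sum(axis=1)


def ensemble_disagreement(models, X: np.ndarray) -> np.ndarray:
    """Standard deviation of the members' predictions for each row of X."""
    # Stack the predictions to shape (n_models, n_conditions)
    predictions = np.stack([np.ravel(m.predict(X)) for m in models])
    return predictions.std(axis=0)


# Three members that agree at the origin and diverge further away from it
models = [ToyModel(slope) for slope in (0.9, 1.0, 1.1)]
X = np.array([[0.0, 0.0], [1.0, 1.0]])
disagreement = ensemble_disagreement(models, X)
```

Conditions with a larger value in `disagreement` are those where the ensemble is less certain. How this measure is used to pick new conditions is up to your experimentalist.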

The experimentalist will be your very own implementation of the presented algorithm.


## Library imports

In [None]:
import sys, os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
from typing import List

# Set the path to the project folder
target_folder = os.path.abspath(os.path.join(os.getcwd(), '..'))  # Adjust path as needed
if target_folder not in sys.path:
    sys.path.append(target_folder)

from sklearn.base import BaseEstimator

from autora.experimentalist.random import random_pool
from autora.variable import Variable, VariableCollection

from resources.synthetic import twoafc

## Setting up the synthetic experiment

In [None]:
# Basic experiment parameters
n_participants = 10  # Number of synthetic participants
noise_baseline = 0.  # noise added to the experiment runner's observations
individual_difference_level = 2  # the level of individual differences defines the amount of variation in the participant parameters

In [None]:
# Sample parameters for each synthetic participant
parameters = np.random.normal(1, 1, (n_participants, 2))
parameters = (parameters - parameters.min()) / (parameters.max() - parameters.min()) + 1e-6
parameters = parameters * individual_difference_level

# Create the experiment
experiment = twoafc(parameters, noise_level=noise_baseline)

print(f"Created experiment with {n_participants} synthetic participants")
print(f"Noise level: {noise_baseline}")

### Non-uniform noise

The non-uniform noise function returns noise values that depend on the ratio and the scatteredness. You can simply add this non-uniform noise to the observations collected from the experiment.

In [None]:
def non_uniform_noise(ratio, scatteredness, max_noise: float = 0.3) -> np.ndarray:
    # Accept scalars as well as arrays
    ratio = np.atleast_1d(np.asarray(ratio, dtype=float))
    scatteredness = np.atleast_1d(np.asarray(scatteredness, dtype=float))
    
    # The noise standard deviation grows linearly with ratio and scatteredness
    noise_level = (ratio + scatteredness) / 2 * max_noise
    
    # Draw one zero-mean Gaussian noise value per condition
    return np.random.normal(0, noise_level)
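To get a feeling for how the noise scales across the design space, the following sketch evaluates the noise level formula from `non_uniform_noise` at three illustrative conditions. It assumes that ratio and scatteredness both lie in $[0, 1]$. The chosen conditions and the seeded generator `rng` are only for illustration.

```python
import numpy as np

max_noise = 0.3  # same default as in non_uniform_noise

# Three illustrative conditions, assumed to lie in [0, 1]
ratio = np.array([0.0, 0.5, 1.0])
scatteredness = np.array([0.0, 0.5, 1.0])

# Standard deviation of the added noise, following the formula in non_uniform_noise
noise_std = (ratio + scatteredness) / 2 * max_noise

# One zero-mean Gaussian draw per condition
rng = np.random.default_rng(0)
noise = rng.normal(0.0, noise_std)
```

Under this assumption the noise vanishes at low ratio and low scatteredness and reaches a standard deviation of `max_noise` at high ratio and high scatteredness.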

## Setting up the theorist

We are going to use two instances of our theorist.

One is fitted on the samples gathered by the random sampling experimentalist and the second is fitted on samples gathered by our own experimentalist.

That way we can compare the effectiveness of the experimentalists against each other.

In [None]:
from resources.regressors import FFN, FFNRegressor

def setup_theorist(n_models: int, n_participants: int) -> List[BaseEstimator]:

    theorist = []
    for _ in range(n_models):
        theorist.append(
            FFNRegressor(
                FFN(n_units=n_participants, n_conditions=2),
                max_epochs=100,
                lr=0.1,
                verbose=False
            )
        )
        
    return theorist

n_models = 10
theorist_random = setup_theorist(n_models, n_participants)
theorist_oed = setup_theorist(n_models, n_participants)

## Implementing the experimentalist

In [None]:
def oed_experimentalist(experiment, current_conditions: pd.DataFrame, ensemble: List[BaseEstimator], num_samples=1, pool_size=10000) -> pd.DataFrame:
    
    new_conditions = None
    
    # add your own code here
    
    # Return as DataFrame
    ivs = experiment.variables.independent_variables
    column_names = [iv.name for iv in ivs]
    new_conditions = pd.DataFrame(new_conditions, columns=column_names)
    
    return new_conditions

## Get the test data

In order to evaluate our approaches we are going to (1) fit the theorists on the conditions and observations given by the two experimentalists and (2) collect predictions on a big set of test data.

The test data will be based on 1000 random conditions collected from our experiment runner without any noise. This is only possible in a synthetic scenario, because in a real experiment we (1) might not be able to collect such a big dataset for testing and (2) couldn't get the observations without noise.

But in the synthetic scenario this assumption helps us to verify the effectiveness of our experimentalist easily. 

In [None]:
from autora.experimentalist.random import random_pool

# collect the random test conditions
test_conditions = random_pool(experiment.variables, num_samples=1000, random_state=42)

# collect the observations without any noise
test_observations = experiment.run(test_conditions, noise_level_run=0)

## Running the closed-loop

Now we can set up the closed-loop and run it.

Please keep in mind that most experimentalists need an initial set of conditions, which can be collected randomly, for example.
Perhaps your algorithm already takes care of that, but if not, you have to collect them manually.

Also make sure to sample as many new conditions with the random experimentalist as you sample with your own experimentalist (recommendation: one sample per cycle).

Further, you can keep track of the sampled conditions by plotting them cycle by cycle alongside the old conditions to see whether your experimentalist gets stuck in some regions.

In [None]:
n_cycles = 20
n_new_samples = 1
max_non_uniform_noise = 0.3

# adjust as needed
conditions_oed = None
conditions_random = None

# collect the mean squared error between the test observations and the test predictions
mse_oed_all = np.zeros(n_cycles)
mse_random_all = np.zeros(n_cycles)

for cycle in range(n_cycles):
    
    # get new conditions
    new_conditions_oed = oed_experimentalist()  # add here the inputs as needed
    # keep the conditions as DataFrames; pd.concat silently skips a None entry
    conditions_oed = pd.concat([conditions_oed, new_conditions_oed], ignore_index=True)
    
    new_conditions_random = random_pool(experiment.variables, num_samples=n_new_samples)
    conditions_random = pd.concat([conditions_random, new_conditions_random], ignore_index=True)
    
    # collect data from experiment
    experiment_data_oed = experiment.run(conditions_oed).to_numpy()
    experiment_data_random = experiment.run(conditions_random).to_numpy()
    
    # add non-uniform noise on top
    experiment_data_oed[:, -1] += non_uniform_noise(conditions_oed.values[:, 0], conditions_oed.values[:, 1], max_noise=max_non_uniform_noise)
    experiment_data_random[:, -1] += non_uniform_noise(conditions_random.values[:, 0], conditions_random.values[:, 1], max_noise=max_non_uniform_noise)
    
    # fit your model
    for index_model, model in enumerate(theorist_oed):
        theorist_oed[index_model].fit(experiment_data_oed[:, :-1], experiment_data_oed[:, -1:])
        theorist_random[index_model].fit(experiment_data_random[:, :-1], experiment_data_random[:, -1:])
    
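    # get test predictions
    # test_observations is assumed to have the same layout as the training data:
    # conditions first, observation in the last column
    test_data = test_observations.to_numpy()
    mse_oed = 0
    mse_random = 0
    for index_model in range(len(theorist_oed)):
        # average the test MSE over all ensemble members
        pred_oed = theorist_oed[index_model].predict(test_data[:, :-1]).reshape(-1, 1)
        mse_oed += np.mean((test_data[:, -1:] - pred_oed)**2) / len(theorist_oed)
        
        pred_random = theorist_random[index_model].predict(test_data[:, :-1]).reshape(-1, 1)
        mse_random += np.mean((test_data[:, -1:] - pred_random)**2) / len(theorist_random)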
        
    mse_oed_all[cycle] += mse_oed
    mse_random_all[cycle] += mse_random
    
    # print results
    print(f"Cycle {cycle+1}/{n_cycles}: OED MSE = {np.round(mse_oed, 8)}; Random MSE = {np.round(mse_random, 8)}")
    
plt.plot(mse_random_all, label='random')
plt.plot(mse_oed_all, label='OED')
plt.xlabel('cycle')
plt.ylabel('MSE on test data')
plt.legend()
plt.show()

## Analysis

Your analysis should include what the final MSE looks like when using the random experimentalist and when using your own experimentalist.

In order to account for random effects you should run the entire closed-loop several times to be able to compute averaged MSEs over cycles as well as their standard deviation. 
Plot both within one plot.
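One possible way to aggregate repeated runs is sketched below. It assumes that you store the per-cycle MSEs of each run as one row of an array with shape `(n_runs, n_cycles)`. The helper `summarize_runs` and the example values are hypothetical placeholders, not real results.

```python
import numpy as np


def summarize_runs(mse_runs: np.ndarray):
    """Return the per-cycle mean and standard deviation over repeated runs.

    mse_runs is assumed to have shape (n_runs, n_cycles).
    """
    mse_runs = np.asarray(mse_runs, dtype=float)
    return mse_runs.mean(axis=0), mse_runs.std(axis=0)


# Illustrative placeholder values only (2 runs, 3 cycles), not real results
mse_runs_example = np.array([[0.4, 0.3, 0.2],
                             [0.6, 0.5, 0.2]])
mean_mse, std_mse = summarize_runs(mse_runs_example)
```

The mean can then be drawn with `plt.plot(mean_mse)` and the spread as a shaded band with `plt.fill_between(np.arange(len(mean_mse)), mean_mse - std_mse, mean_mse + std_mse, alpha=0.3)`, once for each experimentalist.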

Then you can do a qualitative analysis of how well single participants are matched by a single ensemble member of the theorist.

Carry out your analysis by varying the following two factors:
1. max_non_uniform_noise 
2. individual_difference_level

Both factors change the information content of specific areas of our experimental design space and thus make these areas more interesting to sample.

In [None]:
participant_id = 7

experiment.plotter(
    participant_id=participant_id,
    model=theorist_oed[0]
)

experiment.plotter(
    participant_id=participant_id,
    model=theorist_random[0]
)