## Setup

Install the dependencies and make sure the Python version is 3.10 or later. Restart the kernel if needed.

In [None]:
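# Assumes a requirements.txt file in the working directory; adjust the path if it lives elsewhere.
%pip install -r requirements.txt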
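The version requirement can be checked from inside the notebook before going further. The snippet below is a minimal sketch; the helper name `python_is_supported` is made up for illustration and is not part of CODEX.

```python
import sys

def python_is_supported(version_info=sys.version_info, minimum=(3, 10)):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum

# Warn rather than raise, so the rest of the notebook can still be inspected.
if not python_is_supported():
    print("Python 3.10 or newer is expected; found", sys.version.split()[0])
```

If the warning prints, switch the notebook to a newer interpreter and restart the kernel.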

In [None]:
import sys
import os

import pandas as pd
import json
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import ipywidgets as widgets

sys.path.append('..')
import codex
import src.output as output
import tutorial_materials.build.tutorial_functions as tutorials

with open('tutorial_materials/input/demo_input-performance_by_interaction_rp.json') as f:
    codex_input = json.load(f)
    output_path, strengths = codex.input_parser.define_experiment_variables(codex_input)

## Performance by Interaction
Performance by interaction computes combinatorial coverage over a training set and, at the same time, aggregates per-sample performance by interaction: for each strength t specified, every interaction in the data is scored from the performance of the samples that contain it.

The goal of this experiment is to identify which interactions in the test set contain the samples on which the model performs best.

Consider this example abstract dataset containing feature columns "A", "B", "C", "D" as well as a unique sample ID column and label column(s). Multiple strengths can be provided in one input file to obtain a set of $SDCC_t$ outputs as well as $CC_t$ outputs. 
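To make the idea concrete, the aggregation can be sketched in plain Python. This is an illustrative sketch only, not the CODEX implementation: the function `performance_by_interaction`, the toy samples, and the choice of the mean as the aggregate are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

def performance_by_interaction(samples, performance, features, t):
    """Aggregate per-sample performance for every t-way interaction.

    samples:     dict mapping sample ID -> dict of feature -> level
    performance: dict mapping sample ID -> numeric score
    Returns a dict mapping each observed interaction, written as a tuple of
    (feature, level) pairs, to its sample count and mean score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sample_id, row in samples.items():
        for feature_combo in combinations(features, t):
            interaction = tuple((f, row[f]) for f in feature_combo)
            totals[interaction] += performance[sample_id]
            counts[interaction] += 1
    return {k: {"count": counts[k], "mean": totals[k] / counts[k]} for k in counts}

# Toy data, invented for illustration only.
samples = {
    "s1": {"A": "a1", "B": "b1", "C": "c1"},
    "s2": {"A": "a1", "B": "b1", "C": "c2"},
    "s3": {"A": "a2", "B": "b1", "C": "c2"},
}
performance = {"s1": 1.0, "s2": 0.5, "s3": 0.0}

result = performance_by_interaction(samples, performance, ["A", "B", "C"], t=2)
```

Each sample contributes its score to every t-way interaction it contains, so a single poorly performing sample lowers the mean of several interactions at once.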

### Input file

The input file defines all the necessary components for the experiment configuration.

In [None]:
with open('tutorial_materials/input/demo_input-dataset_split_evaluation.json') as f:
    codex_input = json.load(f)
    output_path, strengths = codex.input_parser.define_experiment_variables(codex_input)

display(codex_input)

The cell below displays the dataset, the binning scheme, and the strengths t used in this experiment.

In [None]:
dataset_full = pd.read_csv(os.path.join('tutorial_materials', 'datasets_tabular', 'abstract_native.csv')).drop('Unnamed: 0', axis=1)
display("DATASET", dataset_full.head())

with open(os.path.join('tutorial_materials', 'binning', 'bins-abstract.txt')) as f:
    print("BINNING SCHEME:")
    print(f.read())

print("\nSTRENGTHS:")
print("t =",strengths)


### Universe
With the dataset and a binning file, a universe that describes the input space with all of its features and levels can be defined.

In [None]:
universe, dataset, features = codex.codex_universe.define_input_space(codex_input)
display(universe)
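The size of the input space a universe describes can be worked out by hand. The sketch below assumes a universe can be represented as a plain mapping from feature name to its list of levels; the real object returned by `define_input_space` may be structured differently, and the toy levels are invented.

```python
from itertools import combinations
from math import prod

# Hypothetical universe: feature -> list of levels (after binning).
universe = {
    "A": ["a1", "a2"],
    "B": ["b1", "b2", "b3"],
    "C": ["c1", "c2"],
}

def count_possible_interactions(universe, t):
    """Count the t-way interactions the universe allows.

    For each combination of t features, the number of interactions is the
    product of the features' level counts.
    """
    return sum(
        prod(len(universe[f]) for f in combo)
        for combo in combinations(sorted(universe), t)
    )

for t in (1, 2, 3):
    print("t =", t, "possible interactions:", count_possible_interactions(universe, t))
```

This count is the denominator against which coverage is measured: a dataset can only be as complete as the universe it is compared to.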

### Split and Performance

Each sample that a trained machine learning model operates on takes values not only for its individual features but also for the interactions between those features. Measuring performance on combinations of features can shed light on how a model behaves beyond single feature-value pairs. This requires per-sample performance, because each sample has its own set of t-way feature-value combinations. Observe the format of the performance file with per-sample performance:

In [None]:
# Extract the split, the per-sample performance, and the metric from the input
split_dict, performance_dict, metric = codex.input_parser.extract_sp(codex_input)

display(performance_dict)
#output.output_json_readable(performance_dict, write_json=True, 
#                            file_path=os.path.join('tutorial_materials', 'performance', 'performance_0{}_ps.json'.format(0)))   

### Experiment Run

With a dataset, values of t, a means of defining the universe, and split and performance files, performance by interaction can be run. Note: performance by interaction requires per-sample performance in the correct format.

In [None]:
results_full = codex.run(codex_input, subset='')

### Experiment Results

In [None]:
interaction_info = codex.output.results_parser.consolidated_interaction_info(results_full, strengths, metric, 'ascending descending', 5)
#output.output_json_readable(interaction_info, print_json=True)
print("Top performing interactions of training set, t=3:", interaction_info[3]['top interactions'])
print("Bottom performing interactions of training set, t=3:", interaction_info[3]['bottom interactions'])

### Experiment Results: Visualizations

1) Performance and proportion frequency heatmaps

The performance-based nature of this experiment calls for the inclusion of new combinatorial plots.

In [None]:
# locate plot

2) Sorted high/low performing interactions.

From the results of performance by interaction, we can identify and visualize the differences between the top- and bottom-performing interactions for each strength t.

In [None]:
img = Image.open(os.path.join(output_path, 'pxi_performing-t3.png'))
display(img)

3) Performance vs. standardized count proportions

Performance by interaction also includes an analysis of how interactions perform relative to their counts, that is, the number of times they appear across the samples in the training set. Counts are preserved in the results. This analysis examines whether interactions that appear more frequently in the data perform better.

In [None]:
print("COUNTS, t=3:", results_full['3']['combination counts'])
img = Image.open(os.path.join(output_path, 'pxi_performance_vs_freq-2.png'))
display(img)
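One simple way to quantify the relationship in the plot is a correlation between each interaction's share of the counts and its mean performance. The sketch below is an assumption about how such an analysis could be done, not a description of what CODEX computes: "standardized" is read here as dividing each count by the total, and the numbers are invented.

```python
from math import sqrt

def to_proportions(counts):
    """Scale raw interaction counts so that they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented example: four interactions with their counts and mean scores.
counts = [10, 20, 30, 40]
mean_performance = [0.2, 0.4, 0.6, 0.8]

proportions = to_proportions(counts)
r = pearson(proportions, mean_performance)
print("correlation between frequency and performance:", round(r, 3))
```

A coefficient near 1 would suggest that frequent interactions perform better, while a value near 0 would suggest that frequency alone does not explain performance.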

## B) Model Probing

Because performance by interaction reports results per interaction and describes performance in the combinatorial space, it can also support an additional experiment, the model probe. For red-teaming applications...

In [None]:
img = Image.open(os.path.join(output_path, 'dataset_split_comparison_2.png'))
display(img)

### Example on RarePlanes

Dataset evaluation can be run on any dataset as long as a tabular dataset and the features to compute over are provided. This can be demonstrated on the RarePlanes dataset.

RarePlanes is an open source dataset consisting of real and synthetic images. The real portion consists of 253 satellite images split into 8,525 tiles. Tabular metadata is provided with the imagery, which makes this a case where CODEX can be used to explore a dataset that is not tabular in its original form.

Below is one such tile, along with its associated metadata. The original sample and its metadata are linked through the tile ID (`image_tile_id`).


In [None]:
rareplanes_df = pd.read_csv(os.path.join('tutorial_materials', 'datasets_tabular', 'RarePlanes_Metadata_Augmented_Processed_localized-tiled-controlpt.csv')).drop(['Unnamed: 0', 'Unnamed: 0.1'], axis=1)
rareplanes_df.index = rareplanes_df.image_tile_id

img = Image.open(os.path.join('..', 'assets', '98_104001000F15D300_tile_177.png'))

display(rareplanes_df.loc['98_104001000F15D300_tile_177'])
display(rareplanes_df[rareplanes_df.index == '98_104001000F15D300_tile_177'])
display(img)

display(rareplanes_df.head())


In [None]:
codex_input['dataset_name'] = 'RarePlanes'
codex_input['dataset_file'] = 'RarePlanes_Metadata_Augmented_Processed_localized-tiled-controlpt.csv'
codex_input['features'] = ['Hour_of_Day', 'Season', 'avg_sun_elevation_angle', 'off_nadir_max', 'avg_pan_resolution', 'biome', 'CONTROL']
codex_input['sample_id_column'] = 'image_tile_id'
codex_input['bin_file'] = "tutorial_materials/binning/bins-rareplanes_complete.txt"
results_rareplanes = codex.run(codex_input)
#output.output_json_readable(codex_input, print_json=True)



The frequency coverage map for t=2 on RarePlanes is shown below.

In [None]:
img = Image.open(os.path.join(output_path, 'CC', 'CC_t2_RarePlanes_frequency_all.png'))
display(img)

Dataset evaluation is a useful mode for diagnosing a dataset's overall coverage and for seeing which interactions in a model's potential input space are present and which are deficient.
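The notion of coverage can be illustrated with a small worked example. The sketch below assumes the usual reading of combinatorial coverage as the fraction of possible t-way interactions that appear at least once in the data; the function name, the dictionary-based universe, and the toy rows are hypothetical and do not reflect CODEX internals.

```python
from itertools import combinations, product

def combinatorial_coverage(universe, rows, t):
    """Return (coverage, missing) for strength t.

    universe: dict mapping feature -> list of levels
    rows:     list of dicts mapping feature -> level
    coverage: fraction of possible t-way interactions present in rows
    missing:  set of possible interactions that never appear
    """
    feature_combos = list(combinations(sorted(universe), t))
    possible = {
        tuple(zip(combo, levels))
        for combo in feature_combos
        for levels in product(*(universe[f] for f in combo))
    }
    present = {
        tuple((f, row[f]) for f in combo)
        for row in rows
        for combo in feature_combos
    }
    missing = possible - present
    return len(possible & present) / len(possible), missing

# Toy data, invented for illustration only.
universe = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
rows = [
    {"A": "a1", "B": "b1"},
    {"A": "a2", "B": "b1"},
    {"A": "a1", "B": "b2"},
]

cc, missing = combinatorial_coverage(universe, rows, t=2)
print("coverage:", cc)
print("missing interactions:", missing)
```

The missing set is the practical output: it lists the regions of the input space for which the dataset provides no samples at all.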