Aegis Package

Welcome to the Aegis (Active Evaluator Germane Interactive Selector) package!

Full documentation of aegis, including the rendered API docs from sphinx, is pre-built in the docs/_build directory. The main file is docs/_build/index.html and can be opened locally.

See CHANGELOG.md for version release notes.

A Quick Start is provided below, as well as instructions for how to run the testing, code formatting checks, and documentation rendering.

Contact

Please contact aegis@nist.gov with questions, comments, feedback, or issues.

Contributors

Contributors to this code repository:

  • Peter Fontana (NIST)
  • Jesse Zhang (NIST)
  • Craig Greenberg (NIST)
  • Hung-Kung Liu (NIST)

Quick Start

Here is a quick start to get up and running with the aegis package. Right now, aegis is a python package with the following subcomponents:

  • acteval: the active evaluation interactive script
  • oracle: the implementation of an oracle that provides acteval with the annotations

The quick start uses pre-generated data in the data/test directory with a few small files to show a full execution of the Controller.

Installing aegis

The code is pip-installable as the python package aegis. To install it, run in this directory

pip install .

or, to upgrade an existing installation, run

pip install -U .

It is important to use pip install . and not pip install aegis, because an unrelated aegis package exists on PyPI, resulting in a name collision.

Call AEGIS in python with

import aegis

or import its submodules or specific classes as desired. The example below imports the module so that the controller can be called.

import aegis.acteval.controller

Installation check

To check the installation, we do an end-to-end run of a very simple example with 10,000 trials, located in the test data directory data/test/sae_test_1. Of the 10,000 trials, 5000 have key 0 and 5000 have key 1. There are two systems. The first system was generated by drawing each score from a normal distribution whose mean is the key value and whose standard deviation is 0.3, and it uses a threshold of 0.5. The second system is a no-information system that returns 0 for every trial. This example is examined in more detail in the various testing modules, as it is one of the examples used in the test module.
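
The synthetic data just described can be reproduced with a short numpy sketch (the seed here is arbitrary, not necessarily the one used to build the shipped files):

```python
import numpy as np

# Regenerate data in the style of data/test/sae_test_1:
# 5000 trials with key 0 followed by 5000 trials with key 1.
rng = np.random.RandomState(5314)
keys = np.repeat([0, 1], 5000)

# System s1: each score drawn from a normal whose mean is the key, sd 0.3
s1_scores = rng.normal(loc=keys, scale=0.3)

# System s2: no-information system returning 0 for every trial
s2_scores = np.zeros(keys.shape)

# Both systems use a decision threshold of 0.5 for binary accuracy
threshold = 0.5
s1_acc = ((s1_scores >= threshold).astype(int) == keys).mean()
s2_acc = ((s2_scores >= threshold).astype(int) == keys).mean()
print(s1_acc, s2_acc)  # s1 is near 0.95; s2 is exactly 0.5
```

With sd 0.3 and threshold 0.5, s1's expected accuracy is Φ(0.5/0.3) ≈ 0.952, matching the actual score of 0.9521 reported by the Oracle below; s2 is correct on exactly the half of the trials whose key is 0.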

First, import the various aegis modules in a python environment with

import aegis.acteval.data_processor
import aegis.acteval.strata
import aegis.acteval.metrics
import aegis.acteval.samplers
import aegis.oracle.oracle
import aegis.acteval.controller
import aegis.acteval.experiment

Then run the code below in a python interpreter. If necessary, update the paths specified by input_dir and key_fpath.

# Aegis modules are imported in previous code block
import numpy as np
desired_seed = 5314
np.random.seed(seed=desired_seed)
rng = np.random.RandomState(desired_seed)
input_dir = "data/test/sae_test_1"
key_fpath = "data/test/sae_test_1/key.csv"
my_data_processor = aegis.acteval.data_processor.DataProcessor()
my_ordering = ["s1", "s2"]
init_fpath, trial_data_fpath, system_fpaths, threshold_fpaths = \
    my_data_processor.extract_files_from_directory(input_dir, my_ordering)
num_strata = 4
strata_type = aegis.acteval.strata.StrataMultiSystemIntersectDecision
my_metric = aegis.acteval.metrics.BinaryAccuracyMetric()
num_success_rounds_required = 3
num_step_samples = 100
my_alpha = 0.10
my_delta = 0.20
my_oracle = aegis.oracle.oracle.OracleScript(key_fpath)
my_experiment = aegis.acteval.experiment. \
    ExperimentParams(num_step_samples=num_step_samples,
                     alpha=my_alpha, delta=my_delta,
                     num_success_rounds_required=num_success_rounds_required,
                     num_strata=num_strata, stratification_type=strata_type,
                     metric_object=my_metric,
                     sampler_type=aegis.acteval.samplers.AdaptiveTrialSampler,
                     bin_style="equal")
my_controller = aegis.acteval.controller.Controller()
my_report = my_controller.run(init_fpath, trial_data_fpath,
                              system_fpaths, threshold_fpaths,
                              my_oracle, my_experiment, rng=rng)
print("Experiment Complete! My report:")
print(my_report)
system_accuracy_list = my_oracle.get_actual_score_all_systems(system_fpaths, threshold_fpaths,
                                                              my_metric)
print("Actual System Scores on full key:")
print(system_accuracy_list)                        

The alpha and delta parameters are quite high so that the code terminates successfully after only a few rounds, which takes a few seconds to a few minutes on a single desktop computer. We pass all of the parameters to the ExperimentParams object explicitly, but every argument has a default value that can be used for convenience.

When complete, we print the summary report. The current summary report obtained with v2019.04.22 of aegis is:

Experimental Parameters:
	100 samples per round with 3 successful rounds required, alpha=0.1, delta=0.2.
	Takes 4 strata with stratification type <class 'aegis.acteval.strata.StrataMultiSystemIntersectDecision'> using bin style equal.
	Uses metric object <aegis.acteval.metrics.BinaryAccuracyMetric object at 0x1242ea710>.
	Uses sampler type <class 'aegis.acteval.samplers.AdaptiveTrialSampler'>.
	Did request initial samples for initial coverage. Requested 200 samples requested.
Total number of rounds: 3, requiring a total sample of 500 trials.
Requested 128 additional initial trials, In addition to the 72 trials provided by init_df.
Ended with 2 non-empty stratum.System s1
	Score: 0.9579614895557186 +/- 0.017769905184605195
	Score Variance (standard error squared): 7.944221230895364e-05
	Number of counted sampled trials: 500 out of 10000 countable trials.
System s2
	Score: 0.5018832022423086 +/- 0.017769905184605195
	Score Variance (standard error squared): 7.944221230895372e-05
	Number of counted sampled trials: 500 out of 10000 countable trials.

And the output of the actual system accuracies from the Oracle

Actual System Scores on full key:
[0.9521, 0.499]

Description of the Example data/test/sae_test_1

This example is the test data directory data/test/sae_test_1.

It has 10,000 trials: 5000 of those trials have key 0, and 5000 have key 1. There are two systems. The first system was generated by drawing each score from a normal distribution whose mean is the key value and whose standard deviation is 0.3, and it uses a threshold of 0.5. The second system is a no-information system that returns 0 for every trial.

The stratification specified stratifies by the first system only, and uses only the first system's scores as the benchmark for continuation or termination. The system ordering is specified so that the primary system is 's1', which is the system that uses random normal samples to provide scores.

The metric is accuracy for a binary classifier, and the metric thresholds each system's scores according to that system's own (possibly different) threshold.
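
As an illustrative sketch of that metric (the function name and tiny example values are hypothetical, not aegis's implementation):

```python
def binary_accuracy(scores, keys, threshold, key_values=(0, 1)):
    """Fraction of trials where thresholding the score recovers the key.

    key_values is (low_key_value, high_key_value): scores at or above the
    threshold map to the positive (high) class, the rest to the negative (low).
    """
    low, high = key_values
    decisions = [high if s >= threshold else low for s in scores]
    return sum(d == k for d, k in zip(decisions, keys)) / len(keys)

# The same scores can yield different accuracies under per-system thresholds
print(binary_accuracy([0.2, 0.7, 0.9, 0.4], [0, 1, 1, 0], threshold=0.5))  # 1.0
print(binary_accuracy([0.2, 0.7, 0.9, 0.4], [0, 1, 1, 1], threshold=0.6))  # 0.75
```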

We leverage the pre-built Oracle, which, when provided the entire key, handles the interaction with the acteval module for us, allowing an easily automated procedure.

Running Experiments with Aegis when Ground Truth is Known

This section describes how to run experiments with aegis and points to examples within the code for the case when we have the ground truth. This is useful to simulate our software both as a check that the estimates are at the specified confidence levels and to see how many fewer labels are needed to evaluate systems.

Sample experiment script files are in the experiment_scripts subdirectory and its subfolders. The main method to run experiments is the aegis.oracle.oracle.OracleScript.run_experiment() static method. This method requires experiment parameters and paths (relative or absolute) to the data directory, the key, and the output directory. Additionally, it requires a batch_id and a run_id. The batch_id allows experiments to be grouped together for further analysis. The run_id should be unique to each run in the same batch, meaning that every experiment should be uniquely identified by (batch_id, run_id).

An example experiment is in experiment_scripts/sample/sae_test_1_README_experiment.py. The file can be run directly with

cd experiment_scripts/sample
python sae_test_1_README_experiment.py

It runs, without parallelization, three iterations of the Installation check above, as well as a similar run involving only the single system "s1".

If one wishes to profile the experiment to see how long it takes, one can do so with the cProfile tool and then visualize the results with the snakeviz tool. After installing those optional packages, run

cd experiment_scripts/sample
python -m cProfile -o sae_test_1_README.prof sae_test_1_README_experiment.py

to profile the code and output the profiler results to the command-specified file sae_test_1_README.prof, and

cd experiment_scripts/sample
snakeviz sae_test_1_README.prof

to visualize the results in a web browser.

Submission Input

Each submission is specified as a filepath to the folder containing the submission.

The required files are:

  • <system_id>_outputs.csv. A two-column csv of (trial_id, score), where each row gives a trial, labeled by id, and that system's belief value in the score column. There is one file per system, labelled with the system_id as <system_id>. Each system must have a score for every trial_id, meaning that these files must be complete. A discrepancy in the trials for different systems will cause errors.
  • <system_id>_thresholds.csv. For Accuracy, Precision, Recall, and any other metrics that require a decision, it is a one-row csv with the threshold value, headed with the intended metric name. The header is ignored when reading in thresholds and can be any string. There is one file per system, labelled with the system_id as <system_id>.
  • key.csv. A two-column csv of (trial_id, key) that gives the key for every trial, specified by id. Keys can be integers or strings. When calling the metric, the key values will be specified as the key_values parameter in the construction as [low_key_value, high_key_value], where the low_key_value is interpreted as the negative class and the high_key_value is interpreted as the positive class. Every trial, labelled by id, must be in this file, and likewise every trial must be in each <system_id>_outputs.csv file.
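
A minimal sketch of writing the required files for one system (the helper name, header rows, and example values are illustrative assumptions, not part of aegis; the source does not state whether the outputs and key files carry a header row):

```python
import csv
import os
import tempfile

def write_submission(folder, system_id, outputs, threshold, metric_name, key):
    """Write <system_id>_outputs.csv, <system_id>_thresholds.csv, and key.csv."""
    with open(os.path.join(folder, f"{system_id}_outputs.csv"), "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["trial_id", "score"])   # header row is an assumption
        w.writerows(outputs)
    with open(os.path.join(folder, f"{system_id}_thresholds.csv"), "w", newline="") as f:
        w = csv.writer(f)
        w.writerow([metric_name])           # header is ignored when read
        w.writerow([threshold])
    with open(os.path.join(folder, "key.csv"), "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["trial_id", "key"])     # header row is an assumption
        w.writerows(key)

folder = tempfile.mkdtemp()
write_submission(folder, "s1",
                 outputs=[("t1", 0.91), ("t2", 0.12)],
                 threshold=0.5, metric_name="accuracy",
                 key=[("t1", 1), ("t2", 0)])
print(sorted(os.listdir(folder)))
# ['key.csv', 's1_outputs.csv', 's1_thresholds.csv']
```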

The optional files are:

  • init.csv. A two-column csv of (trial_id, key) that is a full or partial set of initial labels, where each trial, specified by id, has its key. Keys can be integers or strings. When calling the metric, the key values will be specified as the key_values parameter in the construction as [low_key_value, high_key_value], where the low_key_value is interpreted as the negative class and the high_key_value is interpreted as the positive class. Note that key.csv is required when using the aegis.oracle.oracle.OracleScript class, which requires a complete key file; it is optional for other classes.

See data/test/sae_test_1 as the example submission that we have been running experiments on.

Experiment Output

The experiments are grouped by <batch_id> in a folder, and there is a <batch_id>.log logfile in the root directory. Within the <batch_id> folder is one folder for each run, labeled by the <run_id>. Within each run folder there are currently four files:

  • experimental_perams.txt. A text file with the copy of all experimental parameters.
  • git_commit_hash.txt. A text file providing the git commit hash of the git commit used to run the experiments, providing a version of the code according to the git repository.
  • summary_of_results.csv. A summary csv file where each row is a different experimental setting of sampling strategies and stratification strategies, and each value is an aggregate result that includes the average number of samples taken and the percentage of the time the actual estimate was contained within both the confidence interval and the specified uncertainty range delta. (These are sometimes not the same because the confidence interval, when finished, is sometimes smaller than the specified delta.)
  • individual_run_results.csv. One row per run. Each run is identified by the iteration number and all of the experiment parameters. Results include the number of samples taken, whether the actual scores of all systems were contained within the specified confidence intervals, and whether the actual scores of all systems were contained within the specified uncertainty range delta. (These are sometimes not the same because the confidence interval, when finished, is sometimes smaller than the specified delta.)
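
The aggregation from individual runs to the summary can be sketched as follows (the column names and toy values here are assumptions for illustration, not the actual schema of individual_run_results.csv):

```python
import csv
import io

# Toy rows in the spirit of individual_run_results.csv: one row per run,
# with the sample count and whether all systems' actual scores fell
# inside the confidence interval and inside the delta range.
toy_csv = """iteration,num_samples,within_ci,within_delta
1,500,True,True
2,600,True,False
3,550,False,False
"""

rows = list(csv.DictReader(io.StringIO(toy_csv)))
avg_samples = sum(int(r["num_samples"]) for r in rows) / len(rows)
ci_coverage = sum(r["within_ci"] == "True" for r in rows) / len(rows)
delta_coverage = sum(r["within_delta"] == "True" for r in rows) / len(rows)
print(avg_samples, ci_coverage, delta_coverage)  # 550.0 0.666... 0.333...
```

Note how the CI coverage and delta coverage differ on run 2, which is exactly the discrepancy the parenthetical above describes.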

See readme_experiment_outputs for the example outputs from the experiments we ran. These experiments ran 3 iterations per setting to illustrate the output format.

Code Approach

The code in this repo is meant to run experiments.

  1. First, datasets are generated, with their ground truth.
  2. Second, we use the aegis.oracle.oracle.OracleScript that takes in the key as well as the data directory and feeds the relevant samples to aegis.acteval when the controller requests them.
  3. We specify the experimental parameters, which are the required and optional arguments to the aegis.acteval.controller.Controller.run() method.
  4. We store the outputs.

Parameters are specified both by specifying arguments and by providing different implementations of key abstract classes. The two abstract classes to implement are aegis.acteval.strata.Strata and aegis.acteval.metrics.Metric. Although there are different samplers available, the desired sampler types have already been implemented and the desired sampler need only be chosen.

Random sampling can be imitated by setting num_strata = 1.

Class parameters include:

  • alpha
  • delta
  • num_strata

Stratification

All stratification strategies support multiple systems. Even the stratification classes aegis.acteval.strata.StrataFirstSystem and aegis.acteval.strata.StrataFirstSystemDecision will sample and evaluate multiple systems. These stratification classes stratify according to the first system in the system_ordering parameter but then sample and evaluate all systems specified in system_ordering.

To evaluate only some systems of a submission, specify the system_ordering list to contain only the system id's of the desired systems to evaluate. Any system whose id is not in that parameter will not be evaluated during the run.
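
The stratify-by-the-first-system idea can be sketched in a self-contained way (the real aegis Strata classes are more involved; the function name and equal-width binning here are illustrative assumptions):

```python
import numpy as np

def stratify_by_first_system(scores_by_system, system_ordering, num_strata):
    """Assign each trial to one of num_strata equal-width score bins,
    using only the first system in system_ordering."""
    first = np.asarray(scores_by_system[system_ordering[0]], dtype=float)
    edges = np.linspace(first.min(), first.max(), num_strata + 1)
    # Interior edges split the score range into num_strata bins;
    # every trial gets a stratum label, regardless of the other systems.
    return np.digitize(first, edges[1:-1])

scores = {"s1": [0.05, 0.2, 0.55, 0.9],   # primary system's scores
          "s2": [0.0, 0.0, 0.0, 0.0]}     # ignored for stratification
strata = stratify_by_first_system(scores, ["s1", "s2"], num_strata=4)
print(strata)  # [0 0 2 3]
```

Even though only s1's scores define the strata, all systems listed in system_ordering would be sampled and evaluated within those strata.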

Running Continuous Integration Components Locally

The Continuous Integration (CI) runs the test suite, generates rendered API documentation, and also checks the code for formatting using a lint code tool. These components can all be run locally, and instructions are below.

Testing

We have a test suite built on the pytest package, with code coverage measured by the coverage package. Both can be installed with pip.

The following commands run all of the unit tests and output a code coverage report to htmlcov/index.html:

coverage run --branch --source=./aegis -m pytest -s tests/ -v
coverage report -m
coverage html

Code Formatting

The CI uses flake8 to check code formatting with the command

flake8 aegis tests --max-line-length=100 --exclude=docs,./.*

Documentation

To build the documentation with sphinx and autodoc, run

pip install -U -e .
sphinx-apidoc -fMeT -o docs/api aegis
sphinx-build -av --color -b html docs docs/_build

to generate the docs. The first command is needed for sphinx to recognize the aegis module.

See the Sphinx Installation Documentation for more information on how to install Sphinx. You will also need the m2r package, which is required to build this documentation.

LICENSE

The license is documented in the LICENSE file and on the NIST website.

Disclaimer

Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment mentioned are necessarily the best available for the purpose. All copyrights and trademarks are properties of their respective owners.
