Welcome to the Aegis (Active Evaluator Germane Interactive Selector) package!
Full documentation of aegis, including the rendered API docs from sphinx, is pre-built in the docs/_build directory. The main file is docs/_build/index.html and can be opened locally.
See CHANGELOG.md for version release notes.
A Quick Start is provided below, as well as instructions for how to run the testing, code formatting checks, and documentation rendering.
Please contact aegis@nist.gov with questions, comments, feedback, or issues.
Contributors to this code repository:
- Peter Fontana (NIST)
- Jesse Zhang (NIST)
- Craig Greenberg (NIST)
- Hung-Kung Liu (NIST)
Here is a quick start to get up and running with the aegis package. Right now, aegis is a python package with these subcomponents:
- acteval: the active evaluation interactive script
- oracle: the implementation of an oracle that provides acteval the annotations
The quick start uses pre-generated data in the data/test directory with a few small files to
show a full execution of the Controller.
The code is pip-installable as the python package aegis. To install it, run in this directory

```
pip install .
```

or, to upgrade your installation with the package, run

```
pip install -U .
```

It is important to use `pip install .` and not `pip install aegis` because there is an unrelated aegis package available on pypi, resulting in a collision of names.
Call AEGIS in python with

```python
import aegis
```

or import its subfolders or specific classes as desired. The example below imports the file so that the controller can be called.

```python
import aegis.acteval.controller
```

To check the installation, we do an end-to-end run with a very simple example with 10,000 trials.
This example is the test data directory data/test/sae_test_1. It has 10,000 trials, where
5000 of those trials have key 0, and 5000 of those trials have key 1. There are two systems. The
first system was generated by producing a score as a sample from a random normal distribution
whose mean is the key value and the standard deviation of 0.3 and uses a threshold of 0.5.
The second system is a no information system that returns 0 for every value.
This example is examined in more detail in the various testing modules, as it is one of the
examples used in the test module.
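Given that description, the two systems can be re-created with the standard library alone. This is an illustrative sketch, not the actual script used to generate the shipped test data, but with the stated distribution and threshold the first system lands near 0.95 accuracy and the no-information system at exactly 0.5:

```python
import random

random.seed(5314)

# 10,000 trials: 5000 with key 0 and 5000 with key 1.
keys = [0] * 5000 + [1] * 5000

# System 1: score sampled from Normal(mean=key, sd=0.3), decision threshold 0.5.
s1_scores = [random.gauss(key, 0.3) for key in keys]
s1_decisions = [1 if s >= 0.5 else 0 for s in s1_scores]

# System 2: no-information system that returns 0 for every trial.
s2_scores = [0.0] * len(keys)
s2_decisions = [1 if s >= 0.5 else 0 for s in s2_scores]

def accuracy(decisions, keys):
    return sum(d == k for d, k in zip(decisions, keys)) / len(keys)

print(accuracy(s1_decisions, keys))  # close to 0.95
print(accuracy(s2_decisions, keys))  # exactly 0.5: right on key-0 trials, wrong on key-1
```

This matches the actual scores reported by the Oracle later in this quick start (about 0.95 and about 0.5).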
First, import the various aegis modules in a python environment with

```python
import aegis.acteval.data_processor
import aegis.acteval.strata
import aegis.acteval.metrics
import aegis.acteval.samplers
import aegis.oracle.oracle
import aegis.acteval.controller
import aegis.acteval.experiment
```

Then run the code below in a python interpreter. If necessary, update the paths specified by `input_dir` and `key_fpath`.
```python
# Aegis modules are imported in previous code block
import numpy as np

desired_seed = 5314
np.random.seed(seed=desired_seed)
rng = np.random.RandomState(desired_seed)
input_dir = "data/test/sae_test_1"
key_fpath = "data/test/sae_test_1/key.csv"
my_data_processor = aegis.acteval.data_processor.DataProcessor()
my_ordering = ["s1", "s2"]
init_fpath, trial_data_fpath, system_fpaths, threshold_fpaths = \
    my_data_processor.extract_files_from_directory(input_dir, my_ordering)
num_strata = 4
strata_type = aegis.acteval.strata.StrataMultiSystemIntersectDecision
my_metric = aegis.acteval.metrics.BinaryAccuracyMetric()
num_success_rounds_required = 3
num_step_samples = 100
my_alpha = 0.10
my_delta = 0.20
my_oracle = aegis.oracle.oracle.OracleScript(key_fpath)
my_experiment = aegis.acteval.experiment.\
    ExperimentParams(num_step_samples=num_step_samples,
                     alpha=my_alpha, delta=my_delta,
                     num_success_rounds_required=num_success_rounds_required,
                     num_strata=num_strata, stratification_type=strata_type,
                     metric_object=my_metric,
                     sampler_type=aegis.acteval.samplers.AdaptiveTrialSampler,
                     bin_style="equal")
my_controller = aegis.acteval.controller.Controller()
my_report = my_controller.run(init_fpath, trial_data_fpath,
                              system_fpaths, threshold_fpaths,
                              my_oracle, my_experiment, rng=rng)
print("Experiment Complete! My report:")
print(my_report)
system_accuracy_list = my_oracle.get_actual_score_all_systems(system_fpaths,
                                                              threshold_fpaths,
                                                              my_metric)
print("Actual System Scores on full key:")
print(system_accuracy_list)
```

The alpha and delta parameters are quite high so that the code terminates successfully after only a few rounds, which will take a few seconds to a few minutes on a single desktop computer. We feed all of the parameters to the Experiment object, but every argument has a default value that can be used for convenience.
When complete, we print the summary report. The current summary report obtained with v2019.04.22 of aegis is:

```
Experimental Parameters:
100 samples per round with 3 successful rounds required, alpha=0.1, delta=0.2.
Takes 4 strata with stratification type <class 'aegis.acteval.strata.StrataMultiSystemIntersectDecision'> using bin style equal.
Uses metric object <aegis.acteval.metrics.BinaryAccuracyMetric object at 0x1242ea710>.
Uses sampler type <class 'aegis.acteval.samplers.AdaptiveTrialSampler'>.
Did request initial samples for initial coverage. Requested 200 samples requested.
Total number of rounds: 3, requiring a total sample of 500 trials.
Requested 128 additional initial trials, In addition to the 72 trials provided by init_df.
Ended with 2 non-empty stratum.
System s1
Score: 0.9579614895557186 +/- 0.017769905184605195
Score Variance (standard error squared): 7.944221230895364e-05
Number of counted sampled trials: 500 out of 10000 countable trials.
System s2
Score: 0.5018832022423086 +/- 0.017769905184605195
Score Variance (standard error squared): 7.944221230895372e-05
Number of counted sampled trials: 500 out of 10000 countable trials.
```

And the output of the actual system accuracies from the Oracle is:

```
Actual System Scores on full key:
[0.9521, 0.499]
```
The stratification specified stratifies by the first system only, and uses only the first system's scores as the benchmark for continuation or termination. The system ordering is specified so that the primary system is 's1', which is the system that uses random normal samples to provide scores.
The metric is accuracy for a binary classifier, and the metric thresholds scores according to the (different) threshold for each system.
We leverage the pre-built Oracle, which, when provided the entire key, handles the interaction with the acteval module for us, allowing for an easily automated procedure.
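As a rough mental model of the Oracle's role (a simplified toy, not the actual aegis.oracle.oracle.OracleScript implementation), an oracle backed by a complete key simply answers annotation requests by trial id:

```python
import csv
import io

class ToyKeyOracle:
    """Simplified stand-in for an OracleScript-style oracle:
    holds the full key and answers annotation requests by trial_id."""

    def __init__(self, key_file):
        reader = csv.reader(key_file)
        next(reader)  # skip the (trial_id, key) header row
        self.key = {trial_id: label for trial_id, label in reader}

    def annotate(self, trial_ids):
        # The controller asks for labels of sampled trials;
        # the oracle looks them up in the complete key.
        return [self.key[t] for t in trial_ids]

# Toy key.csv content with hypothetical trial ids and labels.
key_csv = io.StringIO("trial_id,key\nt1,0\nt2,1\nt3,0\n")
oracle = ToyKeyOracle(key_csv)
print(oracle.annotate(["t2", "t3"]))  # ['1', '0']
```

Because the oracle holds the entire key, the controller can query it round after round with no human in the loop, which is what makes the procedure easy to automate.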
This section describes how to run experiments with aegis and points to examples within the code for the case when we have the ground truth. This is useful to simulate our software both as a check that the estimates are at the specified confidence levels and to see how many fewer labels are needed to evaluate systems.
Sample experiment script files are in the experiment_scripts subdirectory or in subfolders.
The main method to run experiments is the aegis.oracle.oracle.OracleScript.run_experiment() static method.
This method requires experiment parameters, and paths (relative or absolute) to the
data directory, the key, and the output directory to be provided. Additionally, there is a
requirement for a batch_id and a run_id. The batch_id allows experiments to be grouped
together for further additional analysis. run_id should be unique to each run in the same batch,
meaning that every experiment should be uniquely identified by (batch_id, run_id).
An example experiment is in experiment_scripts/sample/sae_test_1_README_experiment.py. The file can be run purely with

```
cd experiment_scripts/sample
python sae_test_1_README_experiment.py
```

It runs, without parallelization, three iterations of the installation check above as well as a similar run involving only the single system "s1".
If one wishes to profile the experiment to see how long it takes, one can do so using the cProfile tool and then visualize the results with the snakeviz tool. After installing those optional packages, run

```
cd experiment_scripts/sample
python -m cProfile -o sae_test_1_README.prof sae_test_1_README_experiment.py
```

to profile the code and output the profiler results to the command-specified file sae_test_1_README.prof, and

```
cd experiment_scripts/sample
snakeviz sae_test_1_README.prof
```

to visualize the results in a web browser.
Each submission is specified as a filepath to the folder containing the submission.
The required files are:
- <system_id>_outputs.csv. A two-column csv of (trial_id, score) where each row is a trial labeled by id and that system's belief value in the score column. There is one file per system, labelled with the system_id as <system_id>. Each system must have a score for every trial_id, meaning that these files must be complete. A discrepancy in the trials for different systems will cause errors.
- <system_id>_thresholds.csv. For Accuracy, Precision, Recall, and any other metrics that require a decision, it is a one-row csv with the threshold value, headed with the intended metric name. The header is ignored when reading in thresholds and can be any string. There is one file per system, labelled with the system_id as <system_id>.
- key.csv. A two-column csv of (trial_id, key) that is a full file of (trial_id, key), where each trial specified by id has the key. Keys can be integers or strings. When calling the metric, the key values will be specified as the `key_values` parameter in the construction as `[low_key_value, high_key_value]`, where `low_key_value` is interpreted as the negative class and `high_key_value` is interpreted as the positive class. Every trial, labelled by id, must be in this file and likewise every trial must be in each <system_id>_outputs.csv file.
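As a sketch of this layout, the snippet below generates a tiny hypothetical submission with the standard library. The file names follow the convention above, though whether the csv files carry header rows is an assumption here (check data/test/sae_test_1 for the exact format):

```python
import csv
import tempfile
from pathlib import Path

submission = Path(tempfile.mkdtemp())

# Three toy trials shared by the key and the system (files must be complete).
trials = [("t1", 0.91, 1), ("t2", 0.12, 0), ("t3", 0.77, 1)]

# s1_outputs.csv: (trial_id, score), one row per trial for system s1.
with open(submission / "s1_outputs.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["trial_id", "score"])
    writer.writerows((tid, score) for tid, score, _ in trials)

# s1_thresholds.csv: a one-row csv with the decision threshold;
# the header string is read but ignored.
with open(submission / "s1_thresholds.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["BinaryAccuracyMetric"])
    writer.writerow([0.5])

# key.csv: (trial_id, key) for every trial.
with open(submission / "key.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["trial_id", "key"])
    writer.writerows((tid, key) for tid, _, key in trials)

print(sorted(p.name for p in submission.iterdir()))
# ['key.csv', 's1_outputs.csv', 's1_thresholds.csv']
```

A multi-system submission simply repeats the outputs and thresholds files once per `<system_id>`.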
The optional files are:
- init.csv. A two-column csv of (trial_id, key) that is a full or partial file of (trial_id, key), where each trial specified by id has the key. Keys can be integers or strings and are interpreted in the same way as in key.csv, via the `key_values` parameter `[low_key_value, high_key_value]`. The key.csv is required when using the `aegis.oracle.oracle.OracleScript` class, which requires a complete key file. It is optional for other classes.
See data/test/sae_test_1 as the example submission that we have been running experiments on.
The experiments are grouped by <batch_id> in a folder, and there is a <batch_id>.log logfile
in the root directory. Within the <batch_id> folder is one folder for each run, labeled by the
<run_id>. Within each run there are currently four files:
- experimental_perams.txt. A text file with the copy of all experimental parameters.
- git_commit_hash.txt. A text file providing the git commit hash of the git commit used to run the experiments, providing a version of the code according to the git repository.
- summary_of_results.csv. A summary csv file where each row is a different experimental setting of sampling strategies and stratification strategies, and each value is an aggregate result that includes the average number of samples taken and the percentage of the time the actual estimate was contained within both the confidence interval and the specified uncertainty range delta. (These are sometimes not the same because the confidence interval, when finished, is sometimes smaller than the specified delta.)
- individual_run_results.csv. One row per run. Each run is identified by the iteration number and all of the experiment parameters. Results include the number of samples taken, whether the actual scores of all systems were contained within the specified confidence intervals, and whether the actual scores of all systems were contained within the specified uncertainty range delta. (These are sometimes not the same because the confidence interval, when finished, is sometimes smaller than the specified delta.)
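The structure described above can be sketched as follows; the batch and run ids are made up, and the files are created empty purely to illustrate the layout:

```python
import tempfile
from pathlib import Path

output_root = Path(tempfile.mkdtemp())

batch_id, run_ids = "demo_batch", ["run_1", "run_2"]

# <batch_id>.log in the root, one <batch_id>/<run_id>/ folder per run,
# each holding the four result files described above.
(output_root / f"{batch_id}.log").touch()
for run_id in run_ids:
    run_dir = output_root / batch_id / run_id
    run_dir.mkdir(parents=True)
    for name in ("experimental_perams.txt",
                 "git_commit_hash.txt",
                 "summary_of_results.csv",
                 "individual_run_results.csv"):
        (run_dir / name).touch()

print(sorted(p.relative_to(output_root).as_posix()
             for p in output_root.rglob("*") if p.is_file()))
```

Since every run is keyed by (batch_id, run_id), reusing a run_id within the same batch would collide, which is why each run in a batch must have a unique run_id.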
See readme_experiment_outputs as the example outputs from the experiments we ran. These experiments ran 3 iterations per setting to illustrate the output format.
The code in this repo is meant to run experiments.
- First, datasets are generated, with their ground truth.
- Second, we use the `aegis.oracle.oracle.OracleScript`, which takes in the key as well as the data directory and feeds the relevant samples to `aegis.acteval` when the controller calls.
- Third, we specify the experimental parameters, which are the non-optional and optional arguments to the `aegis.acteval.controller.Controller.run()` method.
- Fourth, we store the outputs.
Parameters are specified both by specifying arguments and by providing different implementations
of key abstract classes. The two abstract classes to implement are aegis.acteval.strata.Strata
and aegis.acteval.metrics.Metric. Although there are different samplers available, the desired
sampler types have already been implemented and the desired sampler need only be chosen.
Random sampling can be imitated by setting `num_strata = 1`.
Class parameters include:
- alpha
- delta
- num_strata
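As a rough illustration of how alpha and delta interact, the sketch below computes the half-width of a two-sided (1 - alpha) normal-approximation confidence interval for an accuracy estimate and checks whether it fits within delta. This is an assumption about the flavor of the check (suggested by the +/- bounds in the report above), not the exact stratified computation aegis performs:

```python
import math
from statistics import NormalDist

def ci_halfwidth(p_hat, n, alpha):
    """Half-width of a two-sided (1 - alpha) normal-approximation
    confidence interval for an accuracy estimate p_hat from n labels."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. ~1.645 for alpha=0.10
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

alpha, delta = 0.10, 0.20
for n in (50, 100, 500):
    hw = ci_halfwidth(0.95, n, alpha)
    # Termination-style check: is the interval narrower than delta?
    print(n, round(hw, 4), hw <= delta)
```

Smaller alpha widens the interval (higher confidence demanded) and smaller delta tightens the target, so either change tends to require more sampled labels before the run can terminate.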
All stratification strategies support multiple systems. Even the stratification class
aegis.acteval.strata.StrataFirstSystem and aegis.acteval.strata.StrataFirstSystemDecision
will sample and evaluate multiple systems. These stratification classes stratify according
to the first system in the system_ordering parameter but then sample and evaluate all systems
specified in the system_ordering.
To evaluate only some systems of a submission, specify the system_ordering list to contain
only the system id's of the desired systems to evaluate. Any system whose id is not in that
parameter will not be evaluated during the run.
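A toy sketch of that selection (a re-implementation for illustration only, not the actual aegis.acteval.data_processor code): only files whose <system_id> appears in the ordering are returned, in the order given:

```python
import tempfile
from pathlib import Path

def select_system_files(input_dir, system_ordering):
    # Keep only <system_id>_outputs.csv files whose id is listed,
    # preserving the order of system_ordering.
    input_dir = Path(input_dir)
    return [input_dir / f"{sid}_outputs.csv" for sid in system_ordering
            if (input_dir / f"{sid}_outputs.csv").exists()]

# A submission directory with three systems, but we evaluate only two.
d = Path(tempfile.mkdtemp())
for sid in ("s1", "s2", "s3"):
    (d / f"{sid}_outputs.csv").touch()

picked = select_system_files(d, ["s3", "s1"])  # s2 is skipped entirely
print([p.name for p in picked])  # ['s3_outputs.csv', 's1_outputs.csv']
```

Note that the first entry in the ordering also determines the primary system used by the first-system stratification classes described above.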
The Continuous Integration (CI) runs the test suite, generates rendered API documentation, and checks the code formatting with a lint tool. These components can all be run locally; instructions are below.
We have a test suite with the pytest package and code coverage with coverage. This requires the packages coverage and pytest, both of which can be installed with pip.
The following commands run all of the unit tests and output code coverage into htmlcov/index.html:

```
coverage run --branch --source=./aegis -m pytest -s tests/ -v
coverage report -m
coverage html
```

The CI uses flake8 to check the code formatting with the command

```
flake8 aegis tests --max-line-length=100 --exclude=docs,./.*
```

To build the documentation with sphinx and autodoc, run

```
pip install -U -e .
sphinx-apidoc -fMeT -o docs/api aegis
sphinx-build -av --color -b html docs docs/_build
```

to generate the docs. The first command is needed for sphinx to recognize the aegis module.
See the Sphinx Installation Documentation for more information on how to install Sphinx. You will also need the m2r package, which is a requirement of this package.
The license is documented in the LICENSE file and on the NIST website.
Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment mentioned are necessarily the best available for the purpose. All copyrights and trademarks are properties of their respective owners.