This repository contains code sufficient to generate the datasets and
experimental setup used in our AAAI 2018 paper, but is not sufficient to
reproduce the experimental results (see the Repository vs. paper code
section below).

Hay N, Stark M, Schlegel A, Wendelken C, Park D, Purdy E, Silver T, Phoenix DS, George D (2018) Behavior is Everything – Towards Representing Concepts with Sensorimotor Contingencies. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana. (PDF, Supplemental information)
Create and activate the virtual environment:

```shell
make venv
source venv/bin/activate
```
Make sure you have `ffmpeg` installed if you plan to run the full experiment pipeline (see the Demo section below). On Ubuntu:

```shell
sudo apt install ffmpeg
```

or on Mac:

```shell
brew install ffmpeg
```
## Demo

Test pixelworld in interactive mode. This demo provides samples of the classification environments used in the actual experiment, in which the agent is embodied as a red pixel that moves using the "UP", "DOWN", "LEFT", and "RIGHT" actions. The agent must investigate the environment in order to determine which of two classes is represented (for instance, whether a container or a non-container is present), and then terminate the episode with a signal action ("SIG0" or "SIG1") to indicate its choice. Note that, while the entire environment is visible in interactive mode (on the left), the actual experiments provided only a 3 x 3 window around the agent as its observation (on the right).

```shell
python pixelworld/demo.py demo=interactive
```
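The episode structure described above can be sketched as a standard Gym-style interaction loop. This is a hedged illustration: only the action names and the terminate-by-signal protocol come from the description above; the `reset`/`step` interface and the environment object itself are assumptions.

```python
# Sketch of one classification episode: the agent moves around until it
# emits a signal action (SIG0/SIG1), which ends the episode with its
# class choice. The env interface here is an assumed Gym-style API.
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT", "SIG0", "SIG1"]

def run_episode(env, policy):
    """Roll out one episode and return the total reward."""
    obs = env.reset()          # obs is the 3 x 3 window around the agent
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)   # an index into ACTIONS
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward
```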
Run the experiment to train and test agents in a pixelworld environment, using RLLab for reinforcement learning. You can use either the example code:

```shell
python pixelworld/demo.py demo=experiment
```

or our more general experimentation framework (see the Repository vs. paper code section below):

```shell
python pixelworld/run.py experiments/demo
```

If you run with the general framework, output from the experiments can be found within the experiment's output directory (see the Experiment output section below). Note that this is meant only to illustrate how to use PixelWorld environments for RL; the experimental setup is not tuned for performance.
Interact with and run an experiment with a custom dataset:

```shell
python pixelworld/demo.py demo=custom_interactive
python pixelworld/demo.py demo=custom_experiment
```
Explore other environments from the library. While the environments presented in the paper used a minimal subset of pixelworld's available features (see the Code organization section below), an extended library of example worlds exists that demonstrates the rich potential of pixelworld to generate concept-based reinforcement learning environments. NOTE that these environments are still in a beta stage and are not supported:

```shell
python pixelworld/envs/pixelworld/demo.py boxandblocks
python pixelworld/envs/pixelworld/demo.py pixelpack
python pixelworld/envs/pixelworld/demo.py help_or_hinder stage=1
python pixelworld/envs/pixelworld/demo.py pixelzuma
python pixelworld/envs/pixelworld/demo.py hanoi
```
To list all available library worlds:

```python
from pixelworld.envs.pixelworld import library
print(library.menu())
```
## Experiment output

When running experiments by invoking:

```shell
python pixelworld/run.py experiments/<filepath>
```

output files are written for each experiment, including:

- `gym_test/*.mp4`: videos recorded during training and testing,
- a joblib-encoded dictionary containing the RLLab algorithm, baseline, environment, and policy for each iteration,
- a joblib-encoded dictionary containing:
  - `gt_labels`: the ground-truth labels for each of the test environments,
  - `steps`: the total number of timesteps the policy took for each repeat of each test environment,
  - `final_rewards`: the reward of the final step for each repeat of each test environment, and
  - `correct_class`: whether the policy made the correct classification for each repeat of each test environment,
- a joblib-encoded dictionary including information about the results of the `<num>`th test environment:
  - `avg_image`: average over repeats, maximum over timesteps, of a visualization of the environment state,
  - `actions`: a list, for each repeat, of one-hot encodings of the action taken at each timestep,
  - `prob`: a list, for each repeat, of the probability vector over actions at each timestep,
- `rllab.csv`: output from rllab during training, both as written to standard output and encoded in CSV format.
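As one illustration, the test-results dictionary described above could be summarized like this. This is a sketch: only the keys (`gt_labels`, `steps`, `final_rewards`, `correct_class`) come from the list above; the nested per-environment/per-repeat layout and the loading path are assumptions.

```python
# Sketch: summarize test results from the joblib-encoded dictionary
# described above. Assumes steps/correct_class are nested lists, one
# inner list of repeats per test environment (an assumption).
# To load the real file: import joblib; results = joblib.load(path)

def summarize(results):
    """Return (classification accuracy, mean episode length) over all
    repeats of all test environments."""
    flat_correct = [c for env in results["correct_class"] for c in env]
    flat_steps = [s for env in results["steps"] for s in env]
    accuracy = sum(flat_correct) / float(len(flat_correct))
    mean_steps = sum(flat_steps) / float(len(flat_steps))
    return accuracy, mean_steps
```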
For further details of this output, see the corresponding functions in the source code.
## Code organization

Some notes on code organization:

- `run.py` is the main script for running experiments specified by metaexperiment files. It calls functions in `run_policy.py` to perform the experiment, and functions in `expcfg.py` to interpret the metaexperiment file.
- `concept_csp.py` is called to generate datasets from logical expressions for the concept and generating expressions. It uses the function `concept_to_csps` to convert logical expressions into Constraint Satisfaction Problems (CSPs), which itself calls code in `concept_parser.py` to parse the logical expression and code in `csp.py` to implement constraints in the CSP.
- `pattern_generator.py` generates patterns for different object types (e.g., containers), and `scene_annotation.py` annotates those patterns with information needed to compute relations (e.g., annotating the inside of a container).
- `envs/pixelworld/` contains code for the full implementation of pixelworld, and `envs/minimal_pixelworld.py` the more computationally efficient minimal implementation used in the AAAI18 experiments.
- `envs/modules.py` contains code for the construction of modular environments, in which an environment is composed of multiple modules. For example, a modular environment might consist of a module that interfaces with pixelworld, a module that implements a classification reward, a module that implements signaling, and a module that implements the local observation window.
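The modular-environment pattern can be sketched as a chain of modules, each wrapping the one below it and transforming its step result. The class names and interface below are hypothetical illustrations of the composition idea, not the actual `envs/modules.py` API.

```python
# Hypothetical sketch of module composition: a core world module wrapped
# by a signaling module and a local-observation-window module.

class PixelWorldModule:
    """Core module: holds the grid, the agent position, and the label."""
    def __init__(self, grid, agent_pos, label):
        self.grid = grid
        self.agent_pos = agent_pos
        self.label = label

    def step(self, action):
        # movement logic omitted; return the full grid, no reward
        return self.grid, 0.0, False

class SignalingModule:
    """Ends the episode with a +1/-1 classification reward on SIG actions."""
    def __init__(self, inner):
        self.inner = inner

    def step(self, action):
        if action in ("SIG0", "SIG1"):
            correct = int(action[-1]) == self.inner.label
            return self.inner.grid, (1.0 if correct else -1.0), True
        return self.inner.step(action)

class LocalWindowModule:
    """Crops the observation to the 3 x 3 window around the agent."""
    def __init__(self, inner, core):
        self.inner = inner
        self.core = core

    def step(self, action):
        obs, reward, done = self.inner.step(action)
        r, c = self.core.agent_pos
        window = [row[max(c - 1, 0):c + 2] for row in obs[max(r - 1, 0):r + 2]]
        return window, reward, done

# Compose: pixelworld interface + signaling + local observation window
core = PixelWorldModule([[0] * 5 for _ in range(5)], (2, 2), label=1)
env = LocalWindowModule(SignalingModule(core), core)
```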
## Repository vs. paper code

The code in this repository is a reorganized version of the code used in the experiments presented in our paper. Two changes in particular are worth noting:

- The code contains only what is needed to train the flat smc-base policies.
- After the paper's acceptance, we discovered a bug in our MinimalPixelWorld implementation that caused the self object to occupy both its previous and its new locations for one step after an environment reset. We do not believe that this affected our results in any meaningful way: it occurred uniformly across all conditions and across the training and testing datasets, it rarely appeared to the agent because of the agent's limited view of the environment, and our own testing indicated that it had minimal, if any, effect on agent performance. The bug is fixed in this repository. However, the original behavior can be restored by setting the `do_aaai18_reset_behavior` parameter of the MinimalPixelWorldModule to `True`, e.g. by adding it as an additional kwarg in the `/experiments/aaai18/COMMON` metaexperiment configuration file.
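The effect of the bug can be shown with a toy grid. This is purely illustrative pseudo-logic, not the MinimalPixelWorld code: it only demonstrates what "occupying both its previous and its new locations for one step" looks like.

```python
# Toy illustration of the reset bug described above: with the AAAI18
# behavior, the stale pre-reset position of the self pixel lingers for
# one step after reset, so the agent briefly occupies two cells at once.

def render(agent_positions, size=4):
    """Render a size x size grid with a 1 at each occupied agent cell."""
    grid = [[0] * size for _ in range(size)]
    for r, c in agent_positions:
        grid[r][c] = 1
    return grid

def reset(old_pos, new_pos, do_aaai18_reset_behavior=False):
    """Return the agent cells visible on the first frame after reset."""
    if do_aaai18_reset_behavior:
        # buggy: the pre-reset position is still drawn for one step
        return [old_pos, new_pos]
    return [new_pos]  # fixed: only the new position is occupied

fixed = render(reset((0, 0), (2, 2)))
buggy = render(reset((0, 0), (2, 2), do_aaai18_reset_behavior=True))
```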