Getting Started
- Requirements
- Tasks
- Agent architecture
- Human gameplay
- Stub Agent gameplay
- Validation
- Visualisations
- Unit tests
You must have a Development Environment to develop and run the code. You can use a Docker container that has already been prepared, or set it up yourself. See Development Environment for more information.
This repository contains a base agent (with stubs for some brain regions) and a set of tasks, including the competition task, DM2S. The base agent is provided as a working baseline for you to build on. You can change any part you wish, but the intention is for you to focus on adding working memory to enable the agent to solve the DM2S task.
The Agent comprises an active vision system, a pre-trained visual cortex, and other brain regions trained by Reinforcement Learning. The system architecture is described on the Architecture page.
There are four tasks in this repository, described below. The first three are validation tasks: they are not the full DM2S task, but are designed to validate various aspects of the system. The final task is DM2S, which the agent cannot solve in its current form.
The tasks are:
- Simple task - A grid world, in which the agent must move to a goal position. Then the goal moves to a new position.
- Match-to-Sample (M2S) task - An agent is shown a sample, and then given a choice of two other samples to match it. The correct match is the same as the sample in one attribute, such as colour or shape; false matches differ in that attribute. The relevant attribute varies from game to game, and the correct logic is learned by watching a tutor play the game first.
- Move-to-Light (M2L) task - A simple test case for the active vision system. See Active Vision System for more information.
- Delayed Match-to-Sample (DM2S) task - As M2S, but the sample to match is displayed briefly before the possible choices are displayed, so it must be memorized. The rules of the game
Note that M2S is a simplified version of DM2S, without a delay. The stubbed architecture solves it without active vision enabled, which makes it the main validation test. Running it is explained in the Validation section below.
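To make the difference between M2S and DM2S concrete, here is a minimal Python sketch of how a single trial could be structured. The attribute values, the Stimulus class and the make_trial and frames helpers are hypothetical illustrations, not the repository's environment code. The point it shows is that in DM2S the sample is no longer visible in the frame where the choice has to be made.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical attribute values; the real environments render images instead.
COLOURS = ("red", "green", "blue")
SHAPES = ("circle", "square", "triangle")


@dataclass(frozen=True)
class Stimulus:
    colour: str
    shape: str


def random_stimulus(rng):
    return Stimulus(rng.choice(COLOURS), rng.choice(SHAPES))


def make_trial(rule, rng):
    """Build one hypothetical match-to-sample trial.

    `rule` names the relevant attribute ("colour" or "shape"). The correct
    candidate shares that attribute with the sample; the distractor differs in it.
    """
    sample = random_stimulus(rng)
    key = getattr(sample, rule)
    values = COLOURS if rule == "colour" else SHAPES
    wrong_value = rng.choice([v for v in values if v != key])
    correct = replace(random_stimulus(rng), **{rule: key})
    distractor = replace(random_stimulus(rng), **{rule: wrong_value})
    candidates = [correct, distractor]
    rng.shuffle(candidates)
    return sample, candidates, candidates.index(correct)


def frames(sample, candidates, delay):
    """Sequence of (sample, candidates) frames shown to the agent.

    M2S (delay=0): sample and candidates are visible together.
    DM2S (delay>0): sample alone, then blank frames, then only the candidates,
    so the sample must be held in working memory.
    """
    if delay == 0:
        return [(sample, candidates)]
    return [(sample, None)] + [(None, None)] * delay + [(None, candidates)]


rng = random.Random(0)
sample, candidates, answer = make_trial("shape", rng)
print(sample, candidates, answer)
print(frames(sample, candidates, delay=2))
```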
The base agent can be configured for passive or active vision in the agent config file.
In passive vision mode, only one image of the environment is captured for the vision system, referred to by default as full.
In active vision mode, there is one image for the fovea and another for the periphery. See the Active Vision page for more information. Details on running the Active Vision validation are also below in the Validation section.
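A minimal sketch of the two observation modes, assuming a grayscale image held as a list of rows. The observation keys full, fovea and peripheral match the --env-obs-key values used for pre-training below; the foveate and observe functions, the square crop and the plain subsampling are hypothetical simplifications, not the repository's Retina or Active Vision implementation.

```python
def foveate(image, gaze_row, gaze_col, fovea_size, periphery_stride):
    """Split a 2D image (list of rows) into hypothetical fovea/peripheral views.

    The fovea is a full-resolution square crop centred on the gaze point,
    clamped so it stays inside the image. The periphery is the whole image
    subsampled by `periphery_stride` in each dimension (low resolution).
    """
    h, w = len(image), len(image[0])
    half = fovea_size // 2
    top = min(max(gaze_row - half, 0), h - fovea_size)
    left = min(max(gaze_col - half, 0), w - fovea_size)
    fovea = [row[left:left + fovea_size] for row in image[top:top + fovea_size]]
    peripheral = [row[::periphery_stride] for row in image[::periphery_stride]]
    return {"fovea": fovea, "peripheral": peripheral}


def observe(image, active_vision, gaze=(0, 0)):
    """Passive mode returns one 'full' image; active mode returns two views."""
    if not active_vision:
        return {"full": image}
    return foveate(image, gaze[0], gaze[1], fovea_size=4, periphery_stride=2)


image = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 image
print(observe(image, active_vision=True, gaze=(4, 4))["fovea"])
```

In active mode the agent can move the gaze point between steps, which is what lets a small, detailed fovea cover a large scene.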
The following commands let you play the custom environments manually. This helps you get a feel for the games and their rules.
python keyboard_agent.py simple-v0 configs/simple_env_human.json
Keys: 1-Up, 2-Right, 3-Down, 4-Left
python keyboard_agent.py m2s-v0 configs/m2s_env.json
Keys: 1-Left, 2-Right
python keyboard_agent.py dm2s-v0 configs/dm2s_env.json
Keys: 1-Left, 2-Right
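In a keyboard agent, each key press is simply translated into a discrete environment action. The sketch below shows that idea for the M2S and DM2S key bindings listed above; the KEYMAP and ACTIONS names and the action ordering are assumptions, not taken from keyboard_agent.py.

```python
# Hypothetical action ordering; the real environments may number actions differently.
ACTIONS = ("left", "right")

# Key bindings for M2S and DM2S, as listed above.
KEYMAP = {"1": "left", "2": "right"}


def key_to_action(key):
    """Return the action index for a pressed key, or None if the key is unbound."""
    name = KEYMAP.get(key)
    return None if name is None else ACTIONS.index(name)


print(key_to_action("1"), key_to_action("2"), key_to_action("q"))
```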
Before training the agent, parts of the model must be pre-trained on a custom environment without rewards. The pre-trained models (as checkpoint files) are available in the repository for both passive and active vision modes.
However, you are welcome to perform pre-training yourself, especially if you'd like to deviate from the existing models by modifying the hyperparameters.
There are two parts:
- Pre-generate some data from the environment.
- Pre-train model modules on this data.
Data is pre-generated by running an environment with a random action policy (the default). The observations are recorded for later i.i.d. sampling.
Examples:
python generate.py m2s-v0 configs/m2s_env.json 2000 ./data/gen_m2s
python generate.py dm2s-v0 configs/dm2s_env.json 2000 ./data/gen_dm2s
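The generation step can be sketched as follows. ToyEnv is a hypothetical stand-in with an old-style Gym reset()/step() interface (an assumption about the real environments), and the generate and sample_batch helpers are illustrative only; the sketch does not assume what the 2000 argument in the commands above counts.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment; observations are dicts keyed like 'full'."""

    n_actions = 2

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return {"full": self.rng.random()}

    def step(self, action):
        self.t += 1
        obs = {"full": self.rng.random()}
        done = self.t >= 10  # toy episode length
        return obs, 0.0, done, {}


def generate(env, num_steps, rng):
    """Record observations while acting with a uniformly random policy."""
    buffer = []
    obs = env.reset()
    for _ in range(num_steps):
        buffer.append(obs)
        obs, _reward, done, _info = env.step(rng.randrange(env.n_actions))
        if done:
            obs = env.reset()
    return buffer


def sample_batch(buffer, batch_size, rng):
    """Draw observations i.i.d. (uniformly, with replacement) from the buffer."""
    return rng.choices(buffer, k=batch_size)


rng = random.Random(0)
data = generate(ToyEnv(seed=0), num_steps=25, rng=rng)
print(len(data), sample_batch(data, 4, rng))
```

Recording first and sampling later breaks the strong temporal correlation between consecutive frames, which is usually what makes offline pre-training stable.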
Each module that requires pre-training should be pre-trained separately. Examples:
Full
python pretrain_visual_cortex.py --config ./configs/pretrain_full.json --env m2s-v0 --env-config ./configs/m2s_env.json --env-data-dir=./data/gen_m2s --env-obs-key=full --model-file=./data/pretrain/full.pt --epochs 7
Fovea
python pretrain_visual_cortex.py --config ./configs/pretrain_fovea.json --env dm2s-v0 --env-config ./configs/dm2s_env.json --env-data-dir=./data/gen_dm2s --env-obs-key=fovea --model-file=./data/pretrain/fovea.pt --epochs 10
Peripheral
python pretrain_visual_cortex.py --config ./configs/pretrain_peripheral.json --env dm2s-v0 --env-config ./configs/dm2s_env.json --env-data-dir=./data/gen_dm2s --env-obs-key=peripheral --model-file=./data/pretrain/peripheral.pt --epochs 10
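The three pre-training commands above differ only in the observation key, environment and epoch count. As a hypothetical convenience, they can be rebuilt from a small table in Python; each returned list can be passed to subprocess.run(cmd, check=True) to execute it. The PRETRAIN table and pretrain_command helper are illustrative, not part of the repository.

```python
import shlex

# Settings copied from the three example commands above:
# obs key -> (environment id, short env name, epochs)
PRETRAIN = {
    "full": ("m2s-v0", "m2s", 7),
    "fovea": ("dm2s-v0", "dm2s", 10),
    "peripheral": ("dm2s-v0", "dm2s", 10),
}


def pretrain_command(obs_key):
    """Build the argument list for pretrain_visual_cortex.py for one obs key."""
    env, short, epochs = PRETRAIN[obs_key]
    return [
        "python", "pretrain_visual_cortex.py",
        "--config", f"./configs/pretrain_{obs_key}.json",
        "--env", env,
        "--env-config", f"./configs/{short}_env.json",
        f"--env-data-dir=./data/gen_{short}",
        f"--env-obs-key={obs_key}",
        f"--model-file=./data/pretrain/{obs_key}.pt",
        "--epochs", str(epochs),
    ]


for key in PRETRAIN:
    print(shlex.join(pretrain_command(key)))
```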
To view the output of pretraining, examine the Tensorboard output in the ./runs directory (see the Visualisations section below).
To start Tensorboard in this folder, use a command such as:
tensorboard --logdir . --port=6008 --samples_per_plugin images=200
This step trains the Reinforcement Learning (RL) parts of the Agent on the task. It reloads the pretrained networks for the posterior cortex and other brain modules that are not trained rapidly or via RL.
Command structure:
python train_agent.py TASK_ENV TASK_ENV_CONFIG_FILE STUB_ENV_CONFIG_FILE AGENT_CONFIG_FILE
Note that STUB_ENV_CONFIG_FILE should configure and reload pretrained networks (e.g. Cortex).
AGENT_CONFIG_FILE should configure the RL-trained agent network(s).
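For reference, a parser for the documented command structure could look like this. It is a hypothetical sketch using argparse; the real train_agent.py may parse its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the four documented positional arguments."""
    p = argparse.ArgumentParser(prog="train_agent.py")
    p.add_argument("task_env", help="task environment id, e.g. m2s-v0")
    p.add_argument("task_env_config_file", help="JSON config for the task environment")
    p.add_argument("stub_env_config_file",
                   help="JSON config that configures and reloads pretrained networks")
    p.add_argument("agent_config_file",
                   help="JSON config for the RL-trained agent network(s)")
    return p.parse_args(argv)


args = parse_args(["m2s-v0", "configs/m2s_env.json",
                   "configs/stub_agent_env_full.json", "configs/stub_agent_full.json"])
print(args.task_env, args.agent_config_file)
```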
Example:
python train_agent.py m2s-v0 configs/m2s_env.json configs/stub_agent_env_full.json configs/stub_agent_full.json
This command will train the default stub agent on the full image, using a pretrained visual cortex, and learn via RL to play the M2S task.
For the agent to work effectively, the visual cortices must be pre-trained (the provided checkpoints can be used). If you don't want to load a pre-trained model, set the field "load" to null in STUB_ENV_CONFIG_FILE.
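If you prefer to do this programmatically, a small helper can null out the field. This is a sketch under the assumption that every key named "load" should be disabled; the document does not say where the field sits in the config, so the helper walks the whole structure, and the example layout is hypothetical.

```python
import json


def disable_loading(config):
    """Return a copy of `config` with every "load" field set to None (JSON null)."""
    if isinstance(config, dict):
        return {k: (None if k == "load" else disable_loading(v))
                for k, v in config.items()}
    if isinstance(config, list):
        return [disable_loading(v) for v in config]
    return config


def write_without_loading(src_path, dst_path):
    """Copy a JSON config file with all "load" fields nulled."""
    with open(src_path) as f:
        config = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(disable_loading(config), f, indent=2)


example = {"cortex": {"load": "./data/pretrain/full.pt"}}  # hypothetical layout
print(json.dumps(disable_loading(example)))  # {"cortex": {"load": null}}
```

Writing to a separate file keeps the original config, which loads the pre-trained checkpoints, available for later runs.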
This section describes steps to validate the functionality of the provided base agent, and that the architecture as a whole can solve tasks. First, play the game manually (step 1). Then train the base agent, which has a view of the full image, via RL on the M2S task (step 2).
python keyboard_agent.py m2s-v0 configs/m2s_env.json
python train_agent.py m2s-v0 configs/m2s_env.json configs/stub_agent_env_full.json configs/stub_agent_full.json
Example output:
353: reward -5.86/ 0.19/ 5.98 len 100.00
The format of the message is: [step]: reward [min]/ [mean]/ [max] len [# measurements per epoch]. The rewards should optimize to around 0/4.5/6.
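To track progress outside Tensorboard, the console lines can be parsed. This regular expression is a hypothetical helper written against the two example lines in this document (the train_agent.py line above and the train_simple_agent.py line in the Unit tests section); for lines with a trailing "saved ..." suffix, only the part up to the len field is parsed.

```python
import re

# Matches "353: reward -5.86/ 0.19/ 5.98 len 100.00" and
# "71 reward 0.00/ 1.87/ 4.00 len 389.30 saved ..." (the colon is optional).
LOG_LINE = re.compile(
    r"^(?P<step>\d+):?\s+reward\s+"
    r"(?P<min>-?[\d.]+)/\s*(?P<mean>-?[\d.]+)/\s*(?P<max>-?[\d.]+)"
    r"\s+len\s+(?P<len>[\d.]+)"
)


def parse_log_line(line):
    """Return step, min/mean/max reward and len as numbers, or None if no match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    d = m.groupdict()
    return {"step": int(d["step"]),
            **{k: float(d[k]) for k in ("min", "mean", "max", "len")}}


print(parse_log_line("353: reward -5.86/ 0.19/ 5.98 len 100.00"))
```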
An example screenshot from Tensorboard showing plots of these quantities is shown below.
Follow the steps on the Active Vision System page.
Tensorboard is the primary tool for visualisation, monitoring and debugging. Summary files are written to disk during code execution, by default to the ./runs folder. Run Tensorboard from the command line; it serves a web page GUI (by default at localhost:6006) for inspecting the values that have been written, including scalar plots, images and more.
The agent training and pre-training scripts write summaries. ActiveVision and Retina summaries can be enabled via their respective config files.
These tests verify the performance of the provided stubs.
python train_simple_agent.py simple-v0 configs/simple_env_machine.json configs/simple_agent_model.json
Example output:
71 reward 0.00/ 1.87/ 4.00 len 389.30 saved tmp/simple/checkpoint_71/checkpoint-71
The rewards are min/mean/max per epoch. It should optimize to around 0/4/5.
Test the operation of the Retina on an example image with the following command. The test creates matplotlib plots that are shown in a GUI and saved to the file results.png.
python tests/test_retina.py resources/screen_example.jpg
Visualise the function of positional encoding with the following test. The output is an image, positional_encoding.png.
python tests/test_positional_encoding.py
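As a point of comparison with what the test plots, here is the standard sinusoidal positional encoding popularised by Transformer models. Whether the repository uses this exact scheme is an assumption; the positional_encoding function below is an illustrative sketch in plain Python.

```python
import math


def positional_encoding(num_positions, dim):
    """Sinusoidal encoding: for position p and index j, with i = j // 2,
    even j uses sin(p / 10000^(2i/dim)) and odd j uses cos of the same angle."""
    pe = []
    for p in range(num_positions):
        row = []
        for j in range(dim):
            angle = p / (10000 ** (2 * (j // 2) / dim))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


for row in positional_encoding(4, 6):
    print(["%.3f" % v for v in row])
```

Low indices oscillate quickly with position and high indices slowly, so each position gets a distinct pattern that varies smoothly between neighbours.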