
Getting Started

Yoshiki Onodera edited this page Nov 21, 2021 · 33 revisions

Table of Contents

  1. Requirements
  2. Overview
  3. Tasks
  4. Vision Systems
  5. Human gameplay
  6. Stub Agent gameplay
  7. Validation
  8. Visualisations
  9. Unit tests

Requirements

You must have a Development Environment to develop and run the code. You can use a Docker container that has already been prepared, or set it up yourself. See Development Environment for more information.

Overview

This repository contains a base agent (with stubs for some brain regions) and a set of tasks, including the competition task, DM2S. The base agent is provided as a working baseline for you to build on. You can change any part you wish, but the intention is for you to focus on adding working memory to enable the agent to solve the DM2S task.

The Agent comprises an active vision system, a pre-trained visual cortex, and other brain regions trained by Reinforcement Learning. The system architecture is described on the Architecture page.

Tasks

There are 4 tasks included in this repository, described below. The first 3 of them are validation tasks. They do not solve the full DM2S task, but are designed to validate various aspects of the system. The final task is DM2S, which the agent cannot solve in its current form.

The tasks are:

  1. Simple task - A grid world in which the agent must move to a goal position, after which the goal moves to a new position.
  2. Match-to-Sample (M2S) task - An agent is shown a sample and then given a choice of two other samples, one of which matches it. The correct match is the same as the sample in one attribute, such as colour or shape, while the false match differs in that attribute. The relevant attribute varies from game to game, and the correct logic is learned by watching a tutor play the game first.
  3. Move-to-Light (M2L) task - A simple test case for the active vision system. See Active Vision System for more information.
  4. Delayed Match-to-Sample (DM2S) task - As M2S, but the sample is displayed only briefly, before the possible choices appear, so it must be memorised. The rules of the game
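The M2S rule described above can be sketched in Python as follows. This is a hypothetical illustration, not the repository's environment: the attribute values (`COLOURS`, `SHAPES`), the `Stimulus` class and `make_trial` are invented for the example, and index 0 is assumed to mean the left choice. The correct choice shares the relevant attribute with the sample and the false choice differs in it; the other attribute is random, so only the relevant one is informative.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical attribute values; the real environment's stimuli may differ.
COLOURS = ("red", "green", "blue")
SHAPES = ("circle", "square", "triangle")
VALUES = {"colour": COLOURS, "shape": SHAPES}


@dataclass(frozen=True)
class Stimulus:
    colour: str
    shape: str


def random_stimulus(rng):
    return Stimulus(rng.choice(COLOURS), rng.choice(SHAPES))


def make_trial(attribute, rng):
    """Build one hypothetical M2S trial for the given relevant attribute.

    Returns (sample, (left, right), correct) where correct is 0 (left) or 1 (right).
    """
    sample = random_stimulus(rng)
    target = getattr(sample, attribute)
    # Correct match: same value as the sample in the relevant attribute.
    match = replace(random_stimulus(rng), **{attribute: target})
    # False match: any other value in the relevant attribute.
    other = rng.choice([v for v in VALUES[attribute] if v != target])
    foil = replace(random_stimulus(rng), **{attribute: other})
    if rng.random() < 0.5:
        return sample, (match, foil), 0
    return sample, (foil, match), 1
```

For DM2S, the same trial would be presented in two phases: the sample first, then the choices after it has disappeared, which is why the agent must hold the sample in memory.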

Note that M2S is a simplified version of DM2S, without a delay. The stubbed architecture solves M2S without active vision enabled, which makes it the main validation test. Running it is explained in the Validation section below.

Vision Systems

The base agent can be configured for passive or active vision in the agent config file. In passive vision mode, only one image of the environment is captured for the vision system, referred to by default as "full". In active vision mode, there are two images: one for the fovea and one for the periphery. See the Active Vision page for more information. Details on running the Active Vision validation are in the Validation section below.
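To illustrate the difference between the two modes, here is a minimal pure-Python sketch that builds a passive observation (`full`) or an active one (`fovea` plus `peripheral`, the keys used with `--env-obs-key` in the pre-training commands) from a greyscale image stored as a list of rows. The crop-and-downsample scheme, the gaze point, the sizes and the `observe` helper are all assumptions for illustration; the repository's Active Vision System may build these images differently.

```python
def fovea_crop(image, cx, cy, size):
    """Full-resolution size x size crop centred on (cx, cy), clamped to the image bounds."""
    h, w = len(image), len(image[0])
    half = size // 2
    x0 = min(max(cx - half, 0), w - size)
    y0 = min(max(cy - half, 0), h - size)
    return [row[x0:x0 + size] for row in image[y0:y0 + size]]


def periphery_downsample(image, factor):
    """Low-resolution view of the whole image: mean of each factor x factor block."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [image[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def observe(image, active, gaze=(0, 0), fovea_size=4, factor=4):
    """Hypothetical observation builder: one image in passive mode, two in active mode."""
    if not active:
        return {"full": image}
    cx, cy = gaze
    return {
        "fovea": fovea_crop(image, cx, cy, fovea_size),
        "peripheral": periphery_downsample(image, factor),
    }
```

The fovea keeps fine detail only where the agent is looking, while the periphery covers the whole scene at low resolution; moving the gaze point is what makes the vision "active".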

Human gameplay

These commands let you play the custom environments manually, which helps you get a feel for the games and their rules.

Simple RL task

python keyboard_agent.py simple-v0 configs/simple_env_human.json

Keys: 1-Up, 2-Right, 3-Down, 4-Left

Match to Sample task

python keyboard_agent.py m2s-v0 configs/m2s_env.json

Keys: 1-Left, 2-Right

Delayed Match to Sample task

python keyboard_agent.py dm2s-v0 configs/dm2s_env.json

Keys: 1-Left, 2-Right

Stub Agent gameplay

Before training the agent, parts of the model must be pre-trained on a custom environment without rewards. Pre-trained models (checkpoint files) for both passive and active vision modes are available in the repository.

However, you are welcome to perform pre-training yourself, especially if you'd like to deviate from the existing models by modifying the hyperparameters.

There are two parts:

  1. Pre-generate some data from the environment.
  2. Pre-train model modules on this data.

Pre-generating data

Data is pre-generated by running an environment with a (by default) random action policy. The observations are recorded for later iid sampling.
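A minimal sketch of this recipe, assuming a hypothetical environment with `reset()` and `step(action)` methods (`ToyEnv` stands in for the real task, and `generate` and `sample_batch` are illustrative names; the repository's `generate.py` may use a different interface): a uniform-random policy drives the environment, every observation is recorded, and training batches are later drawn uniformly with replacement so that samples are iid.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a task environment with a reset()/step() interface."""

    def __init__(self, n_actions=2, episode_len=5, seed=0):
        self.n_actions = n_actions
        self.episode_len = episode_len
        self._rng = random.Random(seed)
        self._t = 0

    def reset(self):
        self._t = 0
        return {"full": self._rng.random()}

    def step(self, action):
        self._t += 1
        done = self._t >= self.episode_len
        return {"full": self._rng.random()}, 0.0, done  # reward is not used here


def generate(env, num_samples, rng):
    """Record num_samples observations while acting with a uniform-random policy."""
    observations = [env.reset()]
    while len(observations) < num_samples:
        action = rng.randrange(env.n_actions)  # random action policy
        obs, _reward, done = env.step(action)
        observations.append(obs)
        if done and len(observations) < num_samples:
            observations.append(env.reset())
    return observations


def sample_batch(observations, batch_size, rng):
    """Draw an iid minibatch: uniform sampling with replacement over all recorded steps."""
    return rng.choices(observations, k=batch_size)
```

Sampling across the whole recording breaks the strong correlation between consecutive frames of an episode, which is the point of storing the data first rather than training directly on the stream.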

Examples:

Match to Sample

python generate.py m2s-v0 configs/m2s_env.json 2000 ./data/gen_m2s

Delayed Match to Sample

python generate.py dm2s-v0 configs/dm2s_env.json 2000 ./data/gen_dm2s

Pre-training modules

Each module that requires pre-training should be pre-trained separately. Examples:

Passive Vision (used for Match to Sample validation)

python pretrain_visual_cortex.py --config ./configs/pretrain_full.json --env m2s-v0 --env-config ./configs/m2s_env.json --env-data-dir=./data/gen_m2s --env-obs-key=full --model-file=./data/pretrain/full.pt --epochs 7

Active Vision (required for Delayed Match to Sample)

Fovea

python pretrain_visual_cortex.py --config ./configs/pretrain_fovea.json --env dm2s-v0 --env-config ./configs/dm2s_env.json --env-data-dir=./data/gen_dm2s --env-obs-key=fovea --model-file=./data/pretrain/fovea.pt --epochs 10

Peripheral

python pretrain_visual_cortex.py --config ./configs/pretrain_peripheral.json --env dm2s-v0 --env-config ./configs/dm2s_env.json --env-data-dir=./data/gen_dm2s --env-obs-key=peripheral --model-file=./data/pretrain/peripheral.pt --epochs 10
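The two active vision commands above differ only in the observation key, so they can be scripted. The sketch below rebuilds exactly those arguments and runs them with Python's `subprocess`; `pretrain_command` and `main` are hypothetical helpers, and by default `main` only prints the commands (a dry run) rather than executing them.

```python
import shlex
import subprocess
import sys


def pretrain_command(obs_key, env="dm2s-v0", env_config="./configs/dm2s_env.json",
                     data_dir="./data/gen_dm2s", epochs=10):
    """Rebuild the pretrain_visual_cortex.py command shown above for one observation key."""
    return [
        sys.executable, "pretrain_visual_cortex.py",
        "--config", f"./configs/pretrain_{obs_key}.json",
        "--env", env,
        "--env-config", env_config,
        f"--env-data-dir={data_dir}",
        f"--env-obs-key={obs_key}",
        f"--model-file=./data/pretrain/{obs_key}.pt",
        "--epochs", str(epochs),
    ]


def main(dry_run=True):
    """Pre-train the fovea and peripheral cortices in turn (only print the commands if dry_run)."""
    for key in ("fovea", "peripheral"):
        cmd = pretrain_command(key)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run


if __name__ == "__main__":
    main()
```

For the passive cortex, `pretrain_command("full", env="m2s-v0", env_config="./configs/m2s_env.json", data_dir="./data/gen_m2s", epochs=7)` reproduces the Passive Vision command.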

To view the output of pre-training, examine the Tensorboard output in the ./runs directory (see the Visualisations section below).

To start Tensorboard in this folder, use a command such as:

tensorboard --logdir . --port=6008 --samples_per_plugin images=200

Training the RL agent

This step trains the Reinforcement Learning (RL) parts of the agent on the task. It reloads the pretrained networks for the posterior cortex and other brain modules that are not trained rapidly or via RL.

Command structure: python train_agent.py TASK_ENV TASK_ENV_CONFIG_FILE STUB_ENV_CONFIG_FILE AGENT_CONFIG_FILE

Note that STUB_ENV_CONFIG_FILE should configure and reload pretrained networks (e.g. Cortex).

AGENT_CONFIG_FILE should configure the RL-trained agent network(s).

Example:

python train_agent.py m2s-v0 configs/m2s_env.json configs/stub_agent_env_full.json configs/stub_agent_full.json

This command will train the default stub agent on the full image, using a pretrained visual cortex, and learn via RL to play the M2S task.

To work effectively, the agent needs pre-trained visual cortices, either the checkpoints provided in the repository or ones you pre-train yourself. If you don't want to load a pre-trained model, set the field "load" to null in STUB_ENV_CONFIG_FILE.
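If the "load" field is nested inside the config, a small script can null it. This sketch assumes "load" may appear at any nesting level of the JSON and sets every occurrence to `null`; the helper names are hypothetical, and it writes to a new file so the original config is kept.

```python
import json


def disable_pretrained_loading(config):
    """Return a copy of config with every "load" field set to None (written as JSON null)."""
    if isinstance(config, dict):
        return {key: (None if key == "load" else disable_pretrained_loading(value))
                for key, value in config.items()}
    if isinstance(config, list):
        return [disable_pretrained_loading(item) for item in config]
    return config


def rewrite_config(src_path, dst_path):
    """Write a modified copy of a STUB_ENV_CONFIG_FILE to dst_path, leaving the original intact."""
    with open(src_path) as f:
        config = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(disable_pretrained_loading(config), f, indent=2)
```

For example, `rewrite_config("configs/stub_agent_env_full.json", "configs/stub_agent_env_full_noload.json")` (the output name is hypothetical) produces a file that can be passed as STUB_ENV_CONFIG_FILE. If only some modules should start untrained, edit those "load" fields by hand instead.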

Validation

Base Agent validation

This section describes steps that validate that the provided base agent functions correctly and that the architecture as a whole can solve tasks. First, play the game manually (step 1). Then train the base agent, which sees the full image, via RL on the M2S task (step 2).

  1. python keyboard_agent.py m2s-v0 configs/m2s_env.json
  2. python train_agent.py m2s-v0 configs/m2s_env.json configs/stub_agent_env_full.json configs/stub_agent_full.json

Example output: 353: reward -5.86/ 0.19/ 5.98 len 100.00

The format of the message is: [step]: reward [min]/ [mean]/ [max] len [# measurements per epoch]. The rewards (min/mean/max) should optimize to around 0/4.5/6.
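To track progress from a saved console log, lines in this format can be parsed with a regular expression. This is a hypothetical helper based only on the example output lines on this page; it also accepts the variant without a colon after the step number shown in the Unit tests section.

```python
import re

# Matches e.g. "353: reward -5.86/ 0.19/ 5.98 len 100.00"; trailing text is ignored.
_NUM = r"-?\d+(?:\.\d+)?"
LOG_PATTERN = re.compile(
    rf"^\s*(?P<step>\d+):?\s+reward\s+(?P<min>{_NUM})/\s*(?P<mean>{_NUM})/\s*(?P<max>{_NUM})"
    rf"\s+len\s+(?P<len>{_NUM})"
)


def parse_progress(line):
    """Return step, min/mean/max reward and len from one progress line, or None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "step": int(m.group("step")),
        "min": float(m.group("min")),
        "mean": float(m.group("mean")),
        "max": float(m.group("max")),
        "len": float(m.group("len")),
    }
```

Applying `parse_progress` to every line of a captured log and dropping the `None` results gives a series that can be plotted or checked against the target values.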

An example screenshot from Tensorboard showing plots of these quantities is shown below.

[Figure: Stub Agent validation in Tensorboard]

Active Vision validation

Follow the steps on the Active Vision System page.

Visualisations

Tensorboard is the primary tool for visualisation, monitoring and debugging. Summary files are written to disk during code execution, by default to the ./runs folder. From the command line, run Tensorboard, which serves a web page GUI (by default at localhost:6006) for inspecting the values that have been written (including scalar plots, images and more).

The agent training and pre-training scripts write summaries. ActiveVision and Retina summaries can be enabled via their respective config files.

Unit tests

These tests verify the performance of the provided stubs.

Simple Reinforcement Learning task

python train_simple_agent.py simple-v0 configs/simple_env_machine.json configs/simple_agent_model.json

Example output: 71 reward 0.00/ 1.87/ 4.00 len 389.30 saved tmp/simple/checkpoint_71/checkpoint-71

The rewards are min/mean/max per epoch, and they should optimize to around 0/4/5.
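Checking whether a run has reached these targets can be automated. In this sketch, `has_converged`, the window size and the tolerance are all assumptions chosen for illustration: it averages the per-epoch min/mean/max over the last few epochs and compares each with a target such as 0/4/5.

```python
from statistics import mean


def has_converged(history, target, tol=1.0, window=5):
    """history: list of (min, mean, max) reward tuples, one per epoch.

    True if, averaged over the last `window` epochs, each of min, mean and max
    lies within `tol` of the corresponding value in `target`.
    """
    if len(history) < window:
        return False
    recent = history[-window:]
    averages = [mean(epoch[i] for epoch in recent) for i in range(3)]
    return all(abs(a - t) <= tol for a, t in zip(averages, target))
```

Averaging over a window rather than using a single epoch avoids declaring success on one lucky epoch, since per-epoch rewards are noisy.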

Retina

Test the operation of the Retina on an example image with the following command. The test creates matplotlib plots that are shown in a GUI and saved to the file results.png.

python tests/test_retina.py resources/screen_example.jpg
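For intuition about what a retina stage computes, here is a sketch of a difference-of-Gaussians (DoG) centre-surround filter, a common model of retinal processing. That the Retina here is DoG-based is an assumption not confirmed on this page, and the kernel size and sigmas are illustrative. An on-centre DoG responds to spots brighter than their surroundings and gives roughly zero response on uniform regions, because the two normalised Gaussians cancel.

```python
import math


def gaussian_kernel(size, sigma):
    """size x size Gaussian kernel (size odd), normalised to sum to 1."""
    r = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma)) for x in range(-r, r + 1)]
         for y in range(-r, r + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def dog_kernel(size=7, sigma_centre=1.0, sigma_surround=2.0):
    """On-centre DoG: narrow excitatory centre minus wide inhibitory surround."""
    centre = gaussian_kernel(size, sigma_centre)
    surround = gaussian_kernel(size, sigma_surround)
    return [[c - s for c, s in zip(cr, sr)] for cr, sr in zip(centre, surround)]


def response_at(image, kernel, cy, cx):
    """Filter response at pixel (cy, cx); the kernel must fit inside the image there."""
    r = len(kernel) // 2
    return sum(kernel[dy + r][dx + r] * image[cy + dy][cx + dx]
               for dy in range(-r, r + 1) for dx in range(-r, r + 1))
```

Negating the kernel gives the matching off-centre response, which fires for dark spots on a bright background.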

Positional Encoding

Visualise the positional encoding function with the following test. The output is the image positional_encoding.png.

python tests/test_positional_encoding.py
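For reference, the widely used sinusoidal positional encoding from the Transformer literature can be computed as below. Whether `test_positional_encoding.py` uses exactly this scheme is an assumption, and `sinusoidal_encoding` and its parameters are illustrative names.

```python
import math


def sinusoidal_encoding(num_positions, dim, base=10000.0):
    """Return a num_positions x dim table where, for each pair index i,
    PE[pos][2i] = sin(pos / base**(2i/dim)) and PE[pos][2i+1] = cos(pos / base**(2i/dim))."""
    table = []
    for pos in range(num_positions):
        row = []
        for j in range(dim):
            pair = j - j % 2  # 2i, shared by the sin and cos of one frequency
            angle = pos / (base ** (pair / dim))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```

Rendering the table as an image (positions on one axis, dimensions on the other) shows the characteristic stripes: low dimensions oscillate quickly with position, higher ones slowly.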