
OpenAI Gym is a framework and the de-facto standard API for reinforcement learning environments. Reinforcement learning is a branch of machine learning that deals with making sequential decisions that maximize the total sum of rewards.

Environment

A typical episode in an environment is shown in the following picture. At each time step, the environment ENV receives an action a computed by the agent and returns a reward R and a new observation s. The cycle repeats at the next time step.

OpenAI Gym Episode

You must import autodrome.envs before calling the gym.make(...) method to create an instance of the environment. This is necessary because OpenAI Gym environments are registered dynamically at runtime. Currently, one environment is available for each game: ATS-Indy500-v0 and ETS2-Indy500-v0. Both environments consist of a simple circular track with a railing that causes damage to the truck when hit. The episode ends whenever the truck gets damaged.

Here's example code that runs a single episode in the ETS2 version of the environment. It uses a suicidal agent that slams the throttle and doesn't bother steering.

import gym
import numpy as np
import autodrome.envs

env = gym.make('ETS2-Indy500-v0')  # or 'ATS-Indy500-v0'
done, state = False, env.reset()
while not done:  # Episode ends when the truck crashes
    action = np.array([1, 0])  # Straight and Full Throttle :)
    state, reward, done, info = env.step(action)
    env.render()  # Display map of the world and truck position

The following video shows the code in action. The env.render() call displays a small window with a 2D map of the game world and the current truck position.

OpenAI Gym Demo

Rewards, Actions and Observations

The env.step(...) function consumes an action a and returns a reward R, an observation s, a done flag, and an info dictionary.

The reward R is the distance travelled during the executed time step. Rewards keep coming until the truck crashes. The intention of this reward function is to encourage the agent to drive as quickly as possible and for as long as possible, since a longer drive results in a larger cumulative reward.
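As a minimal sketch of this relationship (assuming the same environment and suicidal agent as in the example above), the episode return is simply the total distance driven before the crash:

import gym
import numpy as np
import autodrome.envs

env = gym.make('ETS2-Indy500-v0')
done, state = False, env.reset()
episode_return = 0.0
while not done:
    state, reward, done, info = env.step(np.array([1, 0]))  # straight + throttle
    episode_return += reward  # per-step distances sum to the total distance driven

print(f"Distance driven before crashing: {episode_return:.1f}")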

The action space that a belongs to is a discrete set of controls corresponding to pressing the keyboard arrow keys. The space is an instance of the gym.spaces.MultiDiscrete class. Shifting, clutch, and other truck controls are set to automatic and are handled by the logic built into the game. The meaning of the two components is listed below, followed by a sketch that constructs an equivalent space:

  • Component 0 - 0: left, 1: straight, 2: right
  • Component 1 - 0: throttle, 1: coast, 2: brake
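
Here's a minimal sketch of an equivalent action space, assuming the nvec-style MultiDiscrete constructor of recent Gym versions:

import numpy as np
import gym.spaces

# Two independent discrete components with three options each:
# component 0 - steering (0: left, 1: straight, 2: right)
# component 1 - pedals (0: throttle, 1: coast, 2: brake)
action_space = gym.spaces.MultiDiscrete([3, 3])

action = action_space.sample()                  # e.g. array([2, 0]) - right + throttle
assert action_space.contains(np.array([1, 0]))  # straight + throttle is a valid action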

The observation space consists of the raw pixels of RGB screen images. The resolution of the returned image is 640x480. Autodrome automatically switches to the front camera view before starting the episode. This view is ideal for reinforcement learning purposes since no truck geometry obscures the view and all pixels convey useful information about the game world.
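For illustration, such a space can be described as a gym.spaces.Box of 8-bit pixels; a minimal sketch, assuming a height x width x channel memory layout:

import numpy as np
import gym.spaces

# 640x480 RGB frame with 8-bit channels (assumed HxWxC layout)
observation_space = gym.spaces.Box(low=0, high=255, shape=(480, 640, 3), dtype=np.uint8)

frame = observation_space.sample()  # random image with the same shape and dtype
print(frame.shape)                  # (480, 640, 3)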

The info dictionary contains an instance of the Policeman class that holds links to the map and the world definitions.
