A multi-agent reinforcement learning simulation where autonomous car agents learn to navigate a roundabout — avoiding collisions and finding optimal routes — through Q-Learning enhanced with an emergent norm-building mechanism.
Standard Q-Learning works well for a single agent in isolation, but in shared environments, agents can interfere with one another in ways that individual reward signals don't capture. This project extends Q-Learning with a social norm system: agents collectively build and reinforce behavioral rules when collisions occur, leading to emergent traffic patterns without any pre-programmed rules.
Key question explored: Can individual reinforcement learning agents spontaneously develop collective safety behaviors?
```
src/
├── Main.java                  # Entry point & parameter collection
├── QLearning/
│   ├── QTable.java            # Q-table, reward matrix R, transition matrix P
│   └── State.java             # (position, objective) state representation
├── environment/
│   ├── Environment.java       # Procedural roundabout map generation
│   ├── Car.java               # Agent with perception, score, direction
│   ├── Position.java          # 2D grid coordinate
│   └── Simulation.java        # Episode loop, crash detection, chart export
└── norms/
    ├── Norm.java              # Probabilistic norm: perception → stop/go
    ├── NormBase.java          # Global norm registry shared by all agents
    ├── NormsBase.java
    ├── PerceptionBase.java
    └── PerceptionState.java   # 4-directional perception (left/right/front/back)
```
All car agents share a single Q-table indexed by (state, action) pairs, where:
- State = (current position, target exit)
- Actions = {UP, RIGHT, DOWN, LEFT}
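A minimal sketch of this state representation (illustrative field and class names; the project's actual `State.java` may differ):

```java
import java.util.Objects;

/** Illustrative (position, objective) state plus the four grid actions. */
public class StateSketch {
    enum Action { UP, RIGHT, DOWN, LEFT }

    /** State = (current position, target exit); equality lets it key a Q-table map. */
    static final class State {
        final int x, y;          // current position on the grid
        final int exitX, exitY;  // target exit coordinates
        State(int x, int y, int exitX, int exitY) {
            this.x = x; this.y = y; this.exitX = exitX; this.exitY = exitY;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof State)) return false;
            State s = (State) o;
            return x == s.x && y == s.y && exitX == s.exitX && exitY == s.exitY;
        }
        @Override public int hashCode() { return Objects.hash(x, y, exitX, exitY); }
    }

    public static void main(String[] args) {
        // Two cars at the same cell heading for the same exit share one Q-table row.
        State a = new State(1, 2, 0, 3);
        State b = new State(1, 2, 0, 3);
        System.out.println(a.equals(b)); // true
    }
}
```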
Q-values are updated after every move using the Bellman equation:
Q(s, a) ← Q(s, a) + α · [R(s, a) + γ · max Q(s', a') − Q(s, a)]
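The update above is a one-liner in code. A hedged sketch (method and class names are illustrative, not the project's actual `QTable` API):

```java
/** Illustrative tabular Q-Learning update, matching the equation above. */
public class QUpdateSketch {
    /**
     * @param q       current Q(s, a)
     * @param reward  R(s, a) observed after taking action a in state s
     * @param maxNext max over a' of Q(s', a') in the successor state
     * @param alpha   learning rate
     * @param gamma   discount factor
     */
    static double update(double q, double reward, double maxNext,
                         double alpha, double gamma) {
        return q + alpha * (reward + gamma * maxNext - q);
    }

    public static void main(String[] args) {
        // With alpha = 1.0 the old value is fully replaced:
        // Q <- R + gamma * maxNext = -1 + 0.7 * 10 = 6.0
        System.out.println(update(0.0, -1.0, 10.0, 1.0, 0.7));
    }
}
```

Note that with the recommended α = 1.0, the old Q-value is discarded entirely on each update, which works here because the environment's transitions are deterministic.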
| Parameter | Role | Recommended value |
|---|---|---|
| α (alpha) | Learning rate — how fast Q-values update | 1.0 |
| γ (gamma) | Discount factor — 0 = greedy, 1 = far-sighted | 0.7 |
Rewards:
- +100 — reaching the target exit
- −1 — each step taken (encourages efficiency)
- −20 — illegal move (wall / out of bounds)
- −50 — collision with another car
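The reward schedule above can be sketched as a simple lookup (event names are illustrative; the project encodes rewards in its reward matrix R rather than an enum):

```java
/** Illustrative reward schedule matching the values listed above. */
public class RewardSketch {
    enum Event { REACHED_EXIT, STEP, ILLEGAL_MOVE, COLLISION }

    static int reward(Event e) {
        switch (e) {
            case REACHED_EXIT: return 100;  // reaching the target exit
            case STEP:         return -1;   // each step taken (encourages efficiency)
            case ILLEGAL_MOVE: return -20;  // wall / out of bounds
            case COLLISION:    return -50;  // crash with another car
            default: throw new IllegalArgumentException("unknown event: " + e);
        }
    }

    public static void main(String[] args) {
        // Net reward for a 10-step episode that ends at the exit: 100 - 10 = 90
        System.out.println(reward(Event.REACHED_EXIT) + 10 * reward(Event.STEP));
    }
}
```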
When a collision occurs, the simulation records the perceptual context of each car involved (what it perceived on all four sides) and creates or reinforces a Norm in the shared NormBase.
A Norm stores a perception pattern and a probability p of triggering a stop:
- After a collision: p ← √p (reinforced — more likely to stop next time)
- After a false alarm: p ← p² (weakened — reduces over-caution)
This allows agents to collectively learn when it is dangerous to move, purely from experience — without any hard-coded traffic rules.
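The update rule above can be sketched as follows (class and method names are illustrative, not the actual `Norm.java` API). Note that for 0 < p < 1, √p > p and p² < p, so collisions push the stop probability toward 1 and false alarms push it toward 0:

```java
import java.util.Random;

/** Illustrative probabilistic norm: a perception pattern mapped to a stop probability. */
public class NormSketch {
    double p; // probability of triggering a stop for this perception pattern

    NormSketch(double initial) { this.p = initial; }

    void reinforce() { p = Math.sqrt(p); } // after a collision: more cautious
    void weaken()    { p = p * p; }        // after a false alarm: less cautious

    /** Sample the stop/go decision for one time step. */
    boolean shouldStop(Random rng) { return rng.nextDouble() < p; }

    public static void main(String[] args) {
        NormSketch n = new NormSketch(0.25);
        n.reinforce();           // 0.25 -> sqrt(0.25) = 0.5
        System.out.println(n.p);
        n.weaken();              // 0.5 -> 0.5^2 = 0.25
        System.out.println(n.p);
    }
}
```

Both operations are fixed points at p = 0 and p = 1, so a norm that is never contradicted converges to always-stop, and one that only produces false alarms fades away.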
The charts below were generated from a simulation run with 2 lanes, 4 exits, 5 cars, 500 episodes.
| Metric | Chart |
|---|---|
| Average reward per episode | ![]() |
| Collisions per episode | ![]() |
The reward curve shows convergence as agents learn optimal paths. The crash curve demonstrates the effectiveness of the norm-building mechanism in reducing collisions over time.
- Java 8 or higher
```
java -jar Simulation.jar
```

You will be prompted to configure the simulation:

```
Type number of lanes (>= 1)        → e.g. 2
Type number of exits (>= 2)        → e.g. 4
Type number of episodes            → e.g. 500
Type number of cars                → e.g. 5
Display the crashes? (true/false)  → true
Choose alpha factor                → 1
Choose gamma factor                → 0.7
```
After the simulation completes, two PNG charts are saved in the current directory: Rewards.png and Crashes.png.
| Scenario | Lanes | Exits | Episodes | Cars | Alpha | Gamma |
|---|---|---|---|---|---|---|
| Quick test | 1 | 2 | 100 | 3 | 1 | 0.7 |
| Standard run | 2 | 4 | 500 | 5 | 1 | 0.7 |
| Dense traffic | 3 | 6 | 1000 | 10 | 1 | 0.9 |
| Symbol | Meaning |
|---|---|
| `^` `>` `v` `<` | Car moving up / right / down / left |
| `C` | Car involved in a collision |
| `_` | Drivable road cell |
| `X` | Wall / off-road cell |
Full Javadoc is available at tanguycad.github.io/Reinforcement-Learning-simulation-Java.
This project was developed as an exploration of normative multi-agent systems — a research area studying how agents operating in shared environments can develop and follow collective behavioral norms, either by design or emergence.
Related concepts:
- Q-Learning (Watkins & Dayan, 1992)
- Emergent norms in MAS (Shoham & Tennenholtz)
- Norm synthesis from experience
Tanguy Cadieux — GitHub

