Roundabout Q-Learning Simulation

A multi-agent reinforcement learning simulation where autonomous car agents learn to navigate a roundabout — avoiding collisions and finding optimal routes — through Q-Learning enhanced with an emergent norm-building mechanism.


Overview

Standard Q-Learning works well for a single agent in isolation, but in shared environments, agents can interfere with one another in ways that individual reward signals don't capture. This project extends Q-Learning with a social norm system: agents collectively build and reinforce behavioral rules when collisions occur, leading to emergent traffic patterns without any pre-programmed rules.

Key question explored: Can individual reinforcement learning agents spontaneously develop collective safety behaviors?


Architecture

src/
├── Main.java                   # Entry point & parameter collection
├── QLearning/
│   ├── QTable.java             # Q-table, reward matrix R, transition matrix P
│   └── State.java              # (position, objective) state representation
├── environment/
│   ├── Environment.java        # Procedural roundabout map generation
│   ├── Car.java                # Agent with perception, score, direction
│   ├── Position.java           # 2D grid coordinate
│   └── Simulation.java         # Episode loop, crash detection, chart export
└── norms/
    ├── Norm.java               # Probabilistic norm: perception → stop/go
    ├── NormBase.java           # Global norm registry shared by all agents
    ├── NormsBase.java
    ├── PerceptionBase.java
    └── PerceptionState.java    # 4-directional perception (left/right/front/back)

How It Works

Q-Learning

All car agents share a single Q-table indexed by (state, action) pairs, where:

  • State = (current position, target exit)
  • Actions = {UP, RIGHT, DOWN, LEFT}
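As a minimal sketch of this state representation (class and field names here are illustrative, not necessarily those used in State.java), the state must support value-based equality so it can serve as a Q-table key:

```java
// Illustrative sketch of the (position, objective) state and the four actions;
// names are hypothetical and may differ from the actual State.java.
enum Action { UP, RIGHT, DOWN, LEFT }

final class GridState {
    final int x, y;        // current position on the grid
    final int targetExit;  // index of the exit this car is heading for

    GridState(int x, int y, int targetExit) {
        this.x = x;
        this.y = y;
        this.targetExit = targetExit;
    }

    // States are used as Q-table keys, so equality must be value-based.
    @Override public boolean equals(Object o) {
        if (!(o instanceof GridState)) return false;
        GridState s = (GridState) o;
        return x == s.x && y == s.y && targetExit == s.targetExit;
    }

    @Override public int hashCode() {
        return 31 * (31 * x + y) + targetExit;
    }
}
```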

Q-values are updated after every move using the standard Q-learning update rule (derived from the Bellman equation):

Q(s, a) ← Q(s, a) + α · [R(s, a) + γ · max_a′ Q(s′, a′) − Q(s, a)]
| Parameter | Role | Recommended value |
|-----------|------|-------------------|
| α (alpha) | Learning rate: how fast Q-values update | 1.0 |
| γ (gamma) | Discount factor: 0 = greedy, 1 = far-sighted | 0.7 |
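The update rule above reduces to a one-line computation. The following is a minimal sketch (method and class names are illustrative, not the actual API of QTable.java):

```java
// Minimal sketch of the tabular Q-learning update; the real QTable.java
// also maintains the reward matrix R and transition matrix P.
final class QUpdate {
    static final double ALPHA = 1.0;  // learning rate (recommended value)
    static final double GAMMA = 0.7;  // discount factor (recommended value)

    // Q(s,a) ← Q(s,a) + α · [r + γ · max_a′ Q(s′,a′) − Q(s,a)]
    static double updated(double q, double reward, double maxNextQ) {
        return q + ALPHA * (reward + GAMMA * maxNextQ - q);
    }
}
```

Note that with α = 1.0, the old Q-value is fully overwritten by the new estimate `reward + γ · max Q(s′, a′)`, which works here because the environment's transitions are deterministic.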

Rewards:

  • +100 — reaching the target exit
  • −1 — each step taken (encourages efficiency)
  • −20 — illegal move (wall / out of bounds)
  • −50 — collision with another car

Norm-Building Mechanism

When a collision occurs, the simulation records the perceptual context of the involved car (what it perceived on all four sides) and creates or reinforces a Norm in a shared NormBase.

A Norm stores a perception pattern and a probability p of triggering a stop:

  • After a collision: p ← √p (reinforced — more likely to stop next time)
  • After a false alarm: p ← p² (weakened — reduces over-caution)

This allows agents to collectively learn when it is dangerous to move, purely from experience — without any hard-coded traffic rules.
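The two probability updates can be sketched as follows (a hypothetical simplification of Norm.java; the names and structure are assumptions, not the actual implementation):

```java
// Sketch of the probabilistic stop rule: p is pushed toward 1 after a
// collision (p ← √p) and toward 0 after a false alarm (p ← p²).
final class NormProbability {
    double p;  // probability of triggering a stop for this perception pattern

    NormProbability(double initial) { this.p = initial; }

    void onCollision()  { p = Math.sqrt(p); }  // reinforce: more cautious next time
    void onFalseAlarm() { p = p * p; }         // weaken: reduce over-caution

    boolean shouldStop(java.util.Random rng) {
        return rng.nextDouble() < p;
    }
}
```

Since p ∈ [0, 1], both updates keep it in range: the square root moves p toward 1, squaring moves it toward 0, and repeated collisions or false alarms compound geometrically.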


Results

The charts below were generated from a simulation run with 2 lanes, 4 exits, 5 cars, 500 episodes.

| Metric | Chart |
|--------|-------|
| Average reward per episode | Rewards.png |
| Collisions per episode | Crashes.png |

The reward curve shows convergence as agents learn optimal paths. The crash curve demonstrates the effectiveness of the norm-building mechanism in reducing collisions over time.


Running the Simulation

Prerequisites

  • Java 8 or higher

Quick start

java -jar Simulation.jar

You will be prompted to configure the simulation:

Type number of lanes        (>= 1)       → e.g. 2
Type number of exits        (>= 2)       → e.g. 4
Type number of episodes                  → e.g. 500
Type number of cars                      → e.g. 5
Display the crashes? (true/false)        → true
Choose alpha factor                      → 1
Choose gamma factor                      → 0.7

After the simulation completes, two PNG charts are saved in the current directory: Rewards.png and Crashes.png.

Recommended configurations

| Scenario | Lanes | Exits | Episodes | Cars | Alpha | Gamma |
|----------|-------|-------|----------|------|-------|-------|
| Quick test | 1 | 2 | 100 | 3 | 1 | 0.7 |
| Standard run | 2 | 4 | 500 | 5 | 1 | 0.7 |
| Dense traffic | 3 | 6 | 1000 | 10 | 1 | 0.9 |

Console output legend

| Symbol | Meaning |
|--------|---------|
| `^` `>` `v` `<` | Car moving up / right / down / left |
| `C` | Car involved in a collision |
| `_` | Drivable road cell |
| `X` | Wall / off-road cell |

Documentation

Full Javadoc is available at tanguycad.github.io/Reinforcement-Learning-simulation-Java.


Background

This project was developed as an exploration of normative multi-agent systems — a research area studying how agents operating in shared environments can develop and follow collective behavioral norms, either by design or emergence.

Related concepts:

  • Q-Learning (Watkins & Dayan, 1992)
  • Emergent norms in MAS (Shoham & Tennenholtz)
  • Norm synthesis from experience

Author

Tanguy Cadieux (GitHub: tanguycad)
