A multi-agent reinforcement learning simulation where autonomous car agents learn to navigate a roundabout — avoiding collisions and finding optimal routes — through Q-Learning enhanced with an emergent norm-building mechanism.
Standard Q-Learning works well for a single agent in isolation, but in shared environments, agents can interfere with one another in ways that individual reward signals don't capture. This project extends Q-Learning with a social norm system: agents collectively build and reinforce behavioral rules when collisions occur, leading to emergent traffic patterns without any pre-programmed rules.
Key question explored: Can individual reinforcement learning agents spontaneously develop collective safety behaviors?
```
src/
├── Main.java                  # Entry point & parameter collection
├── QLearning/
│   ├── QTable.java            # Q-table, reward matrix R, transition matrix P
│   └── State.java             # (position, objective) state representation
├── environment/
│   ├── Environment.java       # Procedural roundabout map generation
│   ├── Car.java               # Agent with perception, score, direction
│   ├── Position.java          # 2D grid coordinate
│   └── Simulation.java        # Episode loop, crash detection, chart export
└── norms/
    ├── Norm.java              # Probabilistic norm: perception → stop/go
    ├── NormBase.java          # Global norm registry shared by all agents
    ├── NormsBase.java
    ├── PerceptionBase.java
    └── PerceptionState.java   # 4-directional perception (left/right/front/back)
```
All car agents share a single Q-table indexed by (state, action) pairs, where:
- State = (current position, target exit)
- Actions = {UP, RIGHT, DOWN, LEFT}
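A minimal sketch of this state representation (illustrative field and class names; the project's actual `State.java` may differ):

```java
import java.util.Objects;

/** Illustrative (position, objective) state plus the four grid actions. */
public class StateSketch {
    enum Action { UP, RIGHT, DOWN, LEFT }

    /** State = (current position, target exit); equality lets it key a Q-table map. */
    static final class State {
        final int x, y;          // current position on the grid
        final int exitX, exitY;  // target exit coordinates
        State(int x, int y, int exitX, int exitY) {
            this.x = x; this.y = y; this.exitX = exitX; this.exitY = exitY;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof State)) return false;
            State s = (State) o;
            return x == s.x && y == s.y && exitX == s.exitX && exitY == s.exitY;
        }
        @Override public int hashCode() { return Objects.hash(x, y, exitX, exitY); }
    }

    public static void main(String[] args) {
        // Two cars at the same cell heading for the same exit share one Q-table row.
        State a = new State(1, 2, 0, 3);
        State b = new State(1, 2, 0, 3);
        System.out.println(a.equals(b)); // true
    }
}
```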
Q-values are updated after every move using the Bellman equation:
Q(s, a) ← Q(s, a) + α · [R(s, a) + γ · max Q(s', a') − Q(s, a)]
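The update above is a one-liner in code. A hedged sketch (method and class names are illustrative, not the project's actual `QTable` API):

```java
/** Illustrative tabular Q-Learning update, matching the equation above. */
public class QUpdateSketch {
    /**
     * @param q       current Q(s, a)
     * @param reward  R(s, a) observed after taking action a in state s
     * @param maxNext max over a' of Q(s', a') in the successor state
     * @param alpha   learning rate
     * @param gamma   discount factor
     */
    static double update(double q, double reward, double maxNext,
                         double alpha, double gamma) {
        return q + alpha * (reward + gamma * maxNext - q);
    }

    public static void main(String[] args) {
        // With alpha = 1.0 the old value is fully replaced:
        // Q <- R + gamma * maxNext = -1 + 0.7 * 10 = 6.0
        System.out.println(update(0.0, -1.0, 10.0, 1.0, 0.7));
    }
}
```

Note that with the recommended α = 1.0, the old Q-value is discarded entirely on each update, which works here because the environment's transitions are deterministic.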
| Parameter | Role | Recommended value |
|---|---|---|
| α (alpha) | Learning rate — how fast Q-values update | 1.0 |
| γ (gamma) | Discount factor — 0 = greedy, 1 = far-sighted | 0.7 |
Rewards:
- +100 — reaching the target exit
- −1 — each step taken (encourages efficiency)
- −20 — illegal move (wall / out of bounds)
- −50 — collision with another car
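The reward schedule above can be sketched as a simple lookup (event names are illustrative; the project encodes rewards in its reward matrix R rather than an enum):

```java
/** Illustrative reward schedule matching the values listed above. */
public class RewardSketch {
    enum Event { REACHED_EXIT, STEP, ILLEGAL_MOVE, COLLISION }

    static int reward(Event e) {
        switch (e) {
            case REACHED_EXIT: return 100;  // reaching the target exit
            case STEP:         return -1;   // each step taken (encourages efficiency)
            case ILLEGAL_MOVE: return -20;  // wall / out of bounds
            case COLLISION:    return -50;  // crash with another car
            default: throw new IllegalArgumentException("unknown event: " + e);
        }
    }

    public static void main(String[] args) {
        // Net reward for a 10-step episode that ends at the exit: 100 - 10 = 90
        System.out.println(reward(Event.REACHED_EXIT) + 10 * reward(Event.STEP));
    }
}
```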
When a collision occurs, the simulation records the perceptual context of each car involved (what it perceived on all four sides) and creates or reinforces a Norm in the shared NormBase.
A Norm stores a perception pattern and a probability p of triggering a stop:
- After a collision: p ← √p (reinforced — more likely to stop next time)
- After a false alarm: p ← p² (weakened — reduces over-caution)
This allows agents to collectively learn when it is dangerous to move, purely from experience — without any hard-coded traffic rules.
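The update rule above can be sketched as follows (class and method names are illustrative, not the actual `Norm.java` API). Note that for 0 < p < 1, √p > p and p² < p, so collisions push the stop probability toward 1 and false alarms push it toward 0:

```java
import java.util.Random;

/** Illustrative probabilistic norm: a perception pattern mapped to a stop probability. */
public class NormSketch {
    double p; // probability of triggering a stop for this perception pattern

    NormSketch(double initial) { this.p = initial; }

    void reinforce() { p = Math.sqrt(p); } // after a collision: more cautious
    void weaken()    { p = p * p; }        // after a false alarm: less cautious

    /** Sample the stop/go decision for one time step. */
    boolean shouldStop(Random rng) { return rng.nextDouble() < p; }

    public static void main(String[] args) {
        NormSketch n = new NormSketch(0.25);
        n.reinforce();           // 0.25 -> sqrt(0.25) = 0.5
        System.out.println(n.p);
        n.weaken();              // 0.5 -> 0.5^2 = 0.25
        System.out.println(n.p);
    }
}
```

Both operations are fixed points at p = 0 and p = 1, so a norm that is never contradicted converges to always-stop, and one that only produces false alarms fades away.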
The charts below were generated from a simulation run with 2 lanes, 4 exits, 5 cars, 500 episodes.
| Metric | Chart |
|---|---|
| Average reward per episode | ![]() |
| Collisions per episode | ![]() |
The reward curve shows convergence as agents learn optimal paths. The crash curve demonstrates the effectiveness of the norm-building mechanism in reducing collisions over time.
- Java 8 or higher
```
java -jar Simulation.jar
```

You will be prompted to configure the simulation:

```
Type number of lanes (>= 1)        → e.g. 2
Type number of exits (>= 2)        → e.g. 4
Type number of episodes            → e.g. 500
Type number of cars                → e.g. 5
Display the crashes? (true/false)  → true
Choose alpha factor                → 1
Choose gamma factor                → 0.7
```
After the simulation completes, two PNG charts are saved in the current directory: Rewards.png and Crashes.png.
| Scenario | Lanes | Exits | Episodes | Cars | Alpha | Gamma |
|---|---|---|---|---|---|---|
| Quick test | 1 | 2 | 100 | 3 | 1 | 0.7 |
| Standard run | 2 | 4 | 500 | 5 | 1 | 0.7 |
| Dense traffic | 3 | 6 | 1000 | 10 | 1 | 0.9 |
| Symbol | Meaning |
|---|---|
| `^` `>` `v` `<` | Car moving up / right / down / left |
| `C` | Car involved in a collision |
| `_` | Drivable road cell |
| `X` | Wall / off-road cell |
Full Javadoc is available at tanguycad.github.io/Reinforcement-Learning-simulation-Java.
This project was developed as an exploration of normative multi-agent systems — a research area studying how agents operating in shared environments can develop and follow collective behavioral norms, either by design or emergence.
Related concepts:
- Q-Learning (Watkins & Dayan, 1992)
- Emergent norms in MAS (Shoham & Tennenholtz)
- Norm synthesis from experience
Tanguy Cadieux — GitHub

