A local-only desktop application that trains a drone agent to navigate a 2D grid using tabular Q-Learning. Built with Python and tkinter, compliant with Dr. Yoram Segal's Professional Software Guidelines v1.00.
The simulation presents a configurable grid environment where a drone (agent) learns optimal paths from a start position to a goal using the Q-Learning reinforcement learning algorithm. The agent encounters three obstacle types and must learn to navigate around buildings, avoid traps, and compensate for crosswind drift.
All computation is performed locally using pure tabular math. No neural networks, no external APIs, no cloud services. Token cost: $0.00.
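The learning rule is the standard tabular Q-Learning update, Q(s,a) ← Q(s,a) + α·(r + γ·max_a′ Q(s′,a′) − Q(s,a)). A minimal Python sketch of that update (illustrative only; the project's actual engine lives in src/drone_rl/rl/):

```python
from collections import defaultdict

# Hyperparameter names match the config/ files described below.
alpha, gamma = 0.1, 0.99               # learning rate, discount factor
ACTIONS = ["up", "down", "left", "right"]

# Tabular Q-values: every unseen state starts at all zeros.
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def q_update(state, action, reward, next_state, done):
    """One Bellman backup on the tabular Q-values."""
    target = reward if done else reward + gamma * max(q_table[next_state].values())
    q_table[state][action] += alpha * (target - q_table[state][action])
```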
- Tabular Q-Learning with configurable hyperparameters (alpha, gamma, epsilon)
- Interactive tkinter GUI with real-time training visualization and drone movement
- Three obstacle types: Buildings (gray), Traps (red), Crosswinds (dodger blue)
- Direct grid editing (Click to cycle, Right-click menu, Shift/Ctrl+Click for Start/Goal)
- Scrollable Grid and Analytics panels with mouse wheel support
- Q-value heatmap overlay and policy arrow display
- Real-time convergence chart (matplotlib embedded)
- Save/load Q-tables (policies) and grid layouts as JSON
- Episode CSV logging for post-analysis
- Fully reproducible via local RNG seeding
- Python 3.10 or higher
- uv package manager (mandatory)
- tkinter (typically bundled with Python; on Linux may require `sudo apt install python3-tk`)
```bash
git clone <repo-url> drone-rl
cd drone-rl
uv sync
```

Run the application:

```bash
uv run drone-rl
```

or equivalently:

```bash
uv run python -m drone_rl
```

Run the tests:

```bash
uv run pytest
uv run pytest -v
uv run pytest tests/unit/test_rl/test_grid_types.py -v
```

| Type | Color | Reward | Behavior |
|---|---|---|---|
| Building | Gray | -10 | Blocks movement; agent bounces back |
| Trap | Red | -100 | Ends the episode immediately |
| Crosswind | Dodger Blue | -10 | Drifts agent in configured wind direction |
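These behaviors could be expressed in a step function along the following lines. This is an illustrative sketch only; `CellType`, `apply_wind`, and `resolve_move` are hypothetical names, not the project's API:

```python
from enum import Enum, auto

class CellType(Enum):
    # Hypothetical cell types; the real dataclasses live in src/drone_rl/types/.
    EMPTY = auto()
    BUILDING = auto()
    TRAP = auto()
    CROSSWIND = auto()
    GOAL = auto()

def apply_wind(pos, wind):
    """Drift one cell in the configured wind direction (assumed helper)."""
    return (pos[0] + wind[0], pos[1] + wind[1])

def resolve_move(grid, pos, new_pos, wind):
    """Return (next_position, reward, episode_done) per the table above.

    `grid` is assumed to map (x, y) tuples to CellType values.
    """
    cell = grid.get(new_pos, CellType.EMPTY)
    if cell is CellType.BUILDING:
        return pos, -10.0, False        # blocked: bounce back to the old cell
    if cell is CellType.TRAP:
        return new_pos, -100.0, True    # terminal penalty
    if cell is CellType.CROSSWIND:
        return apply_wind(new_pos, wind), -10.0, False
    if cell is CellType.GOAL:
        return new_pos, 100.0, True
    return new_pos, -1.0, False         # ordinary empty step
```

The full reward schedule, per PRD section 4.1: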
| Event | Reward | Source |
|---|---|---|
| Reach goal | +100 | PRD section 4.1 |
| Empty step | -1 | Encourages shortest path |
| Building collision | -10 | Discourages wall-hugging |
| Trap hit | -100 | Terminal penalty |
| Crosswind tile | -10 | Penalty for wind zones |
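For intuition, the return of a sample episode under this schedule, as a worked example (not project code):

```python
# A 12-step episode: ten empty moves, one crosswind tile, then the goal.
rewards = [-1.0] * 10 + [-10.0] + [100.0]
undiscounted_return = sum(rewards)                      # 80.0
gamma = 0.99
discounted_return = sum(r * gamma**t for t, r in enumerate(rewards))
print(undiscounted_return, round(discounted_return, 2))
```

Keyboard and mouse controls: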
| Shortcut | Action |
|---|---|
| Space | Toggle Pause / Resume training |
| Ctrl+S | Save current policy to JSON |
| Ctrl+L | Load policy from JSON |
| Ctrl+R | Reset Q-table and episode log (Reset Grid to default) |
| Ctrl+E | Export episode log to CSV |
| Delete | Clear selected grid cell to Empty |
| Mouse Wheel | Scroll Grid or Analytics panels |
| L-Click | Cycle cell type |
| R-Click | Open cell menu (Set Start/Goal) |
| Shift+Click | Set Start position |
| Ctrl+Click | Set Goal position |
| Mouse drag | Paint multiple cells in one gesture |
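These shortcuts correspond to standard tkinter event bindings. A hedged sketch with hypothetical handlers (not the project's actual gui/ code):

```python
import tkinter as tk

root = tk.Tk()

def save_policy(event=None):
    """Hypothetical handler; the real one lives in the gui/ layer."""
    print("saving policy to JSON...")

# Keyboard shortcuts map directly onto tkinter event bindings.
root.bind("<space>", lambda e: print("toggle pause/resume"))
root.bind("<Control-s>", save_policy)
root.bind("<Control-e>", lambda e: print("export episode CSV"))
# Mouse gestures bind on the grid canvas: <Button-1> cycles a cell,
# <Button-3> opens the context menu, <Shift-Button-1> sets Start,
# <Control-Button-1> sets Goal, and <B1-Motion> paints while dragging.
```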
```bash
# Check all files (ruff)
uv run ruff check src/ tests/

# Auto-fix lint issues
uv run ruff check --fix src/ tests/

# Format code
uv run ruff format src/ tests/
```

```bash
# Full test suite
uv run pytest

# With coverage report
uv run pytest --cov=src/drone_rl --cov-report=term-missing

# Single module
uv run pytest tests/unit/test_rl/ -v

# Integration tests only
uv run pytest tests/integration/ -v

# Coverage gate (fails below 85%)
uv run pytest --cov=src/drone_rl --cov-fail-under=85
```

```bash
# Check file sizes (20 longest Python files)
find src/ tests/ -name "*.py" | xargs wc -l | sort -n | tail -20
```

```bash
# Execute the parameter sensitivity notebook
uv run jupyter nbconvert --to notebook --execute \
  notebooks/parameter_sensitivity.ipynb \
  --output notebooks/parameter_sensitivity_executed.ipynb
```

```
drone-rl/
  src/drone_rl/      Python package root
    types/           Canonical dataclasses (no logic)
    rl/              Pure RL engine (no GUI imports)
    sdk/             SDK layer (single entry point for business logic)
    gui/             Thin tkinter presentation layer
    shared/          Configuration, version, gatekeeper
    constants.py     Immutable project constants
    utils.py         Helper functions
    main.py          Application entry point
  tests/
    unit/            Fast isolated unit tests
    integration/     Multi-module integration tests
  config/            JSON configuration files
  docs/              Project documentation
  notebooks/         Jupyter parameter sensitivity analysis
  results/           Sensitivity analysis output JSON
  assets/            Exported notebook HTML
  policies/          Saved Q-tables (JSON)
  layouts/           Saved grid layouts (JSON)
  logs/              Episode CSV logs
```
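The dependency rule implied by this layout runs gui → sdk → rl, never the reverse. A minimal sketch of that layering; the class and method names are hypothetical, not the project's actual API:

```python
# gui/ depends on sdk/, sdk/ depends on rl/, and rl/ never imports gui/.

class QLearningEngine:            # stands in for the pure rl/ engine
    def run_episode(self) -> float:
        return 42.0               # pretend total episode reward

class DroneSDK:                   # sdk/: single entry point for business logic
    def __init__(self, engine: QLearningEngine):
        self._engine = engine

    def train_one_episode(self) -> float:
        return self._engine.run_episode()

class TrainingPanel:              # gui/: thin presentation, no business logic
    def __init__(self, sdk: DroneSDK):
        self._sdk = sdk

    def on_train_clicked(self) -> None:
        print("episode reward:", self._sdk.train_one_episode())

TrainingPanel(DroneSDK(QLearningEngine())).on_train_clicked()
```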
`ModuleNotFoundError: No module named 'drone_rl'`

Run via `uv run python -m drone_rl.main`, not `python main.py`. The src/ layout requires the package to be installed in the uv virtual environment (`uv sync`).

`_tkinter.TclError: no display name and no $DISPLAY environment variable`

tkinter requires a display. On headless Linux servers, run with a virtual display:

```bash
sudo apt install xvfb
Xvfb :99 &
DISPLAY=:99 uv run python -m drone_rl.main
```

`ImportError: cannot import name 'tkinter'` on Ubuntu/Debian

```bash
sudo apt install python3-tk
```

Training appears frozen

The agent may be stuck in a large state space. Try: reduce the grid size, increase epsilon (more exploration), or raise alpha (faster learning) in the Hyperparameters panel, then click Reset → Train.

Q-table heatmap is all the same color

The Q-table is uninitialized (all zeros). Start a training run first; values diverge after the first successful episode reaches the goal.

git `index.lock` errors during development

Stale lock files can accumulate in some environments. Clear them with:

```bash
rm -f .git/index.lock .git/HEAD.lock
```

Trained Q-tables are saved as JSON in `policies/`. Each file contains the full Q-table mapping state keys to action-value pairs, plus metadata (episode count, hyperparameters, timestamp).
Grid layouts are saved as JSON in layouts/. Each file stores grid dimensions, cell types, start/goal positions, and crosswind directions.
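A hedged sketch of reading and writing such files with the standard json module; the schema keys below are assumptions inferred from the descriptions above:

```python
import json
from pathlib import Path

def save_policy(q_table, metadata, path="policies/policy.json"):
    """Serialize the Q-table plus metadata (schema is illustrative)."""
    payload = {"q_table": q_table, "metadata": metadata}
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(json.dumps(payload, indent=2))

def load_policy(path="policies/policy.json"):
    payload = json.loads(Path(path).read_text())
    return payload["q_table"], payload["metadata"]
```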
All runtime configuration is externalized to config/ JSON files.
Controls grid defaults and UI window size. Increasing max_width/max_height allows larger grids but may slow rendering. Changes take effect on next app launch.
```json
{
  "version": "1.00",
  "grid": { "default_width": 10, "default_height": 10, "max_width": 20, "max_height": 20 },
  "ui": { "window_width": 1400, "window_height": 900 }
}
```

Controls the Q-Learning algorithm. Increasing alpha (e.g., 0.5) makes the agent learn faster but less stably. A higher gamma weights long-term rewards more heavily. epsilon controls the exploration/exploitation balance. Values can be modified and reloaded via the GUI without restarting.
```json
{
  "alpha": 0.1, "gamma": 0.99, "epsilon": 1.0,
  "epsilon_decay": 0.995, "epsilon_min": 0.01,
  "max_steps_per_episode": 500, "total_episodes": 1000, "random_seed": 42
}
```
Defines the exact reward values per PRD section 4.1 (mandatory, tested). Modifying these values changes agent behavior significantly.
```json
{
  "goal_reached": 100.0, "empty_step": -1.0,
  "building_collision": -10.0, "trap_hit": -100.0, "crosswind_penalty": -10.0
}
```

Controls the internal event gatekeeper. max_gui_updates_per_second caps the UI refresh rate; max_episode_callbacks_queued limits pending episode results.
{ "max_gui_updates_per_second": 30, "max_episode_callbacks_queued": 100 }- Ensure your code passes all linting (
- Ensure your code passes all linting (`uv run ruff check .`).
- Run the test suite (`uv run pytest --cov=src/drone_rl`) and maintain >=85% coverage.
- Keep all source files under 150 lines.
- Follow the strict SDK architecture (no business logic in GUI).
- Update TODO.md as you make progress.
MIT License. See LICENSE for details.
Developed in compliance with the Professional Software Guidelines v1.00 by Dr. Yoram Segal.
Built with uv | Dr. Segal Guidelines v1.00 compliant