A production-grade Proximal Policy Optimization (PPO) agent for portfolio allocation and optimal execution in cryptocurrency markets, bringing together deep reinforcement learning, financial engineering, and MLOps best practices.

The project implements a complete pipeline for training, evaluating, and deploying a deep reinforcement learning agent for autonomous trading:
- Custom Gymnasium environment with realistic friction modeling (transaction costs, slippage, execution delay)
- Differential Sharpe Ratio reward (Moody & Saffell, 2001) for risk-adjusted optimization
- Feature engineering pipeline with log-returns, technical indicators (RSI, MACD, ATR, Bollinger), and rolling Z-score normalization
- Interactive Streamlit dashboard for real-time training visualization and backtesting
- Full MLOps infrastructure with MLflow tracking, Docker containerization, and TensorBoard logging
Disclaimer: This project is for educational and research purposes. It is NOT financial advice. Trading involves substantial risk of loss. Past performance does not guarantee future results.
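The custom environment mentioned above follows the standard Gymnasium `reset`/`step` interface. As a minimal sketch of that loop (class and attribute names here are illustrative, not the project's actual `trading_env.py`):

```python
import numpy as np

class MiniTradingEnv:
    """Toy sketch of a windowed trading environment (not the real implementation)."""
    HOLD, BUY, SELL = 0, 1, 2

    def __init__(self, prices, window_size=30, fee=0.001):
        self.prices = np.asarray(prices, dtype=float)
        self.window_size = window_size
        self.fee = fee  # proportional transaction cost

    def reset(self):
        self.t = self.window_size
        self.position = 0.0  # fraction of equity held in the asset
        return self._observation()

    def _observation(self):
        # log-returns over the trailing window, shape (window_size - 1,)
        window = self.prices[self.t - self.window_size : self.t]
        return np.log(window[1:] / window[:-1])

    def step(self, action):
        old_position = self.position
        if action == self.BUY:
            self.position = 1.0
        elif action == self.SELL:
            self.position = 0.0
        # reward: position-weighted log-return minus cost of changing position
        r = np.log(self.prices[self.t] / self.prices[self.t - 1])
        reward = self.position * r - self.fee * abs(self.position - old_position)
        self.t += 1
        done = self.t >= len(self.prices)
        return self._observation(), reward, done, {}
```

The real environment additionally models slippage and execution delay via the friction layer, and exposes proper `observation_space`/`action_space` definitions.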
```
┌────────────────────────────────────────────────────────────────────┐
│                            OBSERVATION                             │
│  State: (window_size, features) - Normalized OHLCV + Indicators    │
└────────────────────────────┬───────────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────────┐
│                         FEATURE EXTRACTOR                          │
│                  MLP [64, 64] with ReLU activation                 │
└────────────────────────────┬───────────────────────────────────────┘
                             │
              ┌──────────────┴──────────────┐
              │                             │
              ▼                             ▼
┌──────────────────────────┐  ┌──────────────────────────┐
│       POLICY HEAD        │  │        VALUE HEAD        │
│     (Actor - π(a|s))     │  │     (Critic - V(s))      │
│    Softmax / Gaussian    │  │       Single Output      │
└────────────┬─────────────┘  └──────────────────────────┘
             │
             ▼
┌────────────────────────────────────────────────────────────────────┐
│                              ACTION                                │
│        Discrete: {HOLD, BUY, SELL} or Continuous: [-1, 1]          │
└────────────────────────────────────────────────────────────────────┘
```
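The diagram maps to a shared-trunk actor-critic. A numpy sketch of the forward pass (random weights stand in for learned parameters; dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_layer(x, in_dim, out_dim):
    # random weights stand in for learned parameters; biases omitted
    W = rng.normal(0, 0.1, size=(in_dim, out_dim))
    return np.maximum(x @ W, 0.0)

obs = rng.normal(size=(30 * 8,))   # flattened (window_size, features) observation
h = relu_layer(obs, 30 * 8, 64)    # shared trunk, layer 1
h = relu_layer(h, 64, 64)          # shared trunk, layer 2

logits = h @ rng.normal(0, 0.1, (64, 3))        # policy head: HOLD / BUY / SELL
probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # softmax over actions
value = (h @ rng.normal(0, 0.1, (64, 1))).item()  # value head: scalar V(s)
```

In the continuous variant, the policy head instead parameterizes a Gaussian (mean and log-std) over the position in [-1, 1].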
```bash
# Clone repository
git clone https://github.com/timothim/trading-agent.git
cd trading-agent

# Install with pip
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"

# Setup pre-commit hooks (optional)
pre-commit install
```

The easiest way to interact with the project is the Streamlit dashboard:

```bash
# Launch interactive Streamlit dashboard
make dashboard
# → http://localhost:8501

# Or run it directly
streamlit run app/dashboard.py
```

The dashboard lets you:
- Train a PPO agent with configurable hyperparameters and watch it learn in real-time
- Backtest the trained model on unseen data with step-by-step visualization
- Analyze performance metrics (Sharpe, Sortino, drawdown) and compare against buy-and-hold
- Verify that the training is real (model inspection, TensorBoard logs)
```bash
# Train with default configuration
make train

# Train with custom config
python scripts/train.py --config configs/training.yaml --timesteps 1000000

# Run hyperparameter optimization
python scripts/train.py --config configs/training.yaml --hpo --hpo-trials 100

# Evaluate trained model
python scripts/evaluate.py --model models/best_model --config configs/default.yaml

# Run walk-forward backtest
python scripts/backtest.py --model models/best_model --walk-forward --folds 5

# Build and run with Docker Compose
docker-compose up training

# Launch MLflow UI
docker-compose up mlflow
# → http://localhost:5000

# Start TensorBoard
docker-compose up tensorboard
# → http://localhost:6006
```

```
trading-agent/
├── app/
│   ├── dashboard.py            # Streamlit UI (interactive dashboard)
│   └── engine.py               # Training & backtest orchestration
│
├── src/trading_agent/
│   ├── data/                   # Data acquisition
│   │   ├── fetcher.py          # CCXT exchange integration
│   │   └── dataset.py          # Data management & caching
│   │
│   ├── preprocessing/          # Feature engineering
│   │   ├── transformers.py     # Log-returns, fractional differentiation
│   │   ├── normalizers.py      # Rolling Z-score normalization
│   │   ├── indicators.py       # RSI, MACD, ATR, Bollinger Bands
│   │   └── pipeline.py         # Orchestrated preprocessing pipeline
│   │
│   ├── simulation/             # Custom Gymnasium environment
│   │   ├── trading_env.py      # Core RL environment
│   │   ├── action_schemes.py   # Discrete / Continuous action spaces
│   │   ├── reward_functions.py # Differential Sharpe, Sortino, etc.
│   │   └── friction.py         # Transaction costs, slippage, latency
│   │
│   ├── model/                  # PPO agent
│   │   ├── networks.py         # Actor-Critic MLP networks
│   │   ├── agent.py            # PPO wrapper (Stable-Baselines3)
│   │   └── callbacks.py        # Training callbacks & checkpoints
│   │
│   ├── training/               # Training infrastructure
│   │   ├── trainer.py          # Training orchestration
│   │   ├── vectorized.py       # SubprocVecEnv parallel environments
│   │   └── hyperopt.py         # Optuna hyperparameter optimization
│   │
│   ├── evaluation/             # Validation & backtesting
│   │   ├── metrics.py          # Sharpe, Sortino, max drawdown, Calmar
│   │   ├── backtester.py       # Walk-forward validation
│   │   └── visualization.py    # Performance charts
│   │
│   └── deployment/             # Production inference
│       ├── inference.py        # Lightweight model serving
│       └── live_trader.py      # Live trading loop with safety controls
│
├── configs/                    # YAML configuration files
│   ├── default.yaml            # Default hyperparameters
│   ├── training.yaml           # Training overrides
│   └── inference.yaml          # Deployment configuration
│
├── scripts/                    # CLI entry points
│   ├── train.py                # Training script
│   ├── evaluate.py             # Evaluation script
│   ├── backtest.py             # Backtesting script
│   ├── live.py                 # Live trading script
│   └── quickstart.py           # Quick start demo
│
├── tests/                      # Test suite
├── notebooks/                  # Jupyter notebooks for exploration
├── models/                     # Saved models (gitignored)
├── data/                       # Data storage (gitignored)
└── logs/                       # Training logs (gitignored)
```
All hyperparameters are configured via YAML files:
```yaml
# configs/default.yaml
ppo:
  learning_rate: 0.0003
  n_steps: 2048
  batch_size: 64
  n_epochs: 10
  gamma: 0.99
  gae_lambda: 0.95
  clip_range: 0.2
  ent_coef: 0.01

environment:
  window_size: 30
  action_space_type: "discrete"  # or "continuous"
  initial_balance: 10000.0

reward:
  type: "differential_sharpe"
  risk_aversion: 0.5
```

Instead of rewarding raw returns (which encourages excessive risk), the agent optimizes the marginal contribution to the Sharpe ratio (Moody & Saffell, 2001):
$$D_t = \frac{B_{t-1}\,\Delta A_t - \tfrac{1}{2} A_{t-1}\,\Delta B_t}{\left(B_{t-1} - A_{t-1}^2\right)^{3/2}}$$

Where $A_t = A_{t-1} + \eta\,\Delta A_t$ and $B_t = B_{t-1} + \eta\,\Delta B_t$ are exponential moving estimates of the first and second moments of the returns $R_t$, with $\Delta A_t = R_t - A_{t-1}$, $\Delta B_t = R_t^2 - B_{t-1}$, and adaptation rate $\eta$.
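The incremental update above can be sketched in a few lines of plain Python (a hypothetical helper for illustration, not the project's `reward_functions.py`):

```python
def differential_sharpe(returns, eta=0.01):
    """Stream of differential Sharpe ratio rewards (Moody & Saffell, 2001)."""
    A, B = 0.0, 0.0  # exponential moving estimates of E[R] and E[R^2]
    rewards = []
    for r in returns:
        dA, dB = r - A, r * r - B
        var = B - A * A  # moving variance estimate
        if var > 1e-12:
            rewards.append((B * dA - 0.5 * A * dB) / var ** 1.5)
        else:
            rewards.append(0.0)  # undefined at start-up; emit neutral reward
        A += eta * dA
        B += eta * dB
    return rewards
```

Each step's reward is the marginal effect of the latest return on the running Sharpe estimate, so the agent is penalized for volatility as it trades rather than only at episode end.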
Log-returns make price data stationary for neural network consumption: $r_t = \ln(P_t / P_{t-1})$.
Features are normalized using rolling statistics to maintain consistent scale: $z_t = (x_t - \mu_t^{(w)}) / \sigma_t^{(w)}$, where $\mu_t^{(w)}$ and $\sigma_t^{(w)}$ are the mean and standard deviation over a trailing window of length $w$.
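A self-contained sketch of these two preprocessing steps (plain Python for clarity; the project's pipeline presumably vectorizes the same logic with pandas/numpy):

```python
import math
from statistics import mean, pstdev

def log_returns(prices):
    # r_t = ln(P_t / P_{t-1}); one fewer element than the input series
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def rolling_zscore(series, window):
    # z-score of each point against the trailing window ending at that point,
    # so no future information leaks into the feature
    out = []
    for i in range(window, len(series) + 1):
        w = series[i - window : i]
        mu, sigma = mean(w), pstdev(w)
        out.append((w[-1] - mu) / sigma if sigma > 1e-12 else 0.0)
    return out
```

Using only trailing statistics is what keeps the normalization causal: the value at time $t$ never depends on data after $t$.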
| Metric | Formula | Good | Excellent |
|---|---|---|---|
| Sharpe Ratio | (Return − Risk-Free Rate) / Volatility | > 1.0 | > 2.0 |
| Sortino Ratio | (Return − Risk-Free Rate) / Downside Deviation | > 1.5 | > 2.5 |
| Max Drawdown | Largest peak-to-trough equity decline | < 20% | < 10% |
| Calmar Ratio | Annual Return / Max Drawdown | > 1.0 | > 2.0 |
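A minimal sketch of how these metrics can be computed from per-period returns and an equity curve (annualization factor and conventions are illustrative, not necessarily those of the project's `metrics.py`):

```python
import math

def sharpe(returns, periods_per_year=365):
    # mean return over total volatility, annualized; risk-free rate assumed 0
    mu = sum(returns) / len(returns)
    var = sum((r - mu) ** 2 for r in returns) / len(returns)
    return mu / math.sqrt(var) * math.sqrt(periods_per_year) if var > 0 else 0.0

def sortino(returns, periods_per_year=365):
    # like Sharpe, but only downside deviation in the denominator
    mu = sum(returns) / len(returns)
    dd = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    return mu / dd * math.sqrt(periods_per_year) if dd > 0 else 0.0

def max_drawdown(equity_curve):
    # largest peak-to-trough decline as a fraction of the peak
    peak, mdd = equity_curve[0], 0.0
    for x in equity_curve:
        peak = max(peak, x)
        mdd = max(mdd, (peak - x) / peak)
    return mdd
```

Calmar then follows directly as annualized return divided by `max_drawdown`.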
```bash
# Run tests
make test

# Run with coverage
make test-cov

# Lint and format
make lint
make format

# Type checking
make type-check

# All quality checks
make lint && make test
```

The live trading system includes multiple safety controls:
- Paper Trading Mode - Test without real money (enabled by default)
- Daily Loss Limits - Stop trading after exceeding configurable threshold
- Maximum Position Size - Limit exposure per asset
- Trade Cooldown - Minimum time between trades to prevent overtrading
- Graceful Shutdown - Handle SIGTERM/SIGINT signals cleanly
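The loss-limit, position-cap, and cooldown controls above amount to simple gating before each order. A hedged sketch of that gate (names and thresholds are illustrative, not the actual `live_trader.py` API):

```python
import time
from dataclasses import dataclass

@dataclass
class SafetyGate:
    daily_loss_limit: float = 0.05   # halt after losing 5% of starting equity in a day
    max_position: float = 0.25       # max fraction of equity per asset
    cooldown_seconds: float = 60.0   # minimum time between trades
    _last_trade: float = float("-inf")

    def allow_trade(self, daily_pnl_pct, proposed_position, now=None):
        """Return (allowed, reason); checks run from most to least severe."""
        now = time.monotonic() if now is None else now
        if daily_pnl_pct <= -self.daily_loss_limit:
            return False, "daily loss limit hit"
        if abs(proposed_position) > self.max_position:
            return False, "position size exceeds cap"
        if now - self._last_trade < self.cooldown_seconds:
            return False, "cooldown active"
        self._last_trade = now
        return True, "ok"
```

Graceful shutdown sits outside this gate: a SIGTERM/SIGINT handler sets a stop flag so the trading loop exits after completing (or cancelling) the in-flight order.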
| Topic | Paper |
|---|---|
| PPO | Schulman, J., et al. (2017). Proximal Policy Optimization Algorithms. arXiv:1707.06347 |
| Differential Sharpe | Moody, J. & Saffell, M. (2001). Learning to Trade via Direct Reinforcement. IEEE Trans. Neural Networks |
| Financial ML | Lopez de Prado, M. (2018). Advances in Financial Machine Learning. Wiley |
| GAE | Schulman, J., et al. (2015). High-Dimensional Continuous Control Using GAE. arXiv:1506.02438 |
MIT License - see LICENSE for details.
- Stable-Baselines3 - PPO implementation
- CCXT - Exchange connectivity
- Gymnasium - RL environment API
- Streamlit - Dashboard framework