Autonomous DRL Trading Agent (PPO)

A production-grade Proximal Policy Optimization (PPO) agent for portfolio allocation and optimal execution in cryptocurrency markets. This project demonstrates deep understanding of reinforcement learning, financial engineering, and MLOps best practices.

Python 3.10+ PyTorch Stable-Baselines3 License: MIT

Overview

This project implements a complete pipeline for training, evaluating, and deploying a deep reinforcement learning agent for autonomous trading:

  • Custom Gymnasium environment with realistic friction modeling (transaction costs, slippage, execution delay)
  • Differential Sharpe Ratio reward (Moody & Saffell, 2001) for risk-adjusted optimization
  • Feature engineering pipeline with log-returns, technical indicators (RSI, MACD, ATR, Bollinger), and rolling Z-score normalization
  • Interactive Streamlit dashboard for real-time training visualization and backtesting
  • Full MLOps infrastructure with MLflow tracking, Docker containerization, and TensorBoard logging

Disclaimer: This project is for educational and research purposes. It is NOT financial advice. Trading involves substantial risk of loss. Past performance does not guarantee future results.
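The friction modeling mentioned in the overview (transaction costs and slippage) can be illustrated with a minimal sketch. The function name and default rates below are hypothetical, chosen only to show the shape of the calculation; the repo's actual implementation lives in `src/trading_agent/simulation/friction.py`:

```python
def execution_cost(price: float, qty: float, fee_rate: float = 0.001,
                   slippage_bps: float = 5.0) -> float:
    """Cash outlay for a market buy of `qty` units, with a proportional
    fee and fixed adverse slippage in basis points (illustrative defaults)."""
    fill_price = price * (1.0 + slippage_bps / 10_000)  # buyer fills above quote
    notional = qty * fill_price
    return notional * (1.0 + fee_rate)

# Buying 1 unit quoted at 100.0 costs slightly more than 100 after friction
cost = execution_cost(100.0, 1.0)
```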

Architecture

┌───────────────────────────────────────────────────────────────────┐
│                           OBSERVATION                             │
│  State: (window_size, features) - Normalized OHLCV + Indicators   │
└─────────────────────────────────┬─────────────────────────────────┘
                                  │
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│                        FEATURE EXTRACTOR                          │
│                 MLP [64, 64] with ReLU activation                 │
└─────────────────────────────────┬─────────────────────────────────┘
                                  │
                ┌─────────────────┴─────────────────┐
                │                                   │
                ▼                                   ▼
┌─────────────────────────┐           ┌─────────────────────────┐
│       POLICY HEAD       │           │       VALUE HEAD        │
│     (Actor - π(a|s))    │           │     (Critic - V(s))     │
│    Softmax / Gaussian   │           │      Single Output      │
└────────────┬────────────┘           └─────────────────────────┘
             │
             ▼
┌───────────────────────────────────────────────────────────────────┐
│                             ACTION                                │
│    Discrete: {HOLD, BUY, SELL} or Continuous: [-1, 1]             │
└───────────────────────────────────────────────────────────────────┘
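The diagram maps onto a compact actor-critic module. Below is a minimal PyTorch sketch of that shape (discrete action head shown; the `obs_dim` flattening and the choice of 8 features are assumptions for illustration — the repo's real networks live in `src/trading_agent/model/networks.py` and are built through Stable-Baselines3):

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared MLP [64, 64] feature extractor with a discrete policy head
    (logits over {HOLD, BUY, SELL}) and a scalar value head V(s)."""

    def __init__(self, obs_dim: int, n_actions: int = 3):
        super().__init__()
        # Shared feature extractor: MLP [64, 64] with ReLU activation
        self.features = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        # Policy head (actor): action logits
        self.policy_head = nn.Linear(64, n_actions)
        # Value head (critic): single output
        self.value_head = nn.Linear(64, 1)

    def forward(self, obs: torch.Tensor):
        h = self.features(obs)
        return self.policy_head(h), self.value_head(h)

# Flattened (window_size, features) observation; 8 features is arbitrary here
model = ActorCritic(obs_dim=30 * 8)
logits, value = model(torch.randn(1, 30 * 8))
```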

Quick Start

Installation

# Clone repository
git clone https://github.com/timothim/trading-agent.git
cd trading-agent

# Install with pip
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"

# Setup pre-commit hooks (optional)
pre-commit install

Dashboard

The easiest way to interact with the project:

# Launch interactive Streamlit dashboard
make dashboard
# → http://localhost:8501

# Or directly
streamlit run app/dashboard.py

The dashboard lets you:

  • Train a PPO agent with configurable hyperparameters and watch it learn in real-time
  • Backtest the trained model on unseen data with step-by-step visualization
  • Analyze performance metrics (Sharpe, Sortino, drawdown) and compare against buy-and-hold
  • Verify that the training is real (model inspection, TensorBoard logs)

Training (CLI)

# Train with default configuration
make train

# Train with custom config
python scripts/train.py --config configs/training.yaml --timesteps 1000000

# Run hyperparameter optimization
python scripts/train.py --config configs/training.yaml --hpo --hpo-trials 100

Evaluation

# Evaluate trained model
python scripts/evaluate.py --model models/best_model --config configs/default.yaml

# Run walk-forward backtest
python scripts/backtest.py --model models/best_model --walk-forward --folds 5

Docker

# Build and run with Docker Compose
docker-compose up training

# Launch MLflow UI
docker-compose up mlflow
# → http://localhost:5000

# Start TensorBoard
docker-compose up tensorboard
# → http://localhost:6006

Project Structure

trading-agent/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ dashboard.py              # Streamlit UI (interactive dashboard)
β”‚   └── engine.py                 # Training & backtest orchestration
β”‚
β”œβ”€β”€ src/trading_agent/
β”‚   β”œβ”€β”€ data/                     # Data acquisition
β”‚   β”‚   β”œβ”€β”€ fetcher.py            # CCXT exchange integration
β”‚   β”‚   └── dataset.py            # Data management & caching
β”‚   β”‚
β”‚   β”œβ”€β”€ preprocessing/            # Feature engineering
β”‚   β”‚   β”œβ”€β”€ transformers.py       # Log-returns, fractional differentiation
β”‚   β”‚   β”œβ”€β”€ normalizers.py        # Rolling Z-score normalization
β”‚   β”‚   β”œβ”€β”€ indicators.py         # RSI, MACD, ATR, Bollinger Bands
β”‚   β”‚   └── pipeline.py           # Orchestrated preprocessing pipeline
β”‚   β”‚
β”‚   β”œβ”€β”€ simulation/               # Custom Gymnasium environment
β”‚   β”‚   β”œβ”€β”€ trading_env.py        # Core RL environment
β”‚   β”‚   β”œβ”€β”€ action_schemes.py     # Discrete / Continuous action spaces
β”‚   β”‚   β”œβ”€β”€ reward_functions.py   # Differential Sharpe, Sortino, etc.
β”‚   β”‚   └── friction.py           # Transaction costs, slippage, latency
β”‚   β”‚
β”‚   β”œβ”€β”€ model/                    # PPO agent
β”‚   β”‚   β”œβ”€β”€ networks.py           # Actor-Critic MLP networks
β”‚   β”‚   β”œβ”€β”€ agent.py              # PPO wrapper (Stable-Baselines3)
β”‚   β”‚   └── callbacks.py          # Training callbacks & checkpoints
β”‚   β”‚
β”‚   β”œβ”€β”€ training/                 # Training infrastructure
β”‚   β”‚   β”œβ”€β”€ trainer.py            # Training orchestration
β”‚   β”‚   β”œβ”€β”€ vectorized.py         # SubprocVecEnv parallel environments
β”‚   β”‚   └── hyperopt.py           # Optuna hyperparameter optimization
β”‚   β”‚
β”‚   β”œβ”€β”€ evaluation/               # Validation & backtesting
β”‚   β”‚   β”œβ”€β”€ metrics.py            # Sharpe, Sortino, max drawdown, Calmar
β”‚   β”‚   β”œβ”€β”€ backtester.py         # Walk-forward validation
β”‚   β”‚   └── visualization.py      # Performance charts
β”‚   β”‚
β”‚   └── deployment/               # Production inference
β”‚       β”œβ”€β”€ inference.py          # Lightweight model serving
β”‚       └── live_trader.py        # Live trading loop with safety controls
β”‚
β”œβ”€β”€ configs/                      # YAML configuration files
β”‚   β”œβ”€β”€ default.yaml              # Default hyperparameters
β”‚   β”œβ”€β”€ training.yaml             # Training overrides
β”‚   └── inference.yaml            # Deployment configuration
β”‚
β”œβ”€β”€ scripts/                      # CLI entry points
β”‚   β”œβ”€β”€ train.py                  # Training script
β”‚   β”œβ”€β”€ evaluate.py               # Evaluation script
β”‚   β”œβ”€β”€ backtest.py               # Backtesting script
β”‚   β”œβ”€β”€ live.py                   # Live trading script
β”‚   └── quickstart.py             # Quick start demo
β”‚
β”œβ”€β”€ tests/                        # Test suite
β”œβ”€β”€ notebooks/                    # Jupyter notebooks for exploration
β”œβ”€β”€ models/                       # Saved models (gitignored)
β”œβ”€β”€ data/                         # Data storage (gitignored)
└── logs/                         # Training logs (gitignored)

Configuration

All hyperparameters are configured via YAML files:

# configs/default.yaml
ppo:
  learning_rate: 0.0003
  n_steps: 2048
  batch_size: 64
  n_epochs: 10
  gamma: 0.99
  gae_lambda: 0.95
  clip_range: 0.2
  ent_coef: 0.01

environment:
  window_size: 30
  action_space_type: "discrete"  # or "continuous"
  initial_balance: 10000.0

reward:
  type: "differential_sharpe"
  risk_aversion: 0.5
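
These sections are plain YAML mappings, so loading them is a one-liner with PyYAML. A minimal sketch (the inline string stands in for `configs/default.yaml`):

```python
import yaml

# Inline excerpt of the config shown above, for illustration only
raw = """
ppo:
  learning_rate: 0.0003
  gamma: 0.99
environment:
  window_size: 30
  action_space_type: "discrete"
"""

config = yaml.safe_load(raw)
# The ppo section maps naturally onto Stable-Baselines3 PPO kwargs,
# e.g. PPO("MlpPolicy", env, **config["ppo"])
```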

Mathematical Background

Differential Sharpe Ratio (Reward)

Instead of rewarding raw returns (which encourages excessive risk), the agent optimizes the marginal contribution to the Sharpe ratio (Moody & Saffell, 2001):

$$D_t \equiv \frac{dS_t}{d\eta} = \frac{B_{t-1} \cdot \Delta A_t - \frac{1}{2} A_{t-1} \cdot \Delta B_t}{(B_{t-1} - A_{t-1}^2)^{3/2}}$$

Where $A_t$ and $B_t$ are exponential moving averages of returns and squared returns, updated with adaptation rate $\eta$ (i.e. $A_t = A_{t-1} + \eta \, \Delta A_t$ with $\Delta A_t = R_t - A_{t-1}$, and similarly for $B_t$ with $\Delta B_t = R_t^2 - B_{t-1}$).
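The incremental update fits in a few lines of Python. This is a sketch of the standard recursion, not necessarily the repo's `reward_functions.py`; in particular, the guard that emits a neutral reward while the variance term is still non-positive (e.g. on the very first step) is an assumption about how early steps are handled:

```python
class DifferentialSharpe:
    """Incremental Differential Sharpe Ratio reward (Moody & Saffell, 2001).

    A and B are exponential moving averages of returns and squared
    returns, updated with adaptation rate eta.
    """

    def __init__(self, eta: float = 0.01):
        self.eta = eta
        self.A = 0.0  # EMA of returns
        self.B = 0.0  # EMA of squared returns

    def step(self, ret: float) -> float:
        delta_a = ret - self.A
        delta_b = ret ** 2 - self.B
        var = self.B - self.A ** 2
        if var <= 0:
            reward = 0.0  # denominator undefined early on; neutral reward
        else:
            reward = (self.B * delta_a - 0.5 * self.A * delta_b) / var ** 1.5
        # Update the EMAs only after the reward is computed
        self.A += self.eta * delta_a
        self.B += self.eta * delta_b
        return reward

dsr = DifferentialSharpe(eta=0.01)
rewards = [dsr.step(r) for r in (0.01, -0.005, 0.02)]
```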

Stationarity Transformation

Log-returns make price data stationary for neural network consumption:

$$r_t = \ln\left(\frac{P_t}{P_{t-1}}\right)$$
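In NumPy this is a one-liner over a price series:

```python
import numpy as np

# Close prices; the first return is ln(P_1 / P_0)
prices = np.array([100.0, 101.0, 99.5, 102.0])
log_returns = np.log(prices[1:] / prices[:-1])
```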

Rolling Z-Score Normalization

Features are normalized using rolling statistics to maintain consistent scale:

$$z_t = \frac{x_t - \mu_{t-w:t}}{\sigma_{t-w:t}}$$
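A pandas sketch of the same transformation. Note that including the current value $x_t$ in its own window is one convention; pipelines concerned about leakage shift the rolling statistics by one step instead:

```python
import pandas as pd

def rolling_zscore(x: pd.Series, window: int = 30) -> pd.Series:
    """Normalize each point against its trailing window's mean and std."""
    mu = x.rolling(window).mean()
    sigma = x.rolling(window).std()
    return (x - mu) / sigma

features = pd.Series([1.0, 2.0, 4.0, 3.0, 5.0, 4.5, 6.0, 5.5, 7.0, 6.5])
z = rolling_zscore(features, window=5)  # first 4 values are NaN (warm-up)
```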

Performance Metrics

| Metric | Formula | Good | Excellent |
|---|---|---|---|
| Sharpe Ratio | $(E[R] - R_f) / \sigma$ | > 1.0 | > 2.0 |
| Sortino Ratio | $(E[R] - R_f) / \sigma_{down}$ | > 1.5 | > 2.5 |
| Max Drawdown | $\max(Peak - Trough) / Peak$ | < 20% | < 10% |
| Calmar Ratio | Annual Return / Max Drawdown | > 1.0 | > 2.0 |
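The first three metrics can be computed from a return series and an equity curve in a few lines. These are illustrative implementations under the usual conventions (252 trading periods per year, sample standard deviation), not the repo's `src/trading_agent/evaluation/metrics.py`:

```python
import numpy as np

def sharpe_ratio(returns: np.ndarray, rf: float = 0.0, periods: int = 252) -> float:
    """Annualized Sharpe ratio from per-period returns."""
    excess = returns - rf / periods
    return float(np.sqrt(periods) * excess.mean() / excess.std(ddof=1))

def sortino_ratio(returns: np.ndarray, rf: float = 0.0, periods: int = 252) -> float:
    """Like Sharpe, but penalizes only downside volatility."""
    excess = returns - rf / periods
    downside = excess[excess < 0]
    return float(np.sqrt(periods) * excess.mean() / downside.std(ddof=1))

def max_drawdown(equity: np.ndarray) -> float:
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peaks = np.maximum.accumulate(equity)
    return float(np.max((peaks - equity) / peaks))
```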

Development

# Run tests
make test

# Run with coverage
make test-cov

# Lint and format
make lint
make format

# Type checking
make type-check

# All quality checks
make lint && make test

Safety Features

The live trading system includes multiple safety controls:

  • Paper Trading Mode - Test without real money (enabled by default)
  • Daily Loss Limits - Stop trading after exceeding configurable threshold
  • Maximum Position Size - Limit exposure per asset
  • Trade Cooldown - Minimum time between trades to prevent overtrading
  • Graceful Shutdown - Handle SIGTERM/SIGINT signals cleanly
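A minimal sketch of how such pre-trade checks compose; all names and default thresholds below are hypothetical, for illustration only (the actual controls live in `src/trading_agent/deployment/live_trader.py`):

```python
from dataclasses import dataclass

@dataclass
class SafetyLimits:
    """Illustrative pre-trade gate combining the controls listed above."""
    daily_loss_limit: float = 0.02   # max fraction of equity lost per day
    max_position_size: float = 0.25  # max fraction of equity per asset
    cooldown_seconds: int = 60       # min time between trades

    def allow_trade(self, daily_pnl_pct: float, position_pct: float,
                    seconds_since_last: float) -> bool:
        if daily_pnl_pct <= -self.daily_loss_limit:
            return False  # daily loss limit breached: stop trading
        if position_pct >= self.max_position_size:
            return False  # position cap reached for this asset
        if seconds_since_last < self.cooldown_seconds:
            return False  # still in cooldown: prevent overtrading
        return True
```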

References

| Topic | Paper |
|---|---|
| PPO | Schulman, J., et al. (2017). Proximal Policy Optimization Algorithms. arXiv:1707.06347 |
| Differential Sharpe | Moody, J. & Saffell, M. (2001). Learning to Trade via Direct Reinforcement. IEEE Transactions on Neural Networks |
| Financial ML | López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley |
| GAE | Schulman, J., et al. (2015). High-Dimensional Continuous Control Using Generalized Advantage Estimation. arXiv:1506.02438 |

License

MIT License - see LICENSE for details.
