Skip to content

High-performance concurrent data structures using off-heap memory, lock-free algorithms, and hardware-aware programming techniques.

Notifications You must be signed in to change notification settings

techishthoughts-org/off_heap_algorithms

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

7 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Off-Heap Algorithms in Java

From Naive Implementations to Ultra Low-Latency Systems

A comprehensive journey through the algorithms that power high-frequency trading systems, real-time dashboards, and mission-critical applications where every microsecond counts.


πŸ“– About This Series

Have you ever watched a trading dashboard breatheβ€”candles sprouting in real time, volumes pulsing like a heartbeatβ€”and wondered what sort of sorcery keeps it all flowing when millions of trades per minute hammer the backend?

This repository accompanies a technical article series that explores the real algorithms and data structures used in high-frequency trading (HFT) systems. We start with naive, heap-based implementations and progressively optimize them into production-grade, off-heap solutions that eliminate GC pauses and achieve sub-microsecond latency.

Why Off-Heap?

In high-frequency trading, a single innocent-looking trade tick spawns a horde of objects. The heap swells, the dreaded garbage collector swoops in, andβ€”bangβ€”your latency budget is assassinated. In the sub-millisecond world, those stalls aren't mere nuisances. They are lost profits, evaporated opportunities, and the reason your CTO hasn't slept.

Stop throwing faster CPUs and bigger heaps at your performance problems. The real demons are:

  • Allocation churn - Creating and destroying millions of short-lived objects
  • Synchronization hotspots - Lock contention that serializes your parallel work
  • Fighting the hardware - Cache misses, false sharing, and unpredictable memory access patterns

Off-heap data structures live in native memory that the JVM doesn't manageβ€”so the GC has nothing to clean. This repository shows you how.


🎯 What You'll Learn

This series covers 11 battle-tested scenarios, each implementing a real-world problem with both naive (on-heap) and optimized (off-heap) approaches:

The Scenarios

# Scenario Problem Domain Key Optimization Speedup
01 Circular Ring Buffer OHLCV Candlestick Aggregation Off-heap memory, false sharing elimination 2x + 62% less memory
02 SPSC Queue Trade tick ingestion pipeline Lock-free single-writer pattern 3x throughput
03 MPSC Queue Multi-venue order routing CAS-based coordination 4x under contention
04 Work Distribution Task queue for order processing Sharded queues per core 5x, no false sharing
05 Event Pipeline Multi-stage trade enrichment Disruptor-style ring buffer 8x, predictable latency
06 Telemetry Logging High-frequency metrics capture Lossy, wait-free logging Negligible overhead
07 Sharded Processing Partitioned order book Per-core ownership Linear scaling
08 Zero-Allocation Messaging Message passing without GC Flyweight object pooling Zero GC pressure
09 K-FIFO Buffer Relaxed-order event processing Bounded reordering 2x throughput
10 Batch Processing Grouped trade settlement Amortized synchronization 10x fewer context switches
11 Trading Application Real-time trading dashboard Composing all patterns Production-ready system

πŸ—οΈ Architecture

Each scenario follows a consistent structure:

scenario-XX-name/
β”œβ”€β”€ approach-01-naive/          # On-heap, synchronized baseline
β”‚   β”œβ”€β”€ src/main/java/         # Implementation
β”‚   └── src/test/java/         # Comprehensive tests
β”œβ”€β”€ approach-02-optimized/      # Off-heap, lock-free solution
β”‚   β”œβ”€β”€ src/main/java/         # Optimized implementation
β”‚   └── src/test/java/         # Same tests + performance tests
β”œβ”€β”€ benchmarks/                 # JMH microbenchmarks
β”‚   └── src/jmh/java/          # Comparative benchmarks
└── README.md                   # Scenario deep-dive

Core Principles

Every optimized implementation follows these principles:

  1. Off-Heap Memory Management - Using sun.misc.Unsafe or java.nio.ByteBuffer to allocate outside the heap
  2. Lock-Free Coordination - CAS operations and memory barriers instead of locks
  3. Cache-Friendly Design - Padding to eliminate false sharing, sequential access patterns
  4. Zero-Copy Operations - Direct memory access without intermediate buffers
  5. Mechanical Sympathy - Code that respects how modern CPUs actually work

πŸš€ Quick Start

Prerequisites

  • Java 21+ (uses modern JVM features)
  • Gradle 8.5+ (build automation)
  • Understanding of Java memory model (watch this)

Build & Test

# Build everything
./gradlew build

# Run all tests
./gradlew test

# Run specific scenario tests
./gradlew :scenario-01-circurlar-ring-buffer:approach-02-optimized:test

# Run benchmarks (requires JMH)
./gradlew :scenario-01-circurlar-ring-buffer:benchmarks:jmh

Test Reports

After running tests, view HTML reports:

open scenario-01-circurlar-ring-buffer/approach-02-optimized/build/reports/tests/test/index.html

πŸ“š Learning Path

Start Here

If you're new to off-heap programming, follow this order:

  1. Scenario 01: Circular Ring Buffer - Foundation concepts

    • Off-heap memory allocation
    • Memory barriers and visibility
    • False sharing and cache line padding
  2. Scenario 02: SPSC Queue - Lock-free basics

    • Single-writer principle
    • Volatile semantics
    • When lock-free actually matters
  3. Scenario 03: MPSC Queue - Handling contention

    • CAS operations
    • ABA problem and solutions
    • Backoff strategies

Then Dive Deeper

  1. Scenarios 04-07 - Advanced patterns for distributed workloads
  2. Scenarios 08-10 - Extreme optimization techniques
  3. Scenario 11 - Putting it all together in a real application

πŸ§ͺ Testing Philosophy

Every implementation includes:

  • βœ… Unit tests - Correctness verification
  • βœ… Concurrency tests - Thread-safety validation with high contention
  • βœ… Edge case tests - Boundary conditions, wraparounds, error cases
  • βœ… Stress tests - Extended runs to catch memory leaks and race conditions
  • βœ… JMH benchmarks - Objective performance measurements

Test coverage: 85%+ on all optimized implementations


πŸ“Š Performance Numbers

All benchmarks run on: Apple M2 Pro, 10 cores, 32GB RAM, Java 21

Scenario 01: Ring Buffer (OHLCV Aggregation)

Benchmark                          Mode  Cnt   Score   Error  Units
RingBufferBenchmark.naiveOffer     avgt   25   245.3 Β± 12.1  ns/op
RingBufferBenchmark.optimizedOffer avgt   25   92.7  Β± 4.3   ns/op  (2.6x faster)

Memory footprint: -62% (off-heap doesn't count against heap)
GC pause frequency: -100% (no heap allocations)

Scenario 03: MPSC Queue (Multi-Producer)

8 producers, 1 consumer, 1M messages:
- Naive (synchronized):     ~180K msg/sec,  ~250ms p99 latency
- Optimized (lock-free):    ~720K msg/sec,  ~45ms p99 latency

Under heavy contention: 4x throughput improvement

See individual scenario READMEs for detailed benchmarks.


πŸ”¬ Real-World Application

Scenario 11: Trading Dashboard

The final scenario composes all patterns into a real-time trading application:

  • Ingests trades from multiple venues (MPSC)
  • Aggregates OHLCV candles (Ring Buffer)
  • Processes orders through multi-stage pipeline (Event Pipeline)
  • Logs telemetry without blocking (Lossy Logging)
  • Handles 500K+ trades/second with p99 latency < 100ΞΌs
cd scenario-11-trading-application
../gradlew run

πŸ“– Documentation Structure

docs/
β”œβ”€β”€ INDEX.md                       # πŸ“š Documentation hub (start here!)
β”œβ”€β”€ REPOSITORY-GUIDE.md            # πŸ—ΊοΈ Complete repository overview
β”œβ”€β”€ deep-dives/                    # 🎯 Scenario deep-dives (11 scenarios)
β”‚   β”œβ”€β”€ SCENARIO-01-APPROACH-01-NAIVE.md
β”‚   β”œβ”€β”€ SCENARIO-01-APPROACH-02-OFFHEAP.md
β”‚   β”œβ”€β”€ SCENARIO-02-SPSC-QUEUE.md
β”‚   └── ...                        # (8 more scenarios)
β”œβ”€β”€ guides/                        # πŸ“– Practical guides
β”‚   β”œβ”€β”€ GETTING-STARTED.md         # Setup and quick start
β”‚   β”œβ”€β”€ MEMORY-MANAGEMENT.md       # Off-heap memory techniques
β”‚   β”œβ”€β”€ PROFILING-GUIDE.md         # JFR, GC logs, analysis
β”‚   β”œβ”€β”€ JMH-VS-REGULAR-JAVA-RESEARCH.md  # Profiling approach
β”‚   └── PERFORMANCE-TUNING.md      # JVM optimization
└── assets/graphs/                 # πŸ“Š Performance visualizations
    β”œβ”€β”€ scenario-01/approach-comparison.png
    β”œβ”€β”€ scenario-02/approach-comparison.png
    └── scenario-03/approach-comparison.png

Key Entry Points:


πŸ› οΈ Technologies Used

Core Technologies

  • Java 21 - Modern language features, virtual threads
  • sun.misc.Unsafe - Direct memory access (approach-02 implementations)
  • JMH - Microbenchmarking framework
  • JUnit 5 - Testing framework
  • AssertJ - Fluent assertions
  • Jacoco - Code coverage
  • Gradle - Build automation

Production-Grade Alternatives

This repository demonstrates educational implementations using sun.misc.Unsafe. For production systems, consider these battle-tested libraries:

πŸ‘‰ Read our comprehensive OpenHFT guide for detailed comparisons and integration examples


πŸ“ˆ Dataset

Scenarios use real market data from World Stock Prices Daily Updating (Kaggle).

Place the CSV file in the project root or configure the path in scenario READMEs.


⚠️ Production Considerations

These implementations are educational and demonstrate core concepts. For production use:

  • βœ… Add comprehensive error handling and recovery
  • βœ… Implement proper resource cleanup (try-with-resources)
  • βœ… Add monitoring and observability hooks
  • βœ… Validate on your target hardware and JVM version
  • βœ… Use proven libraries instead of rolling your own:
  • βœ… Test under real workload patterns (not just microbenchmarks)

⚠️ sun.misc.Unsafe Warning: This API is internal and subject to change. Java 22+ offers Foreign Function & Memory API as a replacement.

Why Use Production Libraries?

Aspect Educational (This Repo) Production (OpenHFT/Agrona)
Purpose Learn concepts Ship to production
Testing Comprehensive Battle-tested in HFT
Edge Cases Educational scope Years of bug fixes
Performance Optimized Microsecond-level tuning
Support Community Commercial options
Monitoring Basic Production-grade

πŸ‘‰ Start by understanding our implementations, then graduate to production libraries for real systems.


🀝 Contributing

This is an educational repository. If you find issues or have improvements:

  1. Ensure tests pass: ./gradlew test
  2. Add tests for new features
  3. Follow the existing code style
  4. Update relevant documentation

πŸ“š Further Reading

Prerequisites

Production Libraries (Detailed in Our Docs)

  • OpenHFT Chronicle Libraries Guide ⭐ - Our comprehensive guide to production-grade off-heap tools
    • Chronicle Queue - Persistent messaging with 0.78Β΅s p99 latency
    • Chronicle Map - Off-heap key-value store with replication
    • Chronicle Wire - Zero-GC serialization framework
    • Java Thread Affinity - CPU pinning for predictable performance
    • Complete integration examples and performance comparisons

Advanced Resources


πŸ“„ License

MIT License - See LICENSE file for details


πŸ‘¨β€πŸ’» About

This project accompanies a technical article series exploring high-performance Java systems. Each scenario represents a real problem from high-frequency trading, with detailed explanations of:

  • Why the naive approach fails
  • What hardware-level bottlenecks exist
  • How the optimized solution addresses them
  • Performance characteristics and trade-offs

Goal: Bridge the gap between academic lock-free algorithms and real-world production systems.


🎯 Quick Reference Card

When You Need... Use Scenario... Key Pattern
Fixed-size cyclic buffer 01 - Ring Buffer Off-heap, sequential access
Single writer, single reader 02 - SPSC Lock-free, happens-before
Multiple writers, single reader 03 - MPSC CAS coordination
Multiple writers & readers 04 - MPMC Advanced CAS, backoff
Multi-stage processing 05 - Event Pipeline Disruptor pattern
Non-blocking logging 06 - Telemetry Lossy, wait-free
Per-core data 07 - Sharding Cache line alignment
Zero GC overhead 08 - Zero-Alloc Object pooling
Relaxed ordering OK 09 - K-FIFO Bounded reordering
Bulk operations 10 - Batching Amortization

Start with Scenario 01: Circular Ring Buffer β†’

Build the foundation, then progress through increasingly complex scenarios. Each one builds on previous concepts while introducing new optimization techniques.

Good luck, and may your latencies be ever in your favor! πŸš€

About

High-performance concurrent data structures using off-heap memory, lock-free algorithms, and hardware-aware programming techniques.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published