Off-Heap Algorithms in Java

From Naive Implementations to Ultra Low-Latency Systems

A comprehensive journey through the algorithms that power high-frequency trading systems, real-time dashboards, and mission-critical applications where every microsecond counts.

📖 About This Series

Have you ever watched a trading dashboard breathe—candles sprouting in real time, volumes pulsing like a heartbeat—and wondered what sort of sorcery keeps it all flowing when millions of trades per minute hammer the backend?

This repository accompanies a technical article series that explores the real algorithms and data structures used in high-frequency trading (HFT) systems. We start with naive, heap-based implementations and progressively optimize them into production-grade, off-heap solutions that eliminate GC pauses and achieve sub-microsecond latency.

Why Off-Heap?

In high-frequency trading, a single innocent-looking trade tick spawns a horde of objects. The heap swells, the dreaded garbage collector swoops in, and—bang—your latency budget is assassinated. In the sub-millisecond world, those stalls aren't mere nuisances. They are lost profits, evaporated opportunities, and the reason your CTO hasn't slept.

Stop throwing faster CPUs and bigger heaps at your performance problems. The real demons are:

Allocation churn - Creating and destroying millions of short-lived objects
Synchronization hotspots - Lock contention that serializes your parallel work
Fighting the hardware - Cache misses, false sharing, and unpredictable memory access patterns

Off-heap data structures live in native memory that the JVM doesn't manage—so the GC has nothing to clean. This repository shows you how.
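As a minimal sketch of what "native memory that the JVM doesn't manage" can look like, the example below stores fixed-size records in a direct java.nio.ByteBuffer. The class name OffHeapTradeStore and the 16-byte record layout are assumptions made for illustration, not code taken from this repository.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical example: fixed-size trade records kept in native (off-heap) memory. */
final class OffHeapTradeStore {
    private static final int RECORD_BYTES = 16; // assumed layout: 8-byte price + 8-byte volume
    private final ByteBuffer buffer;
    private final int capacity;

    OffHeapTradeStore(int capacity) {
        this.capacity = capacity;
        // allocateDirect reserves zero-filled native memory; only the wrapper object is on the heap.
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_BYTES)
                                .order(ByteOrder.nativeOrder());
    }

    /** Writes one record in place using absolute offsets; no object is created per trade. */
    void put(int index, double price, long volume) {
        int offset = offsetOf(index);
        buffer.putDouble(offset, price);
        buffer.putLong(offset + 8, volume);
    }

    double price(int index) {
        return buffer.getDouble(offsetOf(index));
    }

    long volume(int index) {
        return buffer.getLong(offsetOf(index) + 8);
    }

    private int offsetOf(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * RECORD_BYTES;
    }
}
```

Only the small ByteBuffer wrapper lives on the heap. The record data sits in native memory, so writing trades this way creates no per-trade objects for the collector to trace.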

🎯 What You'll Learn

This series covers 11 battle-tested scenarios, each implementing a real-world problem with both naive (on-heap) and optimized (off-heap) approaches:

The Scenarios

#	Scenario	Problem Domain	Key Optimization	Speedup
01	Circular Ring Buffer	OHLCV Candlestick Aggregation	Off-heap memory, false sharing elimination	2.6x + 62% less memory
02	SPSC Queue	Trade tick ingestion pipeline	Lock-free single-writer pattern	3x throughput
03	MPSC Queue	Multi-venue order routing	CAS-based coordination	4x under contention
04	Work Distribution	Task queue for order processing	Sharded queues per core	5x, no false sharing
05	Event Pipeline	Multi-stage trade enrichment	Disruptor-style ring buffer	8x, predictable latency
06	Telemetry Logging	High-frequency metrics capture	Lossy, wait-free logging	Negligible overhead
07	Sharded Processing	Partitioned order book	Per-core ownership	Linear scaling
08	Zero-Allocation Messaging	Message passing without GC	Flyweight object pooling	Zero GC pressure
09	K-FIFO Buffer	Relaxed-order event processing	Bounded reordering	2x throughput
10	Batch Processing	Grouped trade settlement	Amortized synchronization	10x fewer context switches
11	Trading Application	Real-time trading dashboard	Composing all patterns	Production-ready system

🏗️ Architecture

Each scenario follows a consistent structure:

scenario-XX-name/
├── approach-01-naive/          # On-heap, synchronized baseline
│   ├── src/main/java/         # Implementation
│   └── src/test/java/         # Comprehensive tests
├── approach-02-optimized/      # Off-heap, lock-free solution
│   ├── src/main/java/         # Optimized implementation
│   └── src/test/java/         # Same tests + performance tests
├── benchmarks/                 # JMH microbenchmarks
│   └── src/jmh/java/          # Comparative benchmarks
└── README.md                   # Scenario deep-dive

Core Principles

Every optimized implementation follows these principles:

Off-Heap Memory Management - Using sun.misc.Unsafe or java.nio.ByteBuffer to allocate outside the heap
Lock-Free Coordination - CAS operations and memory barriers instead of locks
Cache-Friendly Design - Padding to eliminate false sharing, sequential access patterns
Zero-Copy Operations - Direct memory access without intermediate buffers
Mechanical Sympathy - Code that respects how modern CPUs actually work
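To make the lock-free and cache-friendly principles concrete, here is a hedged sketch of a counter that is updated with a CAS loop and padded so the hot field is unlikely to share a cache line with its neighbours. The class names (LeftPad, HotValue, PaddedCounter) are hypothetical, the 64-byte cache line is an assumption about the target CPU, and the sketch uses java.lang.invoke.VarHandle rather than sun.misc.Unsafe. Field layout is ultimately decided by the JVM, so treat the padding as a common idiom rather than a guarantee.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Padding through a class hierarchy: superclass fields are normally laid out before
// subclass fields, so the hot field ends up between two 56-byte pads.
class LeftPad { long p1, p2, p3, p4, p5, p6, p7; }

class HotValue extends LeftPad { volatile long value; }

/** Hypothetical padded counter updated without locks. */
final class PaddedCounter extends HotValue {
    long q1, q2, q3, q4, q5, q6, q7; // right-hand padding

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(HotValue.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Acquire read: sees every write that happened before the matching update. */
    long get() {
        return (long) VALUE.getAcquire(this);
    }

    /** Classic CAS retry loop: read, compute, attempt to swap, retry on conflict. */
    long incrementAndGet() {
        long current;
        do {
            current = (long) VALUE.getVolatile(this);
        } while (!VALUE.compareAndSet(this, current, current + 1));
        return current + 1;
    }
}
```

A thread that loses the compareAndSet race simply re-reads and retries, so no thread ever blocks while holding a lock. Whether the padding pays off depends on the hardware and should be measured.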

🚀 Quick Start

Prerequisites

Java 21+ (uses modern JVM features)
Gradle 8.5+ (build automation)
Understanding of the Java Memory Model

Build & Test

# Build everything
./gradlew build

# Run all tests
./gradlew test

# Run specific scenario tests
./gradlew :scenario-01-circurlar-ring-buffer:approach-02-optimized:test

# Run benchmarks (requires JMH)
./gradlew :scenario-01-circurlar-ring-buffer:benchmarks:jmh

Test Reports

After running tests, view HTML reports:

open scenario-01-circurlar-ring-buffer/approach-02-optimized/build/reports/tests/test/index.html

📚 Learning Path

Start Here

If you're new to off-heap programming, follow this order:

Scenario 01: Circular Ring Buffer - Foundation concepts
- Off-heap memory allocation
- Memory barriers and visibility
- False sharing and cache line padding
Scenario 02: SPSC Queue - Lock-free basics
- Single-writer principle
- Volatile semantics
- When lock-free actually matters
Scenario 03: MPSC Queue - Handling contention
- CAS operations
- ABA problem and solutions
- Backoff strategies
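The single-writer principle from Scenario 02 can be sketched as follows. This is an illustrative on-heap queue of long values, not the repository's off-heap implementation, and SpscLongQueue with its method names is hypothetical. Each index has exactly one writing thread, so no CAS is needed, only ordered stores.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical single-producer/single-consumer queue of long values. */
final class SpscLongQueue {
    private final long[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read, written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // next slot to write, written only by the producer

    SpscLongQueue(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    /** Producer thread only. Returns false when the queue is full. */
    boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value;
        tail.lazySet(t + 1); // ordered store: publishes the slot write to the consumer
        return true;
    }

    /** Consumer thread only. Returns emptyValue when nothing is available. */
    long poll(long emptyValue) {
        long h = head.get();
        if (h == tail.get()) {
            return emptyValue;
        }
        long value = slots[(int) (h & mask)];
        head.lazySet(h + 1); // ordered store: hands the slot back to the producer
        return value;
    }
}
```

The volatile read of tail in poll pairs with the ordered store in offer, which is what establishes the happens-before edge between writing a slot and reading it.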

Then Dive Deeper

Scenarios 04-07 - Advanced patterns for distributed workloads
Scenarios 08-10 - Extreme optimization techniques
Scenario 11 - Putting it all together in a real application

🧪 Testing Philosophy

Every implementation includes:

✅ Unit tests - Correctness verification
✅ Concurrency tests - Thread-safety validation with high contention
✅ Edge case tests - Boundary conditions, wraparounds, error cases
✅ Stress tests - Extended runs to catch memory leaks and race conditions
✅ JMH benchmarks - Objective performance measurements

Test coverage: 85%+ on all optimized implementations

📊 Performance Numbers

All benchmarks were run on an Apple M2 Pro (10 cores, 32 GB RAM) with Java 21.

Scenario 01: Ring Buffer (OHLCV Aggregation)

Benchmark                          Mode  Cnt   Score   Error  Units
RingBufferBenchmark.naiveOffer     avgt   25   245.3 ± 12.1  ns/op
RingBufferBenchmark.optimizedOffer avgt   25   92.7  ± 4.3   ns/op  (2.6x faster)

Heap memory footprint: -62% (off-heap memory doesn't count against the Java heap)
GC pause frequency: -100% (no heap allocations)
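For readers who want a feel for the data structure behind these numbers, the following is a simplified, single-threaded sketch of an OHLCV candle ring kept in a direct ByteBuffer. It is not the benchmarked code: CandleRing, its field constants, and the five-double layout are assumptions for illustration, and it omits the padding and memory-ordering work the optimized approach relies on.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical off-heap ring of OHLCV candles; oldest candles are overwritten on wraparound. */
final class CandleRing {
    static final int OPEN = 0, HIGH = 1, LOW = 2, CLOSE = 3, VOLUME = 4;
    private static final int CANDLE_BYTES = 5 * Double.BYTES;

    private final ByteBuffer buffer;
    private final int capacity;
    private long started; // total candles started so far (single-threaded sketch)

    CandleRing(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.buffer = ByteBuffer.allocateDirect(capacity * CANDLE_BYTES)
                                .order(ByteOrder.nativeOrder());
    }

    /** Opens a new candle in the next slot, reusing the oldest slot once the ring is full. */
    void startCandle(double price, double volume) {
        int base = baseOf(started++);
        put(base, OPEN, price);
        put(base, HIGH, price);
        put(base, LOW, price);
        put(base, CLOSE, price);
        put(base, VOLUME, volume);
    }

    /** Folds a trade into the most recent candle, updating it in place. */
    void onTrade(double price, double volume) {
        if (started == 0) {
            startCandle(price, volume);
            return;
        }
        int base = baseOf(started - 1);
        put(base, HIGH, Math.max(get(base, HIGH), price));
        put(base, LOW, Math.min(get(base, LOW), price));
        put(base, CLOSE, price);
        put(base, VOLUME, get(base, VOLUME) + volume);
    }

    /** Reads one field of the most recent candle. */
    double latest(int field) {
        if (started == 0) {
            throw new IllegalStateException("no candle started yet");
        }
        return get(baseOf(started - 1), field);
    }

    private int baseOf(long candleNumber) {
        return (int) (candleNumber % capacity) * CANDLE_BYTES;
    }

    private double get(int base, int field) {
        return buffer.getDouble(base + field * Double.BYTES);
    }

    private void put(int base, int field, double value) {
        buffer.putDouble(base + field * Double.BYTES, value);
    }
}
```

Every update overwrites bytes in a preallocated buffer, so steady-state aggregation allocates nothing on the heap.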

Scenario 03: MPSC Queue (Multi-Producer)

8 producers, 1 consumer, 1M messages:
- Naive (synchronized):     ~180K msg/sec,  ~250ms p99 latency
- Optimized (lock-free):    ~720K msg/sec,  ~45ms p99 latency

Under heavy contention: 4x throughput improvement
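One way to coordinate several producers with CAS is a per-slot sequence number, in the style of Dmitry Vyukov's bounded queue. The sketch below is an assumption about how such a queue might look, kept on-heap for brevity. MpscLongQueue is a hypothetical name and this is not the benchmarked implementation.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical bounded multi-producer/single-consumer queue of long values. */
final class MpscLongQueue {
    private final long[] values;
    private final AtomicLongArray sequences; // per-slot ticket: says who may touch the slot next
    private final int mask;
    private final AtomicLong tail = new AtomicLong(); // claimed by producers via CAS
    private long head;                                // touched by the single consumer only

    MpscLongQueue(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        values = new long[capacity];
        sequences = new AtomicLongArray(capacity);
        for (int i = 0; i < capacity; i++) {
            sequences.set(i, i); // slot i is free for the producer holding ticket i
        }
        mask = capacity - 1;
    }

    /** Any producer thread. Returns false when the queue is full. */
    boolean offer(long value) {
        while (true) {
            long t = tail.get();
            int slot = (int) (t & mask);
            long diff = sequences.get(slot) - t;
            if (diff == 0) {
                if (tail.compareAndSet(t, t + 1)) { // won the ticket for this slot
                    values[slot] = value;
                    sequences.set(slot, t + 1);     // publish to the consumer
                    return true;
                }
            } else if (diff < 0) {
                return false; // slot still holds an unconsumed value: queue is full
            }
            // Otherwise another producer advanced tail first; reload and retry.
        }
    }

    /** Consumer thread only. Returns emptyValue when nothing is published yet. */
    long poll(long emptyValue) {
        int slot = (int) (head & mask);
        if (sequences.get(slot) != head + 1) {
            return emptyValue;
        }
        long value = values[slot];
        sequences.set(slot, head + values.length); // free the slot for the next lap
        head++;
        return value;
    }
}
```

Producers contend only on the tail CAS, and a failed CAS means some other producer made progress. A production version would typically add backoff and padding around tail and the sequence array.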

See individual scenario READMEs for detailed benchmarks.

🔬 Real-World Application

Scenario 11: Trading Dashboard

The final scenario composes all patterns into a real-time trading application:

Ingests trades from multiple venues (MPSC)
Aggregates OHLCV candles (Ring Buffer)
Processes orders through multi-stage pipeline (Event Pipeline)
Logs telemetry without blocking (Lossy Logging)
Handles 500K+ trades/second with p99 latency < 100μs

cd scenario-11-trading-application
../gradlew run

📖 Documentation Structure

docs/
├── INDEX.md                       # 📚 Documentation hub (start here!)
├── REPOSITORY-GUIDE.md            # 🗺️ Complete repository overview
├── deep-dives/                    # 🎯 Scenario deep-dives (11 scenarios)
│   ├── SCENARIO-01-APPROACH-01-NAIVE.md
│   ├── SCENARIO-01-APPROACH-02-OFFHEAP.md
│   ├── SCENARIO-02-SPSC-QUEUE.md
│   └── ...                        # (9 more scenarios)
├── guides/                        # 📖 Practical guides
│   ├── GETTING-STARTED.md         # Setup and quick start
│   ├── MEMORY-MANAGEMENT.md       # Off-heap memory techniques
│   ├── PROFILING-GUIDE.md         # JFR, GC logs, analysis
│   ├── JMH-VS-REGULAR-JAVA-RESEARCH.md  # Profiling approach
│   └── PERFORMANCE-TUNING.md      # JVM optimization
└── assets/graphs/                 # 📊 Performance visualizations
    ├── scenario-01/approach-comparison.png
    ├── scenario-02/approach-comparison.png
    └── scenario-03/approach-comparison.png

Key Entry Points:

🚀 New to off-heap? → docs/INDEX.md
🏗️ Want complete overview? → docs/REPOSITORY-GUIDE.md
📈 Ready to profile? → docs/guides/PROFILING-GUIDE.md

🛠️ Technologies Used

Core Technologies

Java 21 - Modern language features, virtual threads
sun.misc.Unsafe - Direct memory access (approach-02 implementations)
JMH - Microbenchmarking framework
JUnit 5 - Testing framework
AssertJ - Fluent assertions
JaCoCo - Code coverage
Gradle - Build automation

Production-Grade Alternatives

This repository demonstrates educational implementations using sun.misc.Unsafe. For production systems, consider these battle-tested libraries:

OpenHFT Chronicle Libraries ⭐ - Production-hardened off-heap tools
- Chronicle Queue - Persistent messaging (0.78µs p99)
- Chronicle Map - Off-heap key-value store
- Chronicle Wire - Zero-GC serialization

👉 Read our comprehensive OpenHFT guide for detailed comparisons and integration examples

📈 Dataset

Scenarios use real market data from World Stock Prices Daily Updating (Kaggle).

Place the CSV file in the project root, or configure its path as described in the scenario READMEs.

⚠️ Production Considerations

These implementations are educational and demonstrate core concepts. For production use:

✅ Add comprehensive error handling and recovery
✅ Implement proper resource cleanup (try-with-resources)
✅ Add monitoring and observability hooks
✅ Validate on your target hardware and JVM version
✅ Use proven libraries instead of rolling your own:
- OpenHFT Chronicle Libraries - Battle-tested in banks and hedge funds (Our Guide)
- Agrona - Off-heap utilities by Real Logic
- JCTools - High-performance concurrent queues
- LMAX Disruptor - Ring buffer pattern
✅ Test under real workload patterns (not just microbenchmarks)

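⚠️ sun.misc.Unsafe Warning: This API is internal and unsupported, and its memory-access methods were deprecated for removal in JDK 23 (JEP 471). The Foreign Function & Memory API, final since Java 22 (JEP 454), is the supported replacement.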

Why Use Production Libraries?

Aspect	Educational (This Repo)	Production (OpenHFT/Agrona)
Purpose	Learn concepts	Ship to production
Testing	Comprehensive	Battle-tested in HFT
Edge Cases	Educational scope	Years of bug fixes
Performance	Optimized	Microsecond-level tuning
Support	Community	Commercial options
Monitoring	Basic	Production-grade

👉 Start by understanding our implementations, then graduate to production libraries for real systems.

🤝 Contributing

This is an educational repository. If you find issues or have improvements:

Ensure tests pass: ./gradlew test
Add tests for new features
Follow the existing code style
Update relevant documentation

📚 Further Reading

Prerequisites

Actor Model Series - Understanding concurrency patterns
Trash Talk - JVM Memory Management - How the JVM really works

Production Libraries (Detailed in Our Docs)

OpenHFT Chronicle Libraries Guide ⭐ - Our comprehensive guide to production-grade off-heap tools
- Chronicle Queue - Persistent messaging with 0.78µs p99 latency
- Chronicle Map - Off-heap key-value store with replication
- Chronicle Wire - Zero-GC serialization framework
- Java Thread Affinity - CPU pinning for predictable performance
- Complete integration examples and performance comparisons

Advanced Resources

Mechanical Sympathy Blog - Martin Thompson's insights
LMAX Disruptor - The pattern that started it all
Java Concurrency in Practice - The definitive reference
OpenHFT GitHub Organization - Production-grade off-heap libraries

📄 License

MIT License - See LICENSE file for details

👨‍💻 About

This project accompanies a technical article series exploring high-performance Java systems. Each scenario represents a real problem from high-frequency trading, with detailed explanations of:

Why the naive approach fails
What hardware-level bottlenecks exist
How the optimized solution addresses them
Performance characteristics and trade-offs

Goal: Bridge the gap between academic lock-free algorithms and real-world production systems.

🎯 Quick Reference Card

When You Need...	Use Scenario...	Key Pattern
Fixed-size cyclic buffer	01 - Ring Buffer	Off-heap, sequential access
Single writer, single reader	02 - SPSC	Lock-free, happens-before
Multiple writers, single reader	03 - MPSC	CAS coordination
Multiple writers & readers	04 - Work Distribution (MPMC)	Advanced CAS, backoff
Multi-stage processing	05 - Event Pipeline	Disruptor pattern
Non-blocking logging	06 - Telemetry	Lossy, wait-free
Per-core data	07 - Sharding	Cache line alignment
Zero GC overhead	08 - Zero-Alloc	Object pooling
Relaxed ordering OK	09 - K-FIFO	Bounded reordering
Bulk operations	10 - Batching	Amortization
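As an illustration of the Zero-Alloc row, a flyweight is a single reusable view object that is repositioned over raw bytes instead of creating one object per message. The sketch below assumes a 16-byte record (trade id, then price). TradeFlyweight and its accessors are hypothetical names, not the repository's API.

```java
import java.nio.ByteBuffer;

/** Hypothetical flyweight: one reusable view over fixed-size records in a buffer. */
final class TradeFlyweight {
    static final int SIZE = 16; // assumed layout: 8-byte trade id + 8-byte price

    private ByteBuffer buffer;
    private int offset;

    /** Points this view at the record starting at the given byte offset. No allocation. */
    TradeFlyweight wrap(ByteBuffer buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
        return this;
    }

    long tradeId() {
        return buffer.getLong(offset);
    }

    TradeFlyweight tradeId(long id) {
        buffer.putLong(offset, id);
        return this;
    }

    double price() {
        return buffer.getDouble(offset + 8);
    }

    TradeFlyweight price(double price) {
        buffer.putDouble(offset + 8, price);
        return this;
    }
}
```

A reader or writer keeps one TradeFlyweight per thread and calls wrap for each record, so message handling touches the buffer directly and produces no garbage.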

Start with Scenario 01: Circular Ring Buffer →

Build the foundation, then progress through increasingly complex scenarios. Each one builds on previous concepts while introducing new optimization techniques.

Good luck, and may your latencies be ever in your favor! 🚀


techishthoughts-org/off_heap_algorithms
