# Hyperliquid Data Overview

This notebook provides a conceptual overview of Hyperliquid's publicly available data.

**Goal**: Understand what data exists, where it lives, and which dataset to use for trader analysis.

---

## What is Hyperliquid?

Hyperliquid is a high-performance perpetual futures exchange. Key characteristics:

- **On-chain order book**: All orders and fills are recorded on Hyperliquid's own L1 blockchain
- **Public data**: Historical data is available via S3 (requester-pays)
- **High throughput**: roughly 100k or more orders per 100 blocks in recent data

Public, per-fill data at this volume makes Hyperliquid well suited to analyzing trader behavior at scale.

---

## The Two S3 Buckets

Hyperliquid exposes data via two public S3 buckets:

| Bucket | Purpose | Key Data |
|--------|---------|----------|
| `hyperliquid-archive` | Market data archives | L2 orderbook snapshots, asset contexts |
| `hl-mainnet-node-data` | Node-streamed data | **Fills**, trades, blocks |

For trader analysis, we use `hl-mainnet-node-data`, because it is the bucket that contains per-user fills.
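Because the data is served as requester-pays, each S3 request has to opt in to being billed. The sketch below shows one way to do that with `boto3`, assuming AWS credentials are configured locally. The helper names `requester_pays_kwargs` and `list_keys` are illustrative, and the prefix in the usage note is taken from the dataset names in this notebook rather than from a verified key layout.

```python
def requester_pays_kwargs(bucket: str, prefix: str, max_keys: int = 20) -> dict:
    """Build arguments for an S3 ListObjectsV2 call on a requester-pays bucket."""
    return {
        "Bucket": bucket,
        "Prefix": prefix,
        "MaxKeys": max_keys,
        # Opt in to paying for the request; requester-pays buckets
        # reject requests that omit this flag.
        "RequestPayer": "requester",
    }


def list_keys(bucket: str, prefix: str, max_keys: int = 20) -> list[str]:
    """List up to max_keys object keys. The caller's AWS account is billed."""
    import boto3  # third-party; imported lazily so the helper above has no dependencies

    s3 = boto3.client("s3")
    response = s3.list_objects_v2(**requester_pays_kwargs(bucket, prefix, max_keys))
    return [obj["Key"] for obj in response.get("Contents", [])]
```

A call such as `list_keys("hl-mainnet-node-data", "node_fills_by_block/hourly/")` is a cheap way to inspect the key layout before committing to a large download.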

---

## Data Availability Timeline

| Dataset | Available From | Content | Notes |
|---------|----------------|---------|-------|
| `explorer_blocks/` | Feb 2023 | Raw blocks (orders, cancels) | Requires reconstruction to get fills |
| `node_trades/hourly/` | Mar 2025 | Trades with buyer/seller | Legacy format |
| `node_fills/hourly/` | May 2025 | Fills with PnL, fees | Legacy format |
| `node_fills_by_block/hourly/` | **Jul 2025** | **Complete fills by block** | **Best format** |

### Recommendation: Use `node_fills_by_block`

This is the most complete and recent format:
- Every fill with full metadata
- Realized PnL (`closedPnl`)
- Fees paid
- Maker/taker indicator (`crossed`)
- No reconstruction needed
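For date ranges that start before `node_fills_by_block` exists, the timeline table can be turned into a small lookup. This is a rough sketch: the table gives months only, so the first day of each month is an assumption, and `best_dataset` is a hypothetical helper name.

```python
from datetime import date
from typing import Optional

# Richest dataset first. Start days are assumed to be the first of the
# month, since the availability table only gives month granularity.
DATASETS = [
    ("node_fills_by_block/hourly/", date(2025, 7, 1)),
    ("node_fills/hourly/", date(2025, 5, 1)),
    ("node_trades/hourly/", date(2025, 3, 1)),
    ("explorer_blocks/", date(2023, 2, 1)),
]


def best_dataset(day: date) -> Optional[str]:
    """Return the richest dataset prefix available on the given day, if any."""
    for prefix, available_from in DATASETS:
        if day >= available_from:
            return prefix
    return None  # earlier than any published dataset
```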

---

## Fill Schema (`node_fills_by_block`)

Each fill record contains:

```json
{
  "user": "0x...",        // Wallet address
  "coin": "BTC",          // Asset traded
  "px": "92000.5",        // Fill price
  "sz": "0.1",            // Fill size
  "dir": "Close Long",    // Direction: Open Long/Short, Close Long/Short
  "closedPnl": "150.25",  // Realized PnL (nonzero only on closing fills)
  "fee": "2.30",          // Fee paid
  "crossed": true,        // true = taker, false = maker
  "startPosition": "0.5", // Position before this fill
  "time": 1732832668368,  // Unix timestamp (ms)
  "hash": "0x...",        // Transaction hash
  "oid": 12345678,        // Order ID
  "tid": 87654321         // Trade ID
}
```

### Key Fields for Analysis

| Field | Use Case |
|-------|----------|
| `user` | Group by trader |
| `px * sz` | Notional volume (both fields are strings, so cast to numeric first) |
| `closedPnl` | Sum for realized PnL |
| `fee` | Sum for total fees paid |
| `crossed` | Maker vs taker ratio |
| `dir` | Identify closing fills; win rate = closes with positive `closedPnl` / all closes |
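Since the numeric fields arrive as strings, each record needs a conversion step before any arithmetic. The sketch below derives the per-fill values from the table above using `Decimal` to avoid float rounding. `enrich_fill` is an illustrative name, and it assumes `dir` takes only the four values shown in the schema; any other value would be treated as a non-closing fill.

```python
from decimal import Decimal


def enrich_fill(fill: dict) -> dict:
    """Convert string fields to Decimal and derive per-fill analysis values."""
    closed_pnl = Decimal(fill["closedPnl"])
    # Assumption: closing fills are exactly those whose dir starts with "Close".
    is_close = fill["dir"].startswith("Close")
    return {
        "user": fill["user"],
        "notional": Decimal(fill["px"]) * Decimal(fill["sz"]),
        "closed_pnl": closed_pnl,
        "fee": Decimal(fill["fee"]),
        "is_taker": bool(fill["crossed"]),
        "is_close": is_close,
        "is_win": is_close and closed_pnl > 0,
    }
```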

---

## Order Schema (`explorer_blocks`)

Raw blocks contain orders, not fills:

```json
{
  "type": "order",
  "orders": [{
    "a": 0,              // Asset ID (0=BTC, 1=ETH, ...)
    "b": true,           // true=buy, false=sell
    "p": "92000.0",      // Price
    "s": "0.1",          // Size
    "r": false,          // Reduce only
    "t": {"limit": {"tif": "Alo"}}  // Order type
  }]
}
```

### Order Types (TIF)

| TIF | Meaning | Fills on arrival? |
|-----|---------|-------------------|
| `Alo` | Add Liquidity Only | Never (maker only) |
| `Ioc` | Immediate or Cancel | Yes, if it crosses the spread; any unfilled remainder is canceled |
| `Gtc` | Good til Cancelled | Yes, if it crosses the spread; otherwise it rests on the book |

**Note**: To get fills from raw blocks, you must reconstruct them via a matching engine. This is complex and not recommended when `node_fills_by_block` is available.
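To make the TIF table concrete, the sketch below checks whether a single limit order could fill on arrival given the current best bid and ask. It is not a matching engine: it ignores sizes, queue priority, resting orders that fill later, and non-limit order types. `may_fill_on_arrival` is a hypothetical helper, and the order dict follows the schema shown above.

```python
from decimal import Decimal


def may_fill_on_arrival(order: dict, best_bid: Decimal, best_ask: Decimal) -> bool:
    """Return True if a limit order could trade immediately against the book."""
    tif = order["t"]["limit"]["tif"]
    if tif == "Alo":
        # Add Liquidity Only: the order may only rest, never take.
        return False
    price = Decimal(order["p"])
    if order["b"]:
        # A buy crosses the spread when it is priced at or above the best ask.
        return price >= best_ask
    # A sell crosses the spread when it is priced at or below the best bid.
    return price <= best_bid
```

Reconstructing real fills additionally requires tracking every resting order and partial fill across blocks, which is why the precomputed fills are preferable.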

---

## Cost Estimates

With requester-pays, the downloader is billed for data transfer, at roughly $0.09/GB:

| Dataset | Size (est.) | Cost |
|---------|-------------|------|
| `node_fills_by_block` (5 months) | ~200-400 GB | ~$20-35 |
| `explorer_blocks` (full history) | ~3-4 TB | ~$300+ |

**Start with `node_fills_by_block`**: it is far cheaper to download and already contains fills, so no reconstruction is needed.
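The estimates in the table are simply size multiplied by the per-GB rate. A minimal sketch of that arithmetic, assuming 1 TB = 1000 GB and using the size ranges quoted above (which are themselves estimates):

```python
PRICE_PER_GB_USD = 0.09  # approximate rate quoted in this section


def transfer_cost_usd(size_gb: float, price_per_gb: float = PRICE_PER_GB_USD) -> float:
    """Estimate the download cost for a given amount of data."""
    return size_gb * price_per_gb


# Size ranges in GB, taken from the estimates in the table.
for name, low_gb, high_gb in [
    ("node_fills_by_block", 200, 400),
    ("explorer_blocks", 3000, 4000),
]:
    low, high = transfer_cost_usd(low_gb), transfer_cost_usd(high_gb)
    print(f"{name}: ${low:.0f}-${high:.0f}")
```

This yields about $18-$36 and $270-$360, in line with the rounded figures in the table.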

---

## What We Can Compute

From `node_fills_by_block`, we can calculate:

| Metric | Calculation |
|--------|-------------|
| Realized PnL | `SUM(closedPnl)` |
| Volume (notional) | `SUM(px * sz)` |
| Trade count | `COUNT(*)` |
| Maker % | `AVG(CASE WHEN crossed THEN 0 ELSE 1 END)` |
| Win rate | Closing fills with `closedPnl > 0` / all closing fills |
| Fees paid | `SUM(fee)` |

These metrics enable trader classification (smart money, whales, market makers, etc.).
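As an illustration of the calculations in the table, the sketch below aggregates a list of fill records per trader in plain Python. It assumes fills are already loaded as dicts with the fields from the fill schema, and that closing fills are those whose `dir` starts with `Close`. `trader_metrics` is an illustrative name, not part of any Hyperliquid tooling.

```python
from collections import defaultdict
from decimal import Decimal


def trader_metrics(fills: list[dict]) -> dict[str, dict]:
    """Aggregate realized PnL, volume, fees, maker share and win rate per user."""
    acc = defaultdict(lambda: {
        "realized_pnl": Decimal(0), "volume": Decimal(0), "fees": Decimal(0),
        "trades": 0, "maker": 0, "closes": 0, "wins": 0,
    })
    for fill in fills:
        m = acc[fill["user"]]
        pnl = Decimal(fill["closedPnl"])
        m["realized_pnl"] += pnl
        m["volume"] += Decimal(fill["px"]) * Decimal(fill["sz"])
        m["fees"] += Decimal(fill["fee"])
        m["trades"] += 1
        m["maker"] += 0 if fill["crossed"] else 1  # crossed=False means maker
        if fill["dir"].startswith("Close"):  # assumption: marks closing fills
            m["closes"] += 1
            m["wins"] += 1 if pnl > 0 else 0
    return {
        user: {
            **m,
            "maker_pct": m["maker"] / m["trades"],
            # Win rate is undefined for traders with no closing fills.
            "win_rate": m["wins"] / m["closes"] if m["closes"] else None,
        }
        for user, m in acc.items()
    }
```

At full-dataset scale the same aggregation would more likely run in SQL or a dataframe library; the pure-Python version is only meant to pin down the definitions.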

---

## Next Steps

1. **[02_explore_data.ipynb](./02_explore_data.ipynb)** — Download and explore sample data from each source
2. **[03_analysis_pipeline.ipynb](./03_analysis_pipeline.ipynb)** — Analyze local data with SQL patterns for TimescaleDB