# Explore Hyperliquid Data Sources

Download samples from each S3 bucket and decompress to readable JSON/CSV.

**Output**: Sample files in `../hyperliquid_samples/` that you can browse in your file explorer.

---

## Setup

In [28]:
import os
import json
import boto3
import lz4.frame
import msgpack
from pathlib import Path
from dotenv import load_dotenv

load_dotenv()

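# S3 client. Requester-pays is requested per call via REQUEST_PAYER (defined below),
# so the AWS account behind these credentials is billed for transfer.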
s3 = boto3.client(
    's3',
    aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
    region_name=os.getenv('AWS_REGION', 'us-east-2')
)

REQUEST_PAYER = {'RequestPayer': 'requester'}

# Output directories organized by bucket
SAMPLES_DIR = Path('../hyperliquid_samples')
ARCHIVE_DIR = SAMPLES_DIR / 'hyperliquid-archive'
NODE_DATA_DIR = SAMPLES_DIR / 'hl-mainnet-node-data'

SAMPLES_DIR.mkdir(exist_ok=True)
ARCHIVE_DIR.mkdir(exist_ok=True)
NODE_DATA_DIR.mkdir(exist_ok=True)

print(f"Samples will be saved to: {SAMPLES_DIR.resolve()}")
print(f"  hyperliquid-archive → {ARCHIVE_DIR.name}/")
print(f"  hl-mainnet-node-data → {NODE_DATA_DIR.name}/")

Samples will be saved to: /Users/trevor/Developer/trevor/vigil-contract/hyperliquid_samples
  hyperliquid-archive → hyperliquid-archive/
  hl-mainnet-node-data → hl-mainnet-node-data/


### Utility Functions

In [29]:
def list_prefixes(bucket, prefix=''):
    """List folder prefixes one level below `prefix` (single request, so at most 1,000 results)"""
    r = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter='/', **REQUEST_PAYER)
    return [p['Prefix'] for p in r.get('CommonPrefixes', [])]

def list_files(bucket, prefix, limit=100):
    """List up to `limit` (key, size) pairs under `prefix` (S3 caps a single response at 1,000 keys)"""
    r = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=limit, **REQUEST_PAYER)
    return [(obj['Key'], obj['Size']) for obj in r.get('Contents', [])]

def download(bucket, key):
    """Download file from S3"""
    return s3.get_object(Bucket=bucket, Key=key, **REQUEST_PAYER)['Body'].read()

def decompress_lz4(data):
    """Decompress LZ4 data"""
    return lz4.frame.decompress(data)

def decode_msgpack(data):
    """Decode MessagePack data"""
    return msgpack.unpackb(data, raw=False)

def save_json(data, path):
    """Save data as formatted JSON"""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, 'w') as f:
        json.dump(data, f, indent=2, default=str)
    print(f"Saved: {path} ({path.stat().st_size / 1024:.1f} KB)")

def save_text(data, path):
    """Save raw text/CSV"""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, 'wb') as f:
        f.write(data)
    print(f"Saved: {path} ({path.stat().st_size / 1024:.1f} KB)")
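
`list_prefixes` and `list_files` above each make a single `list_objects_v2` call, so they see at most one page of results. For prefixes that hold more keys than that, a paginated variant could look like the sketch below. The name `list_all_keys` and the `FakeS3` stand-in are hypothetical and exist only so the example runs without network access; with real data you would pass the `s3` client from Setup instead.

```python
def list_all_keys(client, bucket, prefix='', page_size=1000):
    """Yield (key, size) for every object under prefix, following continuation tokens."""
    kwargs = {
        'Bucket': bucket,
        'Prefix': prefix,
        'MaxKeys': page_size,
        'RequestPayer': 'requester',
    }
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get('Contents', []):
            yield obj['Key'], obj['Size']
        if not page.get('IsTruncated'):
            break
        # S3 returns NextContinuationToken whenever IsTruncated is true
        kwargs['ContinuationToken'] = page['NextContinuationToken']


class FakeS3:
    """Hypothetical in-memory stand-in that mimics the paging fields of list_objects_v2."""

    def __init__(self, keys, page_len=2):
        self.keys = keys
        self.page_len = page_len

    def list_objects_v2(self, **kwargs):
        start = int(kwargs.get('ContinuationToken', 0))
        chunk = self.keys[start:start + self.page_len]
        end = start + len(chunk)
        page = {
            'Contents': [{'Key': k, 'Size': 1} for k in chunk],
            'IsTruncated': end < len(self.keys),
        }
        if page['IsTruncated']:
            page['NextContinuationToken'] = str(end)
        return page


fake = FakeS3(['a.lz4', 'b.lz4', 'c.lz4', 'd.lz4', 'e.lz4'])
all_keys = list(list_all_keys(fake, 'example-bucket'))
print(f"Listed {len(all_keys)} keys across several pages")
```

Every page is a separate billed request on a requester-pays bucket, so it is worth keeping `prefix` as narrow as possible.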

---

## 1. hyperliquid-archive

Market data archives (asset contexts and L2 order book snapshots). Less relevant for trader analysis, but let's see what's there.

In [30]:
print("hyperliquid-archive structure:")
for p in list_prefixes('hyperliquid-archive'):
    print(f"  {p}")

hyperliquid-archive structure:
  Testnet/
  asset_ctxs/
  market_data/


### 1a. Asset Contexts (CSV)

In [31]:
# List the first few asset context files (this cell only lists; the download is in the next cell)
files = list_files('hyperliquid-archive', 'asset_ctxs/', limit=5)
print("Available asset_ctxs files:")
for key, size in files:
    print(f"  {key} ({size/1024:.1f} KB)")

Available asset_ctxs files:
  asset_ctxs/20230520.csv.lz4 (981.4 KB)
  asset_ctxs/20230521.csv.lz4 (1137.4 KB)
  asset_ctxs/20230522.csv.lz4 (1186.6 KB)
  asset_ctxs/20230523.csv.lz4 (1173.3 KB)
  asset_ctxs/20230524.csv.lz4 (1312.8 KB)


In [32]:
# Download and decompress a fixed date (2024-01-01), not one of the 2023 files listed above
sample_key = 'asset_ctxs/20240101.csv.lz4'
data = download('hyperliquid-archive', sample_key)
decompressed = decompress_lz4(data)

# Save as CSV
save_text(decompressed, ARCHIVE_DIR / 'asset_ctxs' / '20240101.csv')

# Preview
print("\nFirst 500 chars:")
print(decompressed[:500].decode())

Saved: ../hyperliquid_samples/hyperliquid-archive/asset_ctxs/20240101.csv (16072.7 KB)

First 500 chars:
time,coin,funding,open_interest,prev_day_px,day_ntl_vlm,premium,oracle_px,mark_px,mid_px,impact_bid_px,impact_ask_px
2024-01-01T00:00:00Z,AAVE,0.00002789,1075.9,111.09,1759054.2941,0.00072316,108.69,108.79,108.765,108.75,108.7872
2024-01-01T00:00:00Z,ACE,0.00004605,9238.23,10.331,1168558.484093,0.00086837,9.3278,9.331,9.3361,9.334,9.3378
2024-01-01T00:00:00Z,ADA,0.0000125,725829,0.60135,723752.77572,0.00048152,0.59395,0.59424,0.594245,0.594144,0.594328
2024-01-01T00:00:00Z,APE,0.00013221,136489.
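
Once decompressed, the asset context file is plain CSV, so the standard library is enough to load it. The sketch below parses the header and the three complete rows from the preview above and converts the numeric columns to floats. `parse_asset_ctxs` and `NUMERIC_COLUMNS` are illustrative names, and treating empty cells as `None` is an assumption about how missing values might appear.

```python
import csv
import io

# Header and first three complete rows copied from the preview above
SAMPLE = """time,coin,funding,open_interest,prev_day_px,day_ntl_vlm,premium,oracle_px,mark_px,mid_px,impact_bid_px,impact_ask_px
2024-01-01T00:00:00Z,AAVE,0.00002789,1075.9,111.09,1759054.2941,0.00072316,108.69,108.79,108.765,108.75,108.7872
2024-01-01T00:00:00Z,ACE,0.00004605,9238.23,10.331,1168558.484093,0.00086837,9.3278,9.331,9.3361,9.334,9.3378
2024-01-01T00:00:00Z,ADA,0.0000125,725829,0.60135,723752.77572,0.00048152,0.59395,0.59424,0.594245,0.594144,0.594328
"""

NUMERIC_COLUMNS = [
    'funding', 'open_interest', 'prev_day_px', 'day_ntl_vlm', 'premium',
    'oracle_px', 'mark_px', 'mid_px', 'impact_bid_px', 'impact_ask_px',
]


def parse_asset_ctxs(text):
    """Parse asset_ctxs CSV text into a list of dicts with float-valued numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in NUMERIC_COLUMNS:
            # Assumption: a missing value shows up as an empty cell
            row[col] = float(row[col]) if row[col] else None
        rows.append(row)
    return rows


rows = parse_asset_ctxs(SAMPLE)
for r in rows:
    print(r['time'], r['coin'], r['mark_px'], r['oracle_px'])
```

In the notebook, the same function could be fed `decompressed.decode()` from the cell above.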


### 1b. L2 Book (JSON lines)

In [33]:
# Find available L2 book files
files = list_files('hyperliquid-archive', 'market_data/20240101/12/l2Book/', limit=5)
print("L2 Book files:")
for key, size in files:
    print(f"  {key.split('/')[-1]} ({size/1024:.1f} KB)")

L2 Book files:
  AAVE.lz4 (243.8 KB)
  ACE.lz4 (491.5 KB)
  ADA.lz4 (297.2 KB)
  APE.lz4 (205.3 KB)
  APT.lz4 (332.1 KB)


In [34]:
# Download BTC L2 book sample
sample_key = 'market_data/20240101/12/l2Book/BTC.lz4'
data = download('hyperliquid-archive', sample_key)
decompressed = decompress_lz4(data)

# Parse JSON lines and save first 100 as JSON array
lines = decompressed.decode().strip().split('\n')
sample = [json.loads(line) for line in lines[:100]]
save_json(sample, ARCHIVE_DIR / 'market_data' / 'BTC_l2book_sample.json')

# Preview first record
print("\nSample L2 book record:")
print(json.dumps(sample[0], indent=2))

Saved: ../hyperliquid_samples/hyperliquid-archive/market_data/BTC_l2book_sample.json (462.8 KB)

Sample L2 book record:
{
  "time": "2024-01-01T12:00:01.138942031",
  "ver_num": 1,
  "raw": {
    "channel": "l2Book",
    "data": {
      "coin": "BTC",
      "time": 1704110399915,
      "levels": [
        [
          {
            "px": "42727.0",
            "sz": "0.02265",
            "n": 1
          },
          {
            "px": "42723.0",
            "sz": "0.17828",
            "n": 1
          },
          {
            "px": "42720.0",
            "sz": "0.07722",
            "n": 1
          },
          {
            "px": "42719.0",
            "sz": "0.0234",
            "n": 1
          },
          {
            "px": "42718.0",
            "sz": "4.00898",
            "n": 2
          },
          {
            "px": "42711.0",
            "sz": "7.65792",
            "n": 2
          },
          {
            "px": "42709.0",
            "sz": "4.06794",
          

---
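
Before moving on, a note on reading the L2 book record from section 1b. Prices and sizes are strings, and `levels` is a two-element list. The first element in the sample is sorted by descending price, which is consistent with bids; the sketch below assumes the second element holds asks in ascending order, which the truncated output does not show. The ask level in the example record is made up for illustration, and `top_of_book` is a hypothetical helper name.

```python
from decimal import Decimal


def top_of_book(record):
    """Return best bid/ask, mid price, and spread from one l2Book record."""
    data = record['raw']['data']
    bids, asks = data['levels']  # assumption: [bids, asks]
    best_bid = Decimal(bids[0]['px'])
    best_ask = Decimal(asks[0]['px'])
    return {
        'coin': data['coin'],
        'best_bid': best_bid,
        'best_ask': best_ask,
        'mid': (best_bid + best_ask) / 2,
        'spread': best_ask - best_bid,
    }


example_record = {
    'time': '2024-01-01T12:00:01.138942031',
    'ver_num': 1,
    'raw': {
        'channel': 'l2Book',
        'data': {
            'coin': 'BTC',
            'time': 1704110399915,
            'levels': [
                # Bid levels copied from the sample record
                [{'px': '42727.0', 'sz': '0.02265', 'n': 1},
                 {'px': '42723.0', 'sz': '0.17828', 'n': 1}],
                # Hypothetical ask level, not taken from the data
                [{'px': '42728.0', 'sz': '0.5', 'n': 1}],
            ],
        },
    },
}

book = top_of_book(example_record)
print(book)
```

`Decimal` is used instead of `float` so that prices parsed from strings keep their exact decimal representation.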

## 2. hl-mainnet-node-data

Node-streamed data, including per-user fills and raw blocks. This is where the good stuff is for trader analysis.

In [35]:
print("hl-mainnet-node-data structure:")
for p in list_prefixes('hl-mainnet-node-data'):
    print(f"  {p}")

print("\n" + "=" * 60)
print("Date Ranges:")
print("=" * 60)

from datetime import datetime

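def get_date_range(prefix):
    """Return (first, last, count) for the date folders under `{prefix}hourly/`.

    `count` is the number of date folders present, which can be smaller than
    the calendar span between first and last if some days are missing.
    """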
    dates = list_prefixes('hl-mainnet-node-data', f'{prefix}hourly/')
    if not dates:
        return None, None, 0
    first = dates[0].rstrip('/').split('/')[-1]
    last = dates[-1].rstrip('/').split('/')[-1]
    return first, last, len(dates)

datasets = [
    ('node_fills_by_block/', 'Fills by block (best)'),
    ('node_fills/', 'Node fills (legacy)'),
    ('node_trades/', 'Node trades (legacy)'),
    ('replica_cmds/', 'Replica commands'),
    ('misc_events_by_block/', 'Misc events'),
]

for prefix, label in datasets:
    first, last, count = get_date_range(prefix)
    if first:
        first_fmt = datetime.strptime(first, '%Y%m%d').strftime('%b %d, %Y')
        last_fmt = datetime.strptime(last, '%Y%m%d').strftime('%b %d, %Y')
        print(f"{label:25} {first_fmt} → {last_fmt} ({count} days)")
    else:
        print(f"{label:25} No data found")

# Explorer blocks are organized differently
prefixes = list_prefixes('hl-mainnet-node-data', 'explorer_blocks/')
if prefixes:
    print(f"{'Explorer blocks':25} Block 0 → {prefixes[-1].split('/')[1]}+ (since Feb 2023)")

hl-mainnet-node-data structure:
  explorer_blocks/
  misc_events_by_block/
  node_fills/
  node_fills_by_block/
  node_trades/
  replica_cmds/

Date Ranges:
Fills by block (best)     Jul 27, 2025 → Nov 29, 2025 (126 days)
Node fills (legacy)       May 25, 2025 → Jul 27, 2025 (64 days)
Node trades (legacy)      Mar 22, 2025 → Jun 21, 2025 (66 days)
Replica commands          No data found
Misc events               Sep 27, 2025 → Nov 29, 2025 (64 days)
Explorer blocks           Block 0 → 800000000+ (since Feb 2023)
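
Note that the count in parentheses is the number of date folders, not the calendar span. For node trades, Mar 22 to Jun 21, 2025 spans 92 calendar days but only 66 folders exist, so some days are missing. A minimal sketch for finding such gaps from a list of `YYYYMMDD` folder names follows; `missing_dates` is a hypothetical helper and the example dates are made up.

```python
from datetime import datetime, timedelta


def missing_dates(date_strs):
    """Return YYYYMMDD strings absent between the earliest and latest date given."""
    present = {datetime.strptime(s, '%Y%m%d').date() for s in date_strs}
    if not present:
        return []
    day, last = min(present), max(present)
    missing = []
    while day <= last:
        if day not in present:
            missing.append(day.strftime('%Y%m%d'))
        day += timedelta(days=1)
    return missing


# Illustrative input, not the real folder listing
gaps = missing_dates(['20250322', '20250323', '20250326'])
print(gaps)
```

In the notebook, the input could be built from `list_prefixes('hl-mainnet-node-data', 'node_trades/hourly/')` by taking the last path component of each prefix, as `get_date_range` does.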


### 2a. node_fills_by_block (Best Format)

In [36]:
# Pick a recent date and list its hourly files (the download happens in the next cell)
dates = list_prefixes('hl-mainnet-node-data', 'node_fills_by_block/hourly/')
latest_date = dates[-2].split('/')[-2]  # Second to last: the newest date may still be filling in

files = list_files('hl-mainnet-node-data', f'node_fills_by_block/hourly/{latest_date}/', limit=5)
print(f"Files for {latest_date}:")
for key, size in files:
    print(f"  {key.split('/')[-1]} ({size/1024/1024:.1f} MB)")

Files for 20251128:
  0.lz4 (16.0 MB)
  1.lz4 (21.2 MB)
  10.lz4 (18.6 MB)
  11.lz4 (23.5 MB)
  12.lz4 (19.5 MB)
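
The listing above comes back in lexicographic key order, which is why `10.lz4` follows `1.lz4`. To process a day's files chronologically, sort by the numeric hour instead. A small sketch, assuming every file in the folder is named `<hour>.lz4` (only hours 0, 1, 10, 11 and 12 are confirmed by the listing; `2.lz4` is included here as an assumption):

```python
def hour_of(key):
    """Extract the integer hour from a key like '.../20251128/12.lz4'."""
    filename = key.rsplit('/', 1)[-1]
    return int(filename.split('.', 1)[0])


keys = [
    f'node_fills_by_block/hourly/20251128/{name}.lz4'
    for name in ('0', '1', '10', '11', '12', '2')  # lexicographic order, as S3 returns it
]

ordered = sorted(keys, key=hour_of)
for key in ordered:
    print(key)
```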


In [37]:
# Download hour 12 (midday, usually active)
sample_key = f'node_fills_by_block/hourly/{latest_date}/12.lz4'
print(f"Downloading {sample_key}...")
data = download('hl-mainnet-node-data', sample_key)
decompressed = decompress_lz4(data)

# Each line is a block with multiple fill events
lines = decompressed.decode().strip().split('\n')
blocks = [json.loads(line) for line in lines]
print(f"Loaded {len(blocks):,} blocks")

# Flatten to individual fills: each event is [user_address, fill_data]
fills = []
for block in blocks:
    for user, fill_data in block.get('events', []):
        fill_data['user'] = user
        fill_data['block_time'] = block['block_time']
        fills.append(fill_data)

print(f"Extracted {len(fills):,} fills")

# Save sample (first 1000 fills)
save_json(fills[:1000], NODE_DATA_DIR / 'node_fills_by_block' / f'{latest_date}_12_sample.json')

# Preview
print("\nSample fill:")
print(json.dumps(fills[0], indent=2))

Downloading node_fills_by_block/hourly/20251128/12.lz4...
Loaded 44,437 blocks
Extracted 230,074 fills
Saved: ../hyperliquid_samples/hl-mainnet-node-data/node_fills_by_block/20251128_12_sample.json (569.5 KB)

Sample fill:
{
  "coin": "HYPE",
  "px": "35.982",
  "sz": "104.87",
  "side": "B",
  "time": 1764331199926,
  "startPosition": "-4036.68",
  "dir": "Close Short",
  "closedPnl": "-77.089937",
  "hash": "0x8535edac35dc9a1786af043059f8ba02031b0091d0dfb8e928fe98fef4d07402",
  "oid": 251510202190,
  "crossed": true,
  "fee": "0.0",
  "tid": 124684432082590,
  "feeToken": "USDC",
  "twapId": null,
  "user": "0x010461c14e146ac35fe42271bdc1134ee31c703a",
  "block_time": "2025-11-28T11:59:59.926782774"
}
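
Because each flattened fill carries its `user`, per-trader aggregation is a simple group-by. The sketch below totals fill count, notional (`px * sz`) and `closedPnl` per address, using the sample fill above as input. `summarize_by_user` is a hypothetical helper. Fees are left out on purpose: the `feeToken` field suggests fees may not always be denominated in the same token, so summing them blindly could mix units.

```python
from collections import defaultdict
from decimal import Decimal


def summarize_by_user(fills):
    """Aggregate fill count, notional traded, and closed PnL per user address."""
    totals = defaultdict(lambda: {
        'fills': 0,
        'notional': Decimal(0),
        'closed_pnl': Decimal(0),
    })
    for fill in fills:
        entry = totals[fill['user']]
        entry['fills'] += 1
        # Numeric fields are strings in the source data; Decimal keeps them exact
        entry['notional'] += Decimal(fill['px']) * Decimal(fill['sz'])
        entry['closed_pnl'] += Decimal(fill['closedPnl'])
    return dict(totals)


# The sample fill shown above, reduced to the fields used here
sample_fills = [{
    'coin': 'HYPE',
    'px': '35.982',
    'sz': '104.87',
    'closedPnl': '-77.089937',
    'user': '0x010461c14e146ac35fe42271bdc1134ee31c703a',
}]

summary = summarize_by_user(sample_fills)
for user, stats in summary.items():
    print(user, stats)
```

In the notebook, passing the full `fills` list from the cell above would produce one entry per address active in that hour.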


### 2b. explorer_blocks (Raw Blocks)

In [38]:
# Download recent blocks (from 800M range)
block_files = list_files('hl-mainnet-node-data', 'explorer_blocks/800000000/811600000/', limit=5)
print("Recent block files:")
for key, size in block_files:
    print(f"  {key.split('/')[-1]} ({size/1024:.1f} KB)")

Recent block files:
  811600100.rmp.lz4 (5347.6 KB)
  811600200.rmp.lz4 (2486.2 KB)
  811600300.rmp.lz4 (1729.9 KB)
  811600400.rmp.lz4 (2790.9 KB)
  811600500.rmp.lz4 (3112.1 KB)


In [39]:
# Download and decode a block file
if block_files:
    sample_key = block_files[0][0]
    print(f"Downloading {sample_key}...")
    data = download('hl-mainnet-node-data', sample_key)
    
    # Decompress and decode MessagePack
    decompressed = decompress_lz4(data)
    blocks = decode_msgpack(decompressed)
    print(f"Loaded {len(blocks)} blocks")
    
    # Save as JSON
    block_num = sample_key.split('/')[-1].replace('.rmp.lz4', '')
    save_json(blocks, NODE_DATA_DIR / 'explorer_blocks' / f'{block_num}.json')
    
    # Preview
    print("\nBlock header:")
    print(json.dumps(blocks[0]['header'], indent=2))

Downloading explorer_blocks/800000000/811600000/811600100.rmp.lz4...
Loaded 100 blocks
Saved: ../hyperliquid_samples/hl-mainnet-node-data/explorer_blocks/811600100.json (115703.2 KB)

Block header:
{
  "block_time": "2025-11-28T20:54:18.103847361",
  "height": 811600001,
  "hash": "0xd70c139e745457b7c70b4d3bf380b91fb7b1019ebd84499deed7d4a2cb819948",
  "proposer": "0xb796a00b6e50c3dd46e43346c921fe8e146f4e06"
}
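
The `block_time` values in both the fills and the block headers carry nine fractional digits (nanoseconds), which `datetime.strptime` with `%f` cannot parse because it accepts at most six. One option, sketched below, is to truncate to microseconds. The timestamps have no zone suffix; treating them as UTC is an assumption, though it is supported by the sample fill in section 2a, whose millisecond `time` field matches its `block_time` when read as UTC. `parse_block_time` and `to_epoch_ms` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_block_time(value):
    """Parse 'YYYY-MM-DDTHH:MM:SS.nnnnnnnnn' into a UTC datetime, truncated to microseconds."""
    seconds_part, _, fraction = value.partition('.')
    micros = int((fraction + '000000')[:6])  # pad short fractions, drop digits past microseconds
    parsed = datetime.strptime(seconds_part, '%Y-%m-%dT%H:%M:%S')
    # Assumption: timestamps are UTC
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Convert an aware datetime to integer milliseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


header_time = parse_block_time('2025-11-28T20:54:18.103847361')
fill_time = parse_block_time('2025-11-28T11:59:59.926782774')
print(header_time.isoformat())
print(to_epoch_ms(fill_time))
```

Integer division by a `timedelta` avoids the float rounding that `dt.timestamp() * 1000` can introduce.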


### 2c. node_trades (Legacy Format)

In [None]:
# Download node_trades sample (legacy format - ended June 2025)
trade_dates = list_prefixes('hl-mainnet-node-data', 'node_trades/hourly/')

if trade_dates:
    # Use middle of date range (edges may be incomplete)
    sample_date = trade_dates[len(trade_dates)//2].split('/')[-2]
    trade_files = list_files('hl-mainnet-node-data', f'node_trades/hourly/{sample_date}/', limit=24)
    
    # Find the first file with real content (skip near-empty files)
    valid_file = next((key for key, size in trade_files if size > 100), None)
    
    if valid_file:
        print(f"Downloading {valid_file}...")
        data = download('hl-mainnet-node-data', valid_file)
        content = decompress_lz4(data).decode()
        lines = [line for line in content.strip().split('\n') if line.strip()]
        
        trades = [json.loads(line) for line in lines[:100]]
        save_json(trades, NODE_DATA_DIR / 'node_trades' / f'{sample_date}_sample.json')
        
        print("\nSample trade:")
        print(json.dumps(trades[0], indent=2))
    else:
        print(f"No non-empty node_trades files found for {sample_date}")
else:
    print("No node_trades data found")

---

## Summary

Files saved to `../hyperliquid_samples/`:

In [41]:
print("Downloaded samples:")
print("=" * 60)
for path in sorted(SAMPLES_DIR.rglob('*')):
    if path.is_file():
        size_kb = path.stat().st_size / 1024
        rel_path = path.relative_to(SAMPLES_DIR)
        print(f"  {rel_path}  ({size_kb:.1f} KB)")

Downloaded samples:
  README.md  (1.2 KB)
  hl-mainnet-node-data/explorer_blocks/811600100.json  (115703.2 KB)
  hl-mainnet-node-data/node_fills_by_block/20251128_12_sample.json  (569.5 KB)
  hyperliquid-archive/asset_ctxs/20240101.csv  (16072.7 KB)
  hyperliquid-archive/market_data/BTC_l2book_sample.json  (462.8 KB)


### Which Format to Use?

| Format | Best For | Complexity |
|--------|----------|------------|
| `node_fills_by_block` | **Trader analysis** | Low - ready to use |
| `explorer_blocks` | Full history reconstruction | High - needs matching engine |
| `node_trades` | Legacy trade data | Medium - less complete |
| `asset_ctxs` | Market context | Low - CSV format |

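**Recommendation**: Use `node_fills_by_block` for trader analysis. In this run it has a folder for every day from July 27, 2025 through November 29, 2025 (126 days). For earlier dates, the only options are the legacy `node_fills` (from May 25, 2025) and `node_trades` (from March 22, 2025, with gaps).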

---

## Next Steps

Now that you have sample data, proceed to **[02_analysis_pipeline.ipynb](./02_analysis_pipeline.ipynb)** to analyze it.