# Top of Book Snapshots at One-Minute Intervals

This notebook demonstrates how to capture best bid and ask prices at regular intervals from ITCH 5.0 data.

Top-of-book snapshots are useful for:
- Creating time series of bid-ask spreads
- Analyzing price movements over time
- Calculating mid-price evolution
- Building datasets for machine learning models

In [None]:
from pathlib import Path
import pandas as pd
import time
from meatpy.itch50 import ITCH50MessageReader, ITCH50MarketProcessor

# Define paths and parameters
data_dir = Path("data")
file_path = data_dir / "S081321-v50.txt.gz"

# Symbols to track (using commonly traded symbols)
symbols = ["AAPL", "SPY"]

# Snapshot interval in minutes
interval_minutes = 1

print(f"Input file: {file_path}")
print(f"Target symbols: {symbols}")
print(f"Snapshot interval: {interval_minutes} minute(s)")

# Check if file exists
if not file_path.exists():
    print(f"❌ Input file not found: {file_path}")
    print(
        "Please place an ITCH 5.0 file (e.g., S081321-v50.txt.gz) in the data/ directory"
    )
else:
    print(f"✅ Input file found: {file_path}")
    print(f"File size: {file_path.stat().st_size / (1024**3):.2f} GB")

In [None]:
def extract_top_of_book_snapshots(file_path, symbols, interval_minutes=1):
    """Extract top of book snapshots at regular intervals.

    Args:
        file_path: Path to ITCH file
        symbols: List of symbols to track
        interval_minutes: Interval between snapshots in minutes

    Returns:
        List of snapshot dictionaries
    """
    processor = ITCH50MarketProcessor()
    snapshots = []

    # Convert interval to nanoseconds (ITCH 5.0 timestamps are nanoseconds
    # since midnight); int() keeps the result integral for fractional minutes
    interval_ns = int(interval_minutes * 60 * 1_000_000_000)
    next_snapshot_time = None

    message_count = 0
    start_time = time.time()

    print(f"🚀 Processing file to extract {interval_minutes}-minute snapshots...\n")

    with ITCH50MessageReader(file_path) as reader:
        for message in reader:
            message_count += 1
            processor.process_message(message)

            # Initialize snapshot time based on first timestamped message
            if next_snapshot_time is None and hasattr(message, "timestamp"):
                next_snapshot_time = message.timestamp + interval_ns
                print(f"📅 First timestamp: {message.timestamp} ns")
                print(f"⏰ Next snapshot at: {next_snapshot_time} ns\n")

            # Check if it's time for a snapshot
            if (
                hasattr(message, "timestamp")
                and next_snapshot_time is not None
                and message.timestamp >= next_snapshot_time
            ):
                # Take snapshot for each symbol
                for symbol in symbols:
                    lob = processor.get_lob(symbol)
                    if lob and lob.best_bid and lob.best_ask:
                        snapshot = {
                            "timestamp": message.timestamp,
                            "symbol": symbol,
                            "best_bid": lob.best_bid.price,
                            "best_bid_size": lob.best_bid.size,
                            "best_ask": lob.best_ask.price,
                            "best_ask_size": lob.best_ask.size,
                            "spread": lob.best_ask.price - lob.best_bid.price,
                            "mid_price": (lob.best_bid.price + lob.best_ask.price) / 2,
                        }
                        snapshots.append(snapshot)

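                # Advance to the first boundary after this message, skipping
                # any boundaries missed during a gap in the message stream
                while next_snapshot_time <= message.timestamp:
                    next_snapshot_time += interval_ns

                # Progress update (skipped until at least one snapshot exists)
                if snapshots and len(snapshots) % 50 == 0:
                    elapsed = time.time() - start_time
                    print(
                        f"📸 Captured {len(snapshots)} snapshots after {elapsed:.1f}s"
                    )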

            # General progress indicator
            if message_count % 5_000_000 == 0:
                elapsed = time.time() - start_time
                rate = message_count / elapsed if elapsed > 0 else 0
                print(f"   Processed {message_count:,} messages ({rate:,.0f} msg/s)")

    elapsed = time.time() - start_time
    print(f"\n✅ Processing complete in {elapsed:.1f} seconds")
    print(f"   Total messages processed: {message_count:,}")
    print(f"   Snapshots captured: {len(snapshots)}")

    return snapshots

In [None]:
# Extract snapshots
if file_path.exists():
    snapshots = extract_top_of_book_snapshots(file_path, symbols, interval_minutes)

    if snapshots:
        # Convert to DataFrame for easier analysis
        df = pd.DataFrame(snapshots)

        print("\n📊 Snapshot Summary:")
        print(f"   Total snapshots: {len(df)}")
        print(f"   Symbols covered: {sorted(df['symbol'].unique())}")
        print(f"   Time span: {df['timestamp'].min()} to {df['timestamp'].max()}")

        # Show sample data
        print("\n🔍 First 10 snapshots:")
        pd.set_option("display.precision", 4)
        display(df[["symbol", "best_bid", "best_ask", "spread", "mid_price"]].head(10))

        # Summary statistics by symbol
        print("\n📈 Summary statistics by symbol:")
        summary = df.groupby("symbol")[
            ["best_bid", "best_ask", "spread", "mid_price"]
        ].agg(["mean", "min", "max", "std"])
        display(summary)

    else:
        print("❌ No snapshots were captured")
        print("   This might happen if:")
        print("   - The symbols are not present in the file")
        print("   - The file is too short for the snapshot interval")
        print("   - There were no valid order book states")

else:
    print("⚠️  Cannot run snapshot extraction without input file")

In [None]:
# Save snapshots to CSV for further analysis
if "df" in locals() and not df.empty:
    output_csv = data_dir / f"top_of_book_snapshots_{interval_minutes}min.csv"
    df.to_csv(output_csv, index=False)

    print(f"\n💾 Snapshots saved to: {output_csv}")
    print(f"   File size: {output_csv.stat().st_size / 1024:.1f} KB")
    print(f"   Columns: {list(df.columns)}")

    # Show file format
    print("\n📝 CSV file preview:")
    with open(output_csv, "r") as f:
        for i, line in enumerate(f):
            if i < 5:  # Show first 5 lines
                print(f"   {line.strip()}")
            else:
                break

    print("\n✅ Data is ready for further analysis in Excel, R, Python, etc.")

## Analysis Ideas

With the top-of-book snapshot data, you can perform various analyses:

### Time Series Analysis
- Plot mid-price evolution over time
- Analyze bid-ask spread patterns
- Identify periods of high/low liquidity
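
As a minimal sketch of the first two ideas, the snippet below reshapes snapshot rows into one column per symbol, indexed by time of day. It builds a small hand-made `df` with the same column names as the snapshot DataFrame (the values are illustrative, not real market data) and assumes the `timestamp` column holds integer nanoseconds since midnight, as in ITCH 5.0.

```python
import pandas as pd

# Hypothetical snapshot rows using the same column names as the extraction output.
df = pd.DataFrame(
    {
        "timestamp": [
            34_200_000_000_000,  # 09:30:00
            34_200_000_000_000,
            34_260_000_000_000,  # 09:31:00
            34_260_000_000_000,
        ],
        "symbol": ["AAPL", "SPY", "AAPL", "SPY"],
        "spread": [0.01, 0.01, 0.02, 0.01],
        "mid_price": [150.005, 445.005, 150.02, 445.015],
    }
)

# Convert nanoseconds since midnight into a Timedelta for a readable index.
df["time_of_day"] = pd.to_timedelta(df["timestamp"], unit="ns")

# One row per snapshot time, one column per symbol.
mid_by_symbol = df.pivot(index="time_of_day", columns="symbol", values="mid_price")
spread_by_symbol = df.pivot(index="time_of_day", columns="symbol", values="spread")

print(mid_by_symbol)
print(spread_by_symbol)
```

The wide layout makes it straightforward to plot each symbol as its own line or to spot intervals where the spread widens. `pivot` requires each (time, symbol) pair to be unique, which holds when every symbol is sampled once per snapshot.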

### Statistical Analysis
- Calculate price volatility
- Measure average spreads by time of day
- Compare liquidity across different symbols
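
The following sketch shows one way these statistics might be computed with pandas. The data is a hand-made stand-in for the snapshot DataFrame (illustrative values only), and volatility is taken here to mean the standard deviation of per-interval log returns of the mid-price, which is one common choice rather than the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute snapshots for a single symbol (illustrative values).
df = pd.DataFrame(
    {
        "timestamp": [34_200_000_000_000 + i * 60_000_000_000 for i in range(4)],
        "symbol": ["AAPL"] * 4,
        "spread": [0.02, 0.01, 0.01, 0.02],
        "mid_price": [150.00, 150.10, 150.05, 150.20],
    }
)

# Log returns of the mid-price, computed separately for each symbol.
df["log_return"] = df.groupby("symbol")["mid_price"].transform(
    lambda s: np.log(s).diff()
)

# Per-interval volatility: standard deviation of log returns by symbol.
volatility = df.groupby("symbol")["log_return"].std()

# Average spread by hour of day (timestamps are nanoseconds since midnight).
df["hour"] = df["timestamp"] // 3_600_000_000_000
avg_spread_by_hour = df.groupby(["symbol", "hour"])["spread"].mean()

print(volatility)
print(avg_spread_by_hour)
```

With several symbols in the DataFrame, the same `groupby` calls return one volatility value and one set of hourly spreads per symbol, which allows a direct comparison of liquidity.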

### Visualization
```python
import matplotlib.pyplot as plt

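# Plot mid-price for each symbol against time of day
# (assumes `timestamp` is numeric nanoseconds since midnight)
for symbol in df['symbol'].unique():
    symbol_data = df[df['symbol'] == symbol]
    hours = symbol_data['timestamp'] / 3.6e12  # nanoseconds -> hours
    plt.plot(hours, symbol_data['mid_price'], label=symbol)

plt.xlabel('Time of Day (hours since midnight)')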
plt.ylabel('Mid Price ($)')
plt.title('Mid Price Evolution')
plt.legend()
plt.show()
```

## Key Points

- **Regular Intervals**: Snapshots are taken at regular time intervals, not based on message count
- **Best Bid/Ask Only**: This captures only the top level of the order book for efficiency
- **Multiple Symbols**: Can track multiple symbols simultaneously
- **Memory Efficient**: Only stores snapshot data, not the full order book history
- **Flexible Intervals**: Change `interval_minutes` to adjust the spacing, from fractions of a minute (e.g., `0.5` for 30 seconds) to hours (e.g., `60`)
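
To illustrate the time-based triggering described above, here is a small standalone sketch that works on a plain list of timestamps instead of an ITCH file. The function name `snapshot_triggers` and its `interval_seconds` parameter are hypothetical, introduced only for this example. The sketch emits one trigger for every boundary crossed, which is one possible policy; skipping boundaries missed during a quiet period is another.

```python
def snapshot_triggers(timestamps_ns, interval_seconds):
    """Yield (boundary_ns, message_timestamp_ns) for each boundary crossed."""
    interval_ns = int(interval_seconds * 1_000_000_000)
    next_boundary = None
    for ts in timestamps_ns:
        if next_boundary is None:
            # Schedule the first snapshot one interval after the first message.
            next_boundary = ts + interval_ns
            continue
        # A quiet period can span several boundaries; emit one per boundary.
        while ts >= next_boundary:
            yield next_boundary, ts
            next_boundary += interval_ns


# Hypothetical message timestamps in nanoseconds: irregular arrivals, one long gap.
timestamps = [0, 10_000_000_000, 31_000_000_000, 95_000_000_000]
triggers = list(snapshot_triggers(timestamps, interval_seconds=30))
print(triggers)
```

Four messages arrive, but three snapshots are triggered, because the schedule follows the clock (30 s, 60 s, 90 s) and not the message count. The message at 95 s crosses two boundaries at once, so it triggers both.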

## Performance Considerations

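- **Interval Selection**: Shorter intervals produce more snapshot rows; every message is still processed regardless of the interval, so the extra cost is the snapshot work at each boundary
- **Symbol Count**: Snapshot rows scale linearly with the number of tracked symbols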
- **File Size**: Large ITCH files may take significant time to process
- **Memory Usage**: The market processor maintains order books for all symbols in memory
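
For a rough sense of output size before a long run, a back-of-the-envelope estimate can help. The helper below is hypothetical and assumes a 390-minute regular trading session with a two-sided book at every boundary. Actual counts will differ because ITCH files also cover pre-market and after-hours activity, and boundaries where a symbol lacks a bid or an ask produce no row.

```python
def expected_snapshot_rows(n_symbols, interval_minutes, session_minutes=390):
    """Rough row estimate: one row per symbol per interval boundary."""
    boundaries = int(session_minutes // interval_minutes)
    return n_symbols * boundaries


# Two symbols over a 6.5-hour (390-minute) session.
print(expected_snapshot_rows(n_symbols=2, interval_minutes=1))  # 780
print(expected_snapshot_rows(n_symbols=2, interval_minutes=5))  # 156
```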