# Listing Symbols in an ITCH 5.0 File

This notebook shows how to extract all available symbols from an ITCH 5.0 file.

The example uses a sample ITCH 5.0 file that should be placed in the `data/` subdirectory. Since these files can be very large (several GB), they are not included in the repository but should be provided separately.

In [1]:
from pathlib import Path
from meatpy.itch50 import ITCH50MessageReader

# Define the path to our sample data file
data_dir = Path("data")
file_path = data_dir / "S081321-v50.txt.gz"

# Check if the file exists
if not file_path.exists():
    print(f"❌ Sample file not found: {file_path}")
    print(
        "Please place an ITCH 5.0 file (e.g., S081321-v50.txt.gz) in the data/ directory"
    )
else:
    print(f"✅ Found sample file: {file_path}")
    print(f"File size: {file_path.stat().st_size / (1024**3):.2f} GB")

✅ Found sample file: data/S081321-v50.txt.gz
File size: 4.55 GB


In [2]:
def list_symbols(file_path):
    """Extract all symbols from an ITCH 5.0 file.

    Args:
        file_path: Path to the ITCH 5.0 file

    Returns:
        sorted list of symbols found in the file
    """
    symbols = set()
    message_count = 0

    print("Reading ITCH file to extract symbols...")

    with ITCH50MessageReader(file_path) as reader:
        for message in reader:
            message_count += 1

            # Stock Directory messages (type 'R') contain symbol information
            if message.type == b"R":
                symbol = message.stock.decode().strip()
                symbols.add(symbol)

            # We can break early since all stock directory messages
            # appear at the beginning of the file
            elif len(symbols) > 0 and message.type != b"R":
                # Once we've seen stock directory messages and encounter
                # a different message type, we've likely seen all symbols
                print(
                    f"Found {len(symbols)} symbols after processing {message_count:,} messages"
                )
                break

            # Progress indicator
            if message_count % 100000 == 0:
                print(
                    f"Processed {message_count:,} messages, found {len(symbols)} symbols so far..."
                )

    return sorted(symbols)

In [3]:
# Extract symbols from the file
if file_path.exists():
    symbols = list_symbols(file_path)

    print("\n📊 Summary:")
    print(f"Found {len(symbols)} unique symbols in the file")

    # Display first 20 symbols
    print("\n🔤 First 20 symbols:")
    for i, symbol in enumerate(symbols[:20]):
        print(f"  {i + 1:2d}. {symbol}")

    if len(symbols) > 20:
        print(f"  ... and {len(symbols) - 20} more symbols")

    # Show some well-known symbols if they exist
    well_known = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA", "SPY", "QQQ"]
    found_well_known = [s for s in well_known if s in symbols]

    if found_well_known:
        print("\n🏆 Well-known symbols found:")
        for symbol in found_well_known:
            print(f"  ✓ {symbol}")
else:
    print("⚠️  Cannot run example without sample data file")

Reading ITCH file to extract symbols...
Found 92 symbols after processing 94 messages

📊 Summary:
Found 92 unique symbols in the file

🔤 First 20 symbols:
   1. A
   2. AA
   3. AAA
   4. AAAU
   5. AAC
   6. AAC+
   7. AAC=
   8. AACG
   9. AACIU
  10. AACOU
  11. AADR
  12. AAIC
  13. AAIC-B
  14. AAIC-C
  15. AAIN
  16. AAL
  17. AAMC
  18. AAME
  19. AAN
  20. AAOI
  ... and 72 more symbols

🏆 Well-known symbols found:
  ✓ AAPL


## Key Points

- **Stock Directory Messages**: ITCH files begin with Stock Directory messages (type 'R') that contain symbol information
- **Early Termination**: Since these messages appear at the beginning, we can stop reading once we encounter other message types
- **Memory Efficiency**: This approach is memory-efficient for large files since we don't need to process the entire file
- **Symbol Format**: ITCH symbols are 8-byte fields, often padded with spaces, which we strip for display

## Next Steps

Once you have the list of symbols, you can:
1. Filter the file to extract data for specific symbols of interest
2. Process order book data for particular symbols
3. Generate reports or visualizations for selected symbols