# Aegis Project Setup

This notebook sets up the environment and verifies that all dependencies and data files are in place.

## Overview
1. Install required packages
2. Verify Python environment
3. Create project directories
4. Check for dataset files

## 1. Install Dependencies

Install all required Python packages. This may take a few minutes on first run.

In [1]:
# Install all required packages from the requirements file (uncomment to use instead)
# %pip install -r ../requirements.txt

# Or install the individual packages directly:
%pip install torch torch-geometric numpy pandas scikit-learn matplotlib seaborn networkx tqdm

Note: you may need to restart the kernel to use updated packages.
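
After installation, it can be useful to confirm which packages are actually visible to the kernel. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of the notebook, and the package list simply mirrors the names passed to `%pip install`.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to %pip install in this notebook.
PACKAGES = [
    "torch", "torch-geometric", "numpy", "pandas", "scikit-learn",
    "matplotlib", "seaborn", "networkx", "tqdm",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

Any package reported as not installed suggests the install cell did not complete, or that the kernel needs a restart.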

## 2. Verify Environment

Check Python version, PyTorch installation, and CUDA availability.

In [2]:
import sys
import os                 # used by the dataset check in a later cell
import torch
from pathlib import Path  # used by the directory setup in a later cell

# Report interpreter and PyTorch details
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

Python: 3.13.7 (tags/v3.13.7:bcee1c3, Aug 14 2025, 14:15:11) [MSC v.1944 64 bit (AMD64)]
PyTorch: 2.9.1+cpu
CUDA available: False
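
The environment report can also be turned into explicit checks. The sketch below is illustrative only: the minimum Python version `(3, 9)` is an assumption (the notebook does not state one), and `python_ok` and `pick_device` are hypothetical helper names. In the notebook, `pick_device` would be called with `torch.cuda.is_available()`; here it takes a plain boolean so the example runs without PyTorch.

```python
import sys

# Assumed minimum version; the notebook itself does not specify one.
MIN_PYTHON = (3, 9)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


def pick_device(cuda_available):
    """Return the device string to use, falling back to CPU without CUDA."""
    return "cuda" if cuda_available else "cpu"


print(f"Python meets minimum {MIN_PYTHON}: {python_ok()}")
# With a CPU-only PyTorch build, torch.cuda.is_available() is False:
print(f"Device: {pick_device(False)}")
```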

## 3. Create Project Directories

Create the required directory structure for data, artifacts, figures, and source code.

In [3]:
# Define required directories (relative to the current working directory)
dirs = ['data', 'artifacts', 'figures', 'src']

# Create each directory if it doesn't exist
for d in dirs:
    Path(d).mkdir(parents=True, exist_ok=True)
    print(f"✓ Created/verified: {d}")

# Mark src as a Python package by creating an empty __init__.py
Path('src/__init__.py').touch()
print("✓ Initialized src package")

✓ Created/verified: data
✓ Created/verified: artifacts
✓ Created/verified: figures
✓ Created/verified: src
✓ Initialized src package
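
Because the directories are created relative to the current working directory, running the notebook from a different folder would create them somewhere else. One possible way to make this explicit is to pass the project root in, as in the sketch below. The function name `ensure_project_dirs` and its `root` parameter are hypothetical, and the demo uses a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def ensure_project_dirs(root, names=("data", "artifacts", "figures", "src")):
    """Create each named directory under root and return the resulting paths."""
    root = Path(root)
    paths = []
    for name in names:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        paths.append(path)
    if "src" in names:
        # An empty __init__.py marks src as an importable package.
        (root / "src" / "__init__.py").touch()
    return paths


# Demo in a throwaway directory; in the notebook, root would be the project folder.
with tempfile.TemporaryDirectory() as tmp:
    for path in ensure_project_dirs(tmp):
        print(f"Created/verified: {path.name}")
```

Calling the function a second time is harmless, since both `mkdir(exist_ok=True)` and `touch()` tolerate existing paths.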

## 4. Verify Dataset Files

Check that all required dataset files are present in the `data/` directory.

**Required files:**
- `txs_features.csv` - Transaction node features
- `txs_edgelist.csv` - Transaction graph edges
- `txs_classes.csv` - Transaction labels (1=illicit, 2=licit, 3=unknown)

In [4]:
# List of required dataset files
required_files = [
    'data/txs_features.csv',
    'data/txs_edgelist.csv',
    'data/txs_classes.csv'
]

# Check each file
print("Dataset files status:")
all_present = True
for file_path in required_files:
    exists = os.path.exists(file_path)
    status = "✓" if exists else "✗"
    print(f"  {status} {file_path}")
    if not exists:
        all_present = False

# Summary
print()
if all_present:
    print("✅ All dataset files found! Ready to proceed.")
else:
    print("⚠️  Some dataset files are missing. Please add them to the data/ directory.")

Dataset files status:
  ✗ data/txs_features.csv
  ✗ data/txs_edgelist.csv
  ✗ data/txs_classes.csv

⚠️  Some dataset files are missing. Please add them to the data/ directory.


---

## ✅ Setup Complete!

If all checks passed, proceed to **02_data_graph.ipynb** to load and build the transaction graph.
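
To make "all checks passed" explicit before moving on, the dataset check could be wrapped in a guard that fails loudly instead of only printing a warning. The following is a minimal sketch under that assumption; `missing_dataset_files` and `require_dataset` are hypothetical names, and the demo runs against a temporary directory rather than the real `data/` folder.

```python
import tempfile
from pathlib import Path

# File names taken from the required-files list in this notebook.
REQUIRED_FILES = ("txs_features.csv", "txs_edgelist.csv", "txs_classes.csv")


def missing_dataset_files(data_dir="data", required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in required if not (data_dir / name).is_file()]


def require_dataset(data_dir="data"):
    """Raise FileNotFoundError if any required dataset file is missing."""
    missing = missing_dataset_files(data_dir)
    if missing:
        raise FileNotFoundError(f"Missing from {data_dir}: {', '.join(missing)}")


# Demo: only one of the three files exists in the temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "txs_classes.csv").touch()
    print("Missing:", missing_dataset_files(tmp))
```

A later notebook could call `require_dataset()` in its first cell so that a missing file stops execution early rather than surfacing as a load error further down.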