# EAGLE-2 SE OmniDraft Setup and Execution

This notebook automates the complete setup process for the EAGLE-2 SE OmniDraft project:
1. Clone the repository
2. Install dependencies
3. Load models (TinyLlama and Phi-4-mini-instruct)
4. Execute the heterogeneous speculative decoding script

## Prerequisites
- Python 3.8+
- Git (used to clone the repository)
- CUDA-compatible GPU (recommended)
- Hugging Face account (for models that require authentication)

## Step 1: Environment Setup and Repository Cloning

In [None]:
import os
import subprocess
import sys
from pathlib import Path

# Configuration
REPO_URL = "https://github.com/tenet-diver/EAGLE-2_SE_OmniDraft.git"  # Update with actual repo URL
PROJECT_DIR = "EAGLE-2_SE_OmniDraft"
SCRIPT_NAME = "heterogeneous_spd.py"

print(f"Python version: {sys.version}")
print(f"Current working directory: {os.getcwd()}")

In [None]:
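# Clone repository if it doesn't exist. Re-running this cell after the chdir below
# is safe: being inside the project directory counts as "already exists".
already_inside = Path.cwd().name == PROJECT_DIR
if already_inside or os.path.exists(PROJECT_DIR):
    print(f"✅ Repository directory '{PROJECT_DIR}' already exists")
else:
    print(f"Cloning repository from {REPO_URL}...")
    result = subprocess.run(["git", "clone", REPO_URL, PROJECT_DIR], capture_output=True, text=True)
    if result.returncode == 0:
        print("✅ Repository cloned successfully")
    else:
        print(f"❌ Error cloning repository: {result.stderr}")
        print("Note: Update REPO_URL with the correct repository URL")

# Change to project directory (skipped if the clone failed or we are already there)
if not already_inside and os.path.isdir(PROJECT_DIR):
    os.chdir(PROJECT_DIR)
print(f"Current directory: {os.getcwd()}")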
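
The clone step above can also be wrapped in a small reusable helper. The sketch below is one possible shape, not part of the repository: `ensure_repo` is a hypothetical name, and it assumes that a `.git` entry inside the target directory is enough to treat the checkout as present.

```python
import shutil
import subprocess
from pathlib import Path


def ensure_repo(repo_url: str, target_dir: str) -> Path:
    """Clone repo_url into target_dir unless a git checkout is already there.

    Hypothetical helper for illustration; returns the absolute checkout path.
    """
    target = Path(target_dir).resolve()
    if (target / ".git").exists():
        return target  # already cloned, nothing to do
    if shutil.which("git") is None:
        raise RuntimeError("git executable not found on PATH")
    # check=True raises CalledProcessError instead of continuing after a failed clone
    subprocess.run(["git", "clone", repo_url, str(target)], check=True)
    return target
```

Because the relative path is resolved against the current working directory, call the helper once before any `os.chdir` and reuse the absolute path it returns.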

## Step 2: Install Dependencies

In [None]:
# Create requirements.txt from the script's dependencies if the repository does not provide one
requirements_content = """
torch>=2.0.0
transformers>=4.30.0
numpy>=1.21.0
accelerate>=0.20.0
sentencepiece>=0.1.99
protobuf>=3.20.0
""".strip()

if not os.path.exists("requirements.txt"):
    print("Creating requirements.txt...")
    with open("requirements.txt", "w") as f:
        f.write(requirements_content)
    print("✅ requirements.txt created")
else:
    print("✅ requirements.txt already exists")

# Display requirements
with open("requirements.txt", "r") as f:
    print("\nDependencies to install:")
    print(f.read())

In [None]:
# Install dependencies
print("Installing dependencies...")
result = subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], 
                       capture_output=True, text=True)

if result.returncode == 0:
    print("✅ Dependencies installed successfully")
else:
    print(f"❌ Error installing dependencies: {result.stderr}")
    print("Stdout:", result.stdout)
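
A zero exit code from pip does not show which versions ended up installed. As a minimal sketch, the standard-library `importlib.metadata` module can report them without importing the packages themselves; `installed_versions` is a hypothetical helper name, and the package list simply mirrors the requirements above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the packages listed in requirements.txt
required = ["torch", "transformers", "numpy", "accelerate", "sentencepiece", "protobuf"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```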

## Step 3: Verify Installation and Check GPU Availability

In [None]:
# Verify installations
try:
    import torch
    import transformers
    import numpy as np
    
    print(f"✅ PyTorch version: {torch.__version__}")
    print(f"✅ Transformers version: {transformers.__version__}")
    print(f"✅ NumPy version: {np.__version__}")
    
    # Check CUDA availability
    if torch.cuda.is_available():
        print(f"✅ CUDA available: {torch.cuda.get_device_name(0)}")
        print(f"   CUDA version: {torch.version.cuda}")
        print(f"   Available GPUs: {torch.cuda.device_count()}")
    else:
        print("⚠️  CUDA not available - will use CPU (slower)")
        
except ImportError as e:
    print(f"❌ Import error: {e}")

## Step 4: Pre-load and Verify Models

Before running the main script, let's verify that we can load the required models.
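
Before loading anything, it can help to estimate how much memory the weights alone will need. The sketch below is back-of-the-envelope arithmetic, not a measurement: it multiplies a parameter count by the bytes per parameter for a given dtype and ignores activations, the KV cache, and framework overhead. The function name is hypothetical, and the 1.1B figure is taken from the TinyLlama model name; read the large model's parameter count from its model card.

```python
# Bytes needed to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "float32") -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Rough estimate for a 1.1B-parameter model (assumed from the TinyLlama name)
for dtype in ("float32", "float16"):
    print(f"1.1B params in {dtype}: ~{weight_memory_gib(1.1e9, dtype):.2f} GiB")
```

`from_pretrained` as called in the cells below does not request a dtype, so comparing the float32 and float16 estimates gives a sense of what half precision could save.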

In [None]:
# Test model loading (similar to what the script does)
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {DEVICE}")

# Model IDs from the script
TINY_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
LARGE_ID = "microsoft/Phi-4-mini-instruct"

print(f"\nLoading models...")
print(f"Tiny model: {TINY_ID}")
print(f"Large model: {LARGE_ID}")

In [None]:
# Load tiny model (should be fast)
try:
    print("Loading TinyLlama tokenizer...")
    tiny_tok = AutoTokenizer.from_pretrained(TINY_ID)
    print("✅ TinyLlama tokenizer loaded")
    
    print("Loading TinyLlama model...")
    tiny_lm = AutoModelForCausalLM.from_pretrained(TINY_ID).to(DEVICE).eval()
    print("✅ TinyLlama model loaded")
    
    # Test tokenization
    test_text = "Hello, world!"
    tokens = tiny_tok.encode(test_text)
    print(f"Test tokenization: '{test_text}' -> {tokens}")
    
except Exception as e:
    print(f"❌ Error loading TinyLlama: {e}")

In [None]:
# Load large model (may require Hugging Face authentication if the model is gated)
try:
    print(f"Loading tokenizer for {LARGE_ID}...")
    large_tok = AutoTokenizer.from_pretrained(LARGE_ID)
    print("✅ Large-model tokenizer loaded")
    
    print("Loading large model (this may take a while)...")
    large_lm = AutoModelForCausalLM.from_pretrained(LARGE_ID).to(DEVICE).eval()
    print("✅ Large model loaded")
    
    # Test tokenization
    test_text = "Hello, world!"
    tokens = large_tok.encode(test_text)
    print(f"Test tokenization: '{test_text}' -> {tokens}")
    
except Exception as e:
    print(f"❌ Error loading {LARGE_ID}: {e}")
    print("Note: You may need to:")
    print("1. Accept the license agreement on Hugging Face")
    print("2. Login with: huggingface-cli login")
    print("3. Or use a different model ID")
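
When authentication is the suspected cause, a quick first check is whether a token is visible in the environment. The sketch below looks only at the `HF_TOKEN` variable and the older `HUGGING_FACE_HUB_TOKEN` name; it does not inspect the token file written by `huggingface-cli login`, so a `None` result does not prove you are logged out. `find_hf_token` is a hypothetical helper name.

```python
import os


def find_hf_token(env=os.environ):
    """Return the first Hugging Face token found in the environment, or None."""
    for var in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        token = env.get(var)
        if token:
            return token
    return None


if find_hf_token() is None:
    print("No token in environment variables; gated models may need `huggingface-cli login`")
else:
    print("Token found in environment")
```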

## Step 5: Execute the Main Script

Now let's run the heterogeneous speculative decoding script.

In [None]:
# Check if the script exists
if os.path.exists(SCRIPT_NAME):
    print(f"✅ Found script: {SCRIPT_NAME}")
    
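    # Display script size and modification time (formatted, not a raw timestamp)
    from datetime import datetime
    script_path = Path(SCRIPT_NAME)
    stat = script_path.stat()
    print(f"   Size: {stat.st_size} bytes")
    print(f"   Modified: {datetime.fromtimestamp(stat.st_mtime):%Y-%m-%d %H:%M:%S}")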
else:
    print(f"❌ Script not found: {SCRIPT_NAME}")
    print("Available files:")
    for file in os.listdir("."):
        print(f"  - {file}")

In [None]:
# Execute the script
if os.path.exists(SCRIPT_NAME):
    print(f"Executing {SCRIPT_NAME}...")
    print("=" * 50)
    
    # Run the script and capture output; subprocess.run raises TimeoutExpired on timeout
    try:
        result = subprocess.run([sys.executable, SCRIPT_NAME],
                                capture_output=True, text=True, timeout=300)  # 5 minute timeout
    except subprocess.TimeoutExpired:
        result = None
        print("❌ Script timed out after 300 seconds")
    
    if result is not None:
        print("STDOUT:")
        print(result.stdout)
        
        if result.stderr:
            print("\nSTDERR:")
            print(result.stderr)
        
        print("=" * 50)
        if result.returncode == 0:
            print("✅ Script executed successfully")
        else:
            print(f"❌ Script failed with return code: {result.returncode}")
else:
    print(f"❌ Cannot execute: {SCRIPT_NAME} not found")
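
With `capture_output=True`, nothing is shown until the script exits, which is inconvenient for a run that may take minutes. A possible alternative, sketched below with a hypothetical `run_streaming` helper, reads the child's output line by line while it runs. Note that it merges stderr into stdout and does not enforce a timeout.

```python
import subprocess
import sys


def run_streaming(cmd):
    """Run cmd, echoing combined stdout/stderr as it arrives.

    Returns (return_code, list_of_output_lines).
    """
    lines = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, bufsize=1) as proc:
        for line in proc.stdout:
            print(line, end="")
            lines.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set here
    return proc.returncode, lines


# "-u" asks the child Python not to buffer its output
code, output = run_streaming([sys.executable, "-u", "-c", "print('hello from child')"])
print(f"Exit code: {code}")
```

To run the project script this way, pass `[sys.executable, "-u", SCRIPT_NAME]` as the command.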

## Step 6: Alternative - Run Script Interactively

If you prefer to run the script interactively within the notebook:

In [None]:
# Alternative: Import and run the script functions directly
try:
    # Import the script as a module
    import importlib.util
    spec = importlib.util.spec_from_file_location("heterogeneous_spd", SCRIPT_NAME)
    hetero_module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(hetero_module)
    
    print("✅ Script imported successfully")
    print("Available functions:")
    for attr in dir(hetero_module):
        if not attr.startswith('_') and callable(getattr(hetero_module, attr)):
            print(f"  - {attr}")
            
except Exception as e:
    print(f"❌ Error importing script: {e}")

In [None]:
# Run a custom test with the imported functions
try:
    if 'hetero_module' in locals():
        print("Running custom test...")
        
        # Example: Run the heterogeneous speculative decoding function
        test_prompt = "The future of artificial intelligence is"
        print(f"Test prompt: '{test_prompt}'")
        
        # Call the main function from the script
        result = hetero_module.heterogeneous_spec_decode(
            prompt=test_prompt,
            max_new_tokens=50,
            K=32,
            alpha=0.15
        )
        
        print(f"Generated text: {result}")
        print("✅ Interactive execution completed")
        
except Exception as e:
    print(f"❌ Error in interactive execution: {e}")
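
The call above assumes that `heterogeneous_spec_decode` accepts `prompt`, `max_new_tokens`, `K`, and `alpha`. If the script's signature differs, the call fails with a `TypeError`. One defensive option, sketched here with a hypothetical `call_with_supported_kwargs` helper, is to inspect the signature first and pass only the keyword arguments the function declares.

```python
import inspect


def call_with_supported_kwargs(func, **kwargs):
    """Call func with only the keyword arguments its signature accepts."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the function accepts any keyword
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    dropped = sorted(set(kwargs) - set(supported))
    if dropped:
        print(f"Ignoring unsupported arguments: {dropped}")
    return func(**supported)


# Stand-in function for demonstration; the real one lives in the imported script
def demo_decode(prompt, max_new_tokens=10):
    return f"{prompt} [{max_new_tokens} tokens]"


print(call_with_supported_kwargs(demo_decode, prompt="Hi", max_new_tokens=5, alpha=0.15))
```

Silently dropping arguments can hide a real mismatch, so the printed list of ignored names is worth reading before trusting the result.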

## Summary

This notebook has:
1. ✅ Cloned the repository (or verified it exists)
2. ✅ Installed all required dependencies
3. ✅ Verified model loading capabilities
4. ✅ Executed the heterogeneous speculative decoding script

### Next Steps
- Experiment with different prompts and parameters
- Monitor GPU memory usage during execution
- Consider using quantized models to reduce memory use and, depending on hardware, latency
- Add logging and performance metrics
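
As a starting point for the performance metrics mentioned above, a small wall-clock timer is often enough. The sketch below uses a hypothetical `timed` helper built on `time.perf_counter`; it measures elapsed time only and knows nothing about tokens, so throughput has to be computed from whatever the decoding function returns.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Demonstration with a cheap stand-in workload
total, seconds = timed(sum, range(1_000_000))
print(f"Result: {total}, elapsed: {seconds:.4f} s")
```

CUDA kernels run asynchronously, so when timing GPU work call `torch.cuda.synchronize()` before reading the clock; otherwise the measured time may stop before the computation has finished.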

### Troubleshooting
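- If models fail to load, check Hugging Face authentication (`huggingface-cli login`) and whether the model's license has been accepted
- For CUDA out of memory errors, free the models loaded in Step 4 before running the script (which loads the models again in its own process), try reducing batch sizes, or fall back to CPU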
- Update model IDs if newer versions are available