# ONNX Pipeline Integration Demo

This notebook shows how to pair the enhanced pipeline with `ONNXTokenizer` so that an ONNX model with fixed input shapes can be used through the familiar pipeline interface.

## Features Demonstrated:
1. **Auto-detecting ONNXTokenizer**: Reads `batch_size` and `sequence_length` from the ONNX model instead of requiring them as arguments
2. **Enhanced Pipeline**: Uses `data_processor` parameter for any processor type
3. **Convenience Functions**: Helper functions for common use cases
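
How `ONNXTokenizer` performs its auto-detection is not shown in this notebook. As a rough sketch of the idea, assume the model's first input is a 2-D `[batch, sequence]` tensor whose shape is reported the way ONNX Runtime reports `session.get_inputs()[0].shape`: integers for fixed dimensions, strings or `None` for dynamic ones. The helper name `detect_fixed_shape` is hypothetical and not part of `onnx_tokenizer`.

```python
from typing import Optional, Sequence, Tuple, Union

# A dimension as ONNX Runtime reports it: int (fixed), str or None (dynamic).
Dim = Union[int, str, None]


def detect_fixed_shape(input_shape: Sequence[Dim]) -> Tuple[Optional[int], Optional[int]]:
    """Return (batch_size, sequence_length); None marks a dynamic dimension.

    Hypothetical helper for illustration only.
    """
    if len(input_shape) != 2:
        raise ValueError(f"expected a 2-D [batch, sequence] input, got {list(input_shape)}")

    def as_fixed(dim: Dim) -> Optional[int]:
        # Only positive integers count as fixed sizes.
        if isinstance(dim, int) and not isinstance(dim, bool) and dim > 0:
            return dim
        return None

    batch_size, sequence_length = (as_fixed(dim) for dim in input_shape)
    return batch_size, sequence_length


# Plain lists stand in for the shape of a real model input.
print(detect_fixed_shape([1, 16]))              # fully fixed model
print(detect_fixed_shape(["batch_size", 128]))  # dynamic batch, fixed length
```

A dimension that comes back as `None` would still need an explicit value, which is the situation the manual override in Method 4 covers.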


## Setup and Imports

In [None]:
import sys
from pathlib import Path
import numpy as np

# Add src directory to path
sys.path.append('../src')

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

# Import our enhanced components
from enhanced_pipeline import pipeline, create_pipeline
from onnx_tokenizer import ONNXTokenizer, create_auto_shape_tokenizer

## Load Model and Base Tokenizer

In [None]:
# Setup paths
model_dir = Path("../models/bert-tiny-optimum")

# Load base tokenizer and ONNX model
base_tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")
onnx_model = ORTModelForFeatureExtraction.from_pretrained(model_dir)

print(f"✅ Loaded model from {model_dir}")
print(f"✅ Loaded tokenizer: {base_tokenizer.__class__.__name__}")

## Method 1: Direct ONNXTokenizer Usage (Recommended)

In [None]:
# Create ONNXTokenizer with auto-detection
onnx_tokenizer = ONNXTokenizer(
    tokenizer=base_tokenizer,
    onnx_model=onnx_model  # Auto-detects shapes from model
)

print(f"Auto-detected shapes: {onnx_tokenizer.fixed_batch_size}x{onnx_tokenizer.fixed_sequence_length}")

# Use with enhanced pipeline
pipe = pipeline(
    "feature-extraction",
    model=onnx_model,
    data_processor=onnx_tokenizer  # Generic parameter works for any processor!
)

# Test inference
results = pipe(["Hello world!", "This is a test."])
print(f"Pipeline results shape: {np.array(results).shape}")
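
The internals of `enhanced_pipeline` are not shown here. One plausible way a generic `data_processor` argument could be mapped onto the keyword names that the standard `transformers.pipeline` accepts (`tokenizer`, `image_processor`, `feature_extractor`) is a task lookup, sketched below. The mapping table and the function name `route_data_processor` are assumptions made for illustration.

```python
from typing import Any, Dict

# Hypothetical task-to-keyword table; a real implementation may decide differently.
_TASK_TO_KEYWORD = {
    "feature-extraction": "tokenizer",
    "text-classification": "tokenizer",
    "image-classification": "image_processor",
    "audio-classification": "feature_extractor",
}


def route_data_processor(task: str, data_processor: Any, **kwargs: Any) -> Dict[str, Any]:
    """Return pipeline kwargs with data_processor stored under the task's keyword."""
    keyword = _TASK_TO_KEYWORD.get(task)
    if keyword is None:
        raise ValueError(f"no processor keyword known for task {task!r}")
    if keyword in kwargs:
        # Refuse to silently overwrite an explicitly passed processor.
        raise TypeError(f"both data_processor and {keyword} were given")
    return {**kwargs, keyword: data_processor}


# Strings stand in for real processor objects.
print(route_data_processor("feature-extraction", "my-tokenizer", device="cpu"))
print(route_data_processor("image-classification", "my-image-processor"))
```

The resulting dictionary could then be forwarded to the underlying pipeline constructor together with `task` and `model`.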

## Method 2: Using create_pipeline (Full-Featured)

In [None]:
# Create pipeline with all options
full_pipe = create_pipeline(
    task="feature-extraction",
    model=onnx_model,
    data_processor=onnx_tokenizer,
    device="cpu",
    framework="pt"
)

# Test with different inputs
test_texts = [
    "Short text",
    "This is a longer text that might need truncation or padding based on the model's fixed input size"
]

results = full_pipe(test_texts)
print(f"Full pipeline results shape: {np.array(results).shape}")
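
The two test texts differ in length, so each has to be brought to the model's fixed sequence length before inference. The sketch below shows that step on plain lists of made-up token ids. The function names are hypothetical, and a real tokenizer would also keep special tokens such as `[SEP]` in place when truncating, which this simplified version does not.

```python
from typing import List


def fit_to_length(token_ids: List[int], sequence_length: int, pad_id: int = 0) -> List[int]:
    """Truncate or right-pad token_ids to exactly sequence_length entries."""
    clipped = token_ids[:sequence_length]
    return clipped + [pad_id] * (sequence_length - len(clipped))


def attention_mask(token_ids: List[int], sequence_length: int) -> List[int]:
    """Return 1 for real tokens and 0 for padding, matching fit_to_length."""
    real = min(len(token_ids), sequence_length)
    return [1] * real + [0] * (sequence_length - real)


short_ids = [7, 8, 9, 10]        # made-up ids for a short text
long_ids = list(range(1, 11))    # made-up ids for a text longer than the limit

print(fit_to_length(short_ids, 6))   # padded on the right
print(attention_mask(short_ids, 6))  # padding positions masked out
print(fit_to_length(long_ids, 6))    # truncated to the first 6 ids
```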

## Method 3: Convenience Helper Functions

In [None]:
# Convenience function for auto-detection
def create_onnx_pipeline(task, model_path, tokenizer_name, **kwargs):
    """
    Create a pipeline with ONNX model and auto-detecting tokenizer.
    
    Args:
        task: Pipeline task (e.g., "feature-extraction")
        model_path: Path to ONNX model directory
        tokenizer_name: HuggingFace tokenizer identifier
        **kwargs: Additional pipeline arguments
    
    Returns:
        Pipeline configured for ONNX inference
    """
    # Load components
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    model = ORTModelForFeatureExtraction.from_pretrained(model_path)
    
    # Create auto-detecting ONNX tokenizer
    onnx_tokenizer = ONNXTokenizer(tokenizer=tokenizer, onnx_model=model)
    
    # Create enhanced pipeline
    return create_pipeline(
        task=task,
        model=model,
        data_processor=onnx_tokenizer,
        **kwargs
    )

# Usage example
convenience_pipe = create_onnx_pipeline(
    task="feature-extraction",
    model_path=model_dir,
    tokenizer_name="prajjwal1/bert-tiny"
)

results = convenience_pipe("Convenience function test")
print(f"Convenience pipeline shape: {np.array(results).shape}")

## Method 4: Manual Shape Override (When Needed)

In [None]:
# Sometimes you want specific shapes regardless of the model
custom_tokenizer = ONNXTokenizer(
    tokenizer=base_tokenizer,
    onnx_model=onnx_model,  # Model provided for reference
    fixed_batch_size=4,     # But override with specific values
    fixed_sequence_length=32
)

print(f"Custom shapes: {custom_tokenizer.fixed_batch_size}x{custom_tokenizer.fixed_sequence_length}")

# Create pipeline with custom tokenizer
custom_pipe = pipeline(
    "feature-extraction",
    model=onnx_model,
    data_processor=custom_tokenizer
)

# Test with custom shapes
results = custom_pipe(["Test 1", "Test 2", "Test 3", "Test 4"])
print(f"Custom pipeline shape: {np.array(results).shape}")
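
The example above passes exactly four texts to match `fixed_batch_size=4`. When the number of inputs is not a multiple of the batch size, one common approach is to split the inputs into full-size batches, pad the last batch with filler items, and remember how many rows are real so the padded rows can be dropped from the output. The sketch below illustrates that approach. It is not taken from `ONNXTokenizer`, and `fixed_batches` is a hypothetical name.

```python
from typing import Iterator, List, Tuple


def fixed_batches(
    texts: List[str], batch_size: int, filler: str = ""
) -> Iterator[Tuple[List[str], int]]:
    """Yield (batch, real_count) pairs where every batch has exactly batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        real_count = len(chunk)
        # Pad the final, shorter chunk so the model always sees batch_size rows.
        yield chunk + [filler] * (batch_size - real_count), real_count


for batch, real_count in fixed_batches(["Test 1", "Test 2", "Test 3", "Test 4", "Test 5"], 4):
    print(batch, "real rows:", real_count)
```

After inference, only the first `real_count` rows of each batch's output would be kept.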

## Comparison: Before vs After

In [None]:
print("=" * 60)
print("BEFORE: Standard Pipeline (What doesn't work)")
print("=" * 60)
print("""
from transformers import pipeline

# ❌ This would fail:
pipe = pipeline('feature-extraction', model=model, data_processor=custom_tokenizer)
# ERROR: pipeline() got an unexpected keyword argument 'data_processor'

# ❌ You'd have to remember different parameter names:
text_pipe = pipeline('text-classification', tokenizer=tokenizer, model=model)
vision_pipe = pipeline('image-classification', image_processor=processor, model=model)
audio_pipe = pipeline('audio-classification', feature_extractor=extractor, model=model)
""")

print("=" * 60)
print("AFTER: Enhanced Pipeline (What now works)")
print("=" * 60)
print("""
from enhanced_pipeline import pipeline

# ✅ This works perfectly:
pipe = pipeline('feature-extraction', model=model, data_processor=custom_tokenizer)

# ✅ Same parameter name for all modalities:
text_pipe = pipeline('text-classification', model=model, data_processor=tokenizer)
vision_pipe = pipeline('image-classification', model=model, data_processor=processor)
audio_pipe = pipeline('audio-classification', model=model, data_processor=extractor)

# ✅ Auto-detection makes it even easier:
onnx_tokenizer = ONNXTokenizer(base_tokenizer, onnx_model=model)  # Auto-detects shapes
pipe = pipeline('feature-extraction', model=model, data_processor=onnx_tokenizer)
""")

## Performance Comparison

In [None]:
import time

# Test data
test_data = ["This is a test sentence for performance comparison."] * 10

# Time the enhanced pipeline (perf_counter is monotonic and better suited to short intervals than time.time)
start_time = time.perf_counter()
for _ in range(10):
    results = pipe(test_data)
enhanced_time = time.perf_counter() - start_time

print(f"Enhanced pipeline (10 batches of 10): {enhanced_time:.3f}s")
print(f"Average per batch: {enhanced_time/10:.3f}s")
print(f"Results shape: {np.array(results).shape}")

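# Demonstrate shape consistency
single_result = pipe("Single input")
batch_result = pipe(["Input 1", "Input 2"])
single_shape = np.array(single_result).shape
batch_shape = np.array(batch_result).shape
print("\nShape consistency:")
print(f"Single input shape: {single_shape}")
print(f"Batch input shape: {batch_shape}")
# Compare only the trailing (sequence, hidden) dimensions; leading dimensions vary with input count
if single_shape[-2:] == batch_shape[-2:]:
    print("✅ Per-item shapes are consistent thanks to ONNX tokenizer!")
else:
    print("⚠️ Per-item shapes differ between single and batch inputs")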
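
A single timed loop includes one-off costs such as session warm-up, and it reports only a total. For more stable numbers, a small helper can run a few untimed warm-up calls and then record each call separately, so the median and minimum can be reported. The sketch below uses only the standard library. `time_calls` is a hypothetical helper, and the `lambda` stands in for a call such as `pipe(test_data)`.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 10, warmup: int = 2) -> List[float]:
    """Return per-call durations in seconds, measured after untimed warm-up calls."""
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; in the notebook this would be `lambda: pipe(test_data)`.
durations = time_calls(lambda: sum(range(10_000)), repeats=5)
print(f"median: {statistics.median(durations):.6f}s, min: {min(durations):.6f}s")
```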

## Summary

### ✅ Key Benefits:

1. **Auto-Detection**: No need to manually specify `batch_size` and `sequence_length`
2. **Generic Interface**: Same `data_processor` parameter for all modalities
3. **Shape Consistency**: Pads or truncates variable-size inputs to the model's fixed ONNX input shape
4. **Drop-in Replacement**: `enhanced_pipeline.pipeline` is called the same way as the standard `transformers.pipeline`
5. **Flexible**: Supports manual override of `fixed_batch_size` and `fixed_sequence_length` when needed

### 🎯 Usage Patterns:

```python
# Most common usage (recommended)
onnx_tokenizer = ONNXTokenizer(base_tokenizer, onnx_model=model)
pipe = pipeline("feature-extraction", model=model, data_processor=onnx_tokenizer)

# Full-featured usage
pipe = create_pipeline("feature-extraction", model=model, data_processor=onnx_tokenizer, device="cpu")

# Manual override when needed
custom_tokenizer = ONNXTokenizer(base_tokenizer, onnx_model=model, fixed_batch_size=8, fixed_sequence_length=64)
```

With this integration, an ONNX model is used through the same pipeline call as a regular Hugging Face model, and `ONNXTokenizer` takes care of fitting inputs to the model's fixed shapes.