# HuggingFace Hub Integration Demo

This notebook demonstrates the Smart Hybrid Approach for HuggingFace Hub model integration:
1. Export BERT-tiny with HTP (Hub metadata is automatically stored)
2. Load the ONNX model with Optimum (config loaded from Hub metadata)
3. Run inference without needing local config files

In [None]:
import sys
from pathlib import Path

import onnx
import torch

# Add the project root to sys.path so that `modelexport` can be imported.
# This assumes the notebook runs from a directory three levels below the root.
project_root = Path.cwd().parents[2]
sys.path.insert(0, str(project_root))

print(f"Project root: {project_root}")

## Step 1: Check the Exported ONNX Model

We already exported bert-tiny with the CLI. Let's check the Hub metadata:

In [None]:
# Path to the exported model
onnx_path = project_root / "temp" / "hub-integration-test" / "bert-tiny.onnx"

# Load ONNX model and check metadata
onnx_model = onnx.load(str(onnx_path))

print("📦 ONNX Model Metadata:")
print("=" * 50)

hub_metadata = {}
for prop in onnx_model.metadata_props:
    if prop.key.startswith('hf_'):
        hub_metadata[prop.key] = prop.value
        print(f"{prop.key}: {prop.value}")

print("\n✅ Hub model detected:", hub_metadata.get('hf_model_type') == 'hub')
print(f"📍 Model ID: {hub_metadata.get('hf_hub_id')}")
print(f"🔖 Revision: {hub_metadata.get('hf_hub_revision')}")
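
The filtering step in the cell above can be pulled into a small helper. The sketch below is an illustration rather than part of `modelexport`: `extract_hub_metadata` is a hypothetical name, the `SimpleNamespace` objects stand in for real `metadata_props` entries (which also expose `key` and `value`), and the model ID and revision are placeholder values.

```python
from types import SimpleNamespace


def extract_hub_metadata(props, prefix="hf_"):
    """Collect key/value pairs whose key starts with `prefix`."""
    return {p.key: p.value for p in props if p.key.startswith(prefix)}


# Stand-ins for onnx_model.metadata_props entries (placeholder values only).
example_props = [
    SimpleNamespace(key="hf_hub_id", value="example-org/bert-tiny"),
    SimpleNamespace(key="hf_hub_revision", value="main"),
    SimpleNamespace(key="hf_model_type", value="hub"),
    SimpleNamespace(key="producer", value="not-hub-related"),
]

hub_info = extract_hub_metadata(example_props)
print(hub_info)
```

With a real model, the same helper could be called as `extract_hub_metadata(onnx_model.metadata_props)`.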

## Step 2: Load Components from ONNX

Use our new utility to load config and preprocessor from the ONNX metadata:

In [None]:
from modelexport.utils import load_hf_components_from_onnx

print("🔧 Loading HuggingFace components from ONNX metadata...")

try:
    config, preprocessor = load_hf_components_from_onnx(str(onnx_path))

    print("\n✅ Successfully loaded from Hub!")
    print(f"Config type: {config.model_type}")
    print(f"Hidden size: {config.hidden_size}")
    print(f"Num layers: {config.num_hidden_layers}")

    if preprocessor:
        print(f"\nPreprocessor type: {type(preprocessor).__name__}")
        # Not every preprocessor exposes vocab_size, so fall back to 'N/A'.
        print(f"Vocab size: {getattr(preprocessor, 'vocab_size', 'N/A')}")
except Exception as e:
    print(f"❌ Error loading from Hub: {e}")
    print("This might happen if you're offline or the Hub is unavailable.")
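
To make the idea behind this step concrete, here is a minimal sketch of how Hub metadata could be turned into a config and a preprocessor. It is not the actual implementation of `load_hf_components_from_onnx`; `load_components` and the stub loader are hypothetical, and the metadata values are placeholders. The default loaders are `AutoConfig.from_pretrained` and `AutoTokenizer.from_pretrained` from `transformers`, which accept a `revision` argument.

```python
def load_components(metadata, load_config=None, load_preprocessor=None):
    """Resolve config and preprocessor from Hub metadata (illustrative sketch)."""
    model_id = metadata.get("hf_hub_id")
    if not model_id:
        raise ValueError("ONNX metadata has no 'hf_hub_id' entry")
    revision = metadata.get("hf_hub_revision") or None

    if load_config is None or load_preprocessor is None:
        # Imported lazily so the sketch can be exercised with stub loaders.
        from transformers import AutoConfig, AutoTokenizer

        load_config = load_config or AutoConfig.from_pretrained
        load_preprocessor = load_preprocessor or AutoTokenizer.from_pretrained

    config = load_config(model_id, revision=revision)
    try:
        preprocessor = load_preprocessor(model_id, revision=revision)
    except Exception:
        # The notebook treats the preprocessor as optional, so fall back to None.
        preprocessor = None
    return config, preprocessor


# Stub loader so the example runs without network access.
calls = []


def fake_loader(model_id, revision=None):
    calls.append((model_id, revision))
    return f"{model_id}@{revision}"


demo_config, demo_preprocessor = load_components(
    {"hf_hub_id": "example-org/bert-tiny", "hf_hub_revision": "abc123"},
    load_config=fake_loader,
    load_preprocessor=fake_loader,
)
print(demo_config, demo_preprocessor)
```

Passing the stored revision through means the config and tokenizer are resolved against the same Hub snapshot that was recorded at export time, rather than whatever the default branch currently points to.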

## Step 3: Setup for Optimum Inference

Since Optimum expects files in a directory, we'll create a temporary setup:

In [None]:
import tempfile
import shutil
from optimum.onnxruntime import ORTModelForFeatureExtraction

# Create temporary directory with required files.
# The directory is deleted when the `with` block exits, so the model
# must be loaded inside the block while the files still exist.
with tempfile.TemporaryDirectory() as temp_dir:
    temp_path = Path(temp_dir)

    # Copy ONNX model
    shutil.copy(onnx_path, temp_path / "model.onnx")

    # Save config (loaded from the Hub using the ONNX metadata)
    if 'config' in locals():
        config.save_pretrained(temp_path)
        print("✅ Config saved to temp directory")

    # Save tokenizer if available
    if 'preprocessor' in locals() and preprocessor:
        preprocessor.save_pretrained(temp_path)
        print("✅ Tokenizer saved to temp directory")

    # Load with Optimum
    print("\n🚀 Loading with Optimum ORTModel...")
    ort_model = ORTModelForFeatureExtraction.from_pretrained(temp_path)

    print(f"✅ Model loaded: {type(ort_model).__name__}")
    print(f"Model config: {ort_model.config.model_type}")
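
The staging pattern above can be wrapped in a context manager so that the temporary directory's lifetime is explicit. This is a sketch under assumptions: `staged_model_dir` is a hypothetical helper, not part of `modelexport` or Optimum, and the demo uses a placeholder file instead of a real ONNX model.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged_model_dir(onnx_file, savables=()):
    """Yield a temporary directory holding model.onnx plus any saved components."""
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp)
        shutil.copy(onnx_file, staged / "model.onnx")
        for obj in savables:
            # Config and tokenizer objects both provide save_pretrained().
            if obj is not None:
                obj.save_pretrained(staged)
        yield staged
    # The directory and its contents are removed once the block exits.


# Demo with a placeholder file standing in for the exported model.
with tempfile.TemporaryDirectory() as src_dir:
    dummy = Path(src_dir) / "bert-tiny.onnx"
    dummy.write_bytes(b"placeholder")
    with staged_model_dir(dummy) as staged:
        staged_files = sorted(p.name for p in staged.iterdir())
        staged_path = staged

print(staged_files, staged_path.exists())
```

In the notebook, the call to `ORTModelForFeatureExtraction.from_pretrained(staged)` would go inside the `with staged_model_dir(...)` block, with `savables=(config, preprocessor)`.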

## Step 4: Run Inference

Test the model with some sample text:

In [None]:
# Test texts
test_texts = [
    "The Smart Hybrid Approach works perfectly!",
    "Hub models store metadata, local models copy configs.",
    "This is a test of the BERT-tiny model with Hub integration."
]

print("📝 Test texts:")
for i, text in enumerate(test_texts, 1):
    print(f"{i}. {text}")

# Tokenize if we have a tokenizer
if 'preprocessor' in locals() and preprocessor:
    print("\n🔤 Tokenizing...")
    inputs = preprocessor(
        test_texts,
        padding=True,
        truncation=True,
        return_tensors="pt"
    )

    print(f"Input shape: {inputs['input_ids'].shape}")

    # Run inference
    print("\n🎯 Running inference...")
    with torch.no_grad():
        outputs = ort_model(**inputs)

    if hasattr(outputs, 'last_hidden_state'):
        print(f"✅ Output shape: {outputs.last_hidden_state.shape}")
        print(f"Output type: {type(outputs.last_hidden_state)}")

        # Show first few values
        print("\nFirst output values (sample):")
        print(outputs.last_hidden_state[0, 0, :5].numpy())
else:
    print("⚠️ No tokenizer available. Skipping inference test.")

## Step 5: Alternative - Use Our Optimum Loader Utility

We also provide a convenient utility function:

In [None]:
from modelexport.utils import load_optimum_model

print("🔧 Using load_optimum_model utility...")

try:
    model, tokenizer = load_optimum_model(str(onnx_path))

    print("✅ Model and tokenizer loaded!")
    print(f"Model type: {type(model).__name__}")
    print(f"Tokenizer type: {type(tokenizer).__name__ if tokenizer else 'None'}")

    if tokenizer:
        # Quick test
        test_input = tokenizer("Hello world!", return_tensors="pt")
        with torch.no_grad():
            test_output = model(**test_input)
        print("\n✅ Test inference successful!")
        hidden = getattr(test_output, 'last_hidden_state', None)
        print(f"Output shape: {hidden.shape if hidden is not None else 'N/A'}")

except Exception as e:
    print(f"❌ Error: {e}")
    print("This can happen when running without network access.")

## Summary

### What We Demonstrated:

1. **Hub Metadata Storage**: The ONNX model contains HuggingFace Hub metadata
2. **Automatic Config Loading**: Config and tokenizer are loaded from the Hub using that metadata
3. **No Local Files Needed**: Hub models don't require config.json to be stored alongside the ONNX file
4. **Optimum Compatibility**: The exported model loads with Optimum's ORTModel classes

### Smart Hybrid Approach Benefits:

- **Hub Models**: Lightweight - only metadata stored (model ID, revision, etc.)
- **Local Models**: Full support - configs copied alongside ONNX
- **Backward Compatible**: Works with existing models
- **Network Optional**: Can work offline when the Hub files are already cached

### Key Implementation Files:

- `modelexport/utils/hub_utils.py`: Hub detection and metadata injection
- `modelexport/utils/optimum_loader.py`: Optimum integration utilities
- `modelexport/strategies/htp/htp_exporter.py`: Integration in HTP export
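
The Hub/local split described above comes down to one decision at load time. The following sketch shows one way that decision could look. It is an assumption-laden illustration, not the logic in `hub_utils.py`: `resolve_config_source` is a hypothetical name, the model ID is a placeholder, and treating every model that is not marked `hub` as local is a guess.

```python
import tempfile
from pathlib import Path


def resolve_config_source(onnx_file, metadata):
    """Return ("hub", model_id, revision) or ("local", directory, None)."""
    if metadata.get("hf_model_type") == "hub" and metadata.get("hf_hub_id"):
        return ("hub", metadata["hf_hub_id"], metadata.get("hf_hub_revision"))

    # Assumed fallback: local models keep config.json next to the ONNX file.
    model_dir = Path(onnx_file).parent
    if (model_dir / "config.json").is_file():
        return ("local", str(model_dir), None)

    raise FileNotFoundError(
        f"No Hub metadata and no config.json next to {onnx_file}"
    )


# Hub case: only metadata is needed (placeholder model ID).
hub_source = resolve_config_source(
    "bert-tiny.onnx",
    {
        "hf_model_type": "hub",
        "hf_hub_id": "example-org/bert-tiny",
        "hf_hub_revision": "main",
    },
)

# Local case: a config.json sits beside the model file.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.json").write_text("{}")
    local_kind = resolve_config_source(Path(d) / "model.onnx", {})[0]

print(hub_source, local_kind)
```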