# Production Inference Deployment with PyTorch

This tutorial walks through deploying PyTorch models for production inference, from preparing a model for inference to serving it with TorchServe.

## Learning Objectives
- 🎯 **Prepare models for inference**: Evaluation mode and optimization techniques
- 🚀 **Master TorchScript**: Convert Python models to optimized, production-ready format
- 🔧 **Deploy with C++**: Use libtorch for high-performance inference
- 📦 **Implement TorchServe**: Scalable model serving with built-in APIs

## Tutorial Structure
1. **Preparing the Model for Inference**: Evaluation Mode
2. **TorchScript**: `jit.script` and `jit.trace` for production optimization
3. **Deploying with C++**: libtorch integration examples
4. **TorchServe**: Complete model serving solution

## Use Case: Australian Tourism Sentiment Analysis
We will build and deploy a multilingual sentiment analysis model for Australian tourism reviews (English + Vietnamese), demonstrating real-world production deployment scenarios.

**Sample Use Cases:**
- Hotel booking platforms analyzing customer reviews
- Tourism boards monitoring social media sentiment
- Travel agencies optimizing destination recommendations

## TensorFlow vs PyTorch Deployment Comparison

| Aspect | TensorFlow | PyTorch |
|--------|------------|----------|
| **Model Format** | SavedModel, TFLite | TorchScript, ONNX |
| **Serving** | TensorFlow Serving | TorchServe |
| **C++ Deployment** | TensorFlow C++ API | libtorch |
| **Mobile** | TensorFlow Lite | PyTorch Mobile |
| **Optimization** | TensorRT, XLA | TorchScript JIT |

---

## Environment Setup and Runtime Detection

The cell below detects the runtime (Google Colab, Kaggle, or local) and installs the packages the notebook needs:

In [None]:
# Environment Detection and Setup
import sys
import subprocess
import os
import time
import platform

# Detect the runtime environment
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules or "kaggle" in os.environ.get('KAGGLE_URL_BASE', '')
IS_LOCAL = not (IS_COLAB or IS_KAGGLE)

print(f"Environment detected:")
print(f"  - Local: {IS_LOCAL}")
print(f"  - Google Colab: {IS_COLAB}")
print(f"  - Kaggle: {IS_KAGGLE}")

# Platform-specific system setup
if IS_COLAB:
    print("\nSetting up Google Colab environment...")
    !apt update -qq
    !apt install -y -qq software-properties-common
elif IS_KAGGLE:
    print("\nSetting up Kaggle environment...")
    # Kaggle usually has most packages pre-installed
else:
    print("\nSetting up local environment...")

# Install required packages for this notebook
required_packages = [
    "torch",
    "torchvision", 
    "transformers",
    "datasets",
    "tokenizers",
    "pandas",
    "matplotlib",
    "seaborn",
    "tensorboard"
]

print("\nInstalling required packages...")
for package in required_packages:
    try:
        if IS_COLAB or IS_KAGGLE:
            !pip install -q {package}
        else:
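            # check=True raises CalledProcessError on a failed install,
            # so the except branch below reports it instead of printing ✓
            subprocess.run([sys.executable, "-m", "pip", "install", "-q", package], 
                          capture_output=True, check=True)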
        print(f"✓ {package}")
    except Exception as e:
        print(f"⚠️ {package}: {str(e)[:50]}...")

print("\n🔥 Production deployment environment ready!")

In [None]:
# Core PyTorch imports for production deployment
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.jit as jit
from torch.utils.data import DataLoader, Dataset
from torch.utils.tensorboard import SummaryWriter

# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import json
import tempfile
import warnings
from datetime import datetime
from pathlib import Path

warnings.filterwarnings('ignore')

# Simplified device detection
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"✅ PyTorch {torch.__version__} ready on {device}!")
DEVICE = device

## Sample Model: Australian Tourism Sentiment Analyzer

Let's create a production-ready sentiment analysis model for Australian tourism reviews with multilingual support (English + Vietnamese):

In [None]:
class AustralianTourismSentimentAnalyzer(nn.Module):
    """
    Production-ready sentiment analysis model for Australian tourism reviews.
    
    Supports multilingual analysis (English + Vietnamese) for:
    - Hotel and restaurant reviews
    - Tourist attraction feedback
    - Travel experience sentiment
    """
    
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, num_classes=3):
        super(AustralianTourismSentimentAnalyzer, self).__init__()
        
        # Store hyperparameters for TorchScript compatibility
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.hidden_dim = hidden_dim
        self.num_classes = num_classes
        
        # Model layers
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)
        
        # Initialize weights
        self._init_weights()
    
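    def _init_weights(self):
        """Initialize model weights."""
        for name, param in self.named_parameters():
            if 'weight' in name and len(param.shape) > 1:
                nn.init.xavier_uniform_(param)
            elif 'bias' in name:
                nn.init.constant_(param, 0)
        # Xavier init also overwrites the padding row of the embedding; reset it to zeros
        with torch.no_grad():
            self.embedding.weight[self.embedding.padding_idx].zero_()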
    
    def forward(self, input_ids):
        """
        Forward pass for sentiment analysis.
        
        Args:
            input_ids (torch.Tensor): Tokenized input sequences [batch_size, seq_len]
            
        Returns:
            torch.Tensor: Sentiment logits [batch_size, num_classes]
        """
        # Embedding lookup
        embedded = self.embedding(input_ids)  # [batch_size, seq_len, embed_dim]
        
        # LSTM processing
        lstm_out, (hidden, cell) = self.lstm(embedded)
        
        # Use the last hidden state for classification
        last_hidden = hidden[-1]  # [batch_size, hidden_dim]
        
        # Classification
        logits = self.classifier(last_hidden)
        
        return logits

# Simple tokenizer for demonstration
class SimpleTokenizer:
    """Simple tokenizer for Australian tourism text."""
    
    def __init__(self):
        self.word_to_idx = {'<pad>': 0, '<unk>': 1}
        self.idx_to_word = {0: '<pad>', 1: '<unk>'}
        self.next_idx = 2
    
    def fit(self, texts):
        """Build vocabulary from texts."""
        for text in texts:
            words = text.lower().split()
            for word in words:
                if word not in self.word_to_idx:
                    self.word_to_idx[word] = self.next_idx
                    self.idx_to_word[self.next_idx] = word
                    self.next_idx += 1
    
    def encode(self, text, max_length=32):
        """Convert text to token indices."""
        words = text.lower().split()[:max_length]
        indices = [self.word_to_idx.get(word, 1) for word in words]
        
        # Pad to max_length
        if len(indices) < max_length:
            indices.extend([0] * (max_length - len(indices)))
        
        return indices

print("✅ Model and tokenizer classes defined!")

## Sample Data and Quick Training

Let's create sample Australian tourism review data and quickly train our model:

In [None]:
# Sample Australian tourism reviews with multilingual support
australian_tourism_data = {
    'reviews': [
        # Positive reviews (label: 2)
        "The Sydney Opera House tour was absolutely breathtaking!",
        "Nhà hát Opera Sydney thật tuyệt vời!",  # Vietnamese
        "Melbourne's coffee culture exceeded all expectations.",
        "Văn hóa cà phê Melbourne vượt quá mong đợi.",  # Vietnamese
        "Bondi Beach is perfect for surfing.",
        "Bãi biển Bondi hoàn hảo cho lướt sóng.",  # Vietnamese
        
        # Neutral reviews (label: 1)
        "The hotel in Brisbane was decent.",
        "Khách sạn ở Brisbane tạm được.",  # Vietnamese
        "Adelaide zoo has some interesting animals.",
        "Sở thú Adelaide có một số động vật thú vị.",  # Vietnamese
        
        # Negative reviews (label: 0)
        "The Sydney harbor cruise was overpriced.",
        "Du thuyền cảng Sydney đắt quá.",  # Vietnamese
        "Melbourne weather ruined our vacation.",
        "Thời tiết Melbourne làm hỏng kỳ nghỉ.",  # Vietnamese
    ],
    'labels': [2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0]
}

# Create and train model
tokenizer = SimpleTokenizer()
tokenizer.fit(australian_tourism_data['reviews'])

model = AustralianTourismSentimentAnalyzer(
    vocab_size=len(tokenizer.word_to_idx),
    embed_dim=64,
    hidden_dim=128,
    num_classes=3
).to(device)

# Prepare training data
encoded_reviews = []
for review in australian_tourism_data['reviews']:
    encoded = tokenizer.encode(review, max_length=32)
    encoded_reviews.append(encoded)

input_ids = torch.tensor(encoded_reviews, dtype=torch.long).to(device)
labels_tensor = torch.tensor(australian_tourism_data['labels'], dtype=torch.long).to(device)

# Quick training
model.train()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

print("🏋️ Quick training for demonstration...")
for epoch in range(10):
    optimizer.zero_grad()
    logits = model(input_ids)
    loss = criterion(logits, labels_tensor)
    loss.backward()
    optimizer.step()
    
    if epoch % 3 == 0:
        with torch.no_grad():
            predictions = torch.argmax(logits, dim=-1)
            accuracy = (predictions == labels_tensor).float().mean()
            print(f"   Epoch {epoch}: Loss = {loss.item():.4f}, Accuracy = {accuracy.item():.4f}")

print("✅ Model training completed!")
print(f"📊 Model parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"📱 Model device: {next(model.parameters()).device}")

# 1. Preparing the Model for Inference: Evaluation Mode

The first and most crucial step in production deployment is properly preparing your model for inference. This involves setting the model to evaluation mode and understanding how it differs from training mode.

## Key Concepts

### `model.eval()` vs `model.train()`
- **`model.eval()`**: Sets the model to evaluation mode
- **`model.train()`**: Sets the model to training mode (default)

### What Changes in Evaluation Mode
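1. **Dropout layers** are turned off, so the same input gives the same output
2. **Batch Normalization** uses its stored running statistics instead of per-batch statistics
3. **Autograd is unaffected**: `model.eval()` does not disable gradient tracking, so pair it with `torch.no_grad()` to skip graph construction
4. **Memory and compute savings** come from `torch.no_grad()`, not from `model.eval()` itself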
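
The tutorial's LSTM classifier has no dropout or batch-norm layers, so switching modes would not visibly change its outputs. The sketch below therefore uses a small hypothetical `nn.Sequential` with a dropout layer (not part of the tutorial model) to make the mode switch observable, and to show that gradient tracking is controlled separately by `torch.no_grad()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy network; the dropout layer makes the mode switch visible
net = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))
x = torch.randn(4, 8)

net.train()
train_a, train_b = net(x), net(x)   # a new random dropout mask on every call

net.eval()
eval_a, eval_b = net(x), net(x)     # dropout is a no-op, so outputs repeat

print("train outputs identical:", torch.equal(train_a, train_b))
print("eval outputs identical: ", torch.equal(eval_a, eval_b))

# eval() does not touch autograd: the output is still attached to a graph
print("eval output requires_grad:   ", eval_a.requires_grad)

with torch.no_grad():
    no_grad_out = net(x)
print("no_grad output requires_grad:", no_grad_out.requires_grad)
```

In inference code the two calls are normally used together: `eval()` for deterministic layer behavior, `torch.no_grad()` to avoid building the autograd graph.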

### TensorFlow Comparison
```python
# TensorFlow/Keras (mode passed as an argument on each call)
model(inputs, training=False)  # Inference mode
model(inputs, training=True)   # Training mode

# PyTorch (mode stored on the module until changed)
model.eval()     # Set to evaluation mode
model.train()    # Set to training mode
```

In [None]:
# Demonstration: Production Inference in Evaluation Mode
print("🎯 Demonstrating Production Inference in Evaluation Mode\n")

def production_inference(model, texts, tokenizer, device):
    """
    Production-ready inference function with all best practices.
    
    CRITICAL STEPS:
    1. Set model to evaluation mode
    2. Disable autograd with torch.no_grad()
    3. Ensure proper device placement
    4. Handle batch dimensions correctly
    """
    # STEP 1: Set model to evaluation mode
    model.eval()
    
    # STEP 2: Prepare input data
    encoded_texts = []
    for text in texts:
        encoded = tokenizer.encode(text, max_length=32)
        encoded_texts.append(encoded)
    
    # Convert to tensor and move to device
    input_ids = torch.tensor(encoded_texts, dtype=torch.long).to(device)
    
    # STEP 3: Disable autograd for inference
    with torch.no_grad():
        # STEP 4: Forward pass
        logits = model(input_ids)
        
        # STEP 5: Convert to probabilities and predictions
        probabilities = F.softmax(logits, dim=-1)
        predictions = torch.argmax(logits, dim=-1)
        confidence = torch.max(probabilities, dim=-1)[0]
    
    # STEP 6: Format results
    sentiment_labels = ['negative', 'neutral', 'positive']
    results = []
    
    for i, text in enumerate(texts):
        result = {
            'text': text,
            'sentiment': sentiment_labels[predictions[i].item()],
            'confidence': confidence[i].item(),
            'probabilities': {
                'negative': probabilities[i][0].item(),
                'neutral': probabilities[i][1].item(),
                'positive': probabilities[i][2].item()
            }
        }
        results.append(result)
    
    return results

# Test production inference function
print("🧪 Testing Production Inference Function\n")

test_reviews = [
    "The Sydney Opera House tour was absolutely amazing!",
    "Cà phê ở Melbourne quá đắt",  # Vietnamese: Coffee in Melbourne is too expensive
    "The hotel was clean but nothing extraordinary",
    "Perth beaches are perfect for relaxing"
]

results = production_inference(model, test_reviews, tokenizer, device)

for i, result in enumerate(results):
    print(f"Review {i+1}:")
    print(f"  Text: '{result['text'][:40]}...'")
    print(f"  Sentiment: {result['sentiment'].upper()} (confidence: {result['confidence']:.3f})")
    print(f"  Probabilities: Neg={result['probabilities']['negative']:.3f}, "
          f"Neu={result['probabilities']['neutral']:.3f}, "
          f"Pos={result['probabilities']['positive']:.3f}")
    print()

print("✅ Production inference function working correctly!")

print(f"\n" + "="*60)
print("🚀 PRODUCTION INFERENCE BEST PRACTICES")
print("="*60)
print("1. ALWAYS call model.eval() before inference")
print("2. Use torch.no_grad() to disable autograd and save memory")
print("3. Ensure consistent input preprocessing")
print("4. Handle batch dimensions properly (even for single samples)")
print("5. Move tensors to the same device as the model")
print("="*60)

# 2. TorchScript: Production-Ready Model Optimization

**TorchScript** is a statically typed subset of Python for representing PyTorch models. It's designed for high-performance inference and can run without a Python interpreter.

## Key Benefits
- 🚀 **High Performance**: The JIT can optimize the model graph; the actual speedup depends on the model and hardware
- 🏭 **Production Ready**: Runs from C++ via libtorch without a Python interpreter
- 📦 **Single File**: Model code + weights in one serialized file
- 🔧 **Optimization**: Graph-level passes such as operator fusion
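
The "single file" property can be checked with a small round trip. This is a minimal sketch that uses a hypothetical stand-in model (`tiny`) and an in-memory buffer instead of a file on disk; both are assumptions made to keep the example self-contained.

```python
import io
import torch
import torch.nn as nn

# Hypothetical stand-in model; any traceable nn.Module works the same way
tiny = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2)).eval()
example = torch.randn(1, 4)

traced = torch.jit.trace(tiny, example)

# Serialize code and weights into one archive (here an in-memory buffer)
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)

# Loading needs only the archive, not the Python definition of `tiny`
restored = torch.jit.load(buffer)

with torch.no_grad():
    same = torch.allclose(tiny(example), restored(example))

print("restored output matches eager output:", same)
print("archive size in bytes:", len(buffer.getvalue()))
```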

## TorchScript Conversion Methods
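1. **`torch.jit.script`**: Compiles the Python source directly, so data-dependent control flow (`if`, loops) is preserved
2. **`torch.jit.trace`**: Runs the model once on an example input and records the executed operations; works with most Python code, but only the branch taken during tracing is captured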
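
The practical difference between the two methods shows up as soon as `forward` contains a data-dependent branch. The sketch below uses a hypothetical `DecisionGate` module (not part of the tutorial model) whose output depends on the sign of the input sum.

```python
import warnings
import torch
import torch.nn as nn

class DecisionGate(nn.Module):
    """Hypothetical module with a data-dependent branch."""
    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        else:
            return -x

gate = DecisionGate()
positive = torch.ones(3)
negative = -torch.ones(3)

# Tracing with a positive input records only the `x * 2` branch.
# PyTorch emits a TracerWarning about this, silenced here for brevity.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    traced_gate = torch.jit.trace(gate, positive)

# Scripting compiles the source, so both branches survive
scripted_gate = torch.jit.script(gate)

eager_out = gate(negative)            # takes the `-x` branch
traced_out = traced_gate(negative)    # replays the recorded `x * 2` branch
scripted_out = scripted_gate(negative)

print("eager   :", eager_out.tolist())
print("traced  :", traced_out.tolist())
print("scripted:", scripted_out.tolist())
```

The tutorial's sentiment model has no such branch in `forward`, which is why tracing is sufficient for it.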

### TensorFlow Comparison
```python
# TensorFlow (SavedModel)
tf.saved_model.save(model, 'model_dir')
loaded = tf.saved_model.load('model_dir')

# PyTorch (TorchScript)
scripted = torch.jit.script(model)
torch.jit.save(scripted, 'model.pt')
loaded = torch.jit.load('model.pt')
```

In [None]:
# Demonstration: TorchScript Conversion Methods
print("🚀 TorchScript Conversion Demonstration\n")

# Method: torch.jit.trace (Execution Tracing)
print("🔍 Method: torch.jit.trace (Execution Tracing)")
# Create example input for tracing
example_input = torch.randint(0, len(tokenizer.word_to_idx), (2, 32)).to(device)

try:
    model.eval()  # IMPORTANT: Set to eval mode before tracing
    traced_model = torch.jit.trace(model, example_input)
    print("   ✅ Trace conversion successful!")
    print(f"   📊 Traced model type: {type(traced_model)}")
    print(f"   🔧 Example input shape: {example_input.shape}")
except Exception as e:
    print(f"   ❌ Trace conversion failed: {e}")

# Compare original vs TorchScript performance
print("\n⚡ Performance Comparison: Original vs TorchScript")

# Test data
test_input = torch.randint(0, len(tokenizer.word_to_idx), (10, 32)).to(device)

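def time_forward(fn, inputs, iterations=100, warmup=10):
    """Time repeated forward passes after a short warm-up."""
    with torch.no_grad():
        for _ in range(warmup):       # first calls can include one-time JIT/CUDA setup cost
            _ = fn(inputs)
        if inputs.is_cuda:
            torch.cuda.synchronize()  # CUDA kernels run asynchronously
        start = time.perf_counter()
        for _ in range(iterations):
            _ = fn(inputs)
        if inputs.is_cuda:
            torch.cuda.synchronize()
    return time.perf_counter() - start

# Original model timing
model.eval()
original_time = time_forward(model, test_input)

# TorchScript model timing (if available)
if 'traced_model' in locals():
    torchscript_time = time_forward(traced_model, test_input)
    
    speedup = original_time / torchscript_time if torchscript_time > 0 else 1.0
    print(f"   Original model: {original_time:.4f} seconds")
    print(f"   TorchScript model: {torchscript_time:.4f} seconds")
    print(f"   🏃‍♂️ Speedup: {speedup:.2f}x (values below 1.0 mean TorchScript was slower)")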
else:
    print("   ⚠️ TorchScript model not available for comparison")

print("\n📁 Saving TorchScript Models for Production")

# Save TorchScript model
if 'traced_model' in locals():
    model_path = "australian_sentiment_torchscript.pt"
    torch.jit.save(traced_model, model_path)
    print(f"   💾 TorchScript model saved to: {model_path}")
    
    # Load and test saved model
    loaded_model = torch.jit.load(model_path)
    loaded_model.eval()
    
    # Test loaded model
    with torch.no_grad():
        original_output = traced_model(test_input)
        loaded_output = loaded_model(test_input)
        
        # Check if outputs are identical
        outputs_match = torch.allclose(original_output, loaded_output, atol=1e-6)
        print(f"   🔍 Saved/loaded outputs match: {outputs_match}")
        
    # Get file size (os is already imported at the top of the notebook)
    file_size = os.path.getsize(model_path) / 1024**2  # MB
    print(f"   📊 Model file size: {file_size:.2f} MB")

print("\n" + "="*60)
print("🏭 TORCHSCRIPT PRODUCTION BENEFITS")
print("="*60)
print("1. No Python dependency - can run in pure C++ environments")
print("2. Optimized execution through JIT compilation")
print("3. Single file deployment (model + weights)")
print("4. Cross-platform compatibility")
print("5. Memory and compute optimizations")
print("="*60)

# 3. Deploying with C++: High-Performance Inference

PyTorch models can be deployed in C++ environments using **libtorch**, removing Python dependencies for maximum performance and integration with existing C++ systems.

## Key Concepts
- 🔧 **libtorch**: Core C++ library that powers PyTorch
- ⚡ **Performance**: Direct C++ execution without Python overhead
- 🏗️ **Integration**: Easy integration with existing C++ applications
- 🚀 **Real-time**: Ideal for low-latency, high-throughput scenarios
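
A C++ process has no access to the Python `SimpleTokenizer`, so the vocabulary has to be shipped alongside the TorchScript file. The C++ class generated later in this section declares a `word_to_idx` map but fills the input with placeholder token IDs. One possible way to close that gap is sketched below: export the vocabulary as a tab-separated text file that a C++ loader could read line by line. The file name `vocab.tsv`, the file layout, and the helper names are assumptions, not part of the tutorial.

```python
import tempfile
from pathlib import Path

# Hypothetical vocabulary in the same shape as SimpleTokenizer.word_to_idx
word_to_idx = {"<pad>": 0, "<unk>": 1, "sydney": 2, "opera": 3, "tuyệt": 4}

def export_vocab(vocab, path):
    """Write one 'token<TAB>index' pair per line, UTF-8 encoded."""
    lines = [f"{token}\t{index}" for token, index in vocab.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")

def load_vocab(path):
    """Read the file back; mirrors what a C++ loader would have to do."""
    vocab = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        token, index = line.rsplit("\t", 1)
        vocab[token] = int(index)
    return vocab

with tempfile.TemporaryDirectory() as tmp:
    vocab_path = Path(tmp) / "vocab.tsv"   # assumed file name
    export_vocab(word_to_idx, vocab_path)
    round_trip = load_vocab(vocab_path)

print("round trip preserved vocabulary:", round_trip == word_to_idx)
```

The C++ side would also need to reproduce the same lowercasing, whitespace splitting, and padding to length 32 that `SimpleTokenizer.encode` applies.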

## Use Cases
- Real-time inference servers
- Embedded systems
- High-frequency trading systems
- Game engines
- Mobile applications

### TensorFlow Comparison
```cpp
// TensorFlow C++ API
#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"

// PyTorch C++ API (libtorch)
#include <torch/script.h>
#include <torch/torch.h>
```

In [None]:
# C++ Deployment Code Generation
print("🔧 Generating C++ Deployment Code for Australian Sentiment Model\n")

# Generate C++ inference code
cpp_code = '''#include <torch/script.h>
#include <torch/torch.h>
#include <iostream>
#include <memory>
#include <vector>
#include <string>
#include <map>

class AustralianSentimentInference {
private:
    torch::jit::script::Module model;
    std::map<std::string, int> word_to_idx;
    std::vector<std::string> sentiment_labels = {"negative", "neutral", "positive"};
    
public:
    // Constructor: Load TorchScript model
    AustralianSentimentInference(const std::string& model_path) {
        try {
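            // Load the TorchScript model onto the CPU so it matches the CPU input
            // tensor built below, even if the model was traced on a GPU
            model = torch::jit::load(model_path, torch::kCPU);
            model.eval();  // Set to evaluation mode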
            
            std::cout << "✅ Model loaded successfully from: " << model_path << std::endl;
        } catch (const c10::Error& e) {
            std::cerr << "❌ Error loading model: " << e.what() << std::endl;
            throw;
        }
    }
    
    // Main inference function
    std::map<std::string, float> predict_sentiment(const std::string& text) {
        try {
            // Create dummy input (in production, implement proper tokenizer)
            std::vector<int64_t> tokens(32, 0);  // Padded input
            tokens[0] = 2; tokens[1] = 3; tokens[2] = 4;  // Simple example
            
            // Convert to tensor
            auto options = torch::TensorOptions().dtype(torch::kLong);
            auto input_tensor = torch::from_blob(tokens.data(), {1, 32}, options).clone();
            
            // Run inference
            std::vector<torch::jit::IValue> inputs;
            inputs.push_back(input_tensor);
            
            torch::NoGradGuard no_grad;  // Disable gradients for inference
            at::Tensor output = model.forward(inputs).toTensor();
            
            // Apply softmax to get probabilities
            auto probabilities = torch::softmax(output, 1);
            auto prob_accessor = probabilities.accessor<float, 2>();
            
            // Get prediction
            auto prediction = torch::argmax(output, 1);
            int predicted_class = prediction.item<int>();
            
            // Create result map
            std::map<std::string, float> result;
            result["negative"] = prob_accessor[0][0];
            result["neutral"] = prob_accessor[0][1];
            result["positive"] = prob_accessor[0][2];
            result["confidence"] = prob_accessor[0][predicted_class];
            
            std::cout << "🎯 Prediction: " << sentiment_labels[predicted_class] 
                      << " (confidence: " << result["confidence"] << ")" << std::endl;
            
            return result;
            
        } catch (const c10::Error& e) {
            std::cerr << "❌ Inference error: " << e.what() << std::endl;
            throw;
        }
    }
};

// Example usage
int main() {
    try {
        // Initialize inference engine
        AustralianSentimentInference engine("australian_sentiment_torchscript.pt");
        
        // Test with Australian tourism reviews
        std::vector<std::string> test_reviews = {
            "The Sydney Opera House tour was absolutely amazing!",
            "Melbourne weather ruined our vacation",
            "The hotel was decent but nothing special"
        };
        
        std::cout << "\\n🧪 Testing C++ Inference Engine:\\n" << std::endl;
        
        for (const auto& review : test_reviews) {
            std::cout << "📝 Review: \\"" << review << "\\"" << std::endl;
            auto result = engine.predict_sentiment(review);
            
            std::cout << "   Probabilities:" << std::endl;
            std::cout << "     Negative: " << result["negative"] << std::endl;
            std::cout << "     Neutral:  " << result["neutral"] << std::endl;
            std::cout << "     Positive: " << result["positive"] << std::endl;
            std::cout << std::endl;
        }
        
        return 0;
        
    } catch (const std::exception& e) {
        std::cerr << "💥 Application error: " << e.what() << std::endl;
        return 1;
    }
}'''

# Save C++ code to file
cpp_filename = "australian_sentiment_inference.cpp"
with open(cpp_filename, 'w') as f:
    f.write(cpp_code)

print(f"💾 C++ inference code saved to: {cpp_filename}")
print(f"📊 Code length: {len(cpp_code)} characters")

# Generate CMakeLists.txt for building
cmake_content = '''cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(australian_sentiment_inference)

# Find required packages
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

# Add executable
add_executable(australian_sentiment_inference australian_sentiment_inference.cpp)
target_link_libraries(australian_sentiment_inference "${TORCH_LIBRARIES}")
set_property(TARGET australian_sentiment_inference PROPERTY CXX_STANDARD 17)

# Copy model file to build directory
configure_file(australian_sentiment_torchscript.pt australian_sentiment_torchscript.pt COPYONLY)'''

cmake_filename = "CMakeLists.txt"
with open(cmake_filename, 'w') as f:
    f.write(cmake_content)

print(f"💾 CMake configuration saved to: {cmake_filename}")

# Build instructions
build_instructions = '''📋 Building and Running the C++ Application:

1. Install libtorch:
   wget https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-latest.zip
   unzip libtorch-cxx11-abi-shared-with-deps-latest.zip

2. Build the application:
   mkdir build && cd build
   cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
   cmake --build . --config Release

3. Run the inference:
   ./australian_sentiment_inference'''

print(build_instructions)

print("\n" + "="*60)
print("🏗️ C++ DEPLOYMENT ADVANTAGES")
print("="*60)
print("1. Zero Python dependency - pure C++ execution")
print("2. Maximum performance for inference")
print("3. Easy integration with existing C++ systems")
print("4. Reduced memory footprint")
print("5. Cross-platform deployment")
print("6. Real-time capabilities for latency-critical applications")
print("="*60)

# 4. TorchServe: Scalable Model Serving Solution

**TorchServe** is PyTorch's dedicated model serving framework, designed to make it easy to deploy PyTorch models at scale with enterprise-grade features.

## Key Features
- 🌐 **RESTful APIs**: HTTP endpoints for inference and management
- 📈 **Worker Scaling**: Worker count per model can be changed at runtime through the Management API
- 📦 **Model Versioning**: Multiple model versions simultaneously
- 📊 **Metrics & Logging**: Built-in monitoring and custom metrics
- 🔒 **Security**: HTTPS support and authentication
- 🎯 **Batching**: Automatic request batching for throughput
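
As a rough illustration of the REST APIs, the sketch below builds (but does not send) an inference request and a worker-scaling request using only the standard library. It assumes TorchServe's default local ports (8080 for inference, 8081 for management) and the model name `australian_tourism_sentiment` used in this tutorial's deployment script. The `{"data": ...}` payload shape is also an assumption, since the request schema is defined by the model's handler.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: default local TorchServe ports and this tutorial's model name
INFERENCE_URL = "http://localhost:8080"
MANAGEMENT_URL = "http://localhost:8081"
MODEL_NAME = "australian_tourism_sentiment"

def build_prediction_request(text):
    """Build (but do not send) a POST request to the inference API."""
    # Payload shape is an assumption; the handler defines the real schema
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{INFERENCE_URL}/predictions/{MODEL_NAME}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_scale_request(min_worker, max_worker):
    """Build (but do not send) a PUT request that changes the worker count."""
    query = urllib.parse.urlencode(
        {"min_worker": min_worker, "max_worker": max_worker}
    )
    return urllib.request.Request(
        url=f"{MANAGEMENT_URL}/models/{MODEL_NAME}?{query}",
        method="PUT",
    )

predict_req = build_prediction_request("The Sydney Opera House tour was amazing!")
scale_req = build_scale_request(min_worker=2, max_worker=4)

print(predict_req.get_method(), predict_req.full_url)
print(scale_req.get_method(), scale_req.full_url)

# To send a request against a running server:
# with urllib.request.urlopen(predict_req, timeout=5) as response:
#     print(response.read().decode("utf-8"))
```

Building the query string with `urlencode` avoids the shell-quoting problems that `&` causes in hand-written `curl` commands.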

### TensorFlow Serving Comparison
| Feature | TensorFlow Serving | TorchServe |
|---------|-------------------|------------|
| **Model Format** | SavedModel | MAR files |
| **APIs** | gRPC, REST | REST, gRPC |
| **Batching** | Built-in | Built-in |
| **Versioning** | Yes | Yes |
| **Scaling** | Manual | Manual, via Management API (min/max workers) |

In [None]:
# TorchServe Deployment Demonstration
print("📦 TorchServe Deployment for Australian Tourism Sentiment Analysis\n")

# Create deployment scripts and configurations
deployment_script = '''#!/bin/bash
# TorchServe Deployment Script for Australian Sentiment Model

echo "🚀 Deploying Australian Tourism Sentiment Model with TorchServe"

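# Step 1: Create Model Archive (MAR)
echo "📦 Creating Model Archive..."
mkdir -p model_store
# torch-model-archiver requires --handler. handler.py is a placeholder for a
# custom handler that tokenizes the request text and formats the model output.
torch-model-archiver \\
    --model-name australian_tourism_sentiment \\
    --version 1.0 \\
    --serialized-file australian_sentiment_torchscript.pt \\
    --handler handler.py \\
    --export-path model_store \\
    --force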

echo "✅ Model archive created successfully!"

# Step 2: Start TorchServe
echo "🌐 Starting TorchServe..."
torchserve \\
    --start \\
    --model-store model_store \\
    --models australian_tourism_sentiment.mar

echo "🎉 TorchServe started successfully!"
echo "📡 Inference API: http://localhost:8080"
echo "🔧 Management API: http://localhost:8081"'''

with open('deploy_model.sh', 'w') as f:
    f.write(deployment_script)

print(f"💾 Deployment script saved to: deploy_model.sh")

# API usage examples
api_examples = '''🌐 TorchServe API Usage Examples:

1. Health Check:
   curl http://localhost:8080/ping

2. Model Information:
   curl http://localhost:8081/models/australian_tourism_sentiment

3. Inference Request:
   curl -X POST http://localhost:8080/predictions/australian_tourism_sentiment \\
        -H "Content-Type: application/json" \\
        -d '{"data": "The Sydney Opera House tour was amazing!"}'

4. Batch Inference:
   curl -X POST http://localhost:8080/predictions/australian_tourism_sentiment \\
        -H "Content-Type: application/json" \\
        -d '{"data": ["Sydney is beautiful", "Melbourne coffee is great"]}'

5. Model Metrics:
   curl http://localhost:8082/metrics

6. Scale Workers:
   curl -X PUT "http://localhost:8081/models/australian_tourism_sentiment?min_worker=2&max_worker=4"'''

print(api_examples)

print("\n" + "="*60)
print("🏭 TORCHSERVE ENTERPRISE FEATURES")
print("="*60)
print("1. RESTful APIs for inference and management")
print("2. Worker scaling through the Management API")
print("3. Multi-model serving with version management")
print("4. Built-in metrics and monitoring")
print("5. Request batching for improved throughput")
print("6. Multiple model versions served side by side (usable for A/B tests)")
print("7. GPU acceleration support")
print("8. Docker container ready")
print("="*60)

# Summary: Production Deployment Journey

Congratulations! You've completed a comprehensive tour of PyTorch production deployment strategies. Let's summarize what we've covered:

## 🎯 Deployment Methods Comparison

| Method | Use Case | Performance | Complexity | Best For |
|--------|----------|-------------|------------|----------|
| **Evaluation Mode** | Basic inference | Good | Low | Development, testing |
| **TorchScript** | Optimized inference | Better | Medium | Production without Python |
| **C++ Deployment** | High-performance | Best | High | Real-time, embedded systems |
| **TorchServe** | Scalable serving | Good | Medium | Web services, microservices |

## 🚀 Key Takeaways

### 1. Always Start with Evaluation Mode
- Call `model.eval()` before any inference
- Use `torch.no_grad()` to disable autograd
- Ensure consistent preprocessing

### 2. TorchScript for Production Optimization
- Use `torch.jit.trace()` for models without data-dependent control flow
- Use `torch.jit.script()` when `forward` contains branches or loops that depend on the input
- Single file deployment with optimized execution

### 3. C++ for Maximum Performance
- Zero Python overhead
- Ideal for latency-critical applications
- Easy integration with existing C++ systems

### 4. TorchServe for Scalable Services
- Enterprise-grade model serving
- Worker scaling through the Management API
- Built-in monitoring and metrics

## 🌏 Australian Tourism Model Deployment

Our Australian tourism sentiment analysis model demonstrates:
- **Multilingual support** (English + Vietnamese)
- **Real-world use cases** (hotel reviews, travel feedback)
- **Production-oriented structure** (evaluation mode, `torch.no_grad()`, error handling in the C++ example)
- **Scalable deployment** (from single inference to distributed serving)

## 🎓 Next Steps in Your PyTorch Production Journey

1. 🔧 **Model Optimization**: Explore quantization, pruning, and distillation
2. 📱 **Mobile Deployment**: Learn PyTorch Mobile for on-device inference
3. ☁️ **Cloud Deployment**: Deploy models on AWS, GCP, or Azure
4. 🐳 **Containerization**: Package models with Docker for easy deployment
5. 📊 **Monitoring**: Implement model performance monitoring in production
6. 🔄 **MLOps**: Set up continuous integration/deployment pipelines

## 📚 Additional Resources

- [Introduction to TorchScript Tutorial](https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html)
- [TorchServe Documentation](https://pytorch.org/serve/)
- [libtorch C++ Documentation](https://pytorch.org/cppdocs/)
- [PyTorch Mobile](https://pytorch.org/mobile/home/)

---

**🎉 You're now equipped to deploy PyTorch models in production environments! Whether you're building real-time inference systems, scalable web services, or mobile applications, you have the knowledge and tools to succeed.**