# Saving and Loading DSPy Programs

This tutorial demonstrates how to save and load DSPy programs, including modules, optimized parameters, and complete pipelines.

## What You'll Learn:
- How to save and load DSPy modules
- How to persist optimized parameters
- How to manage versions of DSPy programs
- Best practices for model serialization

In [None]:
# Install required packages
import sys
import subprocess

def install_package(package):
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

try:
    import dspy
except ImportError:
    install_package("dspy")
    import dspy

import os
import json
import pickle
from pathlib import Path
from typing import Any, Dict
import datetime

## Setup and Configuration

In [None]:
# Configure DSPy
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.getenv('OPENAI_API_KEY'))
dspy.configure(lm=lm)

# Create a directory for saving models
models_dir = Path("saved_models")
models_dir.mkdir(exist_ok=True)

print("DSPy configured successfully!")
print(f"Models will be saved to: {models_dir.absolute()}")

## Creating a Sample DSPy Program

Let's create a complete DSPy program that we can save and load.

In [None]:
class QuestionAnswerSignature(dspy.Signature):
    """Answer a question with a comprehensive response."""
    
    question: str = dspy.InputField(desc="The question to answer")
    context: str = dspy.InputField(desc="Relevant context information")
    answer: str = dspy.OutputField(desc="A comprehensive answer")
    confidence: float = dspy.OutputField(desc="Confidence score (0.0-1.0)")

class FactCheckSignature(dspy.Signature):
    """Verify the accuracy of an answer."""
    
    question: str = dspy.InputField(desc="The original question")
    answer: str = dspy.InputField(desc="The answer to verify")
    is_accurate: bool = dspy.OutputField(desc="Whether the answer is accurate")
    explanation: str = dspy.OutputField(desc="Explanation of the verification")

class QuestionAnsweringSystem(dspy.Module):
    """A complete question answering system with fact-checking."""
    
    def __init__(self, use_fact_check: bool = True):
        super().__init__()
        self.use_fact_check = use_fact_check
        self.answerer = dspy.ChainOfThought(QuestionAnswerSignature)
        if use_fact_check:
            self.fact_checker = dspy.ChainOfThought(FactCheckSignature)
        
        # Store configuration
        self.config = {
            'use_fact_check': use_fact_check,
            'created_at': datetime.datetime.now().isoformat(),
            'version': '1.0'
        }
    
    def forward(self, question: str, context: str = "") -> dspy.Prediction:
        # Generate initial answer
        answer_result = self.answerer(
            question=question,
            context=context
        )
        
        # Fact check if enabled
        if self.use_fact_check:
            fact_check_result = self.fact_checker(
                question=question,
                answer=answer_result.answer
            )
            
            return dspy.Prediction(
                answer=answer_result.answer,
                confidence=float(answer_result.confidence),
                is_accurate=fact_check_result.is_accurate,
                fact_check_explanation=fact_check_result.explanation
            )
        else:
            return dspy.Prediction(
                answer=answer_result.answer,
                confidence=float(answer_result.confidence)
            )

# Create and test the system
qa_system = QuestionAnsweringSystem(use_fact_check=True)

test_question = "What is the capital of France?"
test_context = "France is a country in Western Europe."

result = qa_system(question=test_question, context=test_context)
print(f"Question: {test_question}")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence}")
print(f"Fact Check: {result.is_accurate}")
print(f"Explanation: {result.fact_check_explanation}")

## Basic Saving and Loading

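DSPy modules provide built-in `save()` and `load()` methods. With a `.json` path, `save()` writes the module's state (such as few-shot demos and signature instructions) rather than the class itself, so you create an instance of the same class first and then call `load()` on it.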

In [None]:
# Save the QA system using DSPy's built-in save method
basic_save_path = models_dir / "qa_system_basic.json"

# Save the module
qa_system.save(str(basic_save_path))
print(f"Model saved to: {basic_save_path}")

# Load the module
loaded_qa_system = QuestionAnsweringSystem(use_fact_check=True)
loaded_qa_system.load(str(basic_save_path))

# Test the loaded system
loaded_result = loaded_qa_system(
    question="What is the largest planet in our solar system?",
    context="The solar system contains eight planets."
)

print(f"\nLoaded model test:")
print(f"Answer: {loaded_result.answer}")
print(f"Confidence: {loaded_result.confidence}")
print(f"Model loaded successfully!")

## Advanced Saving with Metadata

For production use, you'll want to save additional metadata along with your models.

In [None]:
class ModelManager:
    """Manager for saving and loading DSPy models with metadata."""
    
    def __init__(self, base_dir: str = "saved_models"):
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(exist_ok=True)
    
    def save_model(self, model: dspy.Module, model_name: str, 
                   version: str = "1.0", metadata: Dict[str, Any] = None) -> str:
        """Save a DSPy model with metadata."""
        
        # Create model directory
        model_dir = self.base_dir / model_name / version
        model_dir.mkdir(parents=True, exist_ok=True)
        
        # Save the model
        model_path = model_dir / "model.json"
        model.save(str(model_path))
        
        # Prepare metadata
        full_metadata = {
            'model_name': model_name,
            'version': version,
            'saved_at': datetime.datetime.now().isoformat(),
            'model_class': model.__class__.__name__,
            'model_config': getattr(model, 'config', {}),
            'dspy_version': dspy.__version__ if hasattr(dspy, '__version__') else 'unknown'
        }
        
        if metadata:
            full_metadata.update(metadata)
        
        # Save metadata
        metadata_path = model_dir / "metadata.json"
        with open(metadata_path, 'w') as f:
            json.dump(full_metadata, f, indent=2)
        
        # Save model architecture (for reference)
        architecture_path = model_dir / "architecture.txt"
        with open(architecture_path, 'w') as f:
            f.write(f"Model Class: {model.__class__.__name__}\n")
            f.write(f"Module Attributes:\n")
            for attr_name in dir(model):
                if not attr_name.startswith('_') and hasattr(model, attr_name):
                    attr = getattr(model, attr_name)
                    if isinstance(attr, dspy.Module):
                        f.write(f"  - {attr_name}: {attr.__class__.__name__}\n")
        
        print(f"Model saved to: {model_dir}")
        return str(model_dir)
    
    def load_model(self, model_class, model_name: str, version: str = "latest") -> tuple:
        """Load a DSPy model with metadata."""
        
        model_base_dir = self.base_dir / model_name
        
        if version == "latest":
            # Find the latest version
            if not model_base_dir.exists():
                raise FileNotFoundError(f"Model {model_name} not found")
            
            versions = [d.name for d in model_base_dir.iterdir() if d.is_dir()]
            if not versions:
                raise FileNotFoundError(f"No versions found for model {model_name}")
            
            # Sort versions (plain string sort: "10.0" would rank below "2.0")
            version = sorted(versions)[-1]
        
        model_dir = model_base_dir / version
        
        if not model_dir.exists():
            raise FileNotFoundError(f"Model {model_name} version {version} not found")
        
        # Load metadata
        metadata_path = model_dir / "metadata.json"
        if metadata_path.exists():
            with open(metadata_path, 'r') as f:
                metadata = json.load(f)
        else:
            metadata = {}
        
        # Initialize the model from the constructor arguments stored in the
        # metadata. model_config also holds bookkeeping keys such as
        # 'created_at' and 'version', so keep only keys that __init__ accepts.
        import inspect
        init_params = inspect.signature(model_class.__init__).parameters
        config = {
            key: value
            for key, value in metadata.get('model_config', {}).items()
            if key in init_params
        }
        model = model_class(**config)
        
        # Load the model state
        model_path = model_dir / "model.json"
        model.load(str(model_path))
        
        print(f"Model loaded from: {model_dir}")
        return model, metadata
    
    def list_models(self) -> Dict[str, list]:
        """List all saved models and their versions."""
        models = {}
        
        for model_dir in self.base_dir.iterdir():
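            # "deployments" lives under base_dir but holds deployed copies, not models
            if model_dir.is_dir() and model_dir.name != "deployments":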
                versions = [v.name for v in model_dir.iterdir() if v.is_dir()]
                models[model_dir.name] = sorted(versions)
        
        return models

# Test the model manager
manager = ModelManager()

# Save with metadata
metadata = {
    'description': 'Question answering system with fact checking',
    'training_data': 'General knowledge',
    'performance_metrics': {
        'accuracy': 0.85,
        'latency_ms': 1200
    },
    'author': 'DSPy Tutorial',
    'tags': ['qa', 'fact-check', 'production']
}

save_path = manager.save_model(
    model=qa_system,
    model_name="question_answering_system",
    version="1.0",
    metadata=metadata
)

# List saved models
print("\nSaved models:")
for model_name, versions in manager.list_models().items():
    print(f"  {model_name}: {versions}")

## Loading and Version Management

In [None]:
# Load the model
loaded_model, loaded_metadata = manager.load_model(
    model_class=QuestionAnsweringSystem,
    model_name="question_answering_system",
    version="latest"
)

print("Loaded model metadata:")
print(json.dumps(loaded_metadata, indent=2))

# Test the loaded model
test_result = loaded_model(
    question="Who invented the telephone?",
    context="The telephone was invented in the 19th century."
)

print(f"\nTest of loaded model:")
print(f"Answer: {test_result.answer}")
print(f"Confidence: {test_result.confidence}")
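
When resolving `latest`, version names need to be compared numerically rather than as strings: as plain strings, `10.0` sorts before `2.0`. The following is a minimal sketch of a numeric sort key, assuming version names are dot-separated integers such as `1.0` or `1.10`. `version_key` and `latest_version` are hypothetical helper names for illustration, not part of DSPy.

```python
from typing import List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.10' into a tuple of ints, here (1, 10)."""
    return tuple(int(part) for part in version.split("."))


def latest_version(versions: List[str]) -> str:
    """Return the numerically highest version from a non-empty list."""
    if not versions:
        raise ValueError("no versions given")
    return max(versions, key=version_key)


# Compare the plain string order with the numeric choice
print(sorted(["1.0", "10.0", "2.0"]))
print(latest_version(["1.0", "10.0", "2.0"]))
```

A helper like this could stand in for the `sorted(versions)[-1]` lookup, provided every version directory follows the dotted-integer naming.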

## Saving Optimized Models

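After optimizing a program with a DSPy optimizer, save the result so that the optimized parameters (for `BootstrapFewShot`, the selected few-shot demonstrations) can be reused without re-running the optimization.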

In [None]:
# Create some sample training data
sample_training_data = [
    dspy.Example(
        question="What is the capital of Italy?",
        context="Italy is a country in Southern Europe.",
        answer="Rome"
    ).with_inputs('question', 'context'),
    
    dspy.Example(
        question="What is the largest ocean?",
        context="Earth has several large bodies of water.",
        answer="Pacific Ocean"
    ).with_inputs('question', 'context'),
    
    dspy.Example(
        question="Who wrote Romeo and Juliet?",
        context="Romeo and Juliet is a famous play.",
        answer="William Shakespeare"
    ).with_inputs('question', 'context')
]

# Define a simple evaluation metric
def simple_accuracy(example, pred, trace=None):
    """Simple accuracy metric for demonstration."""
    return example.answer.lower() in pred.answer.lower()

# Create a new model for optimization
trainable_qa_system = QuestionAnsweringSystem(use_fact_check=False)  # Simpler for demo

# Set up optimizer (using a simple one for demo)
from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(
    metric=simple_accuracy,
    max_bootstrapped_demos=2,
    max_labeled_demos=2
)

print("Optimizing model...")
try:
    optimized_qa_system = optimizer.compile(
        trainable_qa_system,
        trainset=sample_training_data
    )
    
    # Save the optimized model
    optimized_metadata = {
        'description': 'Optimized question answering system',
        'optimizer': 'BootstrapFewShot',
        'training_examples': len(sample_training_data),
        'optimization_date': datetime.datetime.now().isoformat(),
        'base_model': 'question_answering_system',
        'optimization_config': {
            'max_bootstrapped_demos': 2,
            'max_labeled_demos': 2,
            'metric': 'simple_accuracy'
        }
    }
    
    optimized_save_path = manager.save_model(
        model=optimized_qa_system,
        model_name="question_answering_system_optimized",
        version="1.0",
        metadata=optimized_metadata
    )
    
    print(f"Optimized model saved successfully!")
    
    # Test the optimized model
    optimized_result = optimized_qa_system(
        question="What is the smallest planet?",
        context="The solar system has planets of various sizes."
    )
    
    print(f"\nOptimized model test:")
    print(f"Answer: {optimized_result.answer}")
    print(f"Confidence: {optimized_result.confidence}")
    
except Exception as e:
    print(f"Optimization failed (this is expected in demo): {e}")
    print("In practice, you would have proper training data and evaluation setup.")

## Model Versioning and Deployment Pipeline

In [None]:
class ModelDeploymentPipeline:
    """Pipeline for model versioning and deployment."""
    
    def __init__(self, model_manager: ModelManager):
        self.manager = model_manager
        self.deployment_dir = self.manager.base_dir / "deployments"
        self.deployment_dir.mkdir(exist_ok=True)
    
    def create_deployment(self, model_name: str, version: str, 
                         deployment_name: str = "production") -> str:
        """Create a deployment of a specific model version."""
        
        # Create deployment directory
        deployment_path = self.deployment_dir / deployment_name
        deployment_path.mkdir(exist_ok=True)
        
        # Copy model files
        source_dir = self.manager.base_dir / model_name / version
        
        if not source_dir.exists():
            raise FileNotFoundError(f"Model {model_name} version {version} not found")
        
        # Copy files
        import shutil
        for file_path in source_dir.glob("*"):
            if file_path.is_file():
                shutil.copy2(file_path, deployment_path / file_path.name)
        
        # Create deployment manifest
        manifest = {
            'deployment_name': deployment_name,
            'model_name': model_name,
            'model_version': version,
            'deployed_at': datetime.datetime.now().isoformat(),
            'deployment_id': f"{model_name}_{version}_{deployment_name}"
        }
        
        manifest_path = deployment_path / "deployment_manifest.json"
        with open(manifest_path, 'w') as f:
            json.dump(manifest, f, indent=2)
        
        print(f"Deployment '{deployment_name}' created: {deployment_path}")
        return str(deployment_path)
    
    def load_deployment(self, deployment_name: str, model_class):
        """Load a deployed model."""
        
        deployment_path = self.deployment_dir / deployment_name
        
        if not deployment_path.exists():
            raise FileNotFoundError(f"Deployment '{deployment_name}' not found")
        
        # Load manifest
        manifest_path = deployment_path / "deployment_manifest.json"
        with open(manifest_path, 'r') as f:
            manifest = json.load(f)
        
        # Load metadata
        metadata_path = deployment_path / "metadata.json"
        with open(metadata_path, 'r') as f:
            metadata = json.load(f)
        
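        # Load model, passing only the stored config keys that __init__ accepts
        # (model_config also holds 'created_at' and 'version')
        import inspect
        init_params = inspect.signature(model_class.__init__).parameters
        config = {
            key: value
            for key, value in metadata.get('model_config', {}).items()
            if key in init_params
        }
        model = model_class(**config)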
        
        model_path = deployment_path / "model.json"
        model.load(str(model_path))
        
        return model, manifest, metadata
    
    def list_deployments(self) -> Dict[str, Dict]:
        """List all deployments."""
        deployments = {}
        
        for deployment_dir in self.deployment_dir.iterdir():
            if deployment_dir.is_dir():
                manifest_path = deployment_dir / "deployment_manifest.json"
                if manifest_path.exists():
                    with open(manifest_path, 'r') as f:
                        manifest = json.load(f)
                    deployments[deployment_dir.name] = manifest
        
        return deployments

# Test deployment pipeline
deployment_pipeline = ModelDeploymentPipeline(manager)

# Create a production deployment
try:
    deployment_path = deployment_pipeline.create_deployment(
        model_name="question_answering_system",
        version="1.0",
        deployment_name="production"
    )
    
    # List deployments
    print("\nActive deployments:")
    for deployment_name, manifest in deployment_pipeline.list_deployments().items():
        print(f"  {deployment_name}: {manifest['model_name']} v{manifest['model_version']}")
    
    # Load and test production model
    prod_model, prod_manifest, prod_metadata = deployment_pipeline.load_deployment(
        "production",
        QuestionAnsweringSystem
    )
    
    print(f"\nProduction model loaded successfully!")
    print(f"Deployment ID: {prod_manifest['deployment_id']}")
    
    # Test production model
    prod_result = prod_model(
        question="What is photosynthesis?",
        context="Plants use sunlight to create energy."
    )
    
    print(f"\nProduction model test:")
    print(f"Answer: {prod_result.answer}")
    print(f"Confidence: {prod_result.confidence}")
    
except Exception as e:
    print(f"Deployment error: {e}")

## Best Practices for Model Persistence

In [None]:
class ProductionModelManager:
    """Production-ready model manager with additional features."""
    
    def __init__(self, base_dir: str = "saved_models"):
        self.manager = ModelManager(base_dir)
        self.config_file = Path(base_dir) / "manager_config.json"
        self.load_config()
    
    def load_config(self):
        """Load manager configuration."""
        if self.config_file.exists():
            with open(self.config_file, 'r') as f:
                self.config = json.load(f)
        else:
            self.config = {
                'retention_policy': {
                    'max_versions_per_model': 5,
                    'max_age_days': 90
                },
                'backup_enabled': True,
                'compression': True
            }
            self.save_config()
    
    def save_config(self):
        """Save manager configuration."""
        with open(self.config_file, 'w') as f:
            json.dump(self.config, f, indent=2)
    
    def save_model_with_validation(self, model: dspy.Module, model_name: str, 
                                 version: str, metadata: Dict[str, Any] = None,
                                 test_cases: list = None) -> str:
        """Save model with validation and testing."""
        
        # Validate model before saving
        if test_cases:
            print("Validating model before saving...")
            validation_results = []
            
            for i, test_case in enumerate(test_cases):
                try:
                    result = model(**test_case['inputs'])
                    validation_results.append({
                        'test_case': i,
                        'success': True,
                        'result': str(result)[:100] + "..." if len(str(result)) > 100 else str(result)
                    })
                except Exception as e:
                    validation_results.append({
                        'test_case': i,
                        'success': False,
                        'error': str(e)
                    })
            
            # Add validation results to metadata
            if metadata is None:
                metadata = {}
            metadata['validation_results'] = validation_results
            
            # Check if validation passed
            success_rate = sum(1 for r in validation_results if r['success']) / len(validation_results)
            if success_rate < 0.8:  # Require 80% success rate
                raise ValueError(f"Model validation failed. Success rate: {success_rate:.2%}")
            
            print(f"Validation passed with {success_rate:.2%} success rate")
        
        # Save model
        save_path = self.manager.save_model(model, model_name, version, metadata)
        
        # Clean up old versions if needed
        self.cleanup_old_versions(model_name)
        
        return save_path
    
    def cleanup_old_versions(self, model_name: str):
        """Clean up old model versions based on retention policy."""
        max_versions = self.config['retention_policy']['max_versions_per_model']
        
        model_dir = self.manager.base_dir / model_name
        if not model_dir.exists():
            return
        
        versions = [d for d in model_dir.iterdir() if d.is_dir()]
        
        if len(versions) > max_versions:
            # Sort by modification time and keep only the newest ones
            # (st_ctime is not a creation time on Unix, so use st_mtime)
            versions.sort(key=lambda x: x.stat().st_mtime)
            versions_to_remove = versions[:-max_versions]
            
            for version_dir in versions_to_remove:
                print(f"Removing old version: {version_dir}")
                import shutil
                shutil.rmtree(version_dir)
    
    def backup_models(self, backup_dir: str):
        """Create a backup of all models."""
        if not self.config['backup_enabled']:
            return
        
        import zipfile
        
        backup_path = Path(backup_dir)
        backup_path.mkdir(exist_ok=True)
        
        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        backup_file = backup_path / f"models_backup_{timestamp}.zip"
        
        with zipfile.ZipFile(backup_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for file_path in self.manager.base_dir.rglob("*"):
                if file_path.is_file():
                    arcname = file_path.relative_to(self.manager.base_dir)
                    zipf.write(file_path, arcname)
        
        print(f"Backup created: {backup_file}")
        return str(backup_file)

# Test production model manager
prod_manager = ProductionModelManager()

# Define test cases for validation
test_cases = [
    {
        'inputs': {
            'question': 'What is 2+2?',
            'context': 'Basic arithmetic'
        }
    },
    {
        'inputs': {
            'question': 'What is the capital of Japan?',
            'context': 'Geography question'
        }
    }
]

# Save model with validation
try:
    validated_save_path = prod_manager.save_model_with_validation(
        model=qa_system,
        model_name="qa_system_validated",
        version="1.0",
        metadata={'validation': True, 'production_ready': True},
        test_cases=test_cases
    )
    
    print(f"Model saved with validation: {validated_save_path}")
    
    # Create backup
    backup_file = prod_manager.backup_models("backups")
    
except Exception as e:
    print(f"Validation or saving failed: {e}")

## Summary and Best Practices

This tutorial covered how to save, load, version, and deploy DSPy programs. Here are the key takeaways:

### Core Concepts:
1. **Basic Save/Load**: Use DSPy's built-in `.save()` and `.load()` methods
2. **Metadata Management**: Store important information about models
3. **Version Control**: Implement proper versioning for model evolution
4. **Deployment Pipeline**: Create systematic deployment processes

### Best Practices:

1. **Always include metadata**: Store model version, creation date, performance metrics
2. **Validate before saving**: Test models with sample inputs before persistence
3. **Version management**: Implement proper versioning and cleanup policies
4. **Backup strategy**: Regular backups of important models
5. **Deployment isolation**: Separate development and production model storage
6. **Documentation**: Include architecture descriptions and usage examples

### Production Considerations:

- **Security**: Encrypt sensitive model files
- **Access control**: Implement proper permissions
- **Monitoring**: Track model performance in production
- **Rollback capability**: Keep previous versions for quick rollback
- **Storage optimization**: Compress large models and clean up old versions
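
Rollback can be sketched as re-pointing a deployment at an earlier version directory. The example below assumes the `<base_dir>/<model_name>/<version>/` and `<base_dir>/deployments/<name>/` layout used in this tutorial. `rollback_deployment` and the `rolled_back` manifest key are hypothetical names introduced for illustration.

```python
import datetime
import json
import shutil
from pathlib import Path


def rollback_deployment(base_dir: Path, model_name: str, version: str,
                        deployment_name: str = "production") -> Path:
    """Replace a deployment's files with those of an earlier saved version."""
    source_dir = base_dir / model_name / version
    if not source_dir.is_dir():
        raise FileNotFoundError(f"{model_name} version {version} not found")

    deployment_path = base_dir / "deployments" / deployment_name
    # Start from an empty directory so no files from the newer version linger
    if deployment_path.exists():
        shutil.rmtree(deployment_path)
    deployment_path.mkdir(parents=True)

    for file_path in source_dir.iterdir():
        if file_path.is_file():
            shutil.copy2(file_path, deployment_path / file_path.name)

    # Record which version is now live and that it came from a rollback
    manifest = {
        "deployment_name": deployment_name,
        "model_name": model_name,
        "model_version": version,
        "deployed_at": datetime.datetime.now().isoformat(),
        "rolled_back": True,
    }
    with open(deployment_path / "deployment_manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)
    return deployment_path
```

This only works if the earlier version still exists on disk, so a retention policy should leave at least one known-good version in place.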

### File Structure Example:
```
saved_models/
├── model_name/
│   ├── 1.0/
│   │   ├── model.json
│   │   ├── metadata.json
│   │   └── architecture.txt
│   └── 1.1/
│       ├── model.json
│       ├── metadata.json
│       └── architecture.txt
└── deployments/
    ├── production/
    └── staging/
backups/
└── models_backup_<timestamp>.zip
```
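
A small check can confirm that a version directory matches this layout before it is deployed. This is a sketch using only the standard library. `missing_files` is a hypothetical helper, and the required file names are the ones `ModelManager.save_model` writes in this tutorial.

```python
from pathlib import Path
from typing import List

# Files that save_model in this tutorial writes into each version directory
REQUIRED_FILES = ("model.json", "metadata.json", "architecture.txt")


def missing_files(version_dir: Path) -> List[str]:
    """List the required files that are absent from a version directory."""
    return [name for name in REQUIRED_FILES if not (version_dir / name).is_file()]
```

An empty result means the directory is complete. A non-empty one names exactly what is missing, which makes a useful error message before a deployment is created.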

This approach ensures your DSPy models are properly managed, versioned, and ready for production deployment.