# Unit 4

## Finishing Optimization: Saving and Loading DSPy Programs

# Introduction to Saving and Loading in DSPy

Welcome to our final lesson on optimizing with **DSPy**! Throughout this course, you've learned how to enhance your DSPy programs by tuning prompts, instructions, and model weights using techniques like **Few-Shot Learning**, **Instruction Optimization**, and **Automatic Finetuning**.

After spending time and compute optimizing a program (especially with expensive optimizers like **BootstrapFewShotWithRandomSearch** or **MIPROv2**), you will want to keep the result. DSPy offers straightforward mechanisms to save and load optimized programs, so the work can be reused across sessions instead of being lost when your environment closes.

### Why Save Optimized Programs?

Saving your optimized programs serves several important purposes in an ML workflow:

  * **Preserves Work:** Allows you to close your environment and return later without losing optimization progress.
  * **Enables Sharing/Deployment:** Makes it possible to share optimized programs with teammates or deploy them to production.
  * **Creates Checkpoints:** Allows you to compare different optimization approaches across various checkpoints.
  * **Saves Computational Resources:** Eliminates the need to re-run time-consuming optimization processes.

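Regardless of how an optimizer modifies your program (by adding **Few-Shot Learning examples**, refining **Instruction Optimization** text, or updating model weights through **Finetuning**), saving and loading preserves the program-level changes. Finetuned weights are the one exception: they stay with the model itself, so a finetuned model has to be reattached after loading, as the troubleshooting table later in this lesson notes.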

-----

## Saving Optimized DSPy Programs

DSPy provides a simple **`.save()`** method that can be called on any optimized program instance.

### The `.save()` Method

This single line of code writes your optimized program's learned state to a file, typically in **plain-text JSON format**.

```python
# Assuming you have an optimized program from any optimizer
optimized_program.save("my_optimized_program.json")
```

The saved JSON file is human-readable and holds the program's learned state, including:

  * All **optimized prompts** with their instructions and examples (e.g., bootstrapped examples from `BootstrapFewShot`).
  * The **optimized instructions** (e.g., from `COPRO` or `MIPROv2`).
  * Any configuration parameters specific to your program.
  * The signatures of each module.

The file stores state rather than the program's Python class or control flow, which is why loading starts from an instance of the original class.
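
Because the file is plain JSON, it can be inspected with the standard library alone. The sketch below lists the top-level keys of any JSON file. `summarize_json` is a hypothetical helper, and the stand-in file written here uses made-up keys, not DSPy's actual layout, which depends on your program and DSPy version.

```python
import json
import os
import tempfile


def summarize_json(path):
    """Return {top-level key: type name of its value} for a JSON object file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object at the top level of {path}")
    return {key: type(value).__name__ for key, value in data.items()}


# Stand-in file with made-up keys (NOT DSPy's real layout), used only so the
# sketch runs on its own. In practice, point this at your saved program file.
demo_path = os.path.join(tempfile.mkdtemp(), "stand_in.json")
with open(demo_path, "w", encoding="utf-8") as f:
    json.dump({"some_predictor": {"demos": []}, "extra_info": "text"}, f)

summary = summarize_json(demo_path)
for key, kind in summary.items():
    print(f"{key}: {kind}")
```

Running the same helper on a real saved program is a quick way to see which predictors the file covers before loading it.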

### Best Practices for Saving

1.  **Consistent Naming Convention:** Include metadata in the filename to easily identify and compare versions.

    ```python
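    from datetime import date

    # Good naming convention: task, optimizer, score, and date in the filename
    accuracy = 0.87  # placeholder; use the score from your own evaluation
    filename = f"qa_program_miprov2_acc{accuracy:.2f}_{date.today().strftime('%Y%m%d')}.json"
    optimized_program.save(filename)
    # Example result: qa_program_miprov2_acc0.87_20230615.json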
    ```

2.  **Structured Directory Organization:** Use dedicated folders for projects, optimizers, or program types.

    ```python
    import os
    save_dir = "saved_programs/qa_system/miprov2"
    os.makedirs(save_dir, exist_ok=True)
    optimized_program.save(os.path.join(save_dir, "optimized_qa.json"))
    ```

3.  **Saving Metadata:** Use a separate file to store additional context about the optimization run.

    ```python
    import json
    import os

    # save_dir is the folder from the previous snippet; trainset is your training data
    metadata = {
        "optimizer": "MIPROv2",
        "dataset_size": len(trainset),
        "optimization_time_minutes": 120,
        "notes": "Used auto='light' setting with 3 bootstrapped demos"
    }
    with open(os.path.join(save_dir, "optimized_qa_metadata.json"), "w") as f:
        json.dump(metadata, f, indent=2)
    ```
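
The three practices above can be combined in one small helper. This is a minimal sketch using only the standard library. `checkpoint_path` and `write_metadata` are hypothetical names, and the accuracy value, date, and directory are placeholders.

```python
import json
import os
import tempfile
from datetime import date


def checkpoint_path(base_dir, task, optimizer, accuracy, day=None):
    """Build <base_dir>/<task>/<optimizer>/<task>_<optimizer>_acc<0.00>_<YYYYMMDD>.json."""
    day = day or date.today()
    folder = os.path.join(base_dir, task, optimizer)
    os.makedirs(folder, exist_ok=True)
    name = f"{task}_{optimizer}_acc{accuracy:.2f}_{day.strftime('%Y%m%d')}.json"
    return os.path.join(folder, name)


def write_metadata(program_path, metadata):
    """Write a sidecar file next to the program: foo.json -> foo_metadata.json."""
    root, _ = os.path.splitext(program_path)
    meta_path = f"{root}_metadata.json"
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)
    return meta_path


base_dir = tempfile.mkdtemp()  # placeholder; use e.g. "saved_programs" in a project
path = checkpoint_path(base_dir, "qa", "miprov2", 0.87, day=date(2023, 6, 15))
meta_path = write_metadata(path, {"optimizer": "MIPROv2", "notes": "auto='light'"})
print(os.path.basename(path))  # qa_miprov2_acc0.87_20230615.json
# optimized_program.save(path)  # the actual save call, given an optimized program
```

Keeping the metadata in a sidecar file leaves the program file untouched, so it can still be loaded directly.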

-----

## Loading Previously Optimized Programs

Loading is done using the **`.load()`** method on a newly initialized program instance.

### The `.load()` Method

You must first initialize an instance of the original program class so DSPy knows the structure, and then load the optimized parameters into that instance.

```python
# 1. Initialize an instance of your program class
loaded_program = YOUR_PROGRAM_CLASS()

# 2. Load the saved program parameters into that instance (updates it in place)
loaded_program.load(path="my_optimized_program.json")
```

Once loaded, the program carries the same optimized prompts and examples it had when it was saved and is ready for inference.

### Troubleshooting and Best Practices for Loading

| Issue | Solution | Code Example |
| :--- | :--- | :--- |
| **Wrong Class Structure** | Ensure you initialize the **exact same program class** that was saved. | `loaded_program = YourOriginalClass()` |
| **Incorrect File Path** | Always verify the path; use full paths if running in a different environment. | `loaded_program.load(path="/absolute/path/to/file.json")` |
| **Finetuned Model** | If the program used a finetuned model, make sure that model is available, then assign it to each of the program's predictors (`finetuned_lm` stands for your own LM object). | `for p in loaded_program.predictors(): p.lm = finetuned_lm` |
| **DSPy Version Mismatch** | Use the same DSPy version for saving and loading to avoid compatibility issues. | Pin the version in your requirements, e.g. `dspy==<version>` |
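
A path problem can be caught before `.load()` is called. The following is a minimal sketch using only the standard library. `preflight_check` is a hypothetical helper, not part of DSPy, and it checks only that the file exists and parses as JSON.

```python
import json
import os
import tempfile


def preflight_check(path):
    """Return the absolute path if it points to a readable JSON file, else raise."""
    full_path = os.path.abspath(path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"No saved program at {full_path}")
    try:
        with open(full_path, "r", encoding="utf-8") as f:
            json.load(f)
    except json.JSONDecodeError as err:
        raise ValueError(f"{full_path} is not valid JSON: {err}") from err
    return full_path


# Self-contained demonstration with a stand-in file.
good = os.path.join(tempfile.mkdtemp(), "program.json")
with open(good, "w", encoding="utf-8") as f:
    json.dump({}, f)

checked = preflight_check(good)
print(checked)

try:
    preflight_check(good + ".missing")
except FileNotFoundError:
    print("missing file caught")
# loaded_program.load(path=checked)  # the actual load call, given a program instance
```

Resolving to an absolute path first also makes the error message unambiguous when the script runs from a different working directory.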

-----

## Next Steps and Conclusion

Optimization in DSPy is an **iterative process**. As you continue, remember to ask critical questions:

  * Is the **task definition** clear?
  * Do you need **more data**?
  * Is the **metric** truly capturing the performance goal?
  * Would a more **sophisticated optimizer** (like `MIPROv2`) or a **more complex program structure** help?
  * Could you **combine multiple optimizers** in sequence?

By mastering the saving and loading mechanisms, you ensure your refined and optimized DSPy programs are **persistent** and ready for continuous improvement and practical deployment. Congratulations! 🛠️

## Saving Your DSPy Optimization Work

Now that you understand how saving optimized programs is a crucial part of the DSPy workflow, let's put this knowledge into practice! In this exercise, you'll work with a simple DSPy program that has already been "optimized" (for demonstration purposes).

Your task is to implement the final step in the optimization workflow — saving the program to a persistent file. Specifically, you need to:

1.  Add the code to save the optimized program to a file named `optimized_program.json`.
2.  Uncomment the verification code to check that the file was created successfully.
3.  Uncomment the code that validates the file contains proper JSON data.

This exercise reinforces the importance of persistence in your optimization workflow. By saving your optimized programs, you ensure that all your hard optimization work can be reused without having to repeat computationally expensive processes.

```python
import os
import json
import dspy

# Define a simple DSPy program
class SimpleQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought("question -> answer")
    
    def forward(self, question):
        return self.generate_answer(question=question)

# Set up the language model
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
dspy.configure(lm=lm)

# Create and "optimize" the program
# (In a real scenario, you would use an optimizer like BootstrapFewShot or MIPRO)
program = SimpleQA()
optimized_program = program  # Pretend this is optimized

# TODO: Add code to save the optimized program to a file named "optimized_program.json"


# Verify the file was created
# TODO: Uncomment the code below to verify the file was created
# if os.path.exists("optimized_program.json"):
#     print("✓ File was created successfully!")
# else:
#     print("✗ File was not created.")

# Verify the file contains valid JSON
# TODO: Uncomment the code below to verify the file contains valid JSON
# try:
#     with open("optimized_program.json", "r") as f:
#         json_content = json.load(f)
#     print("✓ File contains valid JSON data!")
# except json.JSONDecodeError:
#     print("✗ File does not contain valid JSON data.")
# except Exception as e:
#     print(f"✗ Error when checking file: {e}")

# Print a success message
print("\nGreat job! You've successfully saved your optimized DSPy program.")
print("This file can now be loaded in future sessions using the .load() method.")

```

It's essential to persist your hard-earned optimization work. The final step in the workflow is to use the **`.save()`** method on your optimized DSPy program.

Here is the completed code implementing the saving logic and enabling the verification steps.

```python
import os
import json
import dspy

# Define a simple DSPy program
class SimpleQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought("question -> answer")
    
    def forward(self, question):
        return self.generate_answer(question=question)

# Set up the language model
# Saving does not call the language model, so a missing API key or base URL
# should not stop the exercise: warn and continue.
try:
    lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
    dspy.configure(lm=lm)
except KeyError as e:
    print(f"LM configuration skipped: environment variable {e} is not set.")

# Create and "optimize" the program
# (In a real scenario, you would use an optimizer like BootstrapFewShot or MIPRO)
program = SimpleQA()
optimized_program = program  # Pretend this is optimized

# Save the optimized program to a file named "optimized_program.json"
FILENAME = "optimized_program.json"
optimized_program.save(FILENAME)


# Verify the file was created
if os.path.exists(FILENAME):
    print("✓ File was created successfully!")
else:
    print("✗ File was not created.")

# Verify the file contains valid JSON
try:
    with open(FILENAME, "r") as f:
        json_content = json.load(f)
    print("✓ File contains valid JSON data!")
except json.JSONDecodeError:
    print("✗ File does not contain valid JSON data.")
except Exception as e:
    print(f"✗ Error when checking file: {e}")

# Print a success message
print("\nGreat job! You've successfully saved your optimized DSPy program.")
print("This file can now be loaded in future sessions using the .load() method.")
```


## Quiz on DSPy Optimization Steps