# DSPy

## What?
A framework for algorithmically optimizing LM prompts and weights.

## Why?
To build a complex system with LMs, we generally have to:
- Break the problem down into steps
- Prompt the LM until each step works well in isolation
- Tweak the steps to work well together
- Generate synthetic examples to tune each step
- Use these examples to finetune smaller LMs to cut costs

Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.

## How?
DSPy breaks this process into the following three abstractions:
- **Signatures:** Abstract the input and output behaviour of a step.
- **Modules:** Define the flow of your program and set up a pipeline. Examples of modules: `Predict`, `ChainOfThought`, `ProgramOfThought`, `MultiChainComparison` and `ReAct`.
- **Optimizers:** Also known as `Teleprompters`. An optimizer takes the program, a training set, and an evaluation metric, and returns a new program optimized for the required use case. Optimizers can also be used to train smaller LMs (students) using larger LMs (teachers).

> The training set can be small, and its examples can be incomplete or unlabeled; labels are only required for fields that the metric uses.
> The metric can be `Exact Match (EM)`, `F1`, or any custom-defined metric.
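
To make the metric idea concrete, here is a minimal pure-Python sketch of an exact-match and a token-level F1 metric. It follows the `(example, pred, trace=None)` calling convention used later in this notebook, but the helpers `normalize`, `exact_match`, and `f1_score` are illustrative names, not DSPy's built-in implementations, and the normalization rule is an assumption.

```python
import re
from collections import Counter
from types import SimpleNamespace


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def exact_match(example, pred, trace=None) -> bool:
    """True when the normalized prediction equals the normalized label."""
    return normalize(example.answer) == normalize(pred.answer)


def f1_score(example, pred, trace=None) -> float:
    """Token-level F1 between the predicted and the labelled answer."""
    gold, guess = normalize(example.answer), normalize(pred.answer)
    # Multiset intersection counts tokens shared by both answers.
    overlap = sum((Counter(gold) & Counter(guess)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(guess)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Stand-in objects with an `answer` attribute (hypothetical data).
example = SimpleNamespace(answer="A continuously running application")
pred = SimpleNamespace(answer="a running application.")

print(exact_match(example, pred))
print(round(f1_score(example, pred), 2))
```

Exact match is strict and suits short factual answers; F1 gives partial credit, which tends to be more forgiving for long free-text answers like the ones in the demo below.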

## DSPy Architecture

![dspy_arch](./assets/dspy_arch.png)

### Signature:
Use a signature to tell the LM `what to do` rather than `how to do` it, so there is no need to write huge prompts.
DSPy supports short inline strings as signatures; for more control, you can write a custom signature class.

Some example inline signatures:

| Task                        | Signature                      |
|-----------------------------|--------------------------------|
| Question-Answering         | "question -> answer"           |
| Summarization               | "document -> summary"          |
| Sentiment classification    | "sentence -> sentiment"        |
| RAG                         | "context, question -> answer"  |
| MCQs with Reasoning        | "question, choices -> reasoning, selection" |
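
To illustrate how little information an inline signature carries, the sketch below splits a signature string into input and output field names. `parse_signature` and `ParsedSignature` are hypothetical helpers written for this note; DSPy's own parsing is more involved and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedSignature:
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def parse_signature(spec: str) -> ParsedSignature:
    """Split 'a, b -> c' into input and output field names."""
    if spec.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {spec!r}")
    left, right = spec.split("->")
    inputs = tuple(name.strip() for name in left.split(",") if name.strip())
    outputs = tuple(name.strip() for name in right.split(",") if name.strip())
    if not inputs or not outputs:
        raise ValueError(f"signature needs inputs and outputs: {spec!r}")
    return ParsedSignature(inputs, outputs)


sig = parse_signature("context, question -> answer")
print(sig.inputs, sig.outputs)
```

The field names themselves act as the task description, which is why descriptive names such as `question` or `summary` matter.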


### Module:
A module takes a signature and converts it into a sophisticated prompt, based on a given prompting technique and the LLM in use. It can be thought of as a model layer in PyTorch that learns from data (inputs/outputs).
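
As a rough sketch of what "converting a signature into a prompt" can look like, the function below renders instructions, a format guide, and the filled-in inputs. The layout imitates the prompt printed by `inspect_history` in the demo further down; `render_prompt` is a hypothetical helper, not DSPy's actual template code.

```python
def render_prompt(instructions, inputs, outputs, values, chain_of_thought=False):
    """Build a prompt: instructions, a format guide, then the filled-in inputs."""
    # A chain-of-thought style module adds a reasoning field before the outputs.
    fields = list(inputs) + (["reasoning"] if chain_of_thought else []) + list(outputs)
    guide = "\n\n".join(f"{name.capitalize()}: ${{{name}}}" for name in fields)
    filled = "\n\n".join(f"{name.capitalize()}: {values[name]}" for name in inputs)
    # The prompt ends with the first field the LM is expected to write.
    next_field = fields[len(inputs)].capitalize()
    return (
        f"{instructions}\n\n---\n\n"
        f"Follow the following format.\n\n{guide}\n\n---\n\n"
        f"{filled}\n\n{next_field}:"
    )


prompt = render_prompt(
    "Answer only based on the given context",
    inputs=["context", "question"],
    outputs=["answer"],
    values={
        "context": "A service is a continuously running application.",
        "question": "What is a service?",
    },
    chain_of_thought=True,
)
print(prompt)
```

The same signature yields a different prompt depending on the technique: with `chain_of_thought=False` the reasoning field disappears and the prompt ends at `Answer:`.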

### Optimizer:
A DSPy optimizer can tune three things:
- LM weights
- Instructions (Prompt/Signature)
- Demonstrations of Input-Output Behaviour

Currently available optimizers: [https://dspy-docs.vercel.app/docs/building-blocks/optimizers#what-dspy-optimizers-are-currently-available](https://dspy-docs.vercel.app/docs/building-blocks/optimizers#what-dspy-optimizers-are-currently-available)

#### Which optimizer should I use?
As a rule of thumb, if you don't know where to start, use `BootstrapFewShotWithRandomSearch`.

Here's the general guidance on getting started:
- If you have `very little data`, e.g. 10 examples of your task, use `BootstrapFewShot`.
- If you have `slightly more data`, e.g. 50 examples of your task, use `BootstrapFewShotWithRandomSearch`.
- If you have `more data than that`, e.g. 300 examples or more, use `MIPRO`.
- If you have been able to use one of these with a `large LM` (e.g., 7B parameters or above) and need a very efficient program, `compile` that down to a `small LM` with `BootstrapFinetune`.
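
The rule of thumb above can be written down as a small helper. This is only a sketch: the guidance gives example sizes (10, 50, 300), so the exact cut-offs used here are assumptions, and `suggest_optimizer` is a hypothetical function that just returns the optimizer's name.

```python
def suggest_optimizer(num_examples: int, compile_to_small_lm: bool = False) -> str:
    """Map a training-set size to the optimizer suggested by the rule of thumb."""
    if num_examples < 0:
        raise ValueError("num_examples must be non-negative")
    # Only relevant once a program already works well with a large LM.
    if compile_to_small_lm:
        return "BootstrapFinetune"
    if num_examples >= 300:  # assumed cut-off for "more data than that"
        return "MIPRO"
    if num_examples >= 50:  # assumed cut-off for "slightly more data"
        return "BootstrapFewShotWithRandomSearch"
    return "BootstrapFewShot"


for size in (9, 50, 300):
    print(size, suggest_optimizer(size))
```

With the nine training examples used in the optimized demo below, this lands on `BootstrapFewShot`, which is the optimizer that demo uses.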

## General Workflow
Whatever the task, the general workflow is:

- Collect a little bit of data.
- Define examples of the inputs and outputs of your program (e.g., questions and their answers). This could just be a handful of quick examples you wrote down. If large datasets exist, the more the merrier!
- Define the modules (i.e., sub-tasks) of your program and the way they should interact together to solve your task.
- Define some validation logic. What makes for a good run of your program? Maybe the answers need to have a certain length or stick to a particular format? Specify the logic that checks that.
- Compile! Ask DSPy to compile your program using your data. The compiler will use your data and validation logic to optimize your program (e.g., prompts and modules) so it's efficient and effective!
- Iterate. Repeat the process by improving your data, program, validation, or by using more advanced features of the DSPy compiler.
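
The "compile" step can be pictured with a simplified bootstrapping loop: run the program on the training examples and keep the runs that the validation logic accepts as few-shot demonstrations. This is a toy illustration of the idea only, not DSPy's implementation; `bootstrap_demos`, `toy_program`, and `toy_metric` are hypothetical names, and the data is made up.

```python
from types import SimpleNamespace


def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Run the program on training examples and keep the runs the metric accepts."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        pred = program(example.question)
        if metric(example, pred):
            demos.append({"question": example.question, "answer": pred.answer})
    return demos


# Toy stand-ins: a lookup-table "program" and an exact-match metric.
knowledge = {"2 + 2?": "4", "Capital of France?": "Paris"}


def toy_program(question):
    return SimpleNamespace(answer=knowledge.get(question, "unknown"))


def toy_metric(example, pred, trace=None):
    return example.answer == pred.answer


trainset = [
    SimpleNamespace(question="2 + 2?", answer="4"),
    SimpleNamespace(question="Capital of France?", answer="Paris"),
    SimpleNamespace(question="Largest planet?", answer="Jupiter"),
]

demos = bootstrap_demos(toy_program, trainset, toy_metric)
print(demos)
```

The accepted demonstrations would then be placed into the prompt of the optimized program. Note that a metric that never passes yields no demonstrations at all, so the strictness of the validation logic matters.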

## Demo - RAG (Unoptimized)

We will try RAG on the TrueFoundry docs, which have been ingested into a local Docker-based Qdrant deployment.

In [1]:
# First, set up the embedding model
from embeddings.mixedbread import MixBreadEmbeddings
embedding_model = MixBreadEmbeddings(
    model_name="mixedbread-ai/mxbai-embed-large-v1"
)

In [2]:
# Set up Retriever
from vectordb.qdrant import CustomQdrantRetriever, QdrantClient

qdrant_client = QdrantClient(url="http://localhost:6333")
retriever = CustomQdrantRetriever(
    qdrant_collection_name="tfdocs", 
    qdrant_client=qdrant_client, 
    embedding_model=embedding_model,
    k=5,
)


In [3]:
# The result also has other keys, such as metadata and score
retriever(["What is TrueFoundry", "What is a service"], k=2).passages

['---\ntitle: "About TrueFoundry"\nslug: "introduction"\nexcerpt: ""\nhidden: false\ncreatedAt: "Mon Oct 17 2022 11:07:50 GMT+0000 (Coordinated Universal Time)"\nupdatedAt: "Wed Dec 06 2023 11:53:56 GMT+0000 (Coordinated Universal Time)"\n---\n**TrueFoundry** is a platform on Kubernetes that makes it really easy to build, track, and deploy models without having a detailed understanding of Kubernetes. TrueFoundry is deployed on a cluster on your own cloud, so the data never leaves your environment, and you don\'t incur any data egress costs.',
 '---\ntitle: "Introduction to a Service"\nslug: "introduction-to-a-service"\nexcerpt: ""\nhidden: false\ncreatedAt: "Thu Oct 26 2023 02:02:12 GMT+0000 (Coordinated Universal Time)"\nupdatedAt: "Thu Dec 07 2023 20:15:11 GMT+0000 (Coordinated Universal Time)"\n---\nA Truefoundry Service represents a continuously running application that typically provides a set of APIs for interaction. Services can be dynamically scaled based on incoming traffic or

In [4]:
# Set up LLM
from dspy import OllamaLocal

llm = OllamaLocal(
    model="mistral:latest", 
    model_type="text", 
    max_tokens=1024, 
    top_p=1, 
    top_k=20, 
    base_url="http://localhost:11434", 
)

In [5]:
# Configure the settings
import dspy
dspy.configure(lm=llm)

In [6]:
# Generate Signature for Input
class GenerateAnswer(dspy.Signature):
    """Answer only based on the given context"""

    context = dspy.InputField(desc="Contains relevant context to answer the question")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="Detailed answer with respect to given question and context")

In [7]:
# Create a RAG module
class RAG(dspy.Module):

    def __init__(self, retriever: dspy.Retrieve, k: int = 5):
        super().__init__()
        
        self.k = k
        self.retriever = retriever
        self.generate_answer = dspy.ChainOfThought(signature=GenerateAnswer)

    def forward(self, question, k=None):
        # Fall back to the k given at construction time when none is passed.
        k = k if k is not None else self.k
        retrieved = self.retriever(question, k)
        prediction = self.generate_answer(context=retrieved.passages, question=question)

        # Return passages, answer, metadata and score
        return dspy.Prediction(context=retrieved.passages, answer=prediction.answer, metadata=retrieved.metadata, score=retrieved.probs)

In [8]:
# Ask any question you like to this simple RAG program.
my_question = "What is a Truefoundry service?"
uncompiled_rag = RAG(retriever=retriever, k=2)
# Get the prediction. This contains `prediction.context` and `prediction.answer`.
prediction = uncompiled_rag(my_question)

print(f"Answer: {prediction.answer}")
print("====================================\n\n")

for context, metadata in zip(prediction.context, prediction.metadata):
    print(f"Context: {context}")
    print("------------------------------------")
    print(f"Metadata: {metadata}")
    print("\n\n====================================")


Answer: A Truefoundry service is an application or model that has been deployed on the TrueFoundry platform using Kubernetes for easy building, tracking, and deployment without requiring extensive knowledge of Kubernetes. It is hosted on a cluster in your own cloud, ensuring data security and eliminating data egress costs. The logs, metrics, and events provided by TrueFoundry are used to monitor these services and identify and debug issues.


Context: ---
title: "Introduction to a Service"
slug: "introduction-to-a-service"
excerpt: ""
hidden: false
createdAt: "Thu Oct 26 2023 02:02:12 GMT+0000 (Coordinated Universal Time)"
updatedAt: "Thu Dec 07 2023 20:15:11 GMT+0000 (Coordinated Universal Time)"
---
A Truefoundry Service represents a continuously running application that typically provides a set of APIs for interaction. Services can be dynamically scaled based on incoming traffic or resource demands.
Services are perfect for scenarios where real-time responses are essential, such as:

In [9]:
# Track llm history
llm.inspect_history(n=1)





Answer only based on the given context

---

Follow the following format.

Context: Contains relevant context to answer the question

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: Detailed answer with respect to given question and context

---

Context:
[1] «---
title: "Introduction to a Service"
slug: "introduction-to-a-service"
excerpt: ""
hidden: false
createdAt: "Thu Oct 26 2023 02:02:12 GMT+0000 (Coordinated Universal Time)"
updatedAt: "Thu Dec 07 2023 20:15:11 GMT+0000 (Coordinated Universal Time)"
---
A Truefoundry Service represents a continuously running application that typically provides a set of APIs for interaction. Services can be dynamically scaled based on incoming traffic or resource demands.
Services are perfect for scenarios where real-time responses are essential, such as:
- Hosting Real-time Model Inference (e.g., Flask, FastAPI)
- Fueling Dynamic Website Backends
- Creating Model Demos (e.g., Stre

## Demo - RAG (Optimized)

In [10]:
# Create Train and Test examples - Can also use HF Datasets
train = [
    {
        "question" : "What factors influence the dynamic adjustment of the number of replicas in autoscaling, and how is the autoscaling strategy determined based on the Queue Backlog?",
        "answer" : "The dynamic adjustment of the number of replicas in autoscaling is influenced by the fluctuation in traffic or resource usage. Specifically, the autoscaling strategy is determined based on the Queue Backlog. When traffic or resource usage varies, the autoscaling strategy evaluates the Queue Backlog to decide the appropriate number of replicas between the defined minimum and maximum replica counts."
    }, 
    {
        "question" : "What role do autoscaling metrics play in dynamically adjusting resource allocation for an async service?",
        "answer" : "Autoscaling metrics play a crucial role in dynamically adjusting resource allocation for an async service by monitoring and responding to changing demands while maintaining optimal performance. These metrics, such as AWS SQS Average Backlog, provide insights into the queue length and help the autoscaler determine the appropriate resource allocation to handle incoming requests efficiently."
    },
    {
        "question" : "What is the purpose of creating teams within the Truefoundry platform, and how does it streamline resource management?",
        "answer" : "Creating teams within the Truefoundry platform serves to streamline resource management by simplifying access control and allocation processes. Teams allow users to group individuals with similar responsibilities or access requirements, reducing the need to individually assign permissions for each resource. Additionally, teams facilitate efficient collaboration by providing a structured framework for managing access permissions across multiple resources."
    },
    {
        "question" : "How does the Truefoundry platform support role-based access control (RBAC) for managing user permissions?",
        "answer" : "The Truefoundry platform supports role-based access control (RBAC) by assigning specific roles to users based on their responsibilities and access requirements. Each role defines a set of permissions that determine the actions users can perform on resources within the platform. By assigning roles to users, administrators can enforce security policies, restrict unauthorized access, and ensure compliance with data protection regulations."
    },
    {
        "question" : "What are the key benefits of using the Truefoundry platform for managing cloud resources?",
        "answer" : "The Truefoundry platform offers several key benefits for managing cloud resources, including centralized resource management, automated provisioning, and enhanced security features. By providing a unified interface to monitor and control cloud resources, Truefoundry simplifies resource allocation, reduces operational overhead, and improves scalability. Additionally, automated provisioning capabilities streamline resource deployment processes, while security features such as RBAC and encryption enhance data protection and compliance."
    },
    {
        "question" : "What are the steps involved in creating your account on TrueFoundry?",
        "answer" : "Navigate to the `create your account` page on the TrueFoundry website. Fill out the registration form with your company name, work email, username, and password. Click the `Create Account` button to submit the form."
    },
    {
        "question" : "What cloud providers are supported for creating Kubernetes clusters on TrueFoundry, and what are the recommended options for accessing all platform features?",
        "answer" : "TrueFoundry supports AWS EKS, GCP GKE, and Azure AKS for creating Kubernetes clusters. For accessing all platform features, it is recommended to use one of these major cloud providers. Note that kind and minikube, while supported for local clusters, may not support all platform features."
    },
    {
        "question" : "What are TrueFoundry jobs, and in what scenarios are they particularly well-suited?",
        "answer" : "TrueFoundry jobs are task-oriented workloads designed to run for a specific duration to complete a task and then terminate, releasing the resources. They are well-suited for scenarios such as model training on large datasets, routine maintenance tasks like data backups and report generation, and large-scale batch inference tasks."
    },
    {
        "question" : "What is an MLRepo in Truefoundry, and how does it differ from Git repositories?",
        "answer" : "An MLRepo in Truefoundry serves the purpose of versioning ML models, artifacts, and metadata, similar to how Git repositories version code. However, MLRepos are specifically tailored for ML assets, and access to them can be granted to workspaces, enabling secure and controlled access to ML assets across teams and applications."
    }
]


test = [
    {
        "question" : "How can you view the details of a job run in TrueFoundry?",
        "answer" : "You can view the details of a job run in TrueFoundry by accessing the Job Run section. This section provides information about the status and progress of the job run"
    },
    {
        "question" : "What are key design principles of truefoundry?",
        "answer" : "The key design principles of TrueFoundry are: Cloud Native: TrueFoundry operates on Kubernetes, allowing it to function on various cloud providers or on-premises environments. ML Inherits the same SRE principles as the rest of the infrastructure: TrueFoundry seamlessly integrates with your existing software stack, providing ML teams with the same SRE (Site Reliability Engineering), security, and cost optimization features. No Vendor Lockin: TrueFoundry is designed to avoid vendor lock-in. It ensures easy migration by providing accessible APIs and exposing all Kubernetes manifests generated, enabling smooth transition if needed."
    },
    {
        "question" : "What architecture does TrueFoundry follow, and what benefits does it offer?",
        "answer" : "TrueFoundry follows a split-plane architecture, enabling both on-premises deployment and ensuring that service reliability does not rely solely on TrueFoundry. This architecture enhances reliability and flexibility while allowing for customization based on specific organizational needs."
    },
    {
        "question" : "How is the organization of workspaces typically structured within a cluster?",
        "answer" : "Workspaces within a cluster can be organized based on teams, applications, and environments. For example, different teams may manage various applications, each with its own set of environments such as development, staging, and production."
    },
    {
        "question" : "How can a user create a workspace in Truefoundry?",
        "answer" : "To create a workspace in Truefoundry, users can navigate to the Workspace tab in the platform and click on the `Create Workspace` button. Once created, users can obtain the Fully Qualified Name (FQN) of the workspace from the FQN button."
    },
    {
        "question" : "What is the process for creating an ML Repo in Truefoundry?",
        "answer" : "To create an ML Repo in Truefoundry, users need to have at least one Storage Integration configured. They can then access the list of storage integrations from the dropdown menu and select one to associate with the ML Repo. After selecting a storage integration, users can create an ML Repo from the ML Repo's tab in the platform."
    },
]

In [11]:
from dspy import Example

trainset = [Example(**data) for data in train]
testset = [Example(**data) for data in test]

In [12]:
trainset[0]

Example({'question': 'What factors influence the dynamic adjustment of the number of replicas in autoscaling, and how is the autoscaling strategy determined based on the Queue Backlog?', 'answer': 'The dynamic adjustment of the number of replicas in autoscaling is influenced by the fluctuation in traffic or resource usage. Specifically, the autoscaling strategy is determined based on the Queue Backlog. When traffic or resource usage varies, the autoscaling strategy evaluates the Queue Backlog to decide the appropriate number of replicas between the defined minimum and maximum replica counts.'}) (input_keys=None)

In [13]:
# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in trainset]
devset = [x.with_inputs('question') for x in testset]

In [14]:
trainset[0]

Example({'question': 'What factors influence the dynamic adjustment of the number of replicas in autoscaling, and how is the autoscaling strategy determined based on the Queue Backlog?', 'answer': 'The dynamic adjustment of the number of replicas in autoscaling is influenced by the fluctuation in traffic or resource usage. Specifically, the autoscaling strategy is determined based on the Queue Backlog. When traffic or resource usage varies, the autoscaling strategy evaluates the Queue Backlog to decide the appropriate number of replicas between the defined minimum and maximum replica counts.'}) (input_keys={'question'})

In [15]:
len(trainset), len(testset)

(9, 6)

In [18]:
from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context does actually contain that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic optimizer, which will compile our RAG program.
optimizer = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = optimizer.compile(RAG(retriever=retriever), trainset=trainset)

TypeError: cannot pickle '_thread.RLock' object
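
One plausible cause, stated as an assumption because the full traceback is not shown here: the optimizer copies the program before compiling it, and the `RAG` module holds the retriever together with its `QdrantClient`, which may contain thread locks. Python cannot deep-copy a lock, as the standalone snippet below demonstrates. `FakeClient` and `FakeProgram` are hypothetical stand-ins, not the classes used in this notebook.

```python
import copy
import threading


class FakeClient:
    """Stand-in for a client object that keeps an internal lock."""

    def __init__(self):
        self._lock = threading.RLock()


class FakeProgram:
    """Stand-in for a module that stores the client as an attribute."""

    def __init__(self, client):
        self.client = client


program = FakeProgram(FakeClient())

try:
    copy.deepcopy(program)
    error = None
except TypeError as exc:
    # deepcopy walks every attribute and fails on the lock.
    error = str(exc)

print(error)
```

If this is indeed what happens here, keeping lock-holding objects out of the attributes that get copied would be the direction to investigate; the full traceback should confirm which object triggers the failure.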