# DSPy

## What?
A framework for algorithmically optimizing LM prompts and weights.

## Why?
To build a complex system with LMs, we generally have to:
- Break the problem down into steps
- Prompt your LM well until each step works well in isolation
- Tweak the steps to work well together
- Generate synthetic examples to tune each step
- Use these examples to finetune smaller LMs to cut costs.

Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.

## How?
DSPy breaks this process into the following three abstractions:
- **Signatures:** Abstract the input and output behaviour of a step.
- **Modules:** Define the flow of your program and set up a pipeline. Examples of modules: `Predict`, `ChainOfThought`, `ProgramOfThought`, `MultiChainComparison` and `ReAct`.
- **Optimizers:** Also known as `Teleprompters`. An optimizer takes the program, a training set, and an evaluation metric, and returns a new program optimized for the required use case. Optimizers can also be used to train smaller LMs (student) using larger LMs (teacher).

> The training set can be small, and its examples can be incomplete or unlabeled, unless the metric needs the labels.
> The metric can be `Exact Match (EM)`, `F1`, or any custom-defined metric.
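
As a rough illustration of the two metrics named above, the sketch below implements exact match and token-level F1 in plain Python. The function names and the normalization (lowercasing and whitespace tokenization) are assumptions made for this example; they are not DSPy's built-in implementations.

```python
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when both strings contain the same tokens in the same order."""
    return normalize(prediction) == normalize(reference)


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact match is strict and suits short factual answers, while F1 gives partial credit when a longer answer contains the reference tokens.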

## DSPy Architecture

![dspy_arch](./assets/dspy_arch.png)

### Signature:
Use a signature to state `what to do` rather than `how to do` it, so there is no need to write huge prompts.
DSPy supports short inline strings as signatures; you can also write a custom class for the same purpose.

Some signatures available in DSPy:

| Task                        | Signature                      |
|-----------------------------|--------------------------------|
| Question-Answering         | "question -> answer"           |
| Summarization               | "document -> summary"          |
| Sentiment classification    | "sentence -> sentiment"        |
| RAG                         | "context, question -> answer"  |
| MCQs with Reasoning        | "question, choices -> reasoning, selection" |
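
To make the inline string form concrete, the sketch below splits a signature string into its input and output field names. `parse_signature` is a hypothetical helper written for this note; it is not DSPy's own parser and only handles the simple `inputs -> outputs` form shown in the table.

```python
from typing import List, Tuple


def parse_signature(signature: str) -> Tuple[List[str], List[str]]:
    """Split 'a, b -> c' into (['a', 'b'], ['c'])."""
    if signature.count("->") != 1:
        raise ValueError("expected exactly one '->' in the signature")
    inputs_part, outputs_part = signature.split("->")
    inputs = [name.strip() for name in inputs_part.split(",") if name.strip()]
    outputs = [name.strip() for name in outputs_part.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError("a signature needs at least one input and one output")
    return inputs, outputs


print(parse_signature("context, question -> answer"))
# → (['context', 'question'], ['answer'])
```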


### Module:
A module takes a signature and converts it into a sophisticated prompt, based on a given technique and the LM in use. It can be thought of as a model layer in PyTorch that learns from data (inputs/outputs).
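
The sketch below gives a feel for that conversion by rendering a signature into a prompt in plain Python. The layout is loosely modelled on the prompt that `inspect_history` prints in the demo; `Field`, `describe`, and `render_prompt` are hypothetical names for this illustration and do not reflect how DSPy builds prompts internally.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Field:
    name: str
    desc: Optional[str] = None  # when missing, a ${name} placeholder is shown


def describe(field: Field) -> str:
    return field.desc if field.desc else "${" + field.name + "}"


def render_prompt(instructions: str, inputs: List[Field],
                  outputs: List[Field], values: Dict[str, str]) -> str:
    # 1. Task instructions, then a description of the expected format.
    lines = [instructions, "", "---", "", "Follow the following format.", ""]
    lines += [f"{f.name.capitalize()}: {describe(f)}" for f in inputs + outputs]
    # 2. The concrete inputs, ending with the first output left open for the LM.
    lines += ["", "---", ""]
    lines += [f"{f.name.capitalize()}: {values[f.name]}" for f in inputs]
    lines.append(f"{outputs[0].name.capitalize()}:")
    return "\n".join(lines)


prompt = render_prompt(
    instructions="Answer the question in detail based on the given context.",
    inputs=[
        Field("context", "Contains relevant facts to answer the question"),
        Field("question"),
    ],
    outputs=[Field("answer", "Detailed answer with respect to given question and context")],
    values={
        "context": "A Service is a continuously running application.",
        "question": "What is a service?",
    },
)
print(prompt)
```

A different module (for example a chain-of-thought variant) would render the same signature with extra fields, which is why the signature itself can stay short.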

### Optimizer:
A DSPy optimizer can optimize three things:
- LM weights
- Instructions (Prompt/Signature)
- Demonstrations of input-output behaviour
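
Of the three, demonstrations are the easiest to picture in code. The sketch below is a simplified, plain-Python take on bootstrapping few-shot demonstrations: run a program over the training examples and keep only the input/output pairs that pass the metric. `bootstrap_demos`, `toy_program`, and `exact_answer` are hypothetical names, and this is not DSPy's `BootstrapFewShot` implementation.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]


def bootstrap_demos(program: Callable[[str], str],
                    trainset: List[Example],
                    metric: Callable[[Example, str], bool],
                    max_demos: int = 4) -> List[Example]:
    """Collect up to max_demos question/answer pairs that the metric accepts."""
    demos: List[Example] = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos


# Toy stand-in for an LM-backed program (assumption: a real one would call an LM).
def toy_program(question: str) -> str:
    answers = {"What is 2 + 2?": "4", "What is the capital of France?": "Paris"}
    return answers.get(question, "unknown")


def exact_answer(example: Example, prediction: str) -> bool:
    return prediction == example["answer"]


trainset = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Hamlet?", "answer": "Shakespeare"},
]

demos = bootstrap_demos(toy_program, trainset, exact_answer)
print(len(demos))  # → 2
```

The accepted pairs would then be placed in the prompt as few-shot examples; the failed third example is simply dropped.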

Current available optimizers: [https://dspy-docs.vercel.app/docs/building-blocks/optimizers#what-dspy-optimizers-are-currently-available](https://dspy-docs.vercel.app/docs/building-blocks/optimizers#what-dspy-optimizers-are-currently-available)

#### Which optimizer should I use?
As a rule of thumb, if you don't know where to start, use `BootstrapFewShotWithRandomSearch`.

Here's the general guidance on getting started:
- If you have `very little data`, e.g. 10 examples of your task, use `BootstrapFewShot`.
- If you have `slightly more data`, e.g. 50 examples of your task, use `BootstrapFewShotWithRandomSearch`.
- If you have `more data than that`, e.g. 300 examples or more, use `MIPRO`.
- If you have been able to use one of these with a `large LM` (e.g., 7B parameters or above) and need a very efficient program, `compile` that down to a `small LM` with `BootstrapFinetune`.
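
The rule of thumb above can be written down as a small lookup. The sketch below is only a restatement of that guidance: `suggest_optimizer` is a hypothetical helper, and the exact cut-offs between the quoted example sizes (10, 50, 300) are assumptions.

```python
def suggest_optimizer(num_examples: int) -> str:
    """Map a training-set size to the optimizer name suggested above."""
    if num_examples < 0:
        raise ValueError("num_examples must be non-negative")
    if num_examples >= 300:
        return "MIPRO"
    if num_examples >= 50:
        return "BootstrapFewShotWithRandomSearch"
    return "BootstrapFewShot"


for size in (10, 50, 300):
    print(size, suggest_optimizer(size))
```

`BootstrapFinetune` is left out on purpose, since it is a follow-up step after one of these optimizers has worked with a large LM, not a choice driven by dataset size.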

## General Workflow
Whatever the task, the general workflow is:

- Collect a little bit of data.
- Define examples of the inputs and outputs of your program (e.g., questions and their answers). This could just be a handful of quick examples you wrote down. If large datasets exist, the more the merrier!
- Define the modules (i.e., sub-tasks) of your program and the way they should interact together to solve your task.
- Define some validation logic. What makes for a good run of your program? Maybe the answers need to have a certain length or stick to a particular format? Specify the logic that checks that.
- Compile! Ask DSPy to compile your program using your data. The compiler will use your data and validation logic to optimize your program (e.g., prompts and modules) so it's efficient and effective!
- Iterate. Repeat the process by improving your data, program, validation, or by using more advanced features of the DSPy compiler.
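
The validation-logic step can be as small as one function. The sketch below checks answer length and format, following the `(example, pred, trace=None)` argument shape that DSPy metrics commonly take. `validate_answer`, the 120-word limit, and the punctuation rule are assumptions for illustration, and `SimpleNamespace` stands in for a real prediction object.

```python
from types import SimpleNamespace


def validate_answer(example, pred, trace=None, max_words: int = 120) -> bool:
    """Accept non-empty answers that are short enough and end a sentence."""
    answer = pred.answer.strip()
    if not answer:
        return False
    if len(answer.split()) > max_words:
        return False
    return answer.endswith((".", "!", "?"))


good = SimpleNamespace(answer="A service is a continuously running application.")
bad = SimpleNamespace(answer="A service is")

print(validate_answer(None, good))  # → True
print(validate_answer(None, bad))   # → False
```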

## Demo - RAG (Unoptimized)

We will try RAG on the Truefoundry docs, which have been ingested into a local Docker-based Qdrant deployment.

In [1]:
# Imports
from embeddings import MixBreadEmbeddings
from vectordb.qdrant import CustomQdrantRetriever, QdrantClient
from reranker import MxBaiReranker
from dspy import OllamaLocal
import dspy

In [2]:
# First need the embedding
embedding_model = MixBreadEmbeddings(
    model_name="mixedbread-ai/mxbai-embed-large-v1"
)

In [3]:
# Set up Retriever
qdrant_client = QdrantClient(url="http://localhost:6333")
retriever = CustomQdrantRetriever(
    qdrant_collection_name="tfdocs", 
    qdrant_client=qdrant_client, 
    embedding_model=embedding_model,
    k=5,
)


In [4]:
# Each result also has other keys, such as metadata and score
retriever("What is service", k=2)


[{'long_text': "# Service\n### Properties\n| ports           | \\[[Port](doc:service-1#port)]            | true     | Specify the ports you want the service to be exposed to                                                                                                                      |\n| liveness_probe  | [HealthProbe](doc:service-1#healthprobe) | false    | Describes the configuration for the Health Probe's<br>To learn more you can go [here](doc:add-health-checks-to-deployments)                                                  |\n| readiness_probe | [HealthProbe](doc:service-1#healthprobe) | false    | Describes the configuration for the Health Probe's<br>To learn more you can go [here](doc:add-health-checks-to-deployments)                                                  |\n| service_account | string                                   | false    | Service account that this workload should use                                                                                       

In [5]:
# Set up LLM
llm = OllamaLocal(
    model="llama3:8b-instruct-q5_1", 
    model_type="chat", 
    max_tokens=1024, 
    top_p=1, 
    top_k=20, 
    base_url="http://localhost:11434", 
    frequency_penalty=0.9,
    presence_penalty=2,
)

In [6]:
# Configure the settings
dspy.configure(lm=llm, rm=retriever)

In [7]:
class GenerateAnswer(dspy.Signature):
    """Answer the question in detail based on the given context."""

    context = dspy.InputField(desc="Contains relevant facts to answer the question")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="Detailed answer with respect to given question and context")


In [8]:
class RAG(dspy.Module):

    def __init__(self, k: int = 15, reranker_model="mixedbread-ai/mxbai-rerank-xsmall-v1", top_k: int = 5):
        super().__init__()
        
        self.k = k
        self.top_k = top_k
        self.retriever = dspy.Retrieve(k=self.k)
        self.reranker = MxBaiReranker(model_name=reranker_model, k=self.top_k)
        self.generate_answer = dspy.Predict(signature=GenerateAnswer)
        # can also use CoT
        # self.generate_answer = dspy.ChainOfThought(signature=GenerateAnswer)

    def forward(self, question, k=None, top_k=None):
        passages = self.retriever(question, k).passages
        reranked_passages = self.reranker(question, top_k, documents=passages)
        prediction = self.generate_answer(context=reranked_passages, question=question)
        return dspy.Prediction(context=passages, answer=prediction.answer)
    

In [9]:
# Ask any question you like to this simple RAG program.
uncompiled_rag = RAG()
my_question = "What is a service in Truefoundry?"
prediction = uncompiled_rag(my_question, k=10, top_k=5)
print("====================================")
print(f"Answer: {prediction.answer}")
print("====================================")

Answer: According to the provided text, a Service in Truefoundry refers to an application or function that can be deployed using their platform. It's described as "a single unit of deployment" which can contain multiple functions and dependencies.

In other words, it represents a container for your code (functions) along with its dependencies (e.g., libraries), allowing you to deploy them together in one go.


In [10]:
# Track llm history
llm.inspect_history(n=1)





Answer the question in detail based on the given context.

---

Follow the following format.

Context: Contains relevant facts to answer the question
Question: ${question}
Answer: Detailed answer with respect to given question and context

---

Context:
[1] «{'long_text': '---\ntitle: "Introduction to a Service"\nslug: "introduction-to-a-service"\nexcerpt: ""\nhidden: false\ncreatedAt: "Thu Oct 26 2023 02:02:12 GMT+0000 (Coordinated Universal Time)"\nupdatedAt: "Thu Dec 07 2023 20:15:11 GMT+0000 (Coordinated Universal Time)"\n---\nA Truefoundry Service represents a continuously running application that typically provides a set of APIs for interaction. Services can be dynamically scaled based on incoming traffic or resource demands.\nServices are perfect for scenarios where real-time responses are essential, such as:\n- Hosting Real-time Model Inference (e.g., Flask, FastAPI)\n- Fueling Dynamic Website Backends\n- Creating Model Demos (e.g., Streamlit, Gradio)'}»
[2] «{'long_text