[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/llm-agent-frameworks/dspy/generative_search_cohere_dspy_optimized.ipynb)

# Setup


There are 4 basic steps to compile an LLM program with DSPy.
1. Connect DSPy to any LLMs or tools, such as Weaviate :), that you want to use.
2. Load your dataset, wrapping each example in a `dspy.Example` object.
3. Define your LLM program.
4. Define your Metric.

## 1. Connect DSPy to Command R and Weaviate

In [6]:
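import os

import dspy
from dspy.retrieve.weaviate_rm import WeaviateRM
import weaviate

# Connect LMs: Command R answers questions; GPT-4 is used later as COPRO's prompt model.
# Assumes COHERE_API_KEY and OPENAI_API_KEY are set as environment variables.
cohere_api_key = os.environ["COHERE_API_KEY"]
command_r = dspy.Cohere(model="command-r", max_tokens=4000, api_key=cohere_api_key)
gpt4 = dspy.OpenAI(model="gpt-4", max_tokens=4000, model_type="chat")

# Connect to a local Weaviate instance that already holds the WeaviateBlogChunk collection
weaviate_client = weaviate.connect_to_local()
retriever_model = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate_client)

# Set defaults in DSPy: Command R as the LM, Weaviate as the retriever
dspy.settings.configure(lm=command_r, rm=retriever_model)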

## 2. Load your Dataset

In [7]:
import json

file_path = './WeaviateBlogRAG-0-0-0.json'
with open(file_path, 'r') as file:
    dataset = json.load(file)

# Wrap each row in a dspy.Example; "question" is the input, "gold_answer" is the label
data = [
    dspy.Example(gold_answer=row["gold_answer"], question=row["query"]).with_inputs("question")
    for row in dataset
]

# First 25 examples for training, next 10 for dev, the rest for test
trainset, devset, testset = data[:25], data[25:35], data[35:]
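The split is positional, so the size of each set depends on how many rows the file holds. The pure-Python sketch below uses hypothetical rows instead of the real `WeaviateBlogRAG-0-0-0.json` file to show the resulting sizes, and what happens when the file has fewer than 35 rows. It assumes the file is a JSON list of objects with `query` and `gold_answer` keys, which is what the loader above reads; `split_rows` is an illustrative helper, not part of DSPy.

```python
import json


def split_rows(rows, n_train=25, n_dev=10):
    """Split query/gold-answer rows into train, dev, and test lists by position."""
    # Keep only the two fields the notebook's loader reads from each row.
    pairs = [{"question": r["query"], "gold_answer": r["gold_answer"]} for r in rows]
    train = pairs[:n_train]
    dev = pairs[n_train:n_train + n_dev]
    test = pairs[n_train + n_dev:]  # whatever is left over; may be empty
    return train, dev, test


# Hypothetical rows in the shape the loader expects (a JSON list of objects).
raw = json.dumps([{"query": f"q{i}", "gold_answer": f"a{i}"} for i in range(40)])
train, dev, test = split_rows(json.loads(raw))
print(len(train), len(dev), len(test))  # 25 10 5

# With only 30 rows, the dev set shrinks and the test set is silently empty.
print([len(s) for s in split_rows(json.loads(raw)[:30])])  # [25, 5, 0]
```

If the file happens to be ordered (for example, grouped by blog post), shuffling with a fixed seed before slicing would keep the three sets comparable.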

## 3. Define your LLM Program 

In [8]:
class GenerateAnswer(dspy.Signature):
    """Assess the context and answer the question."""

    context = dspy.InputField(desc="Helpful information for answering the question.")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="A detailed answer that is supported by the context.")
    
class RAG(dspy.Module):
    def __init__(self, k=3):
        super().__init__()
        
        self.retrieve = dspy.Retrieve(k=k)
        self.generate_answer = dspy.Predict(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        pred = self.generate_answer(context=context, question=question).answer
        return dspy.Prediction(context=context, answer=pred, question=question)
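`RAG.forward` chains two steps: `dspy.Retrieve` fetches the top `k` passages from the configured retriever (Weaviate here), and `dspy.Predict(GenerateAnswer)` turns the context and question into a prompt for the LM. The plain-Python sketch below mimics that data flow with hypothetical stand-ins: a word-overlap ranker instead of Weaviate's vector search, a simple text layout instead of DSPy's own prompt template, and a stub `generate` function instead of Command R. None of these names are DSPy APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyPrediction:
    context: list[str]
    answer: str
    question: str


def toy_retrieve(question: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k passages sharing the most words with the question (stand-in for vector search)."""
    q_words = set(question.lower().split())
    # sorted() is stable, so passages with equal overlap keep their corpus order
    ranked = sorted(corpus, key=lambda p: len(q_words & set(p.lower().split())), reverse=True)
    return ranked[:k]


def toy_prompt(context: list[str], question: str) -> str:
    """Lay out the GenerateAnswer fields as a simple text prompt."""
    passages = "\n".join(f"[{i}] {p}" for i, p in enumerate(context, start=1))
    return f"Context:\n{passages}\n\nQuestion: {question}\nAnswer:"


def toy_rag_forward(question: str, corpus: list[str],
                    generate: Callable[[str], str], k: int = 3) -> ToyPrediction:
    """Retrieve, then generate: the same data flow as RAG.forward."""
    context = toy_retrieve(question, corpus, k)
    answer = generate(toy_prompt(context, question))
    return ToyPrediction(context=context, answer=answer, question=question)


corpus = [
    "Product quantization compresses vectors.",
    "Weaviate is a vector database.",
    "Cats are nice.",
]
pred = toy_rag_forward("what is product quantization", corpus,
                       generate=lambda prompt: "stub answer", k=2)
print(pred.context)
```

Returning the retrieved `context` alongside the answer, as `RAG.forward` also does, lets a metric or a human reviewer check whether the answer is actually supported by what was retrieved.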

## 4. Define your Metric

In [9]:
class TypedEvaluator(dspy.Signature):
    """Evaluate the quality of a system's answer to a question according to a given criterion."""
    
    criterion: str = dspy.InputField(desc="The evaluation criterion.")
    question: str = dspy.InputField(desc="The question asked to the system.")
    ground_truth_answer: str = dspy.InputField(desc="An expert written Ground Truth Answer to the question.")
    predicted_answer: str = dspy.InputField(desc="The system's answer to the question.")
    rating: float = dspy.OutputField(desc="A float rating between 1 and 5")


def MetricWrapper(gold, pred, trace=None):
    alignment_criterion = "How aligned is the predicted_answer with the ground_truth?"
    return dspy.TypedPredictor(TypedEvaluator)(criterion=alignment_criterion,
                                          question=gold.question,
                                          ground_truth_answer=gold.gold_answer,
                                          predicted_answer=pred.answer).rating
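`MetricWrapper` returns the judge's raw rating, which is requested on a 1 to 5 scale. DSPy's evaluation readout divides the summed metric by the number of examples and prints the result as a percentage, so ratings on this scale show up as values above 100% in the compilation log. If a 0 to 1 score is preferred, one option is to rescale the rating before returning it. The helper below is a hypothetical addition, not part of DSPy; it also clamps values the judge might return outside the requested range.

```python
def to_unit_interval(rating: float, low: float = 1.0, high: float = 5.0) -> float:
    """Map a rating on [low, high] to [0, 1], clamping out-of-range judge outputs."""
    clamped = min(max(rating, low), high)
    return (clamped - low) / (high - low)


print(to_unit_interval(4.5))  # 0.875
print(to_unit_interval(6.0))  # 1.0 (clamped to the top of the scale)
```

Inside `MetricWrapper` this would amount to wrapping the returned `.rating` in `to_unit_interval(...)`. Because the mapping is linear and increasing, it would not change which instruction candidate scores best; it only changes how the scores read.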

# Great! You are all set up to use DSPy's Compilers
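COPRO treats the `GenerateAnswer` instruction as the thing to optimize: a prompt model (GPT-4 here) proposes rewritten instructions, each candidate is scored with the metric on the training examples, and the search repeats for `depth` rounds with `breadth` candidates per round. The toy hill-climb below is a simplified illustration of that breadth/depth loop, not DSPy's implementation: `toy_propose` and `toy_score` are hypothetical stand-ins for the prompt model and the metric, and the real COPRO also shows the prompt model its earlier attempts and their scores. Scoring the seed instruction plus `breadth - 1` proposals in the first round appears to match the 3 + 3 candidates printed in the compilation log.

```python
from typing import Callable


def toy_instruction_search(
    seed: str,
    propose: Callable[[str, int], list[str]],
    score: Callable[[str], float],
    breadth: int = 3,
    depth: int = 2,
) -> tuple[str, float, int]:
    """Hill-climb over instruction strings, loosely modeled on a breadth/depth search.

    Round 1 scores the seed plus `breadth - 1` proposals; each later round scores
    `breadth` fresh proposals built from the best instruction so far.
    Returns (best_instruction, best_score, candidates_scored).
    """
    best, best_score, scored = seed, float("-inf"), 0
    for round_idx in range(depth):
        if round_idx == 0:
            candidates = [seed] + propose(seed, breadth - 1)
        else:
            candidates = propose(best, breadth)
        for cand in candidates:
            s = score(cand)  # with DSPy, this would be the average metric over the trainset
            scored += 1
            if s > best_score:
                best, best_score = cand, s
    return best, best_score, scored


# Hypothetical stand-ins for the prompt model and the metric.
def toy_propose(instruction: str, n: int) -> list[str]:
    hints = ["Use the context.", "Be detailed.", "Be accurate."]
    return [f"{instruction} {hints[i % len(hints)]}" for i in range(n)]


def toy_score(instruction: str) -> float:
    keywords = {"context.", "detailed.", "accurate."}
    return float(len(keywords & set(instruction.lower().split())))


best, best_score, scored = toy_instruction_search(
    "Answer the question.", toy_propose, toy_score, breadth=3, depth=2
)
print(scored)      # 6, i.e. breadth * depth
print(best_score)  # 2.0
```

Each scored candidate costs one program run plus one judge call per training example, which is one reason the cell below compiles on only `trainset[:3]`.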

In [10]:
from dspy.teleprompt import COPRO

COPRO_teleprompter = COPRO(prompt_model=gpt4,
                          metric=MetricWrapper,
                          breadth=3,
                          depth=2,
                          init_temperature=0.7,
                          verbose=False,
                          track_stats=True)
kwargs = dict(num_threads=1, display_progress=False)

COPRO_compiled_RAG = COPRO_teleprompter.compile(RAG(), trainset=trainset[:3], eval_kwargs=kwargs)

Iteration Depth: 1/2.
At Depth 1/2, Evaluating Prompt Candidate #1/3 for Predictor 1 of 1.
Average Metric: 13.5 / 3  (450.0%)
At Depth 1/2, Evaluating Prompt Candidate #2/3 for Predictor 1 of 1.
Average Metric: 14.0 / 3  (466.7%)
At Depth 1/2, Evaluating Prompt Candidate #3/3 for Predictor 1 of 1.
Average Metric: 12.5 / 3  (416.7%)
Iteration Depth: 2/2.
At Depth 2/2, Evaluating Prompt Candidate #1/3 for Predictor 1 of 1.
Average Metric: 13.0 / 3  (433.3%)
At Depth 2/2, Evaluating Prompt Candidate #2/3 for Predictor 1 of 1.
Average Metric: 14.0 / 3  (466.7%)
At Depth 2/2, Evaluating Prompt Candidate #3/3 for Predictor 1 of 1.
Average Metric: 14.5 / 3  (483.3%)
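Each log line reports the summed 1 to 5 ratings over the three training examples, so dividing by 3 recovers the mean judge rating, and the printed percentage is simply 100 times that mean. The sketch below reproduces that arithmetic from the numbers in the log and picks the top-scoring candidate; assuming COPRO keeps the highest-scoring instruction, that would be candidate 3 at depth 2.

```python
# Summed ratings copied from the COPRO log above: (depth, candidate) -> total over 3 examples.
totals = {
    (1, 1): 13.5, (1, 2): 14.0, (1, 3): 12.5,
    (2, 1): 13.0, (2, 2): 14.0, (2, 3): 14.5,
}
n_examples = 3

for (depth, cand), total in totals.items():
    mean_rating = total / n_examples      # average judge rating on the 1-5 scale
    shown_pct = 100 * total / n_examples  # the percentage printed in the log
    print(f"depth {depth}, candidate {cand}: mean {mean_rating:.2f}, shown {shown_pct:.1f}%")

best = max(totals, key=totals.get)
print(best, round(totals[best] / n_examples, 2))  # (2, 3) 4.83
```

With only three examples, the mean ratings span roughly 4.17 to 4.83, so the gap between candidates is small and may partly reflect judge noise; evaluating on `devset` would give a steadier comparison.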


# Save the Compiled Program

In [11]:
save_path = "RAG-with-Command-R-Example.json"
COPRO_compiled_RAG.save(save_path)

# Load and Print the Optimized RAG Instructions

In [41]:
def GenerateAnswer_instruction_from_dspy_json(file_path):
    with open(file_path, "r") as file:
        data = json.load(file)
    return data["generate_answer"]["signature_instructions"]

task_description = GenerateAnswer_instruction_from_dspy_json("RAG-with-Command-R-Example.json")
print(task_description)

Carefully examine the context provided, identify the key points and themes, comprehend the nuances, and then formulate a precise, comprehensive, and accurate answer to the question, ensuring that your response is directly supported by the information in the context.
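The top-level key `generate_answer` appears to come from the attribute name in `RAG.__init__` (`self.generate_answer`), so renaming that attribute, or saving with a DSPy version that lays out its state differently, would make the lookup above fail with a bare `KeyError`. The sketch below is a slightly more defensive reader; `read_signature_instructions` is a hypothetical helper, and the JSON it reads is a minimal stand-in written in the layout the notebook's own lookup assumes, not real DSPy output.

```python
import json
import tempfile
from pathlib import Path


def read_signature_instructions(path, predictor_name="generate_answer"):
    """Return the saved instruction for one predictor, with a clearer error if it is missing."""
    state = json.loads(Path(path).read_text())
    try:
        return state[predictor_name]["signature_instructions"]
    except KeyError as err:
        raise KeyError(
            f"{path}: no '{predictor_name}.signature_instructions' (top-level keys: {sorted(state)})"
        ) from err


# Hypothetical minimal file in the layout the notebook's lookup assumes.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "compiled.json"
    path.write_text(json.dumps({"generate_answer": {"signature_instructions": "Answer from the context."}}))
    print(read_signature_instructions(path))  # Answer from the context.
```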


# Query in Weaviate

In [42]:
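query = "What is Product Quantization? Why is it helpful for Vector Databases?"
# Use an f-string so the question is inserted into the prompt, and build a new string
# rather than using += so re-running this cell does not append the suffix twice.
grouped_task = task_description + f"\nQuery: {query}\nAnswer:"

weaviate_blogs = weaviate_client.collections.get("WeaviateBlogChunk")

response = weaviate_blogs.generate.near_text(
    query=query,
    limit=3,
    grouped_task=grouped_task,
)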

print(response.generated)

The provided context discusses Product Quantization (PQ) as a technique for compressing vectors to reduce memory requirements. Here are the key points and themes:

- Product Quantization involves chopping up a high-dimensional vector into smaller segments and then compressing each segment independently. This allows for significant memory savings.
- The trade-off with PQ is that it can decrease recall due to the loss of information during compression. The "rescoring trick" mentioned in the context is likely a technique to mitigate this issue and improve the recall of the compressed vectors.
- The performance and efficiency of PQ depend on various factors, including the number of centroids used and the cost of fitting the KMeans clustering algorithm.
- The provided content includes a detailed explanation of how PQ works, along with visual aids, and also highlights the potential memory savings achieved through PQ.
- The context concludes with a set of highlights, including plans to discus
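When appending the question to the optimized instruction, note that a plain string such as `"\nQuery: {query}\nAnswer:"` (without an `f` prefix) leaves `{query}` as literal text. As far as we know, Weaviate sends `grouped_task` to the model as written (placeholder filling from object properties is a `single_prompt` feature), so the model would see `{query}` rather than the question. The sketch below is a hypothetical helper, not a Weaviate API, that builds the prompt with an f-string and returns a new string instead of mutating `task_description`.

```python
def build_grouped_task(instruction: str, query: str) -> str:
    """Append the user's question to an instruction, producing a prompt for grouped_task."""
    return f"{instruction}\nQuery: {query}\nAnswer:"


instruction = "Answer the question using the context."  # placeholder instruction
query = "What is Product Quantization?"

plain = instruction + "\nQuery: {query}\nAnswer:"        # no f-prefix: braces stay literal
print("{query}" in plain)                                # True
print(query in build_grouped_task(instruction, query))   # True
```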