[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/operations/langwatch/weaviate_dspy_visualization.ipynb)

# DSPy with Weaviate + LangWatch DSPy Visualizer

This notebook shows an example of a DSPy RAG program that uses Weaviate as the vector database and LangWatch to visualize the DSPy optimization process.

In [None]:
# Install weaviate and dspy along with langwatch for the visualization
%pip install weaviate-client "dspy-ai[weaviate]" langwatch

## 1. Load Data into Weaviate

You need a running Weaviate cluster with data:

1. Learn about the installation options [here](https://weaviate.io/developers/weaviate/installation), or use the `./docker-compose.yml` file, which uses Cohere for embeddings.
2. Import your data:

    a. You can follow the [Weaviate-Import.ipynb](../../llm-frameworks/dspy/Weaviate-Import.ipynb) notebook to load in the Weaviate blogs
  
    b. Or follow this [Quickstart Guide](https://weaviate.io/developers/weaviate/quickstart)


## 2. Prepare the LLM and Retriever

In [1]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("Enter your OPENAI_API_KEY: ")

import dspy
from dspy.retrieve.weaviate_rm import WeaviateRM
import weaviate

llm = dspy.OpenAI(
    model="gpt-4o-mini",
    max_tokens=4096,
    temperature=0,
    api_key=os.environ["OPENAI_API_KEY"]
)

print("LLM test response:", llm("hello there"))

weaviate_client = weaviate.connect_to_local()
retriever_model = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate_client)

print("Retriever test response:", retriever_model("LLMs")[0])

dspy.settings.configure(lm=llm, rm=retriever_model)

LLM test response: ['Hello! How can I assist you today?']
Retriever test response: {'long_text': "LLMs are a versatile tool that is seen in many applications like chatbots, content creation, and much more. Despite being a powerful tool, LLMs have the drawback of being too general. Reasoning: Let's think step by step in order to **produce the query. We need to identify the unique aspects of the document that would allow us to formulate a question that this document can answer. The document seems to focus on the combination of LangChain and Weaviate, mentioning the benefits of LangChain in overcoming limitations of LLMs such as hallucination and limited input lengths."}
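Conceptually, the retriever configured here (`WeaviateRM`, later called through `dspy.Retrieve(k=...)`) returns the passages whose vectors are most similar to the query. The sketch below is a toy, in-memory stand-in for that idea. It uses cosine similarity over hand-made 3-d vectors. The corpus, vectors, and `top_k` helper are hypothetical. Weaviate itself uses learned embeddings (Cohere, in the provided `docker-compose.yml`) and an approximate nearest-neighbor (HNSW) index rather than a brute-force scan.

```python
import math

# Hypothetical toy corpus: passage text -> hand-made 3-d "embedding".
CORPUS = {
    "LLMs are versatile but can be too general.": [0.9, 0.1, 0.0],
    "Chunk long documents to respect the 512-token limit.": [0.2, 0.9, 0.1],
    "Weaviate stores objects alongside their vectors.": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, corpus, k=3):
    """Return the k passages whose vectors are most similar to query_vec (brute force)."""
    ranked = sorted(corpus, key=lambda text: cosine(query_vec, corpus[text]), reverse=True)
    return ranked[:k]

# A query vector pointing mostly along the "chunking" axis.
print(top_k([0.1, 1.0, 0.0], CORPUS, k=2))
```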




## 3. Prepare the Dataset

In [2]:
import httpx

dataset = httpx.get("https://raw.githubusercontent.com/weaviate/recipes/main/integrations/llm-agent-frameworks/dspy/WeaviateBlogRAG-0-0-0.json").json()

# Wrap each row as a dspy.Example; "question" is the input field, "gold_answer" is the label.
data = [
    dspy.Example(gold_answer=row["gold_answer"], question=row["query"]).with_inputs("question")
    for row in dataset
]

trainset, devset = data[:30], data[30:50]

len(trainset), len(devset)

(30, 20)
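The split above takes the first 30 rows for training and the next 20 for evaluation. Any ordering in the JSON file therefore carries over into the split. If that matters for your data, one option is a seeded shuffle before slicing. The sketch below is a plain-Python illustration of that alternative. The `split_examples` helper and the stand-in rows are hypothetical and not part of the original recipe. In the notebook you would wrap each resulting row in `dspy.Example(...)` as before.

```python
import random

def split_examples(rows, n_train=30, n_dev=20, seed=0):
    """Shuffle a copy of rows with a fixed seed, then slice train/dev sets."""
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG -> reproducible split
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    return train, dev

# Stand-in rows shaped like the JSON records loaded above.
rows = [{"query": f"q{i}", "gold_answer": f"a{i}"} for i in range(60)]
train, dev = split_examples(rows)
print(len(train), len(dev))
```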

## 4. Define the RAG model

In [3]:
class GenerateAnswer(dspy.Signature):
    """Assess the the context and answer the question."""

    context = dspy.InputField(desc="Helpful information for answering the question.")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="A detailed answer that is supported by the context.")


class RAG(dspy.Module):
    def __init__(self, k=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=k)
        self.generate_answer = dspy.Predict(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        pred = self.generate_answer(context=context, question=question).answer
        return dspy.Prediction(context=context, answer=pred, question=question)


dev_example = devset[0]
print(f"[Devset] Question: {dev_example.question}")
print(f"[Devset] Answer: {dev_example.gold_answer}")

uncompiled_rag = RAG()

pred = uncompiled_rag(question=dev_example.question)

# Print the input and the prediction.
print(f"[Prediction] Question: {dev_example.question}")
print(f"[Prediction] Predicted Answer: {pred.answer}")

[Devset] Question: What is the strategy for chunking text for vectorization when dealing with a 512 token length limit?
[Devset] Answer: The strategy for chunking text for vectorization when dealing with a 512 token length limit involves using a Large Language Model to identify suitable places to cut up text chunks. This process, known as "chunking", breaks down long documents into smaller sections, each containing an important piece of information. This approach not only helps to stay within the LLMs token limit but also enhances the retrieval of information. It's important to note that the chunking should be done thoughtfully, not just splitting a list of items into 2 chunks because the first half fell into the tail end of a chunk[:512] loop.
[Prediction] Question: What is the strategy for chunking text for vectorization when dealing with a 512 token length limit?
[Prediction] Predicted Answer: Context: The provided context discusses the limitations of vector embedding models, partic
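`RAG.forward` is a two-step pipeline. It retrieves `k` passages, then conditions the answer generator on them. Without DSPy, the control flow looks roughly like the sketch below. `fake_retrieve` and `fake_generate` are hypothetical stand-ins for `dspy.Retrieve` and `dspy.Predict(GenerateAnswer)`, and a plain dict plays the role of `dspy.Prediction`.

```python
from typing import Callable, List

def make_rag(retrieve: Callable[[str, int], List[str]],
             generate: Callable[[List[str], str], str],
             k: int = 3):
    """Build a retrieve-then-generate function mirroring RAG.forward."""
    def forward(question: str) -> dict:
        context = retrieve(question, k)       # step 1: top-k passages
        answer = generate(context, question)  # step 2: answer conditioned on context
        return {"context": context, "answer": answer, "question": question}
    return forward

# Hypothetical stubs standing in for the retriever and the LLM call.
def fake_retrieve(question: str, k: int) -> List[str]:
    return [f"passage {i} about {question}" for i in range(k)]

def fake_generate(context: List[str], question: str) -> str:
    return f"Answer to {question!r} using {len(context)} passages."

rag = make_rag(fake_retrieve, fake_generate)
pred = rag("chunking")
print(pred["answer"])
```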

## 5. Define your Metric

In [110]:
class TypedEvaluator(dspy.Signature):
    """Evaluate the quality of a system's answer to a question according to a given criterion.
    Please be a bit harsh: only give a 5 to a truly above-and-beyond answer.
    """

    criterion: str = dspy.InputField(desc="The evaluation criterion.")
    question: str = dspy.InputField(desc="The question asked to the system.")
    ground_truth_answer: str = dspy.InputField(desc="An expert written Ground Truth Answer to the question.")
    predicted_answer: str = dspy.InputField(desc="The system's answer to the question.")
    rating: float = dspy.OutputField(desc="A float rating between 1 and 5")


def MetricWrapper(gold, pred, trace=None):
    alignment_criterion = "How aligned is the predicted_answer with the ground_truth?"
    # Ask the LLM judge for a 1-5 rating of how well the prediction matches the gold answer.
    return dspy.TypedPredictor(TypedEvaluator)(
        criterion=alignment_criterion,
        question=gold.question,
        ground_truth_answer=gold.gold_answer,
        predicted_answer=pred.answer,
    ).rating

from dspy.evaluate.evaluate import Evaluate

evaluate = Evaluate(devset=devset, num_threads=4, display_progress=False)

uncomplied_score = evaluate(RAG(), metric=MetricWrapper)
uncomplied_score

422.5
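The metric returns a rating between 1 and 5, yet the evaluation score is 422.5. The reason is that `Evaluate` reports `100 * sum(metric values) / len(devset)`, which reads as a percentage when the metric returns values in [0, 1]. This assumes the `dspy-ai` 2.4-era releases that this notebook's `dspy.OpenAI` and `TypedPredictor` calls suggest. With 1-5 ratings, the score is 100 times the mean rating, so 422.5 corresponds to an average of 4.225 out of 5. The sketch below reproduces that arithmetic with illustrative ratings, not the notebook's real per-example scores. The `evaluate_score` helper is hypothetical. Dividing the rating by 5 inside the metric is one option if you prefer a 0-100 scale.

```python
def evaluate_score(ratings):
    """Mimic the aggregate: 100 * (sum of metric values) / (number of examples)."""
    return round(100 * sum(ratings) / len(ratings), 2)

# Illustrative ratings for a 20-example devset (NOT the notebook's actual scores).
ratings = [4.0] * 11 + [4.5] * 9
print(evaluate_score(ratings))                   # 100 x mean rating on the 1-5 scale
print(evaluate_score([r / 5 for r in ratings]))  # same ratings normalized to 0-100
```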

## 6. Connect to LangWatch

In [4]:
import langwatch

langwatch.login()

Please go to https://app.langwatch.ai/authorize to get your API key
LangWatch API key set


## 7. Start Training Session!

This run will cost around $0.40.

In [None]:
from dspy.teleprompt import MIPROv2

# Use gpt-4o as the prompt model to teach gpt-4o-mini
teacher = dspy.OpenAI(
    model="gpt-4o", max_tokens=4096, temperature=0, api_key=os.environ["OPENAI_API_KEY"]
)

# Set up a MIPROv2 optimizer, which will compile our RAG program.
optimizer = MIPROv2(
    metric=MetricWrapper,
    prompt_model=teacher,
    task_model=llm,
    num_candidates=3,
    init_temperature=0.7,
)

# Initialize langwatch for this run, to track the optimizer compilation
langwatch.dspy.init(experiment="weaviate-blog-rag-experiment", optimizer=optimizer)

# Compile
compiled_rag = optimizer.compile(
    RAG(),
    trainset=trainset,
    num_batches=10,
    max_bootstrapped_demos=5,
    max_labeled_demos=5,
    eval_kwargs=dict(num_threads=16, display_progress=True, display_table=0),
)

Screenshot:

![optimization screenshot](./optimization_screenshot.png)
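Roughly, MIPROv2 proceeds in three stages:

1. It proposes candidate instructions with the prompt model.
2. It bootstraps few-shot demo sets.
3. It searches over instruction/demo combinations, scoring each trial on minibatches of the training set.

The real optimizer drives that search with Bayesian optimization. The sketch below swaps in plain random search over hypothetical candidates with a stub scoring function. It only shows the shape of the loop that parameters like `num_candidates` and `num_batches` control, and it does not reproduce MIPROv2's algorithm.

```python
import random

def random_search(instructions, demo_sets, score_fn, num_trials=10, seed=0):
    """Try random (instruction, demo set) pairs and keep the best-scoring one."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(num_trials):
        candidate = (rng.choice(instructions), rng.choice(demo_sets))
        score = score_fn(*candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Hypothetical candidates (the notebook's config uses num_candidates=3).
instructions = ["Answer concisely.", "Answer using the context.", "Cite the context in detail."]
demo_sets = [0, 1, 2]  # indices standing in for bootstrapped demo sets

def stub_score(instruction, demo_set):
    # Stand-in for running the program on a minibatch and averaging the metric.
    return len(instruction) / 10 + demo_set

best, score = random_search(instructions, demo_sets, stub_score, num_trials=30)
print(best, score)
```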

In [115]:
complied_score = evaluate(compiled_rag, metric=MetricWrapper)
print(complied_score)

print(f"Congratulations! We optimized the RAG program and bumped the score from {uncomplied_score} to {complied_score}!")

430.0
Congratulations! We optimized the RAG program and bumped the score from 422.5 to 430.0!
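Assuming `Evaluate` reports 100 times the mean metric value, the 7.5-point gain is a mean rating increase of 0.075, from 4.225 to 4.3 on the 1-5 scale. Across 20 devset examples that is 1.5 rating points in total. With a devset this small, the difference may be within noise, and a larger devset would give a clearer signal. The arithmetic:

```python
score_before, score_after = 422.5, 430.0  # scores printed above
n_dev = 20                                # number of devset examples

mean_before = score_before / 100          # mean rating before optimization
mean_after = score_after / 100            # mean rating after optimization
delta = mean_after - mean_before
print(f"mean rating: {mean_before} -> {mean_after} (+{delta:.3f})")
print(f"total rating points gained on the devset: {delta * n_dev:.1f}")
print(f"relative gain: {100 * delta / mean_before:.2f}%")
```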


## 8. Save the optimized RAG

You can now save your optimized RAG program and load it later for inference.

In [None]:
compiled_rag.save("optimized_rag.json")
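`save` writes the optimized program's learned state, such as its instructions and few-shot demos, to JSON. `load` restores that state into a freshly constructed `RAG()`. The plain-Python sketch below shows the same round trip with a hypothetical state dictionary. Its keys are illustrative and not DSPy's actual file schema.

```python
import json
import os
import tempfile

# Hypothetical learned state for one predictor (illustrative keys only).
state = {
    "generate_answer": {
        "instructions": "Assess the context and answer the question.",
        "demos": [{"question": "What is chunking?", "answer": "Splitting text into smaller sections."}],
    }
}

path = os.path.join(tempfile.mkdtemp(), "optimized_rag.json")
with open(path, "w") as f:
    json.dump(state, f, indent=2)  # analogous to compiled_rag.save(path)

with open(path) as f:
    restored = json.load(f)        # analogous to RAG().load(path)

print(restored == state)
```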

## 9. Final Step: Instrument your DSPy program for Production

Now that you have your optimized RAG program, you are ready to deploy it. Usage in production, however, can be unpredictable.

To keep track of which documents your RAG program retrieves and uses in production, instrument it with the `langwatch.trace()` decorator (see the [LangWatch docs](https://docs.langwatch.ai/integration/python/guide#capturing-llm-spans) for more details).
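A tracing decorator wraps each call so that its inputs, outputs, and timing are recorded. The sketch below is a hypothetical, in-process `trace` decorator written in plain Python to illustrate that idea. It is not LangWatch's implementation. LangWatch sends traces to its platform and, through `autotrack_dspy()`, is meant to also capture the DSPy module and LLM calls made inside the traced function.

```python
import functools
import time

TRACES = []  # in-memory sink; a real tracer would export these elsewhere

def trace(func):
    """Record the arguments, result, and duration of each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACES.append({
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
            "duration_s": time.perf_counter() - start,
        })
        return result
    return wrapper

@trace
def generate_response(question: str) -> str:
    # Hypothetical stand-in for compiled_rag(question=question).
    return f"answer to: {question}"

generate_response("How should text be chunked?")
print(TRACES[-1]["name"], TRACES[-1]["output"])
```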

In [5]:
import langwatch

compiled_rag = RAG()
compiled_rag.load("optimized_rag.json")

@langwatch.trace()
def generate_response(question: str):
    # Grab the active trace once, enable DSPy auto-tracking, and get a shareable link
    current_trace = langwatch.get_current_trace()
    current_trace.autotrack_dspy()
    public_url = current_trace.share()
    print(f"Trace Public URL: {public_url}")

    return compiled_rag(question=question)


generate_response(dev_example.question)

Trace Public URL: https://app.langwatch.ai/share/iNwGfWN3E3EzkoNjQxwK1


Prediction(
    context=['We can then vectorize this text description using off-the-shelf models from OpenAI, Cohere, HuggingFace, and others to unlock semantic search. We recently presented an example of this idea for [AirBnB listings](https://weaviate.io/blog/generative-feedback-loops-with-llms), translating tabular data about each property’s price, neighborhood, and more into a text description. Huge thanks to Svitlana Smolianova for creating the following animation of the concept. <img\n    src={require(\'./img/gen-example.gif\').default}\n    alt="alt"\n    style={{ width: "100%" }}\n/>\n\n### Text Chunking\nSimilarly related to the 512 token length for vectorizing text chunks, we may consider using the Large Language Model to identify good places to cut up text chunks. For example, if we have a list of items, it might not be best practice to separate the list into 2 chunks because the first half fell into the tail end of a chunk[:512] loop.', '### Summarization Indexes\nVector em

Screenshot:

![Tracing Screenshot](./tracing_screenshot.png)