[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/llm-agent-frameworks/dspy/1.Getting-Started-with-RAG-in-DSPy.ipynb)

# Getting Started with RAG in DSPy

This notebook will show you how to use DSPy to compile a RAG program! DSPy compilation is a fairly new tool for LLM developers, so let's start with an overview of the concept. By `compiling`, we mean finding the prompts that elicit the behavior we want from LLMs when connected in some kind of pipeline.

For example, RAG is a very common LLM pipeline. In its simplest form, RAG consists of 2 steps: (1) Retrieve and (2) Answer a Question. Part (2), Answering a Question, has an associated prompt. For example, people generally use:

```
--

Please answer the question based on the following context.

context  {context}

question {question}

--
```
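As a minimal sketch of how such a template gets filled at inference time, the snippet below substitutes a retrieved passage and a question into the prompt. The names `PROMPT_TEMPLATE` and `build_prompt`, and the sample context string, are hypothetical and only for illustration; they are not part of DSPy.

```python
# Hypothetical helper (not a DSPy API): fills the hand-written RAG prompt.
PROMPT_TEMPLATE = (
    "Please answer the question based on the following context.\n\n"
    "context  {context}\n\n"
    "question {question}\n"
)


def build_prompt(context: str, question: str) -> str:
    """Substitute the retrieved context and the user question into the template."""
    return PROMPT_TEMPLATE.format(context=context, question=question)


# The context string here is made up; in a real pipeline it comes from the retriever.
prompt = build_prompt(
    context="Weaviate is an open-source vector database.",
    question="What is Weaviate?",
)
print(prompt)
```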

This prompt is a reasonable starting point for an LLM to understand the task, but it is not the *optimal* prompt. DSPy optimizes the prompt for you by jointly (1) tweaking the instructions, such as rewriting an initial prompt like:

```
Please answer the question based on the following context.
```

to 

```
Assess the context and answer the given questions that are predominantly about software usage, process optimization, and troubleshooting. Focus on providing accurate information related to tech or software-related queries.
```

Further, DSPy (2) finds examples of desired input-output pairs to include in the prompt to further improve performance, also known as `In-Context Learning`. In this example, we will begin with the simple prompt: `Please answer the question based on the following context.` and end up with:

```

```

In order to leverage black-box optimization techniques like random search, Bayesian optimization, or evolutionary algorithms, we need a metric. Coming up with metrics to describe desired system behavior has been a longstanding challenge in Machine Learning research. Excitingly, LLMs have made amazing progress here. For example, we can evaluate a RAG answer by prompting an LLM with, `Is the assessed text grounded in the context? Say no if it includes significant facts not in the context`. We then optimize the RAG program to increase the metric LLM's assessment of answer quality.
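To make the idea of "optimize against a metric" concrete, here is a toy sketch of random search over candidate instructions. Everything in it is an assumption for illustration: the candidate strings are made up, `toy_metric` is a keyword-counting stand-in for an LLM judge, and `random_search` is not how DSPy's optimizers are implemented.

```python
import random

# Hypothetical candidate instructions; a real optimizer would propose these.
CANDIDATES = [
    "Please answer the question based on the following context.",
    "Answer using only facts found in the context.",
    "Assess the context, then answer the question accurately and concisely.",
]


def toy_metric(instruction: str) -> float:
    """Stand-in for an LLM judge: counts grounding-related keywords."""
    keywords = ("context", "facts", "accurately")
    return float(sum(word in instruction.lower() for word in keywords))


def random_search(candidates, metric, num_trials=10, seed=0):
    """Score randomly drawn candidates (without replacement) and keep the best."""
    rng = random.Random(seed)
    sampled = rng.sample(candidates, k=min(num_trials, len(candidates)))
    best, best_score = None, float("-inf")
    for candidate in sampled:
        score = metric(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


best_instruction, best_score = random_search(CANDIDATES, toy_metric)
print(best_score, best_instruction)
```

The only thing the search needs from the metric is a number per candidate, which is why an LLM-based judge can be dropped in where `toy_metric` sits.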

This example contains 5 parts:

- 0: DSPy Settings and Installation
- 1: DSPy Datasets with `dspy.Example`
- 2: LLM Metrics in DSPy
- 3: LLM Programming with `dspy.Module`
- 4: Optimization with `BootstrapFewShot`, `BootstrapFewShotWithRandomSearch`, and `BayesianSignatureOptimizer`.


We are using 2 datasets for this example. First, we have an index of the Weaviate Blog Posts. We will use the Weaviate Blog Posts as the retrieved context to help with our second dataset, the Weaviate FAQs. The Weaviate FAQs consist of 44 question-answer pairs of frequently asked Weaviate questions such as: `Do I need to know about Docker (Compose) to use Weaviate?`

We isolate 10 examples to use as our test set and optimize our program with the remaining 34.

Our uncompiled RAG program achieves a score of 270 on the held-out test set.

Our RAG program compiled with the `BayesianSignatureOptimizer` achieves a score of 340! A ~26% improvement!
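The relative improvement between the two scores quoted above can be checked with a couple of lines. The helper name `relative_improvement` is ours, not part of DSPy.

```python
def relative_improvement(before: float, after: float) -> float:
    """Percentage change going from `before` to `after`."""
    return 100.0 * (after - before) / before


# Scores quoted above: 270 (uncompiled) and 340 (compiled).
improvement = relative_improvement(270, 340)
print(round(improvement, 1))  # 25.9
```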

# 0: DSPy Settings and Installations

In [None]:
!pip install dspy-ai==2.1.9 weaviate-client==3.26.2 > /dev/null

In [1]:
import openai
openai.api_key = "sk-foobar"

In [4]:
# Connect to Weaviate Retriever and configure LLM
import dspy
from dspy.retrieve.weaviate_rm import WeaviateRM
import weaviate
import openai


llm = dspy.OpenAI(model="gpt-3.5-turbo")

# ollamaLLM = dspy.OpenAI(api_base="http://localhost:11434/v1/", api_key="ollama", model="mistral-7b-instruct-v0.2-q6_K", stop='\n\n', model_type='chat')
# Thanks Knox! https://twitter.com/JR_Knox1977/status/1756018720818794916/photo/1

weaviate_client = weaviate.Client("http://localhost:8080")
retriever_model = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate_client)
# Assumes the Weaviate collection has a text key `content`
dspy.settings.configure(lm=llm, rm=retriever_model)



In [6]:
print(dspy.settings.lm("Write a 3 line poem about neural networks."))
context_example = dspy.OpenAI(model="gpt-4")

# dspy.context temporarily overrides settings; the keyword for the language model is `lm`
with dspy.context(lm=context_example):
    print(context_example("Write a 3 line poem about neural networks."))

['Neurons intertwine,\nSynapses ignite, design,\nMinds in code align.']
['In the realm of silicon thought,\nNeural networks, with knowledge fraught.\nA dance of data, endlessly taught.']


# 1. DSPy Datasets with `dspy.Example`

Our retrieval engine is filled with chunks from the Weaviate Blogs.

Please see weaviate/recipes/integrations/dspy/Weaviate-Import.ipynb for a full tutorial.

# Import FAQs from a markdown file

In [7]:
# Load FAQs
import re

# Use a context manager so the file handle is closed after reading
with open("faq.md") as f:
    markdown_content = f.read()

def parse_questions(markdown_content):
    # Regular expression pattern for finding questions
    question_pattern = r'#### Q: (.+?)\n'

    # Finding all questions
    questions = re.findall(question_pattern, markdown_content, re.DOTALL)

    return questions

# Parsing the markdown content to get only questions
questions = parse_questions(markdown_content)

# Displaying the first few extracted questions
questions[:5]  # Displaying only the first few for brevity

['Why would I use Weaviate as my vector database?',
 'What is the difference between Weaviate and for example Elasticsearch?',
 'Do you offer Weaviate as a managed service?',
 'How should I configure the size of my instance?',
 'Do I need to know about Docker (Compose) to use Weaviate?']

In [8]:
len(questions)

44
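To see what the question pattern captures, here is a small sketch run on an inline sample. The sample Markdown (including the `A:` answer lines) is hypothetical and only mimics the `#### Q:` heading style; the real `faq.md` is not reproduced here.

```python
import re

# Hypothetical sample text; only the "#### Q: ..." heading style is taken from the notebook.
sample_markdown = (
    "## General\n\n"
    "#### Q: Why would I use Weaviate as my vector database?\n\n"
    "A: Example answer text.\n\n"
    "#### Q: Do you offer Weaviate as a managed service?\n\n"
    "A: Another example answer.\n"
)

# The lazy group stops at the first newline, so each match is one heading line.
question_pattern = r"#### Q: (.+?)\n"
sample_questions = re.findall(question_pattern, sample_markdown)
print(sample_questions)
```

For heading lines like these, the lazy `(.+?)` followed by `\n` matches a single line whether or not `re.DOTALL` is passed.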

# Wrap each FAQ into an `Example` object

The dspy `Example` object optionally lets you attach metadata, or additional labels, to input/output pairs.

For example, you may want to jointly supervise the answer as well as the context the retrieval system produced to feed into the answer generator.

In [9]:
# Load into dspy datasets
import dspy

# ToDo, add random splitting -- maybe wrap this entire thing in a cross-validation loop
trainset = questions[:20] # 20 examples for training
devset = questions[20:30] # 10 examples for development
testset = questions[30:] # 14 examples for testing

trainset = [dspy.Example(question=question).with_inputs("question") for question in trainset]
devset = [dspy.Example(question=question).with_inputs("question") for question in devset]
testset = [dspy.Example(question=question).with_inputs("question") for question in testset]
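The ToDo in the cell above mentions random splitting. One possible sketch, assuming a fixed seed is acceptable for reproducibility, is to shuffle a copy of the questions before slicing. The function name `split_questions`, the seed value, and the placeholder questions are our own choices for illustration.

```python
import random


def split_questions(questions, n_train=20, n_dev=10, seed=42):
    """Shuffle a copy of the questions, then slice it into train/dev/test lists."""
    shuffled = list(questions)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# Placeholder strings stand in for the 44 parsed FAQ questions.
demo_questions = [f"question {i}" for i in range(44)]
train_q, dev_q, test_q = split_questions(demo_questions)
print(len(train_q), len(dev_q), len(test_q))  # 20 10 14
```

Each resulting list could then be wrapped in `dspy.Example(...).with_inputs("question")` exactly as in the cell above.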

In [12]:
devset[0]

Example({'question': 'Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)'}) (input_keys={'question'})

# 2. LLM Metrics

Define a Metric for Performance.

In [17]:
# This is a WIP, the next step is to optimize this metric as itself a DSPy module (pretty meta)

# Reference - https://github.com/stanfordnlp/dspy/blob/main/examples/tweets/tweet_metric.py

metricLM = dspy.OpenAI(model='gpt-4', max_tokens=1000, model_type='chat')

# Signature for LLM assessments.

class Assess(dspy.Signature):
    """Assess the quality of an answer to a question."""
    
    context = dspy.InputField(desc="The context for answering the question.")
    assessed_question = dspy.InputField(desc="The evaluation criterion.")
    assessed_answer = dspy.InputField(desc="The answer to the question.")
    assessment_answer = dspy.OutputField(desc="A rating between 1 and 5. Only output the rating and nothing else.")

def llm_metric(gold, pred, trace=None):
    predicted_answer = pred.answer
    question = gold.question
    
    print(f"Test Question: {question}")
    print(f"Predicted Answer: {predicted_answer}")
    
    detail = "Is the assessed answer detailed?"
    faithful = "Is the assessed text grounded in the context? Say no if it includes significant facts not in the context."
    overall = f"Please rate how well this answer answers the question, `{question}` based on the context.\n `{predicted_answer}`"
    
    with dspy.context(lm=metricLM):
        context = dspy.Retrieve(k=5)(question).passages
        detail = dspy.ChainOfThought(Assess)(context="N/A", assessed_question=detail, assessed_answer=predicted_answer)
        faithful = dspy.ChainOfThought(Assess)(context=context, assessed_question=faithful, assessed_answer=predicted_answer)
        overall = dspy.ChainOfThought(Assess)(context=context, assessed_question=overall, assessed_answer=predicted_answer)
    
    print(f"Faithful: {faithful.assessment_answer}")
    print(f"Detail: {detail.assessment_answer}")
    print(f"Overall: {overall.assessment_answer}")
    
    
    total = float(detail.assessment_answer) + float(faithful.assessment_answer)*2 + float(overall.assessment_answer)
    
    return total / 5.0
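Note that `float(...)` raises a `ValueError` when the judge replies with something other than a number (for instance `No`, which happens in the optimization logs later in this notebook). A defensive sketch is to extract the first standalone digit between 1 and 5 and fall back to a default otherwise. The helper `parse_rating` and the choice of `1.0` as the fallback are assumptions, not part of the original metric.

```python
import re


def parse_rating(text: str, default: float = 1.0) -> float:
    """Return the first standalone digit 1-5 in `text`, or `default` if none is found."""
    match = re.search(r"\b[1-5]\b", text)
    return float(match.group()) if match else default


print(parse_rating("5"))          # 5.0
print(parse_rating("Rating: 4"))  # 4.0
print(parse_rating("No"))         # 1.0 (falls back to the default)
```

With a helper like this, each `float(x.assessment_answer)` in `llm_metric` could be swapped for `parse_rating(x.assessment_answer)`. Whether a non-numeric reply should count as the lowest rating is a judgment call.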

## Inspect the metric

In [18]:
test_example = dspy.Example(question="What do cross encoders do?")
test_pred = dspy.Example(answer="They re-rank documents.")

type(llm_metric(test_example, test_pred))

Test Question: What do cross encoders do?
Predicted Answer: predicted_answer
Faithful: 5
Detail: 1
Overall: 5


float

In [19]:
test_example = dspy.Example(question="What do cross encoders do?")
test_pred = dspy.Example(answer="They index data.")

type(llm_metric(test_example, test_pred))

Test Question: What do cross encoders do?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 1
Overall: 1


float

In [20]:
metricLM.inspect_history(n=3)





Assess the quality of an answer to a question.

---

Follow the following format.

Context: The context for answering the question.

Assessed Question: The evaluation criterion.

Assessed Answer: The answer to the question.

Reasoning: Let's think step by step in order to ${produce the assessment_answer}. We ...

Assessment Answer: A rating between 1 and 5. Only output the rating and nothing else.

---

Context: N/A

Assessed Question: Is the assessed answer detailed?

Assessed Answer: They index data.

Reasoning: Let's think step by step in order to produce the assessment answer. We need to consider if the answer provides enough detail to fully answer the question. In this case, the answer "They index data" is very vague and does not provide any specific details about who "they" are or what it means to "index data".

Assessment Answer: 1







Assess the quality of an answer to a question.

---

Follow the following format.

Context: The context for answering the questi

# 3. The DSPy Programming Model

This first block of code will initialize the `GenerateAnswer` signature.

Then we will compose a `dspy.Module` consisting of:
- Retrieve
- GenerateAnswer

The DSPy programming model is one of the most powerful aspects of DSPy. We get:
- An intuitive interface to compose prompts into programs.
- A clean way to organize prompts into Signatures.
- Structured output parsing with `dspy.OutputField`
- Built-in prompt extensions such as `ChainOfThought`, `ReAct`, and more!

In [23]:
class GenerateAnswer(dspy.Signature):
    """Answer questions based on the context."""
    
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField()

In [24]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(answer=prediction.answer)

# A little more info on built-in dspy modules

The DSPy programming model gives you a lot of cool features out of the box. Observe how different modules implement signatures with additional prompting techniques like `ChainOfThought` and `ReAct`. `Predict` is the base class, which shows what a standard prompt looks like without the module extensions.

### dspy.Predict

In [30]:
dspy.Predict(GenerateAnswer)(question="What are Cross Encoders?")
llm.inspect_history(n=1)





Answer questions based on the context.

---

Follow the following format.

Context: may contain relevant facts
Question: ${question}
Answer: ${answer}

---

Question: What are Cross Encoders?
Answer: Context: Cross Encoders are a type of neural network model used in natural language processing tasks. They are designed to encode pairs of sentences and generate a similarity score between them.

Question: How do Cross Encoders work?
Answer: Cross Encoders work by taking two sentences as input and encoding them into fixed-length vectors. These vectors are then compared using a similarity metric to determine the similarity score between the sentences.

Question: What is the purpose of Cross Encoders?
Answer: The purpose of Cross Encoders is to measure the semantic similarity between pairs of sentences. They can be used in various applications such as question answering, text classification, and information retrieval.

Question: How are Cross Encoders different from other neural net

### dspy.ChainOfThought

In [31]:
dspy.ChainOfThought(GenerateAnswer)(question="What are Cross Encoders?")
llm.inspect_history(n=1)





Answer questions based on the context.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
produce the answer. We will first define what cross encoders are and then explain their purpose.

Answer: Cross Encoders are a type of neural network model that are used in natural language processing tasks. They are designed to encode pairs of sentences or documents and produce a similarity score between them. The purpose of cross encoders is to capture the semantic relationship between two pieces of text and determine how similar or related they are.

Question: What are Cross Encoders?

Reasoning: Let's think step by step in order to produce the answer. We will first define what cross encoders are and then explain their purpose.

Answer: Cross Encoders are a type of neural network model that are used in natural language processing t

### dspy.ReAct

In [32]:
dspy.ReAct(GenerateAnswer, tools=[dspy.settings.rm])(question="What are cross encoders?")
llm.inspect_history(n=1)





You will be given `context`, `question` and you will respond with `answer`.

To do this, you will interleave Thought, Action, and Observation steps.

Thought can reason about the current situation, and Action can be the following types:

(1) Search[query], which takes a search query and returns one or more potentially relevant passages from a corpus
(2) Finish[answer], which returns the final `answer` and finishes the task

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Thought 1: next steps to take based on last observation

Action 1: always either Search[query] or, when done, Finish[answer]

Observation 1: observations based on action

Thought 2: next steps to take based on last observation

Action 2: always either Search[query] or, when done, Finish[answer]

---

Context:
Cross encoders are a type of neural network model used in natural language processing tasks. They are designed to encode pairs of sentences or documents and captu

# Initialize DSPy Program

In [33]:
uncompiled_rag = RAG()

# Test uncompiled inference 

In [34]:
print(uncompiled_rag("What are re-rankers in search engines?").answer)

Re-rankers in search engines are algorithms or models that are used to reorder or re-rank search results based on additional criteria or features. They can be used to improve the relevance and personalization of search results by considering factors such as user preferences, document features, metadata, and context.


# Check the last call to the LLM

In [35]:
llm.inspect_history(n=1)





Answer questions based on the context.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «They offer the advantage of further reasoning about the relevance of results without needing specialized training. Cross Encoders can be interfaced with Weaviate to re-rank search results, trading off performance for slower search speed. * **Metadata Rankers** are context-based re-rankers that use symbolic features to rank relevance. They take into account user and document features, such as age, gender, location, preferences, release year, genre, and box office, to predict the relevance of candidate documents. By incorporating metadata features, these rankers offer a more personalized and context-aware search experience.»
[2] «Taken directly from the paper, “Our findings indicate that cross-encoder re-rankers can efficiently be impro

# 4. DSPy Optimization

# Evaluate our RAG Program before it is compiled

In [38]:
# Reminder our dataset looks like this:

devset[0]

Example({'question': 'Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)'}) (input_keys={'question'})

In [39]:
from dspy.evaluate.evaluate import Evaluate

evaluate = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=5)

evaluate(RAG(), metric=llm_metric)


  0%|                                                    | 0/10 [00:00<?, ?it/s]

Test Question: Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)
Predicted Answer: predicted_answer


Average Metric: 2.0 / 1  (200.0):  10%|█         | 1/10 [00:00<00:04,  2.03it/s]

Faithful: 1
Detail: 3
Overall: 5
Test Question: How can I retrieve the total object count in a class?
Predicted Answer: predicted_answer


Average Metric: 3.4 / 2  (170.0):  20%|██        | 2/10 [00:00<00:03,  2.28it/s]

Faithful: 1
Detail: 4
Overall: 1
Test Question: How do I get the cosine similarity from Weaviate's certainty?
Predicted Answer: predicted_answer


Average Metric: 5.0 / 3  (166.7):  30%|███       | 3/10 [00:01<00:03,  2.05it/s]

Faithful: 1
Detail: 5
Overall: 1


Average Metric: 7.6 / 4  (190.0):  40%|████      | 4/10 [00:02<00:04,  1.24it/s]

Test Question: The quality of my search results change depending on the specified limit. Why? How can I fix this?
Predicted Answer: predicted_answer
Faithful: 2
Detail: 5
Overall: 4
Test Question: Why did you use GraphQL instead of SPARQL?
Predicted Answer: predicted_answer


Average Metric: 9.6 / 5  (192.0):  50%|█████     | 5/10 [00:03<00:04,  1.09it/s]

Faithful: 1
Detail: 3
Overall: 5
Test Question: What is the best way to iterate through objects? Can I do paginated API calls?
Predicted Answer: predicted_answer


Average Metric: 11.2 / 6  (186.7):  60%|█████▍   | 6/10 [00:04<00:03,  1.30it/s]

Faithful: 1
Detail: 1
Overall: 5
Test Question: What is best practice for updating data?
Predicted Answer: predicted_answer


Average Metric: 14.6 / 7  (208.6):  70%|██████▎  | 7/10 [00:04<00:01,  1.56it/s]

Faithful: 5
Detail: 3
Overall: 4
Test Question: Can I connect my own module?
Predicted Answer: predicted_answer


Average Metric: 18.6 / 8  (232.5):  80%|███████▏ | 8/10 [00:05<00:01,  1.76it/s]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Can I train my own text2vec-contextionary vectorizer module?
Predicted Answer: predicted_answer


Average Metric: 22.6 / 9  (251.1):  90%|████████ | 9/10 [00:05<00:00,  1.84it/s]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Does Weaviate use Hnswlib?
Predicted Answer: predicted_answer


Average Metric: 24.400000000000002 / 10  (244.0): 100%|█| 10/10 [00:06<00:00,  1

Faithful: 1
Detail: 2
Overall: 5
Average Metric: 24.400000000000002 / 10  (244.0%)





| | question | answer | llm_metric |
|---|---|---|---|
| 0 | Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions) | No specific information is provided in the given context about support for multiple versions of the query/document embedding models to co-exist at a given time. | 2.0 |
| 1 | How can I retrieve the total object count in a class? | To retrieve the total object count in a class, you can use the "count" function provided by the programming language or framework you are using.... | 1.4 |
| 2 | How do I get the cosine similarity from Weaviate's certainty? | Weaviate does not directly provide the cosine similarity from its certainty value. The certainty value in Weaviate represents the confidence level of the result, but... | 1.6 |
| 3 | The quality of my search results change depending on the specified limit. Why? How can I fix this? | The quality of search results changes depending on the specified limit because language models are constrained by input length and can only provide a limited... | 2.6 |
| 4 | Why did you use GraphQL instead of SPARQL? | The given context does not provide any information about the use of GraphQL instead of SPARQL. Therefore, we cannot determine the reason for using GraphQL... | 2.0 |


244.0

# Metric Analysis

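The maximum score per example is (5 + 5*2 + 5) / 5 = 4, because the faithfulness rating is counted twice.

Over the 10 evaluation questions, the maximum total is 4 * 10 = 40, which `Evaluate` would display as 400.0 since it reports the average multiplied by 100.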
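The arithmetic can be checked with a short sketch that mirrors the weighting in `llm_metric`. The helper name `combined_score` is ours; the ratings plugged in below are the ones printed for the first evaluation question above (Faithful 1, Detail 3, Overall 5).

```python
def combined_score(detail: float, faithful: float, overall: float) -> float:
    """Same weighting as llm_metric: faithfulness counts double, the sum is divided by 5."""
    return (detail + 2 * faithful + overall) / 5.0


max_per_example = combined_score(detail=5, faithful=5, overall=5)
print(max_per_example)  # 4.0

# Ratings logged for the first evaluation question.
first_example = combined_score(detail=3, faithful=1, overall=5)
print(first_example)  # 2.0

num_questions = 10
max_total = max_per_example * num_questions
max_reported = 100 * max_total / num_questions
print(max_total, max_reported)  # 40.0 400.0
```

On this scale, the reported 244.0 corresponds to an average of about 2.44 out of 4 per question.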

In [40]:
llm.inspect_history(n=1)





Answer questions based on the context.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «There is no performance gain if we have more parallelization than there are available CPUs. However, each thread needs additional memory. So with 32 parallel imports, we had the worst of both worlds: High memory usage and no performance gains beyond 8. With the fix, even if you import from multiple clients, Weaviate automatically handles the parallelization to ensure that it does not exceed the number of CPU cores. As a result, you get the maximum performance without "unnecessary" memory usage. ### HNSW optimization

Next, we optimized memory allocations for the HNSW (vector) index.»
[2] «The past for vector searching definitely was not a “simpler time”, and the appeal of modern vector databases like Weaviate is pretty clear given thi

In [41]:
metricLM.inspect_history(n=3)





Assess the quality of an answer to a question.

---

Follow the following format.

Context: The context for answering the question.

Assessed Question: The evaluation criterion.

Assessed Answer: The answer to the question.

Reasoning: Let's think step by step in order to ${produce the assessment_answer}. We ...

Assessment Answer: A rating between 1 and 5. Only output the rating and nothing else.

---

Context: N/A

Assessed Question: Is the assessed answer detailed?

Assessed Answer: No information is provided in the given context about whether Weaviate uses Hnswlib or not.

Reasoning: Let's think step by step in order to produce the assessment answer. We can see that the answer is not detailed. It simply states that there is no information provided in the given context, but it does not provide any additional information or insight into the topic at hand.

Assessment Answer: 2







Assess the quality of an answer to a question.

---

Follow the following format.

Cont

# BootstrapFewShot

In [42]:
from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=llm_metric, max_labeled_demos=8, max_rounds=3)

# It is also common to instantiate the program here, e.g. RAG()
compiled_rag = teleprompter.compile(uncompiled_rag, trainset=trainset)


  0%|                                                    | 0/20 [00:00<?, ?it/s]

Test Question: Why would I use Weaviate as my vector database?
Predicted Answer: predicted_answer



  5%|██▏                                         | 1/20 [00:19<06:15, 19.74s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: What is the difference between Weaviate and for example Elasticsearch?
Predicted Answer: predicted_answer



 10%|████▍                                       | 2/20 [00:40<06:02, 20.17s/it]

Faithful: No
Detail: 1
Overall: 5
Failed to run or to evaluate example Example({'question': 'What is the difference between Weaviate and for example Elasticsearch?'}) (input_keys={'question'}) with <function llm_metric at 0x281e95360> due to could not convert string to float: 'No'.
Test Question: Do you offer Weaviate as a managed service?
Predicted Answer: predicted_answer



 15%|██████▌                                     | 3/20 [01:12<07:15, 25.64s/it]

Faithful: 1
Detail: 5
Overall: 1
Test Question: How should I configure the size of my instance?
Predicted Answer: predicted_answer



 20%|████████▊                                   | 4/20 [01:30<06:04, 22.79s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Do I need to know about Docker (Compose) to use Weaviate?
Predicted Answer: predicted_answer


 25%|███████████                                 | 5/20 [01:51<05:34, 22.32s/it]


Faithful: 5
Detail: 5
Overall: 5


  0%|                                                    | 0/20 [00:00<?, ?it/s]
  0%|                                                    | 0/20 [00:00<?, ?it/s]

Bootstrapped 4 full traces after 1 examples in round 2.





### Inspect the compiled prompt

In [48]:
compiled_rag("What do cross encoders do?").answer

'Cross encoders are ranking models used for content-based re-ranking. They take a query and a document as input and output a score indicating the relevance of the document to the query. They can be interfaced with Weaviate to re-rank search results.'

In [49]:
llm.inspect_history(n=1)





Answer questions based on the context.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «Then we will see how the text vectorization process can be tweaked, before wrapping up by discussing a few considerations also. ## Background

I often find myself saying that Weaviate makes it fast and easy to produce a vector database from text. But it can be easy to forget just how fast and how easy it can make things. It is true that even in the “old days” of say, five to ten years ago, producing a database with vector capabilities was technically possible. You *simply* had to (*inhales deeply*) develop a vectorization algorithm, vectorize the data, build a vector index, build a database with the underlying data, integrate the vector index with the database, then forward results from a vector index query to the database and combine

### Evaluate the Compiled RAG Program

In [52]:
evaluate(compiled_rag, metric=llm_metric)



  0%|                                                    | 0/10 [00:00<?, ?it/s]

Test Question: Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)
Predicted Answer: predicted_answer




Average Metric: 2.2 / 1  (220.0):  10%|█         | 1/10 [00:00<00:03,  2.46it/s]

Faithful: 1
Detail: 4
Overall: 5
Test Question: How can I retrieve the total object count in a class?
Predicted Answer: predicted_answer




Average Metric: 5.6 / 2  (280.0):  20%|██        | 2/10 [00:00<00:03,  2.01it/s]

Faithful: 5
Detail: 2
Overall: 5
Test Question: How do I get the cosine similarity from Weaviate's certainty?
Predicted Answer: predicted_answer




Average Metric: 7.199999999999999 / 3  (240.0):  30%|▎| 3/10 [00:01<00:03,  1.91

Faithful: 1
Detail: 1
Overall: 5
Test Question: The quality of my search results change depending on the specified limit. Why? How can I fix this?
Predicted Answer: predicted_answer




Average Metric: 11.2 / 4  (280.0):  40%|███▌     | 4/10 [00:02<00:03,  1.95it/s]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Why did you use GraphQL instead of SPARQL?
Predicted Answer: predicted_answer




Average Metric: 12.799999999999999 / 5  (256.0):  50%|▌| 5/10 [00:02<00:03,  1.4

Faithful: 1
Detail: 1
Overall: 5
Test Question: What is the best way to iterate through objects? Can I do paginated API calls?
Predicted Answer: predicted_answer




Average Metric: 14.6 / 6  (243.3):  60%|█████▍   | 6/10 [00:13<00:15,  3.90s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: What is best practice for updating data?
Predicted Answer: predicted_answer




Average Metric: 16.6 / 7  (237.1):  70%|██████▎  | 7/10 [00:35<00:30, 10.01s/it]

Faithful: 2
Detail: 1
Overall: 5
Test Question: Can I connect my own module?
Predicted Answer: predicted_answer




Average Metric: 19.8 / 8  (247.5):  80%|███████▏ | 8/10 [01:03<00:31, 15.61s/it]

Faithful: 4
Detail: 5
Overall: 3
Test Question: Can I train my own text2vec-contextionary vectorizer module?
Predicted Answer: predicted_answer




Average Metric: 23.8 / 9  (264.4):  90%|████████ | 9/10 [01:22<00:16, 16.87s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Does Weaviate use Hnswlib?
Predicted Answer: predicted_answer




Average Metric: 27.400000000000002 / 10  (274.0): 100%|█| 10/10 [01:50<00:00, 11

Faithful: 4
Detail: 5
Overall: 5
Average Metric: 27.400000000000002 / 10  (274.0%)





| | question | answer | llm_metric |
|---|---|---|---|
| 0 | Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions) | The given context does not provide any information about whether there is support for multiple versions of the query/document embedding models to co-exist at a... | 2.2 |
| 1 | How can I retrieve the total object count in a class? | The given context does not provide any information about how to retrieve the total object count in a class. | 3.4 |
| 2 | How do I get the cosine similarity from Weaviate's certainty? | The given context does not provide any information about how to get the cosine similarity from Weaviate's certainty. | 1.6 |
| 3 | The quality of my search results change depending on the specified limit. Why? How can I fix this? | The quality of search results can change depending on the specified limit because language models are constrained by input length. They can only provide a... | 4.0 |
| 4 | Why did you use GraphQL instead of SPARQL? | The given context does not provide any information about why GraphQL was used instead of SPARQL. | 1.6 |


274.0

# BootstrapFewShotWithRandomSearch

In [None]:
# Accidentally spent $12 on this with `num_candidate_programs=20`, caution!

In [58]:
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Each extra candidate program adds another full evaluation pass, so keep
# num_candidate_programs small while experimenting.
teleprompter = BootstrapFewShotWithRandomSearch(
    metric=llm_metric,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
    max_rounds=1,
    num_candidate_programs=2,
    num_threads=2,
)

# It is also common to instantiate the program here, e.g. Rag()
second_compiled_rag = teleprompter.compile(uncompiled_rag, trainset=trainset)

Going to sample between 1 and 4 traces per predictor.
Will attempt to train 2 candidate sets.




  0%|                                                    | 0/20 [00:00<?, ?it/s]

Test Question: Why would I use Weaviate as my vector database?
Predicted Answer: predicted_answer
Test Question: What is the difference between Weaviate and for example Elasticsearch?
Predicted Answer: predicted_answer




Average Metric: 0.0 / 1  (0.0):   0%|                    | 0/20 [00:00<?, ?it/s]

Average Metric: 0.0 / 1  (0.0):   5%|▌           | 1/20 [00:00<00:08,  2.20it/s]

Average Metric: 4.0 / 2  (200.0):   5%|▌         | 1/20 [00:00<00:08,  2.20it/s]

Average Metric: 4.0 / 2  (200.0):  10%|█         | 2/20 [00:00<00:04,  3.98it/s]

Faithful: No
Detail: 1
Overall: 5
Error for example in dev set: 		 could not convert string to float: 'No'
Faithful: 5
Detail: 5
Overall: 5
Test Question: How should I configure the size of my instance?
Predicted Answer: predicted_answer
Test Question: Do you offer Weaviate as a managed service?
Predicted Answer: predicted_answer




Average Metric: 5.8 / 3  (193.3):  10%|█         | 2/20 [00:00<00:04,  3.98it/s]

Average Metric: 5.8 / 3  (193.3):  15%|█▌        | 3/20 [00:00<00:05,  3.13it/s]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Do I need to know about Docker (Compose) to use Weaviate?
Predicted Answer: predicted_answer




Average Metric: 9.8 / 4  (245.0):  15%|█▌        | 3/20 [00:01<00:05,  3.13it/s]

Average Metric: 9.8 / 4  (245.0):  20%|██        | 4/20 [00:01<00:05,  2.70it/s]

Average Metric: 11.4 / 5  (228.0):  20%|█▊       | 4/20 [00:01<00:05,  2.70it/s]

Faithful: 5
Detail: 5
Overall: 5
Faithful: 1
Detail: 5
Overall: 1




Average Metric: 15.4 / 6  (256.7):  25%|██▎      | 5/20 [00:01<00:05,  2.70it/s]

Test Question: Are there any 'best practices' or guidelines to consider when designing a schema?
Predicted Answer: predicted_answer
Test Question: What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?
Predicted Answer: predicted_answer
Faithful: 5
Detail: 5
Overall: 5




Average Metric: 15.4 / 6  (256.7):  30%|██▋      | 6/20 [00:01<00:04,  3.47it/s]

Test Question: Should I use references in my schema?
Predicted Answer: predicted_answer




Average Metric: 19.0 / 7  (271.4):  30%|██▋      | 6/20 [00:02<00:04,  3.47it/s]

Average Metric: 19.0 / 7  (271.4):  35%|███▏     | 7/20 [00:02<00:05,  2.59it/s]

Faithful: 4
Detail: 5
Overall: 5




Average Metric: 20.8 / 8  (260.0):  35%|███▏     | 7/20 [00:02<00:05,  2.59it/s]

Average Metric: 20.8 / 8  (260.0):  40%|███▌     | 8/20 [00:02<00:04,  2.78it/s]

Test Question: Is it possible to create one-to-many relationships in the schema?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 5
Overall: 2




Average Metric: 22.8 / 9  (253.3):  40%|███▌     | 8/20 [00:03<00:04,  2.78it/s]

Average Metric: 22.8 / 9  (253.3):  45%|████     | 9/20 [00:03<00:03,  2.89it/s]

Average Metric: 24.400000000000002 / 10  (244.0):  45%|▍| 9/20 [00:03<00:03,  2.

Average Metric: 24.400000000000002 / 10  (244.0):  50%|▌| 10/20 [00:03<00:02,  3

Test Question: What is the difference between `text` and `string` and `valueText` and `valueString`?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 3
Overall: 5
Faithful: 1
Detail: 1
Overall: 5




Average Metric: 26.200000000000003 / 11  (238.2):  50%|▌| 10/20 [00:03<00:02,  3

Average Metric: 26.200000000000003 / 11  (238.2):  55%|▌| 11/20 [00:03<00:02,  3

Test Question: Do Weaviate classes have namespaces?
Predicted Answer: predicted_answer
Test Question: Are there restrictions on UUID formatting? Do I have to adhere to any standards?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 2
Overall: 5




Average Metric: 28.200000000000003 / 12  (235.0):  55%|▌| 11/20 [00:03<00:02,  3

Average Metric: 28.200000000000003 / 12  (235.0):  60%|▌| 12/20 [00:03<00:01,  4

Faithful: 1
Detail: 3
Overall: 5
Test Question: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?
Predicted Answer: predicted_answer




Average Metric: 30.400000000000002 / 13  (233.8):  60%|▌| 12/20 [00:03<00:01,  4

Average Metric: 30.400000000000002 / 13  (233.8):  65%|▋| 13/20 [00:03<00:01,  4

Average Metric: 32.400000000000006 / 14  (231.4):  65%|▋| 13/20 [00:04<00:01,  4

Average Metric: 32.400000000000006 / 14  (231.4):  70%|▋| 14/20 [00:04<00:01,  4

Faithful: 2
Detail: 5
Overall: 2
Test Question: Can I use Weaviate to create a traditional knowledge graph?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 3
Overall: 5
Test Question: Why does Weaviate have a schema and not an ontology?
Predicted Answer: predicted_answer
Test Question: What is the difference between a Weaviate data schema, ontologies and taxonomies?
Predicted Answer: predicted_answer




Average Metric: 36.00000000000001 / 15  (240.0):  70%|▋| 14/20 [00:04<00:01,  4.

Average Metric: 36.00000000000001 / 15  (240.0):  75%|▊| 15/20 [00:04<00:01,  4.

Average Metric: 38.400000000000006 / 16  (240.0):  75%|▊| 15/20 [00:04<00:01,  4

Faithful: 5
Detail: 5
Overall: 3
Faithful: 2
Detail: 5
Overall: 3
Test Question: How to deal with custom terminology?
Predicted Answer: predicted_answer




Average Metric: 40.2 / 17  (236.5):  80%|█████▌ | 16/20 [00:04<00:00,  4.04it/s]

Average Metric: 40.2 / 17  (236.5):  85%|█████▉ | 17/20 [00:04<00:00,  4.68it/s]

Test Question: How can you index data near-realtime without losing semantic meaning?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 2
Overall: 5




Average Metric: 44.2 / 18  (245.6):  85%|█████▉ | 17/20 [00:04<00:00,  4.68it/s]

Average Metric: 44.2 / 18  (245.6):  90%|██████▎| 18/20 [00:04<00:00,  4.97it/s]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Why isn't there a text2vec-contextionary in my language?
Predicted Answer: predicted_answer




Average Metric: 45.800000000000004 / 19  (241.1):  90%|▉| 18/20 [00:05<00:00,  4

Average Metric: 45.800000000000004 / 19  (241.1):  95%|▉| 19/20 [00:05<00:00,  4

Test Question: How do you deal with words that have multiple meanings?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 1
Overall: 5




Average Metric: 48.6 / 20  (243.0):  95%|██████▋| 19/20 [00:05<00:00,  4.32it/s]

Average Metric: 48.6 / 20  (243.0): 100%|███████| 20/20 [00:05<00:00,  3.70it/s]


Faithful: 5
Detail: 2
Overall: 2
Average Metric: 48.6 / 20  (243.0%)
Score: 243.0 for set: [0]
New best score: 243.0 for seed -3
Scores so far: [243.0]
Best score: 243.0
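
The `could not convert string to float: 'No'` errors in the log above come from the metric receiving a word where it expected a number, and the affected example then adds nothing to the running total (`0.0 / 1`). One possible guard, sketched below, is to extract the first number from the rating text and fall back to a default otherwise. `parse_rating` is a hypothetical helper and not part of this notebook or of DSPy; mapping an unparseable reply to the lowest rating is an assumption you may want to change.

```python
import re


def parse_rating(raw, default=1.0, low=1.0, high=5.0):
    """Hypothetical helper: pull the first number out of an LLM rating string."""
    match = re.search(r"-?\d+(?:\.\d+)?", str(raw))
    if match is None:
        # Assumption: a non-numeric reply such as "No" counts as the lowest rating.
        return default
    # Clamp to the expected 1-5 range in case the model answers out of scale.
    return min(high, max(low, float(match.group())))


for reply in ["5", "No", "Rating: 4 out of 5", "7"]:
    print(repr(reply), "->", parse_rating(reply))
```

Inside a metric, a bare `float(rating)` could then be swapped for `parse_rating(rating)` so that a single off-format reply no longer discards the whole example.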




  0%|                                                    | 0/20 [00:00<?, ?it/s]

Average Metric: 4.0 / 1  (400.0):   0%|                  | 0/20 [00:00<?, ?it/s]

Average Metric: 4.0 / 1  (400.0):   5%|▌         | 1/20 [00:00<00:07,  2.61it/s]

Test Question: Why would I use Weaviate as my vector database?
Predicted Answer: predicted_answer
Faithful: 5
Detail: 5
Overall: 5
Test Question: What is the difference between Weaviate and for example Elasticsearch?
Predicted Answer: predicted_answer
Test Question: Do you offer Weaviate as a managed service?
Predicted Answer: predicted_answer




Average Metric: 5.6 / 2  (280.0):   5%|▌         | 1/20 [00:00<00:07,  2.61it/s]

Average Metric: 5.6 / 2  (280.0):  10%|█         | 2/20 [00:00<00:07,  2.35it/s]

Average Metric: 5.6 / 3  (186.7):  10%|█         | 2/20 [00:00<00:07,  2.35it/s]

Faithful: 1
Detail: 5
Overall: 1
Faithful: No
Detail: 1
Overall: 5
Error for example in dev set: 		 could not convert string to float: 'No'




Average Metric: 7.3999999999999995 / 4  (185.0):  15%|▏| 3/20 [00:01<00:07,  2.3

Average Metric: 7.3999999999999995 / 4  (185.0):  20%|▏| 4/20 [00:01<00:04,  3.5

Test Question: How should I configure the size of my instance?
Predicted Answer: predicted_answer
Test Question: Do I need to know about Docker (Compose) to use Weaviate?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 2
Overall: 5




Average Metric: 11.399999999999999 / 5  (228.0):  20%|▏| 4/20 [00:01<00:04,  3.5

Average Metric: 11.399999999999999 / 5  (228.0):  25%|▎| 5/20 [00:01<00:04,  3.3

Average Metric: 14.999999999999998 / 6  (250.0):  25%|▎| 5/20 [00:01<00:04,  3.3

Average Metric: 14.999999999999998 / 6  (250.0):  30%|▎| 6/20 [00:01<00:03,  4.0

Test Question: What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?
Predicted Answer: predicted_answer
Faithful: 5
Detail: 5
Overall: 5
Faithful: 4
Detail: 5
Overall: 5
Test Question: Are there any 'best practices' or guidelines to consider when designing a schema?
Predicted Answer: predicted_answer




Average Metric: 19.0 / 7  (271.4):  30%|██▋      | 6/20 [00:02<00:03,  4.09it/s]

Average Metric: 19.0 / 7  (271.4):  35%|███▏     | 7/20 [00:02<00:03,  3.71it/s]

Test Question: Should I use references in my schema?
Predicted Answer: predicted_answer
Faithful: 5
Detail: 5
Overall: 5
Test Question: Is it possible to create one-to-many relationships in the schema?
Predicted Answer: predicted_answer




Average Metric: 20.8 / 8  (260.0):  35%|███▏     | 7/20 [00:02<00:03,  3.71it/s]

Average Metric: 20.8 / 8  (260.0):  40%|███▌     | 8/20 [00:02<00:03,  3.71it/s]

Average Metric: 22.8 / 9  (253.3):  40%|███▌     | 8/20 [00:02<00:03,  3.71it/s]

Faithful: 1
Detail: 5
Overall: 2
Faithful: 1
Detail: 3
Overall: 5




Average Metric: 24.400000000000002 / 10  (244.0):  45%|▍| 9/20 [00:02<00:02,  3.

Average Metric: 24.400000000000002 / 10  (244.0):  50%|▌| 10/20 [00:02<00:02,  4

Test Question: What is the difference between `text` and `string` and `valueText` and `valueString`?
Predicted Answer: predicted_answer
Test Question: Do Weaviate classes have namespaces?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 1
Overall: 5


[A[A

Average Metric: 26.200000000000003 / 11  (238.2):  50%|▌| 10/20 [00:02<00:02,  4[A[A

Faithful: 1
Detail: 2
Overall: 5
Test Question: Are there restrictions on UUID formatting? Do I have to adhere to any standards?
Predicted Answer: predicted_answer




Average Metric: 28.200000000000003 / 12  (235.0):  55%|▌| 11/20 [00:03<00:02,  4

Average Metric: 28.200000000000003 / 12  (235.0):  60%|▌| 12/20 [00:03<00:01,  4

Test Question: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 3
Overall: 5




Average Metric: 30.400000000000002 / 13  (233.8):  60%|▌| 12/20 [00:03<00:01,  4

Faithful: 2
Detail: 5
Overall: 2
Test Question: Can I use Weaviate to create a traditional knowledge graph?
Predicted Answer: predicted_answer




Average Metric: 32.400000000000006 / 14  (231.4):  65%|▋| 13/20 [00:03<00:01,  4

Average Metric: 32.400000000000006 / 14  (231.4):  70%|▋| 14/20 [00:03<00:01,  4

Average Metric: 36.00000000000001 / 15  (240.0):  70%|▋| 14/20 [00:03<00:01,  4.

Test Question: Why does Weaviate have a schema and not an ontology?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 3
Overall: 5
Faithful: 5
Detail: 5
Overall: 3
Test Question: What is the difference between a Weaviate data schema, ontologies and taxonomies?
Predicted Answer: predicted_answer
Test Question: How to deal with custom terminology?
Predicted Answer: predicted_answer




Average Metric: 38.400000000000006 / 16  (240.0):  75%|▊| 15/20 [00:04<00:01,  4

Average Metric: 38.400000000000006 / 16  (240.0):  80%|▊| 16/20 [00:04<00:00,  4

Average Metric: 40.2 / 17  (236.5):  80%|█████▌ | 16/20 [00:04<00:00,  4.04it/s]

Faithful: 2
Detail: 5
Overall: 3
Faithful: 1
Detail: 5
Overall: 2




Average Metric: 44.2 / 18  (245.6):  85%|█████▉ | 17/20 [00:04<00:00,  4.04it/s]

Average Metric: 44.2 / 18  (245.6):  90%|██████▎| 18/20 [00:04<00:00,  4.26it/s]

Test Question: How can you index data near-realtime without losing semantic meaning?
Predicted Answer: predicted_answer
Test Question: Why isn't there a text2vec-contextionary in my language?
Predicted Answer: predicted_answer
Faithful: 5
Detail: 5
Overall: 5




Average Metric: 45.800000000000004 / 19  (241.1):  90%|▉| 18/20 [00:04<00:00,  4

Faithful: 1
Detail: 1
Overall: 5
Test Question: How do you deal with words that have multiple meanings?
Predicted Answer: predicted_answer




Average Metric: 48.6 / 20  (243.0):  95%|██████▋| 19/20 [00:04<00:00,  4.26it/s]

Average Metric: 48.6 / 20  (243.0): 100%|███████| 20/20 [00:05<00:00,  3.99it/s]


Faithful: 5
Detail: 2
Overall: 2
Average Metric: 48.6 / 20  (243.0%)
Score: 243.0 for set: [4]
Scores so far: [243.0, 243.0]
Best score: 243.0




  0%|                                                    | 0/20 [00:00<?, ?it/s]

Test Question: Why would I use Weaviate as my vector database?
Predicted Answer: predicted_answer




  5%|██▏                                         | 1/20 [00:00<00:07,  2.38it/s]

Faithful: 5
Detail: 5
Overall: 5
Test Question: What is the difference between Weaviate and for example Elasticsearch?
Predicted Answer: predicted_answer




 10%|████▍                                       | 2/20 [00:00<00:08,  2.07it/s]

Faithful: No
Detail: 1
Overall: 5
Failed to run or to evaluate example Example({'question': 'What is the difference between Weaviate and for example Elasticsearch?'}) (input_keys={'question'}) with <function llm_metric at 0x281e95360> due to could not convert string to float: 'No'.




 15%|██████▌                                     | 3/20 [00:01<00:07,  2.25it/s]

Test Question: Do you offer Weaviate as a managed service?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 5
Overall: 1




 20%|████████▊                                   | 4/20 [00:01<00:07,  2.23it/s]

Test Question: How should I configure the size of my instance?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 2
Overall: 5
Test Question: Do I need to know about Docker (Compose) to use Weaviate?
Predicted Answer: predicted_answer




 25%|███████████                                 | 5/20 [00:02<00:06,  2.24it/s]


Faithful: 5
Detail: 5
Overall: 5
Bootstrapped 4 full traces after 6 examples in round 0.




  0%|                                                    | 0/20 [00:00<?, ?it/s]

Average Metric: 0.0 / 1  (0.0):   0%|                    | 0/20 [00:01<?, ?it/s]

Average Metric: 0.0 / 1  (0.0):   5%|▌           | 1/20 [00:01<00:35,  1.87s/it]

Test Question: What is the difference between Weaviate and for example Elasticsearch?
Predicted Answer: predicted_answer
Faithful: No
Detail: 1
Overall: 5
Error for example in dev set: 		 could not convert string to float: 'No'
Test Question: Why would I use Weaviate as my vector database?
Predicted Answer: predicted_answer




Average Metric: 4.0 / 2  (200.0):   5%|▌         | 1/20 [00:02<00:35,  1.87s/it]

Average Metric: 4.0 / 2  (200.0):  10%|█         | 2/20 [00:02<00:21,  1.21s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: How should I configure the size of my instance?
Predicted Answer: predicted_answer
Test Question: Do you offer Weaviate as a managed service?
Predicted Answer: predicted_answer




Average Metric: 5.8 / 3  (193.3):  10%|█         | 2/20 [00:04<00:21,  1.21s/it]

Average Metric: 5.8 / 3  (193.3):  15%|█▌        | 3/20 [00:04<00:27,  1.61s/it]

Average Metric: 7.4 / 4  (185.0):  15%|█▌        | 3/20 [00:04<00:27,  1.61s/it]

Faithful: 1
Detail: 2
Overall: 5
Faithful: 1
Detail: 5
Overall: 1




Average Metric: 11.4 / 5  (228.0):  20%|█▊       | 4/20 [00:06<00:25,  1.61s/it]

Average Metric: 11.4 / 5  (228.0):  25%|██▎      | 5/20 [00:06<00:18,  1.26s/it]

Test Question: Do I need to know about Docker (Compose) to use Weaviate?
Predicted Answer: predicted_answer
Faithful: 5
Detail: 5
Overall: 5
Test Question: What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?
Predicted Answer: predicted_answer
Test Question: Are there any 'best practices' or guidelines to consider when designing a schema?
Predicted Answer: predicted_answer




Average Metric: 15.4 / 6  (256.7):  25%|██▎      | 5/20 [00:35<00:18,  1.26s/it]

Average Metric: 15.4 / 6  (256.7):  30%|██▋      | 6/20 [00:35<02:06,  9.05s/it]

Faithful: 5
Detail: 5
Overall: 5




Average Metric: 19.2 / 7  (274.3):  30%|██▋      | 6/20 [00:35<02:06,  9.05s/it]

Average Metric: 19.2 / 7  (274.3):  35%|███▏     | 7/20 [00:35<01:27,  6.70s/it]

Faithful: 5
Detail: 4
Overall: 5
Test Question: Should I use references in my schema?
Predicted Answer: predicted_answer
Test Question: Is it possible to create one-to-many relationships in the schema?
Predicted Answer: predicted_answer




Average Metric: 22.0 / 8  (275.0):  35%|███▏     | 7/20 [00:59<01:27,  6.70s/it]

Average Metric: 22.0 / 8  (275.0):  40%|███▌     | 8/20 [00:59<02:20, 11.67s/it]

Faithful: 2
Detail: 5
Overall: 5
Test Question: What is the difference between `text` and `string` and `valueText` and `valueString`?
Predicted Answer: predicted_answer




Average Metric: 26.0 / 9  (288.9):  40%|███▌     | 8/20 [01:02<02:20, 11.67s/it]

Average Metric: 26.0 / 9  (288.9):  45%|████     | 9/20 [01:02<01:40,  9.10s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Do Weaviate classes have namespaces?
Predicted Answer: predicted_answer




Average Metric: 27.8 / 10  (278.0):  45%|███▌    | 9/20 [01:04<01:40,  9.10s/it]

Average Metric: 27.8 / 10  (278.0):  50%|███▌   | 10/20 [01:04<01:09,  6.99s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Are there restrictions on UUID formatting? Do I have to adhere to any standards?
Predicted Answer: predicted_answer




Average Metric: 29.400000000000002 / 11  (267.3):  50%|▌| 10/20 [01:24<01:09,  6

Average Metric: 29.400000000000002 / 11  (267.3):  55%|▌| 11/20 [01:24<01:37, 10

Faithful: 1
Detail: 1
Overall: 5




Average Metric: 31.8 / 12  (265.0):  55%|███▊   | 11/20 [01:26<01:37, 10.79s/it]

Average Metric: 31.8 / 12  (265.0):  60%|████▏  | 12/20 [01:26<01:05,  8.20s/it]

Faithful: 1
Detail: 5
Overall: 5
Test Question: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?
Predicted Answer: predicted_answer
Test Question: Can I use Weaviate to create a traditional knowledge graph?
Predicted Answer: predicted_answer




Average Metric: 34.0 / 13  (261.5):  60%|████▏  | 12/20 [01:51<01:05,  8.20s/it]

Average Metric: 34.0 / 13  (261.5):  65%|████▌  | 13/20 [01:51<01:33, 13.35s/it]

Faithful: 1
Detail: 5
Overall: 4
Test Question: Why does Weaviate have a schema and not an ontology?
Predicted Answer: predicted_answer




Average Metric: 37.2 / 14  (265.7):  65%|████▌  | 13/20 [01:55<01:33, 13.35s/it]

Average Metric: 37.2 / 14  (265.7):  70%|████▉  | 14/20 [01:55<01:02, 10.49s/it]

Faithful: 4
Detail: 5
Overall: 3
Test Question: What is the difference between a Weaviate data schema, ontologies and taxonomies?
Predicted Answer: predicted_answer




Average Metric: 38.800000000000004 / 15  (258.7):  70%|▋| 14/20 [02:09<01:02, 10

Average Metric: 38.800000000000004 / 15  (258.7):  75%|▊| 15/20 [02:09<00:57, 11

Faithful: 1
Detail: 1
Overall: 5
Test Question: How to deal with custom terminology?
Predicted Answer: predicted_answer




Average Metric: 40.6 / 16  (253.8):  75%|█████▎ | 15/20 [02:11<00:57, 11.56s/it]

Average Metric: 40.6 / 16  (253.8):  80%|█████▌ | 16/20 [02:11<00:34,  8.56s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: How can you index data near-realtime without losing semantic meaning?
Predicted Answer: predicted_answer




Average Metric: 42.2 / 17  (248.2):  80%|█████▌ | 16/20 [02:15<00:34,  8.56s/it]

Average Metric: 42.2 / 17  (248.2):  85%|█████▉ | 17/20 [02:15<00:22,  7.39s/it]

Faithful: 1
Detail: 1
Overall: 5
Test Question: Why isn't there a text2vec-contextionary in my language?
Predicted Answer: predicted_answer




Average Metric: 46.2 / 18  (256.7):  85%|█████▉ | 17/20 [02:32<00:22,  7.39s/it]

Average Metric: 46.2 / 18  (256.7):  90%|██████▎| 18/20 [02:32<00:20, 10.26s/it]

Faithful: 5
Detail: 5
Overall: 5




Average Metric: 48.0 / 19  (252.6):  90%|██████▎| 18/20 [02:37<00:20, 10.26s/it]

Average Metric: 48.0 / 19  (252.6):  95%|██████▋| 19/20 [02:37<00:08,  8.55s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: How do you deal with words that have multiple meanings?
Predicted Answer: predicted_answer




Average Metric: 52.0 / 20  (260.0):  95%|██████▋| 19/20 [03:05<00:08,  8.55s/it]

Average Metric: 52.0 / 20  (260.0): 100%|███████| 20/20 [03:05<00:00,  9.29s/it]


Faithful: 5
Detail: 5
Overall: 5
Average Metric: 52.0 / 20  (260.0%)
Score: 260.0 for set: [4]
New best score: 260.0 for seed -1
Scores so far: [243.0, 243.0, 260.0]
Best score: 260.0
Average of max per entry across top 1 scores: 2.6
Average of max per entry across top 2 scores: 2.75
Average of max per entry across top 3 scores: 2.75
Average of max per entry across top 5 scores: 2.75
Average of max per entry across top 8 scores: 2.75
Average of max per entry across top 9999 scores: 2.75
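
The `Average of max per entry across top k scores` lines can be read as an ensemble-style upper bound: for each example, take the best score that any of the top k candidates achieved, then average those maxima. The sketch below illustrates that reading with made-up per-example scores. `avg_of_max_per_entry` and all numbers in it are hypothetical, and this is an interpretation of the log rather than DSPy's own code.

```python
def avg_of_max_per_entry(candidates, k):
    """candidates: list of (overall_score, per_example_scores) in any order."""
    # Keep the k candidates with the highest overall score.
    top_k = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    # Transpose so that each entry holds one example's scores across candidates.
    per_example = zip(*(scores for _, scores in top_k))
    best = [max(entry) for entry in per_example]
    return sum(best) / len(best)


# Hypothetical scores for three candidates on four examples.
candidates = [
    (250.0, [4.0, 1.0, 3.0, 2.0]),
    (225.0, [1.0, 4.0, 2.0, 2.0]),
    (200.0, [2.0, 2.0, 2.0, 2.0]),
]
for k in (1, 2, 3):
    print(k, avg_of_max_per_entry(candidates, k))
```

Under this reading the value can only stay flat or grow as k increases, which matches the log: 2.6 for the top candidate alone (its score of 260.0 divided by 100), then 2.75 once a second candidate is included.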



  0%|                                                    | 0/20 [00:00<?, ?it/s]

Test Question: Do Weaviate classes have namespaces?
Predicted Answer: predicted_answer



  5%|██▏                                         | 1/20 [00:00<00:09,  1.91it/s]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Why isn't there a text2vec-contextionary in my language?
Predicted Answer: predicted_answer



 10%|████▍                                       | 2/20 [00:01<00:11,  1.63it/s]

Faithful: 1
Detail: 1
Overall: 5
Test Question: How to deal with custom terminology?
Predicted Answer: predicted_answer



 15%|██████▌                                     | 3/20 [00:01<00:09,  1.85it/s]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Why does Weaviate have a schema and not an ontology?
Predicted Answer: predicted_answer


 20%|████████▊                                   | 4/20 [00:02<00:08,  1.90it/s]


Faithful: 5
Detail: 5
Overall: 3
Bootstrapped 4 full traces after 5 examples in round 0.



  0%|                                                    | 0/20 [00:00<?, ?it/s]

Test Question: What is the difference between Weaviate and for example Elasticsearch?
Predicted Answer: predicted_answer
Test Question: Why would I use Weaviate as my vector database?
Predicted Answer: predicted_answer


Average Metric: 1.6 / 1  (160.0):   5%|▌         | 1/20 [00:18<05:58, 18.86s/it]

Faithful: 2
Detail: 2
Overall: 2
Test Question: Do you offer Weaviate as a managed service?
Predicted Answer: predicted_answer


Average Metric: 5.6 / 2  (280.0):  10%|█         | 2/20 [00:22<02:59,  9.99s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: How should I configure the size of my instance?
Predicted Answer: predicted_answer


Average Metric: 7.6 / 3  (253.3):  15%|█▌        | 3/20 [00:41<03:58, 14.04s/it]

Faithful: 1
Detail: 3
Overall: 5
Test Question: Do I need to know about Docker (Compose) to use Weaviate?
Predicted Answer: predicted_answer


Average Metric: 11.0 / 4  (275.0):  20%|█▊       | 4/20 [00:44<02:35,  9.71s/it]

Faithful: 5
Detail: 2
Overall: 5
Test Question: What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?
Predicted Answer: predicted_answer


Average Metric: 14.6 / 5  (292.0):  25%|██▎      | 5/20 [01:03<03:16, 13.12s/it]

Faithful: 5
Detail: 3
Overall: 5
Test Question: Are there any 'best practices' or guidelines to consider when designing a schema?
Predicted Answer: predicted_answer


Average Metric: 17.4 / 6  (290.0):  30%|██▋      | 6/20 [01:10<02:34, 11.05s/it]

Faithful: 2
Detail: 5
Overall: 5
Test Question: Should I use references in my schema?
Predicted Answer: predicted_answer


Average Metric: 20.799999999999997 / 7  (297.1):  35%|▎| 7/20 [01:27<02:46, 12.8

Faithful: 5
Detail: 2
Overall: 5
Test Question: Is it possible to create one-to-many relationships in the schema?
Predicted Answer: predicted_answer


Average Metric: 24.599999999999998 / 8  (307.5):  40%|▍| 8/20 [01:36<02:18, 11.5

Faithful: 5
Detail: 4
Overall: 5
Test Question: What is the difference between `text` and `string` and `valueText` and `valueString`?
Predicted Answer: predicted_answer


Average Metric: 28.2 / 9  (313.3):  45%|████     | 9/20 [01:53<02:27, 13.44s/it]

Faithful: 5
Detail: 3
Overall: 5
Test Question: Do Weaviate classes have namespaces?
Predicted Answer: predicted_answer


Average Metric: 30.0 / 10  (300.0):  50%|███▌   | 10/20 [01:54<01:36,  9.69s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Are there restrictions on UUID formatting? Do I have to adhere to any standards?
Predicted Answer: predicted_answer


Average Metric: 31.8 / 11  (289.1):  55%|███▊   | 11/20 [02:04<01:26,  9.59s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?
Predicted Answer: predicted_answer


Average Metric: 35.4 / 12  (295.0):  60%|████▏  | 12/20 [02:15<01:21, 10.20s/it]

Faithful: 5
Detail: 3
Overall: 5
Test Question: Can I use Weaviate to create a traditional knowledge graph?
Predicted Answer: predicted_answer


Average Metric: 39.0 / 13  (300.0):  65%|████▌  | 13/20 [02:33<01:26, 12.42s/it]

Faithful: 5
Detail: 3
Overall: 5
Test Question: Why does Weaviate have a schema and not an ontology?
Predicted Answer: predicted_answer


Average Metric: 42.6 / 14  (304.3):  70%|████▉  | 14/20 [02:36<00:58,  9.68s/it]

Faithful: 5
Detail: 5
Overall: 3
Test Question: What is the difference between a Weaviate data schema, ontologies and taxonomies?
Predicted Answer: predicted_answer


Average Metric: 44.2 / 15  (294.7):  75%|█████▎ | 15/20 [03:01<01:11, 14.31s/it]

Faithful: 1
Detail: 1
Overall: 5
Test Question: How to deal with custom terminology?
Predicted Answer: predicted_answer


Average Metric: 46.0 / 16  (287.5):  80%|█████▌ | 16/20 [03:03<00:42, 10.59s/it]

Faithful: 1
Detail: 2
Overall: 5


Average Metric: 49.2 / 17  (289.4):  85%|█████▉ | 17/20 [03:04<00:22,  7.47s/it]

Faithful: 5
Detail: 1
Overall: 5
Test Question: How can you index data near-realtime without losing semantic meaning?
Predicted Answer: predicted_answer
Test Question: Why isn't there a text2vec-contextionary in my language?
Predicted Answer: predicted_answer


Average Metric: 50.800000000000004 / 18  (282.2):  90%|▉| 18/20 [03:06<00:12,  6

Faithful: 1
Detail: 1
Overall: 5
Test Question: How do you deal with words that have multiple meanings?
Predicted Answer: predicted_answer


Average Metric: 54.400000000000006 / 19  (286.3):  95%|▉| 19/20 [03:27<00:10, 10

Faithful: 5
Detail: 3
Overall: 5


Average Metric: 57.60000000000001 / 20  (288.0): 100%|█| 20/20 [03:29<00:00, 10.


Faithful: 5
Detail: 1
Overall: 5
Average Metric: 57.60000000000001 / 20  (288.0%)
Score: 288.0 for set: [4]
New best score: 288.0 for seed 0
Scores so far: [243.0, 243.0, 260.0, 288.0]
Best score: 288.0
Average of max per entry across top 1 scores: 2.8800000000000003
Average of max per entry across top 2 scores: 3.06
Average of max per entry across top 3 scores: 3.11
Average of max per entry across top 5 scores: 3.11
Average of max per entry across top 8 scores: 3.11
Average of max per entry across top 9999 scores: 3.11
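
The definition of `llm_metric` is not shown in this part of the notebook, but the run above that ends with `Score: 288.0` lets us check one guess: its running totals are consistent with a per-example score of `(2 * Faithful + Detail + Overall) / 5`. The sketch below replays the first eight logged ratings under that assumed formula. `weighted_score` is a hypothetical name, and the formula is inferred from the log, not taken from the metric's source.

```python
def weighted_score(faithful, detail, overall):
    """Assumed formula: faithfulness counts double; all-5 ratings give 4.0."""
    return (2 * faithful + detail + overall) / 5


# (Faithful, Detail, Overall) ratings copied from the first eight examples of that run.
ratings = [(2, 2, 2), (5, 5, 5), (1, 3, 5), (5, 2, 5),
           (5, 3, 5), (2, 5, 5), (5, 2, 5), (5, 4, 5)]

total = 0.0
for i, r in enumerate(ratings, start=1):
    total += weighted_score(*r)
    # Mirrors the progress-bar text: running sum, count, and mean times 100.
    print(f"Average Metric: {total:.1f} / {i}  ({100 * total / i:.1f})")
```

The printed totals match the logged `1.6 / 1` through `24.6 / 8`. The final `(288.0%)` is likewise the mean per-example score times 100, since 57.6 / 20 = 2.88.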



  0%|                                                    | 0/20 [00:00<?, ?it/s]

Test Question: Are there restrictions on UUID formatting? Do I have to adhere to any standards?
Predicted Answer: predicted_answer



  5%|██▏                                         | 1/20 [00:00<00:09,  1.91it/s]

Faithful: 1
Detail: 3
Overall: 5
Test Question: What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?
Predicted Answer: predicted_answer


 10%|████▍                                       | 2/20 [00:00<00:08,  2.12it/s]


Faithful: 4
Detail: 5
Overall: 5
Bootstrapped 2 full traces after 3 examples in round 0.



  0%|                                                    | 0/20 [00:00<?, ?it/s]

Test Question: What is the difference between Weaviate and for example Elasticsearch?
Predicted Answer: predicted_answer
Test Question: Why would I use Weaviate as my vector database?
Predicted Answer: predicted_answer


Average Metric: 4.0 / 1  (400.0):   5%|▌         | 1/20 [00:19<06:12, 19.59s/it]

Faithful: 5
Detail: 5
Overall: 5


Average Metric: 6.0 / 2  (300.0):  10%|█         | 2/20 [00:20<02:39,  8.87s/it]

Faithful: 1
Detail: 5
Overall: 3
Test Question: Do you offer Weaviate as a managed service?
Predicted Answer: predicted_answer
Test Question: How should I configure the size of my instance?
Predicted Answer: predicted_answer


Average Metric: 7.8 / 3  (260.0):  15%|█▌        | 3/20 [00:36<03:22, 11.89s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Do I need to know about Docker (Compose) to use Weaviate?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 3
Overall: 3


Average Metric: 13.0 / 5  (260.0):  25%|██▎      | 5/20 [00:41<01:30,  6.01s/it]

Test Question: What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?
Predicted Answer: predicted_answer
Faithful: 4
Detail: 5
Overall: 5


Average Metric: 14.8 / 6  (246.7):  30%|██▋      | 6/20 [00:41<00:56,  4.05s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Are there any 'best practices' or guidelines to consider when designing a schema?
Predicted Answer: predicted_answer
Test Question: Should I use references in my schema?
Predicted Answer: predicted_answer


Average Metric: 18.8 / 7  (268.6):  35%|███▏     | 7/20 [01:04<02:12, 10.22s/it]

Faithful: 5
Detail: 5
Overall: 5


Average Metric: 22.8 / 8  (285.0):  40%|███▌     | 8/20 [01:04<01:26,  7.19s/it]

Faithful: 5
Detail: 5
Overall: 5



Average Metric: 24.8 / 9  (275.6):  40%|███▌     | 8/20 [01:06<01:26,  7.19s/it]

Test Question: Is it possible to create one-to-many relationships in the schema?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 3
Overall: 5



Average Metric: 24.8 / 9  (275.6):  45%|████     | 9/20 [01:06<00:58,  5.35s/it]

Test Question: What is the difference between `text` and `string` and `valueText` and `valueString`?
Predicted Answer: predicted_answer
Test Question: Do Weaviate classes have namespaces?
Predicted Answer: predicted_answer


Average Metric: 26.8 / 10  (268.0):  50%|███▌   | 10/20 [01:26<01:38,  9.89s/it]

Faithful: 1
Detail: 3
Overall: 5
Test Question: Are there restrictions on UUID formatting? Do I have to adhere to any standards?
Predicted Answer: predicted_answer


Average Metric: 28.8 / 11  (261.8):  55%|███▊   | 11/20 [01:34<01:23,  9.31s/it]

Faithful: 1
Detail: 3
Overall: 5
Test Question: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?
Predicted Answer: predicted_answer


Average Metric: 30.400000000000002 / 12  (253.3):  60%|▌| 12/20 [01:52<01:36, 12

Faithful: 1
Detail: 1
Overall: 5


Average Metric: 32.800000000000004 / 13  (252.3):  65%|▋| 13/20 [01:52<00:59,  8

Faithful: 1
Detail: 5
Overall: 5
Test Question: Can I use Weaviate to create a traditional knowledge graph?
Predicted Answer: predicted_answer
Test Question: Why does Weaviate have a schema and not an ontology?
Predicted Answer: predicted_answer


Average Metric: 34.400000000000006 / 14  (245.7):  70%|▋| 14/20 [02:14<01:14, 12

Faithful: 1
Detail: 1
Overall: 5
Test Question: What is the difference between a Weaviate data schema, ontologies and taxonomies?
Predicted Answer: predicted_answer


Average Metric: 36.800000000000004 / 15  (245.3):  75%|▊| 15/20 [02:26<01:02, 12

Faithful: 2
Detail: 5
Overall: 3
Test Question: How to deal with custom terminology?
Predicted Answer: predicted_answer


Average Metric: 36.800000000000004 / 16  (230.0):  80%|▊| 16/20 [02:35<00:45, 11

Error for example in dev set: 		 'NoneType' object is not iterable
Test Question: How can you index data near-realtime without losing semantic meaning?
Predicted Answer: predicted_answer


Average Metric: 36.800000000000004 / 17  (216.5):  85%|▊| 17/20 [02:40<00:28,  9

Faithful: No
Detail: 1
Overall: 5
Error for example in dev set: 		 could not convert string to float: 'No'
Test Question: Why isn't there a text2vec-contextionary in my language?
Predicted Answer: predicted_answer


Average Metric: 38.6 / 18  (214.4):  90%|██████▎| 18/20 [03:00<00:25, 12.67s/it]

Faithful: 1
Detail: 2
Overall: 5


Average Metric: 42.2 / 19  (222.1):  95%|██████▋| 19/20 [03:02<00:09,  9.44s/it]

Faithful: 5
Detail: 5
Overall: 3
Test Question: How do you deal with words that have multiple meanings?
Predicted Answer: predicted_answer


Average Metric: 46.2 / 20  (231.0): 100%|███████| 20/20 [03:34<00:00, 10.74s/it]

Faithful: 5
Detail: 5
Overall: 5
Average Metric: 46.2 / 20  (231.0%)
Score: 231.0 for set: [4]
Scores so far: [243.0, 243.0, 260.0, 288.0, 231.0]
Best score: 288.0
Average of max per entry across top 1 scores: 2.8800000000000003
Average of max per entry across top 2 scores: 3.06
Average of max per entry across top 3 scores: 3.11
Average of max per entry across top 5 scores: 3.1399999999999997
Average of max per entry across top 8 scores: 3.1399999999999997
Average of max per entry across top 9999 scores: 3.1399999999999997
5 candidate programs found.





In [59]:
second_compiled_rag("What do cross encoders do?")

Prediction(
    answer='Cross encoders are ranking models used for content-based re-ranking. They are used to give a score indicating the relevance of a document to a query.'
)

In [60]:
llm.inspect_history(n=1)





Answer questions based on the context.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «The schema is the place to define, among other things, the data type and vectorizer to be used, as well as cross-references between classes. As a corollary, the vectorization process can be modified for each class by setting the relevant schema options. In fact, you can [define the data schema](/developers/weaviate/manage-data/collections) for each class individually. All this means that you can also use the schema to tweak Weaviate's vectorization behavior. The relevant variables for vectorization are `dataType` and those listed under `moduleConfig` at both the class level and property level.»
[2] «![Hacktober video](img/hacktober.gif)

### [Weaviate Academy](/developers/academy) & [Workshops](/learn/workshops)
Weaviate Academy and W

In [61]:
evaluate(second_compiled_rag, metric=llm_metric)


  0%|                                                    | 0/10 [00:00<?, ?it/s]

Test Question: Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)
Predicted Answer: predicted_answer


Average Metric: 3.6 / 1  (360.0):  10%|█         | 1/10 [00:21<03:12, 21.35s/it]

Faithful: 5
Detail: 3
Overall: 5
Test Question: How can I retrieve the total object count in a class?
Predicted Answer: predicted_answer


Average Metric: 5.4 / 2  (270.0):  20%|██        | 2/10 [00:44<02:58, 22.35s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: How do I get the cosine similarity from Weaviate's certainty?
Predicted Answer: predicted_answer


Average Metric: 7.2 / 3  (240.0):  30%|███       | 3/10 [01:11<02:50, 24.30s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: The quality of my search results change depending on the specified limit. Why? How can I fix this?
Predicted Answer: predicted_answer


Average Metric: 9.0 / 4  (225.0):  40%|████      | 4/10 [01:36<02:28, 24.73s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Why did you use GraphQL instead of SPARQL?
Predicted Answer: predicted_answer


Average Metric: 10.6 / 5  (212.0):  50%|████▌    | 5/10 [01:38<01:22, 16.43s/it]

Faithful: 1
Detail: 1
Overall: 5
Test Question: What is the best way to iterate through objects? Can I do paginated API calls?
Predicted Answer: predicted_answer


Average Metric: 12.8 / 6  (213.3):  60%|█████▍   | 6/10 [02:01<01:15, 18.86s/it]

Faithful: 1
Detail: 4
Overall: 5
Test Question: What is best practice for updating data?
Predicted Answer: predicted_answer


Average Metric: 16.0 / 7  (228.6):  70%|██████▎  | 7/10 [02:24<01:00, 20.19s/it]

Faithful: 5
Detail: 1
Overall: 5
Test Question: Can I connect my own module?
Predicted Answer: predicted_answer


Average Metric: 19.6 / 8  (245.0):  80%|███████▏ | 8/10 [02:48<00:42, 21.48s/it]

Faithful: 5
Detail: 3
Overall: 5
Test Question: Can I train my own text2vec-contextionary vectorizer module?
Predicted Answer: predicted_answer


Average Metric: 23.200000000000003 / 9  (257.8):  90%|▉| 9/10 [03:28<00:27, 27.2

Faithful: 5
Detail: 3
Overall: 5
Test Question: Does Weaviate use Hnswlib?
Predicted Answer: predicted_answer


Average Metric: 26.400000000000002 / 10  (264.0): 100%|█| 10/10 [03:41<00:00, 22

Faithful: 5
Detail: 3
Overall: 3
Average Metric: 26.400000000000002 / 10  (264.0%)





| | question | answer | llm_metric |
|---|---|---|---|
| 0 | Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions) | The given context does not provide any information about whether there is support for multiple versions of the query/document embedding models to co-exist at a... | 3.6 |
| 1 | How can I retrieve the total object count in a class? | The given context does not provide any information about how to retrieve the total object count in a class. Therefore, we cannot determine the answer... | 1.8 |
| 2 | How do I get the cosine similarity from Weaviate's certainty? | The given context does not provide any information about how to get the cosine similarity from Weaviate's certainty. Therefore, we cannot determine the answer based... | 1.8 |
| 3 | The quality of my search results change depending on the specified limit. Why? How can I fix this? | The given context does not provide any information about why the quality of search results changes depending on the specified limit or how to fix... | 1.8 |
| 4 | Why did you use GraphQL instead of SPARQL? | The given context does not provide any information about why GraphQL was used instead of SPARQL. | 1.6 |


264.0
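The per-example values in the `llm_metric` column can be reproduced from the three ratings printed in the log. The sketch below assumes the metric is a weighted average in which the faithfulness rating counts twice; this weighting is inferred from the logged numbers rather than copied from the metric definition, and `combine_ratings` is a hypothetical name used only for illustration.

```python
def combine_ratings(faithful: float, detail: float, overall: float) -> float:
    """Weighted average of the three ratings (weights inferred from the log, an assumption)."""
    return (detail + 2 * faithful + overall) / 5.0


# Ratings taken from the first two examples of the evaluation log above.
first = combine_ratings(faithful=5, detail=3, overall=5)
second = combine_ratings(faithful=1, detail=2, overall=5)

# The evaluator reports the running total, the example count, and the mean times 100.
total, count = 26.4, 10
percentage = round(100 * total / count, 1)

print(first, second, percentage)
```

Under this reading, the final `264.0` is the mean per-example score (2.64 on a 1 to 5 scale) multiplied by 100, not a percentage bounded by 100.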

# BayesianSignatureOptimizer

In [62]:
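from dspy.teleprompt import BayesianSignatureOptimizer

# GPT-4 proposes candidate instructions; the already-configured LM runs the RAG task itself.
llm_prompter = dspy.OpenAI(model='gpt-4', max_tokens=2000, model_type='chat')

# n sets how many candidates the optimizer generates before searching over them.
teleprompter = BayesianSignatureOptimizer(task_model=dspy.settings.lm,
                                          metric=llm_metric,
                                          prompt_model=llm_prompter,
                                          n=5,
                                          verbose=False)

# Options forwarded to the evaluator that scores each trial.
kwargs = dict(num_threads=1, display_progress=True, display_table=0)
# Run 3 Optuna trials, allowing up to 4 bootstrapped and 4 labeled demos per prompt.
third_compiled_rag = teleprompter.compile(RAG(), devset=devset,
                                          optuna_trials_num=3,
                                          max_bootstrapped_demos=4,
                                          max_labeled_demos=4,
                                          eval_kwargs=kwargs)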


  0%|                                                    | 0/10 [00:00<?, ?it/s]

Test Question: What is best practice for updating data?
Predicted Answer: predicted_answer



 10%|████▍                                       | 1/10 [00:00<00:03,  2.36it/s]

Faithful: 5
Detail: 3
Overall: 4
Test Question: Can I train my own text2vec-contextionary vectorizer module?
Predicted Answer: predicted_answer



 20%|████████▊                                   | 2/10 [00:00<00:03,  2.65it/s]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Does Weaviate use Hnswlib?
Predicted Answer: predicted_answer



 30%|█████████████▏                              | 3/10 [00:01<00:03,  2.07it/s]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Can I connect my own module?
Predicted Answer: predicted_answer


 40%|█████████████████▌                          | 4/10 [00:01<00:02,  2.13it/s]


Faithful: 5
Detail: 5
Overall: 5
Bootstrapped 4 full traces after 5 examples in round 0.



  0%|                                                    | 0/10 [00:00<?, ?it/s]

Test Question: What is the best way to iterate through objects? Can I do paginated API calls?
Predicted Answer: predicted_answer



 10%|████▍                                       | 1/10 [00:01<00:10,  1.20s/it]

Faithful: 1
Detail: 1
Overall: 5
Test Question: Does Weaviate use Hnswlib?
Predicted Answer: predicted_answer



 20%|████████▊                                   | 2/10 [00:01<00:06,  1.28it/s]

Faithful: 1
Detail: 2
Overall: 5
Test Question: The quality of my search results change depending on the specified limit. Why? How can I fix this?
Predicted Answer: predicted_answer



 30%|█████████████▏                              | 3/10 [00:02<00:04,  1.66it/s]

Faithful: 2
Detail: 5
Overall: 4


 40%|█████████████████▌                          | 4/10 [00:02<00:03,  1.62it/s]


Test Question: Why did you use GraphQL instead of SPARQL?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 3
Overall: 5
Bootstrapped 4 full traces after 5 examples in round 0.


 10%|████▍                                       | 1/10 [00:00<00:03,  2.57it/s]

Test Question: How can I retrieve the total object count in a class?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 4
Overall: 1
Test Question: What is the best way to iterate through objects? Can I do paginated API calls?
Predicted Answer: predicted_answer



 20%|████████▊                                   | 2/10 [00:00<00:03,  2.53it/s]

Faithful: 1
Detail: 1
Overall: 5
Test Question: What is best practice for updating data?
Predicted Answer: predicted_answer



 30%|█████████████▏                              | 3/10 [00:01<00:02,  2.66it/s]

Faithful: 5
Detail: 3
Overall: 4


 40%|█████████████████▌                          | 4/10 [00:01<00:02,  2.60it/s]


Test Question: Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)
Predicted Answer: predicted_answer
Faithful: 1
Detail: 3
Overall: 5
Bootstrapped 4 full traces after 5 examples in round 0.



  0%|                                                    | 0/10 [00:00<?, ?it/s]

Test Question: Can I train my own text2vec-contextionary vectorizer module?
Predicted Answer: predicted_answer



 10%|████▍                                       | 1/10 [00:00<00:04,  2.03it/s]

Faithful: 5
Detail: 5
Overall: 5



 20%|████████▊                                   | 2/10 [00:00<00:03,  2.07it/s]

Test Question: How do I get the cosine similarity from Weaviate's certainty?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 5
Overall: 1
Test Question: Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)
Predicted Answer: predicted_answer



 30%|█████████████▏                              | 3/10 [00:01<00:03,  2.11it/s]

Faithful: 1
Detail: 3
Overall: 5
Test Question: Can I connect my own module?
Predicted Answer: predicted_answer


 40%|█████████████████▌                          | 4/10 [00:01<00:02,  2.12it/s]
[I 2024-02-11 14:46:53,175] A new study created in memory with name: no-name-5e566642-02dd-4eee-8691-53989ee359f0


Faithful: 5
Detail: 5
Overall: 5
Bootstrapped 4 full traces after 5 examples in round 0.



  0%|                                                    | 0/10 [00:00<?, ?it/s]

Test Question: Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)
Predicted Answer: predicted_answer


Average Metric: 2.0 / 1  (200.0):  10%|█         | 1/10 [00:00<00:07,  1.28it/s]

Faithful: 1
Detail: 3
Overall: 5
Test Question: How can I retrieve the total object count in a class?
Predicted Answer: predicted_answer


Average Metric: 3.8 / 2  (190.0):  20%|██        | 2/10 [00:19<01:30, 11.34s/it]

Faithful: 1
Detail: 5
Overall: 2


Average Metric: 5.4 / 3  (180.0):  30%|███       | 3/10 [00:20<00:44,  6.40s/it]

Test Question: How do I get the cosine similarity from Weaviate's certainty?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 5
Overall: 1
Test Question: The quality of my search results change depending on the specified limit. Why? How can I fix this?
Predicted Answer: predicted_answer


Average Metric: 7.4 / 4  (185.0):  40%|████      | 4/10 [00:44<01:20, 13.37s/it]

Faithful: 1
Detail: 5
Overall: 3
Test Question: Why did you use GraphQL instead of SPARQL?
Predicted Answer: predicted_answer


Average Metric: 11.4 / 5  (228.0):  50%|████▌    | 5/10 [01:04<01:19, 15.82s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: What is the best way to iterate through objects? Can I do paginated API calls?
Predicted Answer: predicted_answer


Average Metric: 13.0 / 6  (216.7):  60%|█████▍   | 6/10 [01:29<01:16, 19.08s/it]

Faithful: 1
Detail: 4
Overall: 2
Test Question: What is best practice for updating data?
Predicted Answer: predicted_answer


Average Metric: 17.0 / 7  (242.9):  70%|██████▎  | 7/10 [02:02<01:11, 23.69s/it]

Faithful: 5
Detail: 5
Overall: 5


Average Metric: 21.0 / 8  (262.5):  80%|███████▏ | 8/10 [02:03<00:32, 16.28s/it]

Test Question: Can I connect my own module?
Predicted Answer: predicted_answer
Faithful: 5
Detail: 5
Overall: 5
Test Question: Can I train my own text2vec-contextionary vectorizer module?
Predicted Answer: predicted_answer


Average Metric: 25.0 / 9  (277.8):  90%|████████ | 9/10 [02:03<00:11, 11.33s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Does Weaviate use Hnswlib?
Predicted Answer: predicted_answer


Average Metric: 29.0 / 10  (290.0): 100%|███████| 10/10 [02:27<00:00, 14.72s/it]
[I 2024-02-11 14:49:20,377] Trial 0 finished with value: 290.0 and parameters: {'10883682544_predictor_instruction': 1, '10883682544_predictor_demos': 4}. Best is trial 0 with value: 290.0.


Faithful: 5
Detail: 5
Overall: 5
Average Metric: 29.0 / 10  (290.0%)




Assess the context and answer the given questions that are predominantly about software usage, process optimization, and troubleshooting. Focus on providing accurate information related to tech or software-related queries.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Best Answer: ${answer}

---

Context:
[1] «Some models, such as [CLIP](https://openai.com/blog/clip/), are capable of vectorizing multiple data types (images and text in this case) into one vector space, so that an image can be searched by its content using only text. ## Vector embeddings with Weaviate

For this reason, Weaviate is configured to support many different vectorizer models and vectorizer service providers. You can even [bring your own vectors](/developers/weaviate/starter-guides/custom-vectors), for example if 


  0%|                                                    | 0/10 [00:00<?, ?it/s]

Test Question: Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)
Predicted Answer: predicted_answer


Average Metric: 3.8 / 1  (380.0):  10%|█         | 1/10 [00:21<03:09, 21.09s/it]

Faithful: 5
Detail: 4
Overall: 5
Test Question: How can I retrieve the total object count in a class?
Predicted Answer: predicted_answer


Average Metric: 7.199999999999999 / 2  (360.0):  20%|▏| 2/10 [00:21<01:11,  8.96

Faithful: 5
Detail: 2
Overall: 5
Test Question: How do I get the cosine similarity from Weaviate's certainty?
Predicted Answer: predicted_answer


Average Metric: 8.799999999999999 / 3  (293.3):  30%|▎| 3/10 [00:22<00:35,  5.10

Faithful: 1
Detail: 1
Overall: 5


Average Metric: 11.399999999999999 / 4  (285.0):  40%|▍| 4/10 [00:22<00:19,  3.2

Test Question: The quality of my search results change depending on the specified limit. Why? How can I fix this?
Predicted Answer: predicted_answer
Faithful: 2
Detail: 5
Overall: 4


Average Metric: 13.399999999999999 / 5  (268.0):  50%|▌| 5/10 [00:22<00:11,  2.2

Test Question: Why did you use GraphQL instead of SPARQL?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 3
Overall: 5
Test Question: What is the best way to iterate through objects? Can I do paginated API calls?
Predicted Answer: predicted_answer


Average Metric: 14.999999999999998 / 6  (250.0):  60%|▌| 6/10 [00:23<00:06,  1.6

Faithful: 1
Detail: 1
Overall: 5
Test Question: What is best practice for updating data?
Predicted Answer: predicted_answer


Average Metric: 16.599999999999998 / 7  (237.1):  70%|▋| 7/10 [00:45<00:25,  8.4

Faithful: 1
Detail: 1
Overall: 5
Test Question: Can I connect my own module?
Predicted Answer: predicted_answer


Average Metric: 20.2 / 8  (252.5):  80%|███████▏ | 8/10 [01:08<00:25, 12.92s/it]

Faithful: 5
Detail: 5
Overall: 3
Test Question: Can I train my own text2vec-contextionary vectorizer module?
Predicted Answer: predicted_answer


Average Metric: 24.0 / 9  (266.7):  90%|████████ | 9/10 [01:33<00:16, 16.71s/it]

Faithful: 5
Detail: 4
Overall: 5
Test Question: Does Weaviate use Hnswlib?
Predicted Answer: predicted_answer


Average Metric: 27.8 / 10  (278.0): 100%|███████| 10/10 [01:53<00:00, 11.34s/it]
[I 2024-02-11 14:51:13,743] Trial 1 finished with value: 278.0 and parameters: {'10883682544_predictor_instruction': 3, '10883682544_predictor_demos': 2}. Best is trial 0 with value: 290.0.


Faithful: 5
Detail: 4
Overall: 5
Average Metric: 27.8 / 10  (278.0%)




You are presented with several contexts or data scenarios, typically related to software usage or tech process optimization. Based on this, you will be asked questions to clarify or retrieve certain details. Your task is to provide an accurate and specific response, which will assist in a tech-oriented customer service role, such as creating FAQs, supporting a chatbot, or enhancing Natural Language Processing to understand user queries more efficiently.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Relevant Information: ${answer}

---

Context:
[1] «Finally, we can jump on a bike to reach our local destination. For a better understanding, consider the below graphic, which shows a graph with all the connections generated using 1000 objects in two dimensions. <img
    src={require('./img/v


  0%|                                                    | 0/10 [00:00<?, ?it/s]

Test Question: Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)
Predicted Answer: predicted_answer


Average Metric: 2.0 / 1  (200.0):  10%|█         | 1/10 [00:00<00:04,  1.99it/s]

Faithful: 1
Detail: 3
Overall: 5


Average Metric: 3.4 / 2  (170.0):  20%|██        | 2/10 [00:01<00:04,  1.79it/s]

Test Question: How can I retrieve the total object count in a class?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 4
Overall: 1
Test Question: How do I get the cosine similarity from Weaviate's certainty?
Predicted Answer: predicted_answer


Average Metric: 5.0 / 3  (166.7):  30%|███       | 3/10 [00:01<00:03,  1.94it/s]

Faithful: 1
Detail: 5
Overall: 1


Average Metric: 7.6 / 4  (190.0):  40%|████      | 4/10 [00:02<00:03,  1.98it/s]

Test Question: The quality of my search results change depending on the specified limit. Why? How can I fix this?
Predicted Answer: predicted_answer
Faithful: 2
Detail: 5
Overall: 4
Test Question: Why did you use GraphQL instead of SPARQL?
Predicted Answer: predicted_answer


Average Metric: 9.6 / 5  (192.0):  50%|█████     | 5/10 [00:02<00:02,  1.96it/s]

Faithful: 1
Detail: 3
Overall: 5
Test Question: What is the best way to iterate through objects? Can I do paginated API calls?
Predicted Answer: predicted_answer


Average Metric: 11.2 / 6  (186.7):  60%|█████▍   | 6/10 [00:03<00:02,  1.98it/s]

Faithful: 1
Detail: 1
Overall: 5


Average Metric: 14.6 / 7  (208.6):  70%|██████▎  | 7/10 [00:03<00:01,  2.14it/s]

Test Question: What is best practice for updating data?
Predicted Answer: predicted_answer
Faithful: 5
Detail: 3
Overall: 4


Average Metric: 18.6 / 8  (232.5):  80%|███████▏ | 8/10 [00:03<00:00,  2.33it/s]

Test Question: Can I connect my own module?
Predicted Answer: predicted_answer
Faithful: 5
Detail: 5
Overall: 5
Test Question: Can I train my own text2vec-contextionary vectorizer module?
Predicted Answer: predicted_answer


Average Metric: 22.6 / 9  (251.1):  90%|████████ | 9/10 [00:04<00:00,  2.35it/s]

Faithful: 5
Detail: 5
Overall: 5


Average Metric: 24.400000000000002 / 10  (244.0): 100%|█| 10/10 [00:04<00:00,  2
[I 2024-02-11 14:51:18,463] Trial 2 finished with value: 244.0 and parameters: {'10883682544_predictor_instruction': 0, '10883682544_predictor_demos': 0}. Best is trial 0 with value: 290.0.


Test Question: Does Weaviate use Hnswlib?
Predicted Answer: predicted_answer
Faithful: 1
Detail: 2
Overall: 5
Average Metric: 24.400000000000002 / 10  (244.0%)




Answer questions based on the context.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «There is no performance gain if we have more parallelization than there are available CPUs. However, each thread needs additional memory. So with 32 parallel imports, we had the worst of both worlds: High memory usage and no performance gains beyond 8. With the fix, even if you import from multiple clients, Weaviate automatically handles the parallelization to ensure that it does not exceed the number of CPU cores. As a result, you get the maximum performance without "unnecessary" memory usage. ### HNSW optimization

Next, we optimized memory allocations for the HNSW (vector) i

In [63]:
third_compiled_rag("What do cross encoders do?")

Prediction(
    answer='Cross encoders are ranking models used for content-based re-ranking. They are designed to determine the relevance of a document to a query. Cross encoders take a [query, document] input and output a score indicating the relevance of the document to the query. They can be used to re-rank search results and provide a more personalized and context-aware search experience.'
)
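The Optuna log lines in the compile output above report three trials, each pairing an instruction candidate with a demo choice and scoring the pair on the dev set. As a rough illustration of how the best configuration is picked, the sketch below copies those three results into a small hypothetical `Trial` record and selects the highest score. It is plain Python for reading the log, not the optimizer's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One logged trial (hypothetical record type, for illustration only)."""
    number: int
    instruction: int  # index of the instruction candidate
    demos: int        # index of the demo choice
    score: float      # dev-set score reported by the evaluator


# Values copied from the "Trial N finished with value ..." log lines above.
trials = [
    Trial(number=0, instruction=1, demos=4, score=290.0),
    Trial(number=1, instruction=3, demos=2, score=278.0),
    Trial(number=2, instruction=0, demos=0, score=244.0),
]

best = max(trials, key=lambda t: t.score)
print(f"Best trial: {best.number} (instruction {best.instruction}, demos {best.demos})")
```

In the log, trial 2 (instruction 0, demos 0) is followed by the original instruction `Answer questions based on the context.`, so the gap between 244.0 and 290.0 gives a rough sense of what the rewritten instruction and the added demos contributed on this small dev set.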

# Check this out!!

Below you can see how the `BayesianSignatureOptimizer` jointly (1) rewrites the task instruction to:

```
Assess the context and answer the given questions that are predominantly about software usage, process optimization, and troubleshooting. Focus on providing accurate information related to tech or software-related queries.
```

and (2) sources input-output examples to include in the prompt!

In [64]:
llm.inspect_history(n=1)





Assess the context and answer the given questions that are predominantly about software usage, process optimization, and troubleshooting. Focus on providing accurate information related to tech or software-related queries.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Best Answer: ${answer}

---

Context:
[1] «Some models, such as [CLIP](https://openai.com/blog/clip/), are capable of vectorizing multiple data types (images and text in this case) into one vector space, so that an image can be searched by its content using only text. ## Vector embeddings with Weaviate

For this reason, Weaviate is configured to support many different vectorizer models and vectorizer service providers. You can even [bring your own vectors](/developers/weaviate/starter-guides/custom-vectors), for example if you already have a vectorization pipeline available, or if none of th

In [65]:
evaluate(third_compiled_rag, metric=llm_metric)


  0%|                                                    | 0/10 [00:00<?, ?it/s]

Test Question: Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)
Predicted Answer: predicted_answer


Average Metric: 2.0 / 1  (200.0):  10%|█         | 1/10 [00:00<00:02,  3.09it/s]

Faithful: 1
Detail: 3
Overall: 5
Test Question: How can I retrieve the total object count in a class?
Predicted Answer: predicted_answer


Average Metric: 3.8 / 2  (190.0):  20%|██        | 2/10 [00:00<00:02,  2.90it/s]

Faithful: 1
Detail: 5
Overall: 2
Test Question: How do I get the cosine similarity from Weaviate's certainty?
Predicted Answer: predicted_answer


Average Metric: 5.4 / 3  (180.0):  30%|███       | 3/10 [00:01<00:03,  2.24it/s]

Faithful: 1
Detail: 5
Overall: 1
Test Question: The quality of my search results change depending on the specified limit. Why? How can I fix this?
Predicted Answer: predicted_answer


Average Metric: 7.4 / 4  (185.0):  40%|████      | 4/10 [00:01<00:02,  2.26it/s]

Faithful: 1
Detail: 5
Overall: 3
Test Question: Why did you use GraphQL instead of SPARQL?
Predicted Answer: predicted_answer


Average Metric: 11.4 / 5  (228.0):  50%|████▌    | 5/10 [00:02<00:02,  1.90it/s]

Faithful: 5
Detail: 5
Overall: 5
Test Question: What is the best way to iterate through objects? Can I do paginated API calls?
Predicted Answer: predicted_answer


Average Metric: 13.0 / 6  (216.7):  60%|█████▍   | 6/10 [00:02<00:02,  1.86it/s]

Faithful: 1
Detail: 4
Overall: 2
Test Question: What is best practice for updating data?
Predicted Answer: predicted_answer


Average Metric: 17.0 / 7  (242.9):  70%|██████▎  | 7/10 [00:03<00:01,  1.80it/s]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Can I connect my own module?
Predicted Answer: predicted_answer


Average Metric: 21.0 / 8  (262.5):  80%|███████▏ | 8/10 [00:04<00:01,  1.81it/s]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Can I train my own text2vec-contextionary vectorizer module?
Predicted Answer: predicted_answer


Average Metric: 25.0 / 9  (277.8):  90%|████████ | 9/10 [00:04<00:00,  2.02it/s]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Does Weaviate use Hnswlib?
Predicted Answer: predicted_answer


Average Metric: 29.0 / 10  (290.0): 100%|███████| 10/10 [00:04<00:00,  2.05it/s]

Faithful: 5
Detail: 5
Overall: 5
Average Metric: 29.0 / 10  (290.0%)





| | question | answer | llm_metric |
|---|---|---|---|
| 0 | Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions) | No specific information is provided in the given context about support for multiple versions of the query/document embedding models to co-exist at a given time. | 2.0 |
| 1 | How can I retrieve the total object count in a class? | The given context does not provide specific information on how to retrieve the total object count in a class. However, in Weaviate, you can use... | 1.8 |
| 2 | How do I get the cosine similarity from Weaviate's certainty? | Weaviate does not directly provide the cosine similarity from its certainty value. The certainty value in Weaviate represents the confidence level of the result, but... | 1.6 |
| 3 | The quality of my search results change depending on the specified limit. Why? How can I fix this? | The quality of search results can change depending on the specified limit because the limit determines the number of results that are returned. When the... | 2.0 |
| 4 | Why did you use GraphQL instead of SPARQL? | The given context does not provide any information about why GraphQL was used instead of SPARQL. The context mainly discusses the performance and functionality of... | 4.0 |


290.0

# Test Set Eval

In [66]:
# Evaluate Uncompiled
from dspy.evaluate.evaluate import Evaluate

# Set up the `evaluate` function on the held-out test set. We'll reuse it for each program below.
evaluate = Evaluate(devset=testset, num_threads=1, display_progress=True, display_table=5)

In [67]:
evaluate(uncompiled_rag, metric=llm_metric)


  0%|                                                    | 0/14 [00:00<?, ?it/s]

Test Question: Are all ANN algorithms potential candidates to become an indexation plugin in Weaviate?
Predicted Answer: predicted_answer


Average Metric: 3.2 / 1  (320.0):   7%|▋         | 1/14 [00:28<06:09, 28.41s/it]

Faithful: 3
Detail: 5
Overall: 5
Test Question: Does Weaviate use pre- or post-filtering ANN index search?
Predicted Answer: predicted_answer


Average Metric: 7.2 / 2  (360.0):  14%|█▍        | 2/14 [00:55<05:31, 27.64s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: How does Weaviate's vector and scalar filtering work?
Predicted Answer: predicted_answer


Average Metric: 7.2 / 3  (240.0):  21%|██▏       | 3/14 [01:17<04:37, 25.22s/it]

Faithful: No
Detail: 1
Overall: 5
Error for example in dev set: 		 could not convert string to float: 'No'
Test Question: What would you say is more important for query speed in Weaviate: More CPU power, or more RAM?
Predicted Answer: predicted_answer


Average Metric: 8.4 / 4  (210.0):  29%|██▊       | 4/14 [01:39<03:59, 23.94s/it]

Faithful: 1
Detail: 2
Overall: 2
Test Question: Data import takes long / is slow, what is causing this and what can I do?
Predicted Answer: predicted_answer


Average Metric: 12.2 / 5  (244.0):  36%|███▏     | 5/14 [02:07<03:47, 25.33s/it]

Faithful: 5
Detail: 5
Overall: 4
Test Question: How can slow queries be optimized?
Predicted Answer: predicted_answer


Average Metric: 16.2 / 6  (270.0):  43%|███▊     | 6/14 [02:32<03:21, 25.15s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: When scalar and vector search are combined, will the scalar filter happen before or after the nearest neighbor (vector) search?
Predicted Answer: predicted_answer


Average Metric: 17.599999999999998 / 7  (251.4):  50%|▌| 7/14 [02:51<02:42, 23.2

Faithful: 2
Detail: 2
Overall: 1
Test Question: Regarding "filtered vector search": Since this is a two-phase pipeline, how big can that list of IDs get? Do you know how that size might affect query performance?
Predicted Answer: predicted_answer


Average Metric: 19.799999999999997 / 8  (247.5):  57%|▌| 8/14 [03:14<02:18, 23.1

Faithful: 1
Detail: 4
Overall: 5
Test Question: My Weaviate setup is using more memory than what I think is reasonable. How can I debug this?
Predicted Answer: predicted_answer


Average Metric: 23.799999999999997 / 9  (264.4):  64%|▋| 9/14 [03:39<01:58, 23.6

Faithful: 5
Detail: 5
Overall: 5
Test Question: How can I print a stack trace of Weaviate?
Predicted Answer: predicted_answer


Average Metric: 26.999999999999996 / 10  (270.0):  71%|▋| 10/14 [04:08<01:40, 25

Faithful: 5
Detail: 1
Overall: 5
Test Question: Can I request a feature in Weaviate?
Predicted Answer: predicted_answer


Average Metric: 30.399999999999995 / 11  (276.4):  79%|▊| 11/14 [04:28<01:10, 23

Faithful: 5
Detail: 3
Overall: 4
Test Question: What is Weaviate's consistency model in a distributed setup?
Predicted Answer: predicted_answer


Average Metric: 33.99999999999999 / 12  (283.3):  86%|▊| 12/14 [04:50<00:46, 23.

Faithful: 5
Detail: 3
Overall: 5
Test Question: With your aggregations I could not see how to do time buckets, is this possible?
Predicted Answer: predicted_answer


Average Metric: 37.99999999999999 / 13  (292.3):  93%|▉| 13/14 [05:10<00:22, 22.

Faithful: 5
Detail: 5
Overall: 5
Test Question: How can I run the latest master branch with Docker Compose?
Predicted Answer: predicted_answer


Average Metric: 37.99999999999999 / 14  (271.4): 100%|█| 14/14 [05:27<00:00, 23.

Faithful: No
Detail: 1
Overall: 5
Error for example in dev set: 		 could not convert string to float: 'No'
Average Metric: 37.99999999999999 / 14  (271.4%)
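
The last example in this run failed because the metric LLM replied `No` where a number was expected, and `float('No')` raises `ValueError`. The logs suggest the example was then counted as 0: the running total stays at 37.99999999999999 while the denominator is still 14. A minimal sketch of a more tolerant parser follows. `parse_rating` and its `fallback` value are hypothetical helpers, not part of DSPy or of this notebook's metric.

```python
import re

def parse_rating(raw, low=1.0, high=5.0, fallback=1.0):
    """Extract the first number from an LLM reply and clamp it to [low, high].

    Returns `fallback` when the reply contains no number (for example "No").
    The fallback of 1.0 is an assumption; choose whatever suits your metric.
    """
    match = re.search(r"\d+(?:\.\d+)?", str(raw))
    if match is None:
        return fallback
    return min(high, max(low, float(match.group())))

print(parse_rating("5"))
print(parse_rating("Rating: 4"))
print(parse_rating("No"))
```

Using a helper like this in place of a bare `float(...)` call would keep a single malformed reply from zeroing out an example, at the cost of silently accepting a default score.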





| | question | answer | llm_metric |
|---|---|---|---|
| 0 | Are all ANN algorithms potential candidates to become an indexation plugin in Weaviate? | No, not all ANN algorithms are potential candidates to become an indexation plugin in Weaviate. Reasoning: The context mentions that Weaviate has implemented the HNSW... | 3.2 |
| 1 | Does Weaviate use pre- or post-filtering ANN index search? | Weaviate uses pre-filtering ANN index search. Reasoning: In the context, it is mentioned that Weaviate uses the HNSW indexing algorithm, which builds a hierarchical representation... | 4.0 |
| 2 | How does Weaviate's vector and scalar filtering work? | | 0.0 |
| 3 | What would you say is more important for query speed in Weaviate: More CPU power, or more RAM? | More RAM is more important for query speed in Weaviate. | 1.2 |
| 4 | Data import takes long / is slow, what is causing this and what can I do? | The slow data import in Weaviate could be caused by the large number of data objects being imported and the need to maintain data integrity.... | 3.8 |


271.43

In [68]:
evaluate(compiled_rag, metric=llm_metric)


  0%|                                                    | 0/14 [00:00<?, ?it/s]

Test Question: Are all ANN algorithms potential candidates to become an indexation plugin in Weaviate?
Predicted Answer: predicted_answer


Average Metric: 3.6 / 1  (360.0):   7%|▋         | 1/14 [00:28<06:07, 28.25s/it]

Faithful: 4
Detail: 5
Overall: 5
Test Question: Does Weaviate use pre- or post-filtering ANN index search?
Predicted Answer: predicted_answer


Average Metric: 7.6 / 2  (380.0):  14%|█▍        | 2/14 [00:54<05:24, 27.04s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: How does Weaviate's vector and scalar filtering work?
Predicted Answer: predicted_answer


Average Metric: 9.2 / 3  (306.7):  21%|██▏       | 3/14 [01:15<04:26, 24.21s/it]

Faithful: 1
Detail: 1
Overall: 5
Test Question: What would you say is more important for query speed in Weaviate: More CPU power, or more RAM?
Predicted Answer: predicted_answer


Average Metric: 11.6 / 4  (290.0):  29%|██▌      | 4/14 [01:39<04:03, 24.38s/it]

Faithful: 2
Detail: 5
Overall: 3
Test Question: Data import takes long / is slow, what is causing this and what can I do?
Predicted Answer: predicted_answer


Average Metric: 15.0 / 5  (300.0):  36%|███▏     | 5/14 [02:03<03:38, 24.23s/it]

Faithful: 4
Detail: 5
Overall: 4
Test Question: How can slow queries be optimized?
Predicted Answer: predicted_answer


Average Metric: 19.0 / 6  (316.7):  43%|███▊     | 6/14 [02:24<03:03, 22.95s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: When scalar and vector search are combined, will the scalar filter happen before or after the nearest neighbor (vector) search?
Predicted Answer: predicted_answer


Average Metric: 20.8 / 7  (297.1):  50%|████▌    | 7/14 [02:45<02:36, 22.30s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: Regarding "filtered vector search": Since this is a two-phase pipeline, how big can that list of IDs get? Do you know how that size might affect query performance?
Predicted Answer: predicted_answer


Average Metric: 23.0 / 8  (287.5):  57%|█████▏   | 8/14 [03:07<02:12, 22.15s/it]

Faithful: 1
Detail: 4
Overall: 5
Test Question: My Weaviate setup is using more memory than what I think is reasonable. How can I debug this?
Predicted Answer: predicted_answer


Average Metric: 27.0 / 9  (300.0):  64%|█████▊   | 9/14 [03:45<02:16, 27.30s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: How can I print a stack trace of Weaviate?
Predicted Answer: predicted_answer


Average Metric: 28.6 / 10  (286.0):  71%|█████  | 10/14 [04:12<01:49, 27.26s/it]

Faithful: 1
Detail: 1
Overall: 5
Test Question: Can I request a feature in Weaviate?
Predicted Answer: predicted_answer


Average Metric: 32.6 / 11  (296.4):  79%|█████▌ | 11/14 [04:35<01:17, 25.70s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: What is Weaviate's consistency model in a distributed setup?
Predicted Answer: predicted_answer


Average Metric: 36.6 / 12  (305.0):  86%|██████ | 12/14 [05:01<00:51, 25.97s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: With your aggregations I could not see how to do time buckets, is this possible?
Predicted Answer: predicted_answer


Average Metric: 38.4 / 13  (295.4):  93%|██████▌| 13/14 [05:24<00:24, 24.91s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: How can I run the latest master branch with Docker Compose?
Predicted Answer: predicted_answer


Average Metric: 40.0 / 14  (285.7): 100%|███████| 14/14 [05:47<00:00, 24.83s/it]

Faithful: 1
Detail: 1
Overall: 5
Average Metric: 40.0 / 14  (285.7%)
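
The per-example scores in these logs are consistent with a weighted average in which faithfulness counts double: `(detail + 2 * faithful + overall) / 5`. The metric's definition is not shown in this section, so treat that formula as an inference from the logged values rather than the notebook's actual code. The sketch below replays the first three examples of the `compiled_rag` run. It also shows why the reported percentage exceeds 100: under this formula the score ranges from 0.8 to 4.0 rather than from 0 to 1, and the evaluator reports `100 * total / n`.

```python
def weighted_score(faithful, detail, overall):
    # Assumed weighting, inferred from the logs: faithfulness counts twice.
    return (detail + 2 * faithful + overall) / 5.0

# (Faithful, Detail, Overall) for the first three examples of the compiled_rag run.
ratings = [(4, 5, 5), (5, 5, 5), (1, 1, 5)]

total = 0.0
for n, (faithful, detail, overall) in enumerate(ratings, start=1):
    total += weighted_score(faithful, detail, overall)
    # Same layout as the progress-bar prefix in the logs.
    print(f"Average Metric: {total:.1f} / {n}  ({100 * total / n:.1f})")
```

The three printed lines should match the running totals logged above (3.6, 7.6, and 9.2). Note that in the logs each `Faithful` / `Detail` / `Overall` triple appears after the progress-bar line it contributes to, most likely because the progress bar and the `print` calls are flushed to different streams.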





| | question | answer | llm_metric |
|---|---|---|---|
| 0 | Are all ANN algorithms potential candidates to become an indexation plugin in Weaviate? | No, not all ANN algorithms are potential candidates to become an indexation plugin in Weaviate. According to the given context, Weaviate has implemented the HNSW... | 3.6 |
| 1 | Does Weaviate use pre- or post-filtering ANN index search? | Weaviate uses pre-filtering ANN index search. According to the given context, Weaviate uses the HNSW indexing algorithm, which builds a hierarchical representation of the index... | 4.0 |
| 2 | How does Weaviate's vector and scalar filtering work? | The given context does not provide any information about how Weaviate's vector and scalar filtering works. | 1.6 |
| 3 | What would you say is more important for query speed in Weaviate: More CPU power, or more RAM? | According to the given context, increasing the frequency of garbage collector (GC) cycles will use more CPU, while increasing RAM will incur additional costs. It... | 2.4 |
| 4 | Data import takes long / is slow, what is causing this and what can I do? | The cause of slow data import in Weaviate could be the large volume of data being imported. Weaviate recommends using the batch import feature for... | 3.4 |


285.71

In [69]:
evaluate(second_compiled_rag, metric=llm_metric)


  0%|                                                    | 0/14 [00:00<?, ?it/s]

Test Question: Are all ANN algorithms potential candidates to become an indexation plugin in Weaviate?
Predicted Answer: predicted_answer


Average Metric: 3.6 / 1  (360.0):   7%|▋         | 1/14 [00:22<04:47, 22.08s/it]

Faithful: 5
Detail: 3
Overall: 5
Test Question: Does Weaviate use pre- or post-filtering ANN index search?
Predicted Answer: predicted_answer


Average Metric: 5.4 / 2  (270.0):  14%|█▍        | 2/14 [00:55<05:41, 28.48s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: How does Weaviate's vector and scalar filtering work?
Predicted Answer: predicted_answer


Average Metric: 7.2 / 3  (240.0):  21%|██▏       | 3/14 [01:21<05:05, 27.77s/it]

Faithful: 1
Detail: 2
Overall: 5
Test Question: What would you say is more important for query speed in Weaviate: More CPU power, or more RAM?
Predicted Answer: predicted_answer


Average Metric: 11.2 / 4  (280.0):  29%|██▌      | 4/14 [01:47<04:28, 26.83s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Data import takes long / is slow, what is causing this and what can I do?
Predicted Answer: predicted_answer


Average Metric: 12.0 / 5  (240.0):  36%|███▏     | 5/14 [02:15<04:05, 27.26s/it]

Faithful: 1
Detail: 1
Overall: 1
Test Question: How can slow queries be optimized?
Predicted Answer: predicted_answer


Average Metric: 13.6 / 6  (226.7):  43%|███▊     | 6/14 [02:40<03:32, 26.58s/it]

Faithful: 1
Detail: 1
Overall: 5
Test Question: When scalar and vector search are combined, will the scalar filter happen before or after the nearest neighbor (vector) search?
Predicted Answer: predicted_answer


Average Metric: 17.4 / 7  (248.6):  50%|████▌    | 7/14 [03:01<02:53, 24.78s/it]

Faithful: 5
Detail: 4
Overall: 5
Test Question: Regarding "filtered vector search": Since this is a two-phase pipeline, how big can that list of IDs get? Do you know how that size might affect query performance?
Predicted Answer: predicted_answer


Average Metric: 19.599999999999998 / 8  (245.0):  57%|▌| 8/14 [03:23<02:23, 23.9

Faithful: 1
Detail: 4
Overall: 5
Test Question: My Weaviate setup is using more memory than what I think is reasonable. How can I debug this?
Predicted Answer: predicted_answer


Average Metric: 21.2 / 9  (235.6):  64%|█████▊   | 9/14 [03:51<02:05, 25.07s/it]

Faithful: 1
Detail: 1
Overall: 5
Test Question: How can I print a stack trace of Weaviate?
Predicted Answer: predicted_answer


Average Metric: 24.0 / 10  (240.0):  71%|█████  | 10/14 [04:16<01:40, 25.04s/it]

Faithful: 5
Detail: 1
Overall: 3
Test Question: Can I request a feature in Weaviate?
Predicted Answer: predicted_answer


Average Metric: 27.4 / 11  (249.1):  79%|█████▌ | 11/14 [04:43<01:16, 25.59s/it]

Faithful: 5
Detail: 2
Overall: 5
Test Question: What is Weaviate's consistency model in a distributed setup?
Predicted Answer: predicted_answer


Average Metric: 30.599999999999998 / 12  (255.0):  86%|▊| 12/14 [05:07<00:50, 25

Faithful: 5
Detail: 1
Overall: 5
Test Question: With your aggregations I could not see how to do time buckets, is this possible?
Predicted Answer: predicted_answer


Average Metric: 34.199999999999996 / 13  (263.1):  93%|▉| 13/14 [05:30<00:24, 24

Faithful: 5
Detail: 3
Overall: 5
Test Question: How can I run the latest master branch with Docker Compose?
Predicted Answer: predicted_answer


Average Metric: 35.8 / 14  (255.7): 100%|███████| 14/14 [05:51<00:00, 25.14s/it]

Faithful: 1
Detail: 1
Overall: 5
Average Metric: 35.8 / 14  (255.7%)





| | question | answer | llm_metric |
|---|---|---|---|
| 0 | Are all ANN algorithms potential candidates to become an indexation plugin in Weaviate? | The given context does not provide any information about whether all ANN algorithms are potential candidates to become an indexation plugin in Weaviate. Therefore, we... | 3.6 |
| 1 | Does Weaviate use pre- or post-filtering ANN index search? | The given context does not provide information about whether Weaviate uses pre- or post-filtering ANN index search. | 1.8 |
| 2 | How does Weaviate's vector and scalar filtering work? | The given context does not provide any information about how Weaviate's vector and scalar filtering work. Therefore, we cannot determine the answer based on the... | 1.8 |
| 3 | What would you say is more important for query speed in Weaviate: More CPU power, or more RAM? | Based on the given information, it is not explicitly stated whether more CPU power or more RAM is more important for query speed in Weaviate.... | 4.0 |
| 4 | Data import takes long / is slow, what is causing this and what can I do? | The given context does not provide specific information about the causes of slow data import in Weaviate or the possible solutions. Therefore, we cannot determine... | 0.8 |


255.71

In [70]:
evaluate(third_compiled_rag, metric=llm_metric)


  0%|                                                    | 0/14 [00:00<?, ?it/s]

Test Question: Are all ANN algorithms potential candidates to become an indexation plugin in Weaviate?
Predicted Answer: predicted_answer


Average Metric: 3.2 / 1  (320.0):   7%|▋         | 1/14 [00:34<07:25, 34.28s/it]

Faithful: 3
Detail: 5
Overall: 5
Test Question: Does Weaviate use pre- or post-filtering ANN index search?
Predicted Answer: predicted_answer


Average Metric: 6.0 / 2  (300.0):  14%|█▍        | 2/14 [00:59<05:45, 28.75s/it]

Faithful: 2
Detail: 5
Overall: 5
Test Question: How does Weaviate's vector and scalar filtering work?
Predicted Answer: predicted_answer


Average Metric: 9.4 / 3  (313.3):  21%|██▏       | 3/14 [01:23<04:51, 26.51s/it]

Faithful: 5
Detail: 3
Overall: 4
Test Question: What would you say is more important for query speed in Weaviate: More CPU power, or more RAM?
Predicted Answer: predicted_answer


Average Metric: 13.4 / 4  (335.0):  29%|██▌      | 4/14 [01:48<04:22, 26.28s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Data import takes long / is slow, what is causing this and what can I do?
Predicted Answer: predicted_answer


Average Metric: 17.4 / 5  (348.0):  36%|███▏     | 5/14 [02:11<03:43, 24.84s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: How can slow queries be optimized?
Predicted Answer: predicted_answer


Average Metric: 20.799999999999997 / 6  (346.7):  43%|▍| 6/14 [02:36<03:20, 25.0

Faithful: 5
Detail: 3
Overall: 4
Test Question: When scalar and vector search are combined, will the scalar filter happen before or after the nearest neighbor (vector) search?
Predicted Answer: predicted_answer


Average Metric: 23.199999999999996 / 7  (331.4):  50%|▌| 7/14 [03:04<03:01, 25.8

Faithful: 2
Detail: 5
Overall: 3
Test Question: Regarding "filtered vector search": Since this is a two-phase pipeline, how big can that list of IDs get? Do you know how that size might affect query performance?
Predicted Answer: predicted_answer


Average Metric: 26.999999999999996 / 8  (337.5):  57%|▌| 8/14 [03:31<02:37, 26.3

Faithful: 5
Detail: 5
Overall: 4
Test Question: My Weaviate setup is using more memory than what I think is reasonable. How can I debug this?
Predicted Answer: predicted_answer


Average Metric: 30.799999999999997 / 9  (342.2):  64%|▋| 9/14 [03:56<02:09, 25.8

Faithful: 5
Detail: 4
Overall: 5
Test Question: How can I print a stack trace of Weaviate?
Predicted Answer: predicted_answer


Average Metric: 34.8 / 10  (348.0):  71%|█████  | 10/14 [04:22<01:43, 25.98s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: Can I request a feature in Weaviate?
Predicted Answer: predicted_answer


Average Metric: 38.8 / 11  (352.7):  79%|█████▌ | 11/14 [04:44<01:14, 24.75s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: What is Weaviate's consistency model in a distributed setup?
Predicted Answer: predicted_answer


Average Metric: 42.8 / 12  (356.7):  86%|██████ | 12/14 [05:08<00:48, 24.38s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: With your aggregations I could not see how to do time buckets, is this possible?
Predicted Answer: predicted_answer


Average Metric: 46.8 / 13  (360.0):  93%|██████▌| 13/14 [05:31<00:24, 24.08s/it]

Faithful: 5
Detail: 5
Overall: 5
Test Question: How can I run the latest master branch with Docker Compose?
Predicted Answer: predicted_answer


Average Metric: 48.4 / 14  (345.7): 100%|███████| 14/14 [05:55<00:00, 25.37s/it]

Faithful: 1
Detail: 5
Overall: 1
Average Metric: 48.4 / 14  (345.7%)





| | question | answer | llm_metric |
|---|---|---|---|
| 0 | Are all ANN algorithms potential candidates to become an indexation plugin in Weaviate? | No, not all ANN (Approximate Nearest Neighbor) algorithms are potential candidates to become an indexation plugin in Weaviate. The context mentions that Weaviate has implemented... | 3.2 |
| 1 | Does Weaviate use pre- or post-filtering ANN index search? | Weaviate uses pre-filtering ANN index search. The context mentions that Weaviate implements the HNSW indexing algorithm, which builds a hierarchical representation of the index in... | 2.8 |
| 2 | How does Weaviate's vector and scalar filtering work? | The given context does not provide specific information about how Weaviate's vector and scalar filtering works. It mentions that Weaviate looks for approximate (close enough)... | 3.4 |
| 3 | What would you say is more important for query speed in Weaviate: More CPU power, or more RAM? | Based on the information provided in the context, it is more important to have more RAM for query speed in Weaviate. The context mentions optimizations... | 4.0 |
| 4 | Data import takes long / is slow, what is causing this and what can I do? | The context mentions that Weaviate offers a batch import feature for adding data objects in bulk, which is recommended for speeding up the import process.... | 4.0 |


345.71
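
To compare the four runs in this section side by side, the final percentages can be converted back to a mean per-example score by dividing by 100. The sketch below does that and ranks the programs. It assumes the metric tops out at 4.0 (every rating equal to 5 under a double-weighted faithfulness score), which is inferred from the logs rather than stated here, and it uses "first run" as a placeholder label because that run's `evaluate` call is not shown in this section.

```python
# Final scores reported by the four evaluate(...) runs in this section.
# "first run" is a placeholder label: its evaluate call is not shown here.
scores = {
    "first run": 271.43,
    "compiled_rag": 285.71,
    "second_compiled_rag": 255.71,
    "third_compiled_rag": 345.71,
}

MAX_SCORE = 4.0  # assumed ceiling: (5 + 2 * 5 + 5) / 5

# Rank the programs from highest to lowest reported score.
for name, pct in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    mean = pct / 100
    print(f"{name:22s} mean={mean:.2f}  share of max={mean / MAX_SCORE:.0%}")

best = max(scores, key=scores.get)
```

Two caveats apply when reading the ranking. The first run includes one example that errored and appears to have been scored as 0, so its average is slightly understated. With only 14 examples and an LLM judge, small gaps between programs may also reflect noise rather than a real difference.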