[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/llm-agent-frameworks/dspy/llms/Llama3.ipynb)

# Llama3

Learn more about Llama3 in Meta's [release notes](https://ai.meta.com/blog/meta-llama-3/)!

Massive thank you to our friends at [Ollama](https://ollama.com/library/llama3:latest) for supporting this so quickly!

This notebook will:

1. Show you how to build a RAG system with Llama3, Ollama, Weaviate, and DSPy
2. Use DSPy's MIPRO optimizer to find the optimal RAG prompt for Llama3

Please note that the optimal prompt is not the same for all language models! We recently published a [blog post](https://weaviate.io/blog/dspy-optimizers) explaining why.

### Connect to Llama3 (hosted with Ollama) and Weaviate

In [1]:
import dspy
llama3_ollama = dspy.OllamaLocal(model="llama3:8b-instruct-q5_1", max_tokens=4000, timeout_s=480)

import weaviate
from dspy.retrieve.weaviate_rm import WeaviateRM
weaviate_client = weaviate.connect_to_local()
retriever_model = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate_client, k=10)

dspy.settings.configure(lm=llama3_ollama, rm=retriever_model)



In [2]:
llama3_ollama("say hello")

['Hello!']

### Load Dataset (Questions derived from Weaviate's Blog Posts)

In [3]:
import json

file_path = './WeaviateBlogRAG-0-0-0.json'
with open(file_path, 'r') as file:
    dataset = json.load(file)

# Wrap each row as a dspy.Example; only `question` is fed to the program as input.
data = [
    dspy.Example(gold_answer=row["gold_answer"], question=row["query"]).with_inputs("question")
    for row in dataset
]

trainset, devset, testset = data[:25], data[25:35], data[35:]
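The slicing above yields 25 training examples, 10 development examples, and whatever remains as the test set. A minimal sketch of the same positional split on plain Python lists, using a hypothetical helper `split_dataset` (not part of DSPy) and an assumed dataset size of 50, makes the resulting sizes easy to check:

```python
def split_dataset(data, n_train=25, n_dev=10):
    """Split a list into train/dev/test by position (no shuffling)."""
    train = data[:n_train]
    dev = data[n_train:n_train + n_dev]
    test = data[n_train + n_dev:]
    return train, dev, test

# Stand-in for the dspy.Example objects; 50 items is an assumption for illustration.
toy_data = list(range(50))
train, dev, test = split_dataset(toy_data)
print(len(train), len(dev), len(test))
```

Because the split is positional, a dataset with fewer than 36 rows would leave the test set empty, so it may be worth checking the lengths before evaluating.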

### Metric to Assess Response Quality

In [4]:
class TypedEvaluator(dspy.Signature):
    """Evaluate the quality of a system's answer to a question according to a given criterion."""
    
    criterion: str = dspy.InputField(desc="The evaluation criterion.")
    question: str = dspy.InputField(desc="The question asked to the system.")
    ground_truth_answer: str = dspy.InputField(desc="An expert written Ground Truth Answer to the question.")
    predicted_answer: str = dspy.InputField(desc="The system's answer to the question.")
    rating: float = dspy.OutputField(desc="A float rating between 1 and 5. IMPORTANT!! ONLY OUTPUT THE RATING!!")


def MetricWrapper(gold, pred, trace=None):
    # LLM-as-judge metric: the configured LM rates (1-5) how well the prediction matches the gold answer.
    alignment_criterion = "How aligned is the predicted_answer with the ground_truth_answer?"
    return dspy.TypedPredictor(TypedEvaluator)(criterion=alignment_criterion,
                                               question=gold.question,
                                               ground_truth_answer=gold.gold_answer,
                                               predicted_answer=pred.answer).rating
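Because the rating comes from a language model, it may occasionally arrive as free text or fall outside the 1 to 5 range. One possible guard, sketched here with a hypothetical helper `parse_rating` that is not part of DSPy or of this notebook, is to extract the first number from the response and clamp it:

```python
import re

def parse_rating(raw, low=1.0, high=5.0, default=1.0):
    """Extract the first number from an LM response and clamp it to [low, high]."""
    match = re.search(r"-?\d+(?:\.\d+)?", str(raw))
    if match is None:
        # Assumption: treat unparseable output as the lowest rating.
        return default
    return max(low, min(high, float(match.group())))

print(parse_rating("4.5"))
print(parse_rating("Rating: 7"))
print(parse_rating("no score"))
```

Falling back to the lowest rating is only one choice; raising an error or skipping the example would be equally reasonable depending on how noisy the judge model is.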

### DSPy RAG Program 

In [5]:
class GenerateAnswer(dspy.Signature):
    """Assess the context and answer the question."""

    context = dspy.InputField(desc="Helpful information for answering the question.")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="A detailed answer that is supported by the context. ONLY OUTPUT THE ANSWER!!")
    
class RAG(dspy.Module):
    def __init__(self, k=3):
        super().__init__()
        
        self.retrieve = dspy.Retrieve(k=k)
        self.generate_answer = dspy.Predict(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        pred = self.generate_answer(context=context, question=question).answer
        return dspy.Prediction(context=context, answer=pred, question=question)
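In `forward`, the retrieved passages are handed to the predictor as `context`. The prompt history printed at the end of this notebook shows them rendered as numbered entries of the form `[1] «...»`. The sketch below approximates that rendering with hypothetical helpers (`format_passages`, `build_prompt`) and toy passages; it illustrates the idea and is not DSPy's internal template code:

```python
def format_passages(passages):
    """Number passages and wrap them in guillemets, mimicking the logged prompt."""
    return "\n".join(f"[{i}] «{p}»" for i, p in enumerate(passages, start=1))

def build_prompt(instructions, passages, question):
    """Assemble instructions, context, and question into one string (illustrative layout)."""
    return (
        f"{instructions}\n\n---\n\n"
        f"Context:\n{format_passages(passages)}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Toy passages standing in for what the Weaviate retriever would return.
prompt = build_prompt(
    "Assess the context and answer the question.",
    ["Binary quantization maps each dimension to one bit.",
     "Hamming distance compares bits."],
    "What is binary quantization?",
)
print(prompt)
```

Seeing the prompt as a plain string makes it clearer what MIPRO is optimizing: the instruction line at the top, and optionally the demonstrations placed before the context.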

### Run!

In [6]:
print(RAG()("What is binary quantization?").answer)

Binary Quantization is a technique that translates vectors into a binary sequence, where each element in the vector is represented as either 0 or 1. This process condenses the information in the original vector while preserving its semantic structure. The Hamming distance between two strings can be computed by comparing the position of each bit in the sequence.


### Compile with MIPRO

What is the optimal prompt for Llama3 when answering questions about Weaviate?

Starting with the prompt,

`Assess the context and answer the question.`

DSPy's MIPRO optimizer finds better performance with:

`Given the provided context, your task is to understand the content and accurately answer the question based on the information available in the context. You should use formal English with technical terminologies where necessary and provide a detailed, relevant response.`
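The run in the next cell uses 5 training examples, 3 trials, and 3 candidate instructions for a program with a single predictor. The call budget that MIPRO prints for those settings follows simple arithmetic, reproduced here as a sketch. The function and parameter names are hypothetical, and the cap of 10 data-summarizer calls is taken from the logged output rather than from DSPy's source:

```python
def mipro_call_budget(n_examples, n_trials, n_candidates, n_predictors,
                      lm_calls_per_program=1, max_summarizer_calls=10):
    """Upper bounds on LM calls, following the formula printed in the MIPRO log."""
    # Each trial evaluates the program once per example.
    task_calls = n_examples * n_trials * lm_calls_per_program
    # The prompt model summarizes the data, then proposes candidates per predictor.
    prompt_calls = max_summarizer_calls + n_candidates * n_predictors
    return task_calls, prompt_calls

task_calls, prompt_calls = mipro_call_budget(
    n_examples=5, n_trials=3, n_candidates=3, n_predictors=1
)
print(task_calls, prompt_calls)
```

Note that the metric in this notebook is itself an LM call, so the real number of requests to the task model is likely higher than this estimate, which counts only calls made by the program.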


In [12]:
from dspy.teleprompt import MIPRO

import openai
gpt4 = dspy.OpenAI(model="gpt-4", max_tokens=4000, model_type="chat")

teleprompter = MIPRO(prompt_model=gpt4, 
                     task_model=llama3_ollama, 
                     metric=MetricWrapper, 
                     num_candidates=3, 
                     init_temperature=0.5)
kwargs = dict(num_threads=1, 
              display_progress=True, 
              display_table=0)
MIPRO_compiled_RAG = teleprompter.compile(RAG(), trainset=trainset[:5], num_trials=3, max_bootstrapped_demos=1, max_labeled_demos=0, eval_kwargs=kwargs)


Please be advised that based on the parameters you have set, the maximum number of LM calls is projected as follows:

- Task Model: 5 examples in dev set * 3 trials * # of LM calls in your program = (15 * # of LM calls in your program) task model calls
- Prompt Model: # data summarizer calls (max 10) + 3 * 1 lm calls in program = 13 prompt model calls

Estimated Cost Calculation:

Total Cost = (Number of calls to task model * (Avg Input Token Length per Call * Task Model Price per Input Token + Avg Output Token Length per Call * Task Model Price per Output Token)
            + (Number of calls to prompt model * (Avg Input Token Length per Call * Task Prompt Price per Input Token + Avg Output Token Length per Call * Prompt Model Price per Output Token).

For a preliminary estimate of potential costs, we 


 20%|█████████                                    | 1/5 [00:04<00:18,  4.71s/it]

 20%|█████████                                    | 1/5 [00:04<00:17,  4.49s/it]
[I 2024-04-18 15:30:02,492] A new study created in memory with name: no-name-8e977397-d3d2-48f3-b319-fe46d1f20a25


Starting trial #0

Average Metric: 4.0 / 1  (400.0):  20%|██▏        | 1/5 [00:07<00:29,  7.39s/it]
Average Metric: 8.0 / 2  (400.0):  40%|████▍      | 2/5 [00:14<00:21,  7.18s/it]
Average Metric: 12.0 / 3  (400.0):  60%|██████    | 3/5 [00:22<00:15,  7.64s/it]
Average Metric: 16.0 / 4  (400.0):  80%|████████  | 4/5 [00:28<00:06,  6.93s/it]
Average Metric: 20.0 / 5  (400.0): 100%|██████████| 5/5 [00:46<00:00,  9.34s/it]

[I 2024-04-18 15:30:49,208] Trial 0 finished with value: 400.0 and parameters: {'5971981264_predictor_instruction': 1, '5971981264_predictor_demos': 0}. Best is trial 0 with value: 400.0.



Starting trial #1

Average Metric: 4.0 / 1  (400.0):  20%|██▏        | 1/5 [00:09<00:37,  9.43s/it]
Average Metric: 8.0 / 2  (400.0):  40%|████▍      | 2/5 [00:19<00:29,  9.80s/it]
Average Metric: 12.0 / 3  (400.0):  60%|██████    | 3/5 [00:27<00:18,  9.07s/it]
Average Metric: 16.0 / 4  (400.0):  80%|████████  | 4/5 [00:35<00:08,  8.51s/it]
Average Metric: 20.0 / 5  (400.0): 100%|██████████| 5/5 [00:52<00:00, 10.45s/it]

[I 2024-04-18 15:31:41,450] Trial 1 finished with value: 400.0 and parameters: {'5971981264_predictor_instruction': 1, '5971981264_predictor_demos': 2}. Best is trial 0 with value: 400.0.



Starting trial #2

Average Metric: 2.0 / 1  (200.0):  20%|██▏        | 1/5 [00:11<00:44, 11.25s/it]
Average Metric: 6.0 / 2  (300.0):  40%|████▍      | 2/5 [00:18<00:26,  8.86s/it]
Average Metric: 10.0 / 3  (333.3):  60%|██████    | 3/5 [00:22<00:13,  6.88s/it]
Average Metric: 14.0 / 4  (350.0):  80%|████████  | 4/5 [00:33<00:08,  8.23s/it]
Average Metric: 18.0 / 5  (360.0): 100%|██████████| 5/5 [00:52<00:00, 10.42s/it]

[I 2024-04-18 15:32:33,545] Trial 2 finished with value: 360.0 and parameters: {'5971981264_predictor_instruction': 0, '5971981264_predictor_demos': 2}. Best is trial 0 with value: 400.0.



Returning generate_answer = Predict(StringSignature(context, question -> answer
    instructions='Given the provided context, your task is to understand the content and accurately answer the question based on the information available in the context. You should use formal English with technical terminologies where necessary and provide a detailed, relevant response.'
    context = Field(annotation=str required=True json_schema_extra={'desc': 'Helpful information for answering the question.', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'A detailed answer that is supported by the context. ONLY OUTPUT THE ANSWER!!', '__dspy_field_type': 'output', 'prefix': 'The answer to the question based on the context is:'})
))
trial_logs[0][program].generate_answer = Predict
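The trial scores above exceed 100 because the evaluator reports `100 * sum / count`, and the metric here returns a 1 to 5 rating rather than a 0 to 1 score. The sketch below reproduces the trial #2 figure from per-example ratings inferred from the running totals in the log. The `normalize_rating` helper is hypothetical and shows one way to map ratings onto a 0 to 1 scale; the notebook itself does not do this:

```python
def average_metric(scores):
    """Return (sum, count, percentage) in the style of the evaluation progress bar."""
    total = sum(scores)
    return total, len(scores), round(100 * total / len(scores), 1)

def normalize_rating(rating, low=1.0, high=5.0):
    """Map a rating in [low, high] linearly onto [0, 1]."""
    return (rating - low) / (high - low)

# Per-example ratings inferred from the running totals 2, 6, 10, 14, 18 in trial #2.
trial_2_ratings = [2.0, 4.0, 4.0, 4.0, 4.0]
print(average_metric(trial_2_ratings))
print(average_metric([normalize_rating(r) for r in trial_2_ratings]))
```

The ranking of trials is the same either way, since the normalization is linear; it only changes how the reported number reads.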

In [13]:
MIPRO_compiled_RAG("what are cross encoders?").answer

'Cross Encoders are one of the most well-known ranking models for content-based re-ranking, achieving high in-domain accuracy. They can be used with Weaviate using a specific syntax and can benefit from being chained behind Bi-Encoders in a multistage search pipeline to retrieve a list of result candidates and then rerank them for more accurate results.'

In [14]:
llama3_ollama.inspect_history(n=1)




Given the provided context, your task is to understand the content and accurately answer the question based on the information available in the context. You should use formal English with technical terminologies where necessary and provide a detailed, relevant response.

---

Follow the following format.

Context: Helpful information for answering the question.
Question: ${question}
The answer to the question based on the context is: A detailed answer that is supported by the context. ONLY OUTPUT THE ANSWER!!

---

Context:
[1] «[Cross Encoders](#cross-encoders) (collapsing the use of Large Language Models for ranking into this category as well)
1. [Metadata Rankers](#metadata-rankers)
1. [Score Rankers](#score-rankers)

## Cross Encoders
Cross Encoders are one of the most well known ranking models for content-based re-ranking. There is quite a collection of pre-trained cross encoders available on [sentence transformers](https://www.sbert.net/docs/pretrained_cross-encoders.html). We