[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/llm-agent-frameworks/dspy/llms/Gemini-1.5-Pro-and-Flash.ipynb)

# Gemini Evaluation 5/17/24

Hey everyone! Welcome to our notebook evaluating the updates to Gemini announced at Google I/O 2024. Gemini **1.5 Pro** now supports a state-of-the-art input length for LLMs of 2 million tokens. Gemini **Flash** has also been released, offering faster and cheaper inference with a 1 million token input window.

### There are 3 main parts to this notebook:

1. Needle in the Haystack Test
2. Gemini for Re-Ranking
3. Many-Shot In-Context Learning with Gemini and Command R

# Setup

See the Google models available with `generativeai`:

```python
import google.generativeai as genai

for value in genai.list_models():
    print(value)
```

In [115]:
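import os
import dspy

# Assumption: the API keys are read from environment variables;
# the variable names below are illustrative, adjust them to your setup.
google_api_key = os.environ["GOOGLE_API_KEY"]
cohere_api_key = os.environ["COHERE_API_KEY"]

gemini_pro_1_5 = dspy.Google(model="gemini-1.5-pro-latest", api_key=google_api_key)
gemini_flash = dspy.Google(model="gemini-1.5-flash-latest", api_key=google_api_key)
command_r = dspy.Cohere(model="command-r",
                        max_input_tokens=32_000, max_tokens=4_000, api_key=cohere_api_key)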

lms = [{"name": "Gemini Flash", "lm": gemini_flash},
       {"name": "Gemini Pro", "lm": gemini_pro_1_5},
       {"name": "Command R", "lm": command_r}]

dspy.settings.configure(lm=gemini_pro_1_5)

In [86]:
question = "How can you make Approximate Nearest Neighbor Search faster and cheaper?"

for lm_dict in lms:
    lm, name = lm_dict["lm"], lm_dict["name"]
    with dspy.context(lm=lm):
        print(f"\033[91mResult for {name}\n")
        print(f"\033[0m{lm(question)[0]} \n")

Result for Gemini Flash

## Making Approximate Nearest Neighbor Search Faster and Cheaper

Approximate Nearest Neighbor Search (ANNS) is a crucial technique for many applications, but it can be computationally expensive. Here are some strategies to make it faster and cheaper:

**1. Data Preprocessing:**

* **Dimensionality Reduction:** Reduce the number of dimensions in your data using techniques like PCA, t-SNE, or Autoencoders. This can significantly speed up search and reduce storage costs.
* **Data Clustering:** Cluster similar data points together. This allows you to search within smaller clusters, reducing the search space.
* **Data Indexing:** Create efficient data structures like k-d trees, ball trees, or hash tables to index your data. This allows for faster retrieval of potential nearest neighbors.

**2. Search Algorithm Optimization:**

* **Approximate Search Algorithms:** Use algorithms like Locality Sensitive Hashing (LSH), k-d tree search with approximate distanc

# Test 2M Token Window

# Needle in the Haystack Test

In [35]:
# Connect to Weaviate
from dspy.retrieve.weaviate_rm import WeaviateRM
import weaviate

weaviate_blog_index = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate.connect_to_local())

In [29]:
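from typing import List

def format_RM_results(results: List[dict]) -> str:
    """Number each retrieved passage and join the passages into one string."""
    # Each retrieved result exposes its passage text under the "long_text" key
    passages = [result["long_text"] for result in results]
    return "\n".join(f"[{i+1}] {item}" for i, item in enumerate(passages))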

question = "How does quantization help Vector Databases?"

search_results = format_RM_results(weaviate_blog_index(question, k=50))
print(search_results)

[1] Check out one of our free weekly workshops to help you understand what vector databases are and how they can help you build production-ready AI apps quickly and easily. If you’re curious, here are some of the most commonly asked questions we encountered:

**What’s the difference between a vector database and a graph or relational database?**

Graph databases are used to identify relationships between objects, and vector databases are used to find objects
Relational databases store the relations between tables and build indexes for the fast lookup of joined tables. Vector databases, on the other hand, store the embeddings of structured and unstructured data for the quick retrieval of the objects. **What features does Weaviate offer to help me protect data privacy?**

Weaviate is designed with robust security measures to ensure it meets the requirements of enterprise environments. Weaviate has achieved SOC 2 certification and is encrypted in transit and at rest.
[2] ---
title: How to

In [30]:
class NeedleInTheHaystack(dspy.Signature):
    """Given a long context and a question, find the answer in the context."""
    
    long_context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()

rag = dspy.ChainOfThought(NeedleInTheHaystack)

In [32]:
# Answer contained in result 18 of the numbered `search_results` built above
rag(long_context=search_results, question="When did Weaviate publish their blog post about Vamana and HNSW?")

Prediction(
    rationale='Reasoning: Let\'s think step by step in order to find the date the blog post about Vamana and HNSW was published. We can look for the title "Vamana vs. HNSW - Exploring ANN algorithms Part 1" and find the date associated with it.',
    answer='October 11, 2022'
)
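
The test above relies on a fact that already happens to sit in result 18. A more controlled variant plants a known "needle" at a chosen depth in the context and checks whether the answer recovers it. The following is a minimal sketch of that idea, not the notebook's method: the helper names, the filler passages, and the needle sentence are all hypothetical.

```python
from typing import List, Tuple

def insert_needle(passages: List[str], needle: str, depth: float) -> Tuple[str, int]:
    """Place `needle` at a relative depth (0.0 = start, 1.0 = end).

    Returns the numbered context string and the 1-based index of the needle.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(passages))
    haystack = passages[:position] + [needle] + passages[position:]
    context = "\n".join(f"[{i + 1}] {text}" for i, text in enumerate(haystack))
    return context, position + 1

def contains_answer(prediction: str, expected: str) -> bool:
    """Loose check: does the expected string appear in the prediction?"""
    return expected.lower() in prediction.lower()

# Hypothetical filler passages and needle, for illustration only.
passages = [f"Filler passage number {i}." for i in range(10)]
needle = "The secret launch code is 7421."
context, needle_index = insert_needle(passages, needle, depth=0.5)
```

The resulting `context` could be passed as `long_context` to `rag`, with `contains_answer` applied to the returned `answer`. Sweeping `depth` and the number of passages would show where retrieval starts to degrade.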

# Re-ranking Long Inputs

In [33]:
class ContextRelevanceCounter(dspy.Signature):
    """Given a numbered list of responses from a search engine to a query, count how many of them are relevant to the query."""
    
    numbered_search_results: str = dspy.InputField()
    query: str = dspy.InputField()
    number_of_relevant_results: int = dspy.OutputField()
    
context_relevance_counter = dspy.TypedChainOfThought(ContextRelevanceCounter)

In [34]:
context_relevance_counter(numbered_search_results=search_results, query="How does quantization help with Vector Databases?")

Prediction(
    reasoning="Reasoning: Let's think step by step in order to produce the number of relevant results. We are looking for articles that discuss quantization in the context of vector databases. Articles 1, 4, 10, 16, 19, 23, and 38 all discuss quantization in this context.",
    number_of_relevant_results=7
)
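
The reasoning above names the result numbers it considers relevant, which is enough to turn the count into an actual re-ranking. Here is a rough sketch that parses those numbers and moves the matching results to the front. It assumes the model phrases its reasoning as "Articles 1, 4, ... and 38", which is not guaranteed, and both helper names are hypothetical.

```python
import re
from typing import List

def extract_relevant_ids(reasoning: str) -> List[int]:
    """Parse result numbers from a phrase like 'Articles 1, 4, and 38'."""
    match = re.search(r"Articles?\s+(\d+(?:\s*(?:,|and|,\s*and)\s*\d+)*)", reasoning)
    if match is None:
        return []
    return [int(number) for number in re.findall(r"\d+", match.group(1))]

def rerank(results: List[str], relevant_ids: List[int]) -> List[str]:
    """Move results flagged as relevant to the front, keeping relative order."""
    relevant = set(relevant_ids)
    hits = [r for i, r in enumerate(results, start=1) if i in relevant]
    rest = [r for i, r in enumerate(results, start=1) if i not in relevant]
    return hits + rest

# Reasoning string copied from the prediction above.
reasoning = (
    "Reasoning: Let's think step by step in order to produce the number of "
    "relevant results. We are looking for articles that discuss quantization "
    "in the context of vector databases. Articles 1, 4, 10, 16, 19, 23, and 38 "
    "all discuss quantization in this context."
)
relevant_ids = extract_relevant_ids(reasoning)

# Placeholder documents standing in for the 50 retrieved passages.
documents = [f"doc {i}" for i in range(1, 51)]
reranked = rerank(documents, relevant_ids)
```

Comparing `len(relevant_ids)` against `number_of_relevant_results` is also a cheap consistency check on the model's own count.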

# RAG with *Many* (Question, Context, Rationale, Answer) tuples

Agarwal et al. from Google DeepMind published [**Many-Shot In-Context Learning**](https://arxiv.org/abs/2404.11018) on April 17th, 2024.

Jiang et al. from Stanford University (including Andrew Ng) followed with [**Many-Shot In-Context Learning in Multimodal Foundation Models**](https://arxiv.org/abs/2405.09798) on May 16th, 2024.

As a little bit of background:

1. Prior to GPT-3, Machine Learning was mostly done with Supervised Learning on a large number of labeled examples.

2. GPT-3 then changed the game with *In-Context Learning*, showing that you could perform a task by providing a **few** examples of the task in the input, rather than needing additional gradient descent training.

3. ChatGPT then came along and showed the power of *Reinforcement Learning from Human Feedback*, shifting the AI world into Instruction following and formatting, rather than example labeling.

4. Many-Shot In-Context Learning powered by LLMs **could** cause another paradigm shift in how we get AI systems to perform tasks.


![Many-Shot In-Context Learning](./images/many-shot.png)

Image taken from **Many-Shot In-Context Learning** by Agarwal et al. 2024.
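
To make the idea concrete, here is a minimal sketch of what "providing examples of the task in the input" looks like once the number of examples grows. The function name, the separator, and the arithmetic demos are hypothetical and only illustrate the prompt shape; DSPy assembles its own prompts later in this notebook.

```python
from typing import Dict, List

def build_many_shot_prompt(instruction: str, demos: List[Dict[str, str]], question: str) -> str:
    """Concatenate an instruction, every demo, and the new question into one prompt."""
    blocks = [instruction]
    for demo in demos:
        blocks.append(f"Question: {demo['question']}\nAnswer: {demo['answer']}")
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n---\n\n".join(blocks)

# Hypothetical demos for illustration only.
demos = [{"question": f"What is {i} + {i}?", "answer": str(2 * i)} for i in range(1, 101)]
prompt = build_many_shot_prompt("Answer the arithmetic question.", demos, "What is 7 + 5?")
```

Nothing in the prompt format changes between few-shot and many-shot. The only difference is how many demos fit, which is why longer input windows matter here.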

In [104]:
# There is a bug in `BootstrapFewShot` when directly passing in the weaviate_blog_index:
# it seems related to calling .deep_copy() on an object that holds a gRPC connection. Will fix later.

dspy.settings.configure(rm=weaviate_blog_index)

In [167]:
class GenerateAnswer(dspy.Signature):
    """Assess the context and answer the question."""
    
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField()

class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        
        self.retrieve = dspy.Retrieve()
        # Note this `ChainOfThought`:
        # it may be the most important part of synthetic example generation.
        # We imagine most RAG apps already have some kind of (question, answer) gold dataset,
        # BUT they do not have rationales of this particular structure.
        # Further, if the program is more complex than this, it helps to have intermediate labels.
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question, k=5):
        context = self.retrieve(question, k=k).passages
        pred = self.generate_answer(context=context, question=question).answer
        return pred

In [168]:
import json

file_path = './WeaviateBlogRAG-0-0-0.json'
with open(file_path, 'r') as file:
    dataset = json.load(file)

gold_answers = []
queries = []

for row in dataset:
    gold_answers.append(row["gold_answer"])
    queries.append(row["query"])
    
data = []

for i in range(len(gold_answers)):
    data.append(dspy.Example(gold_answer=gold_answers[i], question=queries[i]).with_inputs("question"))

# converting all examples into Many-Shot Examples,
# ToDo - add train / dev / test splits
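
One possible way to address the ToDo above is a seeded shuffle followed by slicing. This is only a sketch: the 60/20/20 ratios, the seed, and the `split_dataset` name are assumptions, and a list of integers stands in for the 50 `dspy.Example` objects.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_dataset(
    items: Sequence[T], train_frac: float = 0.6, dev_frac: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle deterministically, then slice into train / dev / test lists."""
    if train_frac < 0 or dev_frac < 0 or train_frac + dev_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test

# Integers stand in for the examples in `data`.
train, dev, test = split_dataset(list(range(50)))
```

With splits in place, only the train portion would be passed as `trainset`, leaving the dev and test portions for evaluation.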

In [169]:
print(len(data))

50


In [172]:
from dspy.teleprompt import BootstrapFewShot
teacher_settings = {"lm": command_r}

# optionally add a metric to assess the quality of synthetic examples
compiler = BootstrapFewShot(teacher_settings=teacher_settings,
                           max_bootstrapped_demos=50)

many_shot_compiled_rag = compiler.compile(RAG(), trainset=data)

100%|███████████████████████████████████████████| 50/50 [06:24<00:00,  7.68s/it]
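
The compile step above accepts every bootstrapped example because no metric is given. Below is a sketch of what such a metric could look like, following the `(example, pred, trace=None)` signature that DSPy metrics use. The token-overlap F1 score, the 0.5 threshold, and the function names are assumptions chosen for illustration, not a recommended quality measure.

```python
import re
from collections import Counter
from types import SimpleNamespace

def _tokens(text: str) -> list:
    """Lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def token_f1(gold: str, predicted: str) -> float:
    """Token-level F1 between a gold answer and a predicted answer."""
    gold_tokens, pred_tokens = _tokens(gold), _tokens(predicted)
    if not gold_tokens or not pred_tokens:
        return 0.0
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def answer_overlap_metric(example, pred, trace=None) -> bool:
    """Keep a bootstrapped demo only if it overlaps enough with the gold answer."""
    # The RAG module above returns a plain string; also accept objects with `.answer`.
    predicted = pred if isinstance(pred, str) else pred.answer
    return token_f1(example.gold_answer, predicted) >= 0.5  # threshold is an assumption

# Hypothetical example standing in for a dspy.Example with a `gold_answer` field.
example = SimpleNamespace(gold_answer="Quantization compresses vectors to reduce memory")
good = answer_overlap_metric(example, "Quantization compresses vectors to reduce memory usage")
bad = answer_overlap_metric(example, "BM25 is a keyword scoring method")
```

It would be passed as `metric=answer_overlap_metric` when constructing `BootstrapFewShot`. An LLM-based judge is a common alternative when lexical overlap is too strict.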


In [177]:
question = "How will Long Context LLMs impact re-ranker models?"

for lm_dict in lms:
    lm, name = lm_dict["lm"], lm_dict["name"]
    with dspy.context(lm=lm):
        print(f"\033[91mResult for {name}\n")
        print(f"\033[0m{many_shot_compiled_rag(question)} \n")

Result for Gemini Flash

Long Context LLMs can potentially improve the performance of re-ranker models by providing them with more context and information to work with. This can lead to more accurate and relevant rankings, as the re-ranker model can better understand the relationship between the query and the retrieved documents. However, the context also mentions that there are trade-offs between performance and latency when using Long Context LLMs, which may need to be considered when implementing them in re-ranker models. 

Result for Gemini Pro

Reasoning: Let's think step by step in order to produce the answer. The provided context does not discuss the impact of Long Context LLMs on re-ranker models. Therefore, I cannot answer your question.

Answer: The provided context does not discuss the impact of Long Context LLMs on re-ranker models. Therefore, I cannot answer your question. 

Result for Command R

Long Context LLMs will likely have a significant i

# Many-Shot In-Context Learning Visualized

In the example above, Gemini 1.5 Pro, Gemini Flash, and Command R each see **50** examples of `(question, context, rationale, answer)` tuples before the current inference!
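
A quick way to sanity-check the number of demonstrations is to count them in the prompt text that `inspect_history` prints. The sketch below assumes the prompt layout shown in this notebook, where a `Question:` line appears once in the format description, once per demo, and once for the final query. The `count_demos` helper and the toy prompt are hypothetical, and a retrieved passage that itself contains a line starting with `Question:` would inflate the count.

```python
def count_demos(prompt: str, field_prefix: str = "Question:") -> int:
    """Estimate the number of demos in a prompt by counting field lines."""
    occurrences = sum(1 for line in prompt.splitlines() if line.startswith(field_prefix))
    # Subtract the format description and the final, unanswered query.
    return max(occurrences - 2, 0)

# Toy prompt in the same layout, with three hypothetical demos.
header = "Assess the context and answer the question."
format_block = (
    "Follow the following format.\n\n"
    "Context: may contain relevant facts\n\n"
    "Question: ${question}\n\n"
    "Answer: ${answer}"
)
demo_blocks = [
    f"Context:\n[1] «passage {i}»\n\nQuestion: question {i}\n\nAnswer: answer {i}"
    for i in range(3)
]
query_block = "Context:\n[1] «passage»\n\nQuestion: new question\n\nAnswer:"
toy_prompt = "\n\n---\n\n".join([header, format_block, *demo_blocks, query_block])
demo_count = count_demos(toy_prompt)
```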

In [178]:
gemini_pro_1_5.inspect_history(n=1)




Assess the context and answer the question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «Note, the current implementation of hybrid search in Weaviate uses BM25/BM25F and vector search. If you’re interested to learn about how dense vector indexes are built and optimized in Weaviate, check out this [article](/blog/why-is-vector-search-so-fast). ### BM25
BM25 builds on the keyword scoring method [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) (Term-Frequency Inverse-Document Frequency) by taking the [Binary Independence Model](https://en.wikipedia.org/wiki/Binary_Independence_Model) from the IDF calculation and adding a normalization penalty that weighs a document’s length relative to the average length of all the documents in the database. The image below presents the scoring calculation of BM25:
![BM25 calculat
