# Retrieval Augmented Generation in Verba Notebook
## Using Weaviate

This notebook gives a small demo of how RAG works in Verba across its five stages: Reading, Chunking, Embedding, Retrieving, and Generating. Follow along to learn more!

> (This demo uses the ADA model for the embeddings and GPT-4 Turbo for generating the answer to the query.)

## Step 01 - Set Up Weaviate

In [1]:
!pip install weaviate-client



In [2]:
import os
import weaviate
from weaviate.embedded import EmbeddedOptions

# Set up Weaviate Embedded (runs locally)
client = weaviate.Client(
    additional_headers={"X-OpenAI-Api-Key": os.environ.get("OPENAI_API_KEY", "")},
    embedded_options=EmbeddedOptions(),
)

Started /Users/edwardschmuhl/.cache/weaviate-embedded: process ID 50578


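
The client above falls back to an empty string when `OPENAI_API_KEY` is unset, so a missing key would most likely surface only later, when Weaviate first needs to call OpenAI. The following is a minimal sketch of failing early instead. `require_env` is a hypothetical helper written for this illustration; it is not part of `weaviate-client`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for this notebook; not part of weaviate-client.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage (assumes the key was exported before the notebook was started):
# headers = {"X-OpenAI-Api-Key": require_env("OPENAI_API_KEY")}
```

Passing the resulting `headers` dict as `additional_headers` keeps the rest of the setup cell unchanged.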

## Step 02 - Set Up Classes
For this small RAG demo, we create two classes: Document and Chunk.

In [3]:
# Check if client is ready, and clear all data
if client.is_ready():
    client.schema.delete_all()

SCHEMA_CHUNK = {
        "classes": [
            {
                "class": "Chunk",
                "vectorizer": "text2vec-openai",
                "moduleConfig": { 
                   "generative-openai": { 
                        "model": "gpt-4-1106-preview", 
                    }
                },
                "description": "Chunks of Documentations",
                "properties": [
                    {
                        "name": "text",
                        "dataType": ["text"],
                        "description": "Content of the document",
                    },
                    {
                        "name": "doc_uuid",
                        "dataType": ["text"],
                        "description": "Document UUID",
                    },
                    {
                        "name": "chunk_id",
                        "dataType": ["number"],
                        "description": "Document chunk from the whole document",
                    },
                ],
            }
        ]
    }

SCHEMA_DOCUMENT = {
        "classes": [
            {
                "class": "Document",
                "description": "Documentation",
                "properties": [
                    {
                        "name": "text",
                        "dataType": ["text"],
                        "description": "Content of the document",
                    },
                    {
                        "name": "doc_name",
                        "dataType": ["text"],
                        "description": "Document name",
                    },
                ],
            }
        ]
    }

client.schema.create(SCHEMA_CHUNK)
client.schema.create(SCHEMA_DOCUMENT)
for _class in client.schema.get()["classes"]:
    print(_class["class"])

Chunk
Document


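
The two schema dictionaries above repeat the same property layout several times. As an optional sketch, the repetition can be factored into small builder functions. `make_property` and `make_class` are hypothetical names introduced here; they only produce plain dictionaries in the same shape as `SCHEMA_DOCUMENT` and do not call Weaviate.

```python
def make_property(name: str, data_type: str, description: str) -> dict:
    """Build one property entry in the dict layout used by the schemas above."""
    return {"name": name, "dataType": [data_type], "description": description}


def make_class(class_name: str, description: str, properties: list, **extra) -> dict:
    """Wrap a single class definition in the {"classes": [...]} envelope."""
    return {
        "classes": [
            {
                "class": class_name,
                "description": description,
                "properties": properties,
                **extra,  # e.g. vectorizer or moduleConfig entries
            }
        ]
    }


schema_document = make_class(
    "Document",
    "Documentation",
    [
        make_property("text", "text", "Content of the document"),
        make_property("doc_name", "text", "Document name"),
    ],
)
print([p["name"] for p in schema_document["classes"][0]["properties"]])
```

The resulting dictionary has the same keys and values as `SCHEMA_DOCUMENT`, so it could be passed to `client.schema.create` in the same way.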


## Step 03 - Load PDF to Python (Reading)
We import the paper "Attention Is All You Need".

In [None]:
!pip install PyPDF2

In [4]:
from PyPDF2 import PdfReader

full_text = ""
reader = PdfReader("./data/attention_is_all_you_need.pdf")

for page in reader.pages:
    full_text += page.extract_text() + "\n\n"

document = {"text": full_text, "doc_name": "Attention_Is_All_You_Need", "chunks": []}

# Preview a slice of the extracted text
print(document["text"][100:300])



 figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.comNoam Shazeer∗
Google Brain
noam@google.comNiki Parma


## Step 04 - Chunk Document (Chunking)

In [None]:
!pip install tiktoken

In [5]:
import tiktoken

units = 100    # tokens per chunk
overlap = 50   # tokens shared between consecutive chunks

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
encoded_tokens = encoding.encode(document["text"], disallowed_special=())

document["chunks"] = []

i = 0
split_id_counter = 0
while i < len(encoded_tokens):
    # Token window for this chunk, clipped to the end of the document
    start_i = i
    end_i = min(i + units, len(encoded_tokens))

    chunk_tokens = encoded_tokens[start_i:end_i]
    chunk_text = encoding.decode(chunk_tokens)

    doc_chunk = {"text":chunk_text, "chunk_id":split_id_counter}
    document["chunks"].append(doc_chunk)
    split_id_counter += 1

    # Exit loop if this was the last possible chunk
    if end_i == len(encoded_tokens):
        break

    i += units - overlap  # Step forward, considering overlap

print(len(document["chunks"]))
print("----FIRST CHUNK----")
print(document["chunks"][5])
print("")
print("----SECOND CHUNK----")
print(document["chunks"][6])

203
----FIRST CHUNK----
{'text': ' recurrence and convolutions\nentirely. Experiments on two machine translation tasks show these models to\nbe superior in quality while being more parallelizable and requiring significantly\nless time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-\nto-German translation task, improving over the existing best results, including\nensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task,\nour model establishes', 'chunk_id': 5}

----SECOND CHUNK----
{'text': ' WMT 2014 English-\nto-German translation task, improving over the existing best results, including\nensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task,\nour model establishes a new single-model state-of-the-art BLEU score of 41.8 after\ntraining for 3.5 days on eight GPUs, a small fraction of the training costs of the\nbest models from the literature. We show that the Transformer', 'chunk_id': 6}
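
The chunking loop above can be read as a sliding window: each chunk holds `units` tokens and the window advances by `units - overlap` tokens, which is why the two chunks printed above share half of their text. Below is a minimal sketch of the same idea as a reusable function. `sliding_window` is a name chosen for this illustration, and whitespace-split words stand in for tiktoken token ids so the example runs without extra packages. The guard at the top is an addition: for a document longer than one chunk, the loop would not advance if `overlap` were equal to or larger than `units`.

```python
from typing import List, Sequence


def sliding_window(tokens: Sequence, units: int, overlap: int) -> List[list]:
    """Split tokens into windows of `units` items that share `overlap` items."""
    if units <= 0 or not 0 <= overlap < units:
        raise ValueError("need units > 0 and 0 <= overlap < units")
    step = units - overlap
    windows = []
    i = 0
    while i < len(tokens):
        end = min(i + units, len(tokens))
        windows.append(list(tokens[i:end]))
        # Stop once a window reaches the end of the input
        if end == len(tokens):
            break
        i += step
    return windows


# Whitespace-split words stand in for tiktoken token ids here.
words = "the quick brown fox jumps over the lazy dog again".split()
windows = sliding_window(words, units=4, overlap=2)
for window in windows:
    print(window)
```

With 10 words, `units=4`, and `overlap=2`, this produces four windows, each starting with the last two words of the previous one.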


## Step 05 - Embed into Weaviate (Embedding)

In [6]:
# Import the full document first so that its UUID can be attached to each chunk
with client.batch as batch:
    batch.batch_size = 1
    properties = {
        "text": str(document["text"]),
        "doc_name": str(document["doc_name"]),
    }

    class_name = "Document"
    uuid = batch.add_data_object(properties, class_name)

    for chunk in document["chunks"]:
        chunk["doc_uuid"] = uuid


# Import the chunks; the Chunk class is vectorized with text2vec-openai
with client.batch as batch:
    batch.batch_size = len(document["chunks"])
    for chunk in document["chunks"]:
        properties = {
            "text": chunk["text"],
            "doc_uuid": chunk["doc_uuid"],
            "chunk_id": chunk["chunk_id"],
        }
        class_name = "Chunk"
        batch.add_data_object(properties, class_name)

print("Done :)")

Done :)
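
In this step Weaviate turns each chunk into a vector through the `text2vec-openai` module, so that chunks can later be compared with a query by vector similarity. The sketch below illustrates that comparison with a toy stand-in: word counts over a tiny, made-up vocabulary replace the real embedding model, and cosine similarity is used as one common similarity measure. None of this reflects how the OpenAI model or Weaviate computes its vectors internally.

```python
import math
from collections import Counter


def toy_embed(text: str, vocabulary: list) -> list:
    """Count vocabulary words in the text (a toy stand-in for a real embedding model)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up vocabulary and texts, chosen only for illustration.
vocabulary = ["attention", "mechanism", "translation", "bleu"]
query_vec = toy_embed("what is attention", vocabulary)
chunk_a = toy_embed("self-attention is an attention mechanism", vocabulary)
chunk_b = toy_embed("bleu on the translation task", vocabulary)

sim_a = cosine_similarity(query_vec, chunk_a)
sim_b = cosine_similarity(query_vec, chunk_b)
print(sim_a, sim_b)
```

The chunk that shares a word with the query scores higher than the one that shares none, which is the property that vector retrieval relies on.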


## Step 06 - Retrieve Chunks based on Query (Retrieving)

In [7]:
from weaviate.gql.get import HybridFusion

query = "What is attention?"

results = (
    client.query.get(
        class_name="Chunk",
        properties=["text", "chunk_id", "doc_uuid"],
    )
    .with_additional(properties=["score"])
    .with_autocut(2)
    .with_hybrid(query=query, fusion_type=HybridFusion.RELATIVE_SCORE, properties=["text"])
    .do()
)

print(len(results["data"]["Get"]["Chunk"]))
print(results["data"]["Get"]["Chunk"][0])

6
{'_additional': {'score': '0.75'}, 'chunk_id': 24, 'doc_uuid': 'bf38b308-172c-4394-9c32-16ee6c405682', 'text': '2.\nSelf-attention, sometimes called intra-attention is an attention mechanism relating different positions\nof a single sequence in order to compute a representation of the sequence. Self-attention has been\nused successfully in a variety of tasks including reading comprehension, abstractive summarization,\ntextual entailment and learning task-independent sentence representations [4, 27, 28, 22].\nEnd-to-end memory networks are based on a recurrent attention mechanism instead of sequence-\naligned recurrence and have been shown'}
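
The hybrid query above combines a keyword search and a vector search, and `HybridFusion.RELATIVE_SCORE` selects how the two result lists are merged. As a rough illustration of the idea, the sketch below rescales each score set to the range 0 to 1 and then blends them with a weight `alpha`. This is a simplified sketch and not Weaviate's implementation: the function names, the `alpha` value, and all scores are assumptions made up for the example.

```python
def min_max_normalize(scores: dict) -> dict:
    """Rescale scores so that the best result is 1.0 and the worst is 0.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def relative_score_fusion(keyword_scores: dict, vector_scores: dict, alpha: float) -> list:
    """Blend two normalized score sets; alpha weights the vector side."""
    keyword_norm = min_max_normalize(keyword_scores)
    vector_norm = min_max_normalize(vector_scores)
    fused = {}
    for key in set(keyword_norm) | set(vector_norm):
        fused[key] = (1 - alpha) * keyword_norm.get(key, 0.0) + alpha * vector_norm.get(key, 0.0)
    # Highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for three hypothetical chunks; not taken from the run above.
keyword_scores = {"a": 2.0, "b": 4.0, "c": 0.0}
vector_scores = {"a": 0.8, "b": 0.4, "c": 0.2}
ranked = relative_score_fusion(keyword_scores, vector_scores, alpha=0.75)
for chunk, score in ranked:
    print(chunk, round(score, 3))
```

In this example chunk `a` ranks first even though `b` has the higher keyword score, because the assumed `alpha` gives more weight to the vector side.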


## Step 07 - Use Generative AI to generate a response to our Query (Generating)

In [8]:
results = (
    client.query.get(
        class_name="Chunk",
        properties=["text", "chunk_id", "doc_uuid"],
    )
    .with_generate(grouped_task=f"Answer {query} with the provided context")
    .with_additional(properties=["score"])
    .with_autocut(2)
    .with_hybrid(query=query, fusion_type=HybridFusion.RELATIVE_SCORE, properties=["text"])
    .do()
)

print(results["data"]["Get"]["Chunk"][0]["_additional"]["generate"]["groupedResult"])

Attention, in the context provided, refers to a mechanism in the field of machine learning and, more specifically, in the development of neural network architectures such as the Transformer model. It is a technique that allows the model to focus on different parts of the input data (which can be a sequence of words, for example) when performing a task like translation, summarization, or comprehension.

The attention function maps a query and a set of key-value pairs to an output, with all these elements being represented as vectors. The output is computed as a weighted sum, where the weights are determined by the relevance or 'attention' that the model assigns to each input element in relation to the query.

Self-attention, or intra-attention, is a specific type of attention mechanism that relates different positions of a single sequence to compute a representation of that sequence. This allows the model to consider the context within the sequence, which is particularly useful for task