# Advanced RAG with LlamaIndex

This notebook walks your through an advanced Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex and Weaviate.

You can find the related notebook for a naive RAG pipeline here: [Naive RAG](https://github.com/weaviate/recipes/blob/main/integrations/llamaindex/retrieval-augmented-generation/naive_rag.ipyn).

## Prerequisites

### Installation
First, you need to install the required packages for LlamaIndex and Weaviate.

In [1]:
# !pip3 install llama_index

# Use this to upgrade from an older llama_index version to v0.10.1
# !pip3 uninstall llama-index
# !pip3 install llama-index --upgrade --no-cache-dir --force-reinstall

# ------------------------------------------------------------------------------------ #

# !pip3 install -U weaviate-client
# !pip3 install llama-index-vector-stores-weaviate

# ------------------------------------------------------------------------------------ #

# !pip3 install pypdf torch pytorch torchvision #transformers  sentence-transformers

import llama_index
from importlib.metadata import version

version('llama_index')

'0.10.1'

### Set your OpenAI API key

This tutorial uses an embedding model and LLM from OpenAI, for which you will need an API key set as an evironment variable.

In [2]:
import os
from dotenv import load_dotenv,find_dotenv

# Use this line of code if you have a local .env file
load_dotenv(find_dotenv()) 

# Or set it like this
# os.environ["OPENAI_API_KEY"] = "sk-..."

# Print this line to double check your API key
# print(os.environ["OPENAI_API_KEY"])

True

## Step 1: Define Embedding Model and LLM

First, you can define an embedding model and LLM in a global settings object. By doing this, you don't have to specify the models explicitly in the code again.

* Embedding model: used to generate vector embeddings for the document chunks as well as the query.
* LLM: used to generate an answer based on the user query and the relevant context.

*Note, that you can specify  an embedding model (vectorizer module) and an LLM (generative module) in Weaviate as well but in this case the LLM and embedding model defined in Llamaindex will be used.*

In [3]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding()

## Step 2: Load data

Next, you will download and load the data to be stored in the external knowledge source.

In [4]:
!mkdir -p 'data'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'

--2024-02-14 12:14:31--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham_essay.txt’


2024-02-14 12:14:31 (7,20 MB/s) - ‘data/paul_graham_essay.txt’ saved [75042/75042]



In [5]:
from llama_index.core import SimpleDirectoryReader

# Load data
documents = SimpleDirectoryReader(
        input_files=["./data/paul_graham_essay.txt"]
).load_data()

documents

[Document(id_='42c51cb1-85f4-4a4f-84e4-c829296754bc', embedding=None, metadata={'file_path': 'data/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-02-14', 'last_modified_date': '2024-02-14', 'last_accessed_date': '2024-02-14'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nWhat I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I

## Step 3: Chunk documents into Nodes

As the whole document is too large to fit into the context window of the LLM, you will need to partition it into smaller text chunks, which are called `Nodes` in LlamaIndex.

With the `SentenceWindowNodeParser` each sentence is stored as a chunk together with a larger window of text surrounding the original sentence as metadata. 

In [6]:
from llama_index.core.node_parser import SentenceWindowNodeParser

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# Extract nodes from documents
nodes = node_parser.get_nodes_from_documents(documents)

Below you can see an example of the `i`th node:

In [7]:
# This block of code is for educational purposes 
# to showcase what the nodes looks like
i=100

print(f"Text: \n{nodes[i].text}")
print("------------------")
print(f"Window: \n{nodes[i].metadata['window']}")

Text: 
You wouldn't have a boss, or even need to get research funding.


------------------
Window: 
And moreover this was something you could make a living doing.  Not as easily as you could by writing software, of course, but I thought if you were really industrious and lived really cheaply, it had to be possible to make enough to survive.  And as an artist you could be truly independent.  You wouldn't have a boss, or even need to get research funding.

 I had always liked looking at paintings.  Could I make them? 


## Step 4: Build the index

Next, you will build the index that stores all the external knowledge in a Weaviate vector database.

First, you will need to connect to a Weaviate instance. In this case, we're using Weaviate Embedded.
 
*Note, that the embedded instance will stay alive for as long as the parent application is running. For a more persistent solution, we recommend using a managed Weaviate Cloud Serivces (WCS) instance. You can try it out for free for 14 days [here](https://console.weaviate.cloud).*

In [8]:
import weaviate

# Connect to your Weaviate instance
client = weaviate.Client(
    embedded_options=weaviate.embedded.EmbeddedOptions(), 
#    additional_headers={ 
#        'X-OpenAI-Api-Key': os.environ["OPENAI_API_KEY"]
#        }
)

print(f"Client is ready: {client.is_ready()}")

# Print this line to get more information about the client
# client.get_meta()

Started /Users/leonie/.cache/weaviate-embedded: process ID 7307


{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-02-14T12:14:31+01:00"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-02-14T12:14:31+01:00"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-02-14T12:14:31+01:00"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50060","time":"2024-02-14T12:14:31+01:00"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://127.0.0.1:8079","time":"2024-02-14T12:14:31+01:00"}


Client is ready: True


            Please consider upgrading to the latest version. See https://weaviate.io/developers/weaviate/client-libraries/python for details.


Next, you will build a `VectorStoreIndex` from the Weaviate client to store your data in and interact with.

In [9]:
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.weaviate import WeaviateVectorStore

index_name = "MyExternalContext"

# Construct vector store
vector_store = WeaviateVectorStore(
    weaviate_client = client, 
    index_name = index_name
)

# Set up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# If an index with the same index name already exists within Weaviate, delete it
if client.schema.exists(index_name):
    client.schema.delete_class(index_name)

# Setup the index
# build VectorStoreIndex that takes care of chunking documents
# and encoding chunks to embeddings for future retrieval
index = VectorStoreIndex.from_documents(
    documents,
    storage_context = storage_context,
    node_parser = node_parser,
)

{"level":"info","msg":"Created shard myexternalcontext_SyOOwJiGvlEF in 1.419709ms","time":"2024-02-14T12:14:32+01:00"}
{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-02-14T12:14:32+01:00","took":47625}
{"level":"info","msg":"Completed loading shard llamaindex_dWivqPiChdO8 in 4.627667ms","time":"2024-02-14T12:14:32+01:00"}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-02-14T12:14:32+01:00","took":499167}
{"level":"info","msg":"Created shard myexternalcontext_fMmZ36K7hhUR in 2.24825ms","time":"2024-02-14T12:14:33+01:00"}
{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-02-14T12:14:33+01:00","took":46667}


In [10]:
import json
response = client.schema.get(index_name)

print(json.dumps(response, indent=2))

{
  "class": "MyExternalContext",
  "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 12:14:33 2024",
  "invertedIndexConfig": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "cleanupIntervalSeconds": 60,
    "stopwords": {
      "additions": null,
      "preset": "en",
      "removals": null
    }
  },
  "multiTenancyConfig": {
    "enabled": false
  },
  "properties": [
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 12:14:33 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "file_type",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 12:14:33 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "last_modified_date",
      "tokenization"

## Step 5: Setup the Query Engine

Lastly, you will setup the index as the query engine.

### Build the Metadata Replacement Post Processor
In advanced RAG, you can use the `MetadataReplacementPostProcessor` to replace the sentence in each node with it’s surrounding context as part of the sentence-window-retrieval method.

In [11]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# The target key defaults to `window` to match the node_parser's default
postproc = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)

In [12]:
# This block of code is for educational purposes 
# to showcase how the MetadataReplacementPostProcessor works
from llama_index.core.schema import NodeWithScore
from copy import deepcopy

scored_nodes = [NodeWithScore(node=x, score=1.0) for x in nodes]
nodes_old = [deepcopy(n) for n in nodes]
replaced_nodes = postproc.postprocess_nodes(scored_nodes)

print(f"Retrieved sentece: {nodes_old[i].text}")
print("------------------")
print(f"Replaced window: {replaced_nodes[i].text}")


Retrieved sentece: You wouldn't have a boss, or even need to get research funding.


------------------
Replaced window: And moreover this was something you could make a living doing.  Not as easily as you could by writing software, of course, but I thought if you were really industrious and lived really cheaply, it had to be possible to make enough to survive.  And as an artist you could be truly independent.  You wouldn't have a boss, or even need to get research funding.

 I had always liked looking at paintings.  Could I make them? 


### Add a re-ranker
For advanced RAG, you can also add a re-ranker, which re-ranks the retrieved context for its relevance to the query. Note, that you should retrieve a larger number of `similarity_top_k`, which will be reduced to `top_n`.

In [13]:
from llama_index.core.postprocessor import SentenceTransformerRerank

# BAAI/bge-reranker-base
# link: https://huggingface.co/BAAI/bge-reranker-base
rerank = SentenceTransformerRerank(
    top_n = 2, 
    model = "BAAI/bge-reranker-base"
)

/opt/homebrew/lib/python3.11/site-packages/huggingface_hub/inference/_text_generation.py:121: PydanticDeprecatedSince20: Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.6/migration/
  @validator("best_of")
/opt/homebrew/lib/python3.11/site-packages/huggingface_hub/inference/_text_generation.py:140: PydanticDeprecatedSince20: Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.6/migration/
  @validator("repetition_penalty")
/opt/homebrew/lib/python3.11/site-packages/huggingface_hub/inference/_text_generation.py:146: Pydan

Finally, you can put all components together in the query engine!

In [14]:
# The QueryEngine class is equipped with the generator
# and facilitates the retrieval and generation steps
query_engine = index.as_query_engine(
        similarity_top_k = 6, 
        node_postprocessors=[postproc, rerank]
)

## Step 6: Run an Advanced RAG Query on Your Data
Now, you can run advanced RAG queries on your data.

In [15]:
# Use your Default RAG
response = query_engine.query(
    "What did the author do growing up?"
)
print(str(response))

The context does not provide any information about what the author did growing up.


In [16]:
response.source_nodes[0].node

TextNode(id_='f85ebdb6-10be-46d0-9184-e9ce5fe051ce', embedding=[0.0061649675, 0.02088512, 0.020017201, -0.0132185165, 0.01920439, 0.02302047, -0.020003425, -0.009423101, 0.0019114843, -0.036755603, 0.0020888562, 0.001190803, 0.0075632785, 0.002173237, 0.0010246244, 0.012343711, 0.023847058, 0.0109867295, -0.012171505, -0.017041486, -0.013742022, -0.0057620057, 0.01807472, 0.0006638533, -0.02461854, 0.014051992, 0.0064439406, -0.008148778, -0.009429989, -0.0051351767, 0.018322697, -0.019493695, -0.013583593, -0.0042603714, -0.021119319, 0.007025996, -0.0038298569, 0.004101942, 0.02574821, -0.029867373, 0.027938668, 0.0033149614, -0.010387453, -0.014782145, -0.024398116, 0.009436877, 0.0142035335, -0.001097812, 0.011000506, 0.019920766, 0.017482333, 0.024081258, -0.033862546, -0.004597895, -0.017027708, 0.013624922, 0.008810048, 0.01086963, 0.006040979, 0.01250214, 0.001578266, 0.017661426, -0.0081901075, -0.009120019, -0.026905432, -0.007287749, -0.010614765, 0.01016703, 0.0051041795, 0

In [17]:
window = response.source_nodes[0].node.metadata["window"]
sentence = response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")

KeyError: 'window'

# References
* [Llamaindex docs: Weaviate Vector Store](https://docs.llamaindex.ai/en/stable/examples/vector_stores/WeaviateIndexDemo.html)
* [LlamaIndex docs: Metadata Replacement + Node Sentence Window](https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/MetadataReplacementDemo.html)