# Advanced RAG with LlamaIndex

## Prerequisites

In [9]:
# !pip3 install llama_index

# Use this to upgrade from an older llama_index version to v0.10.1
# !pip3 uninstall llama-index
# !pip3 install llama-index --upgrade --no-cache-dir --force-reinstall

# ------------------------------------------------------------------------------------ #

# !pip3 install -U weaviate-client
# !pip3 install llama-index-vector-stores-weaviate

# ------------------------------------------------------------------------------------ #

# !pip3 install pypdf python-dotenv torch torchvision transformers sentence-transformers

import llama_index
from importlib.metadata import version
version('llama_index')

'0.10.1'

In [10]:
import warnings
warnings.filterwarnings('ignore')

### Set your OpenAI API key
This tutorial uses OpenAI’s gpt-3.5-turbo by default. 

Make sure your API key is available to your code by setting it as an environment variable. 

In [11]:
import os

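from dotenv import load_dotenv, find_dotenv

# Load variables from the nearest .env file into the environment
load_dotenv(find_dotenv())

# Confirm the key is set without printing the secret itself
print("OPENAI_API_KEY is set:", "OPENAI_API_KEY" in os.environ)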


## Select Embedding Model and LLM

We need to define two models:

* Embedding model: creates a vector embedding for each text chunk. Here we use OpenAI's embedding model through `OpenAIEmbedding`.
* LLM: the user query and the relevant text chunks are fed into the LLM so that it can generate an answer grounded in that context.

We set both on the global `Settings` object, so we don't have to specify the LLM and embedding model explicitly in the code again.

Note that you can also specify an embedding model (vectorizer module) and an LLM (generative module) in Weaviate, but in this case the LLM and embedding model defined in LlamaIndex are used.
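
To make the role of the embedding model concrete, here is a minimal sketch of similarity-based retrieval in plain Python. The three-dimensional vectors and the names `toy_chunks`, `toy_query`, and `top_k` are made up for illustration; they are not part of LlamaIndex or Weaviate, which handle this step internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the ids of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda chunk_id: cosine_similarity(query_vec, chunk_vecs[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 3-dimensional embeddings (assumption: real embedding
# vectors are much longer and come from the embedding model).
toy_chunks = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [0.7, 0.7, 0.0],
}
toy_query = [1.0, 0.1, 0.0]

retrieved = top_k(toy_query, toy_chunks, k=2)
print(retrieved)
```

The chunks returned by this step are what gets passed to the LLM as context.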

In [12]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model =  OpenAIEmbedding()

### Load the data
This loads every file in the `./data` directory as a document using `SimpleDirectoryReader`.

In [13]:
# !mkdir -p 'data'
# !wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'

In [14]:
from llama_index.core import SimpleDirectoryReader

# Load data
documents = SimpleDirectoryReader("./data").load_data()

documents



## Chunk documents into Nodes

We split the documents into text chunks, which are called “Nodes” in LlamaIndex. Instead of fixed-size chunks (the commented-out parser below uses a chunk size of 1024), we use the `SentenceWindowNodeParser`: each node holds a single sentence, and a window of the surrounding sentences (`window_size=3` on each side) is stored in the node's metadata under the `window` key. The sentence itself is also kept under `original_text`.
The default node IDs are random strings; you can overwrite them to follow a fixed format.
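
The sentence-window idea can be sketched in plain Python. This is a simplified stand-in, not the library's implementation: the regex-based `split_sentences` helper and the dictionary layout are assumptions made for illustration, and the real parser uses its own sentence tokenizer and node classes.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def sentence_windows(text, window_size=1):
    """Pair every sentence with up to `window_size` neighbours on each side."""
    sentences = split_sentences(text)
    result = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        result.append(
            {
                "text": sentence,
                "metadata": {
                    "window": " ".join(sentences[start:end]),
                    "original_text": sentence,
                },
            }
        )
    return result


toy_nodes = sentence_windows("hello. how are you? I am fine!  ", window_size=1)
for toy_node in toy_nodes:
    print(repr(toy_node["text"]), "->", repr(toy_node["metadata"]["window"]))
```

Each node is embedded and matched on its single sentence, while the wider window stays available in the metadata for later steps.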

In [15]:
from llama_index.core.node_parser import SentenceWindowNodeParser

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

#node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)

In [17]:
nodes = node_parser.get_nodes_from_documents(documents)

# Inspect the sentence held by each node and the window stored for one of them
print([x.text for x in nodes])
print(nodes[1].metadata["window"])

# Optionally give the nodes predictable IDs
# for idx, node in enumerate(nodes):
#     node.id_ = f"node-{idx}"


## Build the index
Usually you can configure an embedding model (vectorizer) in the Weaviate schema, but when Weaviate is used together with LlamaIndex, the embedding model and LLM defined in the global `Settings` object are used.


In [None]:
import weaviate

# connect to your weaviate instance
client = weaviate.Client(
    embedded_options=weaviate.embedded.EmbeddedOptions(), 
    additional_headers={ 
        'X-OpenAI-Api-Key': os.environ["OPENAI_API_KEY"]
        }
)

print(f"Client is ready: {client.is_ready()}")

# Print this line to get more information about the client
# client.get_meta()

Client is ready: True




In [None]:
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.weaviate import WeaviateVectorStore

index_name = "LlamaIndex"

# Construct vector store
vector_store = WeaviateVectorStore(
    weaviate_client = client, 
    index_name = index_name
)

# Set up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# If an index with the same index name already exists within Weaviate, delete it
if client.schema.exists(index_name):
    client.schema.delete_class(index_name)

# Set up the index
# VectorStoreIndex splits the documents with the sentence window parser
# and encodes the resulting nodes as embeddings for later retrieval.
# The embedding model is taken from the global Settings object.
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    transformations=[node_parser],
)



In [None]:
import json
response = client.schema.get(index_name)

print(json.dumps(response, indent=2))

{
  "class": "LlamaIndex",
  "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 10:16:28 2024",
  "invertedIndexConfig": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "cleanupIntervalSeconds": 60,
    "stopwords": {
      "additions": null,
      "preset": "en",
      "removals": null
    }
  },
  "multiTenancyConfig": {
    "enabled": false
  },
  "properties": [
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 10:16:28 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "file_path",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 10:16:28 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "file_type",
      "tokenization": "word"
    },


## Build the post processor

Here we use the `MetadataReplacementPostProcessor` to replace the sentence in each node with its surrounding context, which is the window stored in the node's metadata.
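
As a rough sketch of what this replacement amounts to, the snippet below swaps each node's text for the value stored under a metadata key. `ToyNode` and `replace_with_metadata` are hypothetical names used only for illustration, and falling back to the original text when the key is missing is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a LlamaIndex node: text plus metadata."""

    text: str
    metadata: dict = field(default_factory=dict)


def replace_with_metadata(toy_nodes, target_metadata_key="window"):
    """Swap each node's text for metadata[target_metadata_key] when present."""
    for toy_node in toy_nodes:
        toy_node.text = toy_node.metadata.get(target_metadata_key, toy_node.text)
    return toy_nodes


toy = [
    ToyNode("how are you?", {"window": "hello. how are you? I am fine!"}),
    ToyNode("no window stored"),
]
replaced = replace_with_metadata(toy)
print([n.text for n in replaced])
```

The effect is that the LLM sees the full window around each retrieved sentence rather than the sentence alone.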

In [None]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# the target key defaults to `window` to match the node_parser's default
node_postprocessor = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)

In [None]:
from llama_index.core.schema import NodeWithScore
from copy import deepcopy

scored_nodes = [NodeWithScore(node=x, score=1.0) for x in nodes]
nodes_old = [deepcopy(n) for n in nodes]

In [None]:
nodes_old[1].text

In [None]:
replaced_nodes = node_postprocessor.postprocess_nodes(scored_nodes)

In [None]:
print(replaced_nodes[1].text)

## Add a re-ranker

In [None]:
from llama_index.core.postprocessor import SentenceTransformerRerank

# BAAI/bge-reranker-base
# link: https://huggingface.co/BAAI/bge-reranker-base
rerank = SentenceTransformerRerank(
    top_n = 2, 
    model = "BAAI/bge-reranker-base"
)
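
Conceptually, a re-ranker scores each (query, candidate) pair and keeps only the `top_n` best candidates. The sketch below illustrates that selection step with a toy word-overlap score. `overlap_score` and `rerank_top_n` are hypothetical helpers, and the overlap score is only a stand-in for the neural cross-encoder that `BAAI/bge-reranker-base` provides.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query words that also appear in the text.

    Stand-in for a cross-encoder, which would score the pair with a neural model.
    """
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


def rerank_top_n(query, candidates, top_n=2, score_fn=overlap_score):
    """Score every candidate against the query and keep the top_n highest."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]


candidates = [
    "choose projects that stretch your skills",
    "the weather was nice on tuesday",
    "good teammates help you finish projects",
    "projects that build your experience",
]
best = rerank_top_n("projects to build your experience", candidates, top_n=2)
print(best)
```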

## Query engine

In [None]:
# The query engine handles both retrieval and generation.
# It retrieves the 6 most similar nodes, replaces each sentence with its
# window, and then lets the re-ranker keep the 2 most relevant nodes.
query_engine = index.as_query_engine(
    similarity_top_k=6,
    node_postprocessors=[node_postprocessor, rerank],
)
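
The order of the post-processors matters, since each one receives the output of the previous one. The following sketch uses hypothetical dictionaries and helper functions (`use_window`, `keep_top_2`, `run_postprocessors`), none of which are LlamaIndex APIs, to show that placing the window replacement first means the re-ranking step scores the windows rather than the single sentences.

```python
def run_postprocessors(items, postprocessors):
    """Apply each post-processor in list order; each takes and returns a list."""
    for step in postprocessors:
        items = step(items)
    return items


def use_window(items):
    """Replace each item's text with its stored window."""
    return [{**item, "text": item["window"]} for item in items]


def keep_top_2(items):
    """Toy re-ranker: rank by how often 'project' occurs in the text, keep 2."""
    ranked = sorted(
        items,
        key=lambda item: item["text"].lower().count("project"),
        reverse=True,
    )
    return ranked[:2]


# Hypothetical retrieved nodes: a sentence plus its surrounding window.
retrieved_items = [
    {"text": "It grows skills.", "window": "Pick a project. It grows skills. Then ship it."},
    {"text": "Join a project.", "window": "Ask around. Join a project. Meet people."},
    {"text": "Rest well.", "window": "Sleep early. Rest well. Eat."},
]

final = run_postprocessors(retrieved_items, [use_window, keep_top_2])
print([item["text"] for item in final])
```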

### Query your data
This creates an engine for Q&A over your index and asks a simple question. 

In [None]:
# Query the sentence-window RAG pipeline
response = query_engine.query(
    "What are steps to take when finding projects to build your experience?"
)
print(str(response))



When finding projects to build your experience, there are several steps you can take. First, you can join existing projects by asking to join someone else's project if they have an idea. Additionally, you can develop a side hustle or personal project, even if you have a full-time job, as this can stir your creative juices and strengthen bonds with collaborators. It's important to choose projects that will help you grow technically, by selecting ones that are challenging enough to stretch your skills but not too difficult that you have little chance of success. Having good teammates or people to discuss things with is also important, as we learn a lot from the people around us. Finally, consider if the project can be a stepping stone to larger projects, as its technical complexity and business impact can make it a meaningful progression in your career.


In [None]:
window = response.source_nodes[0].node.metadata["window"]
sentence = response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")

## References
* [LlamaIndex docs: Weaviate Vector Store](https://docs.llamaindex.ai/en/stable/examples/vector_stores/WeaviateIndexDemo.html)
* [LlamaIndex docs: Metadata Replacement + Node Sentence Window](https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/MetadataReplacementDemo.html)