## Metadata replacement + node sentence window for a RAG pipeline

This notebook demonstrates how to combine `SentenceWindowNodeParser` with `MetadataReplacementPostProcessor` in a RAG pipeline.

`SentenceWindowNodeParser` splits each document into individual sentences and stores, in every node's metadata, a window of the sentences surrounding it. Retrieval then matches against single sentences, which helps surface fine-grained details. Before the retrieved sentences are passed to the LLM, `MetadataReplacementPostProcessor` replaces each single sentence with its window of surrounding sentences, so the LLM sees the sentence in context. This matters whenever the answer depends on understanding a sentence in its entirety, and it is most useful for large documents.
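
To make the idea concrete, here is a minimal pure-Python sketch of how a sentence window can be built. It is an illustration, not LlamaIndex's implementation: the helper names `split_sentences` and `build_sentence_windows` are hypothetical, the regex splitter is deliberately naive, and the sketch assumes `window_size` counts sentences on each side of the target sentence.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_sentence_windows(text, window_size=3):
    """Return one record per sentence, with its surrounding window stored in metadata."""
    sentences = split_sentences(text)
    records = []
    for i, sentence in enumerate(sentences):
        # Take up to `window_size` sentences on each side, clipped at the document edges.
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        records.append(
            {
                "text": sentence,  # the single sentence that would be embedded
                "metadata": {
                    "original_text": sentence,
                    "window": " ".join(sentences[start:end]),
                },
            }
        )
    return records


records = build_sentence_windows(
    "A one. B two. C three. D four. E five.", window_size=1
)
for record in records:
    print(repr(record["text"]), "->", repr(record["metadata"]["window"]))
```

Each record keeps the single sentence as its text, while the `window` entry holds the wider context that can later be handed to the LLM.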

In [1]:
%pip install -q llama-index-embeddings-openai
%pip install -q llama-index-embeddings-huggingface
%pip install -q llama-index-postprocessor-cohere-rerank
%pip install -q llama-index-llms-ollama

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install -q ollama

Note: you may need to restart the kernel to use updated packages.


In [3]:
!pip install -q llama-index pypdf sentence-transformers

In [4]:
!mkdir -p data


In [5]:
import requests

url = "https://static.pib.gov.in/WriteReadData/specificdocs/documents/2022/feb/doc20222914401.pdf"
file_path = "./data/IndianTourism.pdf"  # Specify the full file path here

# Send a GET request to the specified URL
response = requests.get(url, timeout=30)
# Check if the request was successful
if response.status_code == 200:
    # Write the content of the response to a file
    with open(file_path, "wb") as f:
        f.write(response.content)
    print("File downloaded successfully")
else:
    print("Failed to download the file, status code:", response.status_code)

File downloaded successfully


In [6]:
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceWindowNodeParser

# create the sentence window node parser
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# local LLM served by Ollama; assumes the server is running and `gemma:2b` has been pulled
llm = Ollama(model="gemma:2b")



In [7]:
resp = llm.complete("Who invented the theory of relativity")
print(resp)

The theory of relativity was developed by Albert Einstein in the early 20th century.


In [8]:
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2",
    max_length=512,
)

In [9]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

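# load every file found in ./data (here, the downloaded PDF)
document_list = SimpleDirectoryReader("data").load_data()
# create one node per sentence, with the surrounding window stored in its metadata
nodes = node_parser.get_nodes_from_documents(document_list)
# build the vector index using the local HuggingFace embedding model
index = VectorStoreIndex(nodes, embed_model=embed_model)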

In [22]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

metadata_query_engine = index.as_query_engine(
    similarity_top_k=8,
    llm=llm,
    # the target key defaults to `window` to match the node_parser's default
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window"),
    ],
)
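
The replacement step itself is simple enough to sketch in plain Python. The snippet below is a toy illustration under stated assumptions, not the library's code: `ToyNode` and `replace_with_metadata` are hypothetical names, and the fallback to the node's own text when the key is missing is an assumption made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a retrieved node with a similarity score."""

    text: str
    metadata: dict = field(default_factory=dict)
    score: float = 0.0


def replace_with_metadata(nodes, target_metadata_key="window"):
    """Swap each node's text for metadata[target_metadata_key] when that key exists."""
    replaced = []
    for node in nodes:
        # Assumption: keep the original text if the node carries no window.
        new_text = node.metadata.get(target_metadata_key, node.text)
        replaced.append(
            ToyNode(text=new_text, metadata=node.metadata, score=node.score)
        )
    return replaced


retrieved = [
    ToyNode(
        text="B two.",
        metadata={"window": "A one. B two. C three.", "original_text": "B two."},
        score=0.9,
    ),
    ToyNode(text="No window here.", score=0.4),
]

expanded = replace_with_metadata(retrieved)
for node in expanded:
    print(node.score, repr(node.text))
```

The scores and ordering come from matching the single sentences; only the text handed to the LLM changes.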

In [23]:
query = "Explain the Loan Guarantee Scheme for COVID affected Tourism Service Sector"
response = metadata_query_engine.query(query)
print(str(response))

Sure, here is the answer to the query:

The Loan Guarantee Scheme for COVID affected Tourism Service Sector is a government initiative designed to support the tourism industry and alleviate the financial burden on tourism service providers during the COVID-19 pandemic. The scheme provides financial assistance in the form of capital grants to eligible tourism service providers, such as hotels, restaurants, and transportation companies.

The main objectives of the scheme are to:

- Provide immediate relief to tourism service providers affected by COVID-19

- Support the recovery and growth of the tourism industry

- Stabilize employment and livelihoods in the tourism sector

- Boost economic activity and revenue generation

- Attract foreign investments and promote India as a safe and attractive tourist destination.


In [24]:
query = "According to World Travel and Tourism Council what is India's rank"
response = metadata_query_engine.query(query)
print(str(response))

According to World Travel and Tourism Council, India ranked 10th among 185 countries in terms of travel & tourism's total contribution to GDP in 2019.
