<a href="https://colab.research.google.com/github/towardsai/ai-tutor-rag-system/blob/main/notebooks/Metadata_Filtering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Packages and Set Up Variables


In [None]:
!pip install -q llama-index==0.10.37 openai==1.12.0 tiktoken==0.6.0 llama-index-vector-stores-qdrant==0.2.10

In [None]:
import os

# Set the API keys in the Python environment; they are used later.
os.environ["OPENAI_API_KEY"] = "[OPENAI_API_KEY]"

# from google.colab import userdata
# os.environ["OPENAI_API_KEY"] = userdata.get('openai_api_key')

In [None]:
import nest_asyncio

nest_asyncio.apply()

# Load a Model


In [None]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.llm = OpenAI(temperature=0.9, model="gpt-4o-mini", max_tokens=512)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Create a VectorStore


In [None]:
from qdrant_client import QdrantClient
from llama_index.vector_stores.qdrant import QdrantVectorStore

# In-memory storage (lost when the runtime restarts):
# qdrant_client = QdrantClient(location=":memory:")
# Or persist the storage on disk:
qdrant_client = QdrantClient(path="/content/")

In [None]:
vector_store = QdrantVectorStore(client=qdrant_client, collection_name="ai_tutor_knowledge")

# Load the Dataset (JSONL)


## Download


In [None]:
from huggingface_hub import hf_hub_download
file_path = hf_hub_download(
    repo_id="jaiganesan/ai_tutor_knowledge",
    filename="ai_tutor_knowledge.jsonl",
    repo_type="dataset",
    local_dir="/content",
)

## Read File


In [None]:
import json
with open(file_path, "r") as file:
    ai_tutor_knowledge = [json.loads(line) for line in file]
ai_tutor_knowledge[1]['content']

"Github Repo: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/NLP/Product-Categorization   From e-commerce to Customer support  all businesses require some kind of NER model to process huge amounts of texts from users.   To automate this whole  one requires NER models to extract relevant and important entities from text.   Final Result/OutputInput text = EL D68 (Green  32 GB) 3 GB RAM [3 GB RAM U+007C 32 GB ROM U+007C Expandable Upto 128 GB  15.46 cm (6.088 inch) Display  13MP Rear Camera U+007C 8MP Front Camera  4000 mAh Battery  Quad-Core Processor]   Output =   Green ->>>> COLOR 32 GB ->>>> STORAGE 3 GB RAM ->>>> RAM 3 GB RAM ->>>> RAM 32 GB ROM ->>>> STORAGE Expandable Upto 128 GB ->>>> EXPANDABLE_STORAGE 15.46 cm (6.088 inch) ->>>> SCREEN_SIZE 13MP Rear Camera ->>>> BACK_CAMERA 8MP Front Camera ->>>> FRONT_CAMERA 4000 mAh Battery ->>>> BATTERY_CAPACITY Quad-Core Processor ->>>> PROCESSOR_CORE   Data PreparationA tool for creating this dataset (https://github.com/tecoholic/n

In [None]:
# The full dataset is not needed: keep the first 100 records plus everything from index 500 onward.
documents = ai_tutor_knowledge[:100]+ai_tutor_knowledge[500:]
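The downloaded file is in JSON Lines format: one JSON object per line. A minimal standalone sketch of the same parsing step, using a hypothetical in-memory sample instead of the downloaded file, and skipping empty lines, which `json.loads` would otherwise reject:

```python
import io
import json
from typing import IO, List


def read_jsonl(handle: IO[str]) -> List[dict]:
    """Parse a JSON Lines stream: one JSON object per non-empty line."""
    records = []
    for line in handle:
        line = line.strip()
        if not line:  # skip empty lines; json.loads("") raises an error
            continue
        records.append(json.loads(line))
    return records


# Hypothetical two-record sample using the same field names as the dataset.
sample = io.StringIO(
    '{"doc_id": "a", "content": "First text", "url": "u1", "name": "T1", "tokens": 2, "source": "tai_blog"}\n'
    "\n"
    '{"doc_id": "b", "content": "Second text", "url": "u2", "name": "T2", "tokens": 2, "source": "tai_blog"}\n'
)
records = read_jsonl(sample)
print(len(records), records[1]["content"])
```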

# Transforming


In [None]:
from typing import List
from llama_index.core import Document

def create_docs_from_list(data_list: List[dict]) -> List[Document]:
    documents = []
    for data in data_list:
        documents.append(
            Document(
                doc_id=data["doc_id"],
                text=data["content"],
                metadata={  # type: ignore
                    "url": data["url"],
                    "title": data["name"],
                    "tokens": data["tokens"],
                    "source": data["source"],
                },
                excluded_llm_metadata_keys=[
                    "title",
                    "tokens",
                    "source",
                ],
                excluded_embed_metadata_keys=[
                    "url",
                    "tokens",
                    "source",
                ],
            )
        )
    return documents

doc = create_docs_from_list(documents)
doc[2]

Document(id_='45501b72-9391-529e-8e5e-59a2604ba26e', embedding=None, metadata={'url': 'https://towardsai.net/p/machine-learning/adaboost-explained-from-its-original-paper', 'title': 'AdaBoost Explained From Its Original Paper', 'tokens': 1697, 'source': 'tai_blog'}, excluded_embed_metadata_keys=['url', 'tokens', 'source'], excluded_llm_metadata_keys=['title', 'tokens', 'source'], relationships={}, text="This publication is meant to show a very popular ML algorithm in complete detail  how it works  the math behind it  how to execute it in Python and an explanation of the proofs of the original paper. There will be math and code  but it is written in a way that allows you to decide which are the fun parts.   A bit on the origins of the algorithm: It was proposed by Yoav Freund and Robert E. Schapire in a 1997 paper  A Decision-Theoretic Generalization of On-Line Learning and an Application to Boostinga beautiful and brilliant publication for an effective and useful algorithm.   Lets star
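`excluded_llm_metadata_keys` and `excluded_embed_metadata_keys` decide which metadata fields are prepended to the text each consumer sees: with the settings above, the embedding model sees only `title`, while the LLM sees only `url`. The sketch below approximates that rendering in plain Python; the `key: value` lines followed by a blank line are assumed to mirror LlamaIndex's default metadata template, so treat the exact format as an assumption. In the notebook, `doc[2].get_content(metadata_mode=MetadataMode.EMBED)` (with `MetadataMode` from `llama_index.core.schema`) shows the real string.

```python
from typing import Dict, List


def render_with_metadata(metadata: Dict[str, object], text: str, excluded_keys: List[str]) -> str:
    """Prepend non-excluded metadata as 'key: value' lines, then a blank line, then the text."""
    kept = {k: v for k, v in metadata.items() if k not in excluded_keys}
    metadata_str = "\n".join(f"{k}: {v}" for k, v in kept.items())
    return f"{metadata_str}\n\n{text}"


metadata = {
    "url": "https://towardsai.net/p/machine-learning/adaboost-explained-from-its-original-paper",
    "title": "AdaBoost Explained From Its Original Paper",
    "tokens": 1697,
    "source": "tai_blog",
}
text = "This publication is meant to show a very popular ML algorithm in complete detail."

# Same exclusion lists as create_docs_from_list above.
embed_view = render_with_metadata(metadata, text, excluded_keys=["url", "tokens", "source"])
llm_view = render_with_metadata(metadata, text, excluded_keys=["title", "tokens", "source"])
print(embed_view.splitlines()[0])  # first line holds only the title
print(llm_view.splitlines()[0])    # first line holds only the URL
```

Keeping the title in the embedded text gives the vector a topical signal, while the URL is useful to the LLM for citing sources but adds little semantic content.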

In [None]:
from llama_index.core.node_parser import TokenTextSplitter

# Define the splitter object that splits the text into segments of 512 tokens,
# with a 128-token overlap between consecutive segments.
text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=128)
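With `chunk_overlap=128`, consecutive chunks share 128 tokens, so each new chunk starts 512 - 128 = 384 tokens after the previous one. A simplified sliding-window sketch of that arithmetic, assuming pre-tokenized input (plain strings stand in for tiktoken tokens); the real `TokenTextSplitter` also tries to break on the `separator`, so its boundaries will not match these exactly:

```python
from typing import List


def sliding_window_chunks(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size, each sharing chunk_overlap tokens with the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    stride = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # this window reached the end
            break
        start += stride
    return chunks


tokens = [f"t{i}" for i in range(1200)]  # hypothetical 1,200-token document
chunks = sliding_window_chunks(tokens, chunk_size=512, chunk_overlap=128)
print(len(chunks), [len(c) for c in chunks])
```

A 1,200-token document therefore yields three chunks (512, 512 and 432 tokens), which is why the node count after ingestion is larger than the document count.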

In [None]:
from llama_index.core.extractors import (
    KeywordExtractor,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.ingestion import IngestionPipeline

# Create the pipeline that applies the transformations in order (split into chunks,
# extract keywords, embed) and stores the resulting nodes in the Qdrant vector store.
pipeline = IngestionPipeline(
    transformations=[
        text_splitter,
        KeywordExtractor(keywords=10, llm=Settings.llm),
        OpenAIEmbedding(model="text-embedding-3-small"),
    ],
    vector_store=vector_store,
)

# Run the transformation pipeline.
nodes = pipeline.run(documents=doc, show_progress=True)


In [None]:
# Zip the persisted Qdrant collection; -r includes the files inside the directory,
# not just the empty directory entry.
!zip -r ai_tutor_knowledge_metadata.zip /content/collection/ai_tutor_knowledge


In [None]:
len(nodes)

1181

In [None]:
nodes[0].metadata

{'url': 'https://towardsai.net/p/machine-learning/bert-huggingface-model-deployment-using-kubernetes-github-repo-03-07-2024',
 'title': 'BERT HuggingFace Model Deployment using Kubernetes [ Github Repo]  03/07/2024',
 'tokens': 768,
 'source': 'tai_blog',
 'excerpt_keywords': 'BERT, HuggingFace, Kubernetes, model deployment, FastAPI, Docker, scalability, containerization, ML pipeline, inference'}
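`IngestionPipeline` runs its `transformations` in order, each one receiving the nodes produced by the previous step; that is how the `excerpt_keywords` field above ends up in every chunk's metadata. A toy sketch of that chaining in which every step is a hypothetical stand-in: a word-window splitter instead of `TokenTextSplitter`, a capitalized-word tagger instead of the LLM-based `KeywordExtractor`, and word lengths instead of OpenAI embeddings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)
    embedding: List[float] = field(default_factory=list)


def split_words(nodes: List[Node], size: int = 4) -> List[Node]:
    # Hypothetical splitter: fixed windows of `size` words, copying the parent metadata.
    out = []
    for node in nodes:
        words = node.text.split()
        for i in range(0, len(words), size):
            out.append(Node(" ".join(words[i:i + size]), dict(node.metadata)))
    return out


def tag_keywords(nodes: List[Node]) -> List[Node]:
    # Hypothetical tagger: capitalized words joined into one comma-separated string,
    # the same shape as the 'excerpt_keywords' value shown above.
    for node in nodes:
        keywords = [w for w in node.text.split() if w[:1].isupper()]
        node.metadata["excerpt_keywords"] = ", ".join(keywords)
    return nodes


def embed(nodes: List[Node]) -> List[Node]:
    # Hypothetical "embedding": word lengths; a real model returns a dense vector.
    for node in nodes:
        node.embedding = [float(len(w)) for w in node.text.split()]
    return nodes


def run_pipeline(nodes: List[Node], transformations: List[Callable[[List[Node]], List[Node]]]) -> List[Node]:
    # Each transformation consumes the output of the previous one.
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


toy_docs = [Node("Deploy BERT with Kubernetes and FastAPI today", {"source": "tai_blog"})]
toy_nodes = run_pipeline(toy_docs, [split_words, tag_keywords, embed])
for node in toy_nodes:
    print(node.text, "|", node.metadata)
```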

In [None]:
from llama_index.core import VectorStoreIndex

# Create the index based on the vector store.
index = VectorStoreIndex.from_vector_store(vector_store)

In [None]:
query_engine = index.as_query_engine(similarity_top_k=10)
res = query_engine.query("Explain how Advance RAG works?")

res.response

'Advanced RAG systems, such as Self-Reflective RAG (Self-RAG) and Corrective RAG (CRAG), enhance the performance of traditional Retrieval-Augmented Generation by improving the quality of contextual information used during the generation process. Self-RAG employs instruction-tuning techniques to create self-reflection tags that guide the language model in dynamically retrieving relevant documents and critically assessing their relevance before generating responses. This self-reflective approach helps ensure that the information used in the generation is both relevant and high-quality.\n\nOn the other hand, CRAG introduces an external evaluator to assess and refine the retrieved documents before they are utilized in the generation process. This external evaluator ensures that only the most relevant and high-quality documents are considered, addressing potential retrieval errors that could hinder the generation output.\n\nTogether, these advanced systems focus on optimizing the retrieval 
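`similarity_top_k=10` asks the retriever for the 10 chunks whose embeddings are closest to the query embedding. A brute-force sketch of that ranking using cosine similarity over hypothetical 3-dimensional vectors; the real vectors come from `text-embedding-3-small`, and a vector database such as Qdrant may use an index rather than scanning every vector.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], vectors: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k (node_id, score) pairs with the highest cosine similarity."""
    scored = [(node_id, cosine(query, vec)) for node_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for chunk embeddings.
vectors = {
    "rag_chunk": [0.9, 0.1, 0.0],
    "bert_chunk": [0.1, 0.9, 0.1],
    "peft_chunk": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]
print(top_k(query_vec, vectors, k=2))
```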

# Metadata Filtering


In [None]:
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters, FilterOperator

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="excerpt_keywords",
            operator=FilterOperator.TEXT_MATCH,
            value="PEFT",
        ),
    ]
)
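`KeywordExtractor` stores the keywords as one comma-separated string (see `nodes[0].metadata` above), so `FilterOperator.TEXT_MATCH` matches text inside that string rather than comparing whole values. A rough in-memory approximation, assuming a case-sensitive substring test; how Qdrant actually matches text (for example, whether a full-text index tokenizes the field) may differ.

```python
from typing import Dict


def text_match(metadata: Dict[str, object], key: str, value: str) -> bool:
    """Approximate TEXT_MATCH: True if `value` occurs as a substring of the field."""
    field_value = metadata.get(key)
    return isinstance(field_value, str) and value in field_value


# Hypothetical chunk metadata in the same shape as nodes[0].metadata.
sample_metadata = [
    {"title": "Fine-Tuning LLMs", "excerpt_keywords": "PEFT, LoRA, fine-tuning, FLAN T5"},
    {"title": "BERT Deployment", "excerpt_keywords": "BERT, Kubernetes, FastAPI"},
    {"title": "RoBERTa Notes", "excerpt_keywords": "RoBERTa, pretraining"},
    {"title": "No Keywords"},  # a chunk without the field never matches
]
print([m["title"] for m in sample_metadata if text_match(m, "excerpt_keywords", "PEFT")])
print([m["title"] for m in sample_metadata if text_match(m, "excerpt_keywords", "BERT")])
```

Under a plain substring test, `"BERT"` also matches `"RoBERTa"`, and `"peft"` would not match `"PEFT"`; both behaviours are worth checking against the real store before relying on a keyword filter.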

# Query Dataset


In [None]:
# Define a query engine that retrieves related pieces of text (only from nodes
# that pass the metadata filters) and uses an LLM to formulate the final answer.
query_engine = index.as_query_engine(filters=filters)

res = query_engine.query("How Parameter efficient fine tuning (PEFT) Works?")

In [None]:
res.response

"Parameter Efficient Fine Tuning (PEFT) involves a more economical approach to training large language models by minimizing the computational costs associated with full fine-tuning. Instead of adjusting all weights in a model, PEFT focuses on specific strategies to efficiently update only a subset of parameters.\n\n1. **Selective Fine-tuning**: This approach selectively picks a subset of the model's initial parameters to optimize, reducing the overall number of parameters that require adjustment.\n\n2. **Reparameterization**: This method involves reparameterizing the model weights using a low-rank representation, which compresses the weight matrices and allows for more efficient training. \n\nBy employing these strategies, PEFT significantly cuts down on the resources needed, enabling effective model training while preserving most of the knowledge embedded in the pre-trained models."

In [None]:
# Show the retrieved nodes
for src in res.source_nodes:
    print("Node ID\t", src.node_id)
    print("Title\t", src.metadata["title"])
    print("Text\t", src.text)
    print("Score\t", src.score)
    print("Keywords\t", src.metadata["excerpt_keywords"])
    print("-_" * 20)

Node ID	 8075d3d3-beb5-471f-b70c-024775830fd9
Title	 Fine-Tuning and Evaluating Large Language Models: Key Benchmarks and Metrics
Text	 SAMSUM is one of the datasets that FLAN T5 uses. There are several pre-trained FLAN T5 models that have been fine-tuned on SAMSUM  including Phil Schmid/flan-t5-base-samsum and jasonmcaffee/flan-t5-large-samsum on Hugging Face. If we want to fine-tune the FLAN T5 model specifically for formal dialogue conversations  we can do so using the DIALOGUESUM dataset.   Models fine-tuned on DialogSum can be applied to areas like customer support  meeting minutes generation  chatbot summarization  and more.   2. PEFT (Parameter efficient fine tuning)Training LLMs is computationally intensive. Full finetuning is computationally expensive as it might change each weight in the model. First  we start with a pretrained LLM like GPT-3. This model already has a vast amount of knowledge and understanding of language. Then we provide task-specific datasets  which could b
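The same printing loop appears in three cells. A small helper could replace it; this sketch relies only on the attributes the loop already uses (`node_id`, `metadata`, `text`, `score`) and adds a hypothetical `max_chars` argument to truncate long chunks. `SimpleNamespace` objects stand in for retrieved nodes so the block runs on its own; in the notebook you would call `print_sources(res.source_nodes)`.

```python
from types import SimpleNamespace
from typing import Iterable


def print_sources(source_nodes: Iterable, max_chars: int = 300) -> None:
    """Print id, title, truncated text, score and keywords for each retrieved node."""
    for src in source_nodes:
        print("Node ID\t", src.node_id)
        print("Title\t", src.metadata.get("title"))
        print("Text\t", src.text[:max_chars])
        print("Score\t", src.score)
        print("Keywords\t", src.metadata.get("excerpt_keywords"))
        print("-_" * 20)


# Hypothetical stand-in for res.source_nodes.
fake_sources = [
    SimpleNamespace(
        node_id="node-1",
        metadata={"title": "Example title", "excerpt_keywords": "PEFT, LoRA"},
        text="x" * 1000,
        score=0.5,
    )
]
print_sources(fake_sources, max_chars=10)
```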

# Filter Metadata (Keywords and Source)


In [None]:
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters, FilterOperator, FilterCondition

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="excerpt_keywords",
            operator=FilterOperator.TEXT_MATCH,
            value="BERT",
        ),
        MetadataFilter(
            key="source", operator=FilterOperator.EQ, value="tai_blog"
        ),
    ],
    condition=FilterCondition.AND,
)
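With `condition=FilterCondition.AND`, a chunk must pass every filter: its keywords must contain "BERT" and its `source` must equal "tai_blog". A plain-Python sketch of how AND and OR combine the per-filter results, with `TEXT_MATCH` approximated as a substring test (an assumption about the store's behaviour). The `"eq"`/`"text_match"` labels and the `"hf_docs"` source are hypothetical.

```python
from typing import Dict, List, Tuple

# (key, operator, value) triples; "eq" and "text_match" are this sketch's own labels.
Filter = Tuple[str, str, object]


def passes(metadata: Dict[str, object], flt: Filter) -> bool:
    key, operator, value = flt
    field_value = metadata.get(key)
    if operator == "eq":
        return field_value == value
    if operator == "text_match":
        return isinstance(field_value, str) and isinstance(value, str) and value in field_value
    raise ValueError(f"unknown operator: {operator}")


def apply_filters(items: List[Dict[str, object]], filters: List[Filter], condition: str = "and") -> List[Dict[str, object]]:
    if condition == "and":
        combine = all  # every filter must pass
    elif condition == "or":
        combine = any  # at least one filter must pass
    else:
        raise ValueError(f"unknown condition: {condition}")
    return [m for m in items if combine(passes(m, f) for f in filters)]


items = [
    {"title": "BERT on Kubernetes", "source": "tai_blog", "excerpt_keywords": "BERT, Kubernetes"},
    {"title": "BERT paper notes", "source": "hf_docs", "excerpt_keywords": "BERT, MLM"},
    {"title": "PEFT guide", "source": "tai_blog", "excerpt_keywords": "PEFT, LoRA"},
]
toy_filters = [("excerpt_keywords", "text_match", "BERT"), ("source", "eq", "tai_blog")]
print([m["title"] for m in apply_filters(items, toy_filters, "and")])
print([m["title"] for m in apply_filters(items, toy_filters, "or")])
```

AND narrows the candidates to the single chunk that satisfies both conditions, while OR would admit all three.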

In [None]:
# Define a query engine that retrieves related pieces of text (only from nodes
# that pass the metadata filters) and uses an LLM to formulate the final answer.
query_engine = index.as_query_engine(filters=filters)

result = query_engine.query("Explain BERT?")

In [None]:
result.response

'BERT, which stands for Bidirectional Encoder Representations from Transformers, is a language model developed by Google that utilizes transformer architecture. It is designed to understand and generate human-like language by processing text in both directions simultaneously. This bidirectional capability allows BERT to capture context more effectively than traditional models that read text sequentially.\n\nFor example, in a sentence with a blank, BERT considers the entire context—both preceding and following words—to make a more accurate prediction about the missing word. BERT is available in two versions: BERT BASE and BERT LARGE, differing in the number of layers, parameters, and hidden units.\n\nThe model was pre-trained on a significant amount of text, approximately 3.3 billion words, through two main tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). In MLM, some words in a sentence are masked, and BERT learns to predict those based on surrounding context. 

In [None]:
# Show the retrieved nodes
for src in result.source_nodes:
    print("Node ID\t", src.node_id)
    print("Title\t", src.metadata["title"])
    print("Text\t", src.text)
    print("Score\t", src.score)
    print("Keywords\t", src.metadata["excerpt_keywords"])
    print("-_" * 20)

Node ID	 527f1ba7-ea18-48ac-9f08-3f0082892f9b
Title	 Attention is all you need: How Transformer Architecture in NLP started.
Text	 GPT-2  GPT-3  and GPT-4  which were decoder-only architectures. Another well-known example is BERT (Bidirectional Encoder Representations from Transformers)  an encoder-only transformer mode used as a component in sentence embedding models.   Lets talk about BERT!BERT stands for Bidirectional Encoder Representations from Transformers. It is a language model by Google that uses a transformer architecture to understand and generate human-like language. BERT is designed to simultaneously process text in both directions  allowing it to capture context more effectively than traditional unidirectional models  which read text sequentially from left to right or right to left.   Example of Bidirectional CapabilityConsider the sentence:    The bank is situated on the _______ of the river.   In a unidirectional model  understanding the blank would primarily rely on th

In [None]:
# When the filter keyword (value) does not match the topic of the query

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="excerpt_keywords",
            operator=FilterOperator.TEXT_MATCH,
            value="BERT",
        ),
        MetadataFilter(
            key="source", operator=FilterOperator.EQ, value="tai_blog"
        ),
    ],
    condition=FilterCondition.AND,
)

query_engine = index.as_query_engine(filters=filters)

result = query_engine.query("Explain PEFT?")

print(result.response)


# Show the retrieved nodes
for src in result.source_nodes:
    print("Node ID\t", src.node_id)
    print("Title\t", src.metadata["title"])
    print("Text\t", src.text)
    print("Score\t", src.score)
    print("Keywords\t", src.metadata["excerpt_keywords"])
    print("-_" * 20)


The provided information does not mention PEFT or provide details about it. Therefore, I am unable to explain PEFT based on the available context.
Node ID	 637d10d8-d7a4-4352-81e2-48a21a9a3cf8
Title	 Month in 4 Papers (June 2023)
Text	 not needing to train a reward model is that it is more feasible to train models with fewer resources and fewer trials.   Experiments were conducted to evaluate the performance of LLMs tuned with DPO and PPO. The models were tested on the HH-RLHF dialogue task and two coding tasks. The results demonstrated that PPO consistently improves the models performance on complex tasks such as coding. They also discovered that using iterative DPO  which involves generating additional data with the newly trained rewards model during the tuning process  is more effective. However  PPO still outperforms DPO and achieves state-of-the-art results on challenging coding tasks.   Lastly  the ablation study highlights the crucial elements for the success of PPO training: no
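Conceptually, the filters restrict which chunks can be returned and similarity search then ranks only those. Because the only candidates are BERT-tagged `tai_blog` chunks, a PEFT question still receives the closest of those chunks even though none is about PEFT, and the LLM answers that the context does not cover it. A sketch of that filter-then-rank behaviour over hypothetical toy vectors, with a hypothetical `min_score` cut-off that drops weak matches instead of passing them to the LLM:

```python
import math
from typing import Any, Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_top_k(
    query: List[float],
    chunks: List[Dict[str, Any]],
    keyword: str,
    source: str,
    k: int = 2,
    min_score: Optional[float] = None,
) -> List[Tuple[str, float]]:
    # 1. Keep only chunks that pass both metadata conditions (AND).
    candidates = [c for c in chunks if keyword in c["excerpt_keywords"] and c["source"] == source]
    # 2. Rank the survivors by cosine similarity and keep the top k.
    scored = sorted(
        ((c["id"], cosine(query, c["vector"])) for c in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )[:k]
    # 3. Optionally drop weak matches.
    if min_score is not None:
        scored = [pair for pair in scored if pair[1] >= min_score]
    return scored


toy_chunks = [
    {"id": "bert_intro", "source": "tai_blog", "excerpt_keywords": "BERT, MLM", "vector": [0.9, 0.1, 0.0]},
    {"id": "bert_deploy", "source": "tai_blog", "excerpt_keywords": "BERT, Kubernetes", "vector": [0.7, 0.0, 0.3]},
    {"id": "peft_guide", "source": "tai_blog", "excerpt_keywords": "PEFT, LoRA", "vector": [0.0, 0.1, 0.9]},
]
peft_query = [0.0, 0.1, 1.0]  # hypothetical query vector close to the PEFT chunk
print(filtered_top_k(peft_query, toy_chunks, "BERT", "tai_blog"))
print(filtered_top_k(peft_query, toy_chunks, "BERT", "tai_blog", min_score=0.5))
```

The PEFT chunk is the best match overall but is excluded by the filter, so only low-scoring BERT chunks come back; a score threshold makes that mismatch explicit instead of silently handing irrelevant context to the LLM.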