# Semantic search-based RAG

We are going to use LlamaIndex to build a basic RAG pipeline on top of one of the open-source embedding models. Then, we will consider different optimizations that either improve the performance of the pipeline or reduce its cost.


## Loading the configuration

Before we start, all the configuration is loaded from the `.env` file we created in the previous notebook.

In [5]:
from dotenv import load_dotenv

load_dotenv()

True
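
After `load_dotenv()` returns, it can be worth confirming that the expected settings are actually present before any client is created. The snippet below is a minimal sketch: `QDRANT_URL` and `QDRANT_API_KEY` are the names read later in this notebook, while `OPENAI_API_KEY` is an assumption based on OpenAI being the default LLM. The helper `missing_settings` is hypothetical and not part of any library.

```python
import os

# QDRANT_URL and QDRANT_API_KEY are read later in this notebook;
# OPENAI_API_KEY is an assumption based on OpenAI being the default LLM.
REQUIRED_SETTINGS = ("QDRANT_URL", "QDRANT_API_KEY", "OPENAI_API_KEY")


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are absent or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
```

A local Qdrant instance may not need an API key at all, in which case that name can simply be dropped from the list.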

## Basic RAG setup

We will be using one of the open-source embedding models to vectorize our queries. The snapshots we imported in the previous notebook were generated with the same model, so the queries have to be embedded with it as well; otherwise the query and document vectors would not be comparable. OpenAI GPT will be our LLM. It is the default model in LlamaIndex, so there is no need to configure it explicitly.

The vector index, which will act as a fast retrieval layer, is the last missing piece to build our basic semantic search RAG. Qdrant will serve that purpose, as all the documents are already there.
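
To make the role of the retrieval layer concrete, here is a toy sketch of semantic search as an exhaustive scan over a handful of vectors. Everything in it is made up for illustration: the three-dimensional "embeddings", the texts, and the choice of cosine similarity, which is an assumption here (the real metric is whatever the collection was configured with). A vector database such as Qdrant avoids the full scan by using an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=2):
    """Rank (text, vector) pairs by similarity to the query, best first."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings"; real models use far more dimensions.
documents = [
    ("learning to code with projects", [0.9, 0.1, 0.0]),
    ("soldering an audio amplifier", [0.2, 0.9, 0.1]),
    ("best pizza in town", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.2, 0.0]

for score, text in top_k(query, documents, k=2):
    print(f"{score:.3f}  {text}")
```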

In [6]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(
    embed_model="local:BAAI/bge-large-en"
)

In [7]:
from qdrant_client import QdrantClient
from llama_index.vector_stores.qdrant import QdrantVectorStore

import os

client = QdrantClient(
    os.environ.get("QDRANT_URL"), 
    api_key=os.environ.get("QDRANT_API_KEY"),
)
vector_store = QdrantVectorStore(
    client=client, 
    collection_name="hacker-news"
)

In [8]:
from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    service_context=service_context,
)

### Querying RAG

LlamaIndex simplifies the querying process by providing a high-level API that abstracts the underlying complexity. We can use the `as_query_engine` method to create a query engine that will handle the entire process for us, with the default configuration.

In [9]:
query_engine = index.as_query_engine()
response = query_engine.query("What is the best way to learn programming?")
print(response.response)

The best way to learn programming is to jump in and start working on real projects that interest you. Find a problem to solve or a project to build, such as an audio amplifier, Arduino robot, or experimenting with GPIO on a Raspberry Pi. Engaging in hands-on projects will help you understand how programming works in real-life scenarios and how to build things properly. Additionally, exploring resources like the Hackaday website can provide you with ideas and inspiration for your programming journey.


Our RAG retrieves some possibly relevant documents by using the original prompt as a query, and then sends them to the LLM as part of the prompt. It is a good idea to check which documents were retrieved, and whether the LLM based its answer on them rather than making it up from its internal knowledge.

In [10]:
for i, node in enumerate(response.source_nodes):
    print(i + 1, node.text, end="\n\n")

1 Ask HN: What is the best way to get into building electronics as a programmer?

I am asking not only about learning what is taught in classes for solving ideal problems. I am talking about the real engineering like a hobbyist who actually understands what works in real life and how to build it properly.

Assuming you&#x27;re interested in embedded software (pure electronics is it&#x27;s own thing):<p>Unless you&#x27;re already a competent C++ developer, I would start with getting either an ESP8266 or ESP32 and making some simple projects in Arduino IDE by stitching libraries together. You can do a lot with various sensors, actuators, and a bit of simple glue code. Getting outside that simplified Arduino world requires additional learning curves so have fun there first if you can.<p>I&#x27;m a full stack developer and I code in many languages but I haven&#x27;t had to do any low level C++ code in a while and I&#x27;m finding that my biggest hurdle as I&#x27;m getting into a complicate
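
The step where retrieved documents become "a part of the prompt" can be sketched in a few lines. The template below is purely illustrative and is not the prompt LlamaIndex actually uses; `build_prompt` is a hypothetical helper, and the chunk texts are placeholders.

```python
# Illustrative template only; LlamaIndex ships its own default prompts.
PROMPT_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Using only the context above, answer the query.\n"
    "Query: {question}\n"
    "Answer: "
)


def build_prompt(question, chunks):
    """Join retrieved chunks and place them in front of the question."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    "What is the best way to learn programming?",
    ["First retrieved document...", "Second retrieved document..."],
)
print(prompt)
```

Seen this way, it becomes clear why inspecting `source_nodes` matters: the LLM can only ground its answer in whatever text ended up in the context part of the prompt.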

The first tweak we can consider is to increase the number of documents fetched from our knowledge base (the default of LlamaIndex is just 2). We can do that by setting the `similarity_top_k` parameter of the `as_query_engine` method.

In [11]:
response = index.as_query_engine(similarity_top_k=5).query(
    "What is the best way to learn programming?"
)
print(response.response)

The best way to learn programming is to start by jumping in and actually doing it. Find a real problem to solve or a project to work on. By engaging in hands-on coding and building projects, you can gain practical experience and develop a deeper understanding of how programming works in real-life scenarios. Learning by doing, even if it means starting with simple projects, is an effective way to improve your programming skills and knowledge.


In [12]:
for i, node in enumerate(response.source_nodes):
    print(i + 1, node.text, end="\n\n")

1 Ask HN: What is the best way to get into building electronics as a programmer?

I am asking not only about learning what is taught in classes for solving ideal problems. I am talking about the real engineering like a hobbyist who actually understands what works in real life and how to build it properly.

Assuming you&#x27;re interested in embedded software (pure electronics is it&#x27;s own thing):<p>Unless you&#x27;re already a competent C++ developer, I would start with getting either an ESP8266 or ESP32 and making some simple projects in Arduino IDE by stitching libraries together. You can do a lot with various sensors, actuators, and a bit of simple glue code. Getting outside that simplified Arduino world requires additional learning curves so have fun there first if you can.<p>I&#x27;m a full stack developer and I code in many languages but I haven&#x27;t had to do any low level C++ code in a while and I&#x27;m finding that my biggest hurdle as I&#x27;m getting into a complicate

## Customizing the RAG pipeline

The LlamaIndex defaults are a good starting point, but we can customize the pipeline to better fit our needs. That gives us more control over the behavior of the semantic search retriever and over the way we interact with the LLM. LlamaIndex supports this kind of customization well, and there are three components that we need to set up:

1. Retriever
2. Response synthesizer
3. Query engine
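
How the three components listed above relate to each other can be sketched with plain Python stand-ins. None of these classes are LlamaIndex classes; they are hypothetical stubs (a keyword matcher instead of vector search, a string formatter instead of an LLM call) that only show the division of responsibilities.

```python
class KeywordRetriever:
    """Stand-in retriever: returns chunks sharing a word with the query."""

    def __init__(self, chunks, top_k=2):
        self.chunks = chunks
        self.top_k = top_k

    def retrieve(self, query):
        words = set(query.lower().split())
        hits = [c for c in self.chunks if words & set(c.lower().split())]
        return hits[: self.top_k]


class EchoSynthesizer:
    """Stand-in synthesizer: a real one would call the LLM here."""

    def synthesize(self, query, chunks):
        return f"Answer to {query!r} based on {len(chunks)} chunk(s)"


class QueryEngine:
    """Glue: retrieve first, then synthesize from what was retrieved."""

    def __init__(self, retriever, synthesizer):
        self.retriever = retriever
        self.synthesizer = synthesizer

    def query(self, query):
        return self.synthesizer.synthesize(query, self.retriever.retrieve(query))


engine = QueryEngine(
    KeywordRetriever(["learn programming by building", "cooking pasta"], top_k=5),
    EchoSynthesizer(),
)
print(engine.query("learn programming"))
```

Because the query engine only depends on the two interfaces, either side can be swapped independently, which is the property the rest of this notebook relies on.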

In [13]:
from llama_index.query_engine import RetrieverQueryEngine
from llama_index import get_response_synthesizer
from llama_index.indices.vector_store import VectorIndexRetriever

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=5,
)

response_synthesizer = get_response_synthesizer()

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

In [14]:
response = query_engine.query("What is the best way to learn programming?")
print(response.response)

The best way to learn programming is to start by jumping in and actually doing it. Find a real problem to solve and build something to address it. Engaging in hands-on projects, such as creating an audio amplifier, building an Arduino robot, or experimenting with GPIO on a Raspberry Pi, can be a practical and effective way to learn programming. Additionally, starting with beginner-friendly kits or platforms that offer hands-on experience can also be beneficial. Remember, the key is to start doing things, even if it means starting off by making mistakes and learning from them.


## Playing with response synthesizers

Response synthesizers are responsible for interactions with the LLM. This is the component to control when it comes to prompts and the way we actually communicate with the language model. There are lots of parameters to tweak, and prompt engineering is a topic of its own, so we won't go deep into it, but we can at least test out different response modes.

The default one is `ResponseMode.COMPACT`, which combines retrieved text chunks into larger pieces to make better use of the available context window. There are plenty of other modes, and each may work best in a specific scenario. For example, some modes make a separate LLM call per retrieved text chunk, which may improve the answers in some cases but also increases the cost of the pipeline.

Let's compare the previous response with the `ResponseMode.ACCUMULATE` and `ResponseMode.REFINE` modes. The first one should create a response for each chunk and then concatenate them, while the second one should make a separate LLM call for each chunk in an iterative manner, so that each call uses the previous response as context.
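
Before running the real synthesizers, the difference between the modes can be illustrated with a stub in place of the LLM. This is a simplified sketch of the idea, not the LlamaIndex implementation: `fake_llm` and the three functions are hypothetical, the prompts are placeholders, and the compact variant assumes that all chunks fit into a single prompt.

```python
calls = []


def fake_llm(prompt):
    """Stub standing in for a real LLM call; records every prompt it receives."""
    calls.append(prompt)
    return f"<answer {len(calls)}>"


def compact(question, chunks):
    # Pack all chunks into one prompt (assuming they fit the context window).
    return fake_llm(f"{question}\n" + "\n".join(chunks))


def accumulate(question, chunks):
    # One independent call per chunk, with the answers joined afterwards.
    answers = [fake_llm(f"{question}\n{chunk}") for chunk in chunks]
    return "\n---\n".join(answers)


def refine(question, chunks):
    # One call per chunk, each seeing the answer produced so far.
    answer = ""
    for chunk in chunks:
        answer = fake_llm(f"{question}\nPrevious answer: {answer}\n{chunk}")
    return answer


chunks = ["chunk A", "chunk B", "chunk C"]
for mode in (compact, accumulate, refine):
    calls.clear()
    mode("What is the best way to learn programming?", chunks)
    print(f"{mode.__name__}: {len(calls)} LLM call(s)")
```

The call count is the cost side of the trade-off: in this sketch both per-chunk modes scale linearly with the number of retrieved chunks, while the compact one does not.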

In [15]:
from llama_index.response_synthesizers import ResponseMode

accumulate_response_synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.ACCUMULATE,
)

accumulate_query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=accumulate_response_synthesizer,
)

In [16]:
response = accumulate_query_engine.query("What is the best way to learn programming?")
print(response.response)

Response 1: The best way to learn programming is to start by becoming proficient in a language like C++ and then gradually move on to working with microcontrollers like ESP8266 or ESP32. By starting with simple projects in Arduino IDE and gradually expanding your knowledge with various sensors and actuators, you can gain practical experience in building electronics as a programmer. It is also beneficial to explore different frameworks like Lua, Micro Python, and JavaScript for programming ESPs to broaden your skill set.
---------------------
Response 2: The best way to learn programming is to dive in and start working on real projects. Find a problem that interests you and start building something to solve it. Experiment with different projects like creating an audio amplifier, building an Arduino robot, or tinkering with GPIO on a Raspberry Pi. Engaging in hands-on projects will help you understand how programming works in real-life scenarios and improve your skills effectively.
-----

In [17]:
refine_response_synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.REFINE,
)

refine_query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=refine_response_synthesizer,
)

In [18]:
response = refine_query_engine.query("What is the best way to learn programming?")
print(response.response)

Start by getting either an ESP8266 or ESP32 and creating simple projects in Arduino IDE by combining libraries. This approach allows for practical learning and hands-on experience with various sensors, actuators, and basic code. It's recommended to begin with this simplified Arduino environment before delving into more complex programming aspects.


## Multitenancy

Most real applications require some sort of data separation. If you collect data coming from different users or organizations, you probably don't want to mix them up in the answers. A common mistake when using Qdrant is to create a separate collection for each tenant. Instead, you can keep everything in a single collection and use a metadata (payload) field to separate the data. This field should have a payload index created, so that filtering on it stays fast.

This is a Qdrant-specific feature, and the configuration is not done in LlamaIndex, but in Qdrant itself. However, we passed an instance of `QdrantClient` to the `QdrantVectorStore`, so we can use it to create a payload index for the metadata field.

In our case, we can consider splitting the data by the type of the document. We have two types of documents in our collection: `story` and `comment`. We can use the `type` field to separate them.
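
As a rough mental model of what a keyword index on such a field provides, consider a lookup table from each field value to the points that carry it. The sketch below is only an analogy and makes no claim about how Qdrant implements payload indexes internally; the sample points and the `build_keyword_index` helper are made up.

```python
from collections import defaultdict

# Made-up points from one shared collection, keyed by point id.
points = {
    1: {"type": "story", "text": "Ask HN: best tools to learn programming?"},
    2: {"type": "comment", "text": "Start with a small project."},
    3: {"type": "story", "text": "Show HN: my first compiler"},
    4: {"type": "comment", "text": "CS50 is well taught."},
}


def build_keyword_index(points, field):
    """Map each value of `field` to the ids of points that carry it."""
    index = defaultdict(set)
    for point_id, payload in points.items():
        index[payload[field]].add(point_id)
    return index


type_index = build_keyword_index(points, "type")

# Restricting a search to one group becomes a dictionary lookup
# instead of a scan over every point in the collection.
story_ids = sorted(type_index["story"])
print(story_ids)
```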

In [19]:
from qdrant_client import models

client.create_payload_index(
    collection_name="hacker-news",
    field_name="type",
    field_schema=models.PayloadSchemaType.KEYWORD,
)

UpdateResult(operation_id=472, status=<UpdateStatus.COMPLETED: 'completed'>)

Using the newly created payload index, we can filter the documents by type. This is one of the reasons we customized the pipeline: the filter can now be added directly to the retriever.

In [20]:
from llama_index.vector_stores import MetadataFilters, MetadataFilter

filtering_retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=5,
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="type", value="story"),
        ]
    ),
)

filtering_query_engine = RetrieverQueryEngine(
    retriever=filtering_retriever,
    response_synthesizer=response_synthesizer,
)

In [21]:
response = filtering_query_engine.query("What is the best way to learn programming?")
print(response.response)

For beginners, starting with visual and interactive platforms like Scratch can be a great way to learn programming concepts in a fun and engaging manner. Additionally, exploring physical programming books can provide a solid foundation for learning coding principles and methodologies. It's also beneficial to engage in hands-on projects and real-world applications to deepen understanding and practical skills in programming.


In [22]:
for i, node in enumerate(response.source_nodes):
    print(i + 1, node.text, end="\n\n")

1 Ask HN: Where do you go to find recommendations for physical programming books?

I&#x27;m old school and like sitting down with a book both for learning actual coding and also for methodologies and philosophies. I don&#x27;t know where to go for recommendations. Any help? Thanks!

2 Ask HN: What to learn in order to get a software job in a decent country?

A good friend of mine is 18 and Russian. He is a programming prodigy and is trying to formulate a plan to get out. He&#x27;s thinking about his future CV and applying for jobs. What would be the best frameworks to invest time in getting experience with now?

3 Ask HN: What is the best way to get into building electronics as a programmer?

I am asking not only about learning what is taught in classes for solving ideal problems. I am talking about the real engineering like a hobbyist who actually understands what works in real life and how to build it properly.

4 Ask HN: Best tools for 4/5 year old to learn programming?

I&#x27;m lo

## Additional tweaks

Some scenarios require more than just semantic search. For example, if we want to prefer the most recent documents, no embedding model is going to capture that, since recency is a relationship between documents rather than a property of a single text. LlamaIndex provides a way to add postprocessing, so we can apply such additional constraints directly to the prefetched documents.
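
The prefetch-then-postprocess idea can be sketched without any library: retrieve generously by similarity, then reorder by date and keep only the newest few. The function below is a hypothetical stand-in, not the LlamaIndex postprocessor, and it assumes that dates are stored as ISO 8601 strings under a `date` metadata key.

```python
from datetime import date


def keep_most_recent(nodes, top_k=1, date_key="date"):
    """Sort nodes by their date metadata, newest first, and keep `top_k`."""
    ordered = sorted(
        nodes,
        key=lambda node: date.fromisoformat(node["metadata"][date_key]),
        reverse=True,
    )
    return ordered[:top_k]


# Hypothetical prefetched results, already selected by semantic similarity.
prefetched = [
    {"text": "older advice", "metadata": {"date": "2019-05-01"}},
    {"text": "newest advice", "metadata": {"date": "2023-11-20"}},
    {"text": "recent advice", "metadata": {"date": "2022-02-14"}},
]

for node in keep_most_recent(prefetched, top_k=2):
    print(node["metadata"]["date"], node["text"])
```

Note that in this sketch the similarity order from the retriever is discarded after the prefetch, which is why the prefetch size and `top_k` have to be chosen together.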


In [23]:
from llama_index.postprocessor import FixedRecencyPostprocessor

prefetching_retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=25,  # prefetch way more documents
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="type", value="comment"),  # we want comments this time
        ]
    ),
)

recency_query_engine = RetrieverQueryEngine(
    retriever=prefetching_retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[
        FixedRecencyPostprocessor(
            service_context=service_context,
            date_key="date",  # "date" is already the default key; set explicitly for clarity
            top_k=5,  # leave just 20% of the prefetched documents
        )
    ]
)

In [24]:
response = recency_query_engine.query("What is the best way to learn programming?")
print(response.response)

CS50 on YouTube is a great resource for learning programming. It is well taught, covers a lot of ground, has no prerequisites, and does not coddle its audience.


In [25]:
for i, node in enumerate(response.source_nodes):
    print(i + 1, node.text, end="\n\n")

1 Ask HN: Do you know a good course or book to learn CS basics for teens?

A teen relative of mine is interested in computer science and wants to explore this path before going to college.<p>I need to recommend an overall learning experience for him that is appropriate for his age (around 16) and want him to understand the basics without getting too deep in math or algorithms, so he can get a good view of the field and understand the basics.<p>Have you came across a simple course, book or learning platform (preferably free) appropriate for beginners?<p>Thanks!

I read this when I was a teen and liked it. It explains how computers work, and is still relevant 20 years later.<p><a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Code:_The_Hidden_Language_of_Computer_Hardware_and_Software" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Code:_The_Hidden_Language_of_C...</a>

<i>Code</i> is a good motivator (and I need to buy a second copy, my copy was loaned out and

A second option is `EmbeddingRecencyPostprocessor`. As described in the LlamaIndex documentation, it also orders the nodes by date, but instead of keeping a fixed number of them it drops older nodes that are too similar to a newer one, with `similarity_cutoff` as the threshold.

In [26]:
from llama_index.postprocessor import EmbeddingRecencyPostprocessor

embedding_recency_query_engine = RetrieverQueryEngine(
    retriever=prefetching_retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[
        EmbeddingRecencyPostprocessor(
            service_context=service_context,
            date_key="date",  # date is the default key
            similarity_cutoff=0.9,
        )
    ]
)

In [27]:
response = embedding_recency_query_engine.query("What is the best way to learn programming?")
print(response.response)

Jumping in and starting with a real problem is a good way to learn programming. Finding hands-on projects like building an audio amplifier or an Arduino robot can be a practical approach. Additionally, experimenting with simple tasks and being willing to make mistakes is essential for learning and improving programming skills.


In [28]:
for i, node in enumerate(response.source_nodes):
    print(i + 1, node.text, end="\n\n")

1 Ask HN: Do you know a good course or book to learn CS basics for teens?

A teen relative of mine is interested in computer science and wants to explore this path before going to college.<p>I need to recommend an overall learning experience for him that is appropriate for his age (around 16) and want him to understand the basics without getting too deep in math or algorithms, so he can get a good view of the field and understand the basics.<p>Have you came across a simple course, book or learning platform (preferably free) appropriate for beginners?<p>Thanks!

I read this when I was a teen and liked it. It explains how computers work, and is still relevant 20 years later.<p><a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Code:_The_Hidden_Language_of_Computer_Hardware_and_Software" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Code:_The_Hidden_Language_of_C...</a>

<i>Code</i> is a good motivator (and I need to buy a second copy, my copy was loaned out and