# RAG with Haystack

## Wait, RAG again?
In notebook 03, we implemented a RAG pipeline from scratch, using only the Qdrant and OpenAI SDKs.
Now we want to build something similar using Haystack. Once again, we expect to get more readable and maintainable code, at the expense of taking on an extra dependency, and one that will forever be entangled in our application.

# Setup: packages and environment variables

In [None]:
import importlib.util

# Install the class helper package only when it is not already available
if not importlib.util.find_spec("class_utils"):
    !pip install -qqq git+https://github.com/xtreamsrl/genai-for-engineers-class

In [None]:
import os

from haystack import Pipeline, Document
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DocumentStore
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever


os.environ["OPENAI_API_KEY"] = ...
os.environ["TOKENIZERS_PARALLELISM"] = "true"

# Build a pipeline to create embeddings

The first step is to embed documents. We'll use the `InMemoryDocumentStore`, an in-memory structure that is a much simplified version of a vector database.
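
To get a feel for what a "much simplified vector database" does, here is a minimal pure-Python sketch of similarity search over stored vectors. The names `toy_store`, `cosine_similarity`, and `retrieve` and the 3-dimensional vectors are made up for illustration. This is not how Haystack implements its document store, only the general idea of ranking stored embeddings against a query embedding.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings; real models produce hundreds of dimensions
toy_store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}


def retrieve(query_embedding: list[float], top_k: int = 2) -> list[str]:
    """Return the ids of the top_k stored vectors most similar to the query."""
    ranked = sorted(
        toy_store,
        key=lambda doc_id: cosine_similarity(query_embedding, toy_store[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


top_ids = retrieve([0.9, 0.1, 0.0])
print(top_ids)
```

A real vector database adds persistence, filtering, and approximate search on top of this idea, which is why the in-memory store is only suitable for small experiments.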

In [None]:
documents = [
    Document(
        content="Poor Things is a 2023 film directed by Yorgos Lanthimos and written by Tony McNamara, "
        "based on the 1992 novel by Alasdair Gray."
    ),
    Document(
        content="Oppenheimer is a 2023 epic biographical thriller film written, directed,"
        " and co-produced by Christopher Nolan. It follows the life of J. Robert "
        "Oppenheimer, the American theoretical physicist who helped develop the "
        "first nuclear weapons during World War II."
    ),
    Document(
        content="Dune: Part Two is a 2024 American epic science fiction film directed and produced by Denis "
        "Villeneuve, who co-wrote the screenplay with Jon Spaihts. The sequel to Dune (2021), it "
        "is the second of a two-part adaptation of the 1965 novel Dune by Frank Herbert. "
    ),
]

In [None]:
def build_indexing_pipline(
    document_store: DocumentStore,
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2",
) -> Pipeline:
    """Build a pipeline that embeds documents and writes them to `document_store`."""
    pipe = Pipeline()
    pipe.add_component(
        instance=SentenceTransformersDocumentEmbedder(model=embedding_model),
        name="doc_embedder",
    )
    pipe.add_component(
        instance=DocumentWriter(document_store=document_store), name="doc_writer"
    )
    # Pass the embedded documents from the embedder to the writer
    pipe.connect("doc_embedder.documents", "doc_writer.documents")
    return pipe

In [None]:
document_store = InMemoryDocumentStore()
indexing_pipeline = build_indexing_pipline(document_store)
indexing_pipeline.run({"doc_embedder": {"documents": documents}})

Let's check if the documents are there...

In [None]:
document_store.filter_documents()

# RAG Pipeline

Great, now we can build the proper RAG pipeline using our documents.
As in notebook 02, we need a prompt template. However, this time we will use a real templating engine, [Jinja](https://jinja.palletsprojects.com/en/3.1.x/). 

We will implement our RAG as a Pipeline. Pipelines are the key abstraction of Haystack (and LangChain, and LlamaIndex).

The pipelines in Haystack 2.0 are directed multigraphs of different Haystack components and integrations. They give the freedom to connect these components in various ways. This means that the pipeline doesn't need to be a continuous stream of information. With the flexibility of Haystack pipelines, you can have simultaneous flows, standalone components, loops, and other types of connections.

Learn more at https://docs.haystack.deepset.ai/docs/pipelines
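
As a rough mental model of the "directed graph" idea, the sketch below lists which component feeds which and derives a valid execution order with the standard library's `graphlib`. The component names mirror the ones used later in this notebook, but the `connections` dictionary is an illustrative assumption. This is not how Haystack schedules components: a topological sort only covers acyclic graphs, while Haystack pipelines also support loops.

```python
from graphlib import TopologicalSorter

# Each key is a component; its value is the set of components it receives input from.
# Illustrative wiring only, loosely mirroring a simple RAG pipeline.
connections = {
    "retriever": {"embedder"},
    "prompt_builder": {"retriever"},
    "llm": {"prompt_builder"},
}

# static_order() yields every node after all of its predecessors
execution_order = list(TopologicalSorter(connections).static_order())
print(execution_order)
```

Thinking of the pipeline as a graph rather than a fixed chain is what makes it possible to swap a single node, such as the retriever, without touching the rest.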

In [None]:
template = """
Answer the questions based on the given context.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}
Question: {{ question }}
Answer:
"""
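
To make the `{% for %}` loop concrete, here is a plain-Python approximation of what the template produces: one indented line per document, followed by the question. The function `render_prompt` is a hypothetical helper written for illustration. Jinja's actual output differs slightly in whitespace, and in the pipeline the rendering is done by `PromptBuilder`, not by code like this.

```python
def render_prompt(contents: list[str], question: str) -> str:
    """Approximate the Jinja template with ordinary string formatting."""
    # One indented context line per document, like the body of the for loop
    context = "\n".join(f"    {content}" for content in contents)
    return (
        "Answer the questions based on the given context.\n\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = render_prompt(
    ["Poor Things is a 2023 film.", "Dune: Part Two is a 2024 film."],
    "Which film was released in 2024?",
)
print(prompt)
```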

In [None]:
def build_openai_rag_pipeline(
    retriever: InMemoryEmbeddingRetriever | QdrantEmbeddingRetriever,
    prompt_template: str,
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2",
    openai_model: str = "gpt-3.5-turbo",
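) -> Pipeline:
    """Build a RAG pipeline: embed the query, retrieve documents,
    fill the prompt template, and call the OpenAI model."""
    pipe = Pipeline()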
    pipe.add_component(
        "embedder", SentenceTransformersTextEmbedder(model=embedding_model)
    )
    pipe.add_component("retriever", retriever)
    pipe.add_component(
        "prompt_builder",
        PromptBuilder(template=prompt_template, required_variables="*"),
    )
    pipe.add_component("llm", OpenAIGenerator(model=openai_model))

    # Wire the flow: query embedding -> retriever -> prompt -> LLM
    pipe.connect("embedder.embedding", "retriever.query_embedding")
    pipe.connect("retriever", "prompt_builder.documents")
    pipe.connect("prompt_builder", "llm")
    return pipe

In [None]:
rag_pipe = build_openai_rag_pipeline(
    InMemoryEmbeddingRetriever(document_store), template
)
rag_pipe.show()

And now we can run it.

In [None]:
from pprint import pprint

query = "What film talks about the atomic bomb?"
response = rag_pipe.run(
    {"embedder": {"text": query}, "prompt_builder": {"question": query}}
)
pprint(response)
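
The printed response is a nested dictionary keyed by component name. As a small sketch, assuming the output keeps the `{"llm": {"replies": [...], "meta": [...]}}` shape, the answer text can be pulled out as shown below. The `example_response` value is a hypothetical stand-in with a placeholder reply and abbreviated metadata, and `first_reply` is a helper introduced here for illustration.

```python
# Hypothetical response with the same shape as the pipeline output;
# the reply text is a placeholder, not a real model answer.
example_response = {
    "llm": {
        "replies": ["<model answer goes here>"],
        "meta": [{"model": "gpt-3.5-turbo"}],
    }
}


def first_reply(response: dict) -> str:
    """Return the first generated reply from the `llm` component's output."""
    return response["llm"]["replies"][0]


answer = first_reply(example_response)
print(answer)
```

In the notebook, calling `first_reply(response)` on the real output would return the generated answer as a plain string.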

# RAG Pipeline with Qdrant

Now we'll build the same pipeline with Qdrant. 

Once again, the components in Haystack will help us perform the update quickly and without changing anything in our business logic: we only swap the document store and the retriever, and reuse the same pipeline-building functions.

In [None]:
# embedding_dim must match the embedding model's output size (384 for all-MiniLM-L6-v2)
qdrant_document_store = QdrantDocumentStore(":memory:", embedding_dim=384)
qdrant_indexing_pipeline = build_indexing_pipline(qdrant_document_store)
qdrant_indexing_pipeline.run({"doc_embedder": {"documents": documents}})

In [None]:
qdrant_document_store.count_documents()

In [None]:
qdrant_rag_pipe = build_openai_rag_pipeline(
    QdrantEmbeddingRetriever(qdrant_document_store), template
)
qdrant_response = qdrant_rag_pipe.run(
    {"embedder": {"text": query}, "prompt_builder": {"question": query}}
)
pprint(qdrant_response)