# RAG with Haystack

## Wait, RAG again?
In notebook 02, we implemented a RAG pipeline from scratch, using only the Qdrant and OpenAI SDKs.
Now we want to build something similar using Haystack. Once again, we expect to get more readable and maintainable code, at the expense of taking on an extra dependency, one that will forever be entangled in our application.

# Setup: packages and environment variables

In [None]:
import os

from haystack import Pipeline, Document
from haystack.document_stores.in_memory import InMemoryDocumentStore


os.environ["OPENAI_API_KEY"] = ...  # replace ... with your OpenAI API key (a string)
os.environ["TOKENIZERS_PARALLELISM"] = "true"

# Build a pipeline to create embeddings

The first step is to embed the documents. We'll use the `InMemoryDocumentStore`, an in-memory structure that is a much-simplified version of a vector database.
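To make "a much-simplified vector database" concrete, here is a minimal pure-Python sketch of the idea: keep documents and their embeddings in a list, and rank them by cosine similarity against a query embedding. The names `ToyDocument` and `ToyVectorStore` and the three-dimensional embeddings are made up for illustration; this is not how Haystack implements `InMemoryDocumentStore`.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a stored document: text plus its embedding."""
    content: str
    embedding: list


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ToyVectorStore:
    """Keeps everything in a Python list and scans it linearly on every search."""
    documents: list = field(default_factory=list)

    def write(self, docs):
        self.documents.extend(docs)
        return len(docs)

    def search(self, query_embedding, top_k=2):
        # Rank all stored documents by similarity to the query, best first.
        ranked = sorted(
            self.documents,
            key=lambda doc: cosine_similarity(query_embedding, doc.embedding),
            reverse=True,
        )
        return ranked[:top_k]


# Made-up 3-dimensional embeddings; a real embedder returns hundreds of dimensions.
store = ToyVectorStore()
store.write(
    [
        ToyDocument("Poor Things", [1.0, 0.0, 0.0]),
        ToyDocument("Oppenheimer", [0.0, 1.0, 0.0]),
        ToyDocument("Dune: Part Two", [0.0, 0.0, 1.0]),
    ]
)
best = store.search([0.1, 0.9, 0.2], top_k=1)
print(best[0].content)  # prints "Oppenheimer"
```

A linear scan like this is fine for a handful of documents. Dedicated vector databases add persistence, filtering, and approximate nearest-neighbour indexes on top of the same basic idea.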

In [None]:
documents = [
    Document(
        content="Poor Things is a 2023 film directed by Yorgos Lanthimos and written by Tony McNamara, "
        "based on the 1992 novel by Alasdair Gray."
    ),
    Document(
        content="Oppenheimer is a 2023 epic biographical thriller film written, directed,"
        " and co-produced by Christopher Nolan. It follows the life of J. Robert "
        "Oppenheimer, the American theoretical physicist who helped develop the "
        "first nuclear weapons during World War II."
    ),
    Document(
        content="Dune: Part Two is a 2024 American epic science fiction film directed and produced by Denis "
        "Villeneuve, who co-wrote the screenplay with Jon Spaihts. The sequel to Dune (2021), it "
        "is the second of a two-part adaptation of the 1965 novel Dune by Frank Herbert. "
    ),
]

In [None]:
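from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore()

indexing_pipeline = Pipeline()
# Add the embedder and the document writer components to the pipeline.
# Sentence Transformers is one possible embedder (needs the `sentence-transformers` package).
indexing_pipeline.add_component("doc_embedder", SentenceTransformersDocumentEmbedder())
indexing_pipeline.add_component("doc_writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("doc_embedder.documents", "doc_writer.documents")
indexing_pipeline.run({"doc_embedder": {"documents": documents}})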

Let's check if the documents are there...

In [None]:
document_store.filter_documents()

# RAG Pipeline

Great, now we can build the proper RAG pipeline using our documents.
As in notebook 02, we need a prompt template. However, this time we will use a real templating engine, [Jinja](https://jinja.palletsprojects.com/en/3.1.x/).

We will implement our RAG as a Pipeline. Pipelines are the key abstraction of Haystack (and LangChain, and LlamaIndex).

The pipelines in Haystack 2.0 are directed multigraphs of different Haystack components and integrations. They give the freedom to connect these components in various ways. This means that the pipeline doesn't need to be a continuous stream of information. With the flexibility of Haystack pipelines, you can have simultaneous flows, standalone components, loops, and other types of connections.

Learn more at https://docs.haystack.deepset.ai/docs/pipelines
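To build some intuition for "components connected in a graph", here is a toy sketch in plain Python. Components are ordinary functions that return a dict of named outputs, `connect` wires an output to an input using the same `"component.socket"` notation, and `run` executes components once all their incoming connections are satisfied. `ToyPipeline` is a hypothetical illustration, not Haystack's implementation: it handles only acyclic graphs and does no type checking.

```python
from collections import defaultdict, deque


class ToyPipeline:
    """Hypothetical toy: a directed graph of functions with named inputs and outputs."""

    def __init__(self):
        self.components = {}
        self.edges = []  # (sender, output_name, receiver, input_name)

    def add_component(self, name, func):
        self.components[name] = func

    def connect(self, sender, receiver):
        sender_name, output_name = sender.split(".")
        receiver_name, input_name = receiver.split(".")
        # Components must be added before they can be connected.
        for name in (sender_name, receiver_name):
            if name not in self.components:
                raise ValueError(f"Component '{name}' was not added to the pipeline")
        self.edges.append((sender_name, output_name, receiver_name, input_name))

    def run(self, data):
        # Start from the inputs the caller provides for each component.
        inputs = defaultdict(dict)
        for name, kwargs in data.items():
            inputs[name].update(kwargs)

        # Count incoming connections; a component is ready when all have arrived.
        pending = {name: 0 for name in self.components}
        for _, _, receiver_name, _ in self.edges:
            pending[receiver_name] += 1
        ready = deque(name for name, count in pending.items() if count == 0)

        results = {}
        while ready:
            name = ready.popleft()
            outputs = self.components[name](**inputs[name])
            consumed = set()
            for sender_name, output_name, receiver_name, input_name in self.edges:
                if sender_name == name:
                    inputs[receiver_name][input_name] = outputs[output_name]
                    consumed.add(output_name)
                    pending[receiver_name] -= 1
                    if pending[receiver_name] == 0:
                        ready.append(receiver_name)
            # In this toy, only outputs that no other component consumed are returned.
            leftovers = {k: v for k, v in outputs.items() if k not in consumed}
            if leftovers:
                results[name] = leftovers
        return results


def shout(text):
    return {"text": text.upper()}


def build_prompt(question, context):
    return {"prompt": f"Context: {context}\nQuestion: {question}"}


pipe = ToyPipeline()
pipe.add_component("shout", shout)
pipe.add_component("prompt_builder", build_prompt)
pipe.connect("shout.text", "prompt_builder.context")

result = pipe.run(
    {"shout": {"text": "oppenheimer"}, "prompt_builder": {"question": "Which film?"}}
)
print(result["prompt_builder"]["prompt"])
```

Note how `run` takes a dict keyed by component name, so different components can receive different inputs in the same call. The Haystack pipelines in this notebook are invoked with the same shape of argument.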

In [None]:
template = """
Answer the questions based on the given context.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}
Question: {{ question }}
Answer:
"""

In [None]:
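from haystack.components.builders import PromptBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

rag_pipe = Pipeline()

# One possible set of components; the query embedder must match the model used to embed the documents.
rag_pipe.add_component("embedder", SentenceTransformersTextEmbedder())
rag_pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
rag_pipe.add_component("prompt_builder", PromptBuilder(template=template))
rag_pipe.add_component("llm", OpenAIGenerator())

rag_pipe.connect("embedder.embedding", "retriever.query_embedding")
rag_pipe.connect("retriever", "prompt_builder.documents")
rag_pipe.connect("prompt_builder", "llm")

rag_pipe.show()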

And now we can run it.

In [None]:
from pprint import pprint

query = "What film talks about the atomic bomb?"
response = rag_pipe.run(
    {"embedder": {"text": query}, "prompt_builder": {"question": query}}
)
pprint(response)