## Notebook for a simple RAG pipeline with Haystack
##### prepared by Vladimir Kanchev using [1]


#### 1. Initialize the document store.
The document store holds the Documents that the Q&A system searches to find answers to questions.

In [41]:
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()

#### 2. Fetch the data.
Use the Wikipedia pages on the Seven Wonders of the Ancient World, available preprocessed on Hugging Face as the `bilgeyucel/seven-wonders` dataset.

In [42]:
from datasets import load_dataset
from haystack import Document

dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]

#### 3. Initialize a document embedder.
The documents are stored in the DocumentStore together with their embeddings, so first initialize the embedder that will compute them.

In [43]:
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()


#### 4. Write Documents to DocumentStore
Create embeddings for each document and write them to the DocumentStore.

In [44]:
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

Batches: 100%|██████████| 5/5 [00:05<00:00,  1.05s/it]


151
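The output above reports 151 documents written, embedded in 5 batches. The short check below reproduces that batch count. It assumes a batch size of 32, which is believed to be the default of `SentenceTransformersDocumentEmbedder` but is not set anywhere in this notebook.

```python
import math

num_docs = 151    # value returned by write_documents above
batch_size = 32   # assumption: embedder batch size, not set in this notebook

# The last batch is only partially filled, so round up.
num_batches = math.ceil(num_docs / batch_size)
print(num_batches)
```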

#### 5. Build the RAG pipeline.
Build a pipeline to generate answers for the user query following the RAG approach.

##### 5.1 Initialize a text embedder to create an embedding for the user query.

In [45]:
from haystack.components.embedders import SentenceTransformersTextEmbedder

text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

##### 5.2 Initialize the Retriever
The Retriever fetches the documents most relevant to the query from the DocumentStore.

In [46]:
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

retriever = InMemoryEmbeddingRetriever(document_store)
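To make the idea of embedding retrieval concrete, here is a minimal sketch in plain Python. It is not Haystack's implementation: the helpers `cosine_similarity` and `top_k` are hypothetical, the 3-dimensional vectors are made up for illustration (the real model produces much longer vectors), and the similarity function actually used is determined by the document store.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float],
          docs: List[Tuple[str, List[float]]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k best."""
    scored = [(text, cosine_similarity(query, emb)) for text, emb in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: (document text, made-up embedding)
toy_docs = [
    ("Colossus of Rhodes", [0.9, 0.1, 0.0]),
    ("Great Pyramid of Giza", [0.1, 0.9, 0.0]),
    ("Temple of Artemis", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.2, 0.0]  # stands in for the embedded user question

results = top_k(toy_query, toy_docs, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The retriever in this notebook plays the role of `top_k`: it receives the query embedding and returns the best-matching Documents.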

##### 5.3 Define a Template Prompt
Create a custom prompt with two parameters: *documents*, retrieved from the document store, and *question*, supplied by the user. Initialize a PromptBuilder instance that will generate the complete prompt.

In [47]:
from haystack.components.builders import PromptBuilder

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""

prompt_builder = PromptBuilder(template=template)
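To see roughly what the rendered prompt looks like, the sketch below builds the same text with plain string formatting. `build_prompt` is a hypothetical stand-in for the PromptBuilder, the document text is a placeholder, and the exact whitespace differs from what the Jinja2 template produces.

```python
from typing import List


def build_prompt(documents: List[str], question: str) -> str:
    """Approximate the template above: context documents, then the question."""
    context = "\n".join(f"    {doc}" for doc in documents)
    return (
        "Given the following information, answer the question.\n\n"
        "Context:\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    ["(placeholder) text of a retrieved document about the Colossus of Rhodes"],
    "What does Rhodes Statue look like?",
)
print(prompt)
```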

##### 5.4 Initialize a Generator 
Initialize an *OpenAIGenerator* to communicate with an OpenAI GPT model.

In [48]:
import os
from getpass import getpass

from dotenv import find_dotenv, load_dotenv
from haystack.components.generators import OpenAIGenerator

# Load variables from a .env file, if one is found
load_dotenv(find_dotenv())

# Prompt for the key only when it is not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")

generator = OpenAIGenerator(model="gpt-3.5-turbo")

##### 5.5 Build the pipeline.
Add all components to the pipeline and connect them. 

In [49]:
from haystack import Pipeline

basic_rag_pipeline = Pipeline()
# Add components to your pipeline
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", generator)

# Now, connect the components to each other
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

<haystack.core.pipeline.pipeline.Pipeline object at 0x78331a0655a0>
🚅 Components
  - text_embedder: SentenceTransformersTextEmbedder
  - retriever: InMemoryEmbeddingRetriever
  - prompt_builder: PromptBuilder
  - llm: OpenAIGenerator
🛤️ Connections
  - text_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.prompt (str)

#### 6. Ask a question.


In [50]:
question = "What does Rhodes Statue look like?"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(response["llm"]["replies"][0])

Batches: 100%|██████████| 1/1 [00:00<00:00, 96.14it/s]


The Rhodes Statue was a statue of the Greek sun-god Helios, erected in the city of Rhodes. It stood approximately 70 cubits, or 33 meters (108 feet) tall, and was said to have curly hair with bronze or silver flame radiating from the head, similar to images found on contemporary Rhodian coins.


In [51]:
examples = [
    "Where is Gardens of Babylon?",
    "Why did people build Great Pyramid of Giza?",
    "What does Rhodes Statue look like?",
    "Why did people visit the Temple of Artemis?",
    "What is the importance of Colossus of Rhodes?",
    "What happened to the Tomb of Mausolus?",
    "How did Colossus of Rhodes collapse?",
]
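The example questions can be sent through the pipeline one at a time. The sketch below assumes a hypothetical helper, `answer_all`, that takes any callable with the same input and output shape as `basic_rag_pipeline.run`. A stub (`fake_run`) stands in for the real pipeline so the block runs without an API key.

```python
from typing import Callable, Dict, List, Tuple


def answer_all(run: Callable[[Dict], Dict],
               questions: List[str]) -> List[Tuple[str, str]]:
    """Run each question through the pipeline and collect the first reply."""
    results = []
    for q in questions:
        response = run({
            "text_embedder": {"text": q},
            "prompt_builder": {"question": q},
        })
        results.append((q, response["llm"]["replies"][0]))
    return results


def fake_run(inputs: Dict) -> Dict:
    """Stub with the same output shape as the pipeline's run method."""
    question = inputs["prompt_builder"]["question"]
    return {"llm": {"replies": [f"(stub answer to: {question})"]}}


demo = answer_all(fake_run, [
    "Where is Gardens of Babylon?",
    "How did Colossus of Rhodes collapse?",
])
for q, a in demo:
    print(q, "->", a)
```

In the notebook itself, the call would be `answer_all(basic_rag_pipeline.run, examples)`; each question then triggers one retrieval and one OpenAI request.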