# Retrieval-Augmented Text Generation

The moment that we've all been waiting for has finally arrived! The Retrieval-Augmented Text Generation (RAG) Framework is here! 🎉

Throughout this notebook we will be exploring RAG, what it is, how it works, and why it's so exciting.

## Why RAG?

Although LLMs are trained on large datasets, stale or incomplete training data can severely limit them. They face several challenges:

1. The models are trained on internet content, so they might not generate relevant output when prompted for information that is not publicly available on the internet.

2. The models are trained up to a certain date, so they might not generate relevant output when prompted about events or information that appeared after the model's training cutoff date.

3. The models are trained to be general-purpose. This means that they tend to produce generic outputs and might not perform as expected when prompted for specific deep-dive concepts related to a particular topic.

One way to dynamically integrate relevant external information is retrieval-augmented generation (RAG), which can help improve the reliability of LLM outputs.

Going back to our original question from [section 1](1-domain-specific-question-answering.ipynb) of this module: how can this be utilized in our own work or organization? The RAG framework is especially useful when you have a set of documents, GitHub repositories, research papers, or domain-specific knowledge bases that you want to search through quickly.

## RAG Framework

RAG addresses these challenges by supplementing the prompt sent to the LLM with information from external sources, fetched by a retrieval model via vector embeddings (more on this later), thereby providing the LLM with more relevant input for generating responses. It allows you to use pre-trained LLMs without fine-tuning them or training your own LLM on your data.

![RAG Workflow](../../images/rag-workflow.webp)


Image Source: [Medium Blog](https://medium.com/@henryhengluo/intro-of-retrieval-augmented-generation-rag-and-application-demos-c1d9239ababf)

Three concepts make up the RAG pipeline:

1. Retrieval
2. Augmentation
3. Generation

## Retrieval

The retrieval phase can also be considered the data and query/prompt preparation phase, focusing on efficient information retrieval or data access. It contains tasks that can improve your RAG pipeline, such as (1) indexing, (2) query manipulation, (3) data modification, (4) search, and (5) ranking. In this tutorial, we primarily focus on indexing and search.

`Indexing` enables fast and accurate information retrieval that sets up the context for any LLM to improve its response to a given user prompt or query. 

We will be indexing abstracts for all astrophysics papers and the documentation for Astropy, a common core package for astronomy in Python.

### Embeddings

Embeddings, also called "vector embeddings," help LLMs develop a semantic understanding of the textual data they are trained on. In simpler terms, embedding models lay the groundwork for LLMs to perform tasks like sentence completion, similarity search, question answering, etc.

#### Embedding vs Fine-tuning

|| Embedding | Fine-tuning|
|---|---|---|
|**Definition**| Use pre-trained LLM as feature extractor | Update parameters of pre-trained LLM during task-specific training|
|**Process**| Input Encoding > Tokenization > Embedding Extraction > Downstream Task | Initialization > Task-specific Training > Fine-tuning Layers (optional)|
|**Advantages**| Efficient use of pre-trained knowledge, Faster inference | Adaptability to task-specific nuances, May require less labeled data than training from scratch|
|**Considerations**| N/A |Risk of overfitting, Computational cost can be high|
|**When to use**| Limited computational resources, Limited labeled data | Significant computational resources, Large corpus of labeled data|
|**Performance**| Performs well, especially with limited data | Can achieve state-of-the-art results on a wide range of tasks|

### In a nutshell

- Embedding models are typically small in size and less computationally intensive.

- Regular updates of embedding vectors are faster, cheaper, and simpler compared to fine-tuning a model.

#### Vector

At the lowest level, machines only understand numeric values. For LLMs to work, natural language is converted into arrays of numeric values before it is fed into the models. These arrays of numeric values are called "vectors."

An example of a vector: [2.5, 1.7, 3.3, 7.8]

The above is an example of a vector of size 4. 

In [6]:
import numpy as np

vector = np.array([2.5, 1.7, 3.3, 7.8])
print(f"Vector: {vector}")

Vector: [2.5 1.7 3.3 7.8]


#### Tokens

We stated above that **"texts are converted into an array of numeric values called vectors"**.

But depending on your use case, each word, sentence, paragraph, or entire document can be represented as a vector. 

A token is the smallest unit of natural language that gets converted into a vector. It could be at the character level, sub-word level, word level, sentence level, paragraph level, or document level.

Example: Consider the text below.

`Earth is a planet of the solar system. There are 9 planets in the solar system. 
All planets revolve around the sun. Sun is a star.`


Case 1.) **Tokenizing the entire paragraph into vector.**  
Tokenization: The entire paragraph is a single token.   
Vectorization: A single vector.  
Sample Vector Representation: [3.1, 6.8, 5.4, 8.0, 7.1]

Case 2.) **Tokenizing each sentence into vectors.**  
Tokenization: One token for each sentence (total 4 tokens)  
Vectorization: One vector for each sentence (total 4 vectors).   
Sample Vector Representation: [[1.2, 2.3, 3.8, 7.9, 0.8], [2.5, 3.0, 8.2, 6.6, 4.1], [3.2, 6.5, 8.1, 9.3, 1.4], [1.1, 0.7, 7.2, 3.5, 8.5]]

Case 3.) **Tokenizing each word in the paragraph into a vector.** There are 26 words in the paragraph, ignoring punctuation, and each word gets converted into a vector.  
Tokenization: One token for each word in the paragraph (26 tokens)  
Vectorization: One vector for each token (total 26 vectors).    
Sample Vector Representation: [[2.1, 3.2, 4.1, 9.8, 7.0], [8.2, 4.2, 7.1, 3.8, 2.0].....total 26 such representations]
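
The three cases can be reproduced with a few lines of plain Python. This is only a rough sketch of the splitting step: the regular expressions are simplifying assumptions rather than how a production tokenizer works, and no vectors are computed here.

```python
import re

# Sample paragraph from the example above, copied verbatim.
paragraph = (
    "Earth is a planet of the solar system. "
    "There are 9 planets in the solar system. "
    "All planets revolve around the sun. Sun is a star."
)

# Case 1: the whole paragraph is a single token.
paragraph_tokens = [paragraph]

# Case 2: one token per sentence (naive split on whitespace after a period).
sentence_tokens = re.split(r"(?<=\.)\s+", paragraph)

# Case 3: one token per word, ignoring punctuation.
word_tokens = re.findall(r"\w+", paragraph)

print(len(paragraph_tokens), len(sentence_tokens), len(word_tokens))
```

The printed counts correspond to the 1, 4, and 26 tokens described in the three cases.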


#### Tokenizers

Tokenizers are components responsible for converting large texts into tokens (tokenization). Different types of pre-trained tokenizers are available. You can even train your own tokenizers. But for the scope of this tutorial, we will use a pre-trained one. 

Generally, each tokenizer performs the following steps:

1. Break down the original text into tokens. These tokens could again be at the character, sub-word, word, sentence, paragraph, or document levels.
2. Assign a unique identifier to each of the tokens created.
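
As a minimal sketch of these two steps, the snippet below splits a sentence on whitespace and assigns each distinct token an integer ID. The `build_vocab` helper and the whitespace splitting are assumptions made for illustration; pre-trained tokenizers typically use learned sub-word vocabularies instead.

```python
def build_vocab(tokens):
    """Assign each distinct token the next unused integer ID."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


text = "the sun is a star and the earth is a planet"

# Step 1: break the text into word-level tokens.
tokens = text.split()

# Step 2: assign a unique identifier to each distinct token.
vocab = build_vocab(tokens)
token_ids = [vocab[token] for token in tokens]

print(vocab)
print(token_ids)
```

Repeated tokens such as `the` and `is` map to the same ID, which is what lets a model treat every occurrence of a token consistently.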

In [7]:
# For example, here is how you can split a short sentence into chunks of text
from langchain_text_splitters import CharacterTextSplitter

In [8]:
text_splitter = CharacterTextSplitter(
    separator=" ",
    chunk_size=10,
    chunk_overlap=0,
)
text_splitter.split_text(text="Earth is a planet in the solar system.")

['Earth is a', 'planet in', 'the solar', 'system.']

[Learn more about how to split text into tokens in LangChain here.](https://python.langchain.com/v0.2/docs/how_to/split_by_token/) 

#### Embedding Models

A language model needs to understand how tokens are related to each other in the context of human language. To understand this semantic relationship, these tokens are converted into numerical vectors.

Embedding Models are trained upon these tokens to develop an "embedding space."

- Before the training, the embedding model initializes an N-dimensional 'vector' corresponding to each 'token' with random values. (Value of N depends on the embedding model)
  
- During the embedding model training, the values for these vectors are updated across iterations. In this process, similar or related tokens are updated to have similarly valued vectors.
  
- After the training, the collection of all the 'vectors' corresponding to all the tokens is called the "embedding space."

- "Embedding Space" is an encoded representation of meanings of tokens and inter-token relationships.

See [Word Embeddings Resource](https://www.nlplanet.org/course-practical-nlp/01-intro-to-nlp/11-text-as-vectors-embeddings/) for more conceptual details on embeddings.
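
To make the training bullets above a little more concrete, here is a toy sketch using NumPy. The three-word vocabulary, the 4-dimensional vectors, and the hand-written "nudge" update are all assumptions for illustration; real embedding models update their vectors by gradient descent on a training objective, not with this rule.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

vocab = {"earth": 0, "mars": 1, "hello": 2}  # hypothetical toy vocabulary
n_dims = 4  # N; real models use hundreds of dimensions

# Before training: one random N-dimensional vector per token.
embedding_table = rng.normal(size=(len(vocab), n_dims))


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


before = cosine(embedding_table[vocab["earth"]], embedding_table[vocab["mars"]])

# Toy "training" step: move two related tokens toward each other.
step = 0.25
earth = embedding_table[vocab["earth"]].copy()
mars = embedding_table[vocab["mars"]].copy()
embedding_table[vocab["earth"]] = earth + step * (mars - earth)
embedding_table[vocab["mars"]] = mars + step * (earth - mars)

after = cosine(embedding_table[vocab["earth"]], embedding_table[vocab["mars"]])
print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

After the update, the two related tokens have a higher cosine similarity than before, which is the sense in which related tokens end up with "similarly valued vectors."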

To understand this further, let's take a look at how it all works using a pre-trained embedding model.

For simplicity, this tutorial uses the LangChain Hugging Face integration, which is available in the [`langchain-huggingface`](https://pypi.org/project/langchain-huggingface/) package.
To use an embedding model available on Hugging Face,
we will simply use the `HuggingFaceEmbeddings` class.

In [9]:
from langchain_huggingface import HuggingFaceEmbeddings

We are using the [all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) sentence-transformers embedding model for this tutorial.
After [some evaluation](https://github.com/uw-ssec/tutorials/issues/6) that we did, we found that this model works well for our use case as it is lightweight and provides good performance.

This model "maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search".

However, you can use any other embedding model available in Hugging Face, and we recommend going to [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) to find embedding models and see how they compare to each other.

In [10]:
# Setup the embedding, we are using the MiniLM model here
embeddings_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L12-v2"
)


In [11]:
query_result = embeddings_model.embed_query("Earth is a planet in the solar system.")

In [12]:
# Dimension of vector
len(query_result)

384

In [13]:
query_result[-3:]

[0.05409970134496689, 0.07589352875947952, -0.04195248335599899]

In an embedding space, you can measure how similar two vectors are using the `dot product` or `cosine similarity`.
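
Both measures are short formulas. The sketch below writes them out with NumPy on small hand-picked vectors (the vectors and function names are chosen here for illustration). Cosine similarity is the dot product divided by the product of the two vector lengths, so it depends only on direction, not magnitude.

```python
import numpy as np


def dot_product(a, b):
    """Sum of element-wise products; grows with vector length."""
    return float(np.dot(a, b))


def cosine_similarity(a, b):
    """Dot product normalized by both vector lengths; ranges from -1 to 1."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length
c = [3.0, 0.0, -1.0]  # perpendicular to a

print("dot(a, b):", dot_product(a, b))
print("cos(a, b):", cosine_similarity(a, b))
print("cos(a, c):", cosine_similarity(a, c))
```

Note that `scipy.spatial.distance.cosine` returns the cosine *distance*, which is one minus this similarity.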

In [14]:
from scipy import spatial

In [15]:
print(
    "Similarity:",
    1
    - spatial.distance.cosine(
        query_result,
        embeddings_model.embed_query("Mars is a planet in the solar system."),
    ),
)

Similarity: 0.7926038246541188


In [16]:
print(
    "Similarity:",
    1
    - spatial.distance.cosine(
        query_result, embeddings_model.embed_query("Hello Tacoma.")
    ),
)

Similarity: 0.08770959442668558


What we have demonstrated above in finding similarity between vectors is essentially what's happening in the retrieval phase of the RAG pipeline within a *Vector Database*.
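
As a rough mental model, retrieval is a nearest-neighbor search: score every stored vector against the query vector and keep the best `k`. The brute-force sketch below uses made-up 2-dimensional vectors and a hypothetical `top_k_cosine` helper. Real vector databases generally avoid scanning every vector by using approximate nearest-neighbor indexes, so treat this as an illustration of the idea only.

```python
import numpy as np


def top_k_cosine(query, doc_vectors, k=2):
    """Return (index, score) pairs for the k stored vectors most similar to the query."""
    doc_vectors = np.asarray(doc_vectors, dtype=float)
    query = np.asarray(query, dtype=float)
    # Normalize so that a plain dot product equals cosine similarity.
    doc_unit = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = doc_unit @ query_unit
    best = np.argsort(scores)[::-1][:k]  # highest scores first
    return [(int(i), float(scores[i])) for i in best]


# Made-up "document" vectors; index 2 points in a very different direction.
stored = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
hits = top_k_cosine([1.0, 0.0], stored, k=2)
print(hits)
```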

#### Vector Stores

Once the embeddings are created for our relevant documents or knowledge base, we need to store these embeddings in the database for fast retrieval. 

Databases that store these vector embeddings are called "vector stores." We will use a vector store called "Qdrant," as shown below.

In the code below,
- Vector store works along with the embedding model to create vector embeddings.
- Vector embeddings are stored in the Qdrant Vector database collection.

We have already created a vector database that contains the astrophysics paper abstracts and Astropy's documentation. For details, please refer to the [notebook](../appendix/astrophysics-dataset-creation.ipynb) in the Appendix.

The `ssec_tutorials` utility package contains a `download_qdrant_data` function that downloads the existing Qdrant database that we've created for this tutorial.
Additionally, there's a `QDRANT_COLLECTION_NAME` constant
that contains the name of the collection in the Qdrant database.

In [17]:
from ssec_tutorials import download_qdrant_data, QDRANT_COLLECTION_NAME

In [18]:
QDRANT_PATH = download_qdrant_data()

Qdrant data already exists at /Users/lsetiawan/.cache/ssec_tutorials/scipy_qdrant


In [19]:
QDRANT_PATH

PosixPath('/Users/lsetiawan/.cache/ssec_tutorials/scipy_qdrant')

In [20]:
QDRANT_COLLECTION_NAME

'arxiv_astro-ph_abstracts_astropy_github_documentation'

With the Qdrant path, the collection name, and the embeddings model in hand, we can now use the LangChain Qdrant integration package, `langchain-qdrant`, to interact with the Qdrant database through the `Qdrant` class.

In [21]:
from langchain_qdrant import Qdrant

In [22]:
qdrant = Qdrant.from_existing_collection(
    embedding=embeddings_model,
    collection_name=QDRANT_COLLECTION_NAME,
    path=QDRANT_PATH,
)

### Search

Now that we have the Qdrant database instance, we are ready to search for the relevant documents based on the user query.
However, before we can search, we need a [`VectorStoreRetriever`](https://python.langchain.com/v0.2/docs/how_to/vectorstore_retriever/) object.

To get the `VectorStoreRetriever` object, we can simply call the `.as_retriever()` method on the Qdrant object.

In this example, we will be setting the `search_type` to `"mmr"` and `search_kwargs` to `{"k": 2}`.

"mmr" stands for Maximal Marginal Relevance.

MMR selects documents based on a combination of how similar they are to the query and how different they are from each other. It does this by finding the documents whose embeddings have the greatest cosine similarity with the query, and then iteratively adding them while penalizing closeness to already selected documents.

The `k` parameter in `search_kwargs` specifies the number of documents to retrieve.
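
The selection loop behind MMR can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the code LangChain or Qdrant actually runs: the `mmr` function, the toy 2-dimensional vectors, and the `lambda_mult` weighting (1.0 means pure relevance, lower values favor diversity) are written here for the example.

```python
import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarities between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def mmr(query, docs, k=2, lambda_mult=0.5):
    """Pick k document indices, trading off relevance against redundancy."""
    query = np.asarray(query, dtype=float).reshape(1, -1)
    docs = np.asarray(docs, dtype=float)
    relevance = cosine_matrix(docs, query).ravel()
    pairwise = cosine_matrix(docs, docs)
    selected = [int(np.argmax(relevance))]  # start with the most relevant document
    while len(selected) < min(k, len(docs)):
        best_idx, best_score = -1, -np.inf
        for i in range(len(docs)):
            if i in selected:
                continue
            # Penalize similarity to the closest already-selected document.
            redundancy = max(pairwise[i, j] for j in selected)
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is less relevant but different.
toy_docs = [[1.0, 0.05], [1.0, 0.06], [0.6, -0.8]]
diverse = mmr([1.0, 0.0], toy_docs, k=2, lambda_mult=0.5)
relevance_only = mmr([1.0, 0.0], toy_docs, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of the more distinct document, while `lambda_mult=1.0` falls back to plain similarity ranking.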


In [23]:
# Set up the retriever for a later step
retriever = qdrant.as_retriever(search_type="mmr", search_kwargs={"k": 2})

Let's invoke this retriever object with one of the questions from the previous section and see what we get.

In [24]:
documents = retriever.invoke("What is dark matter?")

We got the relevant documents from the Qdrant database for the given question. Let's see what these documents look like.

In [25]:
document = documents[0]

In [26]:
type(document)

langchain_core.documents.base.Document

In [27]:
dict(document)

{'page_content': '  One of the great scientific enigmas still unsolved, the existence of dark\nmatter, is reviewed. Simple gravitational arguments imply that most of the mass\nin the Universe, at least 90%, is some (unknown) non-luminous matter. Some\nparticle candidates for dark matter are discussed with particular emphasis on\nthe neutralino, a particle predicted by the supersymmetric extension of the\nStandard Model of particle physics. Experiments searching for these relic\nparticles, carried out by many groups around the world, are also discussed.\nThese experiments are becoming more sensitive every year and in fact one of the\ncollaborations claims that the first direct evidence for dark matter has\nalready been observed.\n',
 'metadata': {'id': 'hep-ph/0110122',
  'title': 'The Enigma of the Dark Matter',
  '_id': '4ab99f7c922747d9a6a34b855d959779',
  '_collection_name': 'arxiv_astro-ph_abstracts_astropy_github_documentation'},
 'type': 'Document'}

We see that this is a core LangChain Document object that contains the document's metadata and content.

Later we will see how to use this document to generate the response. For now, let's create a utility formatting function that retrieves just the content of the documents, so that we can insert it into our prompt template. This step is also known as "Augmentation".

In [28]:
# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [29]:
print(format_docs(documents))

  One of the great scientific enigmas still unsolved, the existence of dark
matter, is reviewed. Simple gravitational arguments imply that most of the mass
in the Universe, at least 90%, is some (unknown) non-luminous matter. Some
particle candidates for dark matter are discussed with particular emphasis on
the neutralino, a particle predicted by the supersymmetric extension of the
Standard Model of particle physics. Experiments searching for these relic
particles, carried out by many groups around the world, are also discussed.
These experiments are becoming more sensitive every year and in fact one of the
collaborations claims that the first direct evidence for dark matter has
already been observed.


  Dark matter could be composed of compact dark objects (CDOs). These objects
may interact very weakly with normal matter and could move freely {\it inside}
the Earth. A CDO moving in the inner core of the Earth will have an orbital
period near 55 min and produce a time dependent signal

## Augmentation & Generation

Now that we can retrieve the most relevant document based on a question, we can use the retrieved document and send it along with the prompt to increase the context for the LLM.

This can also be referred to as the `retrieval-augmented prompt`.

In [30]:
from langchain_community.llms import LlamaCpp
from langchain_core.prompts import PromptTemplate
from ssec_tutorials import download_olmo_model

In [31]:
OLMO_MODEL = download_olmo_model()

Model already exists at /Users/lsetiawan/.cache/ssec_tutorials/OLMo-7B-Instruct-Q4_K_M.gguf


In [32]:
olmo = LlamaCpp(
    model_path=str(OLMO_MODEL),
    temperature=0.8,
    verbose=False,
    n_ctx=2048,
    max_tokens=512,
)

In [33]:
# Create a prompt template using OLMo's tokenizer chat template we saw in module 1.
prompt_template = PromptTemplate.from_template(
    template=olmo.client.metadata["tokenizer.chat_template"],
    template_format="jinja2",
    partial_variables={"add_generation_prompt": True, "eos_token": "<|endoftext|>"},
)

In [34]:
# Test the prompt you want to send to OLMo.
question = "What is dark matter?"
context = format_docs(retriever.invoke(question))

final_prompt_content = prompt_template.format(
    messages=[
        {
            "role": "user",
            "content": f"""\
                You are an astrophysics expert. Please answer the question on astrophysics based on the following context:

                Context: {context}

                Question: {question}
            """,
        }
    ]
)

In [35]:
print(final_prompt_content)

<|endoftext|>

<|user|>
                You are an astrophysics expert. Please answer the question on astrophysics based on the following context:

                Context:   One of the great scientific enigmas still unsolved, the existence of dark
matter, is reviewed. Simple gravitational arguments imply that most of the mass
in the Universe, at least 90%, is some (unknown) non-luminous matter. Some
particle candidates for dark matter are discussed with particular emphasis on
the neutralino, a particle predicted by the supersymmetric extension of the
Standard Model of particle physics. Experiments searching for these relic
particles, carried out by many groups around the world, are also discussed.
These experiments are becoming more sensitive every year and in fact one of the
collaborations claims that the first direct evidence for dark matter has
already been observed.


  Dark matter could be composed of compact dark objects (CDOs). These objects
may interact very weakly with normal

You can see above that we now have a `context` input within the prompt.
This context is the content of the document(s) that we retrieved from the Qdrant database.
With this context, the LLM can generate more relevant responses.
So let's see how it does!

In [36]:
from langchain_core.callbacks import StreamingStdOutCallbackHandler

### OLMo with context

In [37]:
olmo.invoke(
    final_prompt_content, config={"callbacks": [StreamingStdOutCallbackHandler()]}
)

 Dark matter is a theoretical component of the Universe that has yet to be directly observed but its presence can be inferred from the gravitational effects it exerts on visible matter such as stars, gas, and galaxies. According to current astrophysical data, approximately 90% of the content in the universe is dark matter, while only 5% is made up of visible, or baryonic, matter (stars, gas, and dust). Dark matter particles are yet to be directly observed; their existence is inferred from their gravitational effects on visible objects.

One candidate for dark matter is the neutralino, a particle predicted by the supersymmetric extension of the Standard Model of particle physics. Neutralinos have several properties that make them an appealing choice as dark matter candidates. First, they are stable and can persist in the universe until today. Second, they are non-interacting with ordinary matter particles, allowing them to move freely within planets like Earth without being affected by 

" Dark matter is a theoretical component of the Universe that has yet to be directly observed but its presence can be inferred from the gravitational effects it exerts on visible matter such as stars, gas, and galaxies. According to current astrophysical data, approximately 90% of the content in the universe is dark matter, while only 5% is made up of visible, or baryonic, matter (stars, gas, and dust). Dark matter particles are yet to be directly observed; their existence is inferred from their gravitational effects on visible objects.\n\nOne candidate for dark matter is the neutralino, a particle predicted by the supersymmetric extension of the Standard Model of particle physics. Neutralinos have several properties that make them an appealing choice as dark matter candidates. First, they are stable and can persist in the universe until today. Second, they are non-interacting with ordinary matter particles, allowing them to move freely within planets like Earth without being affected 

### OLMo without context

In [38]:
olmo.invoke(question, config={"callbacks": [StreamingStdOutCallbackHandler()]})

 What is dark energy?
These are two of the most interesting, and perhaps most important, questions in modern physics. In this episode of SciFri we explore what these mysterious substances might be, and why they have such a big impact on our understanding of the universe.
Dark Matter is a mysterious substance that seems to make up about 85% of the matter in the Universe. It doesn't interact with light or energy, but it does interact with gravity. This means that it affects how galaxies spin, and it also leaves its mark on the way that stars move through the Milky Way. Scientists believe that Dark Matter is a substance made from something called WIMPs - Weakly Interacting Massive Particles.
Dark Energy, on the other hand, is a mysterious force in the Universe that seems to be pushing the cosmos apart at an ever-increasing rate. It makes up about 68% of the energy in the universe. Scientists think that Dark Energy might be caused by something called "chocolate" - a substance that acts lik

' What is dark energy?\nThese are two of the most interesting, and perhaps most important, questions in modern physics. In this episode of SciFri we explore what these mysterious substances might be, and why they have such a big impact on our understanding of the universe.\nDark Matter is a mysterious substance that seems to make up about 85% of the matter in the Universe. It doesn\'t interact with light or energy, but it does interact with gravity. This means that it affects how galaxies spin, and it also leaves its mark on the way that stars move through the Milky Way. Scientists believe that Dark Matter is a substance made from something called WIMPs - Weakly Interacting Massive Particles.\nDark Energy, on the other hand, is a mysterious force in the Universe that seems to be pushing the cosmos apart at an ever-increasing rate. It makes up about 68% of the energy in the universe. Scientists think that Dark Energy might be caused by something called "chocolate" - a substance that act

From the responses above, we can see that the response with context is more relevant and informative than the response without context. This shows the power of the RAG framework, even with just a few documents.

One way to generate the response with OLMo is to build `context` using the `question` beforehand, as shown above, then create an `llm_chain` and `invoke` it with `messages`.

However, we can further use [LangChain's convenience functions](https://python.langchain.com/v0.2/docs/tutorials/rag/#built-in-chains) to streamline our pipeline using [create_stuff_documents_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html) and [create_retrieval_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html) from the main `langchain` package.

The main `langchain` package contains chains, agents, and retrieval strategies that make up an application's cognitive architecture.

`create_stuff_documents_chain` specifies how retrieved context is fed into a prompt and LLM. 

Looking at its signature, notice that it accepts a `prompt` argument of type `BasePromptTemplate`, and that the prompt must use `context` and `input` as its input keys.
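
Conceptually, "stuffing" means joining the content of all retrieved documents into one string and substituting it for the `context` variable of the prompt. The plain-Python sketch below illustrates that idea only. The `Doc` class and `stuff_documents` function are hypothetical stand-ins, not LangChain's implementation, and the LLM call is left out.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""

    page_content: str


def stuff_documents(docs, template, question, separator="\n\n"):
    """Join all document contents and fill the `context` and `input` slots."""
    context = separator.join(doc.page_content for doc in docs)
    return template.format(context=context, input=question)


template = "Context: {context}\nQuestion: {input}\n"
docs = [Doc("Dark matter is non-luminous."), Doc("Neutralinos are one candidate.")]

stuffed_prompt = stuff_documents(docs, template, "What is dark matter?")
print(stuffed_prompt)
```

Because every document is placed in the prompt as-is, this approach only works while the combined text fits within the model's context window.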

In [39]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

To use the helper functions,
we'll need to set up our template string to use the `context` and `input` keys as variables.

In [40]:
# Create a new prompt_template
# so that it accepts `context` and `input` as input_variables
input_string_template = """\
You are an astrophysics expert. Please answer the question on astrophysics based on the following context.
Context: {context}
Question: {input}
"""
transformed_prompt_template = PromptTemplate.from_template(
    prompt_template.partial(
        messages=[{"role": "user", "content": input_string_template}]
    ).format()
)
transformed_prompt_template

PromptTemplate(input_variables=['context', 'input'], template='<|endoftext|>\n\n<|user|>\nYou are an astrophysics expert. Please answer the question on astrophysics based on the following context.\nContext: {context}\nQuestion: {input}\n\n\n\n<|assistant|>\n\n')

In [41]:
document_chain = create_stuff_documents_chain(
    llm=olmo, prompt=transformed_prompt_template
)

We can run this by passing in the context directly:

In [42]:
question = "What is dark matter?"
document_chain.invoke(
    {
        "input": question,
        "context": retriever.invoke(question),
    },
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)

Dark matter is a theoretical entity that is still not directly observed or detected in our universe. Based on current scientific understanding, it makes up approximately 90% of the matter content in the observable Universe (1). The term "dark" refers to its invisible nature; it does not emit, reflect, nor absorb light and cannot be seen with conventional telescopes.

One possible explanation for dark matter is that it comprises weakly interacting massive particles (WIMPs), which are hypothetical particles predicted by theoretical physics. Although they have yet to be discovered, WIMPs could account for the missing mass in our Universe. Some particle candidates for dark matter include the neutralino mentioned in the context, which is a stable superluminous particle that can naturally acquire a tiny electric charge without violating energy conservation laws (2).

The existence of dark matter is supported by several astronomical and cosmological observations. These include the rotation cu

'Dark matter is a theoretical entity that is still not directly observed or detected in our universe. Based on current scientific understanding, it makes up approximately 90% of the matter content in the observable Universe (1). The term "dark" refers to its invisible nature; it does not emit, reflect, nor absorb light and cannot be seen with conventional telescopes.\n\nOne possible explanation for dark matter is that it comprises weakly interacting massive particles (WIMPs), which are hypothetical particles predicted by theoretical physics. Although they have yet to be discovered, WIMPs could account for the missing mass in our Universe. Some particle candidates for dark matter include the neutralino mentioned in the context, which is a stable superluminous particle that can naturally acquire a tiny electric charge without violating energy conservation laws (2).\n\nThe existence of dark matter is supported by several astronomical and cosmological observations. These include the rotati

However, we want the context to be dynamically generated using the passed input or question.

From LangChain's documentation: `create_retrieval_chain` adds the retrieval step and propagates the retrieved context through the chain, providing it alongside the final answer. It has input key `input`, and includes input, context, and answer in its output.
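
The data flow described in that quote can be sketched in plain Python. This is an assumption-laden illustration rather than LangChain's code: `make_retrieval_chain`, `fake_retrieve`, and `fake_answer` are hypothetical stand-ins for the chain, the retriever, and the document chain.

```python
def make_retrieval_chain(retrieve, answer):
    """Compose a retrieval step and an answering step into one callable."""

    def invoke(inputs):
        question = inputs["input"]
        docs = retrieve(question)  # retrieval step
        result = answer({"input": question, "context": docs})  # generation step
        # Propagate the question and retrieved context alongside the answer.
        return {"input": question, "context": docs, "answer": result}

    return invoke


def fake_retrieve(question):
    """Stand-in retriever that always returns the same two snippets."""
    return ["snippet about dark matter", "snippet about neutralinos"]


def fake_answer(inputs):
    """Stand-in for the document chain; reports how much context it received."""
    return f"Answered '{inputs['input']}' using {len(inputs['context'])} documents."


chain = make_retrieval_chain(fake_retrieve, fake_answer)
toy_response = chain({"input": "What is dark matter?"})
print(toy_response)
```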

In [43]:
retrieval_chain = create_retrieval_chain(retriever, document_chain)

In [44]:
response = retrieval_chain.invoke(
    {"input": "What is dark matter?"},
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)

Dark matter is a theoretical entity that is still not directly observed or detected in our universe, despite its significant presence based on gravitational arguments and astronomical observations. According to astrophysics, approximately 90% of the visible mass in the Universe is believed to be non-luminous dark matter, which does not emit, reflect, or absorb light. Dark matter particles are yet to be discovered, but they have been predicted by particle physics based on supersymmetry (SUSY). SUSY predicts that dark matter can take various forms such as the neutralino, a particle that is an inert supersymmetric partner of the known photon and lepton particles.

The term "dark" refers to its inability to be detected or observed using electromagnetic radiation, making it difficult to study directly. Dark matter could exist in various forms like compact dark objects (CDOs) - hypothetical particles with no interaction with normal matter, moving freely inside Earth, and producing a time-dep

In [45]:
response

{'input': 'What is dark matter?',
 'context': [Document(page_content='  One of the great scientific enigmas still unsolved, the existence of dark\nmatter, is reviewed. Simple gravitational arguments imply that most of the mass\nin the Universe, at least 90%, is some (unknown) non-luminous matter. Some\nparticle candidates for dark matter are discussed with particular emphasis on\nthe neutralino, a particle predicted by the supersymmetric extension of the\nStandard Model of particle physics. Experiments searching for these relic\nparticles, carried out by many groups around the world, are also discussed.\nThese experiments are becoming more sensitive every year and in fact one of the\ncollaborations claims that the first direct evidence for dark matter has\nalready been observed.\n', metadata={'id': 'hep-ph/0110122', 'title': 'The Enigma of the Dark Matter', '_id': '4ab99f7c922747d9a6a34b855d959779', '_collection_name': 'arxiv_astro-ph_abstracts_astropy_github_documentation'}),
  Docume

One of the nice things about the LangChain helper function is that the result is a dictionary containing the `input`, `context`, and `answer` keys, so you can easily see what you asked and the context that was used to generate the answer.

This way of creating the RAG pipeline is quick, but not as customizable. If you need more control over the input variables, you'll need to create your own chain.

In the next module, we'll explore how to do this to create a simple Panel application that uses the RAG pipeline to generate responses to user questions.

For now, let's clean up the Qdrant client by closing it before the next module; otherwise we'll run into errors!

In [46]:
qdrant.client.close()