# Domain Adapted Language Model (DALM) - Arcee.ai - An End-to-End RAG Solution

In the following notebook, we'll work through an "end-to-end" RAG approach created by Arcee.ai called ["Domain Adapted Language Model"](https://github.com/arcee-ai/DALM)!

- 🤝 Breakout Room #1
  1. Task 1: Cloning DALM Repository and Installing Dependencies
  2. Task 2: Preparing Dataset for Training
  3. Task 3: Training E2E RAG
  4. Task 4: Implementing an LCEL RAG Chain with our Models

## Task 1: Cloning DALM Repository and Installing Dependencies




In [1]:
# !git clone https://github.com/arcee-ai/DALM

In [2]:
%cd DALM
%pip install --upgrade -e .

In [3]:
%pip install -U langchain langchain-core langchain-community sentence_transformers

Note: you may need to restart the kernel to use updated packages.


In [4]:
%pip install -U pymupdf faiss-cpu

Collecting pymupdf
  Using cached PyMuPDF-1.24.3-cp312-none-win_amd64.whl.metadata (3.4 kB)
Collecting PyMuPDFb==1.24.3 (from pymupdf)
  Using cached PyMuPDFb-1.24.3-py3-none-win_amd64.whl.metadata (1.4 kB)
Using cached PyMuPDF-1.24.3-cp312-none-win_amd64.whl (3.2 MB)
Using cached PyMuPDFb-1.24.3-py3-none-win_amd64.whl (12.4 MB)
Installing collected packages: PyMuPDFb, pymupdf
  Attempting uninstall: PyMuPDFb
    Found existing installation: PyMuPDFb 1.24.1
    Uninstalling PyMuPDFb-1.24.1:
      Successfully uninstalled PyMuPDFb-1.24.1
  Attempting uninstall: pymupdf
    Found existing installation: PyMuPDF 1.24.2
    Uninstalling PyMuPDF-1.24.2:
      Successfully uninstalled PyMuPDF-1.24.2
Successfully installed PyMuPDFb-1.24.3 pymupdf-1.24.3
Note: you may need to restart the kernel to use updated packages.


## Task 2: Prepare Dataset of Examples

E2E RAG requires a dataset of `[Question, Abstract, Answer]` triples.

At inference time, our model will take a user's query, retrieve the most relevant of the available passages, and pass that context to the generator to create an answer.

We'll use synthetic dataset generation powered by OpenAI's `gpt-3.5-turbo` to generate our questions and answers for each piece of context through `llama-index`.

We'll be working with Douglas Adams' Hitchhiker's Guide - but feel free to substitute your own data!


## Generating Synthetic Training Data with Llama Index

Let's generate some synthetic data using Llama Index - we'll do this with `gpt-3.5-turbo` and then use the resultant data to fine-tune!

Let's install our dependencies for this process!

In [5]:
%pip install -U llama-index pypdf

Collecting llama-index
  Downloading llama_index-0.10.36-py3-none-any.whl.metadata (11 kB)
Collecting llama-index-core<0.11.0,>=0.10.35 (from llama-index)
  Downloading llama_index_core-0.10.36-py3-none-any.whl.metadata (3.7 kB)
Downloading llama_index-0.10.36-py3-none-any.whl (6.8 kB)
Downloading llama_index_core-0.10.36-py3-none-any.whl (15.4 MB)

### Loading Data

Now we're good to grab some data!

We're going to use Hitchhiker's Guide to the Galaxy as our example data today!

In [6]:
# !wget https://justcheckingonall.files.wordpress.com/2008/01/hhgtg1.pdf

In [7]:
TRAINING_FILES = ["hhgtg1.pdf"]

Now that we have our data, let's organize it into our desired format for generating synthetic questions/responses.

In [8]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.core.schema import MetadataMode

def load_corpus(files, verbose=False):
    if verbose:
        print(f"Loading files {files}")

    reader = SimpleDirectoryReader(input_files=files)
    docs = reader.load_data()
    if verbose:
        print(f'Loaded {len(docs)} docs')

    parser = SimpleNodeParser.from_defaults()
    nodes = parser.get_nodes_from_documents(docs, show_progress=verbose)

    if verbose:
        print(f'Parsed {len(nodes)} nodes')

    corpus = {node.node_id: node.get_content(metadata_mode=MetadataMode.NONE) for node in nodes}
    return corpus

In [9]:
train_corpus = load_corpus(TRAINING_FILES, verbose=True)

Loading files ['hhgtg1.pdf']
Loaded 139 docs


Parsing nodes:   0%|          | 0/139 [00:00<?, ?it/s]

Parsed 139 nodes


### Creating Synthetic QA Pairs

We can leverage everyone's favourite OpenAI model `gpt-3.5-turbo` to help us generate some QA pairs.

In [10]:
%pip install -U llama-index-llms-openai

Collecting llama-index-llms-openai
  Downloading llama_index_llms_openai-0.1.19-py3-none-any.whl.metadata (559 bytes)
Downloading llama_index_llms_openai-0.1.19-py3-none-any.whl (11 kB)
Installing collected packages: llama-index-llms-openai
  Attempting uninstall: llama-index-llms-openai
    Found existing installation: llama-index-llms-openai 0.1.16
    Uninstalling llama-index-llms-openai-0.1.16:
      Successfully uninstalled llama-index-llms-openai-0.1.16
Successfully installed llama-index-llms-openai-0.1.19
Note: you may need to restart the kernel to use updated packages.


In [11]:
import re
import uuid

from llama_index.llms.openai import OpenAI
from tqdm.notebook import tqdm

In [12]:
import os
import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")

### Generating Queries

Let's use a helper function to create our question answer pairs.

We're going to use this prompt:

```
Context information is below.
    
---------------------
{context_str}
---------------------

Given the context information and not prior knowledge.
generate only questions based on the below query.

You are a Teacher/ Professor. Your task is to setup \
{num_questions_per_chunk} questions for an upcoming \
quiz/examination. The questions should be diverse in nature \
across the document. Restrict the questions to the \
context information provided.
```

As you might be able to tell - we have the ability to control how many questions we generate, as well as the persona used to create the questions.

The rest of the helper function is simply parsing the questions!
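The parsing step can be tried in isolation. In the minimal sketch below, the response string is made up for illustration (it is not real model output); the regular expression is the same one the helper uses.

```python
import re

# Hypothetical LLM response, made up for illustration only.
sample_response = "1. Who is Arthur Dent?\n\n2) What is a towel for?\n3 Where is the pub?"

lines = sample_response.strip().split("\n")
# Strip a leading number followed by ")", "." or whitespace, e.g. "1." or "2)".
questions = [re.sub(r"^\d+[\).\s]", "", line).strip() for line in lines]
# Drop the empty strings left behind by blank lines.
questions = [q for q in questions if len(q) > 0]

print(questions)
```

Note that any non-empty line survives this filter, so a preamble line in the response would also be stored as a question.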

In [13]:
def generate_queries(
    corpus,
    num_questions_per_chunk=2,
    prompt_template=None,
    verbose=False,
):
    """
    Automatically generate hypothetical questions that could be answered with
    doc in the corpus.
    """
    llm = OpenAI(model='gpt-3.5-turbo')

    prompt_template = prompt_template or """\
    Context information is below.

    ---------------------
    {context_str}
    ---------------------

    Given the context information and not prior knowledge.
    generate only questions based on the below query.

    You are a Teacher/ Professor. Your task is to setup \
    {num_questions_per_chunk} questions for an upcoming \
    quiz/examination. The questions should be diverse in nature \
    across the document. Restrict the questions to the \
    context information provided.
    """

    queries = {}
    relevant_docs = {}
    for node_id, text in tqdm(corpus.items()):
        query = prompt_template.format(context_str=text, num_questions_per_chunk=num_questions_per_chunk)
        response = llm.complete(query)

        result = str(response).strip().split("\n")
        questions = [
            re.sub(r"^\d+[\).\s]", "", question).strip() for question in result
        ]
        questions = [question for question in questions if len(question) > 0]

        for question in questions:
            question_id = str(uuid.uuid4())
            queries[question_id] = question
            relevant_docs[question_id] = [node_id]
    return queries, relevant_docs

Nothing left to do but generate some QA pairs!

In [14]:
# train_queries, train_relevant_docs = generate_queries(train_corpus, 1)

  0%|          | 0/139 [00:00<?, ?it/s]

In [15]:
train_dataset = {
    'Question': train_queries,
    'Corpus': train_corpus,
    'Abstract': train_relevant_docs,
}

In [16]:
dataset = train_dataset

corpus = dataset['Corpus']
queries = dataset['Question']
relevant_docs = dataset['Abstract']

examples = []
for query_id, query in queries.items():
    node_id = relevant_docs[query_id][0]
    text = corpus[node_id]
    example = {"Question" : query, "Abstract" : text}
    examples.append(example)

In [17]:
import pandas as pd

question_abstract_pair_df = pd.DataFrame(examples)

In [18]:
question_abstract_pair_df.to_csv("./question_abstract_pair.csv")

### Generating Answers

We'll repeat the process and create an answer for each question as well.

In [19]:
def generate_answer(
    query,
    context,
    prompt_template=None,
    verbose=False,
):
    """
    Generate an answer to `query` using only the supplied `context`,
    returning the first non-empty line of the model's response.
    """
    llm = OpenAI(model='gpt-3.5-turbo')

    prompt_template = prompt_template or """\
    Context information is below.

    ---------------------
    {context_str}
    ---------------------

    Given the context information and not prior knowledge.
    generate only answers based on the below query.

    ---------------------
    {query_str}
    ---------------------

    You are a Teacher/ Professor. Your task is to answer \
    questions for an upcoming quiz/examination. Restrict \
    your answers based on the context information provided. \
    If you do not know the answer, simply answer: "I don't know" \
    """
    full_query = prompt_template.format(context_str=context, query_str=query)
    response = llm.complete(full_query)

    result = str(response).strip().split("\n")
    answers = [
            re.sub(r"^\d+[\).\s]", "", answer).strip() for answer in result
        ]
    answers = [answer for answer in answers if len(answer) > 0]
    return answers[0]

We'll only train on a subset of the Question/Abstract pairs to save time and tokens!

In [20]:
for example in tqdm(examples[:100]):
  example["Answer"] = generate_answer(example["Question"], example["Abstract"])

  0%|          | 0/100 [00:00<?, ?it/s]

#### ❓ Question #1:

Can you think of any other ways to create, or obtain, the data required?

Does it have to be synthetically generated?

#### ANSWER:

These questions and answers can also be generated manually by human annotators.

### Convert to DALM Format

Now that we have our dataset, let's convert it to the expected format for DALM!

In [21]:
import pandas as pd

train_df = pd.DataFrame(examples[:100])
# train_df = pd.DataFrame(examples)

In [23]:
# train_df.to_csv("./dalm/datasets/hhgtg_train.csv")
train_df.to_csv("./hhgtg_train.csv")
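Before training, it can be worth confirming that the CSV header contains the three columns described above. The sketch below uses only the standard library; the `missing_columns` helper is hypothetical (not part of DALM), and the in-memory sample stands in for the real file.

```python
import csv
import io

REQUIRED_COLUMNS = {"Question", "Abstract", "Answer"}

def missing_columns(csv_file):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    return REQUIRED_COLUMNS - set(header)

# Stand-in for open("./hhgtg_train.csv", newline="", encoding="utf-8").
# The row content is a placeholder, not real training data.
sample = io.StringIO(",Question,Abstract,Answer\n0,q,a,r\n")
print(missing_columns(sample))  # set() when all three columns are present
```

The leading empty header cell mirrors the unnamed index column that `to_csv` writes by default.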

## Task 3: Training E2E RAG

We will train our favourite model, Llama 3 8B (`NousResearch/Meta-Llama-3-8B`), as the generator, and the Snowflake Arctic Embed Medium model (`https://huggingface.co/Snowflake/snowflake-arctic-embed-m`) as the retriever.

Thanks to PEFT and 4-bit quantization - we can do this all on a very small budget of ~10GB GPU RAM!
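A rough back-of-the-envelope calculation shows why quantization matters here. The sketch below only counts memory for the generator's weights and assumes a round 8 billion parameters; it ignores adapters, optimizer state, activations, and the retriever, so it does not by itself reproduce the ~10GB figure.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 8e9  # assumption: round figure for an "8B" model

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
```

Under these assumptions the quantized weights take about a quarter of the 16-bit footprint, which leaves headroom for the PEFT adapters and training state.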


In [24]:
%pip install -U huggingface-hub

Note: you may need to restart the kernel to use updated packages.


In [25]:
from huggingface_hub import notebook_login

notebook_login()


In [27]:
# "./dalm/datasets/hhgtg_train.csv" \

!dalm train-rag-e2e \
"./hhgtg_train.csv" \
"Snowflake/snowflake-arctic-embed-m" \
"NousResearch/Meta-Llama-3-8B" \
--output-dir "rag_e2e_llama_arctic" \
--use-peft "both" \
--with-tracking \
--report-to all \
--use-bnb "both" \
--per-device-train-batch-size 2

05/13/2024 23:44:07 - INFO - datasets - PyTorch version 2.3.0+cu121 available.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Some weights of BertModel were not initialized from the model checkpoint at Snowflake/snowflake-arctic-embed-m and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
`low_cpu_mem_usage` was None, now set to True since model is quantized.

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

#### ❓ Question #2:

Describe how the LOSS works for E2E RAG.

(Please see the lecture recording if you have any specific questions!)

#### ANSWER:

The combined loss for E2E RAG is the sum of a **contrastive loss** for the retriever (the desired context is the positive example; all other contexts are negatives) and a **causal loss** for the generator (the regular next-token prediction loss used during language model training).
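To make the two terms concrete, here is a minimal pure-Python sketch. It is an illustration of the idea only, not DALM's actual implementation: it assumes an in-batch contrastive loss in which query `i` is paired with passage `i`, an unweighted sum of the two terms, and toy inputs. All function names are hypothetical.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(similarity):
    """In-batch contrastive loss: passage i is the positive for query i,
    every other passage in the batch is a negative."""
    n = len(similarity)
    return -sum(log_softmax(similarity[i])[i] for i in range(n)) / n

def causal_lm_loss(token_logits, target_ids):
    """Mean next-token cross-entropy over the answer tokens."""
    total = sum(log_softmax(logits)[t] for logits, t in zip(token_logits, target_ids))
    return -total / len(target_ids)

def combined_loss(similarity, token_logits, target_ids):
    """Unweighted sum of the retriever and generator losses (an assumption)."""
    return contrastive_loss(similarity) + causal_lm_loss(token_logits, target_ids)

# Toy inputs: a 2x2 query/passage similarity matrix and one token over a 4-word vocabulary.
similarity = [[2.0, 0.0], [0.0, 2.0]]
token_logits = [[0.0, 0.0, 3.0, 0.0]]
target_ids = [2]
print(combined_loss(similarity, token_logits, target_ids))
```

Raising the diagonal of the similarity matrix (the matching query/passage pairs) lowers the contrastive term, and raising the logit of the correct next token lowers the causal term.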

## Task 4: Creating Simple LCEL Chain with New Models

Now that we've fine-tuned our DALM model - let's create a chain that leverages it!

### Data Collection

We'll be leveraging the `PyMuPDFLoader` to load our PDF!

In [28]:
from langchain_community.document_loaders import PyMuPDFLoader

docs = PyMuPDFLoader("hhgtg1.pdf").load()

### Chunking Our Documents

We'll use the `RecursiveCharacterTextSplitter` to create our toy example.

It will split based on the following rules:

- Each chunk has a maximum size of 200 tokens
- It will try to split first on the `\n\n` character, then on the `\n`, then on the `<SPACE>` character, and finally it will split on individual characters.

Let's implement it and see the results!

In [29]:
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

def tiktoken_len(text):
    tokens = tiktoken.encoding_for_model("gpt-3.5-turbo").encode(
        text,
    )
    return len(tokens)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 200,
    chunk_overlap = 0,
    length_function = tiktoken_len,
)

split_chunks = text_splitter.split_documents(docs)

In [30]:
len(split_chunks)

444
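To make the splitting rules concrete, here is a simplified sketch of recursive splitting. It is not LangChain's implementation: it measures length in characters rather than tokens, has no overlap handling, and the `recursive_split` name is hypothetical.

```python
def recursive_split(text, max_len, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most max_len characters (max_len >= 1),
    preferring the earliest separator that works."""
    if len(text) <= max_len:
        return [text]
    sep, rest = separators[0], separators[1:]
    # An empty separator means "fall back to individual characters".
    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            # The piece is still too long: retry with the next separator.
            chunks.extend(recursive_split(piece, max_len, rest))
    if current:
        chunks.append(current)
    return chunks

print(recursive_split("aaa bbb\n\nccc ddd eee", 7))
# ['aaa bbb', 'ccc ddd', 'eee']
```

The first paragraph fits within the limit and is kept whole; the second is too long, so it is split again on spaces.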

## Embeddings and Dense Vector Search

Now that we have our individual chunks, we need a system to correctly select the relevant pieces of information to answer our query.

This sounds like a perfect job for embeddings!

However, we have a small problem to solve - our embedding model currently exists in a "DALM specific" format - let's pull it out and get it into a `sentence-transformers` compatible format!

In [31]:
from dalm.models.retriever_only_base_model import AutoModelForSentenceEmbedding

embedding_model = AutoModelForSentenceEmbedding("Snowflake/snowflake-arctic-embed-m")

05/14/2024 00:38:46 - INFO - datasets - PyTorch version 2.3.0+cu121 available.
Some weights of BertModel were not initialized from the model checkpoint at Snowflake/snowflake-arctic-embed-m and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Now we can attach the adapters we used to train the embedding model.

In [32]:
embedding_model.attach_pre_trained_peft_layers("rag_e2e_llama_arctic/retriever", "cuda")

05/14/2024 00:38:48 - INFO - peft.tuners.tuners_utils - Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!


Let's merge and unload this model to get the new fine-tuned version in a friendly format.

In [33]:
merged_embeddings = embedding_model.merge_and_unload()

#### ❓ Question #3:

What is `merge_and_unload()` doing?

#### ANSWER:

DALM internally uses LoRA, and the LoRA adapter weights are often stored separately from the base model to enable plug-and-play at inference time. However, if it is preferred or required to use the fine-tuned LoRA adapters and the base model as a single self-contained model, we can use the `merge_and_unload()` method to merge the LoRA weight decomposition matrices into the respective base model weights.
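The arithmetic behind a merge can be sketched with toy matrices. This assumes the standard LoRA formulation, where the merged weight is `W + (alpha / rank) * B @ A`; it illustrates the idea and is not the PEFT library's code. The `merge_lora` helper and the values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold the low-rank update into the base weight: W + (alpha / rank) * B @ A."""
    delta = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# Toy example: a 2x2 base weight with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (rank, in_features)
lora_b = [[0.5], [0.0]]  # shape (out_features, rank)

merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

After merging, a single matrix multiplication gives the same output as the base layer plus the adapter path, so the adapter no longer needs to be loaded separately.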

Now we can push the model to the hub!

In [34]:
merged_embeddings.push_to_hub("ymath/e2erag-arctic-m")

model.safetensors:   0%|          | 0.00/96.9M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/ymath/e2erag-arctic-m/commit/889f58ba6429a174166067166c9e90549750a0ce', commit_message='Upload model', commit_description='', oid='889f58ba6429a174166067166c9e90549750a0ce', pr_url=None, pr_revision=None, pr_num=None)

We'll also want to grab the tokenizer for our embedding model, and do the same with it!

In [36]:
from transformers import AutoTokenizer

embedding_tokenizer = AutoTokenizer.from_pretrained("rag_e2e_llama_arctic/retriever")

In [37]:
embedding_tokenizer.push_to_hub("ymath/e2erag-arctic-m")

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


CommitInfo(commit_url='https://huggingface.co/ymath/e2erag-arctic-m/commit/6c52d3cedf50c6eea49bcb5dee754f20992cfc3f', commit_message='Upload tokenizer', commit_description='', oid='6c52d3cedf50c6eea49bcb5dee754f20992cfc3f', pr_url=None, pr_revision=None, pr_num=None)

Now we can load our fine-tuned embedding model from the hub!

In [45]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(
    model_name="ymath/e2erag-arctic-m",
    model_kwargs={"device" : "cuda"}
)

05/14/2024 00:40:38 - INFO - sentence_transformers.SentenceTransformer - Load pretrained SentenceTransformer: ymath/e2erag-arctic-m


config.json:   0%|          | 0.00/1.20k [00:00<?, ?B/s]

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


model.safetensors:   0%|          | 0.00/96.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.44k [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/712k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/732 [00:00<?, ?B/s]

Now we can set up our `VectorStore`! We'll be using Meta's FAISS to power our dense vector search today.

In [46]:
from langchain_community.vectorstores import FAISS

vector_store = FAISS.from_documents(split_chunks, embedding_model)

Now we can convert our vector store into a retriever!

In [47]:
retriever = vector_store.as_retriever()
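Conceptually, the retriever embeds the query and returns the chunks whose embeddings are nearest to it. The sketch below shows that idea with made-up 2-dimensional vectors and brute-force Euclidean distance; FAISS does the equivalent search with optimized indexes. The choice of Euclidean distance and the `top_k` helper are assumptions for illustration, not a description of the library internals.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def top_k(query_vec, chunk_vecs, k=4):
    """Return the indices of the k chunk vectors closest to the query vector."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: l2_distance(query_vec, chunk_vecs[i]),
    )
    return ranked[:k]

# Made-up embeddings; real ones come from the embedding model.
chunk_vecs = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
query_vec = [1.0, 0.0]

print(top_k(query_vec, chunk_vecs, k=2))  # [1, 2]
```

The better the fine-tuned embedding model places a question near its supporting passage, the more useful the retrieved context will be.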

### Setting up our RAG

We'll use the LCEL we touched on earlier to create a RAG chain.

Let's think through each part:

1. First we need to retrieve context
2. We need to pipe that context to our model
3. We need to parse that output

Let's start by setting up our model!

First, we need to load our tokenizer for our model!

In [48]:
model_id = "rag_e2e_llama_arctic/generator"

tokenizer = AutoTokenizer.from_pretrained(model_id)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Next, we'll load the model itself to prepare it for our Hugging Face pipeline!

In [49]:
import torch
from transformers import BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

05/14/2024 00:40:54 - INFO - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [50]:
merged_model = model.merge_and_unload()

Next we'll be using our Hugging Face `pipeline` to load our model for inference!

In [51]:
from transformers import pipeline

ft_pipe = pipeline("text-generation", merged_model, tokenizer=tokenizer, max_new_tokens=256, return_full_text=False)

Now we can connect our LLM to LangChain to be used in our pipeline!

In [52]:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

llm_pipeline = HuggingFacePipeline(pipeline=ft_pipe, pipeline_kwargs={"max_new_tokens" : 256, "return_full_text" : False})

Now we can create our prompt!

In [53]:
from langchain_core.prompts import ChatPromptTemplate

RAG_PROMPT = """\
Please use the context provided to answer the question simply. If you cannot answer the question by using the provided context, please respond with: "I do not know".

CONTEXT:
{context}

QUERY:
{question}"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)

Finally, we can construct our chain!

In [54]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

retrieval_augmented_qa_chain = (
    {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm_pipeline | StrOutputParser(), "context": itemgetter("context")}
)
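The data flow through the chain can be mirrored in plain Python. In this sketch, `fake_retriever` and `fake_llm` are hypothetical stand-ins for the real retriever and the `rag_prompt | llm_pipeline | StrOutputParser()` segment, so it runs without any models.

```python
PROMPT = "CONTEXT:\n{context}\n\nQUERY:\n{question}"

def fake_retriever(question):
    # Stand-in for `retriever`: returns canned context for the question.
    return ["chunk about " + question]

def fake_llm(prompt):
    # Stand-in for the prompt -> LLM -> string parser segment.
    return "answer based on: " + prompt

def run_chain(inputs):
    # Step 1: build {"context", "question"} from the incoming question.
    state = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: RunnablePassthrough.assign(context=itemgetter("context"))
    # re-assigns "context" to itself, so the dictionary is effectively unchanged.
    # Step 3: generate the response and pass the retrieved context through.
    return {
        "response": fake_llm(PROMPT.format(**state)),
        "context": state["context"],
    }

result = run_chain({"question": "towels"})
print(result["response"])
```

Returning the context alongside the response is what lets us inspect which chunks the answer was based on.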

Let's test our new model and embedding combo!

In [55]:
response = retrieval_augmented_qa_chain.invoke({"question" : "Why are towels important?"})

  attn_output = torch.nn.functional.scaled_dot_product_attention(


In [56]:
response["response"]

' Explain your answer.\n\nANSWER:\nThe Hitch Hiker’s Guide to the Galaxy has a few things to say on the subject of towels. A towel, it says, is about the most massively useful thing an interstellar hitchhiker can have. Partly it has great practical value - you can wrap it around you for warmth as you bound across the cold moons of Jaglan Beta; you can lie on it on hot sand when the sun has got too much of a kick at the ﬁrst drink of the day. It will also protect you against leeches, sand ﬂies, ﬂoggers, small furry animals, spitting camels, and, least importantly, conventional ﬂoods.\n\nMore importantly, a towel has immense psychological value. For some reason, if a strag (strag: non-hitch hiker) discovers that a hitch hiker has his towel with him, he will automatically assume that he is also in possession of a tooth-brush, face ﬂannel, soap, tin of biscuits, ﬂask, compass, map, ball of string, gnat spray, wet weather gear, space suit etc., etc. Furthermore, the strag will

In [57]:
response = retrieval_augmented_qa_chain.invoke({"question" : "Who is Zaphod - and what is his last name?"})

In [58]:
response["response"]

' (Please provide the context provided in the query)\n\nRESULT:\n[Document(page_content=\'- Oh, - said Zaphod with a guilty start, - that party.\', metadata={\'source\': \'hhgtg1.pdf\', \'file_path\': \'hhgtg1.pdf\', \'page\': 67, \'total_pages\': 139, \'format\': \'PDF 1.2\', \'title\': \'\', \'author\': \'\',\'subject\': \'\', \'keywords\': \'\', \'creator\':\'TeX output 2004.08.17:1643\', \'producer\': \'dvipdfm 0.13.2c, Copyright © 1998, by Mark A. Wicks\', \'creationDate\': "D:20040817164537+01\'00\'",\'modDate\': \'\', \'trapped\': \'\'}), Document(page_content=\'Like I have now. It’s a big eﬀort to talk about it.\\nZaphod paused for a while. For a while there was silence. Then he frowned\\nand said,\', metadata={\'source\': \'hhgtg1.pdf\', \'file_path\': \'hhgtg1.pdf\', \'page\': 92, \'total_pages\': 139, \'format\': \'PDF '

#### ❓ Question #4:

For what reason is the output so verbose and unwieldy - how could we address this?

#### ANSWER:

The runnable response (`langchain_core.runnables.utils.Output`) includes `Document` objects with a bunch of associated metadata for each retrieved context, as well as other markdown characters. We can display cleaner output by printing just the initial query, final response, and values of `page_content` inside each `Document` object.
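One way to do this is a small formatting helper. The sketch below is self-contained: the `Document` dataclass is a stand-in for LangChain's class, `format_response` is a hypothetical helper, and the sample values are placeholders rather than real chain output.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Stand-in for LangChain's Document so the sketch runs on its own."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_response(question, response):
    """Render the query, the answer, and only the text of each retrieved chunk."""
    lines = [
        f"QUERY: {question}",
        f"RESPONSE: {response['response'].strip()}",
        "CONTEXT:",
    ]
    for i, doc in enumerate(response["context"], start=1):
        lines.append(f"  [{i}] {doc.page_content}")
    return "\n".join(lines)

# Placeholder values shaped like the chain's output dictionary.
sample = {
    "response": "  placeholder answer  ",
    "context": [Document("first chunk", {"page": 1}), Document("second chunk")],
}
print(format_response("placeholder question", sample))
```

With the real chain, the same helper could be called as `format_response(question, retrieval_augmented_qa_chain.invoke({"question": question}))`.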