# RAG with LangChain

The purpose of this exercise is to show how using a tool like LangChain differs from the bare-bones implementation in the previous tutorial, and what the potential benefits of such a tool are.

## Prerequisites

For this notebook we need two more Python modules.

In [None]:
!pip -q install langchain langchain_community

## Baseline

Before getting into RAG with LangChain, let's again establish a baseline by querying our LLM without RAG. Note that LangChain is essentially a programming environment that wraps all of our other tools, for better or worse. So, instead of importing the client API using `from ollama import Client` as in the previous part, we get a module from LangChain's community…

In [None]:
from langchain_community.llms import Ollama

ollama_host = 'http://10.129.20.4:9090'
ollama_model = 'llama3:70b'

# Wrap the Ollama server in a LangChain LLM object (no request is sent yet)
llm = Ollama(model=ollama_model, base_url=ollama_host)

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an AI assistant. Your task is to understand the user question, and provide an answer.

            Your answers are short, to the point, and written by a domain expert.
            If you don't know the answer, simply state, "I don't know"
            """
        ),
        ("human", "{user_input}"),
    ]
)

Here comes the part from which LangChain takes its name: parts of the application are "chained" together using a syntax reminiscent of UNIX pipes.
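To make the pipe analogy concrete, here is a toy sketch in plain Python. The `Step` class and the `fake_*` steps are hypothetical illustrations, not LangChain's API: overloading `__or__` lets `a | b` build a new step that feeds the output of `a` into `b`. LangChain's own components rely on the same operator overloading (there, `|` produces a `RunnableSequence`), just with far more machinery.

```python
from typing import Any, Callable


class Step:
    """A toy pipeline step; NOT LangChain's Runnable, just an illustration."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# Hypothetical stand-ins for the prompt, the LLM and an output parser
fake_prompt = Step(lambda d: f"Question: {d['user_input']}")
fake_llm = Step(lambda text: f"Echo of <{text}>")
fake_parser = Step(lambda s: s.strip())

fake_chain = fake_prompt | fake_llm | fake_parser
print(fake_chain.invoke({"user_input": "What is special about HackerNews?"}))
# → Echo of <Question: What is special about HackerNews?>
```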

In [None]:
chain = prompt | llm

query = "What is special about HackerNews?"
response = chain.invoke({"user_input": query})
print(response)

# Verify that the answer is "I don't know"
# query = "What do you know about mr. Mjptkck?"
# response = chain.invoke({"user_input": query})
# print(response)

Most tutorials add an output parser at the end of the chain, but here it is simply a passthrough that adds nothing of value, since the `Ollama` LLM wrapper already returns a plain string.
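A simplified model of what a string output parser does (a sketch, not LangChain's code; `FakeChatMessage` and `parse_to_str` are hypothetical): a plain LLM such as `Ollama` already yields a `str`, which is passed through unchanged, whereas a chat model wrapper yields a message object whose text has to be extracted. Only in the latter case does the parser earn its keep.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FakeChatMessage:
    """Hypothetical stand-in for a chat model's message object."""
    content: str


def parse_to_str(output: Union[str, FakeChatMessage]) -> str:
    # Plain LLMs (like `Ollama` above) already return a str: pass it through
    if isinstance(output, str):
        return output
    # Chat models return a message object: extract its text content
    return output.content


print(parse_to_str("plain text"))
print(parse_to_str(FakeChatMessage(content="chat text")))
```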

In [None]:
from langchain_core.output_parsers import StrOutputParser

output = StrOutputParser()  # Basically a NOP in this example

chain = prompt | llm | output

query = "What is special about HackerNews?"
response = chain.invoke({"user_input": query})
print(response)

## Chunking

For the sake of comparison, we will use the plain Python code from the previous part to split the data into sentences (chunks).

In [None]:
#
# Split the input data into sentence-sized chunks
#
import re
import json

chunks = []
index = 0

filenames = ["newsfaq.json", "newsguidelines.json", "security.json", "legal.json"]
# Iterate over the entries in data/ and read each JSON file in turn
for filename in filenames:
    filepath = f"./data/{filename}"
    with open(filepath) as fd:
        data = json.load(fd)

    url = data['url']
    text = data['content']
    # Split the file's text contents into sentences using python regex:
    #   A sequence of characters is deemed a sentence if followed by a
    #   full stop (.), question mark (?), or an exclamation mark (!)
    #   immediately followed by one or more whitespaces.
    sentences = re.split(r"(?<=\.|\?|!)\s+", text)
    # Each sentence makes up a chunk; store it with references (url and id)
    for sentence in sentences:
        chunks.append({'id': index, 'text': sentence, 'url': url})
        index += 1

# Write the resulting array to file:
with open('chunks.json', 'w') as fd:
    json.dump(chunks, fd)
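To see what the sentence-splitting regex above actually does, here is a small standalone check on made-up strings (`pattern` and `sample` are illustrative names). It also shows a known weakness of this naive approach: an abbreviation such as "e.g." followed by a space is treated as a sentence boundary.

```python
import re

# Same pattern as above: split on whitespace preceded by ., ? or !
pattern = r"(?<=\.|\?|!)\s+"

sample = "Hacker News is a site. Is it moderated? Yes!  See the FAQ"
print(re.split(pattern, sample))
# → ['Hacker News is a site.', 'Is it moderated?', 'Yes!', 'See the FAQ']

# A known weakness: abbreviations also end with a full stop
print(re.split(pattern, "Use e.g. plain text. Thanks."))
# → ['Use e.g.', 'plain text.', 'Thanks.']
```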

In [None]:
# Just a sanity check, it should be ~570 chunks
len(chunks)

In [None]:
# Gather the sentences from our chunks
sentences = [chunk['text'] for chunk in chunks]

## Embedding and retrieval

In [None]:
# Create a named _collection_ making up our corner of the database (it is a shared resource).
# Pick a name unique to you, e.g. your initials and today's date:
collection_name = "my_hackernews_250101"

This is where LangChain, IMHO, gets a wee bit ugly: we must install `langchain-huggingface`, yet another module tied to a specific implementation (a _leaky abstraction_).

In [None]:
!pip -q install langchain-huggingface

Then we can continue to create the embeddings using the following code:

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings_model_name = 'sentence-transformers/all-mpnet-base-v2'
embeddings_model = HuggingFaceEmbeddings(
    model_name=embeddings_model_name,
    show_progress=False,  # Change to True for visual feedback
)
embeddings = embeddings_model.embed_documents(sentences)

print(len(embeddings), len(embeddings[0]))

Ignore warnings about `tqdm` etc.; there is nothing we can do about them...

The "unit" LangChain works with is the `Document`, so we first have to wrap each chunk/sentence in a `Document`:

In [None]:
from langchain_core.documents import Document

documents = [Document(page_content=sentence) for sentence in sentences]

Again, we have to bite the bullet and install a LangChain module specific to an external service type (the Qdrant database), another example of a leaky abstraction:

In [None]:
!pip -q install langchain-qdrant

Then we can upload the documents to the vector database.

LangChain attaches a default id to each document as it is uploaded to Qdrant, but we provide our own integer ids (the index of each sentence) to prevent that from happening.

In [None]:
from langchain_qdrant import QdrantVectorStore

# https://python.langchain.com/v0.2/docs/integrations/vectorstores/qdrant/
# https://api.python.langchain.com/en/latest/qdrant/langchain_qdrant.qdrant.QdrantVectorStore.html
vector_store = QdrantVectorStore.from_documents(
    documents,
    embeddings_model,
    url="http://10.129.20.4:6333",
    distance='Euclid',
    collection_name=collection_name,
    ids=list(range(len(documents))),
    force_recreate=True
)

Besides passing the documents, the embedding model (which LangChain uses to compute the embeddings anew on upload), and the requested ids, there are two things to note here:

1. Setting `force_recreate=True`, which drops and recreates the collection if it already exists (simply a convenience in a toy example like this), and
2. using the string `'Euclid'` to define the distance metric.

The second point deserves some explanation:
Using LangChain's own `EmbeddingDistance.EUCLIDEAN` (`from langchain.evaluation import EmbeddingDistance`) results in an error, since it evaluates to the string `'euclidean'`, and the officially recommended solution is to import Qdrant's own definition using `from qdrant_client.models import Distance` and use `Distance.EUCLID`, which evaluates to `'Euclid'`. Now, that could well be the most blatant example of a _leaky abstraction_ I've ever seen, and the world is a much better place if that wart is ignored and the literal `'Euclid'` is used instead.
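As an aside on the choice of metric: if the embeddings have unit length (sentence-transformers models such as `all-mpnet-base-v2` normalize their output as far as I know, but check this for your model), the squared Euclidean distance is a simple decreasing function of cosine similarity, so `'Euclid'` and `'Cosine'` would rank hits in the same order:

$$
\lVert \mathbf{a}-\mathbf{b} \rVert^2
= \lVert \mathbf{a} \rVert^2 + \lVert \mathbf{b} \rVert^2 - 2\,\mathbf{a}\cdot\mathbf{b}
= 2 - 2\cos\theta_{\mathbf{a}\mathbf{b}},
\qquad \text{if } \lVert \mathbf{a} \rVert = \lVert \mathbf{b} \rVert = 1 .
$$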

<!--
# https://api.python.langchain.com/en/latest/qdrant/langchain_qdrant.qdrant.QdrantVectorStore.html#langchain_qdrant.qdrant.QdrantVectorStore
# from langchain.evaluation import EmbeddingDistance
# from qdrant_client.models import Distance, VectorParams # LEAKY ABSTRACTION (from official docs)

# ValidationError: 1 validation error for VectorParams
# distance
#   Input should be 'Cosine', 'Euclid', 'Dot' or 'Manhattan' [type=enum, input_value=<EmbeddingDistance.EUCLIDEAN: 'euclidean'>, input_type=EmbeddingDistance]
#     For further information visit https://errors.pydantic.dev/2.8/v/enum
#
-->

<div style="background-color:lightblue; padding:5px">

**Sidenote**: Using an existing collection

To use an instance of `langchain_qdrant.QdrantVectorStore` on an _existing_ collection without loading any new documents or texts, you can use the `QdrantVectorStore.from_existing_collection()` method.
</div>

Anyhow, we can now retrieve chunks from the database by id:

In [None]:
vector_store.get_by_ids([5,6])

Let's do a quick sanity check like before:

In [None]:
query = sentences[5]
print(query)
embedded_query = embeddings_model.embed_query(query)

In [None]:
results = vector_store.similarity_search_with_score(
    query=query, k=2
)
for doc, score in results:
    print(f"* [SCORE={score:.3f}] {doc.page_content} [{doc.metadata}]")

In [None]:
import math
# L2 distance between the query and the stored embedding of sentence 5 (computed earlier)
math.sqrt(sum((x - y) * (x - y) for x, y in zip(embedded_query, embeddings[5])))

With some luck, the distance should be (very close to) 0, just as reported by the `similarity_search_with_score` method for the first result. Now, let's do the same for the second result:

In [None]:
embedded_id6 = embeddings_model.embed_query(sentences[6])
math.sqrt(sum((x - y) * (x - y) for x, y in zip(embedded_query, embedded_id6)))

If your computation agrees with the score returned by _LangChain_, congratulations. At the time of writing, the figure reported by `similarity_search_with_score` is neither the Euclidean distance nor its square (the latter would make sense from a numerical perspective).
It is hard to figure out what quantity LangChain returns here, and it is not consistent across backends; see <https://github.com/langchain-ai/langchain/issues/4517>. We can only hope that it at least ranks results in the same order as the L2 distance (spoiler: it does not; pick, e.g., the 100th returned result, which has a smaller L2 distance than the 3rd result).
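A small helper makes the ordering check above mechanical: given the query vector and the hit vectors in the order the search returned them, it tests whether their L2 distances are non-decreasing. This is a minimal stdlib sketch; `l2_distance` and `ranking_agrees_with_l2` are hypothetical names, and `math.dist` computes the same quantity as the explicit sqrt-of-sum used above.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean (L2) distance, same as sqrt(sum((x - y)**2 ...)) above
    return math.dist(a, b)


def ranking_agrees_with_l2(query_vec, ranked_vecs) -> bool:
    """True if ranked_vecs are sorted by non-decreasing L2 distance to query_vec."""
    dists = [l2_distance(query_vec, v) for v in ranked_vecs]
    return all(d1 <= d2 for d1, d2 in zip(dists, dists[1:]))


q = [0.0, 0.0]
print(ranking_agrees_with_l2(q, [[0.0, 1.0], [2.0, 0.0], [3.0, 3.0]]))  # → True
print(ranking_agrees_with_l2(q, [[2.0, 0.0], [0.0, 1.0]]))  # → False
```

In the notebook, one could feed it `embedded_query` together with re-embedded hits, e.g. `[embeddings_model.embed_query(doc.page_content) for doc, _ in vector_store.similarity_search_with_score(query, k=100)]`.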

Ah, well, let's get on with our lives…

In order to chain together the vector database with the LLM we need to configure a `retriever` object rather than just a database client (see e.g. <https://python.langchain.com/docs/how_to/vectorstore_retriever/>).

The `search_type` is set to `"similarity"` (the default), and the number of requested hits is given in a (kludgy) `search_kwargs` dictionary that is blindly passed down to the underlying database client:

In [None]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 2})
retrieved_docs = retriever.invoke(query)
retrieved_docs
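Conceptually, a `"similarity"` retriever with `k` hits returns the `k` stored vectors closest to the query. Below is a brute-force stdlib sketch of that idea using L2 distance; `top_k_by_l2` and `toy_vecs` are hypothetical, and a real vector database like Qdrant typically relies on an approximate nearest-neighbour index (HNSW) rather than a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def top_k_by_l2(query_vec: Sequence[float],
                doc_vecs: List[Sequence[float]],
                k: int = 2) -> List[Tuple[int, float]]:
    """Return (index, L2 distance) of the k nearest vectors, closest first."""
    scored = [(i, math.dist(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


toy_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k_by_l2([1.0, 0.0], toy_vecs, k=2))
```

In the notebook, `top_k_by_l2(embedded_query, embeddings, k=2)` can be compared with `retrieved_docs`; the returned indices correspond to the integer ids we uploaded.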

## LangChain RAG

Putting it together in a complete RAG example:

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

retriever = vector_store.as_retriever(search_kwargs={"k": 10})

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an AI assistant. Your task is to understand the user question, and provide an answer.

            Your answers are short, to the point, and written by a domain expert.
            If you don't know the answer, simply state, "I don't know".

            Use the following pieces of retrieved context to answer the question.


            {context}
            """
        ),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
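The "stuff" in `create_stuff_documents_chain` refers to its simple strategy: all retrieved documents are stuffed into the `{context}` placeholder of the prompt. A stdlib sketch of the idea, assuming (as I understand the defaults) that each document contributes its `page_content` and that documents are joined by a blank line; `stuff_documents` and `SYSTEM_TEMPLATE` are hypothetical names, not LangChain's:

```python
from typing import List

# Hypothetical, trimmed-down version of the system prompt above
SYSTEM_TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question.\n\n"
    "{context}"
)


def stuff_documents(page_contents: List[str], separator: str = "\n\n") -> str:
    """Concatenate ("stuff") all retrieved texts into a single context string."""
    return separator.join(page_contents)


retrieved = ["HN is a social news site.", "Users submit links."]
system_msg = SYSTEM_TEMPLATE.format(context=stuff_documents(retrieved))
print(system_msg)
```

With `k=10`, all ten hits end up in the prompt, so `k` trades recall directly against prompt length.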

Now we can pose a question to the chain:

In [None]:
query = 'What is special about HackerNews?'
result = rag_chain.invoke({"input": query})

The returned result has three parts: the query, the retrieved context, and the answer from the LLM:

In [None]:
result.keys()

In [None]:
result['input']

In [None]:
result['context']

In [None]:
result['answer']

Now, the above example does not live up to the LangChain name, so let's rewrite it in a more "LangChain-y" style (see <https://python.langchain.com/docs/tutorials/rag/>).
NB: this reuses `retriever`, `prompt`, and `llm` from above.

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_context(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_context, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser() # Can be left out in this example
)

answer = rag_chain.invoke(query)

In [None]:
answer
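The dictionary at the head of `rag_chain` is where the plain query string fans out: LangChain turns the dict into a step (to my understanding, a `RunnableParallel`) that passes the same input to every value, so `retriever | format_context` receives the query while `RunnablePassthrough()` forwards it unchanged as `input`. The resulting dict then fills the `{context}` and `{input}` placeholders of `prompt`. A stdlib sketch of the fan-out; `fan_out`, `passthrough`, and `fake_retriever_then_format` are hypothetical names:

```python
from typing import Any, Callable, Dict


def fan_out(branches: Dict[str, Callable[[Any], Any]]) -> Callable[[Any], Dict[str, Any]]:
    """Run every branch on the same input and collect the results in a dict."""
    return lambda x: {key: fn(x) for key, fn in branches.items()}


def passthrough(x):
    # Plays the role of RunnablePassthrough(): returns its input unchanged
    return x


def fake_retriever_then_format(q):
    # Hypothetical stand-in for `retriever | format_context`
    return f"<documents relevant to: {q}>"


step = fan_out({"context": fake_retriever_then_format, "input": passthrough})
print(step("What is special about HackerNews?"))
```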

## Conclusions

So, what benefits does LangChain bring to the table? I'm not sure. For an example such as this one, I would say that it is of no use. The problem is that critical comments on its usefulness in large-scale deployments are not hard to find (see e.g. [why we no longer use LangChain for building our AI agents](https://www.octomind.dev/blog/why-we-no-longer-use-langchain-for-building-our-ai-agents#:~:text=The%20problem%20with%20LangChain's%20abstractions,understand%20and%20frustrating%20to%20maintain.)). Maybe there is some middle ground where it is useful; I don't know. It probably depends on the use case. YMMV.

At the very least, I would not recommend basing a new project on LangChain from day one; rather, consider migrating to it once you understand the problem and have made a proper cost/benefit analysis.