# 🦙 RAG for SPARQL queries

Demo of **Retrieval Augmented Generation** (RAG) to help write SPARQL queries, using query examples published to a SPARQL endpoint and described with the SHACL ontology. It supports conversation memory, runs locally, and uses only open source components:
* [LangChain](https://python.langchain.com) (cf. docs: [RAG with memory](https://python.langchain.com/docs/expression_language/cookbook/retrieval), [streaming RAG](https://python.langchain.com/docs/use_cases/question_answering/streaming))
* [FastEmbed embeddings](https://github.com/qdrant/fastembed)
* [Qdrant vectorstore](https://github.com/qdrant/qdrant)
* [Ollama inference library](https://www.ollama.com)
* [Llama3 8B LLM](https://llama.meta.com/llama3/) or [Mixtral LLM](https://mistral.ai/news/mixtral-of-experts/)

This demo runs locally on CPU or GPU, but it is considerably slower on CPU (a few minutes to answer a question).
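The core RAG idea can be sketched with the standard library alone: retrieve the most relevant examples, then paste them into the prompt. This is a toy illustration, not the demo's implementation. Word overlap (after dropping a small, hand-picked stopword list) stands in for the FastEmbed embeddings and the Qdrant search. The example descriptions and their placeholder queries are hypothetical and are not the real UniProt examples.

```python
import re

# Words too common to say anything about relevance (toy list chosen for this sketch)
STOPWORDS = {"a", "an", "and", "can", "for", "how", "i", "of", "the", "to", "what"}


def tokenize(text: str) -> set[str]:
    """Lowercased word set, a crude stand-in for an embedding vector."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question: str, examples: list[dict[str, str]], k: int = 2) -> list[dict[str, str]]:
    """Return the k examples whose description shares the most words with the question."""
    q = tokenize(question)
    ranked = sorted(examples, key=lambda ex: len(q & tokenize(ex["question"])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, retrieved: list[dict[str, str]]) -> str:
    """Augment the user question with the retrieved examples before calling an LLM."""
    context = "\n\n".join(f"User question: {ex['question']}\n{ex['query']}" for ex in retrieved)
    return f"Briefly answer the question reusing the following context:\n{context}\n\nQuestion: {question}\n"


# Hypothetical examples with placeholder queries
examples = [
    {"question": "Select all human UniProt entries",
     "query": "SELECT ?protein WHERE { ?protein a up:Protein }"},
    {"question": "Map UniProt to HGNC identifiers and symbols",
     "query": "SELECT ?uniprot ?hgnc WHERE { ?uniprot rdfs:seeAlso ?hgnc }"},
    {"question": "Find the orthologous proteins for a UniProtKB entry",
     "query": "SELECT ?ortholog WHERE { ?ortholog a up:Protein }"},
]

question = "How can I retrieve the HGNC symbol for a protein?"
print(build_prompt(question, retrieve(question, examples, k=1)))
```

The real pipeline adds two things on top of this loop: embeddings that capture meaning rather than exact words, and an LLM that turns the augmented prompt into an answer.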

Thanks to LangChain, you can easily swap any component of this workflow for the one you prefer:
* LLM (e.g. switch to [ChatGPT](https://python.langchain.com/docs/integrations/llms/openai), Claude)
* Vectorstore (e.g. switch to [FAISS](https://python.langchain.com/docs/integrations/vectorstores/faiss), [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma), Milvus)
* Embedding model (e.g. switch to [HuggingFace sentence transformer](https://python.langchain.com/docs/integrations/text_embedding/sentence_transformers), OpenAI ADA)
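Swapping works because the rest of the workflow talks to each component only through a shared interface. In LangChain, for example, models and retrievers are all called with `.invoke()`. The sketch below mimics that idea with a hypothetical `ChatModel` protocol and two fake models. It is a simplification: real LangChain chat models return message objects, not plain strings.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical minimal interface: anything that maps a prompt string to an answer string."""

    def invoke(self, prompt: str) -> str: ...


class EchoModel:
    """Fake stand-in for a local model such as Llama3 served by Ollama."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    """Fake stand-in for a hosted model such as ChatGPT."""

    def invoke(self, prompt: str) -> str:
        return prompt.upper()


def answer(llm: ChatModel, question: str) -> str:
    # The workflow depends only on invoke(), so any implementation can be plugged in
    return llm.invoke(question)


for model in (EchoModel(), UpperModel()):
    print(answer(model, "which model answered?"))
```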

## 📦️ Install and import dependencies

⚠️ Install `ollama`: https://ollama.com/download

And pull the model you will use:

```bash
ollama pull llama3
```

> You can also try `mixtral:8x22b`. Just remember to pull it and to change the model passed to `ChatOllama` in the code below.

```python
import sys
!{sys.executable} -m pip install --quiet langchain langchain-community llama-cpp-python langchain-qdrant fastembed langchain-openai

from operator import itemgetter
from typing import Any

from langchain.globals import set_debug
from langchain.memory import ConversationBufferMemory
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import format_document
from langchain_community.chat_models import ChatOllama
from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import get_buffer_string
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI
# SparqlExamplesLoader comes from the `langchain_rdf` package, which the pip command above does not install
from langchain_rdf import SparqlExamplesLoader
```

## 🌀 Initialize local vectorstore and LLM

```python
# Size of the vectors produced by the BAAI/bge-small-en-v1.5 embedding model used below
flag_embeddings_size = 384
```
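The vector store compares these 384-dimensional vectors using a similarity measure. Cosine similarity is a common choice for BGE embeddings. Below is a minimal standard-library sketch of that comparison. It assumes plain Python lists as vectors, and the `EMBEDDING_SIZE` check is added purely for illustration.

```python
import math

EMBEDDING_SIZE = 384  # same value as flag_embeddings_size above


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors of the expected size."""
    if len(a) != EMBEDDING_SIZE or len(b) != EMBEDDING_SIZE:
        raise ValueError(f"expected two vectors of size {EMBEDDING_SIZE}")
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as unrelated to everything instead of dividing by zero
    return dot / norms if norms else 0.0


same = [1.0] * EMBEDDING_SIZE
unit_x = [1.0] + [0.0] * (EMBEDDING_SIZE - 1)
unit_y = [0.0, 1.0] + [0.0] * (EMBEDDING_SIZE - 2)
print(cosine_similarity(same, same))      # approximately 1.0
print(cosine_similarity(unit_x, unit_y))  # 0.0
```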

```python
embedding_model = FastEmbedEmbeddings(model_name="BAAI/bge-small-en-v1.5", max_length=512)
# Load the SPARQL query examples published on the UniProt endpoint
loader = SparqlExamplesLoader("https://sparql.uniprot.org/sparql/")
docs = loader.load()

# Index the examples in an in-memory Qdrant collection
# (split the documents into chunks first if they are too long for the embedding model)
vectorstore = Qdrant.from_documents(
    docs,
    embedding_model,
    collection_name="ontologies",
    location=":memory:",
    # path="./data/qdrant",
    # Run Qdrant as a service for production use:
    # url="http://localhost:6333",
    # prefer_grpc=True,
)
# Alternative vectorstore (requires importing FAISS from langchain_community.vectorstores):
# vectorstore = FAISS.from_documents(documents=docs, embedding=embedding_model)
# k is the number of source documents retrieved for each question
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

llm = ChatOllama(model="llama3")
# llm = ChatOpenAI(model="gpt-4o", temperature=0, max_tokens=None, timeout=None, max_retries=2)
```
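Conceptually, the in-memory collection plus `as_retriever(search_kwargs={"k": 5})` works as follows: embed every document once, embed the question, and return the `k` closest documents. The sketch below illustrates this with a hypothetical `InMemoryVectorStore` class. A hashing-trick `toy_embed` function stands in for FastEmbed. The store does a brute-force scan, so it says nothing about how Qdrant indexes vectors internally.

```python
import hashlib
import math
import re

DIM = 384  # same dimension as BAAI/bge-small-en-v1.5, chosen for illustration


def toy_embed(text: str) -> list[float]:
    """Hashing-trick bag-of-words vector, a crude but deterministic stand-in for FastEmbed."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        index = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[index] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit length, so dot product == cosine similarity


class InMemoryVectorStore:
    """Hypothetical minimal equivalent of an in-memory vector collection."""

    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        # Each document is embedded once, when it is indexed
        self.items.append((toy_embed(text), text))

    def search(self, query: str, k: int = 5) -> list[str]:
        q = toy_embed(query)
        scored = [(sum(a * b for a, b in zip(q, vec)), text) for vec, text in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


texts = [
    "Map UniProt to HGNC identifiers and Symbols",
    "Find the similar proteins for UniProtKB entry P05067 sorted by UniRef cluster identity",
    "Select all human UniProt entries with a sequence variant",
]
store = InMemoryVectorStore()
for text in texts:
    store.add(text)
print(store.search("HGNC symbols for UniProt entries", k=2))
```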


## 🧠 Initialize prompts and memory

```python
# Create the memory object that stores the conversation messages
memory = ConversationBufferMemory(
    return_messages=True, output_key="answer", input_key="question"
)
# Add a "chat_history" key to the input object, loaded from the memory
loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)

# Prompt to reformulate the question using the chat history
reform_template = """Given the following chat history and a follow up question,
rephrase the follow up question to be a standalone straightforward question, in its original language.
Do not answer the question! Just rephrase it, reusing information from the chat history.
Make it one short sentence, straight to the point.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
REFORM_QUESTION_PROMPT = ChatPromptTemplate.from_template(reform_template)

# Prompt to answer the reformulated question
answer_template = """Briefly answer the question reusing the following context:
{context}

Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(answer_template)

# Format how the retrieved SPARQL query examples are passed as context to the LLM
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(
    template="User question: {page_content}\n```sparql\n# {endpoint_url}\n{query}\n```"
)

def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    """Format each retrieved document with the document prompt and join them into one context string."""
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    # print("Formatted docs:", doc_strings)
    return document_separator.join(doc_strings)
```
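This cell produces the two strings that fill the prompts: the chat history, rendered from memory, and the context, built from the retrieved examples. Below is a plain-Python sketch of both transformations. It assumes chat turns are `(role, text)` pairs and documents are dicts with `page_content` and `metadata` keys. These are hypothetical stand-ins for LangChain's message and `Document` objects, and the query in the usage example is a placeholder.

```python
# Same layout as DEFAULT_DOCUMENT_PROMPT above
DOCUMENT_TEMPLATE = "User question: {page_content}\n```sparql\n# {endpoint_url}\n{query}\n```"


def render_history(turns: list[tuple[str, str]]) -> str:
    """Render turns as 'Human: ...' / 'AI: ...' lines, similar to get_buffer_string's default output."""
    prefixes = {"human": "Human", "ai": "AI"}
    return "\n".join(f"{prefixes[role]}: {text}" for role, text in turns)


def combine_documents(docs: list[dict], separator: str = "\n\n") -> str:
    """Fill the template with each document's text and metadata, then join the results."""
    return separator.join(
        DOCUMENT_TEMPLATE.format(page_content=doc["page_content"], **doc["metadata"]) for doc in docs
    )


history = render_history([
    ("human", "How can I retrieve the HGNC symbol for a protein?"),
    ("ai", "You can map UniProt to HGNC with a SPARQL query."),
])
context = combine_documents([{
    "page_content": "Map UniProt to HGNC identifiers and Symbols",
    "metadata": {
        "endpoint_url": "https://sparql.uniprot.org/sparql/",
        "query": "SELECT ?hgnc WHERE { ?protein rdfs:seeAlso ?hgnc }",
    },
}])
print(history)
print(context)
```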


## ⛓️ Define the chain

Each step of the chain receives the output of the previous step, and `itemgetter()` is used to retrieve values from that output by key.
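When a dict appears in a LangChain Expression Language chain, it becomes a step (a `RunnableParallel`). That step runs every value on the same input and collects the results under the dict's keys. The plain-Python sketch below mimics this behavior with hypothetical `parallel` and `pipe` helpers, to show how `itemgetter()` picks a value produced by the previous step. It is not LangChain's implementation, and the string manipulation stands in for the LLM reformulation.

```python
from operator import itemgetter
from typing import Any, Callable

Step = Callable[[Any], Any]


def parallel(step: dict[str, Step]) -> Step:
    """Run every callable on the same input and collect the results under their keys."""
    return lambda x: {key: fn(x) for key, fn in step.items()}


def pipe(*steps: Step) -> Step:
    """Feed the output of each step into the next one."""
    def run(x: Any) -> Any:
        for step in steps:
            x = step(x)
        return x
    return run


chain = pipe(
    # Toy "reformulation" in place of the LLM call
    parallel({"reformulated_question": lambda x: x["question"].strip().capitalize()}),
    parallel({
        "docs": lambda x: [f"doc about {x['reformulated_question']}"],
        # itemgetter reads a key from the dict produced by the previous step
        "question": itemgetter("reformulated_question"),
    }),
)
print(chain({"question": "  hgnc symbol? "}))
# {'docs': ['doc about Hgnc symbol?'], 'question': 'Hgnc symbol?'}
```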

```python
# Reformulate the question using chat history
reformulated_question = {
    "reformulated_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: get_buffer_string(x["chat_history"]),
    }
    | REFORM_QUESTION_PROMPT
    | llm
    | StrOutputParser(),
}
# Retrieve the documents using the reformulated question
retrieved_documents = {
    "docs": itemgetter("reformulated_question") | retriever,
    # print() returns None, so `or` logs the reformulated question and then passes it on
    "question": lambda x: print("💭 Reformulated question:", x["reformulated_question"]) or x["reformulated_question"],
    # "question": lambda x: x["reformulated_question"],
}
# Construct the inputs for the final prompt using retrieved documents
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}
# Generate the answer using the retrieved documents and answer prompt
answer = {
    "answer": final_inputs | ANSWER_PROMPT | llm,
    "docs": itemgetter("docs"),
}
# Put the chain together
final_chain = loaded_memory | reformulated_question | retrieved_documents | answer

def stream_chain(final_chain, memory: ConversationBufferMemory, inputs: dict[str, str]) -> dict[str, Any]:
    """Ask question, stream the answer output, and return the answer with source documents."""
    output = {"answer": ""}
    for chunk in final_chain.stream(inputs):
        # print(chunk)
        if "docs" in chunk:
            output["docs"] = [doc.dict() for doc in chunk["docs"]]
            print("📚 Documents retrieved:")
            for doc in output["docs"]:
                print(f"· {doc['page_content']}") # ({doc['metadata']['query']})
            # print(json.dumps(output["docs"], indent=2))
        if "answer" in chunk:
            output["answer"] += chunk["answer"].content
            print(chunk["answer"].content, end="", flush=True)
    # Add message to chat history
    memory.save_context(inputs, {"answer": output["answer"]})
    return output
```
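`stream_chain` consumes the chunks as they arrive. It stores the `docs` chunk, and it prints and concatenates the answer tokens. The sketch below replays that loop on a hypothetical `fake_stream` generator. It uses plain strings where the real chain yields message chunks with a `.content` attribute, and it does not touch the conversation memory.

```python
from typing import Any, Iterator


def fake_stream() -> Iterator[dict[str, Any]]:
    """Hypothetical stream shaped like the chain's output: a docs chunk, then answer tokens."""
    yield {"docs": ["Map UniProt to HGNC identifiers and Symbols"]}
    for token in ["You ", "can ", "use ", "this ", "query."]:
        yield {"answer": token}


def collect(stream: Iterator[dict[str, Any]]) -> dict[str, Any]:
    """Accumulate streamed chunks into a single result while echoing answer tokens."""
    output: dict[str, Any] = {"answer": ""}
    for chunk in stream:
        if "docs" in chunk:
            output["docs"] = chunk["docs"]
        if "answer" in chunk:
            output["answer"] += chunk["answer"]
            print(chunk["answer"], end="", flush=True)
    print()  # finish the streamed line
    return output


result = collect(fake_stream())
```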

## 🗨️ Ask questions

```python
# set_debug(True)   # Uncomment to enable detailed LangChain debugging
output = stream_chain(final_chain, memory, {
    "question": "How can I retrieve the HGNC symbol for a protein?"
})
```

💭 Reformulated question: What is the best way to obtain the Human Genome Organization (HGNC) symbol for a specific protein?
📚 Documents retrieved:
· Map UniProt to HGNC identifiers and Symbols
· Find any uniprot entry, or an uniprot entries domain or component which has a name 'HLA class I histocompatibility antigen, B-73 alpha chain'
· For the human entry P05067 (Amyloid-beta precursor protein) find the gene start ends in WikiData
· Find any uniprot entry which has a name 'HLA class I histocompatibility antigen, B-73 alpha chain'
· Construct new triples of the type 'HumanProtein' from all human UniProt entries
Based on the provided context, you can map UniProt IDs to HGNC identifiers and symbols using the following SPARQL query:

```sparql
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uniprot ?hgnc ?hgncSymbol
WHERE {
  VALUES (?acc) {('P05067') ('P00750')}
  BIND(iri(concat(
```

```python
# set_debug(True)   # Uncomment to enable detailed LangChain debugging
output = stream_chain(final_chain, memory, {
    "question": "Could you write the complete query for protein P68871?"
})
```

💭 Reformulated question: What is the SPARQL query to retrieve the HGNC symbol for the protein with UniProt ID P68871?
📚 Documents retrieved:
· Map UniProt to HGNC identifiers and Symbols
· Find the orthologous proteins for UniProtKB entry P05067 using the <a href="http://www.orthod.org">OrthoDB database</a>
· Select all human UniProt entries with a sequence variant that leads to a tyrosine to phenylalanine substitution
· Find any uniprot entry, or an uniprot entries domain or component which has a name 'HLA class I histocompatibility antigen, B-73 alpha chain'
· Find the similar proteins for UniProtKB entry P05067 sorted by UniRef cluster identity
You can use the first SPARQL query provided:

```sparql
# https://sparql.uniprot.org/sparql/
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT
  ?uniprot
  ?hgnc
  ?hgncSymbol
WHERE
{
  # A space separated list of UniProt primary accessio