<a href="https://colab.research.google.com/github/vectara/example-notebooks/blob/main/notebooks/using-vectara-with-langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectara and LangChain

In [None]:
!pip3 install langchain-vectara langgraph langchain langchain_openai langchain_community

## About Vectara

[Vectara](https://vectara.com/) is the trusted AI Assistant and Agent platform that focuses on enterprise readiness for mission-critical applications.
Vectara serverless RAG-as-a-service provides all the components of RAG behind an easy-to-use API, including:
1. A way to extract text from files (PDF, PPT, DOCX, etc)
2. ML-based chunking that provides state of the art performance.
3. The [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model.
4. Its own internal vector database where text chunks and embedding vectors are stored.
5. A query service that automatically encodes the query into an embedding and retrieves the most relevant text segments, including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) as well as multiple reranking options such as the [multi-lingual relevance reranker](https://www.vectara.com/blog/deep-dive-into-vectara-multilingual-reranker-v1-state-of-the-art-reranker-across-100-languages), [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/), and the [UDF reranker](https://www.vectara.com/blog/rag-with-user-defined-functions-based-reranking).
6. An LLM for creating a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents (context), including citations.

For more information:
- [Documentation](https://docs.vectara.com/docs/)
- [API Playground](https://docs.vectara.com/docs/rest-api/)
- [Quickstart](https://docs.vectara.com/docs/quickstart)

The main benefits of using Vectara RAG-as-a-service to build your application are:
* **Accuracy and Quality**: Vectara provides an end-to-end platform that focuses on eliminating hallucinations, reducing bias, and safeguarding copyright integrity.
* **Security**: Vectara's platform provides access control, protecting against prompt injection attacks, and meets SOC2 and HIPAA compliance.
* **Explainability**: Vectara makes it easy to troubleshoot bad results by clearly explaining rephrased queries, LLM prompts, retrieved results, and agent actions.

In this notebook, we will demonstrate some of the great ways you can use Vectara together with LangChain.

## Getting Started

To get started, use the following steps:
1. If you don't already have one, [Sign up](https://www.vectara.com/integrations/langchain) for your free Vectara trial.
2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **"Create Corpus"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.
3. Next you'll need to create API keys to access the corpus. Click on the **"Access Control"** tab in the corpus view and then the **"Create API Key"** button. Give your key a name, and choose whether you want query-only or query+index for your key. Click "Create" and you now have an active API key. Keep this key confidential. 

To use LangChain with Vectara, you'll need two values: `corpus_key` and `api_key`.
You can provide the API key to LangChain in two ways:

1. Set the `VECTARA_API_KEY` environment variable.

   For example, you can set it using `os.environ` and `getpass` as follows:

```python
import os
import getpass

os.environ["VECTARA_API_KEY"] = getpass.getpass("Vectara API Key:")
```

2. Pass it to the `Vectara` vectorstore constructor:

```python
vectara = Vectara(
    vectara_api_key=vectara_api_key
)
```

In this notebook we assume the API key is provided in the environment. Some examples use OpenAI as well, so please set `OPENAI_API_KEY` too.
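Missing or empty keys tend to surface later as confusing authentication errors, so it can help to check the environment up front. The following is a minimal sketch using only the standard library; the helper name `missing_env` is hypothetical and is not part of LangChain or Vectara.

```python
import os


def missing_env(names, environ=None):
    """Return the names from `names` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# The three variables this notebook reads from the environment.
REQUIRED = ["VECTARA_API_KEY", "VECTARA_CORPUS_KEY", "OPENAI_API_KEY"]

missing = missing_env(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this check before constructing the `Vectara` object gives a readable message instead of a failed request.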

In [2]:
import os

os.environ["VECTARA_API_KEY"] = "<VECTARA_API_KEY>"
os.environ["VECTARA_CORPUS_KEY"] = "<VECTARA_CORPUS_KEY>"
os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

from langchain_vectara import Vectara
from langchain_vectara.vectorstores import (
    ChainReranker,
    CorpusConfig,
    CustomerSpecificReranker,
    File,
    GenerationConfig,
    MmrReranker,
    SearchConfig,
    VectaraQueryConfig,
)

vectara = Vectara(vectara_api_key=os.getenv("VECTARA_API_KEY"))

## Data Ingestion

You can ingest data into Vectara directly using Vectara's [indexing API](https://docs.vectara.com/docs/api-reference/indexing-apis/indexing), using a tool like [vectara-ingest](https://github.com/vectara/vectara-ingest), or via the Vectara LangChain component. We will demonstrate data ingestion via the LangChain `add_files` method.

First we load the state-of-the-union text into Vectara.

Note that we use the `add_files` interface which does not require any local processing or chunking - Vectara receives the file content and performs all the necessary pre-processing, chunking and embedding of the file into its knowledge store.

In this case it uses a .txt file but the same works for many other [file types](https://docs.vectara.com/docs/api-reference/indexing-apis/file-upload/file-upload-filetypes).

In [3]:
corpus_key = os.getenv("VECTARA_CORPUS_KEY")
file_obj = File(
    file_path="../data/state_of_the_union.txt",
    metadata={"source": "text_file"},
)
vectara.add_files([file_obj], corpus_key)

['state_of_the_union.txt']

You can also use the `add_texts` method (or `add_documents`, which is similar but has a slightly different interface) for data ingestion.

For `add_texts` the input is simply a set of text strings:

```python
vectara.add_texts(["to be or not to be", "that is the question"])
```

## Vectara: RAG-as-a-service

Vectara is much more than a vector DB: it is a full **RAG-as-a-service** platform.
Yes, we have our own internal implementation of a scalable and serverless vector database, but that is just one piece of a whole set of components needed to implement RAG. The other components include text extraction, chunking, the Boomerang embedding model, advanced retrieval such as hybrid search or MMR, multi-lingual reranker, and more.
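To give some intuition for MMR (Maximal Marginal Relevance) reranking, here is a small pure-Python sketch of the textbook greedy formulation. It is an illustration only and not Vectara's implementation. The name `diversity_bias` mirrors the `MmrReranker` parameter used in this notebook, but the assumption that it trades relevance against redundancy in exactly this way is ours, and the relevance scores and vectors are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(relevance, vectors, diversity_bias, limit):
    """Greedy MMR: return item indices in the order they are selected."""
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < limit:

        def score(i):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return (1 - diversity_bias) * relevance[i] - diversity_bias * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: items 0 and 1 are near-duplicates, item 2 is different.
relevance = [0.9, 0.85, 0.6]
vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]

print(mmr(relevance, vectors, diversity_bias=0.0, limit=3))  # [0, 1, 2]
print(mmr(relevance, vectors, diversity_bias=0.5, limit=3))  # [0, 2, 1]
```

With a bias of 0 the order follows relevance alone. With a higher bias the near-duplicate item 1 is pushed behind the less relevant but more distinct item 2.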

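We now create a `VectaraQueryConfig` object to control the retrieval and summarization options:
* We enable summarization, specifying that we would like the LLM to use the top 7 matching chunks and respond in English.
* We set the search `limit` to 25 and rerank results with a chain of two rerankers: a customer-specific reranker followed by MMR.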

Using this configuration, let's create a LangChain `Runnable` object that encapsulates the full Vectara RAG pipeline, using the `as_rag` method:


In [4]:
generation_config = GenerationConfig(
    max_used_search_results=7,
    response_language="eng",
    generation_preset_name="vectara-summary-ext-24-05-med-omni",
    enable_factual_consistency_score=True,
)
search_config = SearchConfig(
    corpora=[CorpusConfig(corpus_key=corpus_key)],
    limit=25,
    reranker=ChainReranker(
        rerankers=[
            CustomerSpecificReranker(reranker_id="rnk_272725719", limit=100),
            MmrReranker(diversity_bias=0.2, limit=100),
        ]
    ),
)

config = VectaraQueryConfig(
    search=search_config,
    generation=generation_config,
)

query_str = "what did Biden say?"

rag = vectara.as_rag(config)
rag.invoke(query_str)["answer"]

"President Biden discussed several key topics in his recent statements. He emphasized the importance of keeping schools open and noted that with a high vaccination rate and reduced hospitalizations, most Americans can safely return to normal activities without masks [1]. He addressed the need to hold social media platforms accountable for their impact on children and called for stronger privacy protections and mental health services [2]. Biden also announced measures against Russia, including preventing its central bank from defending the Ruble and targeting Russian oligarchs' assets, as well as closing American airspace to Russian flights [3], [7]. Additionally, he highlighted the need to protect women's rights, specifically the right to choose as affirmed in Roe v. Wade [5]."

We can also use the streaming interface like this:

In [5]:
output = {}
curr_key = None
for chunk in rag.stream(query_str):
    for key in chunk:
        if key not in output:
            output[key] = chunk[key]
        else:
            output[key] += chunk[key]
        if key == "answer":
            print(chunk[key], end="", flush=True)
        curr_key = key

President Biden discussed several key topics in his recent statements. He emphasized the importance of keeping schools open and noted that with a high vaccination rate and reduced hospitalizations, most Americans can safely return to normal activities [1]. He addressed the need to hold social media platforms accountable for their impact on children and called for stronger privacy protections and mental health services [2]. Biden also announced measures against Russia, including preventing its central bank from defending the Ruble and targeting Russian oligarchs' assets, as well as closing American airspace to Russian flights [3], [7]. Additionally, he reaffirmed the need to protect women's rights, particularly the right to choose as affirmed in Roe v. Wade [5].
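The generated answer carries citation markers such as `[1]` or `[7]`. If you want to collect them programmatically, a regular expression is enough. This is a minimal sketch; the helper name `cited_sources` is hypothetical, the sample string is shortened for illustration, and how each number maps back to a search result is not covered here.

```python
import re


def cited_sources(answer):
    """Return the sorted, unique citation numbers that appear as [n] in `answer`."""
    return sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})


# Shortened, made-up sample in the same style as the answers in this notebook.
answer = "Biden announced measures against Russia [3], [7]. He also discussed schools [1]."
print(cited_sources(answer))  # [1, 3, 7]
```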

Notice how simple the RAG pipeline is here. It does not require an OpenAI key or any other external service; everything is done inside the Vectara RAG platform.

To set things up we have configured:
- `GenerationConfig`: specifies parameters for the generative summarizer, such as the language of the response, the number of top results to include in the summary, or the summarizer (prompt) name.
- `SearchConfig`: controls corpus-level parameters and reranking, with options like MMR or the multi-lingual reranker.
- `VectaraQueryConfig`: provides the overall configuration structure that controls the RAG pipeline.

With this configuration, all you have to do is call `vectara.as_rag(config)` and you get a LangChain `Runnable` object on which you can run `invoke()` or `stream()`. 
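The streaming loops in this notebook merge chunks by key as they arrive. The same logic can be factored into a small helper, sketched here with a hand-written list standing in for `rag.stream(query_str)`. The helper name `accumulate_stream` and the chunk contents are made up for illustration; only the `answer` key is taken from the examples in this notebook.

```python
def accumulate_stream(chunks):
    """Merge streamed chunk dicts into one dict, concatenating values per key."""
    output = {}
    for chunk in chunks:
        for key, value in chunk.items():
            if key in output:
                output[key] += value
            else:
                output[key] = value
    return output


# Stand-in for rag.stream(query_str); a real stream yields chunks over time.
fake_stream = [
    {"answer": "President Biden "},
    {"answer": "discussed several "},
    {"answer": "key topics."},
]

result = accumulate_stream(fake_stream)
print(result["answer"])  # President Biden discussed several key topics.
```

Because the helper accepts any iterable of dicts, it should work the same way with a generator as with a list.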

## Hallucination detection and Factual Consistency Score

Vectara created [HHEM](https://huggingface.co/vectara/hallucination_evaluation_model) - an open source model that can be used to evaluate RAG responses for factual consistency.

As part of Vectara RAG, the "Factual Consistency Score" (FCS), an improved version of the open source HHEM, is made available via the API. It is automatically included in the output of the RAG pipeline:

In [6]:
resp = rag.invoke(query_str)
print(resp["answer"])
print(f"Vectara FCS = {resp['fcs']}")

President Biden addressed several key issues in his recent statements. He emphasized the importance of keeping schools open and noted that with a high vaccination rate and reduced hospitalizations, most Americans can safely return to normal activities [1]. He also highlighted the need to hold social media platforms accountable for their impact on children and called for stronger privacy protections and mental health services [2]. On international matters, Biden announced measures to weaken Russia's economy and military by targeting Russian oligarchs and closing American airspace to Russian flights [3], [7]. Additionally, he reaffirmed the commitment to protect women's rights, particularly the right to choose as affirmed in Roe v. Wade [5].
Vectara FCS = 0.6191406
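One way to act on the score is to compare it against a cutoff and flag answers that fall below it. The sketch below assumes a response dict shaped like the one printed above, with `answer` and `fcs` keys. The helper name `is_grounded` and the default threshold of 0.5 are hypothetical choices for illustration, not values recommended by Vectara.

```python
def is_grounded(resp, threshold=0.5):
    """Return True when the response has an FCS at or above `threshold`."""
    fcs = resp.get("fcs")
    return fcs is not None and fcs >= threshold


# Stand-in for the dict returned by rag.invoke(query_str).
resp = {"answer": "President Biden addressed several key issues...", "fcs": 0.6191406}

print(is_grounded(resp))                 # True
print(is_grounded(resp, threshold=0.8))  # False
```

A stricter threshold rejects more answers, so the right value depends on how costly an unsupported answer is in your application.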


## Vectara as a Retriever

You can also use Vectara purely as a powerful semantic search engine. As with other vector stores in LangChain, you can use Vectara as a `retriever` and take advantage of the standard `similarity_search` method (or `similarity_search_with_score`), which takes a query string and returns a list of results:

In [7]:
config.generation = None
config.search.limit = 5
retriever = vectara.as_retriever(config=config)
retriever.invoke(query_str)

[Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. We are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains. And tonight I am announcing that we will join our allies in closing off American air space to all Russian flights – further isolating Russia – and adding an additional squeeze –on their economy. The Ruble has lost 30% of its value.'),
 Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': '

For backwards compatibility, you can also enable summarization with a retriever, in which case the summary is added as an additional Document object:

In [8]:
config.generation = GenerationConfig()
config.search.limit = 10
retriever = vectara.as_retriever(config=config)
retriever.invoke(query_str)

[Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='We won’t be able to compete for the jobs of the 21st Century if we don’t fix that. That’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history. This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. We’re done talking about infrastructure weeks. We’re going to have an infrastructure decade.'),
 Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 
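When summarization is enabled this way, downstream code usually needs to separate the search hits from the summary. The sketch below assumes the summary is the last element of the returned list, which is how the `get_summary` helper later in this notebook treats it. `Doc` is a hypothetical stand-in for LangChain's `Document`, and `split_results` is a made-up helper name.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (for illustration only)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def split_results(documents):
    """Split a retriever result into (search_hits, summary_text).

    Assumes the summary is the last element; an empty list yields ([], None).
    """
    if not documents:
        return [], None
    return documents[:-1], documents[-1].page_content


docs = [Doc("chunk one"), Doc("chunk two"), Doc("generated summary")]
hits, summary = split_results(docs)
print(len(hits), summary)  # 2 generated summary
```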

## Advanced LangChain query pre-processing with Vectara

Vectara's "RAG as a service" does a lot of the heavy lifting in creating question answering or chatbot chains. The integration with LangChain provides the option to use additional capabilities such as query pre-processing with `SelfQueryRetriever` or `MultiQueryRetriever`. Let's look at an example of using the [MultiQueryRetriever](https://python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever).
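Before the real example, here is a toy sketch of the idea behind a multi-query retriever: run several rewordings of the question through the same retriever and keep the unique union of the results. It is not LangChain's implementation. The function name `multi_query_retrieve`, the query variants, and the lookup-table retriever are all made up for illustration.

```python
def multi_query_retrieve(queries, retrieve):
    """Run `retrieve` for each query and return the unique results in first-seen order."""
    seen = set()
    merged = []
    for query in queries:
        for text in retrieve(query):
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged


# Hypothetical query variants; in MultiQueryRetriever an LLM writes these.
variants = ["what did Biden say?", "What were Biden's main statements?"]

# Toy retriever backed by a fixed lookup table instead of a real search call.
index = {
    variants[0]: ["airspace closed to Russian flights", "schools stay open"],
    variants[1]: ["schools stay open", "infrastructure decade"],
}

results = multi_query_retrieve(variants, lambda q: index.get(q, []))
print(results)
```

The result shared by both variants appears only once in the merged list.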

Since MQR uses an LLM we have to set that up - here we choose `ChatOpenAI`:

In [14]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai.chat_models import ChatOpenAI



llm = ChatOpenAI(temperature=0)
mqr = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)


def get_summary(documents):
    return documents[-1].page_content


(mqr | get_summary).invoke(query_str)

"The latest news on Biden's public statements includes several key announcements. President Biden has announced initiatives to address the climate crisis, including building a national network of 500,000 electric vehicle charging stations and replacing lead pipes to ensure clean water for all Americans. He also plans to fix over 65,000 miles of highway and 1,500 bridges, emphasizing the use of American products to support jobs [1]. Additionally, Biden has announced a crackdown on foreign-owned ocean carriers that have been overcharging American businesses and consumers during the pandemic [2]. Furthermore, he has stated that the U.S. will join European allies in closing American airspace to Russian flights and targeting Russian oligarchs' assets as part of efforts to isolate Russia economically [4]."

## Vectara Chat

In most uses of LangChain to create chatbots, one must integrate a special `memory` component that maintains the history of chat sessions and supplies it to the chatbot so that it stays aware of the conversation.

With Vectara Chat, all of that is performed automatically in the backend by Vectara. See the [Chat](https://docs.vectara.com/docs/api-reference/chat-apis/chat-apis-overview) documentation to learn more about how this is implemented. With LangChain, all you have to do is turn that feature on in the Vectara vectorstore.

Let's see an example. We'll create a Chat Runnable using the `as_chat` method:

In [15]:
generation_config = GenerationConfig(
    max_used_search_results=7,
    response_language="eng",
    generation_preset_name="vectara-summary-ext-24-05-med-omni",
    enable_factual_consistency_score=True,
)
search_config = SearchConfig(
    corpora=[CorpusConfig(corpus_key=corpus_key, limit=25)],
    reranker=MmrReranker(diversity_bias=0.2),
)

config = VectaraQueryConfig(
    search=search_config,
    generation=generation_config,
)


bot = vectara.as_chat(config)

In [16]:
bot.invoke("What did the president say about Ketanji Brown Jackson?")["answer"]

'The president stated that nominating someone to serve on the United States Supreme Court is one of the most serious constitutional responsibilities he has. He nominated Circuit Court of Appeals Judge Ketanji Brown Jackson, describing her as one of the nation’s top legal minds who will continue Justice Breyer’s legacy of excellence [1].'

Here's an example of asking a question with some chat history:

In [17]:
bot.invoke("Did he mention who she succeeded?")["answer"]

'Yes, the president mentioned that Ketanji Brown Jackson succeeded Justice Breyer [1].'

## Chat with streaming

Of course the chatbot interface also supports streaming.
Instead of the `invoke` method you simply use `stream`:

In [18]:
output = {}
curr_key = None
for chunk in bot.stream("what did he say about the covid?"):
    for key in chunk:
        if key not in output:
            output[key] = chunk[key]
        else:
            output[key] += chunk[key]
        if key == "answer":
            print(chunk[key], end="", flush=True)
        curr_key = key

The president acknowledged the significant impact of COVID-19 on the nation and emphasized the need to stop viewing it as a partisan issue, instead recognizing it as a severe disease that has caused much loss of life. He highlighted the progress made in combating the virus, including vaccination efforts, and noted that severe cases have decreased significantly. The president also mentioned new CDC guidelines allowing most Americans to be mask-free, indicating a move towards more normal routines [3], [4], [7].

## Chaining

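For additional capabilities you can chain Vectara with other LangChain components. In the example below, the answer from Vectara Chat is passed to an OpenAI model that rephrases it for a five-year-old.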

In [19]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that explains the stuff to a five year old.  Vectara is providing the answer.",
        ),
        ("human", "{vectara_response}"),
    ]
)


def get_vectara_response(question: dict) -> str:
    """
    Calls Vectara as_chat and returns the answer string.  This encapsulates
    the Vectara call.
    """
    try:
        response = bot.invoke(question["question"])
        return response["answer"]
    except Exception:
        return "I'm sorry, I couldn't get an answer from Vectara."


# Create the chain
chain = get_vectara_response | prompt | llm | StrOutputParser()


# Invoke the chain
result = chain.invoke({"question": "what did he say about the covid?"})
print(result)

The president talked about how the COVID-19 sickness has affected many people in the country. He said he knows that people are tired and upset about it. The president also mentioned that we are doing better at fighting the virus by giving people vaccines and helping them with money. He said that things are getting better because fewer people are getting very sick, and now most people can stop wearing masks. The president asked everyone to work together and not fight about COVID-19 because it's something we all need to deal with as a team.


## Use Vectara tools to create an agent

The code below demonstrates how to use Vectara with LangChain to create an agent.

In [22]:
import json
from langchain_vectara.tools import VectaraRAG
from langchain_core.messages import HumanMessage
from langchain_openai.chat_models import ChatOpenAI
from langgraph.prebuilt import create_react_agent

vectara_rag_tool = VectaraRAG(
    name="rag-tool",
    description="Get answers about state of the union",
    vectorstore=vectara,
    corpus_key=corpus_key,
    config=config,
)

# Set up the tools and LLM
tools = [vectara_rag_tool]
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Construct the ReAct agent
agent_executor = create_react_agent(llm, tools)

question = "What is an API key? What is a JWT token? When should I use one or the other?"
input_data = {"messages": [HumanMessage(content=question)]}


agent_executor.invoke(input_data)

{'messages': [HumanMessage(content='What is an API key? What is a JWT token? When should I use one or the other?', additional_kwargs={}, response_metadata={}, id='ea7f3516-892e-4377-b538-2dcc50e427b7'),
  AIMessage(content="An API key and a JWT (JSON Web Token) are both methods used for authentication and authorization in web applications, but they serve different purposes and have different characteristics.\n\n### API Key\n- **Definition**: An API key is a unique identifier used to authenticate a client making requests to an API. It is typically a long string of characters that is passed along with the API request.\n- **Usage**: API keys are often used for simple authentication scenarios where the client needs to be identified, but there is no need for complex user authentication or session management.\n- **Security**: API keys are generally less secure than JWTs because they do not provide a way to verify the identity of the user making the request. If an API key is compromised, it c