<a href="https://colab.research.google.com/github/vectara/example-notebooks/blob/main/notebooks/using-vectara-with-langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectara and LangChain

In [1]:
#!pip install -U langchain langchain_community langchain_openai

## About Vectara

[Vectara](https://vectara.com/) is the trusted AI Assistant and Agent platform which focuses on enterprise readiness for mission-critical applications. 

Vectara provides an end-to-end managed service for Retrieval Augmented Generation or [RAG](https://vectara.com/grounded-generation/), which includes:

1. An integrated API for processing input data, including text extraction from documents and ML-based chunking.

2. The state-of-the-art [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model. Each text chunk is encoded into a vector embedding using Boomerang, and stored in the Vectara internal knowledge (vector+text) store. Thus, when using Vectara with LangChain you do not need to call a separate embedding model; this happens automatically within the Vectara backend.

3. A query service that automatically encodes the query into embeddings and retrieves the most relevant text segments through [hybrid search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) and a variety of [reranking](https://docs.vectara.com/docs/api-reference/search-apis/reranking) strategies, including a [multilingual reranker](https://docs.vectara.com/docs/learn/vectara-multi-lingual-reranker), [maximal marginal relevance (MMR) reranker](https://docs.vectara.com/docs/learn/mmr-reranker), [user-defined function reranker](https://docs.vectara.com/docs/learn/user-defined-function-reranker), and a [chain reranker](https://docs.vectara.com/docs/learn/chain-reranker) that chains together multiple reranking methods to give finer control over reranking and combine the strengths of each method.

4. An option to create a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview) with a wide selection of LLM summarizers (including Vectara's [Mockingbird](https://vectara.com/blog/mockingbird-is-a-rag-specific-llm-that-beats-gpt-4-gemini-1-5-pro-in-rag-output-quality/), trained specifically for RAG-based tasks), based on the retrieved documents, including citations.

See the [Vectara API documentation](https://docs.vectara.com/docs/) for more information on how to use the API.

The main benefits of using Vectara RAG-as-a-service to build your application are:
* **Accuracy and Quality**: Vectara provides an end-to-end platform that focuses on eliminating hallucinations, reducing bias, and safeguarding copyright integrity.
* **Security**: Vectara's platform provides access control--protecting against prompt injection attacks--and meets SOC2 and HIPAA compliance.
* **Explainability**: Vectara makes it easy to troubleshoot bad results by clearly explaining rephrased queries, LLM prompts, retrieved results, and agent actions.

In this notebook, we will demonstrate some of the great ways you can use Vectara together with LangChain.

## Setup

You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps:
1. [Sign up](https://console.vectara.com/signup?utm_source=vectara&utm_medium=signup&utm_term=DevRel&utm_content=example-notebooks&utm_campaign=vectara-signup-DevRel-example-notebooks) for a Vectara account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.
2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **"Create Corpus"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.
3. Next you'll need to create API keys to access the corpus. Click on the **"Access Control"** tab in the corpus view and then the **"Create API Key"** button. Give your key a name, and choose whether you want query only or query+index for your key. Click "Create" and you now have an active API key. Keep this key confidential. Alternatively Vectara also provides a [Personal API key](https://vectara.com/blog/vectaras-new-personal-api-keys/) that is tied to your account and provides broader permissions.

To use LangChain with Vectara, you'll need to provide your `customer ID`, `corpus ID` and an `api_key` to the LangChain Vectara class. You can do this in two ways:

1. Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`.

> For example, you can set these variables using `os.environ` as follows (these credentials point to a public Vectara corpus where the contents of the [Vectara documentation](https://docs.vectara.com/docs/) are indexed):

```python
import os

customer_id = '1366999410'
corpus_id = '1'
api_key = 'zqt_UXrBcnI2UXINZkrv4g1tQPhzj02vfdtqYJIDiA'

os.environ["VECTARA_CUSTOMER_ID"] = customer_id
os.environ["VECTARA_CORPUS_ID"] = corpus_id
os.environ["VECTARA_API_KEY"] = api_key
```

2. Add them explicitly to the Vectara constructor:

```python
vectorstore = Vectara(
    vectara_customer_id=customer_id,
    vectara_corpus_id=corpus_id,
    vectara_api_key=api_key,
)
```
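
Whichever option you choose, it can help to fail early when a credential is missing. The following is a minimal sketch, assuming a hypothetical helper named `load_vectara_credentials` (it is not part of LangChain or Vectara) that reads the three environment variables named above:

```python
import os

REQUIRED_VARS = ("VECTARA_CUSTOMER_ID", "VECTARA_CORPUS_ID", "VECTARA_API_KEY")

def load_vectara_credentials(env=None):
    """Hypothetical helper: collect the three Vectara settings or raise."""
    env = os.environ if env is None else env
    # Report every missing or empty variable at once.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "vectara_customer_id": env["VECTARA_CUSTOMER_ID"],
        "vectara_corpus_id": env["VECTARA_CORPUS_ID"],
        "vectara_api_key": env["VECTARA_API_KEY"],
    }
```

The returned keys match the constructor arguments shown above, so the result can be passed along as `Vectara(**load_vectara_credentials())`.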

## Vectara: RAG-as-a-service

Vectara is not a vector DB, it's much more than that - it is a full **RAG-as-a-service** platform. 
Yes, we have our own internal implementation of a scalable and serverless vector database, but that is just one piece of a whole set of components needed to implement RAG. The other components include text extraction, chunking, the Boomerang embedding model, advanced retrieval such as hybrid search or MMR, multi-lingual reranker, and more.

You can ingest data into Vectara directly using Vectara's [indexing API](https://docs.vectara.com/docs/api-reference/indexing-apis/indexing), using a tool like [vectara-ingest](https://github.com/vectara/vectara-ingest), or via the Vectara LangChain component directly. We will demonstrate data ingest via LangChain later in this notebook; for now, let's assume you have already ingested data into your Vectara corpus and see how querying works (the fun part!).

Throughout this notebook, we will utilize LangChain [LCEL](https://python.langchain.com/docs/expression_language/) which provides a nice syntax for chaining components.

In [2]:
import os

customer_id = '1366999410'
corpus_id = '1'
api_key = 'zqt_UXrBcnI2UXINZkrv4g1tQPhzj02vfdtqYJIDiA'

os.environ["VECTARA_CUSTOMER_ID"] = customer_id
os.environ["VECTARA_CORPUS_ID"] = corpus_id
os.environ["VECTARA_API_KEY"] = api_key

In [3]:
from langchain_community.vectorstores import Vectara
from langchain_community.vectorstores.vectara import (
    RerankConfig,
    SummaryConfig,
    VectaraQueryConfig,
)

# Instantiate the Vectara object, pointing it to the corpus as specified by the environment variables
vectara = Vectara()

# Define configuration for generative summary component and create the "retriever" object
summary_config = SummaryConfig(is_enabled=True, max_results=7, response_lang="eng", 
                               prompt_name="vectara-summary-ext-24-05-med-omni")
rerank_config = RerankConfig(reranker="mmr", rerank_k=50, mmr_diversity_bias=0.2)
config = VectaraQueryConfig(k=10, lambda_val=0.005, summary_config=summary_config, rerank_config=rerank_config)

query_str = "What is Vectara?"
rag = vectara.as_rag(config)
res = rag.invoke(query_str)
print(f"We have {len(res['context'])} context documents")
print(res['answer'])

We have 10 context documents
Vectara is an end-to-end platform designed for product builders to integrate powerful generative AI capabilities into applications. It enhances traditional search methods by understanding the context and meaning of data, providing more accurate responses to user queries. Vectara combines keyword-based and semantic search in a hybrid model, allowing for flexible text retrieval. It supports secure data handling by not training on customer data and offers features like customer-managed keys and encryption. Vectara aims to transform data into insights, assisting in decision-making and improving user experiences with context-aware answers [1], [3], [4], [5].


Notice how simple the RAG pipeline is here. It does not require an OpenAI key or any other external service; everything is done inside the Vectara RAG platform.

To set things up we have configured:
- `SummaryConfig`: used to specify parameters for the generative summarizer, such as the language of the response, the number of top results to include in the summary, or the summarizer (prompt) name.
- `RerankConfig`: used to control reranking, providing options like MMR or the multi-lingual reranker.
- `VectaraQueryConfig`: provides the overall configuration structure to control the RAG pipeline.

With this configuration, all you have to do is call `vectara.as_rag(config)` and you get a LangChain `Runnable` object on which you can run `invoke()` or `stream()`. 

To learn more about configuration parameters for summarization see [this document](https://github.com/langchain-ai/langchain/blob/1e748a6d406fc4ed5c3ca1218f4990e6a45530f3/docs/docs/integrations/providers/vectara/index.mdx#vectara-for-retrieval-augmented-generation-rag)
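
To give a feel for what a setting like `mmr_diversity_bias` trades off, here is a minimal pure-Python sketch of generic maximal marginal relevance. It is not Vectara's implementation: the function name `mmr_rerank`, the toy vectors, and the exact scoring formula are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_rerank(query_vec, doc_vecs, diversity_bias=0.2, top_k=3):
    """Greedily pick document indices by
    (1 - bias) * relevance - bias * (max similarity to already-picked docs)."""
    remaining = list(range(len(doc_vecs)))
    selected = []
    while remaining and len(selected) < top_k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return (1 - diversity_bias) * relevance - diversity_bias * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Two near-duplicate documents (0 and 1) and one different document (2).
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
print(mmr_rerank([1.0, 0.0], docs, diversity_bias=0.0))  # pure relevance order
print(mmr_rerank([1.0, 0.0], docs, diversity_bias=0.7))  # near-duplicate pushed down
```

With a bias of 0 the ranking is by relevance alone; as the bias grows, a result that closely repeats an already selected one is penalized in favor of something different.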

## Hallucination detection and Factual Consistency Score

Vectara created [HHEM](https://huggingface.co/vectara/hallucination_evaluation_model) - an open source model that can be used to evaluate RAG responses for factual consistency.

As part of Vectara RAG, the "Factual Consistency Score" (FCS), an improved version of the open source HHEM, is made available via the API. It is automatically included in the output of the RAG pipeline.

In [4]:
resp1 = rag.invoke("What file types are supported?")
print(resp1["answer"] + '\n')
print(f"Vectara FCS = {resp1['fcs']}")

The supported file types include PDFs, Microsoft Word, Text, HTML, and Markdown [5], [6].

Vectara FCS = 0.46104664
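
One way to use the score is as a gate before an answer is shown to users. The sketch below is illustrative only: the helper name `flag_low_fcs` and the 0.5 cutoff are assumptions, not Vectara recommendations, and a suitable threshold depends on your application. It only relies on the response being a dict with `answer` and `fcs` keys, as in the output above.

```python
def flag_low_fcs(response, threshold=0.5):
    """Hypothetical helper: return (answer, is_trusted) for a RAG response dict."""
    fcs = response.get("fcs")
    # Treat a missing score as untrusted rather than guessing.
    is_trusted = fcs is not None and fcs >= threshold
    return response["answer"], is_trusted

example = {
    "answer": "The supported file types include PDFs, Microsoft Word, Text, HTML, and Markdown [5], [6].",
    "fcs": 0.46104664,
}
answer, trusted = flag_low_fcs(example)
print(trusted)  # False: 0.46 is below the assumed 0.5 cutoff
```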


## Vectara as a Retriever

You can also integrate Vectara purely as a powerful semantic search engine. As with other vector stores in LangChain, you can use Vectara as a `retriever` and take advantage of the standard `similarity_search` method (or `similarity_search_with_score`), which takes a query string and returns a list of results:

In [5]:
config.summary_config.is_enabled = False
config.k = 3

retriever = vectara.as_retriever(config=config)
retriever.invoke("is data encrypted?")

[Document(metadata={'title_level': '1', 'is_title': 'true', 'lang': 'eng', 'source': 'docusaurus', 'url': 'https://docs.vectara.com/docs/learn/data-privacy/encryption', 'title': 'Data Encryption | Vectara Docs'}, page_content='Data Encryption | Vectara Docs When you send documents to the index API or file upload API, Vectara indexes both the document text and metadata. If you choose the “textless” option for corpus creation, Vectara converts the document text into vectors for indexing but does not store the text anywhere in the platform.'),
 Document(metadata={'lang': 'eng', 'offset': '2629', 'len': '136', 'source': 'docusaurus', 'url': 'https://docs.vectara.com/docs/learn/authentication/role-based-access-control', 'title': 'Role-Based Access Control (RBAC) | Vectara Docs'}, page_content='• Encoder swapping. Whether the indexing and querying encoders be swapped to support semantic similarity matching in addition to question-answer matching. • Textless. Defines whether corpora be built 
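
Each returned `Document` carries the source URL and title in its `metadata`, as the output above shows. Below is a minimal sketch of collapsing results into a de-duplicated source list. `unique_sources` is a hypothetical helper, and the small `Doc` dataclass only stands in for LangChain's `Document` so that the snippet runs on its own.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Stand-in for LangChain's Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def unique_sources(docs):
    """Return (title, url) pairs in first-seen order, skipping docs without a URL."""
    seen = set()
    sources = []
    for doc in docs:
        url = doc.metadata.get("url")
        if url and url not in seen:
            seen.add(url)
            sources.append((doc.metadata.get("title", ""), url))
    return sources
```

With the retriever above, this could be called as `unique_sources(retriever.invoke("is data encrypted?"))`.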

## Vectara Chat

Vectara now supports Chat functionality without any additional components. All you have to do is call `as_chat()`, and the resulting bot uses Vectara's native Chat functionality behind the scenes:

In [6]:
summary_config = SummaryConfig(is_enabled=True, max_results=7, response_lang="eng")
rerank_config = RerankConfig(reranker="mmr", rerank_k=50, mmr_diversity_bias=0.2)
config = VectaraQueryConfig(
    k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config
)

bot = vectara.as_chat(config)

In [7]:
bot.invoke("What is the FILE_UPLOAD API?")["answer"]

'The FILE_UPLOAD API by Vectara allows extraction of text from unstructured documents like PDFs and Microsoft Word files, supporting common file types[1]. It has a 10 MB file size limit and is recommended when custom extraction logic is not in place[2][4]. The API enables attaching user-defined metadata at the document level for optimized searches[4][6]. When files are sent to the API, both the text and metadata are indexed by Vectara, with an option to convert text into vectors for indexing while not storing the text in the platform if the "textless" option is chosen[7].'

In [8]:
bot.invoke("how is it different than standard indexing API?")["answer"]

'The FILE_UPLOAD API and the standard indexing API offered by Vectara serve different purposes. The FILE_UPLOAD API allows you to extract text from unstructured documents like PDFs or Microsoft Word files, with the option to include user-defined metadata at the document level [6]. It is suitable when you have not created your own extraction logic [5]. On the other hand, the standard indexing API is recommended for structured data and provides more control over data segmentation and indexing processes [3]. It transforms structured data into a searchable format quickly and supports various data formats by allowing specification of multiple document attributes and metadata [3]. Each API caters to specific needs based on the nature of the documents and the level of control required during the indexing process.'
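
The answers above embed bracketed citation markers such as `[1]` or `[2][4]`. Here is a minimal sketch for pulling them out, for example to list the cited passages next to the answer. `cited_indices` is a hypothetical helper, and treating the numbers as 1-based positions in the retrieved context is an assumption you should verify against your own responses.

```python
import re

def cited_indices(answer):
    """Return the distinct citation numbers in an answer, in order of first appearance."""
    seen = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen

sample = "supporting common file types[1]. It has a 10 MB limit[2][4]. Metadata can be attached[4][6]."
print(cited_indices(sample))  # [1, 2, 4, 6]
```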

## Advanced LangChain query pre-processing with Vectara

Vectara's "RAG as a service" does a lot of the heavy lifting in creating question answering or chatbot chains. The integration with LangChain provides the option to use additional capabilities such as query pre-processing like SelfQueryRetriever or [MultiQueryRetriever](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/MultiQueryRetriever/). Let's look at an example of using the MultiQueryRetriever.

Since MQR uses an LLM we have to set that up - here we choose ChatOpenAI:

In [9]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)
mqr = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)

def get_summary(documents):
    return documents[-1].page_content

(mqr | get_summary).invoke(query_str)


'Vectara provides a Hybrid Search that offers a powerful and flexible approach to text retrieval. We combine partial, exact, and Boolean text matching with neural models which blends traditional, keyword-based search with semantic search in what is called "hybrid" retrieval model. For example, Vectara enables you to do the following:\n• Include exact keyword matches for occasions where a search term was absent from Vectara\'s training data (e.g. product SKUs)\n• Disable neural retrieval entirely, and instead use exact term matching\n• Incorporate typical keyword modifiers like a function, exact phrase matching, and wildcard prefixes of terms\n\nThe exact and Boolean text matching (similar to a traditional, keyword-based search) is disabled by default and Vectara only uses neural retrieval. You can enable hybrid search by specifying a value, , at query time, specifically under the . This value can range from to (inclusive).'
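
Conceptually, a multi-query retriever asks an LLM for several rephrasings of the question, runs each one through the underlying retriever, and returns the de-duplicated union of the results. The sketch below illustrates that idea only. It is not LangChain's implementation, and `multi_query_retrieve`, `fake_rephrase`, and `fake_search` are hypothetical stand-ins (in the cell above, those roles are played by `ChatOpenAI` and the Vectara retriever).

```python
def multi_query_retrieve(question, rephrase, search):
    """Run `search` for each rephrasing of the question; return unique results in order."""
    seen = set()
    merged = []
    for query in rephrase(question):
        for text in search(query):
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged

# Toy stand-ins so the sketch runs without an LLM or a corpus.
def fake_rephrase(question):
    return [question.lower(), question.replace("What is", "Describe")]

def fake_search(query):
    index = {
        "what is vectara?": ["doc-a", "doc-b"],
        "Describe Vectara?": ["doc-b", "doc-c"],
    }
    return index.get(query, [])

print(multi_query_retrieve("What is Vectara?", fake_rephrase, fake_search))
# ['doc-a', 'doc-b', 'doc-c']
```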

## Agentic RAG with Vectara

Agentic RAG is a powerful methodology to provide your RAG implementation more agency with approaches like ReAct.
The code below demonstrates how to use Vectara with LangChain to create an agent that uses Vectara for RAG.

In [10]:
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool, StructuredTool, tool

@tool
def vectara_tool(
    query: str = Field(description="the query string."), 
)-> str:
    """A tool for getting answers to questions about Vectara's product and API."""
    summary_config = SummaryConfig(is_enabled=True, max_results=5, 
                                   response_lang="eng", prompt_name="vectara-summary-ext-24-05-sml")
    rerank_config = RerankConfig(reranker="mmr", rerank_k=50, mmr_diversity_bias=0.2)
    config = VectaraQueryConfig(
        k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config
    )
    
    rag = vectara.as_rag(config)
    return rag.invoke(query)['answer']
    
tools = [vectara_tool]
llm = ChatOpenAI(model='gpt-4o', temperature=0)

prompt = hub.pull("hwchase17/react")
prompt.template = '''
Answer the following question as best you can. 

You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action

(This Thought/Action/Action Input/Observation can repeat N times)

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Instructions:
- If a tool responds with an incorrect response, you can rephrase your question and try again.
- Base your response primarily on information provided by tools and not prior knowledge.
- Tools respond better to shorter and more concise queries, so try to break down complex questions into simpler sub-questions.



Begin!
Question: {input}
Thought:{agent_scratchpad}
'''


# Construct the ReAct agent
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

print(agent_executor.invoke({"input": "What is an API key? What is a JWT token? when should I use one or the other?"})['output'])


> Entering new AgentExecutor chain...
To answer your question, I will first gather information about API keys and JWT tokens using the Vectara tool.

Action: vectara_tool
Action Input: "What is an API key?"
An API key is a unique code that grants access to specific functionalities within a system. It allows users to perform various operations such as querying, indexing, and managing data. API keys can be easily created, managed, and revoked to ensure security. They are used for controlled and anonymous access, simplifying integration with external systems like websites. API keys can have different levels of permissions, with some keys providing read-only access while others allow both read and write operations. It is essential to handle API keys with caution, similar to passwords, especially in production environments.
I have gathered information about API keys. Now, I will gather information about JWT tokens using the Vectara too

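The agent run follows the Thought / Action / Action Input / Observation format defined in the prompt. To make the loop concrete, here is a minimal sketch of how one step of such output could be parsed. `parse_react_step` is a hypothetical helper written for illustration; it is not the parser LangChain uses internally.

```python
import re

def parse_react_step(text):
    """Return ("final", answer) or ("action", tool_name, tool_input) for one LLM turn."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    action = re.search(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", text, re.DOTALL)
    if action is None:
        raise ValueError("Could not find an Action or a Final Answer")
    tool_name = action.group(1).strip()
    # Drop surrounding whitespace and the optional double quotes around the input.
    tool_input = action.group(2).strip().strip('"')
    return ("action", tool_name, tool_input)

step = 'I will use the tool.\n\nAction: vectara_tool\nAction Input: "What is an API key?"'
print(parse_react_step(step))
# ('action', 'vectara_tool', 'What is an API key?')
```

An executor would call the named tool with the parsed input, append the result as an `Observation:` line, and ask the LLM for the next step until a final answer appears.
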
## LangChain Vectara Templates

LangChain templates offer a collection of easily deployable reference architectures, and there are two templates for using Vectara:
* [RAG](https://github.com/langchain-ai/langchain/tree/master/templates/rag-vectara) template for basic RAG.
* [RAG with multi-query](https://github.com/langchain-ai/langchain/tree/master/templates/rag-vectara-multiquery) for using Vectara RAG with the multi-query retriever.


## Data Ingestion into Vectara with LangChain

Even though it is more common to use Vectara with LangChain for query purposes, it is also possible to ingest data into Vectara via LangChain. There are two main functions that are useful for this purpose: `add_texts` (or `add_documents`, which is similar with a slightly different interface) and `add_files`.

For `add_texts` the input is simply a set of text strings:

```python
vectara.add_texts(["to be or not to be", "that is the question"])
```

A common pattern is to use one of LangChain's data upload classes, extract the text from there, and then upload the text. Note that no chunking is necessary (although it is optional) in this case since Vectara performs its own optimal chunking.
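
As a minimal sketch of that pattern, the snippet below reads plain-text files from a folder and hands their contents to any object exposing `add_texts(texts, metadatas=...)`, the standard LangChain vector store signature. The helper name `ingest_text_files`, the `source` metadata key, and the folder layout are assumptions made for illustration.

```python
from pathlib import Path

def ingest_text_files(store, folder, pattern="*.txt"):
    """Hypothetical helper: upload each matching file as one text, unchunked."""
    paths = sorted(Path(folder).glob(pattern))
    texts = [path.read_text(encoding="utf-8") for path in paths]
    # Record the file name so results can be traced back to their source.
    metadatas = [{"source": path.name} for path in paths]
    if texts:
        store.add_texts(texts, metadatas=metadatas)
    return len(texts)
```

With the `vectara` object created earlier, this would be called as `ingest_text_files(vectara, "path/to/folder")`.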

Since Vectara supports [file upload](https://docs.vectara.com/docs/api-reference/indexing-apis/file-upload/file-upload) natively, we also added the ability to upload files (PDF, TXT, HTML, PPT, DOC, etc) directly in the LangChain class. When using this method, the file is uploaded directly to the Vectara platform, processed and chunked optimally there, so you don't have to use the LangChain document loader or chunking mechanism.

As an example:

```python
vectara.add_files(["path/to/file1.pdf", "path/to/file2.pdf",...])
```