<a href="https://colab.research.google.com/github/vectara/example-notebooks/blob/main/notebooks/using-vectara-with-langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectara and LangChain

In [1]:
!pip install -U langchain langchain_community langchain_openai


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3.2[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


[Vectara](https://vectara.com/) is a platform that provides Retrieval Augmented Generation (RAG) as a service. It includes:

1. A way to extract text from document files and chunk them into sentences.

2. The state-of-the-art [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model. Each text chunk is encoded into a vector embedding using Boomerang and stored in Vectara's internal knowledge (vector+text) store.

3. A query service that automatically encodes the query into an embedding and retrieves the most relevant text segments (including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) and [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/)).

4. An option to create a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents, including citations.

See the [Vectara API documentation](https://docs.vectara.com/docs/) for more information on how to use the API.

This notebook shows some examples of how to use Vectara with LangChain.

## Setup

You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps:
1. [Sign up](https://www.vectara.com/integrations/langchain) for a Vectara account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.
2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **"Create Corpus"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.
3. Next you'll need to create API keys to access the corpus. Click on the **"Access Control"** tab in the corpus view and then the **"Create API Key"** button. Give your key a name, and choose whether you want query only or query+index for your key. Click "Create" and you now have an active API key. Keep this key confidential. 

To use LangChain with Vectara, you'll need to provide your `customer ID`, `corpus ID` and an `api_key` to the LangChain Vectara class. You can do this in two ways:

1. Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`.

> For example, you can set these variables using `os.environ` as follows (this points to a public Vectara corpus where the contents of vectara.com are indexed):

```python
import os

customer_id = '1366999410'
corpus_id = '1'
api_key = 'zqt_UXrBcnI2UXINZkrv4g1tQPhzj02vfdtqYJIDiA'

os.environ["VECTARA_CUSTOMER_ID"] = customer_id
os.environ["VECTARA_CORPUS_ID"] = corpus_id
os.environ["VECTARA_API_KEY"] = api_key
```

2. Add them explicitly to the Vectara constructor:

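```python
from langchain_community.vectorstores import Vectara

# Pass the credentials explicitly instead of relying on environment variables
vectorstore = Vectara(
    vectara_customer_id=customer_id,
    vectara_corpus_id=corpus_id,
    vectara_api_key=api_key,
)
```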
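A missing setting may only surface later as a failed request, so it can help to check the three values up front. The following is a minimal sketch using only the standard library; the helper name `vectara_kwargs_from_env` is hypothetical, while the environment variable names and constructor arguments are the ones shown above.

```python
import os

# Environment variable -> Vectara constructor argument, as used in this notebook.
ENV_TO_KWARG = {
    "VECTARA_CUSTOMER_ID": "vectara_customer_id",
    "VECTARA_CORPUS_ID": "vectara_corpus_id",
    "VECTARA_API_KEY": "vectara_api_key",
}


def vectara_kwargs_from_env(env=None):
    """Hypothetical helper: collect constructor kwargs, failing early if any is unset."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise KeyError(f"Missing Vectara settings: {', '.join(missing)}")
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}
```

With the variables set, the result can be passed straight to the constructor, for example `Vectara(**vectara_kwargs_from_env())`.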

## Vectara: RAG-as-a-service

Vectara is not just a vector DB; it is a full RAG-as-a-service platform.
Yes, we have our own internal implementation of a scalable and serverless vector store, but that is just one of the components needed to implement RAG. The other components include text extraction, chunking, the Boomerang embedding model, advanced retrieval such as hybrid search or MMR, and more.

You can ingest data into Vectara directly using Vectara's [indexing API](https://docs.vectara.com/docs/api-reference/indexing-apis/indexing), using a tool like [vectara-ingest](https://github.com/vectara/vectara-ingest), or via the Vectara LangChain component. We will explore data ingestion later in this notebook. For now, let's assume you have already ingested data into your Vectara corpus and see how querying works.

We will utilize LangChain [LCEL](https://python.langchain.com/docs/expression_language/) which provides a nice syntax for chaining in LangChain:

In [2]:
import os

customer_id = '1366999410'
corpus_id = '1'
api_key = 'zqt_UXrBcnI2UXINZkrv4g1tQPhzj02vfdtqYJIDiA'

os.environ["VECTARA_CUSTOMER_ID"] = customer_id
os.environ["VECTARA_CORPUS_ID"] = corpus_id
os.environ["VECTARA_API_KEY"] = api_key

In [3]:
from langchain_community.vectorstores import Vectara

# Instantiate the Vectara object, pointing it to the corpus as specified by the environment variables
vectara = Vectara()

# Define configuration for generative summary component and create the "retriever" object
summary_config = {
    "is_enabled": True, "max_results": 4, 
    "response_lang": "en",
    "prompt_name": "vectara-experimental-summary-ext-2023-10-23-small"
}
retriever = vectara.as_retriever(
    search_kwargs={ "k": 10, "summary_config": summary_config }
)

# The output of a query from Langchain is an array
# The first K documents are the source documents, and entry K+1 is the summary
# So we create convenience functions to grab those two pieces.
def get_sources(documents):
    return documents[:-1]
def get_summary(documents):
    return documents[-1].page_content

# Let's run a query
query_str = "what is Vectara?"
(retriever | get_summary).invoke(query_str)

"Vectara is an end-to-end platform that empowers product builders to embed powerful generative AI capabilities into applications. It offers significant improvements over traditional searches by understanding the context and meaning of data [1]. The platform enables developers to build a wide range of applications with powerful search experiences, without the risk of data or privacy violations [2]. Vectara provides a hybrid search approach that combines partial, exact, and Boolean text matching with neural models, allowing for more flexible and accurate retrieval of information [3]. It never trains on customer data, ensuring the security of user information [4]. Vectara's goal is to deliver contextually accurate responses and actions by deploying advanced zero-shot models and conversational search capabilities [4]. Overall, Vectara aims to revolutionize search technology and provide more accurate and insightful results to assist decision-making processes [1]."

Notice how simple the RAG pipeline is here. It does not require an OpenAI key or any other external service; everything is done inside the Vectara RAG platform.

All you have to do is specify `summary_config`, with the following arguments:
- `is_enabled`: True or False
- `max_results`: number of results to use for summary generation
- `response_lang`: language of the response summary, as an ISO 639-1 code (e.g. 'en', 'fr', 'de', etc)
- `prompt_name`: name of the summarization prompt to use, as in the example above
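The arguments above can be assembled with a small helper so that the same settings are reused across retrievers. This is only a sketch: `make_summary_config` is a hypothetical name, and the dictionary keys are the ones used in the cell above.

```python
def make_summary_config(max_results=4, response_lang="en", prompt_name=None, is_enabled=True):
    """Hypothetical helper: build the summary_config dict passed via search_kwargs."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    config = {
        "is_enabled": is_enabled,
        "max_results": max_results,
        "response_lang": response_lang,
    }
    # Only include prompt_name when the caller asks for a specific prompt.
    if prompt_name is not None:
        config["prompt_name"] = prompt_name
    return config


# Same shape as the search_kwargs used when creating the retriever above.
search_kwargs = {"k": 10, "summary_config": make_summary_config(max_results=4)}
```

The resulting dictionary would then be passed as `vectara.as_retriever(search_kwargs=search_kwargs)`.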

Using Vectara as a LangChain retriever also allows us to take advantage of several query pre-processing capabilities that are part of LangChain, such as self-query and multi-query.

## Vectara Semantic Search

You can also use Vectara purely as a powerful semantic search engine. As with other vector stores in LangChain, you can use the `similarity_search` method (or `similarity_search_with_score`), which takes a query string and returns a list of results:

```python
results = vectara.similarity_search_with_score("what is Vectara?")
```
The results are returned as a list of relevant documents, each paired with its relevance score.

In this case, we used the default retrieval parameters, but you can also specify the following additional arguments in `similarity_search` or `similarity_search_with_score`:
- `k`: number of results to return (defaults to 5)
- `lambda_val`: the [lexical matching](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) factor for hybrid search (defaults to 0.025)
- `filter`: a [filter](https://docs.vectara.com/docs/common-use-cases/filtering-by-metadata/filter-overview) to apply to the results (default None)
- `n_sentence_context`: number of sentences to include before/after the actual matching segment when returning results. This defaults to 2.
- `mmr_config`: can be used to specify MMR mode in the query.
   - `is_enabled`: True or False
   - `mmr_k`: number of results to use for MMR reranking
   - `diversity_bias`: 0 = no diversity, 1 = full diversity. This is the lambda parameter in the MMR formula and is in the range 0...1
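To make the role of `diversity_bias` concrete, here is a generic greedy MMR sketch in plain Python. It is not Vectara's implementation: the function name `mmr_rerank`, the toy scores, and the exact scoring rule are assumptions, chosen to match the convention above (0 = no diversity, 1 = full diversity).

```python
def mmr_rerank(relevance, similarity, diversity_bias=0.3, top_n=None):
    """Greedy MMR sketch (illustrative only).

    relevance:  query-relevance score per candidate.
    similarity: square matrix of candidate-to-candidate similarity.
    Returns candidate indices in the order they are selected.
    """
    if not 0.0 <= diversity_bias <= 1.0:
        raise ValueError("diversity_bias must be in the range 0...1")
    n = len(relevance)
    top_n = n if top_n is None else min(top_n, n)
    selected, remaining = [], list(range(n))
    while remaining and len(selected) < top_n:
        def score(i):
            # Penalize candidates that resemble something already selected.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return (1 - diversity_bias) * relevance[i] - diversity_bias * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: candidates 0 and 1 are near-duplicates, candidate 2 is different.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
order_plain = mmr_rerank(relevance, similarity, diversity_bias=0.0)    # [0, 1, 2]
order_diverse = mmr_rerank(relevance, similarity, diversity_bias=0.5)  # [0, 2, 1]
```

With no diversity bias the order follows relevance alone; with a bias of 0.5 the near-duplicate candidate is pushed behind the less relevant but more distinct one.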


In [4]:
query_str = "what is Vectara?"
(retriever | get_sources).invoke(query_str)

[Document(page_content='What is the Vectara Platform? | Vectara Docs Welcome to the documentation homepage for Vectara , an end-to-end platform for product builders to embed powerful generative AI capabilities into applications with extraordinary results. Vectara offers significant improvements over traditional searches by understanding the context and meaning of your data. This revolutionary technology enables Vectara to drive insights and provide more accurate responses to user queries, assisting decision-making processes.', metadata={'lang': 'eng', 'offset': '0', 'len': '186', 'source': 'docusaurus', 'url': 'https://docs.vectara.com/docs', 'title': 'What is the Vectara Platform? | Vectara Docs'}),
 Document(page_content="The Vectara Generative AI platform enables developers with the flexibility to build a wide range of applications with powerful search experiences. The Vectara platform never trains on customer data which enables businesses to embed generative AI capabilities without

## Using LangChain's MultiQueryRetriever with Vectara

One of the great features of LangChain is the availability of advanced retrievers such as the MultiQueryRetriever.

The MultiQueryRetriever uses an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. 
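The "unique union" step described above can be sketched in a few lines of plain Python. This is an illustration rather than LangChain's own code: `Doc` is a stand-in for a LangChain document, the query variants are hard-coded instead of generated by an LLM, and deduplicating on `page_content` is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (assumption for this sketch)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_union(result_lists):
    """Merge result lists, keeping the first occurrence of each distinct text."""
    seen, merged = set(), []
    for results in result_lists:
        for doc in results:
            if doc.page_content not in seen:
                seen.add(doc.page_content)
                merged.append(doc)
    return merged


def multi_query_retrieve(queries, retrieve):
    """Run `retrieve` once per query variant and merge the results."""
    return unique_union([retrieve(q) for q in queries])


# Toy index standing in for a real retriever.
TOY_INDEX = {
    "hybrid search": [
        Doc("Hybrid search blends keyword and neural retrieval."),
        Doc("MMR reranks results for diversity."),
    ],
    "MMR reranking": [
        Doc("MMR reranks results for diversity."),
        Doc("Boomerang is the embedding model."),
    ],
}
merged = multi_query_retrieve(
    ["hybrid search", "MMR reranking"], lambda q: TOY_INDEX.get(q, [])
)
print([d.page_content for d in merged])  # three distinct texts, duplicates dropped
```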

Let's see how to use the MultiQueryRetriever with Vectara. In this case, you do need to use OpenAI directly from LangChain:

In [5]:
from langchain_openai import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

query_str = "Does Vectara support hybrid search and MMR in a single platform?"

llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo-preview")
mqr = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)

resp1 = (retriever | get_summary).invoke(query_str)
resp2 = (mqr | get_summary).invoke(query_str)

print(f"Response with regular query = {resp1}\n")

print(f"Response with MQ = {resp2}")

Response with regular query = Vectara supports hybrid search, combining keyword-based search with semantic search in a single platform [3]. It offers a flexible approach to text retrieval, blending traditional, keyword-based search with neural models [2]. By default, Vectara uses semantic similarity, but it can introduce keyword-focused algorithms to improve relevance [4]. This platform enables users to embed powerful hybrid search into their applications through simple APIs [1]. Vectara's hybrid search allows for exact keyword matches, disabling neural retrieval, and incorporating keyword modifiers [3]. It aims to deliver contextually accurate responses and generate better outcomes [2].

Response with MQ = The Vectara platform integrates hybrid search with MMR functionality in one solution. It combines partial, exact, and Boolean text matching with neural models, enabling a powerful and flexible approach to text retrieval [3]. Vectara deploys advanced zero-shot models and conversation

## SelfQuery Retriever

Another query pre-processing capability worth mentioning is the `SelfQueryRetriever`, which transforms a user query into a sub-query plus a set of filtering conditions.

For example, in an e-commerce dataset, a query like "which products cost at least 100 dollars and are blue" might be converted into "blue products" with a filtering condition of "doc.price >= 100" (assuming "price" is a metadata field).

You can find a complete example for using the `SelfQueryRetriever` [here](https://python.langchain.com/docs/integrations/retrievers/self_query/vectara_self_query).
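To illustrate what the result of such a transformation looks like, the sketch below builds the sub-query and filter by hand rather than with an LLM. The names `StructuredQuery` and `build_filter` are hypothetical, and the operator set, the quoting of string values, and joining conditions with `and` are assumptions modeled on the `doc.price` style shown above.

```python
from dataclasses import dataclass

ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


@dataclass
class StructuredQuery:
    """Hypothetical container for a rewritten query and its metadata filter."""
    query: str
    filter: str


def build_filter(conditions):
    """Turn (field, operator, value) triples into a `doc.<field> ...` filter string."""
    parts = []
    for field_name, operator, value in conditions:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"Unsupported operator: {operator}")
        # Assumption: string values are single-quoted, numbers are written as-is.
        literal = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"doc.{field_name} {operator} {literal}")
    return " and ".join(parts)


# "which products cost at least 100 dollars and are blue"
structured = StructuredQuery(
    query="blue products",
    filter=build_filter([("price", ">=", 100)]),
)
print(structured.filter)  # doc.price >= 100
```

The two pieces could then be used with the `filter` argument described earlier, for example `vectara.similarity_search(structured.query, filter=structured.filter)`.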

## LangChain Vectara Templates

LangChain templates offer a collection of easily deployable reference architectures, and there are two templates for using Vectara:
* [RAG](https://github.com/langchain-ai/langchain/tree/master/templates/rag-vectara) template for basic RAG.
* [RAG with multi-query](https://github.com/langchain-ai/langchain/tree/master/templates/rag-vectara-multiquery) for using Vectara RAG with the multi-query retriever.


## Data Ingestion with LangChain

Even though it is more common to use Vectara with LangChain for querying, it is also possible to ingest data into Vectara via LangChain. Two main functions are useful for this purpose: `add_texts` (or `add_documents`, which is similar but has a slightly different interface) and `add_files`.

For `add_texts` the input is simply a set of text strings:

```python
vectara.add_texts(["to be or not to be", "that is the question"])
```

A common pattern is to use one of LangChain's document loaders, extract the text from the loaded documents, and then upload that text. Note that chunking is optional in this case, since Vectara performs its own chunking.
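That pattern might look like the following sketch. The helper name `upload_loaded_documents` is hypothetical; it assumes only that each loaded document exposes `page_content` and `metadata`, and that the vector store accepts `add_texts(texts, metadatas=...)`.

```python
def upload_loaded_documents(vectorstore, documents):
    """Send the text of already-loaded documents to the vector store, unchunked."""
    # Skip documents whose text is empty or whitespace only.
    kept = [doc for doc in documents if doc.page_content.strip()]
    if not kept:
        return []
    texts = [doc.page_content for doc in kept]
    metadatas = [dict(doc.metadata) for doc in kept]
    return vectorstore.add_texts(texts, metadatas=metadatas)
```

For instance, with a loader such as `TextLoader` from `langchain_community.document_loaders` and a hypothetical file path, this could be called as `upload_loaded_documents(vectara, TextLoader("path/to/file.txt").load())`.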

Since Vectara supports [file upload](https://docs.vectara.com/docs/api-reference/indexing-apis/file-upload/file-upload) natively, we also added the ability to upload files (PDF, TXT, HTML, PPT, DOC, etc) directly in the LangChain class. When using this method, the file is uploaded directly to the Vectara platform, processed and chunked optimally there, so you don't have to use the LangChain document loader or chunking mechanism.

As an example:

```python
vectara.add_files(["path/to/file1.pdf", "path/to/file2.pdf",...])
```