<a href="https://colab.research.google.com/github/vectara/example-notebooks/blob/main/notebooks/using-vectara-with-llamaindex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectara and LlamaIndex

## About Vectara

[Vectara](https://vectara.com/) is the trusted GenAI and semantic search platform that provides an easy-to-use API for document indexing and querying. 

Vectara provides an end-to-end managed service for Retrieval Augmented Generation or [RAG](https://vectara.com/grounded-generation/), which includes:

1. A way to extract text from document files and chunk them into sentences.

2. The state-of-the-art [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model. Each text chunk is encoded into a vector embedding using Boomerang, and stored in the Vectara internal knowledge (vector+text) store. Thus, when using Vectara with LlamaIndex you do not need to call a separate embedding model - this happens automatically within the Vectara backend.

3. A query service that automatically encodes the query into embeddings and retrieves the most relevant text segments (including support for [hybrid search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) and [reranking](https://docs.vectara.com/docs/api-reference/search-apis/reranking)).

4. An option to create a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview) with a wide selection of LLM summarizers (including Vectara's [Mockingbird](https://vectara.com/blog/mockingbird-is-a-rag-specific-llm-that-beats-gpt-4-gemini-1-5-pro-in-rag-output-quality/), trained specifically for RAG-based tasks), based on the retrieved documents, including citations.

See the [Vectara API documentation](https://docs.vectara.com/docs/) for more information on how to use the API.

The main benefits of using Vectara for a RAG application are:
* **Ease of use**: Vectara provides an end-to-end, fully functional, highly scalable, and robust RAG pipeline, so as a user you don't have to code up these pieces and maintain them over time.
* **Scalability and Security**: Building GenAI applications may seem easy at first, but the DIY approach can become overwhelming beyond simple examples. Vectara provides instant scalability to millions of documents, while maintaining data security and privacy, as well as latency SLAs.

## About LlamaIndex

LlamaIndex is a "data framework" to help you build LLM apps:

1. It includes **data connectors** to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)
2. It provides ways to **structure your data** (indices, graphs) so that this data can be easily used with LLMs.
3. It provides an **advanced retrieval/query interface over your data**: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.

LlamaIndex's high-level API allows beginner users to use LlamaIndex to ingest and query their data in just a few lines of code, whereas its lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules), to fit their needs.

Vectara is implemented in LlamaIndex as a [Managed Service](https://docs.llamaindex.ai/en/stable/community/integrations/managed_indices.html#vectara), which abstracts Vectara's powerful APIs so they are easily integrated into LlamaIndex.

In this notebook, we will demonstrate some of the great ways you can use Vectara together with LlamaIndex.

## Getting Started

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

To get started with Vectara, [sign up](https://vectara.com/integrations/llamaindex) (if you haven't already) and follow our [quickstart](https://docs.vectara.com/docs/quickstart) guide to create a corpus and an API key. 

Once you have these, you can provide them as environment variables, which will be used by the LlamaIndex code later on:

In [1]:
#!pip install -U llama-index llama-index-indices-managed-vectara arxiv

import os
# os.environ['VECTARA_API_KEY'] = "<YOUR_VECTARA_API_KEY>"
# os.environ['VECTARA_CORPUS_ID'] = "<YOUR_VECTARA_CORPUS_ID>"
# os.environ['VECTARA_CUSTOMER_ID'] = "<YOUR_VECTARA_CUSTOMER_ID>"
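
Before going further, it can help to confirm that all three variables are actually set. The snippet below is a minimal sketch; `missing_vectara_vars` is a hypothetical helper written for this notebook and is not part of LlamaIndex or Vectara.

```python
import os

# Names of the environment variables this notebook relies on (from the cell above).
REQUIRED_VARS = ["VECTARA_API_KEY", "VECTARA_CORPUS_ID", "VECTARA_CUSTOMER_ID"]


def missing_vectara_vars(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vectara_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All Vectara environment variables are set.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try without touching your real environment.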

## Loading Data Into Vectara

As mentioned above, Vectara is a RAG managed service, and in many cases data may be uploaded to the index ahead of time (e.g. by using [Airbyte](https://docs.airbyte.com/integrations/destinations/vectara), directly via Vectara's [indexing API](https://docs.vectara.com/docs/api-reference/indexing-apis/indexing) or using tools like [vectara-ingest](https://github.com/vectara/vectara-ingest)), but another easy way is via the VectaraIndex constructor: `from_documents()`.

For this notebook, we will assume the Vectara corpus is empty and will load PDF documents from Arxiv, using Python's [arxiv](https://github.com/lukasschwab/arxiv.py) library. We will pull in the top papers whose titles mention "embedding model" or "sentence embedding":

In [2]:
import arxiv

client = arxiv.Client()
search = arxiv.Search(
  query = "(ti:embedding model) OR (ti:sentence embedding)",
  max_results = 100,
  sort_by = arxiv.SortCriterion.Relevance
)
papers = list(client.results(search))

In [3]:
[p.entry_id for p in papers][:5]

['http://arxiv.org/abs/2402.14776v2',
 'http://arxiv.org/abs/2007.01852v2',
 'http://arxiv.org/abs/1910.13291v1',
 'http://arxiv.org/abs/2104.06719v1',
 'http://arxiv.org/abs/1511.08198v3']

Next, download each Arxiv paper and upload it into Vectara using `insert_file()`.

In [4]:
import shutil
from llama_index.indices.managed.vectara import VectaraIndex

data_folder = 'temp'
os.makedirs(data_folder, exist_ok=True)

# Create Vectara Index
index = VectaraIndex()

# Upload content for all papers
for paper in papers:
    try:
        paper_fname = paper.download_pdf(data_folder)
    except Exception as e:
        # paper_fname is not assigned when the download fails, so report the entry ID instead
        print(f"Paper {paper.entry_id} failed to download with error {e}")
        continue
    metadata = {
        'url': paper.pdf_url,
        'title': paper.title,
        'author': str(paper.authors[0]),
        'published': str(paper.published.date())
    }
    index.insert_file(file_path=paper_fname, metadata=metadata)

shutil.rmtree(data_folder)
del papers

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /opt/anaconda3/envs/llama_index/lib/python3.10/site-
[nltk_data]     packages/llama_index/core/_static/nltk_cache...
[nltk_data]   Package punkt_tab is already up-to-date!


File temp/1909.03104v2.Efficient_Sentence_Embedding_using_Discrete_Cosine_Transform.pdf failed to load with error HTTP Error 404: Not Found


Two important things to note here:
1. Vectara processes each file uploaded on the backend, and performs appropriate chunking. So you don't need to apply any local processing, or choose a chunking strategy. 
2. We have used the fields `url`, `title`, `author`, and `published` as metadata fields (for simplicity, author is the first author if there are multiple). You will need to make sure those fields are defined in your Vectara corpus as [filterable metadata fields](https://docs.vectara.com/docs/learn/metadata-search-filtering/filter-overview) so that we can filter by them at query time.

So that's it for upload. 

## Querying with the VectaraIndex
We can now ask questions using the `VectaraIndex` object.

In [5]:
query = "What is sentence embedding?"


In [6]:
query_engine = index.as_query_engine(
    summary_enabled=True, summary_num_results=5,
    summary_response_lang="eng",
    summary_prompt_name="mockingbird-1.0-2024-07-16"
)
res = query_engine.query(query)
print(res.response)

Based on the provided sources, sentence embedding is a technique that represents words or sentences in a text in a vector space, where words or sentences that are closer in the vector space are more similar [2]. It is a vital technique in practical applications, enabling the extraction of parallel data from extensive web corpora across two languages, enhancing the training of high-quality machine translation and multilingual language models [1]. Sentence embedding is also instrumental for zero-shot cross-lingual retrieval, an essential method for enabling cross-lingual searches on e-commerce platforms like Amazon [1].

In the context of machine learning, sentence embedding is a form of word or sentence representation by learned representations that prepare texts in an understandable format for a machine [2]. It is used in various applications, including entailment classification, where a softmax inference classifier is used to classify sentences based on their embeddings [4][5].

Sourc

Note that the response here is fully generated by Vectara. There is no additional LLM involved (or API key you need to set up). The response also includes citations (marked in square brackets), which point to the references Vectara used to generate this response.

The `res` object includes the actual response to the user query, but also has the citations:

In [9]:
[(inx, n.node.metadata['url']) for inx, n in enumerate(res.source_nodes)]

[(0, 'http://arxiv.org/pdf/2205.15744v2'),
 (1, 'http://arxiv.org/pdf/2206.02690v3'),
 (2, 'http://arxiv.org/pdf/1904.05542v1'),
 (3, 'http://arxiv.org/pdf/1904.05542v1'),
 (4, 'http://arxiv.org/pdf/1904.05542v1'),
 (5, 'http://arxiv.org/pdf/1904.05542v1'),
 (6, 'http://arxiv.org/pdf/1904.05542v1'),
 (7, 'http://arxiv.org/pdf/1904.05542v1'),
 (8, 'http://arxiv.org/pdf/1904.05542v1'),
 (9, 'http://arxiv.org/pdf/2404.03921v2')]
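
Several source nodes can come from the same paper, as the repeated URLs above show. Here is a minimal sketch for collapsing them into a unique list that preserves ranking order; `unique_sources` is a hypothetical helper, and the sample URLs are copied from the output above.

```python
def unique_sources(urls):
    """Return URLs in first-seen order with duplicates removed."""
    # dict preserves insertion order, so the original ranking order is kept.
    return list(dict.fromkeys(urls))


# Sample values copied from the cell output above.
urls = [
    "http://arxiv.org/pdf/2205.15744v2",
    "http://arxiv.org/pdf/2206.02690v3",
    "http://arxiv.org/pdf/1904.05542v1",
    "http://arxiv.org/pdf/1904.05542v1",
    "http://arxiv.org/pdf/2404.03921v2",
]
print(unique_sources(urls))
```

With a live response you would pass `[n.node.metadata['url'] for n in res.source_nodes]` instead of the hard-coded list.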

## Using Streaming

You can also stream the Vectara response simply by specifying `streaming=True`:

In [11]:
query_engine = index.as_query_engine(
    summary_enabled=True,
    summary_prompt_name="mockingbird-1.0-2024-07-16",
    streaming=True)

res = query_engine.query(query)

# print streamed output chunk by chunk
for chunk in res.response_gen:
    print(chunk.delta or "", end="", flush=True)

Based on the provided sources, sentence embedding is a technique that represents words or sentences in a text in a vector space, where words or sentences that are closer in the vector space are more similar [2]. It is a vital technique in practical applications, enabling the extraction of parallel data from extensive web corpora across two languages, enhancing the training of high-quality machine translation and multilingual language models [1]. Sentence embedding is also instrumental for zero-shot cross-lingual retrieval, an essential method for enabling cross-lingual searches on e-commerce platforms like Amazon [1].

In the context of machine learning, sentence embedding is a form of word or sentence representation by learned representations that prepare texts in an understandable format for a machine [2]. It is used in various applications, including entailment classification, softmax inference, and language translation [4, 5, 6, 7].

Sources: [1], [2], [4], [5], [6], [7]
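
If you also need the complete text once streaming has finished, you can accumulate the chunks while printing them. This is a minimal sketch, assuming each chunk exposes a `delta` attribute as in the loop above; `collect_stream` is a hypothetical helper, and the `SimpleNamespace` objects merely stand in for real chunks.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Print each chunk's delta as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        text = chunk.delta or ""  # guard against an empty delta, as the loop above does
        print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)


# Stand-in chunks; with Vectara you would pass `res.response_gen` instead.
fake_chunks = [SimpleNamespace(delta=d) for d in ["Sentence ", None, "embedding"]]
full_text = collect_stream(fake_chunks)
```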

## Reranking

Vectara supports three types of [reranking](https://docs.vectara.com/docs/api-reference/search-apis/reranking):
1. [Maximal Marginal Relevance](https://docs.vectara.com/docs/learn/mmr-reranker), or MMR, provides a reranking that can promote diversity in results at the cost of relevance.
2. [Slingshot](https://docs.vectara.com/docs/learn/vectara-multi-lingual-reranker) is a multilingual reranker that increases the accuracy of retrieved results across 100+ languages and is available to Vectara Scale customers.
3. [User Defined Functions](https://docs.vectara.com/docs/learn/user-defined-function-reranker) allow you to create your own functions for reranking search results, unlocking better retrieval in a wide variety of use cases, such as sorting by recency or price of a product.

Let's see an example of how to use MMR. We will run the same query, but this time with MMR, where `mmr_diversity_bias=0.3` sets the tradeoff between relevance and diversity (0.0 is full relevance, 1.0 is only diversity):

In [12]:
query_engine = index.as_query_engine(
    similarity_top_k=5,
    reranker="mmr",
    rerank_k=50,
    mmr_diversity_bias=0.3,
)
response = query_engine.query(query)
print(response)

Sentence embedding is a technique used to represent sentences in a text in a way that encodes the meaning of the sentence in a multi-dimensional space. It allows for the extraction of parallel data across languages, aids in machine translation, multilingual language models, and cross-lingual retrieval. Various models exist, such as LASER, SBERT-distill, and LaBSE, each with different training efficiencies and architectures. Sentence embedding is crucial in natural language processing tasks like document classification and sentiment analysis, ensuring the preservation of original sentence meanings in embedded vectors.


In [13]:
[(inx, n.node.metadata['url']) for inx, n in enumerate(response.source_nodes)]

[(0, 'http://arxiv.org/pdf/2205.15744v2'),
 (1, 'http://arxiv.org/pdf/2205.15744v2'),
 (2, 'http://arxiv.org/pdf/2206.02690v3'),
 (3, 'http://arxiv.org/pdf/1904.05542v1'),
 (4, 'http://arxiv.org/pdf/1808.05505v3')]

As you can see, the results are now reranked in a way that provides more diversity instead of maximizing pure relevance. This in turn results in a different set of chunks used to generate the response.
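
To build intuition for the diversity bias, here is a toy, pure-Python sketch of greedy MMR selection. It is not Vectara's implementation: the scoring rule `(1 - bias) * relevance - bias * redundancy` is one common formulation, and `mmr_select` along with the sample data are hypothetical.

```python
def mmr_select(relevance, similarity, k, diversity_bias):
    """Greedily pick k item indices, trading relevance against redundancy.

    relevance: list of query-relevance scores, one per item.
    similarity: square matrix of item-to-item similarities.
    diversity_bias: 0.0 ranks purely by relevance; 1.0 ranks purely by diversity.
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return (1 - diversity_bias) * relevance[i] - diversity_bias * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Items 0 and 1 are near-duplicates; item 2 is less relevant but different.
relevance = [0.9, 0.85, 0.6]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_select(relevance, similarity, k=2, diversity_bias=0.0))  # [0, 1]
print(mmr_select(relevance, similarity, k=2, diversity_bias=0.3))  # [0, 2]
```

With a bias of 0.0 the two near-duplicates are returned; with 0.3 the second slot goes to the less relevant but more distinct item.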

Now let's see an example with a user defined function. We may be interested in getting results that are the most semantically similar to our question, but we also want the most up-to-date information. Thus, we can bias our search results so that the papers that are not only semantically similar but also published more recently are used to answer our query. We can do this by using the available time functions (to see other built-in functions, see the UDF Reranker [documentation](https://docs.vectara.com/docs/learn/user-defined-function-reranker)).

In [14]:
query_engine = index.as_query_engine(
    similarity_top_k=5,
    reranker="udf",
    udf_expression="max(0, 10 * get('$.score') - hours(seconds((to_unix_timestamp(now()) - to_unix_timestamp(datetime_parse(get('$.published'), 'yyyy-MM-dd'))))) / 24 / 365)",
)

response = query_engine.query("What innovations have been made to sentence embedding models?")
print(response)

Innovations in sentence embedding models include leveraging pooling strategies, similarity fine-tuning in a contrastive framework, utilizing prompts, implementing a two-step training process, and developing models for different languages like French and Japanese. Additionally, advancements involve open-sourcing code and pre-trained models, exploring unsupervised, task-independent models, and enhancing efficiency by using language models like BERT for sentence embeddings. These innovations aim to improve performance and explore new methods for sentence representation [1][2][3][4][5].


In [15]:
[(inx, n.node.metadata['published']) for inx, n in enumerate(response.source_nodes)]

[(0, '2024-05-30'),
 (1, '2023-01-19'),
 (2, '2017-03-07'),
 (3, '2020-05-22'),
 (4, '2024-04-05')]

Notice how many of the papers used to generate the final summary were published in the recent past and they still give us information to generate a relevant response that answers our question.

Also notice how we use a `max()` function with 0 in our user-defined expression. This ensures that all of our reranking scores are non-negative. Additionally, the expression subtracts the paper's age in years from 10 times the original score. Since the original score ranges from 0 to 1, the first term is at most 10, so we throw away any search results that are more than 10 years old when generating our final response.
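
The arithmetic in the expression can be reproduced locally to see how score and age interact. This is a sketch in plain Python, not the Vectara UDF runtime; `udf_score` is a hypothetical helper, and `now` is fixed so the numbers are reproducible.

```python
from datetime import datetime


def udf_score(score, published, now):
    """Mirror max(0, 10 * score - age_in_years) from the UDF expression above."""
    published_dt = datetime.strptime(published, "%Y-%m-%d")
    age_seconds = (now - published_dt).total_seconds()
    age_years = age_seconds / 3600 / 24 / 365  # seconds -> hours -> days -> years
    return max(0.0, 10 * score - age_years)


now = datetime(2024, 1, 1)  # fixed reference date for the example
recent = udf_score(0.5, "2023-01-01", now)  # exactly one year old
old = udf_score(0.9, "2010-01-01", now)     # about fourteen years old
print(recent, old)
```

The one-year-old paper keeps most of its boosted score (5 minus 1), while the fourteen-year-old paper is clamped to 0 even though its original score was higher.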

So far we've used Vectara's internal summarization capability, which is the best way for most users.

You can still use LlamaIndex's standard VectorStore `as_query_engine()` method, in which case Vectara's summarization won't be used, and you would be using an external LLM (e.g. OpenAI's GPT-4) and a custom prompt from LlamaIndex to generate the summary. For this option, just set `summary_enabled=False`.

For this functionality, you will need to specify your own OpenAI API key in the environment:

> `os.environ['OPENAI_API_KEY'] = '<YOUR_OPENAI_API_KEY>'`

In [17]:
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4-turbo", temperature=0)

query_engine = index.as_query_engine(
    similarity_top_k=5,
    summary_enabled=False,
    llm=llm
)
response = query_engine.query(query)
print(response)

Sentence embedding is a technique used in natural language processing to represent sentences as vectors in a high-dimensional space. These vectors encode the meaning of the sentences, allowing for various applications such as machine translation, information retrieval, and text classification. The goal is for sentences with similar meanings to be close to each other in this vector space, facilitating tasks that require understanding of sentence semantics.


## Using Vectara Chat

Vectara now fully supports Chat in its platform, where the chat history is maintained by Vectara and so you don't have to worry about keeping history and integrating it with your RAG pipeline. 

To use it, simply call `as_chat_engine()`.

(Chat mode always uses Vectara's summarization so you don't have to explicitly specify `summary_enabled=True` like before)

In [18]:
ce = index.as_chat_engine()

In [19]:
questions = [
    'What is a sentence embedding model?',
    'What are some known models?',
    'How are they different than token embedding models'
]

for q in questions:
    print(f"Question: {q}\n")
    response = ce.chat(q).response
    print(f"Response: {response}\n")

Question: What is a sentence embedding model?

Response: A sentence embedding model is a method to represent input sentences as fixed-dimensional vectors, regardless of sentence length. These models have shown significant enhancements in various natural language processing tasks like information retrieval, question answering, and machine translation. They are trained to adapt to specific domains by fine-tuning on synthesized data before further fine-tuning on labeled datasets, leading to improved performance [4] [3]. Additionally, some models are designed for cross-lingual applications, training towards both sentence-level and token-level alignment to improve representation translation [5].

Question: What are some known models?

Response: Well-known sentence embedding models include BERT, RoBERTa, ELMO, GenSen, DSE, FastSent, Quick-Thought, InferSent, Universal Sentence Encoder (USE), XLNet, and Sentence-BERT. These models utilize various methods such as fine-tuning, contextual word r

Of course streaming works as well with Chat:

In [20]:
ce = index.as_chat_engine(streaming=True)

In [21]:
response = ce.stream_chat("Who is behind SBERT?")
for chunk in response.chat_stream:
    print(chunk.delta or "", end="", flush=True)

The creators behind SBERT are Reimers and Gurevych, as mentioned in the search results [3].

# Advanced RAG with Vectara and LlamaIndex

## Agentic RAG

LlamaIndex provides various agent implementations such as ChainOfThought or ReAct.

To use these with Vectara, you need an external LLM to drive the agent's reasoning. In this example we reuse the `llm` object defined earlier (OpenAI's `gpt-4-turbo`); for this to work, please make sure you have `OPENAI_API_KEY` defined in your environment.

In [24]:
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata

vectara_tool = QueryEngineTool(
    query_engine=index.as_query_engine(
        summary_enabled=True,
        summary_num_results=5,
        summary_response_lang="en",
        summary_prompt_name="mockingbird-1.0-2024-07-16",
    ),
    metadata=ToolMetadata(
        name="Vectara",
        description="Vectara tool that can answer Questions about Embedding Models, NLP, and related topics.",
    ),
)
agent = ReActAgent.from_tools(
    tools=[vectara_tool],
    llm=llm,
    context="""
        You are a helpful chatbot that answers any user questions around embedding models in NLP using the Vectara tool.
        You break down complex questions into simpler ones and use the vectara tool to answer every question or sub-question.
        You use the Vectara tool to help answer the user question.
    """,
    verbose=True,
    max_iterations=20
)

In [25]:
question = """
    What are the sentence embedding models? 
    what are the best sentence embedding models, who created each model and in what year was the paper published?
    Compare and contrast their architecture, training procedure and performance
"""

print(agent.chat(question).response)

> Running step 89445c46-177a-4d07-938b-ef17201c6a81. Step input: 
    What are the sentence embedding models? 
    what are the best sentence embedding models, who created each model and in what year was the paper published?
    Compare and contrast their architecture, training procedure and performance

Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: Vectara
Action Input: {'input': 'What are sentence embedding models?'}
Observation: Sentence embedding models are a type of model that represents a given input sentence using a fixed dimensional vector, independent of the length of the input sentence [4]. These models are trained to obtain state-of-the-art sentence representations for general semantic textual similarity tasks [1]. They have significantly improved performance in numerous downstream NLP tasks such as information retrieval, question answering, and machine translation [4]. Sentence

## Using Auto Retriever with Vectara

LlamaIndex's auto-retriever functionality is really cool. 
It is most useful when you have metadata fields (like in our case of papers from Arxiv), and would like a query that references a metadata field to be automatically interpreted in the right way.

For example, if I ask "what is a paper about climate change risks published after 2020", the auto-retriever would (behind the scenes) interpret this as the query "what is a paper about climate change risks" along with a filter condition of "published > 2020".

Let's see how this works with the Vectara Index.
First, we have to define a `VectorStoreInfo` structure that describes the metadata fields the auto-retriever needs to know about to do its job:

In [26]:
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo

vector_store_info = VectorStoreInfo(
    content_info="information about a paper",
    metadata_info=[
        MetadataInfo(
            name="published",
            description="The date the paper was published",
            type="string",
        ),
        MetadataInfo(
            name="author",
            description="The author of the paper",
            type="string",
        ),
        MetadataInfo(
            name="title",
            description="The title of the papers",
            type="string",
        ),
        MetadataInfo(
            name="url",
            description="The URL for this paper",
            type="string",
        ),
    ],
)

Auto-retrieval is implemented as a query transformation that runs before Vectara is called.

Now we can define the `VectaraAutoRetriever`, which can perform auto-retrieval using Vectara:

In [27]:
from llama_index.indices.managed.vectara import VectaraAutoRetriever

retriever = VectaraAutoRetriever(
    index,
    vector_store_info=vector_store_info,
    llm=llm,
    verbose=True
)
res = retriever.retrieve("What is sentence embedding, based on papers before 2019?")
[(r.metadata['published'], r.text) for r in res]

Using query str: What is sentence embedding?
Using implicit filters: [('published', '<', '2019')]
final filter string: (doc.published < '2019')


[]

As you can see, the Auto Retriever was able to translate the natural language text into a shorter query and a proper filter condition (in this case `doc.published < '2019'`).
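
Conceptually, the implicit filters are turned into a filter expression over document metadata. The sketch below only illustrates that mapping for the output shown above; `build_filter_string` is a hypothetical helper, and joining multiple conditions with `and` is an assumption rather than a description of `VectaraAutoRetriever` internals.

```python
def build_filter_string(filters):
    """Turn (field, operator, value) triples into a doc-level filter expression."""
    clauses = [f"doc.{field} {op} '{value}'" for field, op, value in filters]
    # Assumption: multiple conditions are combined with a logical AND.
    return "(" + " and ".join(clauses) + ")" if clauses else ""


print(build_filter_string([("published", "<", "2019")]))
```

For the single filter in the example this prints `(doc.published < '2019')`, matching the "final filter string" in the verbose output.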

We can also of course ask a question directly: we use the `VectaraQueryEngine` which can work with the `VectaraAutoRetriever` directly:

In [28]:
from llama_index.indices.managed.vectara.query import VectaraQueryEngine
from llama_index.indices.managed.vectara import VectaraAutoRetriever

ar = VectaraAutoRetriever(
    index,
    vector_store_info=vector_store_info,
    llm=llm,
    summary_enabled=True,
    summary_num_results=5,
    verbose=True
)

query_engine = VectaraQueryEngine(retriever=ar)
response = query_engine.query("What is sentence embedding, based on papers before 2019?")
print(response)

Using query str: What is sentence embedding
Using implicit filters: [('published', '<', '2019')]
final filter string: (doc.published < '2019')
Sentence embedding is a crucial technique in Natural Language Processing (NLP) that transforms sentences into low-dimensional vector representations to capture their semantic meanings. Various models have been developed to create these embeddings, enhancing the performance of NLP tasks like machine translation, document classification, and sentiment analysis. These models aim to preserve the original sentence meanings effectively in the embedded vectors, allowing for improved analysis and classification of text data.


## Advanced querying with QueryFusionRetriever

The QueryFusion [Retriever](https://docs.llamaindex.ai/en/stable/examples/retrievers/reciprocal_rerank_fusion.html#reciprocal-rerank-fusion-retriever) is an advanced query mechanism whereby the original query is pre-processed to generate N variations. Each of these rephrased queries is then run against the Vectara engine and rank-fusion is used to combine the best results. 

Let's see this in action:

In [29]:
query = "is SBERT a dual encoder? what type of DL architecture does it use?"
query_engine = index.as_query_engine(
    similarity_top_k=3,
    summary_enabled=False,
    llm=llm,
)
response = query_engine.query(query)
print(response)

SBERT employs a siamese network architecture, which involves two parallel encoders that process two inputs independently. The outputs are then compared or combined in some way, typically to compute similarity or difference. This is distinct from a dual-encoder architecture where the focus is on mapping inputs to a unified embedding space for direct comparison or retrieval tasks.


In [30]:
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
import nest_asyncio

rf_retriever = QueryFusionRetriever(
    [index.as_retriever(similarity_top_k=2)],
    similarity_top_k=2,
    num_queries=5,  # this includes the original query; set this to 1 to disable query generation
    mode="reciprocal_rerank",
    use_async=True,
    verbose=True,
)

nest_asyncio.apply()     # apply nested async to run in a notebook
query_engine = RetrieverQueryEngine.from_args(rf_retriever)
response = query_engine.query(query)
print(response)

Generated queries:
1. What is SBERT and how does it work as a dual encoder?
2. Comparison of SBERT with other dual encoder models in natural language processing.
3. Deep learning architecture used in SBERT for sentence embeddings.
4. Advantages and limitations of using a dual encoder like SBERT in NLP tasks.
SBERT is not a dual encoder. It uses a single embedding space instead of a separate embedding space, which means it does not employ a dual-encoder architecture.


We can see how the QueryFusionRetriever created additional query variations (they are displayed since we used `verbose=True`) and then the overall response includes the results fused together. This is very helpful in this case because the QueryFusionRetriever creates sub-questions that inquire about the specific architecture of SBERT which is relevant context to answering this question properly.
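
The `reciprocal_rerank` mode is based on reciprocal rank fusion, in which each document accumulates `1 / (k + rank)` over every result list it appears in, with ranks starting at 1. A toy sketch of that idea follows; it is not LlamaIndex's code, and `reciprocal_rank_fusion`, the constant `k=60` and the document IDs are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = defaultdict(float)
    for results in result_lists:
        for position, doc_id in enumerate(results):
            # position is 0-based, so position + 1 is the 1-based rank;
            # documents ranked higher contribute more to the fused score.
            scores[doc_id] += 1.0 / (k + position + 1)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Each inner list is the ranking returned for one query variation.
rankings = [
    ["paper_a", "paper_b"],
    ["paper_b", "paper_c"],
    ["paper_b", "paper_a"],
]
print(reciprocal_rank_fusion(rankings))
```

Here `paper_b` comes first because it appears in all three rankings, which is why results that surface across several query variations tend to rise to the top.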

## Summary

In this notebook we've seen various examples for using Vectara with LlamaIndex, which provides the following benefits:
* Vectara provides a complete RAG pipeline, so you don't have to deal with a lot of the details around data ingestion: pre-processing, chunking, embedding, etc. Instead all these steps are handled automatically and efficiently in Vectara. 
* Being a platform, Vectara uses its own internal Embedding model (Boomerang), its own vector storage, and its own LLM (Mockingbird) for summarization, so you don't have to maintain separate API keys and relationships with additional vendors or install other products.
* Vectara is built for large-scale GenAI applications, and with the tools provided by LlamaIndex like Auto Retrieval and Query Fusion, you can easily build and test advanced RAG applications at enterprise scale.