<a href="https://colab.research.google.com/github/vectara/example-notebooks/blob/main/notebooks/vectara-python-sdk.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectara's Python SDK (Beta)

This notebook shows how to use Vectara's new Python SDK.

## About Vectara

[Vectara](https://vectara.com/) is the trusted AI Assistant and Agent platform which focuses on enterprise readiness for mission-critical applications. 

Vectara provides an end-to-end managed service for Retrieval Augmented Generation or [RAG](https://vectara.com/grounded-generation/), which includes:

1. An integrated API for processing input data, including text extraction from documents and ML-based chunking.

2. The state-of-the-art [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model. Each text chunk is encoded into a vector embedding using Boomerang, and stored in the Vectara internal knowledge (vector+text) store. Thus, when using Vectara you do not need to call a separate embedding model; this happens automatically within the Vectara backend.

3. A query service that automatically encodes the query into embeddings and retrieves the most relevant text segments through [hybrid search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) and a variety of [reranking](https://docs.vectara.com/docs/api-reference/search-apis/reranking) strategies, including a [multilingual reranker](https://docs.vectara.com/docs/learn/vectara-multi-lingual-reranker), [maximal marginal relevance (MMR) reranker](https://docs.vectara.com/docs/learn/mmr-reranker), [user-defined function reranker](https://docs.vectara.com/docs/learn/user-defined-function-reranker), and a [chain reranker](https://docs.vectara.com/docs/learn/chain-reranker) that chains multiple reranking methods together, giving finer control over reranking and combining the strengths of each method.

4. An option to create a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview) with a wide selection of LLM summarizers (including Vectara's [Mockingbird](https://vectara.com/blog/mockingbird-is-a-rag-specific-llm-that-beats-gpt-4-gemini-1-5-pro-in-rag-output-quality/), trained specifically for RAG-based tasks), based on the retrieved documents, including citations.

The main benefits of using Vectara RAG-as-a-service to build your application are:
* **Accuracy and Quality**: Vectara provides an end-to-end platform that focuses on eliminating hallucinations, reducing bias, and safeguarding copyright integrity.
* **Security**: Vectara's platform provides access control, protects against prompt injection attacks, and meets SOC2 and HIPAA compliance.
* **Explainability**: Vectara makes it easy to troubleshoot bad results by clearly explaining rephrased queries, LLM prompts, retrieved results, and agent actions.

## Getting Started

To get started with Vectara, [sign up](https://console.vectara.com/signup?utm_source=vectara&utm_medium=signup&utm_term=DevRel&utm_content=example-notebooks&utm_campaign=vectara-signup-DevRel-example-notebooks) (if you haven't already) and follow our [quickstart](https://docs.vectara.com/docs/quickstart) guide to create a corpus and an API key. 

Once you have these, first install the Vectara SDK:

`pip install vectara`

In [1]:
import os
corpus_key = os.environ['VECTARA_CORPUS_KEY']
api_key = os.environ['VECTARA_API_KEY']
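
Reading `os.environ[...]` directly raises a bare `KeyError` when a variable is missing. As a minimal sketch (the helper name `require_env` is hypothetical and not part of the Vectara SDK), the lookup can be wrapped so that a missing or empty variable produces a clearer message:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper for this notebook; not part of the Vectara SDK.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before running this notebook."
        )
    return value
```

With this in place, the cell above could read `corpus_key = require_env("VECTARA_CORPUS_KEY")` and `api_key = require_env("VECTARA_API_KEY")`.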

## Creating the SDK client

Before using the SDK, you need to create the Vectara client and authenticate it.
This can be done using an API key or OAuth; we will use the API key approach:

In [2]:
from vectara import Vectara
client = Vectara(api_key=api_key)

## Loading Data Into Vectara

As with Vectara's API, there are two ways to ingest data into Vectara: uploading files or indexing text. Here we will demonstrate the file upload path: we will fetch papers from arXiv using Python's [arxiv](https://github.com/lukasschwab/arxiv.py) library and upload them. We will pull in data from the top papers related to "LLM hallucinations":

In [3]:
import arxiv

ax_client = arxiv.Client()
search = arxiv.Search(
  query = "(ti:LLM hallucinations)",
  max_results = 20,
  sort_by = arxiv.SortCriterion.Relevance
)
papers = list(ax_client.results(search))

In [4]:
urls = [p.entry_id for p in papers]
print(urls[:3])

['http://arxiv.org/abs/2407.00215v1', 'http://arxiv.org/abs/2402.02643v1', 'http://arxiv.org/abs/2409.00159v2']


Next, let's upload these papers to Vectara:

In [5]:
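import requests
from slugify import slugify

for url in urls:
    # Download the page behind each arXiv entry URL
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop early on a failed download
    content = response.content

    # Upload the raw bytes; Vectara extracts and chunks the text on the backend
    client.upload.file(
        corpus_key=corpus_key,
        file=content,
        filename=slugify(url),
    )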

As always, Vectara processes each uploaded file on the backend and performs appropriate chunking, so you don't need to apply any local processing or choose a chunking strategy.
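
Note that each paper's `entry_id` points at the arXiv abstract page (`/abs/...`), so the loop above uploads that page rather than the paper's full text. If the full PDF is wanted instead, one option is to derive the PDF URL and a filename from the entry URL. The sketch below assumes arXiv's usual `/abs/<id>` and `/pdf/<id>` URL layout; both helper names are hypothetical. The `arxiv` library's result objects also expose a `pdf_url` attribute that could be used directly.

```python
from urllib.parse import urlsplit, urlunsplit


def abs_to_pdf_url(entry_url: str) -> str:
    """Turn an arXiv abstract URL into the matching PDF URL.

    Assumes the usual arXiv layout: /abs/<id> for abstracts, /pdf/<id> for PDFs.
    """
    parts = urlsplit(entry_url)
    if not parts.path.startswith("/abs/"):
        raise ValueError(f"Not an arXiv abstract URL: {entry_url}")
    pdf_path = "/pdf/" + parts.path[len("/abs/"):]
    return urlunsplit((parts.scheme, parts.netloc, pdf_path, "", ""))


def pdf_filename(entry_url: str) -> str:
    """Build a filename such as '2407.00215v1.pdf' from an abstract URL."""
    parts = urlsplit(entry_url)
    if not parts.path.startswith("/abs/"):
        raise ValueError(f"Not an arXiv abstract URL: {entry_url}")
    # Old-style IDs contain a slash (e.g. hep-th/9901001), so flatten it.
    paper_id = parts.path[len("/abs/"):].replace("/", "_")
    return f"{paper_id}.pdf"
```

In the upload loop, `requests.get(abs_to_pdf_url(url))` and `filename=pdf_filename(url)` would then replace the abstract URL and the slugified name.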

## Running a query
Once the files are loaded, let's try to run some queries.

In [6]:
from vectara import (
    SearchCorporaParameters, ContextConfiguration, GenerationParameters, 
    KeyedSearchCorpus, CustomerSpecificReranker
)

search = SearchCorporaParameters(
    corpora=[
        KeyedSearchCorpus(
            corpus_key=corpus_key,
            metadata_filter="",
            lexical_interpolation=0.005,
        )
    ],
    context_configuration=ContextConfiguration(
        sentences_before=2,
        sentences_after=2,
    ),
    reranker=CustomerSpecificReranker(
        reranker_id="rnk_272725719"
    ),
)
generation = GenerationParameters(
    response_language="eng",
    enable_factual_consistency_score=True,
)

res = client.query(
    query="What is a hallucination?",
    search=search,
    generation=generation
)
print(res.summary)

Hallucinations are problematic phenomena that can occur in artificial intelligence systems, particularly in large language models (LLMs) used for various tasks like text generation and question-answering [1]. They involve the generation of false or misleading information, impacting the reliability of AI systems [3]. Research is being conducted to understand the causes of AI hallucinations and their significance in the field of artificial intelligence [1]. Efforts include investigating how LLMs respond when providing correct answers versus when hallucinating, aiming to determine the awareness and extent of hallucinations in these models [2]. Detection of hallucinations in LLMs is challenging and requires innovative approaches, such as benchmarking their hallucination tendencies and utilizing automated frameworks for efficient detection [5].


The search results behind the citations are also easily accessible:

In [7]:
print(res.search_results[:2])

[IndividualSearchResult(text='Authors:Alessandro Bruno, Pier Luigi Mazzeo, Aladine Chetouani, Marouane Tliba, Mohamed Amine Kerkouri            View a PDF of the paper titled Insights into Classifying and Mitigating LLMs\' Hallucinations, by Alessandro Bruno and 4 other authors\n    View PDF\n\n\n\n    \n            Abstract:The widespread adoption of large language models (LLMs) across diverse AI applications is proof of the outstanding achievements obtained in several tasks, such as text mining, text generation, and question answering. However, LLMs are not exempt from drawbacks. One of the most concerning aspects regards the emerging problematic phenomena known as "Hallucinations". They manifest in text generation systems, particularly in question-answering systems reliant on LLMs, potentially resulting in false or misleading information propagation. This paper delves into the underlying causes of AI hallucination and elucidates its significance in artificial intelligence.', score=0
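
To connect the bracketed markers in the summary (`[1]`, `[3]`, ...) with the search results, the markers can be parsed out of the text. The sketch below assumes that marker `[n]` refers to the n-th entry (1-based) of `res.search_results`; that convention and both helper names are assumptions made for illustration.

```python
import re


def cited_indices(summary: str) -> list[int]:
    """Return distinct citation numbers in order of first appearance."""
    seen: list[int] = []
    for match in re.finditer(r"\[(\d+)\]", summary):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


def cited_results(summary: str, results: list) -> dict:
    """Map each citation number to the result at that 1-based position.

    Assumes [n] points at results[n - 1]; out-of-range markers are skipped.
    """
    return {
        n: results[n - 1]
        for n in cited_indices(summary)
        if 1 <= n <= len(results)
    }
```

For the query above, `cited_results(res.summary, res.search_results)` would return only the results that the summary actually cites.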

The response also includes Vectara's FCS (Factual Consistency Score):

In [8]:
res.factual_consistency_score

0.7078585
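
One way to act on this score is to flag answers that fall below a cutoff. The sketch below assumes the score lies between 0 and 1, with higher values meaning the summary is better supported by the retrieved text. The `0.5` threshold and the helper name are hypothetical; a suitable cutoff depends on the application.

```python
from typing import Optional


def is_grounded(score: Optional[float], threshold: float = 0.5) -> bool:
    """Return True when a factual consistency score meets the threshold.

    A missing score (None) is treated as not grounded. The default
    threshold is an illustrative choice, not a recommended value.
    """
    if score is None:
        return False
    return score >= threshold
```

For example, `is_grounded(res.factual_consistency_score)` could decide whether to show the summary as is or attach a warning.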

## Using Streaming with the SDK

It's easy to run a streaming query with the SDK:

In [9]:
search = SearchCorporaParameters(
    corpora=[
        KeyedSearchCorpus(
            corpus_key=corpus_key,
            metadata_filter="",
            lexical_interpolation=0.005,
        )
    ],
    context_configuration=ContextConfiguration(
        sentences_before=2,
        sentences_after=2,
    ),
    reranker=CustomerSpecificReranker(
        reranker_id="rnk_272725719"
    ),
)
generation = GenerationParameters(
    response_language="eng",
    enable_factual_consistency_score=True,
)

response = client.query_stream(
    query="What is an LLM hallucination",
    search=search,
    generation=generation,
)
for chunk in response:
    if chunk.type == 'generation_chunk':
        print(chunk.generation_chunk, end='', flush=True)

Large Language Models (LLMs) can experience hallucinations, where they generate responses that are not real or accurate [3]. These hallucinations pose challenges in practical applications, especially in tasks like code generation, where complex contextual dependencies are involved [4]. To understand LLM hallucinations, researchers investigate how LLMs are aware of hallucination and react differently when providing accurate answers versus when hallucinating [1]. Traditional methods that aim to mitigate LLM hallucinations by grounding them in external knowledge sources may not fully eliminate hallucinations in practice [5]. The phenomenon of LLM hallucinations highlights the need for rethinking generalization strategies to address the challenges posed by these inaccuracies in LLM outputs.
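
The loop above prints text as it arrives. If the full answer is also needed afterwards, the chunks can be collected as they stream in. The sketch below uses a stand-in `FakeChunk` class (hypothetical, so the example runs offline) that mimics only the two attributes used above, `type` and `generation_chunk`; the `"other_event"` type is a placeholder rather than a real SDK event name.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeChunk:
    """Offline stand-in for a stream event (not an SDK class)."""
    type: str
    generation_chunk: Optional[str] = None


def collect_generation(chunks: Iterable) -> str:
    """Join the text of all 'generation_chunk' events into one string."""
    parts = []
    for chunk in chunks:
        # Skip events of other types and empty text fragments.
        if chunk.type == "generation_chunk" and chunk.generation_chunk:
            parts.append(chunk.generation_chunk)
    return "".join(parts)


demo_stream = [
    FakeChunk("other_event"),
    FakeChunk("generation_chunk", "Hello, "),
    FakeChunk("generation_chunk", "world."),
]
print(collect_generation(demo_stream))  # prints "Hello, world."
```

With the real client, `collect_generation(client.query_stream(...))` should work the same way, provided the stream events expose those two attributes as in the loop above.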

## Using Vectara Chat with the SDK

Now let's see how we can use the SDK's chat functionality.


In [10]:
from vectara import CitationParameters, ChatParameters

search = SearchCorporaParameters(
    corpora=[
        KeyedSearchCorpus(
            corpus_key=corpus_key,
            metadata_filter="",
            lexical_interpolation=0.005,
        )
    ],
    context_configuration=ContextConfiguration(
        sentences_before=2,
        sentences_after=2,
    ),
    reranker=CustomerSpecificReranker(
        reranker_id="rnk_272725719"
    ),
)
generation = GenerationParameters(
    response_language="eng",
    citations=CitationParameters(
        style="none",
    ),
    enable_factual_consistency_score=True,
)
chat = ChatParameters(store=True)

session = client.create_chat_session(
    search=search,
    generation=generation,
    chat_config=chat,
)

With chat, we start by creating a chat session. Once it's there, we can ask the first question, and follow with subsequent questions (turns) in the conversation.

In [11]:
response = session.chat(query="What are hallucinations?")
print(response.answer)

Hallucinations in the context of AI refer to the emergence of problematic phenomena in text generation systems, particularly in question-answering systems reliant on Large Language Models (LLMs) [1]. These hallucinations can lead to the generation of false or misleading information, posing a significant challenge to the reliability of LLMs [1]. Researchers have categorized hallucinations in LLM-generated content and explored strategies to detect and mitigate them, aiming to enhance the overall trustworthiness of AI-generated outputs [3]. Despite efforts to ground LLMs in external knowledge sources, conventional approaches have struggled to fully eliminate hallucinations, indicating the complexity of addressing this issue [5]. By studying hallucinations in a structured form such as graphs, researchers have highlighted the diversity of topological hallucinations produced by modern LLMs, underscoring the need for deeper understanding and innovative solutions to tackle this phenomenon [4].

In [12]:
response = session.chat(query="And why do they occur?")
print(response.answer)

Hallucinations in Artificial Intelligence, particularly in Large Language Models (LLMs), are a concerning phenomenon observed in tasks like code generation and question answering. LLMs can generate outputs that deviate from the intended input, exhibit inconsistencies, or provide factually incorrect information. Traditional approaches have struggled to fully explain why LLMs experience hallucinations in practice, leading researchers to explore the mechanisms and mitigation strategies for these occurrences. Understanding the types and extent of hallucinations in different AI applications is crucial for detecting and addressing these issues effectively [4][5]. The distinction between different types of hallucinations, such as those arising from ignorance or errors in knowledge retrieval, is highlighted as a key aspect in mitigating hallucinations in AI systems [5].
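
The two turns above can be generalized to a list of questions. The sketch below relies only on what the cells above show: a session object with a `chat(query=...)` method whose result has an `answer` attribute. `FakeSession`, `FakeResponse`, and `run_conversation` are hypothetical names, included so the example runs without network access.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Offline stand-in for a chat response (not an SDK class)."""
    answer: str


class FakeSession:
    """Offline stand-in for a chat session; it echoes each question."""

    def __init__(self) -> None:
        self.turns = 0

    def chat(self, query: str) -> FakeResponse:
        self.turns += 1
        return FakeResponse(answer=f"(turn {self.turns}) echo: {query}")


def run_conversation(session, questions):
    """Ask each question in order on the same session; return (question, answer) pairs."""
    transcript = []
    for question in questions:
        response = session.chat(query=question)
        transcript.append((question, response.answer))
    return transcript


for question, answer in run_conversation(
    FakeSession(), ["What are hallucinations?", "And why do they occur?"]
):
    print(f"Q: {question}\nA: {answer}\n")
```

Passing the `session` created earlier instead of `FakeSession()` would send the same questions to Vectara, with each turn building on the previous ones.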
