<a href="https://colab.research.google.com/github/vectara/example-notebooks/blob/main/notebooks/api-examples/4-agent-api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectara Agent API Examples

This notebook demonstrates how to use Vectara's Agent REST APIs directly to create and interact with AI agents.

You'll learn how to:
1. Create an agent with custom instructions
2. Create agent sessions for conversations
3. Send messages to agents and get responses
4. Use streaming for real-time responses
5. Manage conversation history
6. Work with tools and tool servers

## About Vectara

[Vectara](https://vectara.com/) is the Agent Operating System for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. Deploy it on-prem (air-gapped), in your VPC, or as SaaS. Vectara agents deliver grounded answers and safe actions with source citations, step-level audit trails, fine-grained access controls, and real-time policy and factual-consistency enforcement, so teams can ship trusted, production-grade AI agents at scale, faster and with lower risk.

Vectara provides a complete API-first platform for building production RAG and agentic applications:

- **Simple Integration**: RESTful APIs and SDKs (Python, JavaScript) for quick integration into any stack
- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your security requirements
- **Multi-Modal Support**: Index and search across text, tables, and images from PDFs, documents, and structured data
- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with state-of-the-art reranking
- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations
- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance (SOC2, HIPAA) from day one

## Getting Started

This notebook assumes you've completed Notebooks 1 and 2, and optionally Notebook 3:
- Notebook 1: Created two corpora (ai-research-papers and vectara-docs)
- Notebook 2: Ingested AI research papers and Vectara documentation
- Notebook 3: Queried the data with various techniques

Now we'll create agents that can autonomously search and reason across this data.

## Setup

In [1]:
import os
import requests
import json
import uuid
from datetime import datetime

# Get credentials from environment variables
api_key = os.environ['VECTARA_API_KEY']

# Corpus keys created in Notebook 1 (hard-coded here; change them if you used different keys)
research_corpus_key = 'tutorial-ai-research-papers'
docs_corpus_key = 'tutorial-vectara-docs'

# Base API URL
BASE_URL = "https://api.vectara.io/v2"

# Common headers
headers = {
    "x-api-key": api_key,
    "Content-Type": "application/json"
}
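
The setup cell reads `VECTARA_API_KEY` with a direct `os.environ[...]` lookup, which raises a bare `KeyError` when the variable is missing. A minimal sketch of a friendlier lookup is shown below; `require_env` is a hypothetical helper name introduced for illustration, not part of any Vectara SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a readable error.

    Hypothetical helper for this notebook; not a Vectara API.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before running this notebook."
        )
    return value


# Example usage, as a drop-in for the direct lookup in the setup cell:
# api_key = require_env("VECTARA_API_KEY")
```

Failing early with the variable name in the message tends to be easier to debug than a 401 response several cells later.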

## Step 1: Create a Basic Agent

Create an agent with custom instructions that can search both corpora:

In [2]:
# Define agent configuration - this agent can access both corpora
# In the v2 API, first_step holds the instructions, while tool_configurations
# defines one corpora_search tool per corpus, each with its own generation preset
agent_name = "RAG Research Assistant"
agent_config = {
    "name": agent_name,
    "description": "Agent that can answer questions about RAG, embeddings, and retrieval from both research papers and documentation",
    "model": { "name": "gpt-4o" },
    "first_step": {
        "type": "conversational",
        "instructions": [
            {
                "type": "inline",
                "name": "first set of instructions",
                "template": """
You are an expert AI research assistant specializing in Retrieval Augmented Generation and AI Agents. 
You have access to both academic research papers and Vectara's technical documentation. 
Provide clear, accurate answers with citations. 
When answering, combine theoretical insights from research with practical implementation guidance from documentation.
                            """
            }
        ],
        "output_parser": {"type": "default"}
    },
    
    "tool_configurations": {
        "research_paper_search": {
            "type": "corpora_search",
            "query_configuration": {
                "search": {
                    "corpora": [
                        {
                            "corpus_key": research_corpus_key
                        }
                    ]
                },
                "generation": {
                    "generation_preset_name": "vectara-summary-table-md-query-ext-jan-2025-gpt-4o",
                    "model_parameters": {
                        "llm_name": "gpt-4o",
                        "temperature": 0.0
                    }
                }
            }
        },
        "vectara_doc_search": {
            "type": "corpora_search",
            "query_configuration": {
                "search": {
                    "corpora": [
                        {
                            "corpus_key": docs_corpus_key
                        }
                    ]
                },
                "generation": {
                    "generation_preset_name": "vectara-summary-table-md-query-ext-jan-2025-gpt-4o",
                    "model_parameters": {
                        "llm_name": "gpt-4o",
                        "temperature": 0.0
                    }
                }
            }
        }

    }
}

# Check if agent already exists
print(f"Checking if agent '{agent_name}' already exists...")
list_url = f"{BASE_URL}/agents"
list_response = requests.get(list_url, headers=headers)

agent_key = None
if list_response.status_code == 200:
    agents = list_response.json().get('agents', [])
    for agent in agents:
        if agent.get('name') == agent_name:
            agent_key = agent.get('key')
            print("✓ Agent already exists!")
            print(f"  Agent Key: {agent_key}")
            print(f"  Agent Name: {agent['name']}")
            print(f"  First Step: {agent.get('first_step', {}).get('type', 'N/A')}")
            break

# Create the agent only if it doesn't exist
if not agent_key:
    print(f"Creating new agent '{agent_name}'...")
    url = f"{BASE_URL}/agents"
    response = requests.post(url, headers=headers, json=agent_config)
    
    print(f"Status Code: {response.status_code}")
    if response.status_code == 201:
        agent_data = response.json()
        agent_key = agent_data["key"]
        print("✓ Agent Created!")
        print(f"  Agent Key: {agent_key}")
        print(f"  Agent Name: {agent_data['name']}")
        print(f"  First Step: {agent_data.get('first_step', {}).get('type', 'N/A')}")
    else:
        print(f"Error: {response.text}")

Checking if agent 'RAG Research Assistant' already exists...
✓ Agent already exists!
  Agent Key: agt_rag_research_assistant_4627
  Agent Name: RAG Research Assistant
  First Step: conversational
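
The lookup-before-create logic in the cell above can be isolated into a small pure function, which makes it easy to check without calling the API. This is a sketch only: `find_agent_key` is a hypothetical helper, and the payload shape (an `agents` list whose items carry `name` and `key`) is assumed from the code above rather than from the API reference. Whether the list endpoint paginates is not covered here.

```python
from typing import Optional


def find_agent_key(list_payload: dict, name: str) -> Optional[str]:
    """Return the key of the first agent whose name matches, else None.

    Assumes the payload shape used in the cell above:
    {"agents": [{"key": ..., "name": ...}, ...]}
    """
    for agent in list_payload.get("agents", []):
        if agent.get("name") == name:
            return agent.get("key")
    return None


# Sample payload built by hand for illustration (not a real API response).
sample_payload = {
    "agents": [
        {"key": "agt_other", "name": "Some Other Agent"},
        {"key": "agt_rag_research_assistant_4627", "name": "RAG Research Assistant"},
    ]
}

print(find_agent_key(sample_payload, "RAG Research Assistant"))
# prints: agt_rag_research_assistant_4627
```

With this helper, the cell reduces to "look up the key; if it is `None`, POST the configuration".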


## Step 2: Create an Agent Session

Sessions maintain conversation context across multiple turns:

In [3]:
# Create a new session with a unique name to allow reruns
session_name = f"Technical Support Chat {datetime.now().strftime('%Y%m%d-%H%M%S')}"
session_config = {
    "name": session_name,
    "metadata": {
        "user_type": "developer",
        "session_purpose": "api_questions"
    }
}

url = f"{BASE_URL}/agents/{agent_key}/sessions"
response = requests.post(url, headers=headers, json=session_config)

print(f"Status Code: {response.status_code}")
if response.status_code == 201:
    session_data = response.json()
    session_key = session_data["key"]
    print("✓ Session Created!")
    print(f"  Session Name: {session_name}")
    print(f"  Session Key: {session_key}")
else:
    print(f"Error: {response.text}")

Status Code: 201
✓ Session Created!
  Session Name: Technical Support Chat 20251101-120947
  Session Key: ase_technical_support_chat_20251101-120947_a5d9


## Step 3: Send Messages to the Agent

Send a message and get a response:

In [4]:
# Send a message to the agent
# The correct format uses a messages array with message objects
message_data = {
    "messages": [
        {
            "type": "text",
            "content": "What is retrieval augmented generation and how can I implement it with Vectara?"
        }
    ],
    "stream_response": False
}

url = f"{BASE_URL}/agents/{agent_key}/sessions/{session_key}/events"
response = requests.post(url, headers=headers, json=message_data)

if response.status_code == 201:
    event_data = response.json()
    print("\nAgent Response:")
    # The response typically contains the assistant's message in the events
    if 'events' in event_data:
        for event in event_data['events']:
            if event.get('type') == 'agent_output':
                print(event.get('content', 'No content'))
    else:
        print(event_data)
else:
    print(f"Error: {response.text}")


Agent Response:
Retrieval-Augmented Generation (RAG) is a model that enhances language generation by combining pre-trained parametric and non-parametric memory. This approach is particularly useful for knowledge-intensive NLP tasks, as it retrieves relevant information from external sources to augment the model's responses [research_1].

To implement RAG with Vectara, you can leverage their advanced capabilities in document indexing, neural retrieval, and RAG enhancement. Vectara's platform supports AI-driven search and retrieval functionality, which is crucial for generative AI applications. Their Python SDK allows you to access semantic search, RAG, and document management capabilities from your Python applications, enabling you to manage corpora and documents, apply metadata, and run advanced queries [vectara_1][vectara_4]. Additionally, Vectara's platform is designed to minimize hallucinations, support over 100 languages, and provide enterprise-grade security for AI solutions [vec
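
The response handling above filters the returned events for `agent_output` entries and prints their content. A minimal sketch of that filter as a reusable function follows; `extract_agent_output` is a hypothetical name, and the sample payload is hand-written to mirror the shape the cell above reads (an `events` list with `type` and `content` fields). The `other_event` type is a placeholder, not a documented event type.

```python
from typing import List


def extract_agent_output(event_data: dict) -> List[str]:
    """Collect the content of every 'agent_output' event in a response payload."""
    outputs = []
    for event in event_data.get("events", []):
        if event.get("type") == "agent_output":
            outputs.append(event.get("content", ""))
    return outputs


# Hand-written payload for illustration (not a real API response).
sample_events = {
    "events": [
        {"type": "other_event", "content": "ignored"},  # placeholder event type
        {"type": "agent_output", "content": "RAG combines retrieval with generation."},
    ]
}

print(extract_agent_output(sample_events))
# prints: ['RAG combines retrieval with generation.']
```

Returning a list rather than printing directly lets later cells reuse the text, for example to log it or compare answers across turns.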

## Step 4: Multi-Turn Conversation

The agent maintains conversation context automatically:

In [5]:
# First message
message_1 = {
    "messages": [
        {
            "type": "text",
            "content": "What is hybrid search?"
        }
    ],
    "stream_response": False
}

url = f"{BASE_URL}/agents/{agent_key}/sessions/{session_key}/events"
response = requests.post(url, headers=headers, json=message_1)

print("User: What is hybrid search?")

if response.status_code == 201:
    event_data = response.json()
    print("\nAgent Response:")
    if 'events' in event_data:
        for event in event_data['events']:
            if event.get('type') == 'agent_output':
                print(event.get('content', 'No content'))
    else:
        print(event_data)
else:
    print(f"Error: {response.text}")
    
print("\n" + "="*80 + "\n")

User: What is hybrid search?

Agent Response:
Hybrid search in information retrieval involves combining parametric memory with non-parametric, retrieval-based memories. This approach allows for the direct revision and expansion of knowledge, and the accessed knowledge can be inspected and interpreted. Such hybrid models can address various issues by leveraging both types of memory to enhance the retrieval process [research_1].

In the context of Vectara, hybrid search combines semantic and keyword capabilities, allowing for more comprehensive search results by leveraging both semantic understanding and traditional keyword matching [vectara_2].




In [6]:
# Follow-up message (agent remembers context)
message_2 = {
    "messages": [
        {
            "type": "text",
            "content": "What are its main benefits?"
        }
    ],
    "stream_response": False
}

response = requests.post(url, headers=headers, json=message_2)

print("User: What are its main benefits?")
if response.status_code == 201:
    event_data = response.json()
    print("\nAgent Response:")
    if 'events' in event_data:
        for event in event_data['events']:
            if event.get('type') == 'agent_output':
                print(event.get('content', 'No content'))
    else:
        print(event_data)
else:
    print(f"Error: {response.text}")

User: What are its main benefits?

Agent Response:
The main benefits of hybrid search, which combines parametric and non-parametric memory, include:

1. **Enhanced Retrieval Accuracy**: By leveraging both semantic understanding and keyword matching, hybrid search can provide more accurate and relevant search results. This dual approach allows the system to understand the context and meaning behind queries, not just the literal keywords.

2. **Knowledge Expansion and Revision**: Hybrid models allow for the direct revision and expansion of knowledge. This means that the system can continuously learn and update its knowledge base, improving over time.

3. **Interpretability**: The accessed knowledge in hybrid search can be inspected and interpreted, providing transparency in how search results are generated. This can be particularly beneficial in applications where understanding the reasoning behind search results is important.

4. **Flexibility**: By combining different types of memory, 

In [7]:
# Another follow-up
message_3 = {
    "messages": [
        {
            "type": "text",
            "content": "Can you give me an example?"
        }
    ],
    "stream_response": False
}

response = requests.post(url, headers=headers, json=message_3)

print("User: Can you give me an example?")
if response.status_code == 201:
    event_data = response.json()
    print("\nAgent Response:")
    # The response typically contains the assistant's message in the events
    if 'events' in event_data:
        for event in event_data['events']:
            if event.get('type') == 'agent_output':
                print(event.get('content', 'No content'))
    else:
        print(event_data)
else:
    print(f"Error: {response.text}")

User: Can you give me an example?

Agent Response:
Certainly! Let's consider an example of a hybrid search system in an e-commerce platform:

Imagine you're shopping online for a new laptop. You type in the search query "lightweight laptop with long battery life." A hybrid search system would process this query using both semantic and keyword-based approaches:

1. **Semantic Search**: The system understands the intent behind your query. It recognizes that "lightweight" refers to the physical weight of the laptop and "long battery life" refers to the duration the laptop can operate on a single charge. It uses this understanding to prioritize laptops that are known for these features, even if the exact words "lightweight" or "long battery life" aren't explicitly mentioned in the product descriptions.

2. **Keyword Search**: Simultaneously, the system performs a traditional keyword search to find products that explicitly mention "lightweight" and "long battery life" in their descriptions 
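
The three turns above repeat the same payload and response handling. One way to remove the duplication is sketched below: `make_message` builds the request body used throughout this notebook, and `run_conversation` sends a list of questions through a caller-supplied `send` function. Both names are hypothetical helpers, and the demo uses a fake `send` so that it runs without network access.

```python
from typing import Callable, List, Tuple


def make_message(text: str, stream: bool = False) -> dict:
    """Build the request body used in this notebook for a single text message."""
    return {
        "messages": [{"type": "text", "content": text}],
        "stream_response": stream,
    }


def run_conversation(
    send: Callable[[dict], dict], questions: List[str]
) -> List[Tuple[str, str]]:
    """Send each question in order and return (question, answer) pairs.

    `send` takes a request body and returns the parsed JSON response.
    """
    transcript = []
    for question in questions:
        event_data = send(make_message(question))
        answers = [
            event.get("content", "")
            for event in event_data.get("events", [])
            if event.get("type") == "agent_output"
        ]
        transcript.append((question, "\n".join(answers)))
    return transcript


def fake_send(payload: dict) -> dict:
    """Stand-in for the real API call; echoes the question back."""
    text = payload["messages"][0]["content"]
    return {"events": [{"type": "agent_output", "content": f"echo: {text}"}]}


for question, answer in run_conversation(fake_send, ["What is hybrid search?"]):
    print(f"User: {question}\nAgent: {answer}")

# Against the live API, `send` could wrap the call used in the cells above:
# send = lambda body: requests.post(url, headers=headers, json=body).json()
```

Because every call goes to the same session URL, the agent keeps the context between questions, just as in the individual cells above.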