<a href="https://colab.research.google.com/github/vectara/example-notebooks/blob/main/notebooks/api-examples/5-sub-agents.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectara Sub-Agents: Building Modular AI Workflows

This notebook demonstrates how to use Vectara's **sub-agents** capability to build modular, specialized AI workflows. Sub-agents allow a parent agent to delegate tasks to specialized child agents, enabling:

- **Context isolation**: Each sub-agent maintains its own conversation history
- **Specialized configuration**: Each sub-agent can have distinct instructions and tools
- **Reusability**: Build once, invoke from any parent agent
- **Parallel execution**: Run multiple sub-agents simultaneously
- **Better performance**: Smaller, focused agents tend to make fewer mistakes

## About Vectara

[Vectara](https://vectara.com/) is the Agent Operating System for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. Deploy it on-prem (air-gapped), in your VPC, or as SaaS.

Vectara provides a complete API-first platform for building production RAG and agentic applications:

- **Simple Integration**: RESTful APIs and SDKs (Python, JavaScript) for quick integration into any stack
- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your security requirements
- **Multi-Modal Support**: Index and search across text, tables, and images from PDFs, documents, and structured data
- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with state-of-the-art reranking
- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations
- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance (SOC2, HIPAA) from day one

## Getting Started

This notebook assumes you've completed Notebooks 1-4:
- Notebook 1: Created two corpora (ai-research-papers and vectara-docs)
- Notebook 2: Ingested AI research papers and Vectara documentation
- Notebook 3: Queried the data with various techniques
- Notebook 4: Created agents that can search and reason across data

Now we'll create a multi-agent system where specialized sub-agents handle domain-specific tasks.

## Why Sub-Agents?

When agents face complex, multi-step tasks, they often run into context window limits or need specialized capabilities. Consider a comprehensive research assistant that needs to:

1. Analyze academic papers for theoretical foundations
2. Search product documentation for implementation details
3. Synthesize findings into actionable recommendations

A single monolithic agent trying to handle all of this might:
- Become confused between different instruction sets
- Consume excessive context with domain-specific guidelines
- Produce lower quality results due to competing priorities

**Sub-agents solve this by delegation**: the parent agent orchestrates, while specialized sub-agents focus on their domains.

## Setup

In [1]:
import os
import requests
import json
from datetime import datetime

# Get credentials from environment variables
api_key = os.environ['VECTARA_API_KEY']

# Corpus keys from previous notebooks
research_corpus_key = 'tutorial-ai-research-papers'
docs_corpus_key = 'tutorial-vectara-docs'

# Base API URL
BASE_URL = "https://api.vectara.io/v2"

# Common headers
headers = {
    "x-api-key": api_key,
    "Content-Type": "application/json"
}

print(f"Research Corpus: {research_corpus_key}")
print(f"Docs Corpus: {docs_corpus_key}")

Research Corpus: tutorial-ai-research-papers
Docs Corpus: tutorial-vectara-docs
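
The setup cell reads `VECTARA_API_KEY` with `os.environ[...]`, which raises a bare `KeyError` when the variable is missing. A minimal sketch of a friendlier check follows. `load_api_key` and `build_headers` are hypothetical helper names introduced here for illustration; they are not part of any Vectara SDK.

```python
import os


def load_api_key(env=None, var_name="VECTARA_API_KEY"):
    """Return the API key from `env` (defaults to os.environ) or raise a clear error."""
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it before running this notebook."
        )
    return key


def build_headers(api_key):
    """Build the same two request headers used throughout this notebook."""
    return {"x-api-key": api_key, "Content-Type": "application/json"}
```

With these in place, the setup cell could use `headers = build_headers(load_api_key())`.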


## Step 1: Create Specialized Sub-Agents

We'll create three specialized agents that will serve as sub-agents:

1. **Research Paper Analyst**: Expert at analyzing academic papers on RAG, embeddings, and retrieval
2. **Documentation Expert**: Expert at finding implementation guidance from Vectara docs
3. **Web Search Expert**: Expert at searching the web for current information and news

Each agent has focused instructions and tools optimized for its domain.

### Sub-Agent 1: Research Paper Analyst

In [2]:
# Helper function to delete and create agent
def delete_and_create_agent(agent_config, agent_name):
    """Delete agent if it exists, then create a new one."""
    # Check if agent already exists and delete it
    list_response = requests.get(f"{BASE_URL}/agents", headers=headers)

    if list_response.status_code == 200:
        agents = list_response.json().get('agents', [])
        for agent in agents:
            if agent.get('name') == agent_name:
                existing_key = agent['key']
                print(f"Deleting existing agent '{agent_name}' ({existing_key})")
                delete_response = requests.delete(f"{BASE_URL}/agents/{existing_key}", headers=headers)
                if delete_response.status_code == 204:
                    print(f"Deleted agent: {existing_key}")
                else:
                    print(f"Error deleting {existing_key}: {delete_response.text}")
                break

    # Create new agent
    response = requests.post(f"{BASE_URL}/agents", headers=headers, json=agent_config)

    if response.status_code == 201:
        agent_data = response.json()
        print(f"Created agent '{agent_name}'")
        print(f"Agent Key: {agent_data['key']}")
        return agent_data['key']
    else:
        print(f"Error creating agent: {response.status_code}")
        print(f"{response.text}")
        return None
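
The helper above stops at the first agent whose name matches, so if several agents happen to share a name, only one of them is deleted. As a sketch, the matching step could be factored into a small pure function that returns every matching key. `find_agent_keys_by_name` and the sample keys below are hypothetical, used only to illustrate the idea.

```python
def find_agent_keys_by_name(agents, agent_name):
    """Return the keys of every agent dict in `agents` whose 'name' equals `agent_name`."""
    return [
        agent["key"]
        for agent in agents
        if agent.get("name") == agent_name and "key" in agent
    ]


# Made-up sample data with the 'name' and 'key' fields the helper above reads
sample_agents = [
    {"name": "Research Paper Analyst", "key": "agt_a"},
    {"name": "Documentation Expert", "key": "agt_b"},
    {"name": "Research Paper Analyst", "key": "agt_c"},
]
matching_keys = find_agent_keys_by_name(sample_agents, "Research Paper Analyst")
print(matching_keys)  # prints ['agt_a', 'agt_c']
```

Each returned key could then be passed to the same `DELETE` call the helper already makes.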

In [3]:
# Create Research Paper Analyst sub-agent
reranker_config = {
    "type": "chain",
    "rerankers": [
        {
            "type": "customer_reranker",
            "reranker_id": "rnk_272725719", 
            "limit": 25,
        },
        {
            "type": "mmr",
            "diversity_bias": 0.05
        }
    ],
}

generation_config = {
    "generation_preset_name": "vectara-summary-table-md-query-ext-jan-2025-gpt-4o",
    "max_used_search_results": 10,
    "model_parameters": {
        "llm_name": "gpt-4o",
        "temperature": 0.0
    }
}

research_analyst_config = {
    "name": "Research Paper Analyst",
    "description": "Specialized agent for analyzing academic research papers on RAG, embeddings, and retrieval techniques",
    "model": {"name": "gpt-4o"},
    "first_step": {
        "type": "conversational",
        "instructions": [
            {
                "type": "inline",
                "name": "research_analyst_instructions",
                "template": """You are an expert academic research analyst specializing in AI, machine learning, and natural language processing.

Your expertise includes:
- Retrieval Augmented Generation (RAG) architectures
- Dense and sparse retrieval methods
- Embedding models and vector representations
- Transformer architectures and attention mechanisms
- Information retrieval benchmarks and evaluation metrics

When analyzing research papers:
1. Identify the key contributions and novel techniques
2. Explain technical concepts clearly with examples
3. Highlight practical implications and limitations
4. Compare with related work when relevant
5. Provide citations to the source papers

Always use tools to retrieve relevant content to answer user queries.
Always ground your response in the retrieved content. 
If you cannot answer the user question from the retrieved content from tools, just say "I don't know"

IMPORTANT: When responding, provide a complete, self-contained summary that includes all relevant findings."""
            }
        ],
        "output_parser": {"type": "default"}
    },
    "tool_configurations": {
        "research_search": {
            "type": "corpora_search",
            "query_configuration": {
                "search": {
                    "corpora": [{"corpus_key": research_corpus_key}],
                    "limit": 100,
                    "context_configuration": {
                        "sentences_before": 2,
                        "sentences_after": 2
                    },
                    "reranker": reranker_config,                   
                },
                # "generation": generation_config,
                "save_history": True,
            }
        }
    }
}

research_analyst_key = delete_and_create_agent(research_analyst_config, "Research Paper Analyst")

Created agent 'Research Paper Analyst'
Agent Key: agt_research_paper_analyst_c526


### Sub-Agent 2: Documentation Expert

In [4]:
# Create Documentation Expert sub-agent

docs_expert_config = {
    "name": "Documentation Expert",
    "description": "Specialized agent for finding implementation guidance and best practices from Vectara documentation",
    "model": {"name": "gpt-4o"},
    "first_step": {
        "type": "conversational",
        "instructions": [
            {
                "type": "inline",
                "name": "docs_expert_instructions",
                "template": """You are a Vectara platform expert who helps developers implement AI solutions.

Your expertise includes:
- Vectara API integration (indexing, querying, agents)
- Corpus management and configuration
- Search optimization (hybrid search, reranking, filters)
- RAG implementation best practices
- SDK usage and code examples

When providing guidance:
1. Give specific, actionable implementation steps using the API.
2. Include relevant API endpoints and parameters
3. Your examples should show how to use the API, not using Vectara SDK.
4. Highlight configuration options and trade-offs
5. Point to relevant documentation sections

Always use tools to retrieve relevant content to answer user queries.
Always ground your response in the retrieved content. 
If you cannot answer the user question from the retrieved content from tools, just say "I don't know"

When responding, provide a complete, self-contained answer with all implementation details."""
            }
        ],
        "output_parser": {"type": "default"}
    },
    "tool_configurations": {
        "docs_search": {
            "type": "corpora_search",
            "query_configuration": {
                "search": {
                    "corpora": [{"corpus_key": docs_corpus_key}],
                    "limit": 100,
                    "context_configuration": {
                        "sentences_before": 2,
                        "sentences_after": 2
                    },
                    "reranker": reranker_config,                   
                },
                # "generation": generation_config,
                "save_history": True,
            }
        }
    }
}

docs_expert_key = delete_and_create_agent(docs_expert_config, "Documentation Expert")

Created agent 'Documentation Expert'
Agent Key: agt_documentation_expert_f9f7


### Sub-Agent 3: Web Search Expert

In [5]:
# Create Web Search Expert sub-agent

web_search_expert_config = {
    "name": "Web Search Expert",
    "description": "Specialized agent for searching the web for current information, news, and general knowledge",
    "model": {"name": "gpt-4o"},
    "first_step": {
        "type": "conversational",
        "instructions": [
            {
                "type": "inline",
                "name": "web_search_expert_instructions",
                "template": """You are a web search expert who helps find current and relevant information from the internet.

Your expertise includes:
- Finding up-to-date information on any topic
- Researching current events and news
- Locating authoritative sources and references
- Comparing information across multiple sources
- Fact-checking and verification

When searching and responding:
1. Use web search to find relevant, current information
2. Prioritize authoritative and credible sources
3. Provide context about when information was published
4. Cite your sources with URLs when available
5. Synthesize information from multiple sources when appropriate

Always use the web search tool to find information.
If you cannot find relevant information, say so clearly.

IMPORTANT: When responding, provide a complete, well-sourced answer with citations."""
            }
        ],
        "output_parser": {"type": "default"}
    },
    "tool_configurations": {
        "web_search": {
            "type": "web_search"
        }
    }
}

web_search_expert_key = delete_and_create_agent(web_search_expert_config, "Web Search Expert")

Created agent 'Web Search Expert'
Agent Key: agt_web_search_expert_5b19


## Step 2: Create the Parent Orchestrator Agent

Now we'll create a parent agent that can delegate to all three sub-agents. The parent agent:
- Analyzes user requests to determine which sub-agent(s) to invoke
- Delegates domain-specific tasks to the appropriate sub-agent
- Synthesizes responses from multiple sub-agents into a cohesive answer

### Sub-Agent Tool Configuration

Sub-agents are configured as tools using the `sub_agent` type:

```json
{
  "type": "sub_agent",
  "description_template": "Description the LLM sees when deciding to use this tool",
  "sub_agent_configuration": {
    "agent_key": "the_sub_agent_key",
    "session_mode": "llm_controlled"
  }
}
```
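
Because `delete_and_create_agent` returns `None` when creation fails, a parent configuration can end up with an empty `agent_key`. The sketch below builds the same structure as the JSON above and fails early in that case. `make_sub_agent_tool` is a hypothetical helper name, not a Vectara API.

```python
def make_sub_agent_tool(agent_key, description, session_mode="llm_controlled"):
    """Return a tool configuration dict of type 'sub_agent'.

    Raises ValueError if agent_key is empty or None, which usually means
    the sub-agent was never created.
    """
    if not agent_key:
        raise ValueError("agent_key is required (did sub-agent creation fail?)")
    return {
        "type": "sub_agent",
        "description_template": description,
        "sub_agent_configuration": {
            "agent_key": agent_key,
            "session_mode": session_mode,
        },
    }
```

A parent agent's `tool_configurations` could then be assembled as `{"research_analyst": make_sub_agent_tool(research_analyst_key, "...", "ephemeral")}`.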

### Session Modes

When you attach a sub-agent to a parent agent, the `session_mode` setting controls how the sub-agent's sessions are created and reused:
- **`llm_controlled`** (default): The LLM decides whether to resume an existing session or create a new one
- **`persistent`**: Always reuse the same session, accumulating knowledge across invocations
- **`ephemeral`**: Create a fresh session every time, ensuring no state leakage
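
A misspelled mode name is easy to miss inside a large configuration dict. The following sketch checks a `tool_configurations` dict locally before it is sent. `check_session_modes` is a hypothetical helper, and treating a missing `session_mode` as `llm_controlled` is an assumption based on the default noted in the list above.

```python
SESSION_MODES = {"llm_controlled", "persistent", "ephemeral"}


def check_session_modes(tool_configurations):
    """Return (tool_name, mode) pairs whose session_mode is not a known mode.

    Tools that are not of type 'sub_agent' are ignored. A missing
    session_mode is treated as 'llm_controlled' (assumed default).
    """
    problems = []
    for tool_name, tool in tool_configurations.items():
        if tool.get("type") != "sub_agent":
            continue
        sub_config = tool.get("sub_agent_configuration", {})
        mode = sub_config.get("session_mode", "llm_controlled")
        if mode not in SESSION_MODES:
            problems.append((tool_name, mode))
    return problems
```

An empty list means every sub-agent tool uses one of the three modes described here.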

In [6]:
# Create the Orchestrator Agent with sub-agent tools
orchestrator_config = {
    "name": "AI Research Orchestrator",
    "description": "Orchestrator agent that delegates to specialized sub-agents for comprehensive AI research assistance",
    "model": {"name": "gpt-4o"},
    "first_step": {
        "type": "conversational",
        "instructions": [
            {
                "type": "inline",
                "name": "orchestrator_instructions",
                "template": """You are an AI research orchestrator that helps users understand and implement AI technologies.

Your role is to:
1. Analyze the user's question to determine what expertise is needed.
2. Use appropriate sub-agent(s) to get information needed to answer the user query.
3. Synthesize sub-agent responses into a comprehensive answer.
4. Bridge theory and practice when both are relevant.

When synthesizing, clearly indicate which insights come from research, documentation, or web search."""
            }
        ],
        "output_parser": {"type": "default"}
    },
    "tool_configurations": {
        "research_analyst": {
            "type": "sub_agent",
            "description_template": "Delegate academic research analysis tasks to a specialized research paper analyst. Use for: theoretical foundations, algorithm explanations, research paper analysis, academic citations, and comparisons between research approaches.",
            "sub_agent_configuration": {
                "agent_key": research_analyst_key,
                "session_mode": "ephemeral"       # Fresh context for each analysis
            }
        },
        "docs_expert": {
            "type": "sub_agent",
            "description_template": "Delegate implementation and documentation questions to a Vectara documentation expert. Use for: API usage, code examples, configuration guidance, best practices, and troubleshooting.",
            "sub_agent_configuration": {
                "agent_key": docs_expert_key,
                "session_mode": "ephemeral"
            }
        },
        "web_search_expert": {
            "type": "sub_agent",
            "description_template": "Delegate web search tasks to find current information, news, and general knowledge from the internet. Use for: recent developments, current events, fact-checking, finding authoritative sources, and information not available in research papers or documentation.",
            "sub_agent_configuration": {
                "agent_key": web_search_expert_key,
                "session_mode": "ephemeral"
            }
        }
    }
}

orchestrator_key = delete_and_create_agent(orchestrator_config, "AI Research Orchestrator")

Created agent 'AI Research Orchestrator'
Agent Key: agt_ai_research_orchestrator_522e


## Step 3: Test the Multi-Agent Workflow

Now let's test the orchestrator with different types of questions to see how it delegates to sub-agents.

In [7]:
# Helper function to send messages and display responses
def chat_with_agent(agent_key, session_key, message, show_events=False):
    """Send a message to an agent and return the response."""
    message_data = {
        "messages": [
            {
                "type": "text",
                "content": message
            }
        ],
        "stream_response": False
    }
    
    url = f"{BASE_URL}/agents/{agent_key}/sessions/{session_key}/events"
    response = requests.post(url, headers=headers, json=message_data)
    
    if response.status_code == 201:
        event_data = response.json()
        
        if show_events:
            print("\n------ All Events ------")
            for event in event_data.get('events', []):
                event_type = event.get('type', 'INVALID')
                print(f"  Event type: {event_type}")
                if event_type == 'tool_input':
                    print(f"    Tool: {event.get('tool_configuration_name', 'N/A')}")
                    if event.get("tool_input", None):
                        print(f"    Tool input: {event['tool_input']['message']}")
                if event_type == 'tool_output':
                    print(f"    Tool: {event.get('tool_configuration_name', 'N/A')}")
                    if event.get("tool_output", None):
                        print(f"    Tool output: {event['tool_output']['sub_agent_response']}...")
            print("-"*20)
        
        # Extract agent output
        for event in event_data.get('events', []):
            if event.get('type') == 'agent_output':
                return event.get('content', 'No content')
        
        return "No agent output found"
    else:
        return f"Error: {response.status_code} - {response.text}"
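
`chat_with_agent` prints the event trace but returns only the final text, so checking which sub-agents were used means reading the output by eye. As a sketch, the same `events` list could be summarized programmatically. `tools_invoked` is a hypothetical helper, and the sample events below are hand-written to mirror the `type` and `tool_configuration_name` fields the helper above reads.

```python
def tools_invoked(events):
    """Return tool configuration names from 'tool_input' events, in first-seen order."""
    seen = []
    for event in events:
        if event.get("type") != "tool_input":
            continue
        name = event.get("tool_configuration_name")
        if name and name not in seen:
            seen.append(name)
    return seen


# Hand-written events shaped like the trace printed by chat_with_agent
sample_events = [
    {"type": "input_message"},
    {"type": "tool_input", "tool_configuration_name": "research_analyst"},
    {"type": "tool_input", "tool_configuration_name": "docs_expert"},
    {"type": "tool_output", "tool_configuration_name": "research_analyst"},
    {"type": "agent_output", "content": "..."},
]
used = tools_invoked(sample_events)
print(used)  # prints ['research_analyst', 'docs_expert']
```

To use this with real responses, `chat_with_agent` would need to expose `event_data.get('events', [])` alongside the returned text.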

In [8]:
# Create a session for the orchestrator
session_name = f"Orchestrator Session {datetime.now().strftime('%Y%m%d-%H%M%S')}"
session_config = {
    "name": session_name,
    "metadata": {
        "purpose": "sub_agent_demo"
    }
}

response = requests.post(
    f"{BASE_URL}/agents/{orchestrator_key}/sessions",
    headers=headers,
    json=session_config
)

if response.status_code == 201:
    session_data = response.json()
    orchestrator_session_key = session_data["key"]
    print(f"Session Created: {orchestrator_session_key}")
else:
    print(f"Error: {response.text}")

Session Created: ase_orchestrator_session_20251203-225154_0c84


### Test 1: Research-Focused Question

This question should primarily use the research_analyst sub-agent:

In [9]:
query = "What are the key innovations in the original RAG paper?"
print(f"User: {query}")
print("\n" + "="*80)

response = chat_with_agent(
    orchestrator_key,
    orchestrator_session_key,
    query,
    show_events=True
)

print(f"Agent Response:\n{response}")

User: What are the key innovations in the original RAG paper?


------ All Events ------
  Event type: input_message
  Event type: tool_input
    Tool: research_analyst
    Tool input: Provide a detailed summary of the key innovations introduced in the original Retrieval-Augmented Generation (RAG) paper. Identify the novel aspects of the RAG architecture, any significant improvements it brings over existing methods, and its impact on text generation and information retrieval tasks.
  Event type: tool_output
    Tool: research_analyst
    Tool output: The "Retrieval-Augmented Generation" (RAG) model, introduced by Lewis et al. in 2020, presents a novel architecture that combines the flexibility of pre-trained seq2seq parametric models with the ability to augment outputs with non-parametric memory through retrieval from external data sources like Wikipedia. Here are the key innovations and contributions of the RAG architecture:

1. **Hybrid Memory System**: RAG incorporates both parametr

### Test 2: Implementation-Focused Question

This question should primarily use the docs_expert sub-agent:

In [10]:
query = "How do I configure hybrid search with Vectara's API?"
print(f"User: {query}")
print("\n" + "="*80)

response = chat_with_agent(
    orchestrator_key,
    orchestrator_session_key,
    query,
    show_events=True,
)

print(f"Agent Response:\n{response}")

User: How do I configure hybrid search with Vectara's API?


------ All Events ------
  Event type: input_message
  Event type: tool_input
    Tool: docs_expert
    Tool input: Provide step-by-step guidance on configuring hybrid search using Vectara's API. Include any necessary API endpoints, parameters, and best practices for setting up a successful hybrid search configuration.
  Event type: tool_output
    Tool: docs_expert
    Tool output: To configure hybrid search using Vectara's API, you'll need to integrate both semantic and lexical search components to enhance search results based on both keyword matching and semantic understanding. Here's a step-by-step guide to setting it up:

### Step 1: Indexing Your Content
1. **Index Documents:**
   - Use the [Indexing API](https://docs.vectara.com/docs/api-reference/indexing-apis/indexing) to add documents to your corpus. This process transforms your data into a format that enables efficient search and retrieval.
   - Example Request:
  

### Test 3: Comprehensive Question (multiple sub-agents)

This question should use both the `research_analyst` and `docs_expert` sub-agents and synthesize their responses:

In [11]:
query = "Explain how dense retrieval works theoretically, and show me how to implement it with Vectara."
print(f"User: {query}")
print("\n" + "="*80)

response = chat_with_agent(
    orchestrator_key,
    orchestrator_session_key,
    query,
    show_events=True
)

print(f"Agent Response:\n{response}")

User: Explain how dense retrieval works theoretically, and show me how to implement it with Vectara.


------ All Events ------
  Event type: input_message
  Event type: tool_input
    Tool: research_analyst
    Tool input: Explain the theoretical foundation of dense retrieval in the context of information retrieval. Focus on how it differs from traditional sparse retrieval methods, the underlying algorithms such as dense vector representations, and their usage in neural networks for search tasks.
  Event type: tool_input
    Tool: docs_expert
    Tool input: Provide a step-by-step guide on implementing dense retrieval using Vectara's API. Include details on setting up the API, necessary endpoints, configurations, and any best practices for achieving optimal performance.
  Event type: tool_output
    Tool: research_analyst
    Tool output: Dense retrieval methods represent a significant advancement in information retrieval, contrasting traditional sparse retrieval approaches like BM25.

### Test 4: Current Information Question (Web Search)

This question asks about recent developments that wouldn't be in our indexed research papers or documentation, demonstrating the value of the web search sub-agent:

In [12]:
query = "What are the latest developments in RAG technology in 2025? Are there any new techniques or frameworks that have emerged recently?"
print(f"User: {query}")
print("\n" + "="*80)

response = chat_with_agent(
    orchestrator_key,
    orchestrator_session_key,
    query,
    show_events=True
)

print(f"Agent Response:\n{response}")

User: What are the latest developments in RAG technology in 2025? Are there any new techniques or frameworks that have emerged recently?


------ All Events ------
  Event type: input_message
  Event type: tool_input
    Tool: web_search_expert
    Tool input: Find the latest developments in Retrieval-Augmented Generation (RAG) technology as of 2025. Look for new techniques, frameworks, advancements in algorithms, and any notable academic or industry applications that have emerged recently.
  Event type: tool_output
    Tool: web_search_expert
    Tool output: Recent advancements in Retrieval-Augmented Generation (RAG) technology as of 2025 indicate significant developments in both the academic and industrial spheres. Here are some key points:

1. **Real-Time Retrieval and Hybrid Search**: RAG technology is being enhanced with capabilities for real-time retrieval and hybrid search. This allows AI systems to incorporate up-to-date information rapidly, improving the accuracy and relevanc

## Cleanup (Optional)

If you want to delete the agents created in this notebook:

In [13]:
# Delete parent first since it depends on sub-agents

agents_to_delete = [
    orchestrator_key,
    research_analyst_key,
    docs_expert_key,
    web_search_expert_key
]

for agent_key in agents_to_delete:
    if agent_key:
        response = requests.delete(f"{BASE_URL}/agents/{agent_key}", headers=headers)
        if response.status_code == 204:
            print(f"Deleted agent: {agent_key}")
        else:
            print(f"Error deleting {agent_key}: {response.text}")

Deleted agent: agt_ai_research_orchestrator_522e
Deleted agent: agt_research_paper_analyst_c526
Deleted agent: agt_documentation_expert_f9f7
Deleted agent: agt_web_search_expert_5b19
