<a href="https://colab.research.google.com/github/vectara/example-notebooks/blob/main/notebooks/api-examples/6-artifacts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectara Artifacts: Working with Files in Agent Sessions

This notebook demonstrates how to use Vectara's **Artifacts** feature to work with files in agent sessions. Artifacts enable agents to access and process files without storing them permanently in a corpus.

You'll learn how to:
1. Upload files (PDFs, images, documents) to an agent session
2. List artifacts in a session
3. Retrieve artifact details and content
4. Create agents with artifact-processing tools
5. Have agents analyze uploaded files and generate new artifacts

## What Are Artifacts?

In Vectara, Artifacts are **session-specific file storage** that enable agents to work with files on-the-fly:

- **Temporary Storage**: Files persist within a session without permanent indexing
- **Multi-Modal Support**: Handle PDFs, DOCX, PPTX, and images (PNG, JPEG, GIF, WebP)
- **Session Scope**: Artifacts remain attached to their specific session
- **Two-Way Flow**: Users can upload files, and agents can generate new artifacts

## Getting Started

This notebook assumes you've completed Notebooks 1-5:
- Notebook 1: Created corpora
- Notebook 2: Ingested data
- Notebook 3: Queried data
- Notebook 4: Created agents and sessions
- Notebook 5: Built multi-agent workflows with sub-agents

Now we'll extend agent capabilities by working with file artifacts.

## Setup

In [17]:
import os
import requests
import json
import base64
from datetime import datetime
import mimetypes

# Get credentials from environment variables
api_key = os.environ['VECTARA_API_KEY']

# Base API URL
BASE_URL = "https://api.vectara.io/v2"

# Common headers (JSON requests)
headers = {
    "x-api-key": api_key,
    "Content-Type": "application/json"
}

## Step 1: Create an Agent with Artifact Tools

To work with artifacts, agents need specific tool configurations. The key artifact-related tools are:

- **`artifact_read`**: Read text content from files (PDF, DOCX, TXT)
- **`image_read`**: Analyze visual files (PNG, JPEG)
- **`document_conversion`**: Convert between formats (e.g., PDF to markdown)
- **`artifact_grep`**: Search for patterns within artifact content
- **`artifact_create`**: Generate new artifacts (reports, summaries, code)

Let's create an agent configured to work with artifacts:

In [18]:
# Helper function to delete and create agent
def delete_and_create_agent(agent_config, agent_name):
    """Delete agent if it exists, then create a new one."""
    list_response = requests.get(f"{BASE_URL}/agents", headers=headers)

    if list_response.status_code == 200:
        agents = list_response.json().get('agents', [])
        for agent in agents:
            if agent.get('name') == agent_name:
                existing_key = agent['key']
                print(f"Deleting existing agent '{agent_name}' ({existing_key})")
                delete_response = requests.delete(f"{BASE_URL}/agents/{existing_key}", headers=headers)
                if delete_response.status_code == 204:
                    print(f"Deleted agent: {existing_key}")
                break

    response = requests.post(f"{BASE_URL}/agents", headers=headers, json=agent_config)

    if response.status_code == 201:
        agent_data = response.json()
        print(f"Created agent '{agent_name}'")
        print(f"Agent Key: {agent_data['key']}")
        return agent_data['key']
    else:
        print(f"Error creating agent: {response.status_code}")
        print(f"{response.text}")
        return None

In [19]:
# Create an agent with artifact processing capabilities
artifact_agent_config = {
    "name": "Document Analyst",
    "description": "Agent that analyzes uploaded documents and images, and can generate reports",
    "model": {"name": "gpt-4o"},
    "first_step": {
        "type": "conversational",
        "instructions": [
            {
                "type": "inline",
                "name": "artifact_instructions",
                "template": """You are a document analysis assistant that helps users understand and process their files.

When a user uploads a file:
1. Acknowledge the upload and identify the file type
2. Use appropriate tools to read and analyze the content
3. Provide clear, structured insights
4. Offer to create derivative artifacts (summaries, reports) if helpful

Always be thorough in your analysis and cite specific sections of documents when relevant."""
            }
        ],
        "output_parser": {"type": "default"}
    },
    "tool_configurations": {
        "artifact_read": {
            "type": "artifact_read"
        },
        "image_read": {
            "type": "image_read"
        },
        "document_conversion": {
            "type": "document_conversion"
        },
        "artifact_grep": {
            "type": "artifact_grep"
        }
        # Note: artifact_create is a built-in tool that doesn't need explicit configuration
    }
}

artifact_agent_key = delete_and_create_agent(artifact_agent_config, "Document Analyst")

if not artifact_agent_key:
    print("\n⚠️  Agent creation failed. Please check the error above.")
    print("    Subsequent cells will not work without a valid agent.")

Created agent 'Document Analyst'
Agent Key: agt_document_analyst_dbbb


## Step 2: Create a Session for Working with Artifacts

Artifacts are session-scoped, so we need to create a session first:

In [20]:
# Create a session for artifact operations
session_key = None  # Initialize to prevent NameError in later cells

if not artifact_agent_key:
    print("⚠️  No agent available. Run the agent creation cell first.")
else:
    session_name = f"Artifact Demo {datetime.now().strftime('%Y%m%d-%H%M%S')}"
    session_config = {
        "name": session_name,
        "metadata": {
            "purpose": "artifact_demo"
        }
    }

    response = requests.post(
        f"{BASE_URL}/agents/{artifact_agent_key}/sessions",
        headers=headers,
        json=session_config
    )

    if response.status_code == 201:
        session_data = response.json()
        session_key = session_data["key"]
        print(f"✓ Session Created: {session_key}")
        print(f"  Session Name: {session_name}")
    else:
        print(f"Error creating session: {response.status_code}")
        print(response.text)

✓ Session Created: ase_artifact_demo_20251215-171157_44f8
  Session Name: Artifact Demo 20251215-171157


## Step 3: Upload an Artifact

Upload files to a session using multipart form data via the **events** endpoint:

```
POST /v2/agents/{agent_key}/sessions/{session_key}/events
```

The upload creates an `ArtifactUploadEvent` that stores the file in the session's workspace.

Let's create a sample text file and upload it:

In [31]:
# Create a sample document to upload
sample_document = """# Quarterly Sales Report - Q3 2024

## Executive Summary
Q3 2024 showed strong growth across all product lines with total revenue of $4.2M,
representing a 23% increase over Q2 2024.

## Key Metrics
- Total Revenue: $4,200,000
- New Customers: 127
- Customer Retention Rate: 94%
- Average Deal Size: $33,071

## Product Performance
1. Enterprise Suite: $2.1M (50% of revenue)
2. Professional Plan: $1.3M (31% of revenue)
3. Starter Plan: $800K (19% of revenue)

## Regional Breakdown
- North America: 45% ($1.89M)
- Europe: 30% ($1.26M)
- Asia Pacific: 25% ($1.05M)

## Q4 Outlook
Based on current pipeline, we project Q4 revenue of $4.8-5.2M.
Key initiatives include:
- Launch of new AI features
- Expansion into Latin American markets
- Enterprise customer success program
"""

# Save to a temporary file
temp_file_path = "/tmp/q3_sales_report.md"
with open(temp_file_path, "w") as f:
    f.write(sample_document)

print(f"Created sample document: {temp_file_path}")
print(f"Document size: {len(sample_document)} characters")

Created sample document: /tmp/q3_sales_report.md
Document size: 774 characters


In [32]:
# Upload the file as an artifact via the events endpoint
artifact_id = None  # Initialize to prevent NameError in later cells

if not artifact_agent_key or not session_key:
    print("⚠️  Agent or session not available. Run previous cells first.")
else:
    upload_url = f"{BASE_URL}/agents/{artifact_agent_key}/sessions/{session_key}/events"    
    filename = "q3_sales_report.md"
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    
    with open(temp_file_path, "rb") as f:
        files = {"files": (filename, f, content_type)}
        data = {"stream_response": "false"}    
        resp = requests.post(upload_url, headers=headers, files=files, data=data)
    
    if response.status_code in [200, 201]:
        event_data = response.json()
        print("✓ Artifact uploaded successfully!")
        print(json.dumps(event_data, indent=2))
        
        # Extract artifact_id from the upload event response
        for event in event_data.get("events", []):
            if event.get("type") == "artifact_upload":
                artifact_id = event.get("artifact_id") or event.get("id")
                print(f"\nArtifact ID: {artifact_id}")
                break
    else:
        print(f"Error uploading artifact: {response.status_code}")
        print(response.text)

✓ Artifact uploaded successfully!
{
  "artifacts": [],
  "metadata": {
    "page_key": ""
  }
}


## Step 4: List Artifacts in a Session

Retrieve all artifacts currently stored in a session:

```
GET /v2/agents/{agent_key}/sessions/{session_key}/artifacts
```

In [30]:
# List all artifacts in the session
if not artifact_agent_key or not session_key:
    print("⚠️  Agent or session not available. Run previous cells first.")
else:
    list_url = f"{BASE_URL}/agents/{artifact_agent_key}/sessions/{session_key}/artifacts"
    response = requests.get(list_url, headers=headers)

    if response.status_code == 200:
        artifacts = response.json()
        print("Artifacts in session:")
        print(json.dumps(artifacts, indent=2))
    else:
        print(f"Error listing artifacts: {response.status_code}")
        print(response.text)

Artifacts in session:
{
  "artifacts": [],
  "metadata": {
    "page_key": ""
  }
}


## Step 5: Get a Specific Artifact

Retrieve the details and content of a specific artifact:

```
GET /v2/agents/{agent_key}/sessions/{session_key}/artifacts/{artifact_id}
```

In [8]:
# Get details of a specific artifact
if not artifact_agent_key or not session_key:
    print("⚠️  Agent or session not available. Run previous cells first.")
elif not artifact_id:
    print("⚠️  No artifact_id available. Upload an artifact first.")
else:
    get_url = f"{BASE_URL}/agents/{artifact_agent_key}/sessions/{session_key}/artifacts/{artifact_id}"
    response = requests.get(get_url, headers=headers)

    if response.status_code == 200:
        artifact_details = response.json()
        print("Artifact details:")
        
        # Print metadata (excluding potentially large content)
        for key, value in artifact_details.items():
            if key != "content" and key != "data":
                print(f"  {key}: {value}")
        
        # If content is base64 encoded, decode and show preview
        if "content" in artifact_details:
            try:
                decoded = base64.b64decode(artifact_details["content"]).decode("utf-8")
                print(f"\nContent preview (first 500 chars):\n{decoded[:500]}...")
            except Exception:
                print(f"\nContent (base64): {artifact_details['content'][:100]}...")
    else:
        print(f"Error getting artifact: {response.status_code}")
        print(response.text)

⚠️  No artifact_id available. Upload an artifact first.


## Step 6: Have the Agent Analyze the Artifact

Now let's ask the agent to analyze our uploaded document:

In [9]:
# Helper function to chat with the agent
def chat_with_agent(agent_key, session_key, message, show_events=False):
    """Send a message to an agent and return the response."""
    message_data = {
        "messages": [
            {
                "type": "text",
                "content": message
            }
        ],
        "stream_response": False
    }
    
    url = f"{BASE_URL}/agents/{agent_key}/sessions/{session_key}/events"
    response = requests.post(url, headers=headers, json=message_data)
    
    if response.status_code == 201:
        event_data = response.json()
        
        if show_events:
            print("\n------ Agent Events ------")
            for event in event_data.get('events', []):
                event_type = event.get('type', 'unknown')
                print(f"Event: {event_type}")
                if event_type == 'tool_input':
                    tool_name = event.get('tool_configuration_name', 'N/A')
                    print(f"  Tool: {tool_name}")
                if event_type == 'tool_output':
                    tool_name = event.get('tool_configuration_name', 'N/A')
                    print(f"  Tool: {tool_name}")
            print("-" * 25 + "\n")
        
        # Extract agent output
        for event in event_data.get('events', []):
            if event.get('type') == 'agent_output':
                return event.get('content', 'No content')
        
        return "No agent output found"
    else:
        return f"Error: {response.status_code} - {response.text}"

In [10]:
# Ask the agent to analyze the uploaded document
if not artifact_agent_key or not session_key:
    print("⚠️  Agent or session not available. Run previous cells first.")
else:
    query = "I've uploaded a Q3 sales report. Can you analyze it and tell me the key insights?"
    print(f"User: {query}")
    print("\n" + "="*80 + "\n")

    response = chat_with_agent(
        artifact_agent_key,
        session_key,
        query,
        show_events=True
    )

    print(f"Agent Response:\n\n{response}")

User: I've uploaded a Q3 sales report. Can you analyze it and tell me the key insights?



------ Agent Events ------
Event: input_message
Event: agent_output
-------------------------

Agent Response:

Please provide me with some details about the uploaded file, such as its format (e.g., PDF, Excel, Word), so I can proceed with the analysis using the appropriate tools.


## Step 7: Ask the Agent to Create a New Artifact

Agents can generate new artifacts based on their analysis. Let's ask for an executive summary:

In [11]:
# Ask the agent to create a summary artifact
if not artifact_agent_key or not session_key:
    print("⚠️  Agent or session not available. Run previous cells first.")
else:
    query = "Based on the sales report, please create a one-page executive brief that I can share with the board. Save it as an artifact."
    print(f"User: {query}")
    print("\n" + "="*80 + "\n")

    response = chat_with_agent(
        artifact_agent_key,
        session_key,
        query,
        show_events=True
    )

    print(f"Agent Response:\n\n{response}")

User: Based on the sales report, please create a one-page executive brief that I can share with the board. Save it as an artifact.



------ Agent Events ------
Event: input_message
Event: agent_output
-------------------------

Agent Response:

To create an executive brief from the Q3 sales report, I need to open and analyze the content of the report first. Please let me know the format of the file you've uploaded (e.g., PDF, Word), so I can access the content and begin my analysis.


In [12]:
# List artifacts again to see the new one created by the agent
if not artifact_agent_key or not session_key:
    print("⚠️  Agent or session not available. Run previous cells first.")
else:
    list_url = f"{BASE_URL}/agents/{artifact_agent_key}/sessions/{session_key}/artifacts"
    response = requests.get(list_url, headers=headers)

    if response.status_code == 200:
        artifacts = response.json()
        print("All artifacts in session (including agent-generated):")
        print(json.dumps(artifacts, indent=2))
    else:
        print(f"Error: {response.status_code}")
        print(response.text)

All artifacts in session (including agent-generated):
{
  "artifacts": [],
  "metadata": {
    "page_key": ""
  }
}


## Step 8: Working with Images

Artifacts also support image analysis. Let's create and upload a simple image for the agent to analyze:

In [13]:
# Create a simple chart image using matplotlib (if available)
if not artifact_agent_key or not session_key:
    print("⚠️  Agent or session not available. Run previous cells first.")
else:
    try:
        import matplotlib.pyplot as plt
        import matplotlib
        matplotlib.use('Agg')  # Non-interactive backend
        
        # Create a simple bar chart
        regions = ['North America', 'Europe', 'Asia Pacific']
        revenue = [1.89, 1.26, 1.05]
        
        plt.figure(figsize=(8, 5))
        bars = plt.bar(regions, revenue, color=['#2E86AB', '#A23B72', '#F18F01'])
        plt.title('Q3 2024 Revenue by Region ($M)', fontsize=14, fontweight='bold')
        plt.ylabel('Revenue (Millions USD)')
        plt.ylim(0, 2.5)
        
        # Add value labels on bars
        for bar, val in zip(bars, revenue):
            plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.05, 
                     f'${val}M', ha='center', va='bottom', fontweight='bold')
        
        plt.tight_layout()
        
        # Save the chart
        chart_path = "/tmp/revenue_chart.png"
        plt.savefig(chart_path, dpi=100, bbox_inches='tight')
        plt.close()
        
        print(f"Created chart: {chart_path}")
        
        # Upload the image artifact via the events endpoint
        upload_url = f"{BASE_URL}/agents/{artifact_agent_key}/sessions/{session_key}/events"
        
        with open(chart_path, "rb") as f:
            # Match the cURL format: file (singular), message, stream_response
            files = {
                "file": ("revenue_chart.png", f, "image/png")
            }
            data = {
                "message": "Please analyze this revenue chart.",
                "stream_response": "false"
            }
            response = requests.post(upload_url, headers=upload_headers, files=files, data=data)
        
        if response.status_code in [200, 201]:
            print("✓ Chart uploaded successfully!")
            event_data = response.json()
            print(json.dumps(event_data, indent=2))
        else:
            print(f"Error uploading chart: {response.status_code}")
            print(response.text)
            
    except ImportError:
        print("matplotlib not available. Skipping image example.")
        print("To run this example, install matplotlib: pip install matplotlib")

Created chart: /tmp/revenue_chart.png
Error uploading chart: 400
{"field_errors":{"body":"Part name \"file\" is unknown or not in the proper order."},"request_id":"a18ac437763dbaf0e59b3615fd67ad52"}


In [14]:
# Ask the agent to analyze the chart
if not artifact_agent_key or not session_key:
    print("⚠️  Agent or session not available. Run previous cells first.")
else:
    query = "I've uploaded a revenue chart. Can you describe what you see and provide any insights?"
    print(f"User: {query}")
    print("\n" + "="*80 + "\n")

    response = chat_with_agent(
        artifact_agent_key,
        session_key,
        query,
        show_events=True
    )

    print(f"Agent Response:\n\n{response}")

User: I've uploaded a revenue chart. Can you describe what you see and provide any insights?



------ Agent Events ------
Event: input_message
Event: tool_input
  Tool: image_read
Event: tool_output
  Tool: image_read
Event: agent_output
Event: tool_input
  Tool: image_read
Event: tool_output
  Tool: image_read
Event: agent_output
-------------------------

Agent Response:

To provide you with insights from the revenue chart, I'll need to view and analyze the image. I'll proceed to open the image and provide you with a detailed description and any insights I can gather. Please hold on momentarily.


## Step 9: Search Within Artifacts

Use `artifact_grep` to search for specific patterns within uploaded documents:

In [15]:
# Ask the agent to search within the document
if not artifact_agent_key or not session_key:
    print("⚠️  Agent or session not available. Run previous cells first.")
else:
    query = "Search the sales report for any mentions of revenue amounts and list them all."
    print(f"User: {query}")
    print("\n" + "="*80 + "\n")

    response = chat_with_agent(
        artifact_agent_key,
        session_key,
        query,
        show_events=True
    )

    print(f"Agent Response:\n\n{response}")

User: Search the sales report for any mentions of revenue amounts and list them all.



------ Agent Events ------
Event: input_message
Event: agent_output
-------------------------

Agent Response:

To proceed with searching the sales report for revenue amounts, I'll need to know the format of the document you uploaded (e.g., PDF, Word). This information will allow me to use the appropriate tools to search through the content effectively. Could you please provide that detail?


## Summary

In this notebook, we explored Vectara's Artifacts feature:

1. **Created an agent** with artifact-processing tools (`artifact_read`, `image_read`, `document_conversion`, `artifact_grep`)
2. **Uploaded files** to a session via the events endpoint
3. **Listed and retrieved** artifacts from the session
4. **Had the agent analyze** uploaded documents and images
5. **Generated new artifacts** (summaries, reports) through agent interactions
6. **Searched within artifacts** using pattern matching

### Key API Endpoints

| Operation | Method | Endpoint |
|-----------|--------|----------|
| Upload | POST | `/v2/agents/{agent_key}/sessions/{session_key}/events` |
| List | GET | `/v2/agents/{agent_key}/sessions/{session_key}/artifacts` |
| Get | GET | `/v2/agents/{agent_key}/sessions/{session_key}/artifacts/{artifact_id}` |

**Note**: Uploads use the `events` endpoint with `multipart/form-data`, which creates an `ArtifactUploadEvent`.

### Artifact Tools

| Tool | Configuration | Purpose |
|------|---------------|---------|
| `artifact_read` | Required | Read text from documents (PDF, DOCX, TXT) |
| `image_read` | Required | Analyze images (PNG, JPEG, GIF, WebP) |
| `document_conversion` | Required | Convert between formats |
| `artifact_grep` | Required | Search patterns in content |
| `artifact_create` | **Built-in** | Generate new artifacts (no config needed) |

## Cleanup (Optional)

Delete the agent created in this notebook:

In [16]:
# Delete the artifact agent
if artifact_agent_key:
    response = requests.delete(f"{BASE_URL}/agents/{artifact_agent_key}", headers=headers)
    if response.status_code == 204:
        print(f"Deleted agent: {artifact_agent_key}")
    else:
        print(f"Error deleting agent: {response.text}")

# Clean up temporary files
import os
for temp_file in ["/tmp/q3_sales_report.md", "/tmp/revenue_chart.png"]:
    if os.path.exists(temp_file):
        os.remove(temp_file)
        print(f"Removed: {temp_file}")

Deleted agent: agt_document_analyst_d9ef
Removed: /tmp/q3_sales_report.md
Removed: /tmp/revenue_chart.png
