**Note**: This notebook uses top-level `await`. Run it in Jupyter (IPython kernel), which supports top-level `await`. In a regular Python script, wrap the calls in an `async` function and run it with `asyncio.run()`.

# Multimodal Content with v-router 🖼️

This notebook demonstrates how to use v-router with multimodal content (images, PDFs, and Word documents). The v-router library provides a unified interface for sending images and documents to different LLM providers.

## Overview

v-router supports:
- **Images**: JPEG, PNG, GIF, WebP formats
- **Documents**: PDF files and Word documents (.docx)
- **Automatic conversion**: File paths are read and converted automatically (base64 for images and PDFs, HTML for .docx files)
- **Unified interface**: Same API across all providers
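The notebook does not show how v-router recognizes each format. One common technique is to inspect a file's leading "magic bytes" rather than trusting its extension. The sketch below illustrates that technique for the formats listed above; `sniff_media_type` is a hypothetical helper, not v-router's detection code.

```python
import io
import zipfile
from typing import Optional

DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def sniff_media_type(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess a MIME type from a file's leading bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP: "RIFF" + 4-byte size + "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"PK\x03\x04"):
        # A .docx file is a ZIP archive; its main text lives in word/document.xml
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                if "word/document.xml" in zf.namelist():
                    return DOCX
        except zipfile.BadZipFile:
            return None
    return None
```

Sniffing content is more robust than relying on extensions, at the cost of reading the file before deciding how to send it.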

## Setup

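The Quick Start cell below only needs `Client` and `LLM`. The remaining imports (`base64`, `httpx`, `pathlib.Path`, and the content classes `TextContent`, `ImageContent`, `DocumentContent`) are in the cell that follows it, and the later sections depend on them. You also need API credentials for each provider you call.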
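As the note at the top says, top-level `await` only works in Jupyter; in a plain `.py` file it is a syntax error. A minimal sketch of the `asyncio.run()` wrapper, assuming you move the notebook's calls into a coroutine. `main` is a hypothetical name, and its body is a placeholder for the v-router calls.

```python
import asyncio


async def main() -> str:
    # Hypothetical stand-in: in a real script, build a v_router Client here and
    # `await client.messages.create(...)` exactly as in the cells below.
    await asyncio.sleep(0)  # placeholder for the awaited network call
    return "finished"


if __name__ == "__main__":
    # asyncio.run() creates an event loop, runs main() to completion, then closes the loop.
    # Do not call it inside Jupyter, where a loop is already running.
    print(asyncio.run(main()))
```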

## Quick Start: Using Local Files

The easiest way to send images and PDFs is by passing file paths directly:

In [1]:
# Example using the PDF file included in this repository
from v_router import Client, LLM

# Create a client
client = Client(
    llm_config=LLM(
        model_name="claude-3-5-sonnet-20241022",
        provider="anthropic"
    )
)

# Send a PDF file by just passing its path
pdf_path = "providers/assets/gameboy_color.pdf"
messages = [
    {
        "role": "user",
        "content": pdf_path  # v-router automatically detects and converts the PDF
    },
    {
        "role": "user", 
        "content": "What is this document about? Give me a brief summary."
    }
]

response = await client.messages.create(messages=messages, max_tokens=200)
print("PDF Summary:")
print(response.content[0].text)

2025-06-12 16:03:26,965 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


PDF Summary:
This is an instruction manual/booklet for the Nintendo Game Boy Color handheld video game system. The document covers various aspects of the device including:

1. Introduction and basic features
2. Component descriptions and diagrams
3. Battery installation instructions
4. Usage instructions and setup
6. Information about compatible Game Paks (game cartridges)
7. Two-player gaming setup using Game Link cables
8. Screen color settings
9. Troubleshooting guide
10. Warranty information
11. Parts list and order form

The manual provides detailed technical specifications, safety information, and operating instructions for users of the Game Boy Color, which was a color screen upgrade to Nintendo's original Game Boy handheld gaming system. It includes clear diagrams and step-by-step instructions for various features and functions of the device.


In [2]:
import base64
import httpx
from pathlib import Path

# Import content types if you need to create multimodal messages manually
from v_router.classes.message import TextContent, ImageContent, DocumentContent

## Sending Images

### Method 1: Base64 Encoded Images

You can send images by providing base64-encoded data directly:

In [3]:
# Download a sample image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg"
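image_response = httpx.get(image_url)
image_response.raise_for_status()  # fail early instead of base64-encoding an HTTP error page
image_data = base64.b64encode(image_response.content).decode("utf-8")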

# Create a client with Anthropic
client = Client(
    llm_config=LLM(
        model_name="claude-3-5-sonnet-20241022",
        provider="anthropic"
    )
)

# Send multimodal message
messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="What animal is in this image? Describe it in detail."),
            ImageContent(data=image_data, media_type="image/jpeg")
        ]
    }
]

response = await client.messages.create(messages=messages, max_tokens=200)
print("Anthropic Response:")
print(response.content[0].text)

2025-06-12 16:03:40,919 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


Anthropic Response:
This image shows a carpenter ant (genus Camponotus) in striking detail. The ant is captured in a dynamic pose, appearing to be rearing up on its hind legs, which is a common defensive posture. The ant's characteristic features are clearly visible, including its segmented body, long antennae, and powerful mandibles. The photo has a shallow depth of field, creating a beautifully blurred background that makes the ant stand out in sharp focus. The ant appears to be dark in color, possibly black or dark brown, and its exoskeleton has a slight sheen to it. Carpenter ants are among the larger ant species, and this image really showcases their impressive size and distinctive anatomy.
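Base64 turns every 3 input bytes into 4 ASCII characters, so an encoded payload is about a third larger than the file on disk. This matters near a provider's size limit; the provider notes further down give 5MB per image for Anthropic, though whether that is measured before or after encoding is not stated here. A small sketch of the arithmetic; `base64_length` is a hypothetical helper.

```python
import base64


def base64_length(num_bytes: int) -> int:
    """Length of the padded base64 string produced for num_bytes of input."""
    # Every 3 input bytes become 4 characters; a final partial group is padded with "="
    return 4 * ((num_bytes + 2) // 3)


raw = bytes(10)  # ten zero bytes standing in for image data
encoded = base64.b64encode(raw).decode("utf-8")  # same call as in the cell above
print(len(encoded), base64_length(len(raw)))  # 16 16

# A 4 MiB file grows to roughly 5.3 MiB once base64-encoded
four_mib = 4 * 1024 * 1024
print(base64_length(four_mib))  # 5592408
```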


### Method 2: File Path (Automatic Conversion)

v-router can automatically convert local image files to base64:

In [4]:
import tempfile

# Save the image locally, reusing the base64 data from the previous cell
image_path = Path(tempfile.gettempdir()) / "test_ant.jpg"  # portable temp directory
image_path.write_bytes(base64.b64decode(image_data))

# Send using file path - v-router will automatically convert to base64
messages = [
    {
        "role": "user",
        "content": str(image_path)  # Just pass the file path as a string
    }
]

response = await client.messages.create(messages=messages, max_tokens=100)
print("Response from file path:")
print(response.content[0].text)

# Clean up
image_path.unlink()

2025-06-12 16:03:47,436 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


Response from file path:
This is a detailed macro photograph of an ant, showing its distinctive features like its segmented body, long antennae, and slender legs. The ant appears to be in a dynamic pose, with its body raised and antennae extended, suggesting it might be in a defensive or alert posture. The image has a shallow depth of field, creating a soft, blurred background that makes the ant stand out in sharp focus. The lighting and close-up perspective allow us to see the
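v-router reads the file and encodes it for you. A rough sketch of what such a path-to-base64 step could look like, assuming the MIME type is inferred from the file extension; `encode_file` is hypothetical, and v-router's own implementation may detect types differently.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path
from typing import Dict


def encode_file(path: str) -> Dict[str, str]:
    """Hypothetical helper: read a file and return its MIME type and base64 data."""
    file_path = Path(path)
    media_type, _ = mimetypes.guess_type(file_path.name)
    if media_type is None:
        raise ValueError(f"Cannot infer a MIME type for {file_path.name}")
    data = base64.b64encode(file_path.read_bytes()).decode("utf-8")
    return {"media_type": media_type, "data": data}


# Usage: round-trip a tiny PNG signature through a temporary file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.png"
    sample.write_bytes(b"\x89PNG\r\n\x1a\n")
    encoded = encode_file(str(sample))
    print(encoded["media_type"])  # image/png
```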


## Cross-Provider Compatibility

The same multimodal content works across different providers:

In [5]:
# Prepare the same multimodal message
multimodal_messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="What do you see in this image?"),
            ImageContent(data=image_data, media_type="image/jpeg")
        ]
    }
]

# Test with different providers
providers = [
    ("anthropic", "claude-3-5-sonnet-20241022"),
    ("google", "gemini-1.5-flash"),
    ("openai", "gpt-4o")
]

for provider, model in providers:
    print(f"\n{provider.upper()} ({model}):")
    try:
        client = Client(
            llm_config=LLM(
                model_name=model,
                provider=provider
            )
        )
        response = await client.messages.create(messages=multimodal_messages, max_tokens=50)
        print(f"✓ {response.content[0].text[:100]}...")
    except Exception as e:
        print(f"✗ Error: {e}")

ANTHROPIC (claude-3-5-sonnet-20241022):
2025-06-12 16:03:53,910 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic
✓ This is a detailed macro photograph of an ant, showing its distinctive features like its segmented b...

GOOGLE (gemini-1.5-flash):
2025-06-12 16:03:58,319 - v_router.router - INFO - Trying primary model: gemini-1.5-flash on google
✓ That's a close-up image of a carpenter ant (genus *Camponotus*) carrying something.  Specifically, i...

OPENAI (gpt-4o):
2025-06-12 16:03:59,745 - v_router.router - INFO - Trying primary model: gpt-4o on openai
✓ The image shows a close-up of an ant standing on a surface. The ant appears to be in a raised positi...


## Word Documents (.docx)

v-router supports Word documents (.docx) by automatically converting them to HTML using the mammoth library. This works across all providers by sending the converted content as text.
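mammoth maps Word styles such as headings and lists onto semantic HTML. To show what that conversion starts from, here is a much-simplified, stdlib-only sketch: a .docx file is a ZIP archive whose body text lives in `word/document.xml`, split into paragraphs (`<w:p>`) and text runs (`<w:t>`). This is neither mammoth nor v-router code; `docx_paragraphs_to_html` is hypothetical and keeps only plain paragraph text.

```python
import html
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs_to_html(docx_bytes: bytes) -> str:
    """Hypothetical, much-simplified stand-in for mammoth: one <p> per non-empty paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    parts = []
    for paragraph in root.iter(f"{W_NS}p"):
        # Concatenate the text of every run in the paragraph
        text = "".join(t.text or "" for t in paragraph.iter(f"{W_NS}t"))
        if text:
            parts.append(f"<p>{html.escape(text)}</p>")
    return "".join(parts)


# Usage: build a minimal in-memory .docx-like archive and convert it
doc_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    "<w:p><w:r><w:t>Hello &amp; welcome</w:t></w:r></w:p>"
    '<w:p><w:r><w:t xml:space="preserve">Second </w:t></w:r><w:r><w:t>paragraph</w:t></w:r></w:p>'
    "<w:p/>"
    "</w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", doc_xml)

html_text = docx_paragraphs_to_html(buf.getvalue())
print(html_text)  # <p>Hello &amp; welcome</p><p>Second paragraph</p>
```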

### Method 1: File Path (Automatic Conversion)

The easiest way to send Word documents is by passing the file path directly:

In [6]:
# Example using the Word document included in this repository
from v_router import Client, LLM

# Create a client
client = Client(
    llm_config=LLM(
        model_name="claude-3-5-sonnet-20241022",
        provider="anthropic"
    )
)

# Send a Word document by just passing its path
docx_path = "providers/assets/order.docx"
messages = [
    {
        "role": "user",
        "content": docx_path  # v-router automatically detects and converts the Word document
    },
    {
        "role": "user", 
        "content": "What is this document about? Give me a brief summary."
    }
]

response = await client.messages.create(messages=messages, max_tokens=200)
print("Word Document Summary:")
print(response.content[0].text)

2025-06-12 16:04:16,122 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


Word Document Summary:
This is a Purchase Order (PO) document with PO number 23781 for automotive parts. The customer, John Smith from Redline Auto Center in Lansing, Michigan, is ordering the following items:

1. 4 units of Brake Discs, Pads & Calipers at $111.36 each
2. 2 units of Control Arm at $60.93 each
3. 2 units of Suspension Lift Kit at $399.83 each

The subtotal is $1,366.96, with a 10% discount, 12% sales tax, additional costs for shipping & handling ($800) and other costs ($500), bringing the total amount to $2,694.30. The payment terms indicate that payment is due 30 days upon receipt of the items, and shipping terms are Freight on Board via Air & Land shipping method.
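Figures in a model's summary are worth spot-checking. Using only the numbers reported in the summary above, the line items reproduce the $1,366.96 subtotal, and the $2,694.30 total is reproduced if the 12% tax is computed on the pre-discount subtotal. That tax basis is an inference from the arithmetic, not something the summary states.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")

# (quantity, unit price) pairs as reported in the summary above
line_items = [(4, Decimal("111.36")), (2, Decimal("60.93")), (2, Decimal("399.83"))]
subtotal = sum((qty * price for qty, price in line_items), Decimal("0"))

discount = subtotal * Decimal("0.10")
tax = subtotal * Decimal("0.12")  # assumption: tax on the pre-discount subtotal
shipping_and_handling = Decimal("800")
other_costs = Decimal("500")

total = (subtotal - discount + tax + shipping_and_handling + other_costs).quantize(
    CENT, rounding=ROUND_HALF_UP
)
print(subtotal, total)  # 1366.96 2694.30

# For comparison: taxing the discounted amount gives a different total
alt_total = (
    (subtotal - discount) * Decimal("1.12") + shipping_and_handling + other_costs
).quantize(CENT, rounding=ROUND_HALF_UP)
print(alt_total)  # 2677.90
```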


### Method 2: Base64 Encoded Word Documents

You can also send Word documents by providing base64-encoded data directly:

In [7]:
# Load and encode a Word document
with open("providers/assets/order.docx", "rb") as f:
    docx_data = base64.b64encode(f.read()).decode("utf-8")

# Create a client with Anthropic
client = Client(
    llm_config=LLM(
        model_name="claude-3-5-sonnet-20241022",
        provider="anthropic"
    )
)

# Send multimodal message with Word document
messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="Please analyze this Word document and tell me what type of document it is."),
            DocumentContent(
                data=docx_data, 
                media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
            )
        ]
    }
]

response = await client.messages.create(messages=messages, max_tokens=200)
print("Word Document Analysis:")
print(response.content[0].text)

2025-06-12 16:04:29,247 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


Word Document Analysis:
This is a Purchase Order (PO) document. This can be clearly identified by:

1. The title "PURCHASE ORDER" at the top of the document
2. The presence of a PO number (23781)
3. The standard PO structure including:
   - Vendor information
   - Customer information
   - Shipping terms and method
   - Product details with codes, descriptions, quantities, and prices
   - Cost calculations including subtotal, discount, tax, shipping & handling
   - Payment terms in the notes section

The document is formatted as a formal business transaction between a vendor and customer (Redline Auto Center) for the purchase of automotive parts, with detailed pricing and payment terms.


### Cross-Provider Word Document Support

Word documents work across all providers through automatic HTML conversion:

In [8]:
# Prepare the same Word document message
word_doc_messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="What type of document is this?"),
            DocumentContent(
                data=docx_data, 
                media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
            )
        ]
    }
]

# Test with different providers
providers = [
    ("anthropic", "claude-3-5-sonnet-20241022"),
    ("google", "gemini-1.5-flash"),
    ("openai", "gpt-4o")
]

for provider, model in providers:
    print(f"\n{provider.upper()} ({model}):")
    try:
        client = Client(
            llm_config=LLM(
                model_name=model,
                provider=provider
            )
        )
        response = await client.messages.create(messages=word_doc_messages, max_tokens=50)
        print(f"✓ {response.content[0].text[:100]}...")
    except Exception as e:
        print(f"✗ Error: {e}")

ANTHROPIC (claude-3-5-sonnet-20241022):
2025-06-12 16:04:44,420 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic
✓ This is a Purchase Order (PO) document. This can be clearly seen in the header of the document where...

GOOGLE (gemini-1.5-flash):
2025-06-12 16:04:49,277 - v_router.router - INFO - Trying primary model: gemini-1.5-flash on google
✓ This is a **Purchase Order (PO)**.  It's a commercial document issued by a buyer (Redline Auto Cente...

OPENAI (gpt-4o):
2025-06-12 16:04:50,128 - v_router.router - INFO - Trying primary model: gpt-4o on openai
✓ The document you provided is a "Purchase Order." A purchase order is a commercial document issued by...


### Complex Multimodal Messages with Word Documents

You can combine Word documents with images and other content types:

In [9]:
# Create a complex multimodal message with multiple content types
complex_messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="I'm sharing multiple documents with you:"),
            TextContent(text="1. A Word document:"),
            DocumentContent(
                data=docx_data, 
                media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
            ),
            TextContent(text="2. An image:"),
            ImageContent(data=image_data, media_type="image/jpeg"),
            TextContent(text="Can you briefly describe what each contains?")
        ]
    }
]

try:
    client = Client(
        llm_config=LLM(
            model_name="claude-3-5-sonnet-20241022",
            provider="anthropic"
        )
    )
    response = await client.messages.create(messages=complex_messages, max_tokens=300)
    print("Multi-Content Analysis:")
    print(response.content[0].text)
except Exception as e:
    print(f"Error with complex message: {e}")
    print("\nTrying with a simpler Word document message...")
    
    # Fallback to a simpler message
    simple_messages = [
        {
            "role": "user", 
            "content": [
                TextContent(text="What type of document is this?"),
                DocumentContent(
                    data=docx_data, 
                    media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
                )
            ]
        }
    ]
    response = await client.messages.create(messages=simple_messages, max_tokens=100)
    print(response.content[0].text)

2025-06-12 16:05:29,236 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


Multi-Content Analysis:
Let me describe both documents:

1. The Word document appears to be a purchase order template with the following key elements:
- Header section for company details
- PO Number: 23781
- Vendor and customer information sections
- Shipping terms: Freight on Board
- Shipping method: Air & Land
- A list of auto parts being ordered (brake discs, control arm, suspension lift kit)
- Pricing calculations including subtotal, discount, tax, shipping and handling
- Total amount: $2,694.30
- Payment terms of 30 days

2. The image shows a close-up macro photograph of an ant in a dramatic pose. The ant appears to be standing upright on its hind legs in what looks like a defensive or alert posture. The photo has a shallow depth of field with a blurred reddish-brown background, creating an artistic composition that highlights the ant's distinctive features.


## PDF Documents

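Anthropic and Google accept PDF documents natively. OpenAI does not: according to the provider notes below, a PDF sent to OpenAI is replaced with a placeholder message, so the model never sees its contents. The next cell demonstrates this limitation by sending a PDF to `gpt-4o-mini`: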

In [13]:
# Download a sample PDF
pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
try:
    pdf_response = httpx.get(pdf_url, timeout=10.0)
    pdf_response.raise_for_status()  # treat HTTP errors as download failures
    pdf_data = base64.b64encode(pdf_response.content).decode("utf-8")
except Exception as e:
    print(f"Failed to download PDF: {e}")
    # Use the local PDF as fallback
    with open("providers/assets/gameboy_color.pdf", "rb") as f:
        pdf_data = base64.b64encode(f.read()).decode("utf-8")

# Send the PDF to OpenAI, which (unlike Anthropic and Google) cannot read PDFs natively
openai_client = Client(
    llm_config=LLM(
        model_name="gpt-4o-mini",
        provider="openai"
    )
)

pdf_messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="What is the content of this PDF?"),
            DocumentContent(data=pdf_data, media_type="application/pdf")
        ]
    }
]

response = await openai_client.messages.create(messages=pdf_messages, max_tokens=200)
print("PDF Analysis:")
print(response.content[0].text)

2025-06-12 16:09:02,070 - v_router.router - INFO - Trying primary model: gpt-4o-mini on openai


PDF Analysis:
I'm sorry, but I can't view or analyze PDF documents directly. However, if you can provide text or specific details from the PDF, I'd be happy to help you understand or summarize the content!


## Complex Multimodal Conversations

You can combine multiple images and text in a single message:

In [11]:
# Reuse the ant image from earlier and download a second one
image1_data = image_data

# Download a different image
image2_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/320px-Cat03.jpg"
try:
    image2_response = httpx.get(image2_url)
    image2_response.raise_for_status()
    image2_data = base64.b64encode(image2_response.content).decode("utf-8")
except Exception as e:
    print(f"Failed to download second image: {e}")
    # Fall back to reusing the first image so the example still runs
    image2_data = image1_data

# Use Claude for this comparison (matching the logged run below);
# the client in the previous cell targets OpenAI
anthropic_client = Client(
    llm_config=LLM(
        model_name="claude-3-5-sonnet-20241022",
        provider="anthropic"
    )
)

# Create a complex multimodal message
complex_messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="I'm going to show you two images."),
            TextContent(text="First image:"),
            ImageContent(data=image1_data, media_type="image/jpeg"),
            TextContent(text="Second image:"),
            ImageContent(data=image2_data, media_type="image/jpeg"),
            TextContent(text="Can you describe what you see in each image?")
        ]
    }
]

try:
    response = await anthropic_client.messages.create(messages=complex_messages, max_tokens=300)
    print("Comparison Response:")
    print(response.content[0].text)
except Exception as e:
    print(f"Error with complex message: {e}")
    print("\nTrying with a simpler message...")

    # Fallback to a simpler message
    simple_messages = [
        {
            "role": "user",
            "content": [
                TextContent(text="What do you see in this image?"),
                ImageContent(data=image1_data, media_type="image/jpeg")
            ]
        }
    ]
    response = await anthropic_client.messages.create(messages=simple_messages, max_tokens=100)
    print(response.content[0].text)
    print(response.content[0].text)

2025-06-12 16:06:21,918 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


Comparison Response:
First image: This is a detailed macro photograph of an ant, showing its distinctive segmented body, thin legs, and antennae. The ant appears to be in a rearing or alert posture, with its front legs raised. The image has a shallow depth of field, creating a blurred background that makes the ant stand out in sharp detail.

Second image: This is a close-up portrait of a ginger/orange cat with striking amber-colored eyes. The cat has a cream-colored face with orange markings, pointed ears, and appears to be looking directly at the camera. The image shows great detail in the cat's fur texture and facial features. The background appears to have some red stripes but is mostly out of focus, putting emphasis on the cat's face.


## Provider-Specific Considerations

### Anthropic
- Supports: Images (JPEG, PNG, GIF, WebP), PDFs, and Word documents (.docx)
- Max image size: 5MB per image
- Multiple images per message: Yes
- Word documents: Converted to HTML automatically

### Google (Gemini)
- Supports: Images, PDFs, and Word documents (.docx)
- Processes images through `inline_data`
- Multiple images per message: Yes
- Word documents: Converted to HTML automatically

### OpenAI
- Supports: Images and Word documents (.docx)
- Uses data URI format for images
- Multiple images per message: Yes
- Word documents: Converted to HTML automatically
- Note: PDFs will show a placeholder message
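The notes above mention `inline_data` for Gemini and data URIs for OpenAI. For orientation, a sketch of the base64 image block each provider's public API expects in its own request format (Anthropic Messages API, Gemini REST `generateContent`, OpenAI Chat Completions). This illustrates the kind of translation v-router performs for you; it is not v-router's code, and `image_block` is a hypothetical name.

```python
from typing import Any, Dict


def image_block(provider: str, media_type: str, data_b64: str) -> Dict[str, Any]:
    """Hypothetical helper: one base64 image in each provider's request format."""
    if provider == "anthropic":
        return {
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data_b64},
        }
    if provider == "google":
        return {"inline_data": {"mime_type": media_type, "data": data_b64}}
    if provider == "openai":
        # OpenAI takes the image as a data URI inside an image_url part
        return {
            "type": "image_url",
            "image_url": {"url": f"data:{media_type};base64,{data_b64}"},
        }
    raise ValueError(f"Unknown provider: {provider}")


for name in ("anthropic", "google", "openai"):
    print(name, image_block(name, "image/png", "AAAA"))  # "AAAA" is placeholder data
```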

## Best Practices

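1. **Image Optimization**: Resize large images before sending to reduce latency and stay under per-image limits (5MB on Anthropic)
2. **Error Handling**: Wrap calls in `try`/`except` and expect provider-specific limitations, such as OpenAI's lack of PDF support
3. **Fallback Strategy**: Use v-router's fallback mechanism (for example `try_other_providers=True`, as in the example below) so a request can be retried on another provider
4. **Content Validation**: Check that each file's MIME type is one the target provider supports before sending it
5. **Word Document Format**: Use .docx rather than the legacy .doc format for best compatibility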
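A minimal pre-flight check combining the content-validation and .docx advice, assuming the support matrix transcribed from "Provider-Specific Considerations" above. The notes do not list Google's image formats, so the four from the overview are assumed; treating "5MB" as 5 MiB is also an assumption. `preflight` and the constants are hypothetical, not part of v-router.

```python
from typing import List

IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}
PDF = "application/pdf"
DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
DOC = "application/msword"  # legacy .doc

# Support matrix transcribed from this notebook's provider notes (assumption for Google images)
SUPPORTED = {
    "anthropic": IMAGE_TYPES | {PDF, DOCX},
    "google": IMAGE_TYPES | {PDF, DOCX},
    "openai": IMAGE_TYPES | {DOCX},
}
ANTHROPIC_MAX_IMAGE_BYTES = 5 * 1024 * 1024  # "5MB per image"; MiB is an assumption


def preflight(provider: str, media_type: str, size_bytes: int) -> List[str]:
    """Hypothetical check: return a list of problems (empty if the content looks sendable)."""
    problems = []
    if media_type == DOC:
        problems.append("Convert legacy .doc files to .docx before sending")
    elif media_type not in SUPPORTED.get(provider, set()):
        problems.append(f"{provider} does not accept {media_type}")
    if (
        provider == "anthropic"
        and media_type in IMAGE_TYPES
        and size_bytes > ANTHROPIC_MAX_IMAGE_BYTES
    ):
        problems.append("Image exceeds the 5MB per-image limit; resize it first")
    return problems


print(preflight("openai", PDF, 1_000))  # ['openai does not accept application/pdf']
```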

## Example: Building a Visual Question Answering System

In [12]:
import base64
import mimetypes

from v_router import Client, LLM
from v_router.classes.message import TextContent, ImageContent

# Default model per provider (the same models used earlier in this notebook)
DEFAULT_MODELS = {
    "anthropic": "claude-3-5-sonnet-20241022",
    "google": "gemini-1.5-flash",
    "openai": "gpt-4o",
}

async def analyze_image(image_path: str, question: str, provider: str = "anthropic") -> str:
    """Analyze an image and answer a question about it."""

    # Determine the MIME type from the extension and reject non-images early
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")

    # Read and base64-encode the image
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")

    # Create a client with cross-provider fallback enabled
    client = Client(
        llm_config=LLM(
            model_name=DEFAULT_MODELS[provider],
            provider=provider,
            try_other_providers=True  # Enable cross-provider fallback
        )
    )

    # Send the question together with the image
    messages = [
        {
            "role": "user",
            "content": [
                TextContent(text=question),
                ImageContent(data=image_data, media_type=mime_type)
            ]
        }
    ]

    response = await client.messages.create(messages=messages, max_tokens=300)
    return response.content[0].text

# Example usage (replace the path with a real image file)
# result = await analyze_image("/path/to/image.jpg", "What objects can you identify in this image?")
# print(result)

## Summary

v-router makes it easy to work with multimodal content across different LLM providers:

- **Unified API**: Same interface for all providers
- **Automatic conversion**: File paths are converted to base64 automatically
- **Word document support**: .docx files are automatically converted to HTML using mammoth
- **Provider abstraction**: Handle provider differences transparently
- **Fallback support**: Automatically try other providers if one fails

This enables you to build robust multimodal applications without worrying about provider-specific implementation details.