**Note**: This notebook uses `await` syntax. Run it in a Jupyter environment with IPython kernel, or use `asyncio.run()` wrapper for regular Python scripts.
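
For a regular Python script, one way to adapt the notebook cells is to move the awaited calls into a coroutine and pass it to `asyncio.run()`. The sketch below uses a placeholder coroutine so that it runs without API keys; the name `main` and the `asyncio.sleep(0)` stand-in are illustrative, not part of v-router.

```python
import asyncio


async def main() -> str:
    # Placeholder for the notebook's awaited calls, e.g.
    #   response = await client.messages.create(messages=messages, max_tokens=200)
    await asyncio.sleep(0)
    return "done"


if __name__ == "__main__":
    # asyncio.run() creates an event loop, runs main() to completion, then closes the loop.
    print(asyncio.run(main()))
```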

# Multimodal Content with v-router 🖼️

This notebook demonstrates how to use v-router with multimodal content (images and PDFs). The v-router library provides a unified interface for sending images and documents to different LLM providers.

## Overview

v-router supports:
- **Images**: JPEG, PNG, GIF, WebP formats
- **Documents**: PDF files (provider support varies)
- **Automatic conversion**: File paths are automatically converted to base64
- **Unified interface**: Same API across all providers
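
To give a feel for what the automatic conversion involves, here is a minimal sketch using only the standard library. It is not v-router's actual implementation; the helper name `encode_file` and its return shape are assumptions made for illustration.

```python
import base64
import mimetypes
from pathlib import Path


def encode_file(path: str) -> tuple[str, str]:
    """Return (base64_data, media_type) for a local file (hypothetical helper)."""
    file_path = Path(path)
    # Guess the MIME type from the file extension; fall back to a generic binary type.
    media_type, _ = mimetypes.guess_type(file_path.name)
    data = base64.b64encode(file_path.read_bytes()).decode("utf-8")
    return data, media_type or "application/octet-stream"
```

The returned pair corresponds to the `data` and `media_type` arguments that `ImageContent` and `DocumentContent` take in the manual examples later in this notebook.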

## Setup

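The imports used by the later examples (`base64`, `httpx`, `pathlib`, and the v-router content types) live in the cell marked `In [9]`, which appears after the Quick Start output below.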

## Quick Start: Using Local Files

The easiest way to send images and PDFs is by passing file paths directly:

In [8]:
# Example using the PDF file included in this repository
from v_router import Client, LLM

# Create a client
client = Client(
    llm_config=LLM(
        model_name="claude-3-5-sonnet-20241022",
        provider="anthropic"
    )
)

# Send a PDF file by just passing its path
pdf_path = "providers/assets/gameboy_color.pdf"
messages = [
    {
        "role": "user",
        "content": pdf_path  # v-router automatically detects and converts the PDF
    },
    {
        "role": "user", 
        "content": "What is this document about? Give me a brief summary."
    }
]

response = await client.messages.create(messages=messages, max_tokens=200)
print("PDF Summary:")
print(response.content[0].text)

2025-05-31 23:36:14,113 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


PDF Summary:
This is an instruction manual/booklet for the Nintendo Game Boy Color handheld video game system. The document covers various aspects of the device including:

1. Introduction and basic features
2. Component descriptions and diagrams
3. Battery installation instructions
4. Usage instructions and setup
5. Information about compatible Game Paks (game cartridges)
6. Two-player gaming setup using Game Link cables
7. Troubleshooting guide
8. Warranty information
9. Parts list and order form



In [9]:
import base64
import httpx
from pathlib import Path

# Import content types if you need to create multimodal messages manually
from v_router.classes.message import TextContent, ImageContent, DocumentContent

## Sending Images

### Method 1: Base64 Encoded Images

You can send images by providing base64-encoded data directly:

In [10]:
# Download a sample image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/320px-Camponotus_flavomarginatus_ant.jpg"
response = httpx.get(image_url)
image_data = base64.b64encode(response.content).decode("utf-8")

# Create a client with Anthropic
client = Client(
    llm_config=LLM(
        model_name="claude-3-5-sonnet-20241022",
        provider="anthropic"
    )
)

# Send multimodal message
messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="What animal is in this image? Describe it in detail."),
            ImageContent(data=image_data, media_type="image/jpeg")
        ]
    }
]

response = await client.messages.create(messages=messages, max_tokens=200)
print("Anthropic Response:")
print(response.content[0].text)

2025-05-31 23:36:20,849 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


Anthropic Response:
This image shows a carpenter ant (genus Camponotus) in striking detail. The ant is captured in a dynamic pose, appearing to be rearing up on its hind legs, which is a common defensive or aggressive posture. The ant's body shows the classic characteristics of carpenter ants: a segmented body with a pronounced thorax, a large head, and long, jointed legs. The ant's antennae are clearly visible, extending forward from its head. The image has a shallow depth of field, creating a beautiful bokeh effect in the background while keeping the ant in sharp focus. The lighting gives the ant's exoskeleton a subtle sheen, and you can make out fine details of its body structure. The overall color palette is warm, with browns and reddish tones dominating the composition.
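
The example above hard-codes `media_type="image/jpeg"`. When the format of downloaded bytes is not known in advance, one option is to inspect the file signature before building the message. The helper below is a hypothetical sketch (the name `sniff_image_type` is not part of v-router) covering the four image formats listed in the overview.

```python
import base64
from typing import Optional


def sniff_image_type(data_b64: str) -> Optional[str]:
    """Guess an image MIME type from the leading bytes of base64 data (hypothetical helper)."""
    # 16 base64 characters decode to 12 bytes, enough for every signature below.
    header = base64.b64decode(data_b64[:16])
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None
```

A result of `None` means the bytes match none of the four signatures, which is a reasonable point to stop before calling a provider.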


### Method 2: File Path (Automatic Conversion)

v-router can automatically convert local image files to base64:

In [11]:
# Save the image locally
image_path = Path("/tmp/test_ant.jpg")
# Download the image again: `response` now holds the v-router response, not the httpx one
with open(image_path, "wb") as f:
    f.write(httpx.get(image_url).content)

# Send using file path - v-router will automatically convert to base64
messages = [
    {
        "role": "user",
        "content": str(image_path)  # Just pass the file path as a string
    }
]

response = await client.messages.create(messages=messages, max_tokens=100)
print("Response from file path:")
print(response.content[0].text)

# Clean up
image_path.unlink()

2025-05-31 23:36:25,324 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


Response from file path:
This is a detailed macro photograph of an ant, showing its distinctive features like its segmented body, long antennae, and slender legs. The ant appears to be in a rearing or defensive posture, with its front legs raised off the ground. The image has a shallow depth of field, creating a soft, blurred background while keeping the ant in sharp focus. The lighting gives the photograph a warm, brownish tone, and the detail captured allows you to see the ant's


## Cross-Provider Compatibility

The same multimodal content works across different providers:

In [12]:
# Prepare the same multimodal message
multimodal_messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="What do you see in this image?"),
            ImageContent(data=image_data, media_type="image/jpeg")
        ]
    }
]

# Test with different providers
providers = [
    ("anthropic", "claude-3-5-sonnet-20241022"),
    ("google", "gemini-1.5-flash"),
    ("openai", "gpt-4o")
]

for provider, model in providers:
    print(f"\n{provider.upper()} ({model}):")
    try:
        client = Client(
            llm_config=LLM(
                model_name=model,
                provider=provider
            )
        )
        response = await client.messages.create(messages=multimodal_messages, max_tokens=50)
        print(f"✓ {response.content[0].text[:100]}...")
    except Exception as e:
        print(f"✗ Error: {e}")

2025-05-31 23:36:28,256 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic
2025-05-31 23:36:30,752 - v_router.router - INFO - Trying primary model: gemini-1.5-flash on google
2025-05-31 23:36:32,007 - v_router.router - INFO - Trying primary model: gpt-4o on openai


ANTHROPIC (claude-3-5-sonnet-20241022):
✓ This is a detailed macro photograph of an ant, showing its distinctive features like its segmented b...

GOOGLE (gemini-1.5-flash):
✓ That's a close-up image of a carpenter ant (genus *Camponotus*) carrying something.  Specifically, i...

OPENAI (gpt-4o):
✓ This image shows a close-up of an ant. The ant is standing on a surface, and its body is clearly vis...


## PDF Documents

Some providers (like Anthropic and Google) support PDF documents:

In [13]:
# Download a sample PDF
pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
try:
    pdf_response = httpx.get(pdf_url, timeout=10.0)
    pdf_response.raise_for_status()  # Treat HTTP errors (404, 500, ...) as download failures
    pdf_data = base64.b64encode(pdf_response.content).decode("utf-8")
except Exception as e:
    print(f"Failed to download PDF: {e}")
    # Use the local PDF as fallback
    with open("providers/assets/gameboy_color.pdf", "rb") as f:
        pdf_data = base64.b64encode(f.read()).decode("utf-8")

# Send PDF to Anthropic
anthropic_client = Client(
    llm_config=LLM(
        model_name="claude-3-5-sonnet-20241022",
        provider="anthropic"
    )
)

pdf_messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="What is the content of this PDF?"),
            DocumentContent(data=pdf_data, media_type="application/pdf")
        ]
    }
]

response = await anthropic_client.messages.create(messages=pdf_messages, max_tokens=200)
print("PDF Analysis:")
print(response.content[0].text)

2025-05-31 23:36:34,661 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


PDF Analysis:
The PDF contains only the text "Dummy PDF file" at the top of an otherwise blank page.


## Complex Multimodal Conversations

You can combine multiple images and text in a single message:

In [14]:
# Use the ant image from earlier and download a new one
image1_data = image_data  # Reuse the ant image from earlier

# Download a different image
image2_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/320px-Cat03.jpg"
try:
    image2_response = httpx.get(image2_url)
    image2_response.raise_for_status()  # Treat HTTP errors as download failures
    image2_data = base64.b64encode(image2_response.content).decode("utf-8")
except Exception as e:
    print(f"Failed to download second image: {e}")
    # Fall back to reusing the first image so the example still runs
    image2_data = image1_data

# Create a complex multimodal message
complex_messages = [
    {
        "role": "user",
        "content": [
            TextContent(text="I'm going to show you two images."),
            TextContent(text="First image:"),
            ImageContent(data=image1_data, media_type="image/jpeg"),
            TextContent(text="Second image:"),
            ImageContent(data=image2_data, media_type="image/jpeg"),
            TextContent(text="Can you describe what you see in each image?")
        ]
    }
]

try:
    response = await anthropic_client.messages.create(messages=complex_messages, max_tokens=300)
    print("Comparison Response:")
    print(response.content[0].text)
except Exception as e:
    print(f"Error with complex message: {e}")
    print("\nTrying with a simpler message...")
    
    # Fallback to a simpler message
    simple_messages = [
        {
            "role": "user", 
            "content": [
                TextContent(text="What do you see in this image?"),
                ImageContent(data=image1_data, media_type="image/jpeg")
            ]
        }
    ]
    response = await anthropic_client.messages.create(messages=simple_messages, max_tokens=100)
    print(response.content[0].text)

2025-05-31 23:36:36,433 - v_router.router - INFO - Trying primary model: claude-3-5-sonnet-20241022 on anthropic


Comparison Response:
First image: This is a detailed macro photograph of an ant, showing its distinctive segmented body, thin legs, and antennae. The ant appears to be in a rearing or alert posture, with its front legs raised. The image has a shallow depth of field, creating a blurred background that makes the ant stand out in sharp detail.

Second image: This is a close-up portrait of a ginger/orange cat with striking amber-colored eyes. The cat has a cream-colored face with orange markings, pointed ears, and appears to be looking directly at the camera. The image shows great detail in the cat's fur texture and facial features. The background appears to have some red stripes but is mostly out of focus, putting emphasis on the cat's face.


## Provider-Specific Considerations

### Anthropic
- Supports: Images (JPEG, PNG, GIF, WebP) and PDFs
- Max image size: 5MB per image
- Multiple images per message: Yes
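
The 5 MB limit can be checked before a request is sent. The sketch below estimates the decoded size directly from the base64 string. Two assumptions are baked in: that the limit applies to the raw (decoded) bytes, and that "MB" means 1024 × 1024 bytes. The names `MAX_IMAGE_BYTES`, `decoded_size`, and `fits_limit` are hypothetical.

```python
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumption: limit counted in raw bytes, MB read as MiB


def decoded_size(data_b64: str) -> int:
    """Number of bytes a padded base64 string decodes to, computed without decoding."""
    # Every 4 base64 characters carry 3 bytes; '=' padding carries none.
    stripped = data_b64.rstrip("=")
    return len(stripped) * 3 // 4


def fits_limit(data_b64: str, limit: int = MAX_IMAGE_BYTES) -> bool:
    """True if the decoded payload is within the given byte limit."""
    return decoded_size(data_b64) <= limit
```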

### Google (Gemini)
- Supports: Images and PDFs
- Processes images through `inline_data`
- Multiple images per message: Yes

### OpenAI
- Supports: Images only (no native PDF support)
- Uses data URI format for images
- Multiple images per message: Yes
- Note: PDFs will show a placeholder message
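
For reference, a data URI packs the MIME type and the base64 payload into a single string. The sketch below shows the general format (RFC 2397); how v-router assembles the URI internally is not shown in this notebook, and `to_data_uri` is a hypothetical helper.

```python
def to_data_uri(data_b64: str, media_type: str) -> str:
    """Build a data URI from data that is already base64-encoded."""
    return f"data:{media_type};base64,{data_b64}"
```

For example, `to_data_uri("YWJj", "image/png")` returns `"data:image/png;base64,YWJj"`.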

## Best Practices

1. **Image Optimization**: Resize large images before sending to reduce latency
2. **Error Handling**: Always handle provider-specific limitations
3. **Fallback Strategy**: Use v-router's fallback mechanism for providers that don't support certain content types
4. **Content Validation**: Ensure your content matches supported MIME types
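
One way to apply the content-validation advice is to check a media type against a per-provider table before sending. The table below is transcribed from the provider notes in this notebook, with one assumption: that Google and OpenAI accept the same four image formats listed for Anthropic. The names `SUPPORTED` and `check_media_type` are hypothetical and not part of v-router.

```python
IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}
PDF_TYPE = "application/pdf"

# Assumption: Google and OpenAI accept the same image formats as Anthropic.
SUPPORTED = {
    "anthropic": IMAGE_TYPES | {PDF_TYPE},
    "google": IMAGE_TYPES | {PDF_TYPE},
    "openai": set(IMAGE_TYPES),  # images only, no native PDF support
}


def check_media_type(provider: str, media_type: str) -> None:
    """Raise ValueError if the provider is unknown or does not list the media type."""
    if provider not in SUPPORTED:
        raise ValueError(f"Unknown provider: {provider!r}")
    if media_type not in SUPPORTED[provider]:
        raise ValueError(f"{provider} does not support {media_type}")
```

Failing early with a `ValueError` keeps unsupported content from reaching the provider, where the error message may be less specific.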

## Example: Building a Visual Question Answering System

In [15]:
async def analyze_image(image_path: str, question: str, provider: str = "anthropic"):
    """Analyze an image and answer a question about it."""
    
    # Read and encode the image
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")
    
    # Determine MIME type
    import mimetypes
    mime_type, _ = mimetypes.guess_type(image_path)
    
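    # Pick a default model per provider (the models used earlier in this notebook)
    default_models = {
        "anthropic": "claude-3-5-sonnet-20241022",
        "google": "gemini-1.5-flash",
        "openai": "gpt-4o",
    }

    # Create client with fallback
    client = Client(
        llm_config=LLM(
            model_name=default_models[provider],
            provider=provider,
            try_other_providers=True  # Enable cross-provider fallback
        )
    )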
    
    # Send the question with the image
    messages = [
        {
            "role": "user",
            "content": [
                TextContent(text=question),
                ImageContent(data=image_data, media_type=mime_type or "image/jpeg")
            ]
        }
    ]
    
    response = await client.messages.create(messages=messages, max_tokens=300)
    return response.content[0].text

# Example usage (you would need to provide an actual image path)
# result = await analyze_image("/path/to/image.jpg", "What objects can you identify in this image?")
# print(result)

## Summary

v-router makes it easy to work with multimodal content across different LLM providers:

- **Unified API**: Same interface for all providers
- **Automatic conversion**: File paths are converted to base64 automatically
- **Provider abstraction**: Handle provider differences transparently
- **Fallback support**: Automatically try other providers if one fails

This enables you to build robust multimodal applications without worrying about provider-specific implementation details.