<div align="center">
<p align="center" style="width: 100%;">
    <img src="https://raw.githubusercontent.com/vlm-run/.github/refs/heads/main/profile/assets/vlm-black.svg" alt="VLM Run Logo" width="80" style="margin-bottom: -5px; color: #2e3138; vertical-align: middle; padding-right: 5px;"><br>
</p>
<p align="center"><a href="https://vlm.run"><b>Website</b></a> | <a href="https://app.vlm.run/"><b>Platform</b></a> | <a href="https://github.com/vlm-run/vlmrun-hub"><b>Hub</b></a> | <a href="https://docs.vlm.run/"><b>Docs</b></a> | <a href="https://vlm.run/blog"><b>Blog</b></a> | <a href="https://discord.gg/4jgyECY4rq"><b>Discord</b></a>
</p>
<p align="center">
<a href="https://discord.gg/4jgyECY4rq"><img alt="Discord" src="https://img.shields.io/badge/discord-chat-purple?color=%235765F2&label=discord&logo=discord"></a>
<a href="https://twitter.com/vlmrun"><img alt="Twitter Follow" src="https://img.shields.io/badge/twitter-follow-blue?color=%231DA1F2&label=twitter&logo=twitter"></a>
</p>
</div>

This notebook demonstrates how to use the **VLM Run MCP Server** with the OpenAI Responses API. The VLM Run MCP Server exposes visual AI capabilities through the Model Context Protocol (MCP).

## What is VLM Run MCP Server?

The VLM Run MCP Server gives any MCP-compatible AI agent visual AI capabilities. It provides access to:

- **Document AI**: Extract structured data from invoices, receipts, contracts, forms
- **Image AI**: Classify images, extract text, analyze visual content, detect objects and faces
- **Video AI**: Transcribe videos with scene descriptions, search content, edit videos
- **Hub Management**: Browse 50+ pre-built domains and create custom schemas

**Server URL**: `https://mcp.vlm.run/mcp/sse`  
**Authentication**: Bearer token (VLM Run API key)

## Prerequisites

- Python 3.9+
- VLM Run API key (get one at [app.vlm.run](https://app.vlm.run))
- OpenAI API key (for OpenAI examples)
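
Before running any cell, it can help to confirm that both keys are actually set. The sketch below assumes the keys are supplied through the `OPENAI_API_KEY` and `VLMRUN_API_KEY` environment variables that this notebook reads; the helper name `missing_keys` is hypothetical and not part of any SDK.

```python
import os

# Environment variables this notebook reads during setup.
REQUIRED_KEYS = ("OPENAI_API_KEY", "VLMRUN_API_KEY")


def missing_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Checking up front avoids sending a request with an empty bearer token, which would otherwise surface only as an authentication error from the server.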

## Setup

First, let's install the required packages:

In [4]:
! pip install openai --upgrade --quiet

### 1. OpenAI API Setup

In [None]:
import os
import openai

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
VLMRUN_API_KEY = os.getenv("VLMRUN_API_KEY", "")
VLMRUN_MCP_SERVER_URL = "https://mcp.vlm.run/mcp/sse"

# Initialize OpenAI client
client = openai.Client(api_key=OPENAI_API_KEY)
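
Every example in this notebook passes the same MCP tool definition to `client.responses.create`. As an optional convenience, a small helper can build that dictionary once. This is a minimal sketch: the function name `vlmrun_mcp_tool` is hypothetical, and the optional `allowed_tools` field is an assumption about the OpenAI MCP tool configuration that should be checked against the current OpenAI documentation.

```python
VLMRUN_MCP_SERVER_URL = "https://mcp.vlm.run/mcp/sse"


def vlmrun_mcp_tool(api_key, server_url=VLMRUN_MCP_SERVER_URL, allowed_tools=None):
    """Build the MCP tool definition used by the examples in this notebook."""
    if not api_key:
        raise ValueError("VLM Run API key is empty; set VLMRUN_API_KEY first.")
    tool = {
        "type": "mcp",
        "server_label": "vlm-run-mcp",
        "server_url": server_url,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "require_approval": "never",
    }
    # Assumption: the MCP tool accepts an optional list of tool names to expose.
    if allowed_tools is not None:
        tool["allowed_tools"] = list(allowed_tools)
    return tool
```

With this in place, a request could pass `tools=[vlmrun_mcp_tool(VLMRUN_API_KEY)]` instead of repeating the dictionary literal.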


### 2. Image Processing Examples

#### Face Detection and Visualization

In [None]:
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "vlm-run-mcp",
        "server_url": VLMRUN_MCP_SERVER_URL,
        "headers": {
            "Authorization": f"Bearer {VLMRUN_API_KEY}"
        },
        "require_approval": "never"
    }],
    input="Load this image (https://storage.googleapis.com/vlm-data-public-prod/hub/examples/media.tv-news/finance_bb_3_speakers.jpg) and detect all the faces in the image, visualize the detected faces, and return the preview URL of the visualized image."
)

print("Face Detection Result:")
print(response.output_text)

Face Detection Result:
Here is the preview URL of the image with all detected faces visualized:
[https://mcp.vlm.run/files/img_159f](https://mcp.vlm.run/files/img_159f)

The image now displays bounding boxes around each detected face.
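
The model returns the preview URL embedded in free text, usually as a Markdown link. If you need the URL programmatically, one option is to pull it out with a regular expression. The sketch below is a heuristic with a hypothetical helper name (`extract_urls`); it is not a VLM Run or OpenAI API, and it assumes URLs contain no brackets, parentheses, or quotes.

```python
import re

# Match http(s) URLs, stopping at whitespace, brackets, parentheses, or quotes.
URL_PATTERN = re.compile(r'https?://[^\s()\[\]<>"]+')


def extract_urls(text):
    """Return unique URLs found in text, in order of first appearance."""
    urls = []
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls
```

For example, `extract_urls(response.output_text)` would return a list you can pass to a downloader or display widget.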


#### Face Blurring and Privacy Protection

In [None]:
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "vlm-run-mcp",
        "server_url": VLMRUN_MCP_SERVER_URL,
        "headers": {
            "Authorization": f"Bearer {VLMRUN_API_KEY}"
        },
        "require_approval": "never"
    }],
    input="Load this image (https://storage.googleapis.com/vlm-data-public-prod/hub/examples/media.tv-news/finance_bb_3_speakers.jpg) and detect all the faces in the image, blur them, and overlay the detected faces on the blurred image, and return the preview URL of the blurred image."
)

print("Face Blurring Result:")
print(response.output_text)

Face Blurring Result:
Here is the preview URL of the image with all detected faces blurred, and bounding boxes with "face" overlays on the blurred regions:
[https://mcp.vlm.run/files/img_a369](https://mcp.vlm.run/files/img_a369)

You will see the faces are blurred and labeled/outlined on the image as requested.
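
A single prompt like this can trigger several MCP tool calls (load, detect, blur, overlay). To see which tools the model invoked, you can inspect the response's output items. The sketch below works on plain dictionaries and assumes that tool invocations appear as items with `type == "mcp_call"` carrying `name`, `arguments`, and `error` fields; verify these field names against the OpenAI Responses API reference. The helper name `summarize_mcp_calls` is hypothetical.

```python
def summarize_mcp_calls(output_items):
    """Summarize MCP tool calls from a list of response output items (dicts)."""
    calls = []
    for item in output_items:
        # Assumption: MCP tool invocations are tagged with type "mcp_call".
        if item.get("type") != "mcp_call":
            continue
        calls.append(
            {
                "tool": item.get("name"),
                "arguments": item.get("arguments"),
                "failed": item.get("error") is not None,
            }
        )
    return calls
```

In the notebook, the input could be obtained with `response.model_dump()["output"]`, which converts the response object into plain Python data.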


### 3. Document Processing Examples

#### Invoice Extraction

In [None]:
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "vlm-run-mcp",
        "server_url": VLMRUN_MCP_SERVER_URL,
        "headers": {
            "Authorization": f"Bearer {VLMRUN_API_KEY}"
        },
        "require_approval": "never"
    }],
    input="Load this invoice (https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/google_invoice.pdf) and extract the invoice details (invoice number, total amount, date, etc.) including its grounding information."
)

print("Invoice Extraction Result:")
print(response.output_text)

Invoice Extraction Result:
### Extracted Invoice Details (with Grounding Information)

#### Invoice Metadata
- **Issuer:** Google  
  - **Location:** [Page 0, bbox: [0.06, 0.015, 0.25, 0.067]]
- **Invoice Number:** 23413561D  
  - **Location:** [Page 0, bbox: [0.81, 0.072, 0.124, 0.016]]
- **Invoice Date (Issue):** 2019-09-24 (Sep 24, 2019)  
  - **Location:** [Page 0, bbox: [0.83, 0.138, 0.10, 0.016]]
- **Invoice Due Date:** 2019-09-30 (Sep 30, 2019)  
  - **Location:** [Page 0, bbox: [0.83, 0.166, 0.10, 0.016]]

#### Customer Details
- **Name:** Jane Smith  
  - **Location:** [Page 0, bbox: [0.081, 0.164, 0.093, 0.015]]
- **Billing Address:**  
  - **Street:** 1600 Amphitheatre Pkway, [Page 0, bbox: [0.082, 0.179, 0.191, 0.016]]
  - **City/State/Zip:** Mountain View, CA 94043, [Page 0, bbox: [0.08, 0.195, 0.191, 0.015]]

#### Invoice Line Items

| Description              | Quantity | Unit Price | Total Price | BBox (Description)                     |
|--------------------------|----
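
The grounding boxes in the output above are fractions of the page size rather than pixel coordinates. To draw them on a rendered page, they need to be scaled. The sketch below assumes each `bbox` is `[x, y, width, height]` normalized to the range 0 to 1, with the origin at the top-left corner; this layout is inferred from the values shown and is not confirmed by the source. The helper name `bbox_to_pixels` is hypothetical.

```python
def bbox_to_pixels(bbox, page_width, page_height):
    """Convert a normalized [x, y, w, h] box to (left, top, right, bottom) pixels."""
    x, y, w, h = bbox
    left = round(x * page_width)
    top = round(y * page_height)
    right = round((x + w) * page_width)
    bottom = round((y + h) * page_height)
    return left, top, right, bottom
```

The `(left, top, right, bottom)` tuple matches the box format that common imaging libraries expect for cropping and drawing rectangles.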

#### PII Redaction

In [None]:
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "vlm-run-mcp",
        "server_url": VLMRUN_MCP_SERVER_URL,
        "headers": {
            "Authorization": f"Bearer {VLMRUN_API_KEY}"
        },
        "require_approval": "never"
    }],
    input="Load this document (https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg) and redact any personal information (name, address, phone number, email, etc.) from the document."
)

print("PII Redaction Result:")
print(response.output_text)

PII Redaction Result:
The personal information has been successfully redacted from the document. You can view the processed document [here](https://mcp.vlm.run/files/img_e751).


### 4. Hub Schema Management

#### List Available Domains

In [None]:
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "vlm-run-mcp",
        "server_url": VLMRUN_MCP_SERVER_URL,
        "headers": {
            "Authorization": f"Bearer {VLMRUN_API_KEY}"
        },
        "require_approval": "never"
    }],
    input="List all available hub domains and show me the first 5 domains with their categories. I want to see what schemas are available."
)

print("Available Domains:")
print(response.output_text)

Available Domains:
Here are the first 5 available hub domains with their categories:

1. accounting.form-payslip (Category: accounting)
2. accounting.form-w2 (Category: accounting)
3. aerospace.remote-sensing (Category: aerospace)
4. document.bank-check (Category: document)
5. document.bank-statement (Category: document)

These are schema domains you can use for processing or structuring data related to each category. If you want to see details or fields for any schema, let me know!
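
The domain names above follow a `category.name` pattern, so a list of domains can be grouped by category with plain string handling. This is a small sketch that assumes the category is everything before the first dot; `group_by_category` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_category(domains):
    """Group hub domain names such as 'document.invoice' by their category prefix."""
    groups = defaultdict(list)
    for domain in domains:
        # Split on the first dot only: 'accounting.form-w2' -> ('accounting', 'form-w2').
        category, _, name = domain.partition(".")
        groups[category].append(name)
    return dict(groups)


domains = [
    "accounting.form-payslip",
    "accounting.form-w2",
    "aerospace.remote-sensing",
    "document.bank-check",
    "document.bank-statement",
]
grouped = group_by_category(domains)
```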


#### Get Schema Information

In [None]:
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "vlm-run-mcp",
        "server_url": VLMRUN_MCP_SERVER_URL,
        "headers": {
            "Authorization": f"Bearer {VLMRUN_API_KEY}"
        },
        "require_approval": "never"
    }],
    input="Get the JSON schema for the 'document.invoice' domain. Show me the schema structure and available fields."
)

print("Invoice Schema:")
print(response.output_text)

Invoice Schema:
Here is the structure and available fields for the JSON schema of the **document.invoice** domain:

---

### General Information

- **Description:** Comprehensive invoice data extraction system that processes invoice images to extract structured information including invoice metadata, customer details, line items, and financial totals.

---

## Schema Structure & Fields

### Top-Level Invoice Fields

| Field                        | Type        | Description                       |
|------------------------------|-------------|-----------------------------------|
| invoice_id                   | string/null | Unique invoice identifier         |
| period_start                 | string/date | Invoice period start date         |
| period_end                   | string/date | Invoice period end date           |
| invoice_issue_date           | string/date | Issue date of the invoice         |
| invoice_due_date             | string/date | Due date of the invoice           |
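
To work with these fields in Python, one option is to mirror them in a small typed container. The sketch below covers only the five fields listed in the table above and assumes dates arrive as ISO 8601 strings (`YYYY-MM-DD`). The class `InvoiceHeader` and the function `parse_invoice_header` are hypothetical and not part of the VLM Run SDK.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class InvoiceHeader:
    """Subset of the document.invoice fields shown in the table above."""
    invoice_id: Optional[str] = None
    period_start: Optional[date] = None
    period_end: Optional[date] = None
    invoice_issue_date: Optional[date] = None
    invoice_due_date: Optional[date] = None


def _to_date(value):
    # Assumption: dates are ISO strings; missing values stay None.
    return date.fromisoformat(value) if value else None


def parse_invoice_header(payload):
    """Build an InvoiceHeader from a dict of extracted invoice fields."""
    return InvoiceHeader(
        invoice_id=payload.get("invoice_id"),
        period_start=_to_date(payload.get("period_start")),
        period_end=_to_date(payload.get("period_end")),
        invoice_issue_date=_to_date(payload.get("invoice_issue_date")),
        invoice_due_date=_to_date(payload.get("invoice_due_date")),
    )
```

Parsing dates early means a malformed value raises `ValueError` at load time instead of causing problems later in a pipeline.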

### 5. Advanced Examples

#### Face Cropping and Preview

In [None]:
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "vlm-run-mcp",
        "server_url": VLMRUN_MCP_SERVER_URL,
        "headers": {
            "Authorization": f"Bearer {VLMRUN_API_KEY}"
        },
        "require_approval": "never"
    }],
    input="Load this image (https://storage.googleapis.com/vlm-data-public-prod/hub/examples/media.tv-news/finance_bb_3_speakers.jpg), detect all the faces in the image, and crop the right-most face, and preview the cropped image in HTML."
)

print("Face Cropping Result:")
print(response.output_text)

Face Cropping Result:
Here is the cropped image of the right-most face, as detected and requested:

<img src="https://mcp.vlm.run/files/img_62b2" alt="Cropped right-most face" />

You can preview it directly above in HTML. Let me know if you want to analyze or process this face further!
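
In this example the model decides which face is "right-most". If you have the detected boxes yourself, the same selection is a one-line comparison. The sketch below is purely illustrative: it assumes boxes are `(x, y, width, height)` tuples in a shared coordinate system, and the helper `rightmost_box` and the sample boxes are hypothetical, not output from the server.

```python
def rightmost_box(boxes):
    """Return the box whose horizontal center is furthest to the right."""
    if not boxes:
        raise ValueError("no boxes to choose from")
    # The horizontal center of (x, y, w, h) is x + w / 2.
    return max(boxes, key=lambda box: box[0] + box[2] / 2)


# Hypothetical normalized face boxes, for illustration only.
example_boxes = [(0.10, 0.20, 0.15, 0.25), (0.45, 0.22, 0.14, 0.24), (0.78, 0.18, 0.16, 0.26)]
selected = rightmost_box(example_boxes)
```

Comparing centers rather than left edges keeps the choice stable when boxes have different widths.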


#### Video Processing

In [None]:
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "vlm-run-mcp",
        "server_url": VLMRUN_MCP_SERVER_URL,
        "headers": {
            "Authorization": f"Bearer {VLMRUN_API_KEY}"
        },
        "require_approval": "never"
    }],
    input="Load this video (https://storage.googleapis.com/vlm-data-public-prod/hub/examples/video/test_video.mp4) and preview the video in HTML."
)

print("Video Preview:")
print(response.output_text)

Video Preview:
Here’s an HTML snippet you can use to preview the video using the provided link:

```html
<video width="640" height="360" controls>
  <source src="https://storage.googleapis.com/vlm-data-public-prod/hub/examples/video/test_video.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
```

**How to use:**
- Copy and paste this code into your HTML file or an online HTML editor.
- You’ll see a video player with standard controls (play, pause, etc.) for the video.
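
If you would rather generate this snippet in Python than copy it from the model output, it can be built with the standard library. The sketch below reproduces the markup shown above; `video_tag` is a hypothetical helper, and the fixed `video/mp4` MIME type is an assumption that only holds for MP4 files.

```python
from html import escape


def video_tag(url, width=640, height=360):
    """Return an HTML5 <video> snippet for an MP4 file at the given URL."""
    safe_url = escape(url, quote=True)  # prevent the URL from breaking out of the attribute
    return (
        f'<video width="{int(width)}" height="{int(height)}" controls>\n'
        f'  <source src="{safe_url}" type="video/mp4">\n'
        "  Your browser does not support the video tag.\n"
        "</video>"
    )


snippet = video_tag(
    "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/video/test_video.mp4"
)
```

Inside a notebook, the resulting string can be rendered inline by passing it to `IPython.display.HTML`.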



### 6. Next Steps and Resources

#### Explore More Capabilities

1. **Advanced Video Processing**: Try video transcription, editing, and analysis
2. **Batch Processing**: Process large volumes of documents efficiently
3. **Real-time Applications**: Build live video processing applications
4. **Multi-modal Analysis**: Combine text, image, and video analysis

#### Tool Categories

- **Image Processing**: Object detection, OCR, face detection, QR code detection, template matching
- **Document Processing**: Invoice extraction, PII redaction, figure detection
- **Video Processing**: Video preview, trimming, watermarking, YouTube transcript extraction
- **Hub Management**: Schema listing, creation, updating, and filtering

#### Resources

- [VLM Run MCP Documentation](https://docs.vlm.run/mcp)
- [FastMCP Documentation](https://gofastmcp.com)
- [VLM Run Hub](https://github.com/vlm-run/vlmrun-hub)
- [Discord Community](https://discord.gg/4jgyECY4rq)

#### Support

- Email: [support@vlm.run](mailto:support@vlm.run)
- Discord: [Join our community](https://discord.gg/4jgyECY4rq)
- GitHub: [Report issues](https://github.com/vlm-run/vlmrun-hub/issues)