MediaClaw

Multimodal Agent Platform

Aggregate full-stack AIGC capabilities to quickly build scenario-adapted multimedia generation solutions

🚀 Introduction

MediaClaw is an OpenClaw-based multimodal agent platform developed by the UnicomAI (YuanJing) team. It aggregates a full range of AIGC meta-capabilities, including image generation, video creation, speech synthesis, digital humans, and post-production effects, into a unified toolset (the meta-capability pool) whose capabilities can be called through one interface and combined freely.

We have customized the HMI (Human-Machine Interface), extended its functionality, and made it publicly available through a unified MediaUI. The platform is designed specifically to support Skill customization for vertical tasks. It helps business teams, developers, and ecosystem partners quickly build multimedia generation solutions tailored to their scenarios, while simplifying operations, reducing costs, and improving efficiency.

Figure: MediaClaw overall architecture diagram

✨ Core Features

🎨 Full-stack AIGC Capabilities: Covers multimedia generation across images, video, speech, and digital humans
🔌 Plugin Architecture: Built on the OpenClaw ecosystem as a plugin that integrates into existing OpenClaw deployments
🎯 Multi-provider Support: Supports both YuanJing and SGLang backend providers
⚙️ Flexible Configuration: Supports configuring a different provider and model options for each capability
🛠️ Out-of-the-box: Provides a complete WebUI that is ready to use without complex development
🔧 Skill Extension: Supports custom Skill development to adapt quickly to vertical scenarios
🎬 Post-processing: Built-in local video processing such as subtitle burning, green screen matting, video overlay, color grading, and audio normalization

📊 Capability Matrix

| Feature | Backend Dependency | YuanJing | SGLang | Tool Name |
|---|---|---|---|---|
| Text to Image | Required | ✅ | ✅ | `mediaclaw_text_to_image` |
| Image QA | Required | ✅ | ✅ | `mediaclaw_image_qa` |
| Text to Video | Required | ✅ (Wan/Kling) | ✅ | `mediaclaw_text_to_video` |
| Image to Video | Required | ✅ (Wan Stylization/Kling Single Image) | ✅ | `mediaclaw_image_to_video` |
| Multiple Images to Video | Required | ✅ (Wan Multi-image/Kling First-last Frame) | ❌ | `mediaclaw_images_to_video` |
| Text to Speech | Required | ✅ | ❌ | `mediaclaw_text_to_speech` |
| Speech Recognition | Required | ✅ | ❌ | `mediaclaw_speech_recognition` |
| Digital Avatar Video | Required | ✅ | ❌ | `mediaclaw_digital_avatar` |
| Subtitle Generation | Not required (local processing) | N/A | N/A | `mediaclaw_build_srt` |
| Subtitle Merge | Not required (local processing) | N/A | N/A | `mediaclaw_merge_srt` |
| Subtitle Burning | Not required (local ffmpeg) | N/A | N/A | `mediaclaw_burn_subtitles` |
| Audio Normalization | Not required (local ffmpeg) | N/A | N/A | `mediaclaw_normalize_audio` |
| Color Grading | Not required (local ffmpeg) | N/A | N/A | `mediaclaw_apply_grade` |
| Video Overlay | Not required (local ffmpeg) | N/A | N/A | `mediaclaw_apply_overlay` |
| Green Screen Background Replacement | Not required (local ffmpeg) | N/A | N/A | `mediaclaw_replace_background` |
| Local Image/Video Processing | Not required (local processing) | N/A | N/A | `mediaclaw_local_image` |
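To check which tools a given provider can serve, the matrix can be encoded as data. This is an illustrative sketch only: `REMOTE_TOOLS`, `LOCAL_TOOLS`, `canRun`, and `unsupportedTools` are hypothetical names, not part of MediaClaw's API, and the values are transcribed by hand from the table above.

```typescript
// Hypothetical encoding of the capability matrix (not MediaClaw's own code).
type Provider = "yuanjing" | "sglang";

// Backend-dependent tools and the providers marked ✅ for each.
const REMOTE_TOOLS: Record<string, Provider[]> = {
  mediaclaw_text_to_image: ["yuanjing", "sglang"],
  mediaclaw_image_qa: ["yuanjing", "sglang"],
  mediaclaw_text_to_video: ["yuanjing", "sglang"],
  mediaclaw_image_to_video: ["yuanjing", "sglang"],
  mediaclaw_images_to_video: ["yuanjing"],
  mediaclaw_text_to_speech: ["yuanjing"],
  mediaclaw_speech_recognition: ["yuanjing"],
  mediaclaw_digital_avatar: ["yuanjing"],
};

// Tools that run locally (local processing or local ffmpeg), so no provider is needed.
const LOCAL_TOOLS = [
  "mediaclaw_build_srt", "mediaclaw_merge_srt", "mediaclaw_burn_subtitles",
  "mediaclaw_normalize_audio", "mediaclaw_apply_grade", "mediaclaw_apply_overlay",
  "mediaclaw_replace_background", "mediaclaw_local_image",
];

function canRun(tool: string, provider: Provider): boolean {
  if (LOCAL_TOOLS.includes(tool)) return true;
  return (REMOTE_TOOLS[tool] ?? []).includes(provider); // unknown tools -> false
}

function unsupportedTools(provider: Provider): string[] {
  return [...Object.keys(REMOTE_TOOLS), ...LOCAL_TOOLS].filter((t) => !canRun(t, provider));
}

console.log(unsupportedTools("sglang").join(", "));
// → mediaclaw_images_to_video, mediaclaw_text_to_speech, mediaclaw_speech_recognition, mediaclaw_digital_avatar
```

With YuanJing as the provider the list is empty, which matches the ✅ column above.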

📦 Installation Guide

Environment Requirements

Node.js 22+
OpenClaw Gateway >= 2026.3.24-beta.2
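ffmpeg: required by the local ffmpeg-based tools (`mediaclaw_burn_subtitles`, `mediaclaw_normalize_audio`, `mediaclaw_apply_grade`, `mediaclaw_apply_overlay`, `mediaclaw_replace_background`)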
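Before installing, a quick preflight check can confirm the Node.js and ffmpeg requirements. This is a hypothetical helper script, not shipped with MediaClaw. It assumes the post-processing tools invoke an `ffmpeg` binary found on `PATH`. It does not check the OpenClaw Gateway version.

```typescript
// Hypothetical preflight script (not part of MediaClaw).
import { spawnSync } from "node:child_process";

const MIN_NODE_MAJOR = 22;

// parseInt stops at the first ".", so "22.4.1" or "v22.4.1" yields 22.
function nodeMajor(version: string): number {
  return Number.parseInt(version.replace(/^v/, ""), 10);
}

// Assumes ffmpeg is resolved from PATH. spawnSync reports a missing binary
// through result.error (e.g. ENOENT) instead of throwing.
function hasFfmpeg(): boolean {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

const major = nodeMajor(process.versions.node);
console.log(`Node.js ${process.versions.node}: ${major >= MIN_NODE_MAJOR ? "OK" : `requires ${MIN_NODE_MAJOR}+`}`);
console.log(`ffmpeg: ${hasFfmpeg() ? "found" : "not found (local post-processing tools will not work)"}`);
```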

Plugin Installation

```bash
# Install the MediaClaw plugin
openclaw plugins install ./mediaclaw-plugin --force

# Restart the OpenClaw gateway
openclaw gateway restart
```

WebUI Installation

The WebUI is a customized version of OpenClaw-Admin:

```bash
# Enter the OpenClaw-Admin directory and create .env from the template
cd OpenClaw-Admin
cp .env.example .env

# Edit .env and set your OpenClaw auth token:
#   OPENCLAW_AUTH_TOKEN=YOUR_AUTH_TOKEN

# Install dependencies
npm install

# Start the development server
npm run dev:all
```

Once the development server is running, open http://localhost:3001/ in a browser.

⚙️ Configuration Guide

Basic Configuration

Edit the openclaw.json configuration file and add the MediaClaw configuration under the plugins node:

```json
"plugins": {
  "mediaclaw": {
    "enabled": true,
    "config": {
      "providers": {
        "yuanjing": {
          "apiKey": "your-yuanjing-token",
          "baseUrl": "https://maas-api.ai-yuanjing.com"
        },
        "sglang": {
          "baseUrl": "http://sglang-default:30010",
          "apiKey": "default-key"
        }
      },
      "capabilities": {
        "textToVideo": {
          "provider": "yuanjing"
        }
      },
      "defaultProvider": "yuanjing"
    }
  }
}
```

Configuration Notes:

defaultProvider: "yuanjing" makes YuanJing the default provider
providers holds the global connection settings (API key, base URL) for each provider
capabilities.<name>.provider assigns a provider to an individual capability, overriding defaultProvider for that capability
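The precedence described above can be sketched as a small resolver. The types and the `resolveProvider` function are hypothetical illustrations, not MediaClaw's internal code. The final fallback to `"yuanjing"` is an assumption based on YuanJing being described as the default provider. The `textToImage` override in the sample config is also illustrative.

```typescript
// Hypothetical sketch of provider precedence (not MediaClaw's implementation).
type ProviderName = "yuanjing" | "sglang";

interface ProviderSettings { apiKey?: string; baseUrl?: string; apiPath?: string; }
interface CapabilityConfig { provider?: ProviderName; videoModel?: "wan" | "kling"; }

interface MediaClawConfig {
  providers: Partial<Record<ProviderName, ProviderSettings>>;
  capabilities?: Record<string, CapabilityConfig>;
  defaultProvider?: ProviderName;
}

// Per-capability provider wins; otherwise defaultProvider; otherwise "yuanjing" (assumed).
function resolveProvider(config: MediaClawConfig, capability: string): ProviderName {
  return config.capabilities?.[capability]?.provider ?? config.defaultProvider ?? "yuanjing";
}

const config: MediaClawConfig = {
  providers: {
    yuanjing: { apiKey: "your-yuanjing-token", baseUrl: "https://maas-api.ai-yuanjing.com" },
    sglang: { baseUrl: "http://sglang-default:30010", apiKey: "default-key" },
  },
  capabilities: { textToImage: { provider: "sglang" } }, // illustrative override
  defaultProvider: "yuanjing",
};

console.log(resolveProvider(config, "textToImage"));  // → sglang (per-capability override)
console.log(resolveProvider(config, "textToSpeech")); // → yuanjing (falls back to defaultProvider)
```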

Video Model Configuration

The YuanJing MaaS platform integrates Kling services, so the YuanJing provider can use either the Wan or the Kling video model. The examples below show the contents of the MediaClaw config node.

Simplified Configuration (Model Only):

```json
{
  "providers": {
    "yuanjing": { "apiKey": "your-yuanjing-key" }
  },
  "capabilities": {
    "textToVideo": { "videoModel": "kling" },
    "imageToVideo": { "videoModel": "kling" },
    "imagesToVideo": { "videoModel": "kling" }
  }
}
```

Full Configuration (Provider and Model):

```json
{
  "providers": {
    "yuanjing": { "apiKey": "your-yuanjing-key" }
  },
  "capabilities": {
    "textToVideo": {
      "provider": "yuanjing",
      "videoModel": "kling"
    },
    "imageToVideo": {
      "provider": "yuanjing",
      "videoModel": "kling"
    },
    "imagesToVideo": {
      "provider": "yuanjing",
      "videoModel": "kling"
    }
  }
}
```

Video Model Options:

wan - Wan 2.2 Model (Default)
kling - Kling V3 Model (High-quality Video)

Capability Description:

imageToVideo (`mediaclaw_image_to_video`): single image to video (Wan stylization or Kling single-image generation)
imagesToVideo (`mediaclaw_images_to_video`): multiple images or first and last frames to video (Wan multi-image or Kling first-last frame generation)
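The following sketch shows how the model options and capability descriptions combine. `resolveVideoMode` and the mode labels are hypothetical. Two points are assumptions rather than documented behavior: `videoModel` is treated as a YuanJing-only option that is ignored for other providers, and `wan` is used when no model is set.

```typescript
// Hypothetical sketch; mode labels paraphrase the capability descriptions above.
type VideoModel = "wan" | "kling";
type VideoCapability = "textToVideo" | "imageToVideo" | "imagesToVideo";

interface VideoCapabilityConfig { provider?: string; videoModel?: VideoModel; }

const MODES: Record<VideoCapability, Record<VideoModel, string>> = {
  textToVideo: { wan: "Wan text-to-video", kling: "Kling text-to-video" },
  imageToVideo: { wan: "Wan stylization", kling: "Kling single image" },
  imagesToVideo: { wan: "Wan multi-image", kling: "Kling first-last frame" },
};

function resolveVideoMode(
  cap: VideoCapability,
  cfg: VideoCapabilityConfig,
  defaultProvider = "yuanjing",
): string | undefined {
  const provider = cfg.provider ?? defaultProvider;
  if (provider !== "yuanjing") return undefined; // assumed: videoModel applies to YuanJing only
  return MODES[cap][cfg.videoModel ?? "wan"];    // wan is the default model
}

console.log(resolveVideoMode("imagesToVideo", { videoModel: "kling" })); // → Kling first-last frame
console.log(resolveVideoMode("imageToVideo", {}));                       // → Wan stylization
console.log(resolveVideoMode("textToVideo", { provider: "sglang" }));    // → undefined
```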

Configuration Parameter Details

| Parameter | Description |
|---|---|
| `providers.yuanjing.apiKey` | YuanJing API key (required) |
| `providers.yuanjing.baseUrl` | YuanJing API base URL |
| `providers.sglang.baseUrl` | SGLang service base URL |
| `providers.sglang.apiKey` | SGLang API key |
| `providers.sglang.apiPath` | API path prefix (see SGLang Vision Configuration) |
| `capabilities` | Per-capability configuration node; supported keys: `textToImage`, `textToVideo`, `imageToVideo`, `imagesToVideo`, `imageQA`, `textToSpeech`, `digitalAvatar` |
| `capabilities.<name>.provider` | Provider for this capability, overriding `defaultProvider` |
| `capabilities.<name>.videoModel` | Video model under the YuanJing provider: `wan` or `kling` |
| `defaultProvider` | Default provider for capabilities without their own `provider` setting |
| `outputDir` | Output directory for generated files |
| `videoPollInterval` | Polling interval for video generation status, in ms (default: 5000) |
| `videoMaxWaitTime` | Maximum wait time for video generation, in ms (default: 300000, i.e. 5 minutes) |
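`videoPollInterval` and `videoMaxWaitTime` describe a poll-until-done loop. The sketch below assumes that loop shape. `pollVideoTask` and the `checkStatus` callback are hypothetical stand-ins for the provider's task-status query, not MediaClaw functions.

```typescript
// Hypothetical poll loop driven by videoPollInterval / videoMaxWaitTime.
type TaskStatus<T> = { done: false } | { done: true; result: T };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollVideoTask<T>(
  checkStatus: () => Promise<TaskStatus<T>>, // stand-in for a provider status query
  videoPollInterval = 5000,
  videoMaxWaitTime = 300000,
): Promise<T> {
  const deadline = Date.now() + videoMaxWaitTime;
  while (true) {
    const status = await checkStatus();
    if (status.done) return status.result;
    // Give up rather than sleep past the deadline.
    if (Date.now() + videoPollInterval > deadline) {
      throw new Error(`video task not finished within ${videoMaxWaitTime} ms`);
    }
    await sleep(videoPollInterval);
  }
}

// Usage with a fake task that completes on the third status check.
let calls = 0;
pollVideoTask<string>(
  async (): Promise<TaskStatus<string>> => (++calls >= 3 ? { done: true, result: "video.mp4" } : { done: false }),
  10,
  1000,
).then((url) => console.log(url, calls)); // → video.mp4 3
```

With the defaults, a task can be checked roughly 60 times (300000 / 5000) before the loop gives up.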

🔍 SGLang Vision Configuration

In sglang mode, the mediaclaw_image_qa capability calls an OpenAI-compatible vision interface:

Endpoint: POST /chat/completions

Two path formats are supported:

/v1/chat/completions
/openapi/v1/web_control/chat/completions
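For reference, an OpenAI-compatible vision request passes the image as an `image_url` content part next to the text question. This sketch only builds such a body. The model name, the base64 placeholder, and `buildImageQaRequest` are illustrative assumptions. MediaClaw's actual payload may include other fields.

```typescript
// Hypothetical builder for an OpenAI-style vision chat request body.
interface TextPart { type: "text"; text: string; }
interface ImagePart { type: "image_url"; image_url: { url: string }; }

interface ChatCompletionRequest {
  model: string;
  messages: { role: "user"; content: (TextPart | ImagePart)[] }[];
}

function buildImageQaRequest(
  model: string,
  question: string,
  imageBase64: string,
  mime = "image/png",
): ChatCompletionRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          // The image is inlined as a data URL; a plain http(s) URL also fits this field.
          { type: "image_url", image_url: { url: `data:${mime};base64,${imageBase64}` } },
          { type: "text", text: question },
        ],
      },
    ],
  };
}

const body = buildImageQaRequest("your-vision-model", "What product is shown in this image?", "<base64-image-data>");
console.log(JSON.stringify(body, null, 2));
```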

Configuration Suggestions:

If baseUrl already includes /openapi/v1/web_control, set apiPath to an empty string
If baseUrl is only the host address (e.g., http://127.0.0.1:30010), set apiPath to /v1
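These suggestions are consistent with the final URL being `baseUrl`, then `apiPath`, then `/chat/completions`. The sketch below assumes that concatenation order and normalizes slashes. `chatCompletionsUrl` is a hypothetical helper, not the plugin's code.

```typescript
// Hypothetical URL join, assuming baseUrl + apiPath + "/chat/completions".
function chatCompletionsUrl(baseUrl: string, apiPath = ""): string {
  const base = baseUrl.replace(/\/+$/, "");                     // drop trailing slashes
  const path = apiPath.replace(/^\/+/, "").replace(/\/+$/, ""); // trim slashes on both ends
  return path ? `${base}/${path}/chat/completions` : `${base}/chat/completions`;
}

console.log(chatCompletionsUrl("http://127.0.0.1:30010", "/v1"));
// → http://127.0.0.1:30010/v1/chat/completions
console.log(chatCompletionsUrl("http://127.0.0.1:30010/openapi/v1/web_control", ""));
// → http://127.0.0.1:30010/openapi/v1/web_control/chat/completions
```

If both `baseUrl` and `apiPath` carry a prefix, the prefixes stack (for example `.../web_control/v1/chat/completions`). This is why each `baseUrl` style is paired with exactly one `apiPath` value.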

🛠️ Built-in Skills

Long Video Generation

Detailed description: skills/unicom-longvideo/SKILL.md

Demo video: demo_longvide.mp4

Product Poster Generation

Generate product promotional posters with MediaClaw from a product brief, then review each result against a fixed marketing scorecard. The skill supports an iterative generate-review-improve loop for campaign visuals instead of a one-shot image workflow.

Detailed description: mediaclaw-plugin/skills/unicom-product-poster/SKILL.md
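The generate-review-improve loop can be sketched as follows. This is an illustrative sketch, not the skill's implementation. `posterLoop` is hypothetical. `generate` and `review` stand in for image generation and scorecard review, whose real criteria are defined in the SKILL.md. The pass score of 8 and the limit of 3 rounds are assumptions.

```typescript
// Hypothetical generate-review-improve loop (not the skill's actual code).
interface Review { score: number; feedback: string; }
interface PosterResult { image: string; review: Review; rounds: number; }

async function posterLoop(
  brief: string,
  generate: (prompt: string) => Promise<string>, // e.g. wraps a text-to-image call
  review: (image: string) => Promise<Review>,    // e.g. scores against a scorecard
  passScore = 8,                                 // assumed threshold
  maxRounds = 3,                                 // assumed round limit
): Promise<PosterResult> {
  let prompt = brief;
  let best: PosterResult | undefined;
  for (let round = 1; round <= maxRounds; round++) {
    const image = await generate(prompt);
    const result = await review(image);
    // Keep the highest-scoring poster seen so far.
    if (!best || result.score > best.review.score) best = { image, review: result, rounds: round };
    if (result.score >= passScore) break;
    // Fold the reviewer's feedback into the next prompt.
    prompt = `${brief}\nRevise based on review: ${result.feedback}`;
  }
  if (!best) throw new Error("maxRounds must be at least 1");
  return best;
}

// Usage with fake generate/review functions whose scores improve each round.
const scores = [5, 7, 9];
let i = 0;
posterLoop(
  "Summer sale poster for a 5G phone",
  async () => `poster-${i + 1}.png`,
  async () => ({ score: scores[i++], feedback: "Make the price more prominent" }),
).then((r) => console.log(r.image, r.review.score, r.rounds)); // → poster-3.png 9 3
```

Returning the best poster seen, rather than the last one, means a later regression in quality does not discard an earlier, better result.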

Digital Avatar Production

Detailed description: skills/unicom-digital-avatar/SKILL.md

Demo video: demo_avatar.mp4

Video Cut

Detailed description: skills/unicom-video-cut/SKILL.md

🙏 Acknowledgments

MediaClaw would not exist without the support of the open source community. Special thanks to:

OpenClaw: provides the powerful plugin gateway platform and ecosystem support on which MediaClaw runs
OpenClaw-Admin: provides an excellent management interface framework, which we customized and extended with AIGC capabilities
All developers who contribute to open source projects

📄 License

MediaClaw is open sourced under the MIT License. You are free to use, modify and distribute it, but please retain the relevant copyright notice and acknowledgment information.

If this project helps you, please give it a ⭐️ Star to show your support!
