Enhanced MCP code execution. Agent framework-agnostic (optimized for Claude Code). Skills framework (99.6% token reduction), multi-transport, sandboxing

yoloshii/mcp-code-execution-enhanced

MCP Code Execution - Enhanced Edition

99.6% Token Reduction through Skills-based CLI execution and progressive tool discovery for Model Context Protocol (MCP) servers.

License: MIT | Python 3.11+ | Claude Code | Skills Framework

Note: This project is optimized for Claude Code with native Skills framework support. While the core runtime works with any AI agent, the Skills framework (99.6% token reduction) is designed for Claude Code's operational intelligence.


🎯 What This Is

An enhanced implementation of Anthropic's Code Execution with MCP pattern, optimized for Claude Code, combining the best ideas from the MCP community and adding significant improvements:

  • Skills Framework: Pattern for creating reusable CLI-based workflows (99.6% token reduction) - Claude Code optimized
  • Multi-Transport: Full support for stdio, SSE, and HTTP MCP servers
  • Container Sandboxing: Optional rootless isolation with security controls
  • Type Safety: Pydantic models throughout with full validation
  • Production-Ready: 129 passing tests, comprehensive error handling

🤖 Claude Code Integration

The Skills framework is designed to work with Claude Code's operational intelligence:

  • Agents discover skills via filesystem (ls ./skills/)
  • Skills use CLI arguments (immutable templates)
  • Compatible with Claude Code's agent workflow
  • Supports Claude Code's progressive disclosure pattern

Note: Core runtime (script writing, 98.7% reduction) works with any AI agent. Skills framework (99.6% reduction) is Claude Code optimized.


🙏 Acknowledgments

This project builds upon and merges ideas from:

  1. ipdelete/mcp-code-execution - Original implementation of Anthropic's PRIMARY pattern

    • Filesystem-based progressive disclosure
    • Type-safe Pydantic wrappers
    • Schema discovery system
    • Lazy server connections
  2. elusznik/mcp-server-code-execution-mode - Production security patterns

    • Container sandboxing architecture
    • Comprehensive security controls
    • Production deployment patterns

Our contribution: Merged the best of both, added Skills system with CLI-based execution, implemented multi-transport support, and refined the architecture for maximum efficiency.


✨ Key Enhancements

1. Skills Framework (NEW - 99.6% Token Reduction)

A pattern for creating reusable CLI-based workflow templates that agents execute with arguments:

# Simple example (generic)
uv run python -m runtime.harness skills/simple_fetch.py \
    --url "https://example.com"

# Pipeline example (generic)
uv run python -m runtime.harness skills/multi_tool_pipeline.py \
    --repo-path "." \
    --max-commits 5

Benefits over script writing:

  • 18x fewer tokens: 110 vs 2,000
  • 24x faster: 5 seconds vs 2 minutes
  • Immutable templates: No file editing
  • Reusable workflows: Same logic, different data

What's included:

  • Framework pattern and template (CLI-based, immutable)
  • 2 generic examples (simple_fetch.py, multi_tool_pipeline.py)

2. Multi-Transport Support (NEW)

Full support for all MCP transport types:

{
  "mcpServers": {
    "local-tool": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcp-server-git"]
    },
    "jina": {
      "type": "sse",
      "url": "https://mcp.jina.ai/sse",
      "headers": {"Authorization": "Bearer YOUR_KEY"}
    },
    "exa": {
      "type": "http",
      "url": "https://mcp.exa.ai/mcp",
      "headers": {"x-api-key": "YOUR_KEY"}
    }
  }
}

3. Container Sandboxing (Enhanced)

Optional rootless container execution with comprehensive security:

# Sandbox mode with security controls
uv run python -m runtime.harness workspace/script.py --sandbox

Security features:

  • Rootless execution (UID 65534:65534)
  • Network isolation (--network none)
  • Read-only root filesystem
  • Memory/CPU/PID limits
  • Capability dropping (--cap-drop ALL)
  • Timeout enforcement

🚀 Quick Start

Prerequisites

  • Claude Code (recommended for Skills framework support)
  • Python 3.11+ (3.14 not recommended due to anyio compatibility)
  • uv package manager
  • (Optional) Docker or Podman for sandboxing

Note: Skills framework (99.6% reduction) requires Claude Code. Core runtime (98.7% reduction) works with any AI agent.

Installation

# Clone repository
git clone https://github.com/yoloshii/mcp-code-execution-enhanced.git
cd mcp-code-execution-enhanced

# Install dependencies
uv sync

# Verify installation
uv run python -c "from runtime.mcp_client import get_mcp_client_manager; print('✓ Ready')"

Configuration

Important for Claude Code Users: This project uses its own mcp_config.json for MCP server configuration, separate from Claude Code's global configuration (~/.claude.json). To avoid conflicts, you may want to disable MCP servers in Claude Code's configuration while using this project, or ensure they don't overlap.

Create mcp_config.json in the project root with your MCP servers:

{
  "mcpServers": {
    "git": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcp-server-git", "--repository", "."]
    },
    "fetch": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  },
  "sandbox": {
    "enabled": false,
    "runtime": "auto",
    "image": "python:3.11-slim"
  }
}

Generate Tool Wrappers

# Auto-generate Python wrappers from your MCP servers
uv run mcp-generate

# This creates typed wrappers in ./servers/

📖 How It Works

PREFERRED: Skills-Based Execution (99.6% reduction)

For multi-step workflows (research, data processing, synthesis):

  1. Discover skills: ls ./skills/ → see available skill templates and examples
  2. Read documentation: cat ./skills/simple_fetch.py → see CLI args and pattern
  3. Execute with parameters:
    uv run python -m runtime.harness skills/simple_fetch.py \
        --url "https://example.com"
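
The discovery steps above amount to listing a directory and reading each file's leading docstring. The following is a minimal sketch of that idea, assuming skills keep their documentation in the module docstring (as the template later in this README does). `list_skills` is a hypothetical helper for illustration, not part of the project.

```python
import ast
from pathlib import Path


def list_skills(skills_dir: str) -> dict[str, str]:
    """Map each skill file name to the first line of its module docstring."""
    summaries: dict[str, str] = {}
    for path in sorted(Path(skills_dir).glob("*.py")):
        # Parse without importing, so discovery never executes skill code.
        module = ast.parse(path.read_text(encoding="utf-8"))
        doc = ast.get_docstring(module) or ""
        summaries[path.name] = doc.splitlines()[0] if doc else "(no docstring)"
    return summaries


if __name__ == "__main__":
    for name, summary in list_skills("skills").items():
        print(f"{name}: {summary}")
```

Reading the docstring through `ast` rather than importing the module keeps discovery cheap and free of side effects, which matches the idea of treating skills as immutable templates.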

Example Skills (Framework Demonstrations):

Generic examples (skills/):

  • simple_fetch.py - Basic single-tool execution pattern
  • multi_tool_pipeline.py - Multi-tool chaining pattern

Note: Skills is a framework - use these examples as templates to create workflows for your specific MCP servers and use cases.

ALTERNATIVE: Direct Script Writing (98.7% reduction)

For simple tasks or novel workflows:

  1. Explore tools: ls ./servers/ → discover available MCP tools
  2. Write script: Create Python script using tool imports
  3. Execute: uv run python -m runtime.harness workspace/script.py

Example script:

import asyncio
from runtime.mcp_client import call_mcp_tool

async def main():
    result = await call_mcp_tool(
        "git__git_log",
        {"repo_path": ".", "max_count": 10}
    )
    print(f"Fetched {len(result)} commits")
    return result

if __name__ == "__main__":
    asyncio.run(main())

🏗️ Architecture

Progressive Disclosure Pattern

Traditional Approach (High Token Usage):

Agent → MCP Server → [Full Tool Schemas 27,300 tokens] → Agent

Skills-Based (99.6% Reduction - PREFERRED):

Agent → Discovers skills → Reads skill docs → Executes with CLI args
Skill → Multi-server orchestration → Returns results
Tokens: ~110 (skill discovery + documentation)
Time: ~5 seconds

Script Writing (98.7% Reduction - ALTERNATIVE):

Agent → Discovers tools → Writes script
Script → MCP Server → Returns data
Agent → Processes/summarizes
Tokens: ~2,000 (tool discovery + script writing)
Time: ~2 minutes

Key Components

  • runtime/mcp_client.py: Lazy-loading MCP client manager with multi-transport support
  • runtime/harness.py: Dual-mode script execution (direct/sandbox)
  • runtime/generate_wrappers.py: Auto-generate typed wrappers from MCP schemas
  • runtime/sandbox/: Container sandboxing with security controls
  • skills/: CLI-based immutable workflow templates (2 generic examples included)
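
To illustrate the lazy-loading idea behind the client manager, here is a minimal sketch that connects to a server only when one of its tools is first called. It is not the project's implementation: `LazyClientManager` is hypothetical, the connection is faked, and the `server__tool` identifier format is an assumption inferred from the `git__git_log` example above.

```python
import asyncio


class LazyClientManager:
    """Connects to a server the first time one of its tools is called."""

    def __init__(self, server_names: list[str]) -> None:
        self._known = set(server_names)
        self._connected: dict[str, str] = {}  # server name -> fake session

    async def _connect(self, server: str) -> str:
        # Stand-in for opening a real stdio/SSE/HTTP session.
        await asyncio.sleep(0)
        return f"session:{server}"

    async def call_tool(self, identifier: str, params: dict) -> dict:
        # Assumed identifier format: server name, "__", tool name.
        server, sep, tool = identifier.partition("__")
        if not sep or server not in self._known:
            raise ValueError(f"MCP server not configured: {identifier!r}")
        if server not in self._connected:
            self._connected[server] = await self._connect(server)
        # A real manager would forward the call; this sketch just echoes it.
        return {"server": server, "tool": tool, "params": params}

    @property
    def connected(self) -> list[str]:
        return sorted(self._connected)


async def demo() -> list[str]:
    manager = LazyClientManager(["git", "fetch"])
    await manager.call_tool("git__git_log", {"repo_path": ".", "max_count": 10})
    return manager.connected  # "fetch" was configured but never connected


if __name__ == "__main__":
    print(asyncio.run(demo()))
```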

🎓 Skills System

Philosophy

DON'T: Write scripts from scratch each time

DO: Use pre-written skills with CLI arguments

Creating Custom Skills

"""
SKILL: Your Skill Name

DESCRIPTION: What it does

CLI ARGUMENTS:
    --query    Research query (required)
    --limit    Max results (default: 10)

USAGE:
    uv run python -m runtime.harness skills/your_skill.py \
        --query "your question" \
        --limit 5
"""

import argparse
import asyncio
import sys

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--query", required=True)
    parser.add_argument("--limit", type=int, default=10)

    # Filter script path from args
    args_to_parse = [arg for arg in sys.argv[1:] if not arg.endswith(".py")]
    return parser.parse_args(args_to_parse)

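async def main():
    args = parse_args()
    # Your workflow logic here; this placeholder just echoes the parsed arguments
    result = {"query": args.query, "limit": args.limit}
    return result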

if __name__ == "__main__":
    asyncio.run(main())

See skills/README.md for complete documentation.
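
The argument filter in the template is easier to follow with explicit input. The sketch below assumes, as the template's comment suggests, that the harness leaves the skill's own path in the argument list. `parse_skill_args` is a hypothetical stand-alone version of the template's `parse_args` that takes the argument list as a parameter so it can be exercised directly.

```python
import argparse


def parse_skill_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Example skill")
    parser.add_argument("--query", required=True)
    parser.add_argument("--limit", type=int, default=10)
    # Drop the script path that the harness is assumed to leave in the list.
    return parser.parse_args([arg for arg in argv if not arg.endswith(".py")])


if __name__ == "__main__":
    # Simulates: uv run python -m runtime.harness skills/your_skill.py --query ... --limit 5
    args = parse_skill_args(
        ["skills/your_skill.py", "--query", "mcp transports", "--limit", "5"]
    )
    print(args.query, args.limit)
```

One consequence of filtering on the `.py` suffix is that any argument value ending in `.py` is dropped as well, so a skill that needs to accept file names like that would need a narrower filter.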


🔌 Multi-Transport Support

stdio (Subprocess-based)

{
  "type": "stdio",
  "command": "uvx",
  "args": ["mcp-server-name"],
  "env": {"API_KEY": "your-key"}
}

SSE (Server-Sent Events)

{
  "type": "sse",
  "url": "https://mcp.example.com/sse",
  "headers": {"Authorization": "Bearer YOUR_KEY"}
}

HTTP (Streamable HTTP)

{
  "type": "http",
  "url": "https://mcp.example.com/mcp",
  "headers": {"x-api-key": "YOUR_KEY"}
}

See docs/TRANSPORTS.md for detailed information.


🔐 Sandbox Mode

Configuration

{
  "sandbox": {
    "enabled": true,
    "runtime": "auto",
    "image": "python:3.11-slim",
    "memory_limit": "512m",
    "timeout": 30
  }
}

Security Controls

  • Rootless execution: UID 65534:65534 (nobody)
  • Network isolation: --network none
  • Filesystem: Read-only root, writable tmpfs
  • Resource limits: Memory, CPU, PID constraints
  • Capabilities: All dropped (--cap-drop ALL)
  • Security: no-new-privileges, SELinux labels

See SECURITY.md for complete security documentation.
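
As a rough illustration of how the controls listed above map onto container runtime flags, the sketch below builds a `docker run` / `podman run` argument list. It is not the project's sandbox code: `build_sandbox_command`, the `/workspace/script.py` mount point, and the PID limit of 128 are assumptions. The flags themselves are standard Docker and Podman options.

```python
from pathlib import Path


def build_sandbox_command(
    script: str,
    runtime: str = "docker",
    image: str = "python:3.11-slim",
    memory_limit: str = "512m",
    cpu_limit: str = "1.0",
    pids_limit: int = 128,  # assumed value; the README does not state one
) -> list[str]:
    """Build (but do not run) a locked-down container command."""
    host_path = Path(script).resolve()
    return [
        runtime, "run", "--rm",
        "--user", "65534:65534",                # rootless: nobody:nogroup
        "--network", "none",                    # network isolation
        "--read-only",                          # read-only root filesystem
        "--tmpfs", "/tmp",                      # writable scratch space
        "--memory", memory_limit,
        "--cpus", cpu_limit,
        "--pids-limit", str(pids_limit),
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--volume", f"{host_path}:/workspace/script.py:ro",
        image, "python", "/workspace/script.py",
    ]


if __name__ == "__main__":
    print(" ".join(build_sandbox_command("workspace/script.py")))
```

Timeout enforcement is left to the caller here, for example by passing `timeout=30` to `subprocess.run` when launching the command.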


🧪 Testing

# Run all tests (129 total)
uv run pytest

# Unit tests only
uv run pytest tests/unit/

# Integration tests (requires Docker/Podman for sandbox tests)
uv run pytest tests/integration/

# With coverage
uv run pytest --cov=src/runtime

📚 Documentation

  • README.md (this file) - Overview and quick start
  • CLAUDE.md - Quick reference for Claude Code
  • AGENTS.md.template - Template for adapting to other AI frameworks
  • skills/README.md - Skills system guide
  • skills/SKILLS.md - Complete skills documentation
  • docs/USAGE.md - Comprehensive user guide
  • docs/ARCHITECTURE.md - Technical architecture
  • docs/CONFIGURATION.md - MCP server configuration management (Claude Code vs project)
  • docs/TRANSPORTS.md - Transport-specific details
  • SECURITY.md - Security architecture and best practices

🛠️ Development

Code Quality

# Type checking
uv run mypy src/

# Formatting
uv run black src/ tests/

# Linting
uv run ruff check src/ tests/

Project Scripts

# Generate wrappers from tool definitions
uv run mcp-generate

# (Optional) Generate discovery config with LLM parameter generation
uv run mcp-generate-discovery

# (Optional) Execute safe tools and infer schemas
uv run mcp-discover

# Execute a script with MCP tools available
uv run mcp-exec workspace/script.py

# Execute in sandbox mode
uv run mcp-exec workspace/script.py --sandbox

📊 Efficiency Comparison

| Approach | Tokens | Time | Use Case |
|---|---|---|---|
| Traditional | 27,300 | N/A | All tool schemas loaded upfront |
| Skills (NEW) | 110 | 5 sec | Multi-step workflows (PREFERRED) |
| Script Writing | 2,000 | 2 min | Novel workflows (ALTERNATIVE) |

Skills achieve 99.6% reduction - exceeding Anthropic's 98.7% target!
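
The percentages can be checked directly from the token counts in the table. The short calculation below uses only those numbers. Note that, measured against the 27,300-token baseline, the script-writing row works out to roughly 92.7%, so the 98.7% figure quoted for scripts presumably rests on a different, larger baseline (a 2,000-token result reaches 98.7% against roughly 150,000 tokens).

```python
def reduction_percent(baseline_tokens: int, tokens: int) -> float:
    """Percentage of tokens saved relative to a baseline, to one decimal."""
    return round(100 * (1 - tokens / baseline_tokens), 1)


if __name__ == "__main__":
    print(reduction_percent(27_300, 110))     # skills vs. table baseline: 99.6
    print(reduction_percent(27_300, 2_000))   # scripts vs. table baseline: 92.7
    print(reduction_percent(150_000, 2_000))  # scripts vs. a 150,000 baseline: 98.7
    print(round(2_000 / 110, 1))              # skills vs. scripts token ratio: 18.2
    print(120 / 5)                            # 2 minutes vs. 5 seconds: 24.0
```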


🎨 What Makes This Enhanced

Beyond Original Projects

From ipdelete/mcp-code-execution:

  • ✅ Filesystem-based progressive disclosure
  • ✅ Type-safe Pydantic wrappers
  • ✅ Lazy server connections
  • ✅ Schema discovery system

From elusznik/mcp-server-code-execution-mode:

  • ✅ Container sandboxing architecture
  • ✅ Security controls and policies
  • ✅ Production deployment patterns

Enhanced in this project:

  • Skills system: CLI-based immutable templates (99.6% reduction)
  • Multi-transport: stdio + SSE + HTTP support (all three MCP transport types)
  • Dual-mode execution: Direct (fast) + Sandbox (secure)
  • Python 3.11 stable: Avoiding 3.14 anyio compatibility issues
  • Comprehensive testing: 129 tests covering all features
  • Enhanced documentation: Complete guides for all features

Architecture Innovations

Skills vs Scripts:

  • Skills are immutable templates executed with CLI arguments
  • No file editing required (parameters via --query, --num-urls, etc.)
  • Reusable across different queries and contexts
  • Pre-tested and documented workflows

Multi-Transport:

  • Single codebase supports all transport types
  • Automatic transport detection
  • Unified configuration format
  • Seamless server connections

Dual-Mode Execution:

  • Direct mode: Fast, full access (development)
  • Sandbox mode: Secure, isolated (production)
  • Same code, different security postures
  • Runtime selection via flag or config

🔧 Configuration Reference

Minimal Configuration

{
  "mcpServers": {
    "git": {
      "command": "uvx",
      "args": ["mcp-server-git", "--repository", "."]
    }
  }
}

Complete Configuration

{
  "mcpServers": {
    "local-stdio": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcp-server-name"],
      "env": {"API_KEY": "key"},
      "disabled": false
    },
    "remote-sse": {
      "type": "sse",
      "url": "https://mcp.example.com/sse",
      "headers": {"Authorization": "Bearer KEY"},
      "disabled": false
    },
    "remote-http": {
      "type": "http",
      "url": "https://mcp.example.com/mcp",
      "headers": {"x-api-key": "KEY"},
      "disabled": false
    }
  },
  "sandbox": {
    "enabled": false,
    "runtime": "auto",
    "image": "python:3.11-slim",
    "memory_limit": "512m",
    "cpu_limit": "1.0",
    "timeout": 30,
    "max_timeout": 120
  }
}
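
The configuration format above can be read with a few lines of validation. The sketch below is illustrative only: the project describes Pydantic models, whereas this uses standard-library dataclasses to stay self-contained. `ServerConfig` and `load_servers` are hypothetical names, and treating an entry without `"type"` as stdio is an assumption based on the minimal configuration.

```python
import json
from dataclasses import dataclass, field

# Which key each transport type cannot work without.
REQUIRED_KEY = {"stdio": "command", "sse": "url", "http": "url"}


@dataclass
class ServerConfig:
    name: str
    type: str
    settings: dict = field(default_factory=dict)


def load_servers(raw: str) -> list[ServerConfig]:
    """Parse mcp_config.json text, skipping disabled servers."""
    servers = []
    for name, entry in json.loads(raw).get("mcpServers", {}).items():
        if entry.get("disabled", False):
            continue
        # Assumption: entries without "type" (as in the minimal config) are stdio.
        kind = entry.get("type", "stdio")
        if kind not in REQUIRED_KEY:
            raise ValueError(f"{name}: unknown transport type {kind!r}")
        if REQUIRED_KEY[kind] not in entry:
            raise ValueError(f"{name}: {kind} servers need {REQUIRED_KEY[kind]!r}")
        servers.append(ServerConfig(name, kind, entry))
    return servers


if __name__ == "__main__":
    sample = '{"mcpServers": {"git": {"command": "uvx", "args": ["mcp-server-git"]}}}'
    print(load_servers(sample))
```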

📦 Features

Core Features

  • 🦥 Lazy Loading: Servers connect only when tools are called
  • 🔒 Type Safety: Pydantic models for all tool inputs/outputs
  • 🔄 Defensive Coding: Handles variable MCP response structures
  • 📦 Auto-generated Wrappers: Typed Python functions from MCP schemas
  • 🛠️ Field Normalization: Handles inconsistent API casing
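
Defensive coding and field normalization can be sketched in a few helper functions. These are hypothetical illustrations rather than the project's own utilities: the wrapper keys checked in `as_list` (`items`, `results`, `data`) are assumptions about how servers might nest list results, and keys are assumed to be strings.

```python
import re
from typing import Any


def to_snake_case(name: str) -> str:
    """Convert camelCase keys to snake_case."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def normalize_fields(value: Any) -> Any:
    """Recursively rename dict keys so callers see one casing style."""
    if isinstance(value, dict):
        return {to_snake_case(k): normalize_fields(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_fields(v) for v in value]
    return value


def as_list(result: Any) -> list:
    """Coerce a tool result into a list, whatever shape the server returned."""
    if result is None:
        return []
    if isinstance(result, list):
        return result
    if isinstance(result, dict):
        for key in ("items", "results", "data"):  # assumed wrapper keys
            if isinstance(result.get(key), list):
                return result[key]
    return [result]


if __name__ == "__main__":
    raw = {"results": [{"commitHash": "abc123", "authorName": "dev"}]}
    print(normalize_fields(as_list(raw)))
```

With helpers like these, a call such as `len(result)` in a script counts items instead of dictionary keys or characters when a server wraps or flattens its response.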

Enhanced Features

  • 🎯 Skills Framework: Pattern for CLI-based reusable workflows
  • 🔌 Multi-Transport: stdio, SSE, and HTTP support
  • 🔐 Container Sandboxing: Optional rootless isolation
  • 🧪 Comprehensive Testing: 129 tests with full coverage
  • 📖 Complete Documentation: Guides for every feature

🎓 Examples

See the examples/ directory for:

  • example_progressive_disclosure.py - Classic token reduction pattern
  • example_tool_chaining.py - LLM orchestration pattern
  • example_sandbox_usage.py - Container sandboxing demo
  • example_sandbox_simple.py - Basic sandbox usage

See the skills/ directory for production-ready workflows.


🐛 Troubleshooting

Common Issues

"MCP server not configured"

  • Check mcp_config.json server names match your calls

"Connection closed"

  • Verify server command: which <command>
  • Check server logs for startup errors

"Module not found"

  • Run uv run mcp-generate to regenerate wrappers
  • Ensure src/ is in PYTHONPATH (harness handles this)

Import errors in skills

  • Skills must be run via harness (sets PYTHONPATH)
  • Don't run skills directly: python skills/skill.py
  • Correct: uv run python -m runtime.harness skills/skill.py
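
To show why running a skill through a harness avoids these import errors, here is a minimal sketch of what such a harness could do: make `src/` importable, then run the script as `__main__`. This is an assumption-laden illustration, not the code in runtime/harness.py. `run_script` is hypothetical and adjusts `sys.path` in-process as a stand-in for setting PYTHONPATH.

```python
import runpy
import sys
from pathlib import Path


def run_script(script: str, extra_args: list[str], src_dir: str = "src") -> dict:
    """Run a script as __main__ with src_dir importable, like a tiny harness."""
    src = str(Path(src_dir).resolve())
    if src not in sys.path:
        sys.path.insert(0, src)  # in-process equivalent of extending PYTHONPATH
    saved_argv = sys.argv
    # The script path stays in argv, which is why skills filter out ".py" args.
    sys.argv = ["harness", script, *extra_args]
    try:
        # run_path returns the script's resulting module globals.
        return runpy.run_path(script, run_name="__main__")
    finally:
        sys.argv = saved_argv


if __name__ == "__main__":
    if len(sys.argv) > 1:
        run_script(sys.argv[1], sys.argv[2:])
```

Running `python skills/skill.py` directly skips the path setup, so imports such as `from runtime.mcp_client import call_mcp_tool` would fail unless the environment already exposes `src/`.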

Python Version Issues

Python 3.14 compatibility:

  • Not recommended: anyio versions below 4.9.0 have compatibility issues with Python 3.14
  • Use Python 3.11 or 3.12 for stability
  • See issue tracker for updates
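
A startup guard can make the version guidance above explicit. The following is a small sketch, assuming the thresholds stated in this README (3.11 minimum, 3.14 not recommended). `check_python` is a hypothetical helper and not something the project is known to ship.

```python
import sys


def check_python(version: tuple[int, int] = sys.version_info[:2]) -> str:
    """Classify an interpreter version against the README's guidance."""
    if version < (3, 11):
        return "unsupported"
    if version >= (3, 14):
        return "not recommended"
    return "ok"


if __name__ == "__main__":
    status = check_python()
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```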

🤝 Contributing

We welcome contributions! Areas of interest:

  • New skills: Add more workflow templates
  • MCP server support: Test with different servers
  • Documentation: Improve guides and examples
  • Testing: Expand test coverage
  • Performance: Optimize token usage further

Development Setup

# Install with dev dependencies
uv sync --all-extras

# Run quality checks
uv run black src/ tests/
uv run mypy src/
uv run ruff check src/ tests/
uv run pytest

📄 License

MIT License - see LICENSE file for details


🌟 Features Comparison

| Feature | Original (ipdelete) | Bridge (elusznik) | Enhanced (this) |
|---|---|---|---|
| Progressive Disclosure | ✅ PRIMARY | ⚠️ ALTERNATIVE | ✅ PRIMARY |
| Token Reduction | 98.7% | ~95% | 99.6% |
| Type Safety | ✅ Pydantic | ⚠️ Basic | ✅ Enhanced |
| Sandboxing | ❌ None | ✅ Required | ✅ Optional |
| Multi-Transport | ❌ stdio only | ❌ stdio only | ✅ stdio/SSE/HTTP |
| Skills Framework | ❌ None | ❌ None | ✅ Yes + examples |
| CLI Execution | ❌ None | ❌ None | ✅ Immutable |
| Test Coverage | ⚠️ Partial | ⚠️ Partial | ✅ Comprehensive |
| Python 3.11 | ✅ Yes | ⚠️ 3.12+ | ✅ Stable |

💡 Use Cases

Perfect For

  • ✅ AI agents needing to orchestrate multiple MCP tools
  • ✅ Research workflows (web search → read → synthesize)
  • ✅ Data processing pipelines (fetch → transform → output)
  • ✅ Code discovery (search → analyze → recommend)
  • ✅ Production deployments requiring security isolation
  • ✅ Teams needing reproducible research workflows

Not Ideal For

  • ❌ Single tool calls (use MCP directly instead)
  • ❌ Real-time interactive tools (better suited for direct integration)
  • ❌ GUI applications (command-line focused)

🚦 Getting Started Checklist

  • Install Python 3.11+ and uv
  • Clone repository
  • Run uv sync
  • Create mcp_config.json with your MCP servers
  • Run uv run mcp-generate to create wrappers
  • Try a skill: uv run python -m runtime.harness skills/simple_fetch.py --url "https://example.com"
  • Read CLAUDE.md for the operational guide (or adapt AGENTS.md.template for other agents)
  • Explore skills/ for available workflows
  • Review docs/ for detailed documentation

❓ FAQ

Q: Why Skills instead of writing scripts?
A: Skills achieve 99.6% token reduction vs 98.7% for scripts, and execute 24x faster (5 sec vs 2 min). They're pre-tested, documented, and immutable.

Q: Can I use this without Claude Code?
A: Yes, but with limitations. The core runtime (script writing, 98.7% reduction) works with any AI agent. The Skills framework (99.6% reduction) is optimized for Claude Code's operational intelligence.

Q: Can I still write custom scripts?
A: Yes! Skills are PREFERRED for common workflows (with Claude Code), but custom scripts are fully supported for novel use cases and other AI agents.

Q: What's the difference from the original projects?
A: We merged the best of both (progressive disclosure + security), added Skills system, multi-transport support, and refined the architecture.

Q: Why Python 3.11 instead of 3.14?
A: anyio <4.9.0 has compatibility issues with Python 3.14's asyncio changes. 3.11 is stable and well-tested.

Q: Is sandboxing required?
A: No, it's optional. Use direct mode for development (fast), sandbox mode for production (secure).

Q: How do I add my own MCP servers?
A: Add them to mcp_config.json, run uv run mcp-generate, and they're ready to use!


🎯 Next Steps

  1. Explore Skills: ls skills/ and cat skills/simple_fetch.py
  2. Try examples: Run the example skills or create your own
  3. Read CLAUDE.md: Quick operational guide (for Claude Code users)
  4. Review docs/: Deep dive into architecture
  5. Create custom skill: Follow the template for your use case
