99.6% Token Reduction through Skills-based CLI execution and progressive tool discovery for Model Context Protocol (MCP) servers.
Note: This project is optimized for Claude Code with native Skills framework support. While the core runtime works with any AI agent, the Skills framework (99.6% token reduction) is designed for Claude Code's operational intelligence.
An enhanced implementation of Anthropic's Code Execution with MCP pattern, optimized for Claude Code, combining the best ideas from the MCP community and adding significant improvements:
- Skills Framework: Pattern for creating reusable CLI-based workflows (99.6% token reduction) - Claude Code optimized
- Multi-Transport: Full support for stdio, SSE, and HTTP MCP servers
- Container Sandboxing: Optional rootless isolation with security controls
- Type Safety: Pydantic models throughout with full validation
- Production-Ready: 129 passing tests, comprehensive error handling
The Skills framework is designed to work with Claude Code's operational intelligence:
- Agents discover skills via filesystem (ls ./skills/)
- Skills use CLI arguments (immutable templates)
- Compatible with Claude Code's agent workflow
- Supports Claude Code's progressive disclosure pattern
Note: Core runtime (script writing, 98.7% reduction) works with any AI agent. Skills framework (99.6% reduction) is Claude Code optimized.
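The filesystem-based discovery described above can be sketched in a few lines. This is a minimal illustration, not the project's own tooling: `summarize_skills` is a hypothetical helper, and it assumes each skill file opens with a module docstring whose first line names the skill.

```python
import ast
from pathlib import Path


def summarize_skills(skills_dir: str = "./skills") -> dict[str, str]:
    """Map each skill filename to the first non-empty line of its docstring."""
    root = Path(skills_dir)
    if not root.is_dir():
        return {}
    summaries: dict[str, str] = {}
    for path in sorted(root.glob("*.py")):
        # Parse instead of importing, so no skill code runs during discovery.
        doc = ast.get_docstring(ast.parse(path.read_text(encoding="utf-8"))) or ""
        lines = [line.strip() for line in doc.splitlines() if line.strip()]
        summaries[path.name] = lines[0] if lines else "(no docstring)"
    return summaries


if __name__ == "__main__":
    for name, summary in summarize_skills().items():
        print(f"{name}: {summary}")
```

Reading only the first docstring line keeps the discovery step cheap; the full documentation is loaded only for the skill the agent decides to run.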
This project builds upon and merges ideas from:
- ipdelete/mcp-code-execution - Original implementation of Anthropic's PRIMARY pattern
  - Filesystem-based progressive disclosure
  - Type-safe Pydantic wrappers
  - Schema discovery system
  - Lazy server connections
- elusznik/mcp-server-code-execution-mode - Production security patterns
  - Container sandboxing architecture
  - Comprehensive security controls
  - Production deployment patterns
Our contribution: Merged the best of both, added Skills system with CLI-based execution, implemented multi-transport support, and refined the architecture for maximum efficiency.
A pattern for creating reusable CLI-based workflow templates that agents execute with arguments:
# Simple example (generic)
uv run python -m runtime.harness skills/simple_fetch.py \
--url "https://example.com"
# Pipeline example (generic)
uv run python -m runtime.harness skills/multi_tool_pipeline.py \
--repo-path "." \
--max-commits 5
Benefits over script writing:
- About 18x fewer tokens: ~110 vs ~2,000
- 24x faster: 5 seconds vs 2 minutes
- Immutable templates: No file editing
- Reusable workflows: Same logic, different data
What's included:
- Framework pattern and template (CLI-based, immutable)
- 2 generic examples (simple_fetch.py, multi_tool_pipeline.py)
Full support for all MCP transport types:
{
"mcpServers": {
"local-tool": {
"type": "stdio",
"command": "uvx",
"args": ["mcp-server-git"]
},
"jina": {
"type": "sse",
"url": "https://mcp.jina.ai/sse",
"headers": {"Authorization": "Bearer YOUR_KEY"}
},
"exa": {
"type": "http",
"url": "https://mcp.exa.ai/mcp",
"headers": {"x-api-key": "YOUR_KEY"}
}
}
}
Optional rootless container execution with comprehensive security:
# Sandbox mode with security controls
uv run python -m runtime.harness workspace/script.py --sandbox
Security features:
- Rootless execution (UID 65534:65534)
- Network isolation (--network none)
- Read-only root filesystem
- Memory/CPU/PID limits
- Capability dropping (--cap-drop ALL)
- Timeout enforcement
- Claude Code (recommended for Skills framework support)
- Python 3.11+ (3.14 not recommended due to anyio compatibility)
- uv package manager
- (Optional) Docker or Podman for sandboxing
Note: Skills framework (99.6% reduction) requires Claude Code. Core runtime (98.7% reduction) works with any AI agent.
# Clone repository
git clone https://github.com/yourusername/mcp-code-execution-enhanced.git
cd mcp-code-execution-enhanced
# Install dependencies
uv sync
# Verify installation
uv run python -c "from runtime.mcp_client import get_mcp_client_manager; print('✓ Ready')"
Important for Claude Code Users: This project uses its own mcp_config.json for MCP server configuration, separate from Claude Code's global configuration (~/.claude.json). To avoid conflicts, you may want to disable MCP servers in Claude Code's configuration while using this project, or ensure they don't overlap.
Create mcp_config.json in the project root with your MCP servers:
{
"mcpServers": {
"git": {
"type": "stdio",
"command": "uvx",
"args": ["mcp-server-git", "--repository", "."]
},
"fetch": {
"type": "stdio",
"command": "uvx",
"args": ["mcp-server-fetch"]
}
},
"sandbox": {
"enabled": false,
"runtime": "auto",
"image": "python:3.11-slim"
}
}
# Auto-generate Python wrappers from your MCP servers
uv run mcp-generate
# This creates typed wrappers in ./servers/
For multi-step workflows (research, data processing, synthesis):
- Discover skills: ls ./skills/ → see available skill templates and examples
- Read documentation: cat ./skills/simple_fetch.py → see CLI args and pattern
- Execute with parameters: uv run python -m runtime.harness skills/simple_fetch.py --url "https://example.com"
Example Skills (Framework Demonstrations):
Generic examples (skills/):
- simple_fetch.py - Basic single-tool execution pattern
- multi_tool_pipeline.py - Multi-tool chaining pattern
Note: Skills is a framework - use these examples as templates to create workflows for your specific MCP servers and use cases.
For simple tasks or novel workflows:
- Explore tools: ls ./servers/ → discover available MCP tools
- Write script: Create Python script using tool imports
- Execute: uv run python -m runtime.harness workspace/script.py
Example script:
import asyncio

from runtime.mcp_client import call_mcp_tool


async def main():
    result = await call_mcp_tool(
        "git__git_log",
        {"repo_path": ".", "max_count": 10}
    )
    print(f"Fetched {len(result)} commits")
    return result


if __name__ == "__main__":
    asyncio.run(main())
Traditional Approach (High Token Usage):
Agent → MCP Server → [Full Tool Schemas 27,300 tokens] → Agent
Skills-Based (99.6% Reduction - PREFERRED):
Agent → Discovers skills → Reads skill docs → Executes with CLI args
Skill → Multi-server orchestration → Returns results
Tokens: ~110 (skill discovery + documentation)
Time: ~5 seconds
Script Writing (98.7% Reduction - ALTERNATIVE):
Agent → Discovers tools → Writes script
Script → MCP Server → Returns data
Agent → Processes/summarizes
Tokens: ~2,000 (tool discovery + script writing)
Time: ~2 minutes
- runtime/mcp_client.py: Lazy-loading MCP client manager with multi-transport support
- runtime/harness.py: Dual-mode script execution (direct/sandbox)
- runtime/generate_wrappers.py: Auto-generate typed wrappers from MCP schemas
- runtime/sandbox/: Container sandboxing with security controls
- skills/: CLI-based immutable workflow templates (2 generic examples included)
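To make "lazy-loading" concrete, here is a minimal sketch of the idea: a connection is opened the first time a server is used and reused afterwards. `LazyConnections` and `fake_connect` are hypothetical names for illustration; the real manager in runtime/mcp_client.py may be structured differently.

```python
import asyncio
from typing import Any, Awaitable, Callable


class LazyConnections:
    """Open a connection the first time a server is used, then reuse it."""

    def __init__(self, connect: Callable[[str], Awaitable[Any]]) -> None:
        self._connect = connect              # hypothetical async connector
        self._sessions: dict[str, Any] = {}
        self._lock = asyncio.Lock()

    async def get(self, server: str) -> Any:
        async with self._lock:               # avoid connecting twice under concurrency
            if server not in self._sessions:
                self._sessions[server] = await self._connect(server)
            return self._sessions[server]


async def demo() -> list[str]:
    opened: list[str] = []

    async def fake_connect(server: str) -> str:
        opened.append(server)                # record each real connection attempt
        return f"session:{server}"

    pool = LazyConnections(fake_connect)
    await pool.get("git")
    await pool.get("git")                    # reused, no second connect
    return opened


if __name__ == "__main__":
    print(asyncio.run(demo()))               # prints ['git']
```

Servers that are configured but never called cost nothing at startup, which matters when many servers are listed in the config.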
DON'T: Write scripts from scratch each time
DO: Use pre-written skills with CLI arguments
"""
SKILL: Your Skill Name
DESCRIPTION: What it does
CLI ARGUMENTS:
--query Research query (required)
--limit Max results (default: 10)
USAGE:
uv run python -m runtime.harness skills/your_skill.py \
--query "your question" \
--limit 5
"""
import argparse
import asyncio
import sys


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--query", required=True)
    parser.add_argument("--limit", type=int, default=10)
    # Filter script path from args
    args_to_parse = [arg for arg in sys.argv[1:] if not arg.endswith(".py")]
    return parser.parse_args(args_to_parse)


async def main():
    args = parse_args()
    # Your workflow logic here; replace this placeholder with real results
    result = {"query": args.query, "limit": args.limit}
    return result


if __name__ == "__main__":
    asyncio.run(main())
See skills/README.md for complete documentation.
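The argument filter in the template is easiest to understand with an explicit argument list. The sketch below assumes, as the template's comment suggests, that the harness leaves the script path in the argument vector; `parse_skill_args` is a hypothetical standalone version of the template's `parse_args`.

```python
import argparse


def parse_skill_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--query", required=True)
    parser.add_argument("--limit", type=int, default=10)
    # Same filter as the template: drop anything that looks like a script path.
    return parser.parse_args([arg for arg in argv if not arg.endswith(".py")])


args = parse_skill_args(
    ["skills/your_skill.py", "--query", "your question", "--limit", "5"]
)
print(args.query, args.limit)  # prints: your question 5
```

One consequence of this filter: any argument value ending in ".py" is dropped as well, so a skill that takes filenames as values would need a stricter check, such as removing only the first positional argument.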
{
"type": "stdio",
"command": "uvx",
"args": ["mcp-server-name"],
"env": {"API_KEY": "your-key"}
}
{
"type": "sse",
"url": "https://mcp.example.com/sse",
"headers": {"Authorization": "Bearer YOUR_KEY"}
}
{
"type": "http",
"url": "https://mcp.example.com/mcp",
"headers": {"x-api-key": "YOUR_KEY"}
}
See docs/TRANSPORTS.md for detailed information.
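A quick pre-flight check on server entries can catch a missing key before any connection is attempted. The sketch below is an illustration only: `check_server` is a hypothetical helper, the required keys are read off the three shapes shown above, and treating an entry without "type" as stdio is an assumption.

```python
from typing import Any

# Required keys per transport type, taken from the three shapes shown above.
REQUIRED_KEYS = {"stdio": {"command"}, "sse": {"url"}, "http": {"url"}}


def check_server(name: str, entry: dict[str, Any]) -> list[str]:
    """Return human-readable problems for one mcpServers entry (empty if none)."""
    # Assumption: an entry without "type" is treated as stdio.
    transport = entry.get("type", "stdio")
    if transport not in REQUIRED_KEYS:
        return [f"{name}: unknown transport type {transport!r}"]
    missing = sorted(REQUIRED_KEYS[transport] - set(entry))
    return [f"{name}: {transport} entry is missing {key!r}" for key in missing]


config = {
    "git": {"command": "uvx", "args": ["mcp-server-git"]},
    "remote": {"type": "sse", "headers": {"Authorization": "Bearer YOUR_KEY"}},
}
problems = [p for name, entry in config.items() for p in check_server(name, entry)]
print(problems)  # prints ["remote: sse entry is missing 'url'"]
```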
{
"sandbox": {
"enabled": true,
"runtime": "auto",
"image": "python:3.11-slim",
"memory_limit": "512m",
"timeout": 30
}
}
- Rootless execution: UID 65534:65534 (nobody)
- Network isolation: --network none
- Filesystem: Read-only root, writable tmpfs
- Resource limits: Memory, CPU, PID constraints
- Capabilities: All dropped (--cap-drop ALL)
- Security: no-new-privileges, SELinux labels
See SECURITY.md for complete security documentation.
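The controls listed above map onto standard Docker/Podman flags. The following sketch shows one way they could be assembled into a command line; it is not the project's sandbox implementation. `build_sandbox_command`, the `/workspace/script.py` mount point, and the PID limit of 128 are assumptions made for illustration.

```python
import shlex
from pathlib import Path


def build_sandbox_command(
    script: str,
    runtime: str = "docker",              # or "podman"
    image: str = "python:3.11-slim",
    memory_limit: str = "512m",
    cpu_limit: str = "1.0",
    pids_limit: int = 128,                # assumed value, not from the project
) -> list[str]:
    """Assemble a container command that applies the controls listed above."""
    host_script = Path(script).resolve()  # bind mounts need an absolute path
    return [
        runtime, "run", "--rm",
        "--user", "65534:65534",          # rootless: run as nobody
        "--network", "none",              # no network access
        "--read-only",                    # read-only root filesystem
        "--tmpfs", "/tmp",                # writable scratch space
        "--cap-drop", "ALL",              # drop all capabilities
        "--security-opt", "no-new-privileges",
        "--memory", memory_limit,
        "--cpus", cpu_limit,
        "--pids-limit", str(pids_limit),
        "--volume", f"{host_script}:/workspace/script.py:ro",
        image, "python", "/workspace/script.py",
    ]


if __name__ == "__main__":
    # Print rather than execute, so the sketch runs without a container runtime.
    print(shlex.join(build_sandbox_command("workspace/script.py")))
```

The configured timeout would be enforced by the caller, for example by passing `timeout=30` to `subprocess.run` when executing the command.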
# Run all tests (129 total)
uv run pytest
# Unit tests only
uv run pytest tests/unit/
# Integration tests (requires Docker/Podman for sandbox tests)
uv run pytest tests/integration/
# With coverage
uv run pytest --cov=src/runtime
- README.md (this file) - Overview and quick start
- CLAUDE.md - Quick reference for Claude Code
- AGENTS.md.template - Template for adapting to other AI frameworks
- skills/README.md - Skills system guide
- skills/SKILLS.md - Complete skills documentation
- docs/USAGE.md - Comprehensive user guide
- docs/ARCHITECTURE.md - Technical architecture
- docs/CONFIGURATION.md - MCP server configuration management (Claude Code vs project)
- docs/TRANSPORTS.md - Transport-specific details
- SECURITY.md - Security architecture and best practices
# Type checking
uv run mypy src/
# Formatting
uv run black src/ tests/
# Linting
uv run ruff check src/ tests/
# Generate wrappers from tool definitions
uv run mcp-generate
# (Optional) Generate discovery config with LLM parameter generation
uv run mcp-generate-discovery
# (Optional) Execute safe tools and infer schemas
uv run mcp-discover
# Execute a script with MCP tools available
uv run mcp-exec workspace/script.py
# Execute in sandbox mode
uv run mcp-exec workspace/script.py --sandbox
| Approach | Tokens | Time | Use Case |
|---|---|---|---|
| Traditional | 27,300 | N/A | All tool schemas loaded upfront |
| Skills (NEW) | 110 | 5 sec | Multi-step workflows (PREFERRED) |
| Script Writing | 2,000 | 2 min | Novel workflows (ALTERNATIVE) |
Skills achieve a 99.6% reduction (27,300 → 110 tokens), exceeding Anthropic's 98.7% figure for script writing.
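The percentages and ratios quoted in this README can be checked directly from the table's numbers. The snippet below is a small arithmetic sketch; `reduction` is a hypothetical helper name.

```python
def reduction(baseline: int, used: int) -> float:
    """Percentage of tokens saved relative to the baseline, to one decimal."""
    return round(100 * (1 - used / baseline), 1)


print(reduction(27_300, 110))    # skills: 99.6
print(reduction(27_300, 2_000))  # script writing against the same baseline: 92.7
print(round(2_000 / 110, 1))     # token ratio, scripts vs skills: 18.2
print(round(120 / 5))            # time ratio, 2 min vs 5 sec: 24
```

Against the 27,300-token baseline in the table, 2,000 tokens works out to roughly 92.7%, so the 98.7% figure quoted for script writing presumably rests on a different, larger baseline than the one shown here.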
From ipdelete/mcp-code-execution:
- ✅ Filesystem-based progressive disclosure
- ✅ Type-safe Pydantic wrappers
- ✅ Lazy server connections
- ✅ Schema discovery system
From elusznik/mcp-server-code-execution-mode:
- ✅ Container sandboxing architecture
- ✅ Security controls and policies
- ✅ Production deployment patterns
Enhanced in this project:
- ⭐ Skills system: CLI-based immutable templates (99.6% reduction)
- ⭐ Multi-transport: stdio + SSE + HTTP support (100% server coverage)
- ⭐ Dual-mode execution: Direct (fast) + Sandbox (secure)
- ⭐ Python 3.11 stable: Avoiding 3.14 anyio compatibility issues
- ⭐ Comprehensive testing: 129 tests covering all features
- ⭐ Enhanced documentation: Complete guides for all features
Skills vs Scripts:
- Skills are immutable templates executed with CLI arguments
- No file editing required (parameters via --query, --num-urls, etc.)
- Reusable across different queries and contexts
- Pre-tested and documented workflows
Multi-Transport:
- Single codebase supports all transport types
- Automatic transport detection
- Unified configuration format
- Seamless server connections
Dual-Mode Execution:
- Direct mode: Fast, full access (development)
- Sandbox mode: Secure, isolated (production)
- Same code, different security postures
- Runtime selection via flag or config
{
"mcpServers": {
"git": {
"command": "uvx",
"args": ["mcp-server-git", "--repository", "."]
}
}
}
{
"mcpServers": {
"local-stdio": {
"type": "stdio",
"command": "uvx",
"args": ["mcp-server-name"],
"env": {"API_KEY": "key"},
"disabled": false
},
"remote-sse": {
"type": "sse",
"url": "https://mcp.example.com/sse",
"headers": {"Authorization": "Bearer KEY"},
"disabled": false
},
"remote-http": {
"type": "http",
"url": "https://mcp.example.com/mcp",
"headers": {"x-api-key": "KEY"},
"disabled": false
}
},
"sandbox": {
"enabled": false,
"runtime": "auto",
"image": "python:3.11-slim",
"memory_limit": "512m",
"cpu_limit": "1.0",
"timeout": 30,
"max_timeout": 120
}
}
- 🦥 Lazy Loading: Servers connect only when tools are called
- 🔒 Type Safety: Pydantic models for all tool inputs/outputs
- 🔄 Defensive Coding: Handles variable MCP response structures
- 📦 Auto-generated Wrappers: Typed Python functions from MCP schemas
- 🛠️ Field Normalization: Handles inconsistent API casing
- 🎯 Skills Framework: Pattern for CLI-based reusable workflows
- 🔌 Multi-Transport: stdio, SSE, and HTTP support
- 🔐 Container Sandboxing: Optional rootless isolation
- 🧪 Comprehensive Testing: 129 tests with full coverage
- 📖 Complete Documentation: Guides for every feature
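As an illustration of the field-normalization item in the list above, the sketch below converts camelCase and hyphenated keys to snake_case throughout a nested response. It is one plausible approach under stated assumptions, not the project's implementation; `to_snake` and `normalize_keys` are hypothetical names.

```python
import re
from typing import Any

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake(key: str) -> str:
    """Convert camelCase, PascalCase, or hyphenated keys to snake_case."""
    return _BOUNDARY.sub("_", key).replace("-", "_").lower()


def normalize_keys(value: Any) -> Any:
    """Recursively rename dict keys; lists and scalars are passed through."""
    if isinstance(value, dict):
        return {to_snake(str(k)): normalize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_keys(v) for v in value]
    return value


raw = {"repoPath": ".", "Commits": [{"authorName": "a", "max-count": 1}]}
print(normalize_keys(raw))
# prints {'repo_path': '.', 'commits': [{'author_name': 'a', 'max_count': 1}]}
```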
See the examples/ directory for:
- example_progressive_disclosure.py - Classic token reduction pattern
- example_tool_chaining.py - LLM orchestration pattern
- example_sandbox_usage.py - Container sandboxing demo
- example_sandbox_simple.py - Basic sandbox usage
See the skills/ directory for production-ready workflows.
"MCP server not configured"
- Check that the server names in mcp_config.json match the names used in your calls
"Connection closed"
- Verify server command: which <command>
- Check server logs for startup errors
"Module not found"
- Run uv run mcp-generate to regenerate wrappers
- Ensure src/ is in PYTHONPATH (harness handles this)
Import errors in skills
- Skills must be run via harness (sets PYTHONPATH)
- Don't run skills directly: python skills/skill.py ❌
- Correct: uv run python -m runtime.harness skills/skill.py ✅
Python 3.14 compatibility:
- Not recommended due to anyio <4.9.0 breaking changes
- Use Python 3.11 or 3.12 for stability
- See issue tracker for updates
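For the "MCP server not configured" case above, a small check can show which server name a call refers to. This sketch assumes tool identifiers follow the `<server>__<tool>` shape seen in the `git__git_log` example and that the server part is everything before the first double underscore; `server_for_tool` is a hypothetical helper.

```python
def server_for_tool(tool_id: str, configured: set[str]) -> str:
    """Return the server part of a tool identifier, or raise if it is unknown."""
    # Assumption: the server name is everything before the first "__".
    server, sep, tool = tool_id.partition("__")
    if not sep or not tool:
        raise ValueError(f"expected '<server>__<tool>', got {tool_id!r}")
    if server not in configured:
        known = ", ".join(sorted(configured)) or "(none)"
        raise KeyError(f"server {server!r} is not configured; known servers: {known}")
    return server


# Server names as they would appear under "mcpServers" in mcp_config.json.
configured_servers = {"git", "fetch"}
print(server_for_tool("git__git_log", configured_servers))  # prints git
```

Calling it with an identifier such as "exa__search" against the same set raises a KeyError that lists the configured names, which points straight at the mismatch.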
We welcome contributions! Areas of interest:
- New skills: Add more workflow templates
- MCP server support: Test with different servers
- Documentation: Improve guides and examples
- Testing: Expand test coverage
- Performance: Optimize token usage further
# Install with dev dependencies
uv sync --all-extras
# Run quality checks
uv run black src/ tests/
uv run mypy src/
uv run ruff check src/ tests/
uv run pytest
MIT License - see LICENSE file for details
- ipdelete/mcp-code-execution - Anthropic's PRIMARY pattern
- elusznik/mcp-server-code-execution-mode - Production security
| Feature | Original (ipdelete) | Bridge (elusznik) | Enhanced (this) |
|---|---|---|---|
| Progressive Disclosure | ✅ PRIMARY | ✅ PRIMARY | |
| Token Reduction | 98.7% | ~95% | 99.6% |
| Type Safety | ✅ Pydantic | ✅ Enhanced | |
| Sandboxing | ❌ None | ✅ Required | ✅ Optional |
| Multi-Transport | ❌ stdio only | ❌ stdio only | ✅ stdio/SSE/HTTP |
| Skills Framework | ❌ None | ❌ None | ✅ Yes + examples |
| CLI Execution | ❌ None | ❌ None | ✅ Immutable |
| Test Coverage | ✅ Comprehensive | | |
| Python 3.11 | ✅ Yes | ✅ Stable | |
- ✅ AI agents needing to orchestrate multiple MCP tools
- ✅ Research workflows (web search → read → synthesize)
- ✅ Data processing pipelines (fetch → transform → output)
- ✅ Code discovery (search → analyze → recommend)
- ✅ Production deployments requiring security isolation
- ✅ Teams needing reproducible research workflows
- ❌ Single tool calls (use MCP directly instead)
- ❌ Real-time interactive tools (better suited for direct integration)
- ❌ GUI applications (command-line focused)
- Install Python 3.11+ and uv
- Clone repository
- Run uv sync
- Create mcp_config.json with your MCP servers
- Run uv run mcp-generate to create wrappers
- Try a skill: uv run python -m runtime.harness skills/simple_fetch.py --url "https://example.com"
- Read AGENTS.md for operational guide
- Explore skills/ for available workflows
- Review docs/ for detailed documentation
Q: Why Skills instead of writing scripts?
A: Skills achieve 99.6% token reduction vs 98.7% for scripts, and execute 24x faster (5 sec vs 2 min). They're pre-tested, documented, and immutable.
Q: Can I use this without Claude Code?
A: Yes, but with limitations. The core runtime (script writing, 98.7% reduction) works with any AI agent. The Skills framework (99.6% reduction) is optimized for Claude Code's operational intelligence.
Q: Can I still write custom scripts?
A: Yes! Skills are PREFERRED for common workflows (with Claude Code), but custom scripts are fully supported for novel use cases and other AI agents.
Q: What's the difference from the original projects?
A: We merged the best of both (progressive disclosure + security), added Skills system, multi-transport support, and refined the architecture.
Q: Why Python 3.11 instead of 3.14?
A: anyio <4.9.0 has compatibility issues with Python 3.14's asyncio changes. 3.11 is stable and well-tested.
Q: Is sandboxing required?
A: No, it's optional. Use direct mode for development (fast), sandbox mode for production (secure).
Q: How do I add my own MCP servers?
A: Add them to mcp_config.json, run uv run mcp-generate, and they're ready to use!
- Explore Skills: ls skills/ and cat skills/simple_fetch.py
- Try examples: Run the example skills or create your own
- Read CLAUDE.md: Quick operational guide (for Claude Code users)
- Review docs/: Deep dive into architecture
- Create custom skill: Follow the template for your use case