by elusznik
An MCP server that executes Python code in isolated rootless containers, with optional proxying of other MCP servers. An implementation of Anthropic's and Cloudflare's ideas for reducing the context bloat of MCP tool definitions.
# Add to your Claude Code skills
```
git clone https://github.com/elusznik/mcp-server-code-execution-mode
```

Stop paying 30,000 tokens per query. This bridge implements Anthropic's discovery pattern with rootless security, reducing MCP context from 30K to roughly 200 tokens while proxying any stdio server.
This bridge implements the "Code Execution with MCP" pattern, a convergence of ideas from industry leaders.
Instead of exposing hundreds of individual tools to the LLM (which consumes massive context and confuses the model), this bridge exposes one tool: run_python. The LLM writes Python code to discover, call, and compose other tools.
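For illustration, here is roughly what that single tool call looks like on the wire. `tools/call` is the standard MCP JSON-RPC method and `run_python` is the tool name from above; the `code` argument key is an assumption about this bridge's schema:

```python
# Hypothetical shape of the one tool call an MCP client sends to this
# bridge. "tools/call" is the standard MCP method; the "code" argument
# name is an assumption. The submitted script is whatever the LLM wrote.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_python",
        "arguments": {
            "code": (
                "from mcp import runtime\n"
                "matches = await runtime.search_tool_docs('calendar events')\n"
                "print(matches)\n"
            ),
        },
    },
}
```

Every other server's tools are reached through this one entry point, so the client-side schema cost never grows with the number of proxied servers.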
While there are JavaScript-based alternatives (like universal-tool-calling-protocol/code-mode), this project is built for Data Science and Security:
| Feature | This Project (Python) | JS Code Mode (Node.js) |
| :--- | :--- | :--- |
| Native Language | Python (The language of AI/ML) | TypeScript/JavaScript |
| Data Science | Native (pandas, numpy, scikit-learn) | Impossible / Hacky |
| Isolation | Hard (Podman/Docker Containers) | Soft (Node.js VM) |
| Security | Enterprise (Rootless, No Net, Read-Only) | Process-level |
| Philosophy | Infrastructure (Standalone Bridge) | Library (Embeddable) |
Choose this if: You want your agent to analyze data, generate charts, use scientific libraries, or if you require strict container-based isolation for running untrusted code.
Connect Claude to 11 MCP servers exposing ~100 tools and roughly 30,000 tokens of tool schemas get loaded into every prompt. That's about $0.09 per query before you ask a single question. Scale to 50 servers and your context window breaks.
Traditional MCP (Context-Bound)

```
┌─────────────────────────────┐
│ LLM Context (30K tokens)    │
│ - serverA.tool1: {...}      │
│ - serverA.tool2: {...}      │
│ - serverB.tool1: {...}      │
│ - … (dozens more)           │
└─────────────────────────────┘
              ↓
        LLM picks tool
              ↓
        Tool executes
```
This Bridge (Discovery-First)

```
┌─────────────────────────────┐
│ LLM Context (≈200 tokens)   │
│ “Use discovered_servers(),  │
│  query_tool_docs(),         │
│  search_tool_docs()”        │
└─────────────────────────────┘
              ↓
     LLM discovers servers
              ↓
     LLM hydrates schemas
              ↓
      LLM writes Python
              ↓
   Bridge proxies execution
```
Result: constant overhead. Whether you manage 10 or 1000 tools, the system prompt stays right-sized and schemas flow only when requested.
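Inside the sandbox, the two discovery stages above might look like the sketch below. The method names come from the diagram; treating `query_tool_docs` as taking a server name, and passing `runtime` in explicitly to keep the sketch self-contained, are assumptions:

```python
async def hydrate_schemas(runtime, server: str):
    """Two-stage discovery sketch. Stage 1 lists server names only
    (cheap); stage 2 pulls full tool schemas for one server on demand.
    `runtime` stands in for the `mcp.runtime` object the bridge injects."""
    servers = await runtime.discovered_servers()    # stage 1: names only
    if server not in servers:
        raise ValueError(f"unknown server: {server}")
    return await runtime.query_tool_docs(server)    # stage 2: schemas on demand
```

The schemas for untouched servers never enter the context at all, which is what keeps the overhead constant.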
| Capability | Docker MCP Gateway | Cloudflare Code Mode | Research Patterns | This Bridge |
|------------|--------------------|----------------------|-------------------|--------------|
| Solves token bloat | ❌ Manual preload | ❌ Fixed catalog | ❌ Theory only | ✅ Discovery runtime |
| Universal MCP proxying | ✅ Containers | ⚠️ Platform-specific | ❌ Not provided | ✅ Any stdio server |
| Rootless security | ⚠️ Optional | ✅ V8 isolate | ❌ Not addressed | ✅ Cap-dropped sandbox |
| Auto-discovery | ⚠️ Catalog-bound | ❌ N/A | ❌ Not implemented | ✅ 12+ config paths |
| Tool doc search | ❌ | ❌ | ⚠️ Conceptual | ✅ search_tool_docs() |
| Production hardening | ⚠️ Depends on you | ✅ Managed service | ❌ Prototype | ✅ Tested bridge |
Speakeasy's Dynamic Toolsets use a 3-step flow: `search_tools` → `describe_tools` → `execute_tool`. While this saves tokens, it forces the agent into a "chatty" loop: `search_tools("create_issue")`, then `describe_tools`, then `execute_tool("create_issue", …)`, with an LLM round-trip between each step.

This Bridge (Code-First) collapses that loop:
Given a single instruction like "use `mcp_github`, search for 'issues', and create one if missing", the agent writes a single Python script that performs discovery, logic, and execution in one round-trip. It's faster, cheaper (fewer intermediate LLM calls), and handles complex logic (loops, retries) that a simple "execute" tool cannot.
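A sketch of the script such a prompt could produce, under stated assumptions: `runtime` is the bridge-injected `mcp.runtime`, and the `call_tool` helper plus the GitHub tool names `search_issues`/`create_issue` are hypothetical stand-ins for whatever discovery actually returns:

```python
async def ensure_issue(runtime, title: str):
    """Discover, search, and create-if-missing in one pass.
    `call_tool` and the GitHub tool names are assumed, not part of
    this bridge's documented API."""
    hits = await runtime.search_tool_docs("issues", limit=10)
    tools = {h["tool"]: h["server"] for h in hits}   # tool name -> server

    existing = await runtime.call_tool(
        tools["search_issues"], "search_issues", {"query": title}
    )
    if existing:
        return existing[0]            # issue already present, nothing to do
    return await runtime.call_tool(   # otherwise create it in the same run
        tools["create_issue"], "create_issue", {"title": title}
    )
```

The conditional and both tool calls execute inside the sandbox; the LLM is consulted once, not once per step.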
OneMCP provides a "Handbook" chat interface where you ask questions and it plans execution. This is great for simple queries but turns the execution into a black box.
This Bridge gives the agent raw, sandboxed control. The agent isn't asking a black box to "do it"; the agent is the programmer, writing the exact code to interact with the API. This allows for precise edge-case handling and complex data processing that a natural language planner might miss.
- Two-stage discovery – `discovered_servers()` reveals what exists; `query_tool_docs(name)` loads only the schemas you need.
- Fuzzy search across servers – let the model find tools without memorising catalog names:

  ```python
  from mcp import runtime

  matches = await runtime.search_tool_docs("calendar events", limit=5)
  for hit in matches:
      print(hit["server"], hit["tool"], hit.get("description", ""))
  ```

- Zero-copy proxying – every tool call stays within the sandbox, mirrored over stdio with strict timeouts.
- Rootless by default – Podman/Docker containers run with `--cap-drop=ALL`, a read-only root, `no-new-privileges`, and explicit memory/PID caps.
- Compact + TOON output – minimal plain-text responses for most runs, with deterministic TOON blocks available via `MCP_BRIDGE_OUTPUT_MODE=toon`.
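The hardening defaults listed above translate into a container invocation roughly like the following. The flag set mirrors the feature list; the image name and the exact limit values are illustrative assumptions, not the bridge's real defaults:

```python
# Sketch of the hardened rootless invocation described above. Flags
# mirror the feature list (cap drop, read-only root, no new privileges,
# no network, memory/PID caps); image and limit values are illustrative.
cmd = [
    "podman", "run", "--rm", "--interactive",
    "--cap-drop=ALL",                        # drop every Linux capability
    "--security-opt", "no-new-privileges",   # block privilege escalation
    "--read-only",                           # read-only root filesystem
    "--network=none",                        # no network access
    "--memory=512m",                         # explicit memory cap
    "--pids-limit=128",                      # explicit PID cap
    "docker.io/library/python:3.12-alpine",
    "python", "-",                           # read the script from stdin
]
# subprocess.run(cmd, input=script_bytes) would execute it (podman required)
```

Because Podman runs rootless, even a full container escape lands in an unprivileged user account rather than root on the host.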
This server aligns with the philosophy that you might not need MCP at all for every little tool. Instead of building rigid MCP servers for simple tasks, you can use this server to give your agent raw, sandboxed access to Bash and Python.