ironbee-cli

Name: ironbee-cli
Author: ironbee-ai

IronBee CLI - Verification and Intelligence Layer for Agentic Development

136 stars

3 forks

TypeScript

Added 4/10/2026

Tags: AI Agents, agentic-development, ai-agent, browser-devtools, browser-testing

Installation

# Add to your Claude Code skills
git clone https://github.com/ironbee-ai/ironbee-cli

IronBee ensures that AI agents verify their code changes before completing a task. When an agent edits code, it cannot finish until it navigates to the affected pages, functionally tests the changes, and submits a passing verdict.

No more "it should work" — every change is tested.

IronBee also records every verification cycle (coding time, fix time, pass/fail rates, problematic files) and exposes session-level and project-level analytics that an LLM can use for semantic insights.

Powered by browser-devtools-mcp — the agent navigates pages, clicks buttons, fills forms, takes screenshots, checks console errors, and writes a structured verdict.

Demo

https://github.com/user-attachments/assets/9d4e602b-6c05-4b48-89a8-3df429d10e00

Supported Clients

| Client | Status |
|--------|--------|
| Claude Code | Supported |
| Cursor | Supported |
| Codex | Planned |
| OpenCode | Planned |

Quick Start

Install IronBee globally

npm install -g @ironbee-ai/cli

Set up a project

cd your-project
ironbee install

This auto-detects your AI client and writes:

- Hook configuration (so the client calls IronBee automatically)
- Verification skill/rules (so the agent knows the workflow)
- MCP server config (so the agent has browser access)
- Browser-devtools permissions

Cursor: additional setup

Cursor requires manual activation of the MCP server after install:

1. Restart Cursor to load the new hooks and MCP config.
2. Go to Settings → Tools & MCP and verify browser-devtools is enabled.
3. If the server shows as enabled but tools are unavailable, toggle it off and on.

Note: This is a known Cursor limitation — MCP servers added via mcp.json may need manual activation.

That's it

The next time your AI agent edits code, IronBee will require browser verification before the task can complete.

Commands

ironbee install [project-dir] [--client <name>]   Set up hooks and config
ironbee uninstall [project-dir] [--client <name>] Remove hooks and config
ironbee status [project-dir]                      Show verdict status for active sessions
ironbee verify [session-id]                       Dry-run verdict validation
ironbee analyze [session-id]                      Analyze session metrics (or all sessions)

Agent Commands (slash commands)

IronBee installs slash commands that the agent can use inside Claude Code or Cursor:

| Command | Description |
|---------|-------------|
| /ironbee-verify | Verify changes, focused on affected areas (default) |
| /ironbee-verify full | Full verification: complete visual + functional + accessibility checklists |
| /ironbee-verify visual | Visual-only: contrast, layout, spacing, fonts, images, theming |
| /ironbee-verify functional | Functional-only: clicks, forms, navigation, data flow, error handling |
| /ironbee-analyze | Run session analytics and provide LLM-powered semantic insights |

/ironbee-verify guides the agent through a systematic verification process. The default mode focuses on what changed, while full runs every checklist item. Use visual or functional to narrow the scope when you know what type of testing is needed.

Configuration

IronBee loads config from two locations (project overrides global):

Global: ~/.ironbee/config.json
Project: <project>/.ironbee/config.json

{
  "verifyPatterns": ["*.ts", "*.tsx", "*.css"],
  "additionalVerifyPatterns": ["*.mdx"],
  "ignoredVerifyPatterns": ["*.test.ts", "*.spec.ts"],
  "maxRetries": 5
}

| Key | Description | Default |
|-----|-------------|---------|
| verifyPatterns | Glob patterns for files that require verification (replaces defaults) | 40+ code extensions |
| additionalVerifyPatterns | Extra patterns added on top of defaults | [] |
| ignoredVerifyPatterns | Patterns to exclude from verification (checked first) | [] |
| maxRetries | Max retry attempts before allowing completion | 3 |
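
To make the "project overrides global" rule concrete, here is a minimal Python sketch of how the two files could be combined. It is an illustration, not IronBee's actual loader: the function names are hypothetical, and the shallow key-by-key merge is an assumption, since the README does not say whether nested objects are merged deeply.

```python
import json
import tempfile
from pathlib import Path


def load_json(path: Path) -> dict:
    """Return the JSON object stored at path, or {} if the file does not exist."""
    if not path.is_file():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def load_config(project_dir: Path, home_dir: Path) -> dict:
    """Combine both config files; keys from the project file win.

    Assumption: a shallow, key-by-key merge (nested objects are not merged).
    """
    global_cfg = load_json(home_dir / ".ironbee" / "config.json")
    project_cfg = load_json(project_dir / ".ironbee" / "config.json")
    return {**global_cfg, **project_cfg}


# Demo with throwaway directories standing in for ~ and the project root.
with tempfile.TemporaryDirectory() as tmp:
    home, project = Path(tmp, "home"), Path(tmp, "project")
    samples = (
        (home, {"maxRetries": 3, "additionalVerifyPatterns": ["*.mdx"]}),
        (project, {"maxRetries": 5}),
    )
    for root, cfg in samples:
        (root / ".ironbee").mkdir(parents=True)
        (root / ".ironbee" / "config.json").write_text(json.dumps(cfg), encoding="utf-8")
    merged = load_config(project, home)

print(merged)
```

In the demo, maxRetries comes from the project file while additionalVerifyPatterns is inherited from the global one.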

Default verify patterns

By default, IronBee requires verification for common code file extensions: .ts, .tsx, .js, .jsx, .css, .scss, .html, .py, .go, .rs, .java, .vue, .svelte, and many more.

Non-code files like README.md, package.json, or .gitignore do not trigger verification.
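
The way the three pattern keys interact can be sketched as follows. This is an approximation under stated assumptions: the default list is only the subset named above, patterns are matched against the file name alone, and Python's fnmatch stands in for whatever glob matcher IronBee really uses.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the defaults named in this README; the real list has 40+ entries.
DEFAULT_PATTERNS = [
    "*.ts", "*.tsx", "*.js", "*.jsx", "*.css", "*.scss", "*.html",
    "*.py", "*.go", "*.rs", "*.java", "*.vue", "*.svelte",
]


def needs_verification(path: str, config: dict) -> bool:
    """Decide whether an edited file should trigger browser verification."""
    name = PurePosixPath(path).name  # assumption: match on the file name only

    # Ignored patterns are checked first and always win.
    if any(fnmatchcase(name, p) for p in config.get("ignoredVerifyPatterns", [])):
        return False

    # verifyPatterns replaces the defaults; additionalVerifyPatterns extends them.
    base = config.get("verifyPatterns", DEFAULT_PATTERNS)
    patterns = list(base) + list(config.get("additionalVerifyPatterns", []))
    return any(fnmatchcase(name, p) for p in patterns)


config = {
    "additionalVerifyPatterns": ["*.mdx"],
    "ignoredVerifyPatterns": ["*.test.ts", "*.spec.ts"],
}
for candidate in ("src/app.tsx", "src/app.test.ts", "docs/intro.mdx", "package.json"):
    print(candidate, needs_verification(candidate, config))
```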

Browser DevTools MCP config

By default, IronBee configures browser-devtools-mcp via npx. To customize the MCP server (e.g., use a local server or HTTP transport), add a browserDevTools key to your config:

{
  "browserDevTools": {
    "mcp": {
      "url": "http://localhost:4000/mcp"
    }
  }
}

Or with a custom stdio command:

{
  "browserDevTools": {
    "mcp": {
      "command": "node",
      "args": ["./my-server.js"],
      "env": { "MY_VAR": "value" }
    }
  }
}

You can also pass extra env vars to the default npx server without replacing it:

{
  "browserDevTools": {
    "env": { "BROWSER_HEADLESS_ENABLE": "true", "OTEL_ENABLE": "true" }
  }
}

| Key | Description |
|-----|-------------|
| browserDevTools.mcp | Full MCP server config, used as-is when provided. Supports command+args (stdio) or url (HTTP) |
| browserDevTools.env | Extra env vars merged into the default config. Only used when mcp is not provided |

Note: IronBee always sets TOOL_NAME_PREFIX=bdt_ and TOOL_INPUT_METADATA_ENABLE=true — these cannot be overridden.
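
The precedence between browserDevTools.mcp, browserDevTools.env, and the two fixed variables can be pictured with the Python sketch below. Several details are assumptions: the default npx arguments are a placeholder (the README only says the server runs via npx), and because the README does not describe how the fixed variables interact with a custom mcp block, the sketch returns that block untouched.

```python
# Variables IronBee always sets, per the note above.
FORCED_ENV = {"TOOL_NAME_PREFIX": "bdt_", "TOOL_INPUT_METADATA_ENABLE": "true"}

# Placeholder default launch command; the exact npx arguments are not documented here.
DEFAULT_MCP = {"command": "npx", "args": ["browser-devtools-mcp"]}


def resolve_mcp_config(config: dict) -> dict:
    """Return the MCP server entry that would be written for the client."""
    bdt = config.get("browserDevTools", {})
    custom = bdt.get("mcp")
    if custom is not None:
        # A full mcp block is used as-is; browserDevTools.env is ignored.
        return dict(custom)

    server = dict(DEFAULT_MCP)
    # User env vars are merged first, so the forced values cannot be overridden.
    server["env"] = {**bdt.get("env", {}), **FORCED_ENV}
    return server


print(resolve_mcp_config({"browserDevTools": {"env": {"BROWSER_HEADLESS_ENABLE": "true"}}}))
print(resolve_mcp_config({"browserDevTools": {"mcp": {"url": "http://localhost:4000/mcp"}}}))
```

Placing FORCED_ENV last in the merge is what makes the fixed values win over anything supplied in browserDevTools.env.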

Verification Flow

When the agent tries to complete a task, IronBee runs these checks:

1. Were code files edited? If no matching files were changed, the agent completes normally.
2. Were browser tools used? The agent must have called: navigate, screenshot, accessibility snapshot, and console check.
3. Does a verdict exist? The agent must submit a verdict via ironbee hook submit-verdict after testing.
4. Is the verdict valid? It must include session_id, status, pages_tested, checks, console_errors, and network_failures.
5. Pass or fail? Pass allows completion. Fail blocks the agent and asks it to fix the issues and re-verify.
6. Retry limit: after maxRetries failed attempts (default 3), the agent is allowed to complete but must report unresolved issues.
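
The checks above amount to a small decision function. The Python sketch below follows the same order; it is a model of the described behavior, not IronBee's implementation. The SessionSnapshot type and the tool labels are hypothetical, and it assumes the failed-attempt counter already includes the current failing verdict.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Placeholder labels for the four required browser checks; the real
# browser-devtools-mcp tool names are not given in this README.
REQUIRED_TOOLS = frozenset(
    {"navigate", "screenshot", "accessibility_snapshot", "console_check"}
)


@dataclass
class SessionSnapshot:
    code_files_edited: bool
    tools_used: Set[str] = field(default_factory=set)
    verdict: Optional[dict] = None
    verdict_valid: bool = False
    failed_attempts: int = 0  # assumption: includes the current failing verdict


def completion_decision(s: SessionSnapshot, max_retries: int = 3) -> Tuple[bool, str]:
    """Return (allowed, reason) for an agent that is trying to finish its task."""
    if not s.code_files_edited:
        return True, "no matching code files were changed"
    missing = REQUIRED_TOOLS - s.tools_used
    if missing:
        return False, "browser tools not used: " + ", ".join(sorted(missing))
    if s.verdict is None:
        return False, "no verdict submitted"
    if not s.verdict_valid:
        return False, "verdict is missing required fields"
    if s.verdict.get("status") == "pass":
        return True, "verdict passed"
    if s.failed_attempts >= max_retries:
        return True, "retry limit reached; unresolved issues must be reported"
    return False, "verdict failed; fix the issues and re-verify"


attempt = SessionSnapshot(True, set(REQUIRED_TOOLS), {"status": "fail"}, True, failed_attempts=1)
print(completion_decision(attempt))
```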

Verdict format

Verdicts are submitted via echo '<json>' | ironbee hook submit-verdict:

{
  "session_id": "<your-session-id>",
  "status": "pass",
  "pages_tested": ["http://localhost:3000/dashboard"],
  "checks": ["form submits successfully", "new item appears in list"],
  "console_errors": 0,
  "network_failures": 0
}

On failure, include an issues array describing what went wrong:

{
  "session_id": "<your-session-id>",
  "status": "fail",
  "pages_tested": ["http://localhost:3000/dashboard"],
  "checks": ["form renders", "submit button unresponsive"],
  "console_errors": 2,
  "network_failures": 0,
  "issues": ["button click handler not firing", "TypeError in console"]
}

On pass after a previous fail, include a fixes array describing what was fixed:

{
  "session_id": "<your-session-id>",
  "status": "pass",
  "pages_tested": ["http://localhost:3000/dashboard"],
  "checks": ["form submits successfully", "new item appears in list"],
  "console_errors": 0,
  "network_failures": 0,
  "fixes": ["reattached click handler to submit button", "fixed TypeError in event handler"]
}

The agent must submit a verdict after every verification attempt — both pass and fail. File edits are blocked until a verdict is submitted after using browser tools.
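
Before piping a verdict to ironbee hook submit-verdict, a script could run a quick shape check like the Python sketch below. The required field names and the pass/fail status values come from this README; the helper name is hypothetical, and treating a failing verdict without an issues array as incomplete is an assumption rather than documented behavior.

```python
import json
from typing import List

REQUIRED_FIELDS = (
    "session_id", "status", "pages_tested", "checks",
    "console_errors", "network_failures",
)


def verdict_problems(verdict: dict) -> List[str]:
    """Return a list of problems; an empty list means the verdict looks well-formed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in verdict]
    status = verdict.get("status")
    if status is not None and status not in ("pass", "fail"):
        problems.append(f"unexpected status: {status!r}")
    # Assumption: a failing verdict without issues is treated as incomplete.
    if status == "fail" and not verdict.get("issues"):
        problems.append("failing verdict has no issues array")
    return problems


failing = {
    "session_id": "<your-session-id>",
    "status": "fail",
    "pages_tested": ["http://localhost:3000/dashboard"],
    "checks": ["form renders", "submit button unresponsive"],
    "console_errors": 2,
    "network_failures": 0,
}
print(verdict_problems(failing))

# JSON text that could be piped to `ironbee hook submit-verdict` once complete.
payload = json.dumps({**failing, "issues": ["button click handler not firing"]})
print(verdict_problems(json.loads(payload)))
```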

Session Isolation

Each AI session gets its own directory under .ironbee/sessions/<session-id>/:

.ironbee/sessions/<session-id>/
  actions.jsonl    # Event log (file edits, tool calls, verification markers)
  verdict.json     # Current verdict (cleared on code edit)
  state.json       # Session state (retries, active verification, trace ID, active fix, phase)
  session.log      # Debug log

This means parallel sessions (e.g., multiple Claude Code instances) don't interfere with each other.
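
Because each session has its own directory, a script can inspect sessions independently. The Python sketch below walks the layout shown above and reports each session's current verdict status. It assumes that verdict.json holds the submitted verdict object with its status field and that a cleared verdict is either a missing or an empty file; neither detail is spelled out in this README.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


def session_statuses(project_dir: Path) -> Dict[str, Optional[str]]:
    """Map each session ID to its verdict status, or None when no verdict exists."""
    sessions_root = project_dir / ".ironbee" / "sessions"
    statuses: Dict[str, Optional[str]] = {}
    if not sessions_root.is_dir():
        return statuses
    for session_dir in sorted(p for p in sessions_root.iterdir() if p.is_dir()):
        verdict_file = session_dir / "verdict.json"
        # Treat a missing or empty file as "no current verdict".
        text = verdict_file.read_text(encoding="utf-8").strip() if verdict_file.is_file() else ""
        statuses[session_dir.name] = json.loads(text).get("status") if text else None
    return statuses


# Demo against a throwaway project with two parallel sessions.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    for session_id, verdict in (("session-a", {"status": "pass"}), ("session-b", None)):
        session_dir = project / ".ironbee" / "sessions" / session_id
        session_dir.mkdir(parents=True)
        if verdict is not None:
            (session_dir / "verdict.json").write_text(json.dumps(verdict), encoding="utf-8")
    demo_statuses = session_statuses(project)

print(demo_statuses)
```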

Analytics

ironbee analyze provides metrics about verification sessions — how time is spent, how effective verifications are, and how confident we can be in the agent's code.

Usage

ironbee analyze <session-id>                    # single session analysis
ironbee analyze                                 # all sessions (project-level)
ironbee analyze --json                          # JSON output
ironbee analyze --detailed                      # include verdict details (checks, issues, fixes)
ironbee analyze --json --detailed               # JSON with verdict text for LLM semantic analysis
ironbee analyze <session-id> --json --detailed  # single session JSON with verdict details
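
The flag combinations above can also be assembled from a script. The Python sketch below builds the argument list and, optionally, runs the CLI and parses its output. The helper names are hypothetical, and run_analyze assumes that ironbee is on PATH, exits with status 0 on success, and prints only a JSON document to stdout when --json is given. The shape of that JSON is not documented here, so the sketch returns it unparsed beyond json.loads.

```python
import json
import subprocess
from typing import List, Optional


def analyze_command(session_id: Optional[str] = None, detailed: bool = False) -> List[str]:
    """Build an `ironbee analyze` invocation that requests JSON output."""
    cmd = ["ironbee", "analyze"]
    if session_id:
        cmd.append(session_id)  # omit for project-level analysis of all sessions
    cmd.append("--json")
    if detailed:
        cmd.append("--detailed")  # include checks, issues, and fixes text
    return cmd


def run_analyze(session_id: Optional[str] = None, detailed: bool = False):
    """Run the CLI and return its parsed JSON output (requires ironbee installed)."""
    result = subprocess.run(
        analyze_command(session_id, detailed),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


print(" ".join(analyze_command()))
print(" ".join(analyze_command("<session-id>", detailed=True)))
```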

The --detailed flag includes raw verdict text (checks, issues, fixes) in the output. This is designed for LLM-powered semantic analysis — use /ironbee-analyze in Claude