by sd0xdev
The harness layer for Claude Code — a reference implementation of harness engineering with hook-enforced dual review, state-machine gates that survive context compaction, and fail-closed safety where it counts. Quality gates that AI can't skip.
```bash
# Add to your Claude Code skills
git clone https://github.com/sd0xdev/sd0x-dev-flow
```

Language: English | 繁體中文 | 简体中文 | 日本語 | 한국어 | Español
96 skills · 15 agents — ~4% of Claude's context window
Harness engineering is the discipline of engineering everything around the LLM — tool loops, context management, hooks, state machines, safety layers — as opposed to training the model itself. Mitchell Hashimoto coined the term in Feb 2026; Anthropic engineering and Martin Fowler have published on it; arXiv 2603.05344 formalizes it.
sd0x-dev-flow is a reference implementation. Each row below maps a canonical harness sub-problem to concrete code you can study:
| # | Harness sub-problem | sd0x-dev-flow implementation | Code evidence |
|---|---------------------|------------------------------|---------------|
| 1 | | `/codex-review-fast` → `/precommit` auto-loop with sentinel-driven transitions | + |
| 2 | | `✅ Ready` / `⛔ Blocked` / `✅ All Pass` gate markers parsed into durable state | (producer) + (parser) |
| 3 | | `[AUTO_LOOP_RESUME]` stdout injection after SessionStart(compact) | |
| 4 | | 5 hook event types dispatched to 8 scripts: PreToolUse / PostToolUse / Stop / SessionStart / UserPromptSubmit | (8 scripts) + |
| 5 | | `allowed-tools` skill frontmatter, e.g., `/ask` has no Edit/Write | 86 of 95 public skills declare `allowed-tools` |
| 6 | | 5 layers: pre-edit-guard → commit-msg-guard → pre-push-gate → stop-guard → sidecar fail-closed marker | + + |
| 7 | | Dual review: Codex (primary) + Claude (secondary) dispatched in parallel on every review cycle | + (Dual Review Mode) |
| 8 | | `iteration_history.current_round` + `max_rounds` + convergence plateau detection | (exit conditions + strategic reset) |
| 9 | | `/dev/tty` confirmation + `AskUserQuestion` for destructive ops | + |
| 10 | | Correction → record lesson → promote to rule after 3+ recurrences | |

Most harness projects cover 2–4 of these; sd0x-dev-flow covers all 10, which makes its code useful as a study target, not just a tool.
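Rows 2 and 3 describe a pattern worth seeing in miniature: gate markers in a reply are parsed into state on disk, and a SessionStart(compact) hook prints that state back into the context. The Python sketch below is a hedged illustration, not the project's producer/parser scripts; the JSON state file, its `review_gate` key, and the `review_gate=` banner format are all hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Hypothetical mapping from the gate sentinels in the table to stored states.
GATE_MARKERS = {"✅ Ready": "READY", "⛔ Blocked": "BLOCKED", "✅ All Pass": "ALL_PASS"}
MARKER_RE = re.compile("|".join(re.escape(m) for m in GATE_MARKERS))


def parse_gate(text: str) -> Optional[str]:
    """Return the state named by the LAST gate marker in a reply, or None."""
    found = MARKER_RE.findall(text)
    return GATE_MARKERS[found[-1]] if found else None


def record_gate(state_file: Path, text: str) -> dict:
    """Persist the latest gate state to disk so it outlives the context window."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    gate = parse_gate(text)
    if gate is not None:  # replies without a marker leave the stored state alone
        state["review_gate"] = gate
        state_file.write_text(json.dumps(state))
    return state


def resume_banner(state_file: Path) -> str:
    """Line a SessionStart(compact) hook could print to stdout to re-inject state."""
    if not state_file.exists():
        return ""
    state = json.loads(state_file.read_text())
    return f"[AUTO_LOOP_RESUME] review_gate={state.get('review_gate', 'UNKNOWN')}"
```

Because the state lives in a file rather than in the conversation, compaction can discard the reply that emitted `⛔ Blocked` without losing the fact that the gate is still blocked.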
| Without guardrails | With sd0x-dev-flow |
|---|---|
| AI skips review when context is long | Hook-enforced: stop-guard blocks incomplete reviews |
| Single reviewer misses issues | Dual dispatch: Codex + secondary in parallel |
| "Fixed it" without re-verification | Auto-loop: fix → re-review → pass → continue |
| Review state lost after compaction | State tracking: SessionStart hook re-injects review state |
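The auto-loop (fix → re-review → pass → continue) needs a stopping rule so it cannot spin forever. The harness table names `iteration_history.current_round`, `max_rounds`, and convergence plateau detection; this Python sketch shows one way those pieces could fit together. The default cap of 5, the two-round plateau window, and the callable `review`/`fix` interface are assumptions, not the plugin's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    current_round: int = 0
    max_rounds: int = 5  # hypothetical default cap
    history: List[int] = field(default_factory=list)  # open findings per round


def is_plateau(history: List[int], window: int = 2) -> bool:
    """True when the finding count has not dropped below its level `window` rounds ago."""
    if len(history) <= window:
        return False
    baseline, recent = history[-(window + 1)], history[-window:]
    return min(recent) >= baseline


def auto_loop(review: Callable[[], int], fix: Callable[[], None], state: LoopState) -> str:
    """Review, fix, and re-review until clean, capped, or stalled."""
    while state.current_round < state.max_rounds:
        state.current_round += 1
        findings = review()
        state.history.append(findings)
        if findings == 0:
            return "PASS"
        if is_plateau(state.history):
            return "PLATEAU"  # stop and escalate instead of looping forever
        fix()
    return "MAX_ROUNDS"
```

Two exits besides success matter here: the hard cap bounds cost, while plateau detection catches a loop that keeps "fixing" without reducing the finding count.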
```bash
# Install plugin
/plugin marketplace add sd0xdev/sd0x-dev-flow
/plugin install sd0x-dev-flow@sd0xdev-marketplace
# Configure your project
/project-setup
```
`/project-setup` auto-detects your framework, package manager, database, entrypoints, and scripts. It installs a subset of the rules and hooks; the full plugin bundles 14 rules and 9 hooks.
Use `--lite` to configure only CLAUDE.md and skip rules and hooks.
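To make "auto-detects" concrete, one plausible approach keys off files in the repository root. The Python sketch below assumes a Node project where lockfile presence identifies the package manager and `package.json` lists the scripts; `/project-setup`'s actual heuristics are not documented here and may differ, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Hypothetical lockfile -> package manager table, checked in order.
LOCKFILES = [
    ("pnpm-lock.yaml", "pnpm"),
    ("yarn.lock", "yarn"),
    ("bun.lockb", "bun"),
    ("package-lock.json", "npm"),
]


def detect_package_manager(root: Path) -> Optional[str]:
    """Infer the package manager from whichever lockfile is present."""
    for filename, manager in LOCKFILES:
        if (root / filename).exists():
            return manager
    return None


def detect_scripts(root: Path) -> Dict[str, str]:
    """Read the `scripts` map from package.json, if there is one."""
    pkg = root / "package.json"
    if not pkg.exists():
        return {}
    return json.loads(pkg.read_text()).get("scripts", {})
```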
```mermaid
flowchart LR
P["🎯 Plan"] --> B["🔨 Build"]
B --> G["🛡️ Gate"]
G --> S["🚀 Ship"]
P -.- P1["/codex-brainstorm<br/>/feasibility-study<br/>/tech-spec"]
B -.- B1["/feature-dev<br/>/bug-fix<br/>/codex-implement"]
G -.- G1["/codex-review-fast<br/>/precommit<br/>/codex-test-review"]
S -.- S1["/smart-commit<br/>/push-ci<br/>/create-pr<br/>/pr-review"]
```
The auto-loop engine enforces quality gates automatically: after code edits, the review command dispatches dual review (Codex MCP and a secondary reviewer in parallel) in the same reply. Findings are deduplicated, severity-normalized, and aggregated into a single gate. In strict mode, hooks enforce fail-closed semantics: if the aggregate gate is incomplete, stop-guard blocks completion. See docs/hooks.md for mode and dependency details.
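The fail-closed rule reduces to one decision: unless the gate is positively known to have passed, strict mode refuses to stop. This Python sketch expresses that decision as a Stop hook. It assumes a JSON state file with a `review_gate` key (path and key are hypothetical), assumes non-strict mode simply lets the turn end, and relies on Claude Code's hook convention that exit code 2 blocks the action and surfaces stderr to Claude.

```python
import json
import sys
from pathlib import Path
from typing import Tuple

PASSING = {"READY", "ALL_PASS"}  # hypothetical stored states for ✅ Ready / ✅ All Pass


def should_block(state_file: Path, strict: bool) -> Tuple[bool, str]:
    """Decide whether the Stop hook should refuse to let the turn end."""
    try:
        state = json.loads(state_file.read_text())
    except (OSError, json.JSONDecodeError):
        # Fail closed: missing or corrupt state counts as an incomplete gate.
        return strict, "review state missing or unreadable"
    gate = state.get("review_gate")
    if gate in PASSING:
        return False, "gate passed"
    return strict, f"review gate incomplete: {gate!r}"


def main(state_file: Path = Path(".claude/review-state.json"), strict: bool = True) -> int:
    """Hook entrypoint: exit code 2 blocks the Stop event; stderr is shown to Claude."""
    blocked, reason = should_block(state_file, strict)
    if blocked:
        print(reason, file=sys.stderr)
        return 2
    return 0
```

The key design choice is the `except` branch: a fail-open hook would treat a missing file as "nothing to check", which is exactly the state a skipped review leaves behind.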
```mermaid
sequenceDiagram
participant D as Developer
participant C as Claude
participant X as Codex MCP
participant T as Secondary Reviewer
participant H as Hooks
D->>C: Edit code
H->>H: Track file change
C->>H: emit-review-gate PENDING
par Dual Review
C->>X: Codex review (sandbox)
and
C->>T: Task(code-reviewer)
end
X-->>C: Findings (primary)
T-->>C: Findings (secondary)
C->>C: Aggregate + dedup + gate
C->>H: emit-review-gate READY/BLOCKED
alt Issues found
C->>C: Fix all issues
C->>X: --continue threadId
X-->>C: Re-verify
end
C->>C: /precommit (auto)
C-->>D: ✅ All gates passed
Note over H: Strict mode: incomplete gate → blocked
```
v2.0 dispatches two independent reviewers in parallel. Dual review is the default, with degraded fallback modes:
| Reviewer | Role | Fallback |
|----------|------|----------|
| Codex MCP | Primary (sandbox, full diff) | Single-reviewer mode if unavailable |
| Secondary (pr-review-toolkit) | Confidence-scored review | strict-reviewer → single mode |
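The fallback column amounts to a chain per reviewer slot, with both slots running concurrently. This Python sketch models each reviewer as a plain callable that raises when unavailable. The real plugin dispatches Codex MCP and a Task subagent, so the thread pool, the callable interface, and the mode labels are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Finding = dict
Reviewer = Callable[[str], List[Finding]]


def run_chain(chain: Sequence[Reviewer], diff: str) -> Optional[List[Finding]]:
    """Try each reviewer in order; the first one that does not raise wins."""
    for reviewer in chain:
        try:
            return reviewer(diff)
        except Exception:
            continue  # treat a failure as "unavailable" and fall back
    return None


def dual_review(
    diff: str, primary: Sequence[Reviewer], secondary: Sequence[Reviewer]
) -> Tuple[str, Dict[str, List[Finding]]]:
    """Run both fallback chains in parallel and report which mode actually ran."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        codex = pool.submit(run_chain, primary, diff)
        toolkit = pool.submit(run_chain, secondary, diff)
        results = {"codex": codex.result(), "toolkit": toolkit.result()}
    available = {name: found for name, found in results.items() if found is not None}
    mode = {2: "dual", 1: "single"}.get(len(available), "none")
    return mode, available
```

For the secondary slot, the chain would hold a pr-review-toolkit reviewer followed by a strict-reviewer one (both hypothetical callables here); if every entry in a chain fails, the result degrades to `"single"`.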
Findings are severity-normalized (P0-Nit), deduplicated (file + issue key, ±5 line tolerance), and source-attributed (codex | toolkit | both).
Gate: ✅ Ready or ⛔ Blocked. In strict mode, an incomplete gate counts as ⛔ Blocked.
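The aggregation rules (severity normalization, file + issue-key dedup with ±5 line tolerance, source attribution, one gate) can be sketched in Python as below. The four-level scale P0, P1, P2, Nit, the raw-label mapping, the P2 default for unknown labels, and treating P0/P1 as blocking are all assumptions; the README does not spell them out.

```python
from dataclasses import dataclass, replace
from typing import List

SEVERITY_ORDER = ["P0", "P1", "P2", "Nit"]  # assumed scale, most to least severe
RAW_TO_SEVERITY = {"critical": "P0", "high": "P1", "medium": "P2", "low": "Nit"}  # hypothetical
BLOCKING = {"P0", "P1"}  # assumed blocking threshold
LINE_TOLERANCE = 5       # ±5 lines, per the README


@dataclass
class Finding:
    file: str
    line: int
    key: str       # normalized issue key, e.g. "sql-injection"
    severity: str  # raw label or P0..Nit
    source: str    # "codex" or "toolkit"; becomes "both" after a merge


def normalize(severity: str) -> str:
    """Map a reviewer's raw label onto the shared scale (unknown labels -> P2, an assumption)."""
    if severity in SEVERITY_ORDER:
        return severity
    return RAW_TO_SEVERITY.get(severity.lower(), "P2")


def aggregate(findings: List[Finding]) -> List[Finding]:
    """Deduplicate by file + issue key within the line tolerance, keeping the worst severity."""
    merged: List[Finding] = []
    for raw in findings:
        f = replace(raw, severity=normalize(raw.severity))  # copy; inputs stay untouched
        for m in merged:
            if m.file == f.file and m.key == f.key and abs(m.line - f.line) <= LINE_TOLERANCE:
                if m.source != f.source:
                    m.source = "both"
                if SEVERITY_ORDER.index(f.severity) < SEVERITY_ORDER.index(m.severity):
                    m.severity = f.severity
                break
        else:
            merged.append(f)
    return merged


def gate(findings: List[Finding]) -> str:
    return "⛔ Blocked" if any(f.severity in BLOCKING for f in findings) else "✅ Ready"
```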
| Capability | sd0x-dev-flow | gstack | Generic prompts |
|---|---|---|---|
| Enforced review gates | Hook + behavior layer | Suggestion only | None |
| Dual-reviewer | Codex + secondary (parallel) | Single /review | None |
| Auto-fix loop | Fix → re-review → pass | Manual | None |
| Multi-agent research | /deep-research (3 agents) | None | None |
| Adversarial validation | Nash equilibrium debate | None | None |
| Self-improvement | Lesson log + rule promotion | /retro stats only | None |
| Cross-tool support | Codex/Cursor/Windsurf | Claude/Codex/Gemini/Cursor | N/A |
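The self-improvement row corresponds to the harness table's rule "promote to rule after 3+ recurrences". Here is a minimal in-memory Python sketch of that counter. The `LessonLog` class and its whitespace/case normalization of lessons are hypothetical, and a real implementation would persist the log to disk.

```python
from collections import Counter
from typing import List

PROMOTION_THRESHOLD = 3  # "promote to rule after 3+ recurrences"


class LessonLog:
    """Counts recurring corrections and promotes frequent ones to rules."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.rules: List[str] = []

    def record(self, lesson: str) -> bool:
        """Log one correction; return True only on the call that promotes it."""
        key = " ".join(lesson.lower().split())  # hypothetical normalization
        self.counts[key] += 1
        if self.counts[key] >= PROMOTION_THRESHOLD and key not in self.rules:
            self.rules.append(key)
            return True
        return False
```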
| Good Fit | Not Ideal |
|----------|-----------|
| Solo or small-team projects with Claude Code | Teams not using Claude Code |
| Projects needing automated review gates | One-off scripts with no CI |
| Codex CLI / Cursor / Windsurf users (skills subset) | Projects requiring custom LLM providers |
| Repos where quality gates prevent regressions | Repos with no test infrastructure |
```bash
# Install individual skills via Agent Skills standard
npx skills add sd0xdev/sd0x-dev-flow
# Generate AGENTS.md + install hooks (in Claude Code)
/codex-setup init
```
| Method | Tools | Coverage |
|--------|-------|----------|
| Plugin install | Claude Code | Full (96 skills, hooks, rules, auto-loop) |
| npx skills add | Codex CLI, Cursor, Windsurf, Aider | Skills only (96 skills) |
| /codex-setup init | Codex CLI | AGENTS.md kernel + git hooks |
Requirements: Claude Code 2.1+. Codex MCP is optional, but the /codex-* skills require it; without it, review falls back to single-reviewer mode.
| Workflow | Commands | Gate | Enforced By |
|----------|----------|------|-------------|
| Feature | /feature-dev → /verify → /codex-review-fast → /precommit | ✅/⛔ | Hook + Behavior |
| Bug Fix | /issue-analyze → /bug-fix → /verify → /precommit | ✅/⛔ | Hook + Behavior |
| Auto-Loop | Code edit → /codex-review-fast → /precommit | ✅/⛔ | Hook |
| Doc Review | .md edit → /codex-review-doc | ✅/⛔ | Hook |
| Planning | /codex-brainstorm → /feasibility-study → /tech-spec | — | — |
| Onboarding | /project-setup → /repo-intake | — | — |
flowchart TD
subgraph feat ["🔨 Feature Development"]
F1["/feature-dev"] --> F2["Code + Tests"]
F2 --> F3["/verify"]
F3 --> F4["/codex-review-fast"]
F4 --> F5["/precommit"]
F5 --> F6["/upda