by sd0xdev
The harness layer for Claude Code — a reference implementation of harness engineering with hook-enforced dual review, state-machine gates that survive context compaction, and fail-closed safety where it counts. Quality gates that AI can't skip.
```bash
# Add to your Claude Code skills
git clone https://github.com/sd0xdev/sd0x-dev-flow
```
Last scanned: 5/30/2026
{
"issues": [],
"status": "PASSED",
"scannedAt": "2026-05-30T16:01:46.688Z",
"npmAuditRan": true,
"pipAuditRan": true
}
sd0x-dev-flow is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by sd0xdev. It has 155 GitHub stars.
sd0x-dev-flow passed SkillsLLM's automated security scan, a dependency vulnerability audit plus prompt-injection heuristics, with no high-severity issues.
Clone the repository with "git clone https://github.com/sd0xdev/sd0x-dev-flow" and add it to your Claude Code skills directory (see the Installation section above).
sd0x-dev-flow is primarily written in JavaScript. It is open-source under sd0xdev on GitHub, so you can review or fork the full source.

Language: English | 繁體中文 | 简体中文 | 日本語 | 한국어 | Español
The harness layer for Claude Code.
Quality gates that AI can't skip. A reference implementation of AI Agent Harness Engineering for Claude Code — hook-enforced dual review, state-machine gates that survive context compaction, and fail-closed safety where it counts.
96 bundled · 96 public skills · 15 agents — ~4% of Claude's context window
Harness engineering is the discipline of engineering everything around the LLM — tool loops, context management, hooks, state machines, safety layers — as opposed to training the model itself. Mitchell Hashimoto coined the term in Feb 2026; Anthropic engineering and Martin Fowler have published on it; arXiv 2603.05344 formalizes it.
sd0x-dev-flow is a reference implementation. Each row below maps a canonical harness sub-problem to concrete code you can study:
| # | Harness sub-problem | sd0x-dev-flow implementation | Code evidence |
|---|---|---|---|
| 1 | Tool loop control | /codex-review-fast → /precommit auto-loop with sentinel-driven transitions | rules/auto-loop.md + hooks/post-tool-review-state.sh |
| 2 | Sentinel-driven state machine | ✅ Ready / ⛔ Blocked / ✅ All Pass gate markers parsed into durable state | scripts/emit-review-gate.sh (producer) + hooks/post-tool-review-state.sh (parser) |
| 3 | Context recovery across compaction | [AUTO_LOOP_RESUME] stdout injection after SessionStart(compact) | hooks/post-compact-auto-loop.sh |
| 4 | Lifecycle interceptors | 5 hook event types dispatched to 8 scripts: PreToolUse / PostToolUse / Stop / SessionStart / UserPromptSubmit | hooks/ (8 scripts) + .claude/settings.json |
| 5 | Capability-based tool gating | Skill frontmatter allowed-tools (e.g., /ask has no Edit/Write) | 86 of 95 public skills declare allowed-tools |
| 6 | Defense-in-depth safety | 5 layers: pre-edit-guard → commit-msg-guard → pre-push-gate → stop-guard → sidecar fail-closed marker | scripts/pre-push-gate.sh + scripts/commit-msg-guard.sh + hooks/stop-guard.sh |
| 7 | Generator-evaluator split | Dual review: Codex (primary) + Claude (secondary) dispatched in parallel on every review cycle | rules/codex-invocation.md + rules/auto-loop.md (Dual Review Mode) |
| 8 | Incremental progress tracking | iteration_history.current_round + max_rounds + convergence plateau detection | rules/auto-loop.md (exit conditions + strategic reset) |
| 9 | Human-in-the-loop safety gates | /dev/tty confirmation + AskUserQuestion for destructive ops | scripts/pre-push-gate.sh + skills/push-ci/SKILL.md |
| 10 | Self-improvement loop | Correction → record lesson → promote to rule after 3+ recurrences | rules/self-improvement.md |
Most harness projects cover 2–4 of these. sd0x-dev-flow covers all 10 — which makes the code useful as a study target, not just a tool.
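To make row 2 concrete, here is a minimal Python sketch of a sentinel parser that turns gate markers into state stored on disk. The marker strings come from the table; the function names `parse_gate` and `record_gate`, the state names, and the JSON file layout are hypothetical and are not taken from hooks/post-tool-review-state.sh.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Marker strings are from the table above; the state names are hypothetical.
MARKERS = {
    "✅ All Pass": "all_pass",
    "✅ Ready": "ready",
    "⛔ Blocked": "blocked",
}


def parse_gate(output: str) -> Optional[str]:
    """Return the state for the last gate marker in the output, or None."""
    last_pos, state = -1, None
    for marker, name in MARKERS.items():
        pos = output.rfind(marker)
        if pos > last_pos:
            last_pos, state = pos, name
    return state


def record_gate(state_file: Path, output: str) -> dict:
    """Persist the parsed state so it outlives the conversation context."""
    current = json.loads(state_file.read_text()) if state_file.exists() else {}
    state = parse_gate(output)
    if state is not None:  # output without a marker leaves stored state untouched
        current["review_gate"] = state
        state_file.write_text(json.dumps(current))
    return current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "review-state.json"
        record_gate(path, "Found 2 issues\n⛔ Blocked")
        print(record_gate(path, "unrelated tool output"))
```

Keeping the state in a file rather than in the conversation is what lets a gate survive context compaction in this sketch: the second call sees no marker and still reports the stored `blocked` state.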
| Without guardrails | With sd0x-dev-flow |
|---|---|
| AI skips review when context is long | Hook-enforced: stop-guard blocks incomplete reviews |
| Single reviewer misses issues | Dual dispatch: Codex + secondary in parallel |
| "Fixed it" without re-verification | Auto-loop: fix → re-review → pass → continue |
| Review state lost after compact | State tracking: SessionStart hook re-injects |
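The fix → re-review → pass loop needs exit conditions so it cannot spin forever. The sketch below shows one way to combine a round limit with plateau detection. The `review` callback, the `plateau` parameter, and the return labels are assumptions made for illustration; the actual exit conditions live in rules/auto-loop.md.

```python
from typing import Callable, List


def run_auto_loop(
    review: Callable[[int], int],
    max_rounds: int = 5,
    plateau: int = 2,
) -> str:
    """Repeat fix and re-review until clean, out of rounds, or stalled.

    `review(round_number)` is a hypothetical callback that applies fixes and
    returns the number of findings still open after the re-review.
    """
    history: List[int] = []
    for current_round in range(1, max_rounds + 1):
        open_findings = review(current_round)
        history.append(open_findings)
        if open_findings == 0:
            return "pass"
        # Plateau: none of the last `plateau` rounds got below the count
        # recorded just before them.
        recent = history[-(plateau + 1):]
        if len(recent) == plateau + 1 and min(recent[1:]) >= recent[0]:
            return "plateau"
    return "max_rounds"


if __name__ == "__main__":
    counts = [4, 2, 0]
    print(run_auto_loop(lambda r: counts[r - 1]))
```

Returning a distinct label for the plateau case lets the caller choose a different recovery path (for example, a strategic reset) instead of treating a stalled loop like a passing one.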
```bash
# Install plugin
/plugin marketplace add sd0xdev/sd0x-dev-flow
/plugin install sd0x-dev-flow@sd0xdev-marketplace

# Configure your project
/project-setup
```
One command auto-detects the framework, package manager, database, entrypoints, and scripts. It installs a subset of rules and hooks; the full plugin bundles 14 rules + 9 hooks.
Use --lite to only configure CLAUDE.md (skip rules/hooks).
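As an illustration of what auto-detection can look like, this sketch infers the package manager from the lockfile in the project root. The lockfile names are the standard ones for each tool; whether /project-setup checks them, and in which order, is an assumption, and `detect_package_manager` is a hypothetical name.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Standard lockfile names; the priority order is an assumption.
LOCKFILES = [
    ("pnpm-lock.yaml", "pnpm"),
    ("yarn.lock", "yarn"),
    ("package-lock.json", "npm"),
]


def detect_package_manager(root: Path) -> Optional[str]:
    """Return the first package manager whose lockfile exists in `root`."""
    for filename, manager in LOCKFILES:
        if (root / filename).is_file():
            return manager
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "yarn.lock").touch()
        print(detect_package_manager(project))
```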
```mermaid
flowchart LR
    P["🎯 Plan"] --> B["🔨 Build"]
    B --> G["🛡️ Gate"]
    G --> S["🚀 Ship"]
    P -.- P1["/codex-brainstorm<br/>/feasibility-study<br/>/tech-spec"]
    B -.- B1["/feature-dev<br/>/bug-fix<br/>/codex-implement"]
    G -.- G1["/codex-review-fast<br/>/precommit<br/>/codex-test-review"]
    S -.- S1["/smart-commit<br/>/push-ci<br/>/create-pr<br/>/pr-review"]
```
The auto-loop engine enforces quality gates automatically — after code edits, the review command dispatches dual review (Codex MCP + secondary reviewer in parallel) in the same reply. Findings are deduplicated, severity-normalized, and aggregated into a single gate. In strict mode, hooks enforce fail-closed semantics: if the aggregate gate is incomplete, stop-guard blocks. See docs/hooks.md for mode and dependency details.
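Fail-closed here means that anything other than a confirmed pass counts as a block. The sketch below shows that decision in Python, using the PENDING, READY, and BLOCKED gate states that emit-review-gate reports. The function name `stop_allowed` is hypothetical, and the non-strict branch (letting an incomplete gate through) is an assumption; docs/hooks.md is the authority on mode behavior.

```python
from typing import Optional


def stop_allowed(gate: Optional[str], strict: bool) -> bool:
    """Decide whether the session may stop, given the aggregate gate state.

    `gate` is "READY", "BLOCKED", "PENDING", or None when nothing was recorded.
    """
    if gate == "READY":
        return True
    if gate == "BLOCKED":
        return False
    # PENDING, missing, or unrecognized: the gate is incomplete.
    # Strict mode fails closed; the non-strict result is an assumption.
    return not strict


if __name__ == "__main__":
    for state in ("READY", "BLOCKED", "PENDING", None):
        print(state, stop_allowed(state, strict=True))
```

The important property is the default: the function has no branch where an unknown state is treated as a pass in strict mode.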
```mermaid
sequenceDiagram
    participant D as Developer
    participant C as Claude
    participant X as Codex MCP
    participant T as Secondary Reviewer
    participant H as Hooks
    D->>C: Edit code
    H->>H: Track file change
    C->>H: emit-review-gate PENDING
    par Dual Review
        C->>X: Codex review (sandbox)
    and
        C->>T: Task(code-reviewer)
    end
    X-->>C: Findings (primary)
    T-->>C: Findings (secondary)
    C->>C: Aggregate + dedup + gate
    C->>H: emit-review-gate READY/BLOCKED
    alt Issues found
        C->>C: Fix all issues
        C->>X: --continue threadId
        X-->>C: Re-verify
    end
    C->>C: /precommit (auto)
    C-->>D: ✅ All gates passed
    Note over H: Strict mode: incomplete gate → blocked
```
v2.0 dispatches two independent reviewers in parallel — dual-review by default with degraded fallback modes:
| Reviewer | Role | Fallback |
|---|---|---|
| Codex MCP | Primary (sandbox, full diff) | Single-reviewer mode if unavailable |
| Secondary (pr-review-toolkit) | Confidence-scored review | strict-reviewer → single mode |
Findings are severity-normalized (P0-Nit), deduplicated (file + issue key, ±5 line tolerance), and source-attributed (codex | toolkit | both).
Gate: ✅ Ready or ⛔ Blocked — in strict mode, incomplete gate = blocked.
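The deduplication rule can be sketched as follows: two findings merge when they share a file and issue key and their line numbers are within 5 of each other, and a merged finding is attributed to `both` when the reviewers differ. The `Finding` fields, the `dedupe` name, and the intermediate severity levels between P0 and Nit are assumptions made for this example.

```python
from dataclasses import dataclass, replace
from typing import List

# "P0-Nit" is from the text; the levels in between are assumed.
SEVERITY_ORDER = ["P0", "P1", "P2", "Nit"]


@dataclass
class Finding:
    file: str
    issue_key: str
    line: int
    severity: str
    source: str  # "codex" or "toolkit"


def dedupe(findings: List[Finding], tolerance: int = 5) -> List[Finding]:
    """Merge findings on file + issue key within `tolerance` lines."""
    merged: List[Finding] = []
    for f in findings:
        for m in merged:
            same_issue = m.file == f.file and m.issue_key == f.issue_key
            if same_issue and abs(m.line - f.line) <= tolerance:
                if m.source != f.source:
                    m.source = "both"
                # Keep the more severe of the two labels.
                if SEVERITY_ORDER.index(f.severity) < SEVERITY_ORDER.index(m.severity):
                    m.severity = f.severity
                break
        else:
            merged.append(replace(f))  # copy so the input list is not mutated
    return merged


if __name__ == "__main__":
    result = dedupe([
        Finding("api.js", "null-deref", 10, "P2", "codex"),
        Finding("api.js", "null-deref", 13, "P1", "toolkit"),
    ])
    print(result)
```

The line tolerance matters because two reviewers often point at slightly different lines for the same problem, for instance the call site versus the declaration a few lines above it.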
| Capability | sd0x-dev-flow | gstack | Generic prompts |
|---|---|---|---|
| Enforced review gates | Hook + behavior layer | Suggestion only | None |
| Dual-reviewer | Codex + secondary (parallel) | Single /review | None |
| Auto-fix loop | Fix → re-review → pass | Manual | None |
| Multi-agent research | /deep-research (3 agents) | None | None |
| Adversarial validation | Nash equilibrium debate | None | None |
| Self-improvement | Lesson log + rule promotion | /retro stats only | None |
| Cross-tool support | Codex/Cursor/Windsurf | Claude/Codex/Gemini/Cursor | N/A |
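The lesson log and rule promotion in the Self-improvement row follow a simple threshold: a correction is recorded as a lesson and promoted to a rule after 3+ recurrences. A minimal sketch of that threshold, assuming lessons are logged as plain string keys (the storage format and the `promotable` name are hypothetical):

```python
from collections import Counter
from typing import List

PROMOTE_AFTER = 3  # "3+ recurrences" from the harness table


def promotable(lesson_log: List[str]) -> List[str]:
    """Return lesson keys that recurred often enough to become rules."""
    counts = Counter(lesson_log)
    return sorted(key for key, n in counts.items() if n >= PROMOTE_AFTER)


if __name__ == "__main__":
    log = ["missing-await", "stale-mock", "missing-await", "missing-await"]
    print(promotable(log))
```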
| Good Fit | Not Ideal |
|---|---|
| Solo or small-team projects with Claude Code | Teams not using Claude Code |
| Projects needing automated review gates | One-off scripts with no CI |
| Codex CLI / Cursor / Windsurf users (skills subset) | Projects requiring custom LLM providers |
| Repos where quality gates prevent regressions | Repos with no test infrastructure |
```bash
# Install individual skills via Agent Skills standard
npx skills add sd0xdev/sd0x-dev-flow

# Generate AGENTS.md + install hooks (in Claude Code)
/codex-setup init
```
| Method | Tools | Coverage |
|---|---|---|
| Plugin install | Claude Code | Full (96 bundled skills, hooks, rules, auto-loop) |
| npx skills add | Codex CLI, Cursor, Windsurf, Aider | Skills only (96 public skills) |
| /codex-setup init | Codex CLI | AGENTS.md kernel + git hooks |
Requirements: Claude Code 2.1+ | Codex MCP (optional — /codex-* skills require it; without it, review falls back to single-reviewer mode)
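The fallback behavior can be pictured as a small selection function: use Codex as primary when it is available, pick the first available secondary reviewer, and degrade to single-reviewer mode when only one side remains. The reviewer names follow the fallback table; `pick_review_mode`, the returned dictionary shape, and the `none` mode for the case where no reviewer is available are assumptions.

```python
from typing import Dict, List, Set, Union

# Secondary reviewers in fallback order, per the reviewer table.
SECONDARY_CHAIN = ("pr-review-toolkit", "strict-reviewer")


def pick_review_mode(available: Set[str]) -> Dict[str, Union[str, List[str]]]:
    """Choose dual, single, or (assumed) none based on available reviewers."""
    reviewers: List[str] = []
    if "codex" in available:
        reviewers.append("codex")
    for name in SECONDARY_CHAIN:
        if name in available:
            reviewers.append(name)
            break
    if len(reviewers) == 2:
        mode = "dual"
    elif reviewers:
        mode = "single"
    else:
        mode = "none"  # assumption: the source does not describe this case
    return {"mode": mode, "reviewers": reviewers}


if __name__ == "__main__":
    print(pick_review_mode({"strict-reviewer"}))
```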
| Workflow | Commands | Gate | Enforced By |
|---|---|---|---|
| Feature | /feature-dev → /verify → /codex-review-fast → /precommit | ✅/⛔ | Hook + Behavior |
| Bug Fix | /issue-analyze → /bug-fix → /verify → /precommit | ✅/⛔ | Hook + Behavior |
| Auto-Loop | Code edit → /codex-review-fast → /precommit | ✅/⛔ | Hook |
| Doc Review | .md edit → /codex-review-doc | ✅/⛔ | Hook |
| Planning | /codex-brainstorm → /feasibility-study → /tech-spec | N/A | N/A |
| Onboarding | /project-setup → /repo-intake | N/A | N/A |
flowchart TD
subgraph feat ["🔨 Feature Development"]
F1["/feature-dev"] --> F2["Code + Tests"]
F2 --> F3["/verify"]
F3 --> F4["/codex-review-fast"]
F4 --> F5["/p