A Claude Code plugin that turns natural language into blueprints, blueprints into parallel build plans, and build plans into working software with automated iteration, validation, and cross-model peer review.
# Add to your Claude Code skills
git clone https://github.com/JuliusBrussee/cavekitLast scanned: 5/5/2026
{
"issues": [],
"status": "PASSED",
"scannedAt": "2026-05-05T06:18:37.514Z",
"semgrepRan": false,
"npmAuditRan": true,
"pipAuditRan": true
}No comments yet. Be the first to share your thoughts!
Plan-then-execute forgets. SDD remembers — but most SDD frameworks bury that value under agent swarms, dashboards, and ceremony that costs more tokens than it saves.
Cavekit 4 is a rewrite from the ground up. It keeps only what earns its place:
SPEC.md at repo root survives context resets.§B entry; classes
of bug become §V invariants the spec never forgets.That's the whole pitch.
| cmd | job |
|---|---|
| /ck:spec | create / amend / backprop SPEC.md. Sole mutator. |
| /ck:build | native plan → execute against spec. Auto-backprops on failure. |
| /ck:check | read-only drift report. Lists §V / §I / §T violations. |
One line, via the skills CLI:
npx skills add JuliusBrussee/cavekit
Installs five skills into ~/.claude/skills/: spec, build, check
(the workflow) plus caveman and backprop (the utilities). Claude
activates each when its trigger context matches — e.g. "write a spec
for…" invokes spec, "build the next task" invokes build. Claude Code
picks them up on next launch.
Or via the Claude Code marketplace (also adds /ck:spec, /ck:build,
/ck:check slash commands):
/plugin marketplace add juliusbrussee/cavekit
/plugin install ck@cavekit
Or clone directly:
git clone https://github.com/juliusbrussee/cavekit.git ~/.claude/plugins/cavekit
See FORMAT.md. Fixed sections: §G goal, §C constraints,
§I interfaces, §V invariants, §T tasks (pipe table), §B bugs (pipe table).
FORMAT.md spec schema + caveman encoding rules
commands/ three slash-command entry points (/ck:spec, /ck:build, /ck:check)
skills/spec spec mutator (mirrors commands/spec.md as a skill)
skills/build plan-execute skill (mirrors commands/build.md)
skills/check drift report skill (mirrors commands/check.md)
skills/caveman encoding utility
skills/backprop bug → spec protocol (six steps)
cat SPEC.md is the dashboard.The previous generation is not deprecated — it is frozen at tag
v3.1.0 and
remains a fully working plugin.
What it is:
Spec-driven AI development with an autonomous execution loop. Four-command Hunt lifecycle (
/ck:sketch→/ck:map→/ck:make→/ck:check), plus/ck:ship,/ck:review,/ck:revise,/ck:status,/ck:design,/ck:research,/ck:init,/ck:config,/ck:resume,/ck:help— 16 slash commands total. 12 named sub-agents. Per-task token budgets, stop-hook state machine, model-tier routing, auto-backpropagation from test failures, tool-result caching, Codex peer review, Karpathy behavioral guardrails, caveman token compression, knowledge-graph integration, and design-system enforcement. Parallel wave execution and team mode.
Pick v3.1.0 if you want the full autonomous loop, parallel agents, peer review, or design-system workflow. Pick v4 if you want the distilled core — one spec, three commands, no orchestration.
Marketplace:
/plugin marketplace add juliusbrussee/cavekit@v3.1.0
/plugin install ck@cavekit
Git:
git clone -b v3.1.0 https://github.com/juliusbrussee/cavekit.git
Full docs live at the tag — git checkout v3.1.0 and read the README
there for command reference, skill catalog, and the Hunt lifecycle guide.
See UPGRADE.md. Honest framing:
context/kits/ investment.SPEC.md is plain markdown; nothing traps
you in either direction.Cavekit is one rock in the caveman family:
| repo | what | |---|---| | caveman | output compression skill — why use many token when few do trick | | cavemem | cross-agent persistent memory — why agent forget when agent can remember | | cavekit (you here) | spec-driven build loop — why agent guess when agent can know | | cavegemma | Gemma 4 31B fine-tuned on caveman pairs — why prompt every turn when weight remember |
The spec is the only artifact that earns its tokens. Everything else that costs tokens must either save more tokens later, or the user's attention, or it gets cut.
See CHANGELOG.md for the full v3 → v4 break.
MIT.