by uditgoenka
Claude Autoresearch Skill — Autonomous goal-directed iteration for Claude Code. Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat forever.
# Add to your Claude Code skills
git clone https://github.com/uditgoenka/autoresearch
Turn Claude Code, OpenCode, or OpenAI Codex into a relentless improvement engine.
Based on Karpathy's autoresearch — constraint + mechanical metric + autonomous iteration = compounding gains.
"Set the GOAL → The agent runs the LOOP → You wake up to results"
You don't need AGI. You need a goal, a metric, and a loop that never quits.
Supports Claude Code, OpenCode, and OpenAI Codex. 13 commands. 9 safety hooks. 95% fewer tokens per invocation.
PLAN             LOOP             DEBUG            FIX              SECURE           SHIP
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Goal     │     │ Modify   │     │ Find     │     │ Fix      │     │ STRIDE   │     │ Stage    │
│ Metric   │────▶│ Verify   │────▶│ Bugs     │────▶│ Errors   │────▶│ OWASP    │────▶│ Deploy   │
│ Scope    │     │ Keep/    │     │ Trace    │     │ Repair   │     │ Red      │     │ Release  │
└──────────┘     │ Discard  │     └──────────┘     └──────────┘     │ Team     │     └──────────┘
/autoresearch:   └──────────┘     /autoresearch:   /autoresearch:   └──────────┘     /autoresearch:
plan             /autoresearch    debug            fix              /autoresearch:   ship
                                                                    security

┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Probe    │     │ Scenario │     │ Predict  │     │ Learn    │     │ Reason   │     │ Improve  │
│ Require- │     │ Edge     │     │ 5-Expert │     │ Docs     │     │ Debate   │     │ Research │
│ ments    │     │ Cases    │     │ Swarm    │     │ Gen      │     │ Converge │     │ PRDs     │
└──────────┘     └──────────┘     └──────────┘     └──────────┘     └──────────┘     └──────────┘
/autoresearch:   /autoresearch:   /autoresearch:   /autoresearch:   /autoresearch:   /autoresearch:
probe            scenario         predict          learn            reason           improve

┌──────────┐
│ Evals    │
│ Analyze  │
│ Results  │
└──────────┘
/autoresearch:
evals
Karpathy's autoresearch demonstrated that a 630-line Python script could autonomously improve ML models overnight — 100 experiments per night — by following simple principles: one metric, constrained scope, fast verification, automatic rollback, git as memory.
Claude Autoresearch generalizes these principles to ANY domain. Not just ML — code, content, marketing, sales, HR, DevOps, or anything with a number you can measure.
v2.1.0 is a major architecture rebuild. The monolithic SKILL.md (813 lines, ~100K tokens per invocation) is replaced with a thin 41-line routing file and 12 self-contained command files (94–120 lines each, ~5–8K tokens per invocation). That is a 95% token reduction with the same capability surface.
LOOP (N iterations or until done):
1. Review current state + git history + results log
2. Pick the next change (based on what worked, what failed, what's untried)
3. Make ONE focused change
4. Git commit (before verification)
5. Run mechanical verification (tests, benchmarks, scores)
6. If improved → keep. If worse → git revert. If crashed → fix or skip.
7. Log the result
8. Repeat until N iterations complete or goal is met.
Every improvement stacks. Every failure auto-reverts. Progress is logged in TSV format.
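The loop above can be illustrated with a minimal Python sketch. It is not the skill's implementation: git commit and revert are replaced by an in-memory `state` dict, and `run_loop`, `propose`, `measure`, the TSV column names, and the "higher is better, ties are reverted" rule are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def run_loop(
    state: dict,
    propose: Callable[[dict, int], dict],
    measure: Callable[[dict], float],
    iterations: int,
) -> Tuple[dict, List[str]]:
    """Keep a candidate only if its metric beats the best so far (higher is better)."""
    best = measure(state)
    log = ["iteration\tmetric\tstatus"]  # TSV header; column names are assumed
    for i in range(1, iterations + 1):
        candidate = propose(state, i)  # make ONE focused change
        try:
            score = measure(candidate)  # mechanical verification
        except Exception:
            log.append(f"{i}\tNA\tcrashed")  # crashed: skip, state is unchanged
            continue
        if score > best:
            state, best = candidate, score  # improved: keep
            log.append(f"{i}\t{score}\tkept")
        else:
            log.append(f"{i}\t{score}\treverted")  # not better: discard the change
    return state, log


def demo_propose(state: dict, i: int) -> dict:
    # Hypothetical change: step x up on odd iterations, down on even ones.
    step = 1 if i % 2 else -1
    return {"x": state["x"] + step}


def demo_measure(state: dict) -> int:
    # Hypothetical metric: closeness of x to the target value 3.
    return -abs(state["x"] - 3)


final_state, rows = run_loop({"x": 0}, demo_propose, demo_measure, iterations=4)
print("\n".join(rows))
```

In this toy run the odd iterations move `x` toward the target and are kept, while the even iterations move it away and are discarded, so only improvements accumulate in `final_state`.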
Before looping, Claude performs a one-time setup:
| # | Rule |
|---|---|
| 1 | Bounded by default — every command has a default iteration count; unlimited is opt-in via Iterations: unlimited |
| 2 | Read before write — understand full context before modifying |
| 3 | One change per iteration — atomic changes; if it breaks, you know why |
| 4 | Mechanical verification only — no subjective "looks good"; use metrics |
| 5 | Automatic rollback — failed changes revert instantly |
| 6 | Simplicity wins — equal results + less code = keep |
| 7 | Git is memory — experiments committed with experiment: prefix; agent reads git log + git diff before each iteration |
| 8 | When stuck, think harder — re-read, combine near-misses, try radical changes |
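Rules 4 to 6 amount to a small decision function. The sketch below is one possible reading, not code from the skill: `decide` and its parameters are hypothetical names, the metric is assumed to be "higher is better", and "less code" is approximated by a line count.

```python
def decide(prev_metric: float, new_metric: float, prev_loc: int, new_loc: int) -> str:
    """Return 'keep' or 'revert' for one experiment (higher metric is better)."""
    if new_metric > prev_metric:
        return "keep"  # rule 4: a measurable improvement
    if new_metric == prev_metric and new_loc < prev_loc:
        return "keep"  # rule 6: equal results + less code
    return "revert"  # rule 5: everything else rolls back


print(decide(0.80, 0.85, 100, 120))  # better metric, more code
print(decide(0.80, 0.80, 100, 90))   # same metric, less code
print(decide(0.80, 0.70, 100, 50))   # worse metric, less code
```

Exact equality is used here for brevity; a noisy metric would likely need a tolerance before two results count as "equal".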
v2.1.1 ships a 9-hook safety system that protects your sessions automatically. Hooks fire on every session — not just during autoresearch commands.
| Hook | What it does | Event |
|---|---|---|
| scout-block | Blocks `node_modules/`, `.git/`, `__pycache__/`, etc. from filling your context | PreToolUse |
| privacy-block | Blocks .env, SSH keys, credentials from being read in sessions | PreToolUse |
| dangerous-cmd-block | Blocks force-push, `rm -rf`, `git reset --hard` | PreToolUse |
| iteration-context | Injects recent TSV iteration data after context compaction | UserPromptSubmit |
| subagent-context | Gives subagents awareness of active loop state | SubagentStart |
| dev-rules-reminder | Re-injects plan path and code standards after compaction | UserPromptSubmit |
| simplify-gate | Warns at 400 LOC, blocks at 800 LOC before shipping | UserPromptSubmit |
| session-init | Sets up project context at session start | SessionStart |
| stop-notify | Terminal notification + optional webhook on session end | SessionEnd |
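The two thresholds of the simplify-gate row can be read as a three-way classification. This is a sketch under assumptions: the table does not say what is counted (changed lines, file length, or something else) or whether the limits are inclusive, so `simplify_gate`, `changed_loc`, and the `>=` comparisons are illustrative only.

```python
WARN_LOC = 400   # "Warns at 400 LOC" (from the hooks table)
BLOCK_LOC = 800  # "blocks at 800 LOC" (from the hooks table)


def simplify_gate(changed_loc: int) -> str:
    """Classify a change by size: 'ok', 'warn', or 'block'."""
    if changed_loc >= BLOCK_LOC:
        return "block"
    if changed_loc >= WARN_LOC:
        return "warn"
    return "ok"


for loc in (120, 400, 799, 800):
    print(loc, simplify_gate(loc))
```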
All hooks are on by default. Disable individually:
# Disable a specific hook
export AR_DISABLE_SCOUT_BLOCK=1
export AR_DISABLE_PRIVACY_BLOCK=1
export AR_DISABLE_DANGEROUS_CMD_BLOCK=1
# ... etc for each hook name
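The three variables above suggest a naming pattern: `AR_DISABLE_` followed by the hook name in upper case with hyphens turned into underscores. The sketch below encodes that inferred pattern; `disable_var` and `hook_enabled` are hypothetical helpers, and treating only the value `1` as "disabled" is an assumption.

```python
import os
from typing import Mapping


def disable_var(hook_name: str) -> str:
    """Map a hook name such as 'scout-block' to its assumed opt-out variable."""
    return "AR_DISABLE_" + hook_name.upper().replace("-", "_")


def hook_enabled(hook_name: str, env: Mapping[str, str] = os.environ) -> bool:
    # Assumption: a hook is off only when its variable is exactly "1".
    return env.get(disable_var(hook_name)) != "1"


print(disable_var("dangerous-cmd-block"))
print(hook_enabled("privacy-block", {"AR_DISABLE_PRIVACY_BLOCK": "1"}))
```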
Optional webhook for session completion notifications:
export AR_NOTIFY_WEBHOOK=https://hooks.slack.com/services/...
Customize blocked directories with a .ckignore file (gitignore syntax) at your project root.
See guide/hooks.md for full reference.
| Command | What it does | Default Iterations |
|---|---|---|
| `/autoresearch` | Core iterate loop: modify → verify → keep/discard | 25 |
| `/autoresearch:plan` | Convert goal into validated config | one-shot |
| `/autoresearch:debug` | Hunt bugs via hypothesis iteration | 15 |
| `/autoresearch:fix` | Crush errors one-by-one to zero | 20 |
| `/autoresearch:security` | STRIDE + OWASP audit with red-team | 15 |
| `/autoresearch:ship` | Ship through 8 phases | linear |
| `/autoresearch:scenario` | Generate edge cases across 12 dimensions | 20 |
| `/autoresearch:predict` | 5 expert personas debate | one-shot |
| `/autoresearch:learn` | Scout → generate docs → validate → fix | 10 |
| `/autoresearch:reason` | Adversarial debate with blind judges | 8 |
| `/autoresearch:probe` | 8 personas interrogate requirements | 15 |
| `/autoresearch:improve` | Research ICP, discover improvements, generate PRDs | 15 |
| `/autoresearch:evals` | Analyze iteration results: trends, plateaus | one-shot |
Universal flags: `Iterations: N`, `Iterations: unlimited`, `--evals`, `--evals-interval N`, `--chain <targets>`, `--<subcommand>` shorthand.
All commands use interactive setup when invoked without arguments. Just type the command — the agent asks for what it needs with smart defaults based on your codebase.
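How the per-command defaults and the `Iterations:` flag might combine can be sketched as follows. The default values are copied from the command table; `resolve_iterations`, the regular expression, and the choice to return `None` for `unlimited` are assumptions, and the one-shot and linear commands are left out because the source does not say how they treat the flag.

```python
import re
from typing import Optional

# Defaults copied from the command table (looping commands only).
DEFAULT_ITERATIONS = {
    "/autoresearch": 25,
    "/autoresearch:debug": 15,
    "/autoresearch:fix": 20,
    "/autoresearch:security": 15,
    "/autoresearch:scenario": 20,
    "/autoresearch:learn": 10,
    "/autoresearch:reason": 8,
    "/autoresearch:probe": 15,
    "/autoresearch:improve": 15,
}

# Matches "Iterations: 50" or "Iterations: unlimited" anywhere in the prompt.
_ITER_FLAG = re.compile(r"Iterations:\s*(\d+|unlimited)")


def resolve_iterations(command: str, prompt: str) -> Optional[int]:
    """Return the iteration budget for a command, or None for unlimited."""
    match = _ITER_FLAG.search(prompt)
    if match is None:
        return DEFAULT_ITERATIONS[command]  # bounded by default
    value = match.group(1)
    return None if value == "unlimited" else int(value)


print(resolve_iterations("/autoresearch", "Goal: raise test coverage"))
print(resolve_iterations("/autoresearch:fix", "Iterations: 5"))
print(resolve_iterations("/autoresearch", "Iterations: unlimited"))
```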
OpenCode users: Commands use underscore naming (`/autoresearch_debug`, `/autoresearch_fix`, etc.). All 13 commands available.

Codex users: Invoke via `$autoresearch` mention syntax. Subcommands are keywords: `$autoresearch debug`, `$autoresearch plan`, etc.
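The naming differences between the three tools can be expressed as simple string rewrites. This sketch generalizes from the examples given for `debug`, `fix`, and `plan`; `to_opencode` and `to_codex` are hypothetical helpers, and it is an assumption that every subcommand follows the same pattern.

```python
def to_opencode(command: str) -> str:
    """'/autoresearch:debug' becomes '/autoresearch_debug'."""
    return command.replace(":", "_")


def to_codex(command: str) -> str:
    """'/autoresearch:debug' becomes '$autoresearch debug'."""
    return "$" + command.lstrip("/").replace(":", " ")


for cmd in ("/autoresearch", "/autoresearch:debug", "/autoresearch:plan"):
    print(cmd, "|", to_opencode(cmd), "|", to_codex(cmd))
```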
| I want to... | Use |
|---|---|
| Improve test coverage / reduce bundle size / any metric | `/autorese |