by uditgoenka
# Claude Autoresearch Skill

Autonomous goal-directed iteration for Claude Code. Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat forever.
```shell
# Add to your Claude Code skills
git clone https://github.com/uditgoenka/autoresearch
```
Turn Claude Code, OpenCode, or OpenAI Codex into a relentless improvement engine.
Based on Karpathy's autoresearch — constraint + mechanical metric + autonomous iteration = compounding gains.
"Set the GOAL → The agent runs the LOOP → You wake up to results"
You don't need AGI. You need a goal, a metric, and a loop that never quits.
Now supports Claude Code, OpenCode, and OpenAI Codex.
How It Works · Commands · Quick Start · Guides · FAQ
```
    PLAN             LOOP             DEBUG            FIX             SECURE           SHIP
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Goal     │     │ Modify   │     │ Find     │     │ Fix      │     │ STRIDE   │     │ Stage    │
│ Metric   │────▶│ Verify   │────▶│ Bugs     │────▶│ Errors   │────▶│ OWASP    │────▶│ Deploy   │
│ Scope    │     │ Keep/    │     │ Trace    │     │ Repair   │     │ Red      │     │ Release  │
└──────────┘     │ Discard  │     └──────────┘     └──────────┘     │ Team     │     └──────────┘
/autoresearch:   └──────────┘     /autoresearch:   /autoresearch:   └──────────┘     /autoresearch:
    plan         /autoresearch        debug            fix          /autoresearch:       ship
                                                                        security

┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Probe    │     │ Scenario │     │ Predict  │     │ Learn    │     │ Reason   │
│ Require- │     │ Edge     │     │ 5-Expert │     │ Docs     │     │ Debate   │
│ ments    │     │ Cases    │     │ Swarm    │     │ Gen      │     │ Converge │
└──────────┘     └──────────┘     └──────────┘     └──────────┘     └──────────┘
/autoresearch:   /autoresearch:   /autoresearch:   /autoresearch:   /autoresearch:
    probe          scenario          predict          learn            reason
```
Karpathy's autoresearch demonstrated that a 630-line Python script could autonomously improve ML models overnight — 100 experiments per night — by following simple principles: one metric, constrained scope, fast verification, automatic rollback, git as memory.
Claude Autoresearch generalizes these principles to ANY domain. Not just ML — code, content, marketing, sales, HR, DevOps, or anything with a number you can measure.
LOOP (FOREVER or N times):
1. Review current state + git history + results log
2. Pick the next change (based on what worked, what failed, what's untried)
3. Make ONE focused change
4. Git commit (before verification)
5. Run mechanical verification (tests, benchmarks, scores)
6. If improved → keep. If worse → git revert. If crashed → fix or skip.
7. Log the result
8. Repeat. Never stop until you interrupt (or N iterations complete).
Every improvement stacks. Every failure auto-reverts. Progress is logged in TSV format.
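The keep/discard mechanics can be sketched in a few lines of shell. This is an illustrative toy, not the skill's implementation: the hard-coded `metric` values stand in for real verification runs, and a `best` variable plays the role of git state.

```shell
#!/bin/bash
# Toy model of steps 3–7: each number stands in for the measured result of
# one experiment; keep vs. revert is decided purely by the metric.
best=0
: > results.tsv                      # step 7: TSV results log
i=0
for metric in 3 7 2 9 5; do          # stand-ins for five verification runs
  i=$((i + 1))
  if [ "$metric" -gt "$best" ]; then
    best=$metric                     # improved → keep the change
    printf '%d\tKEEP\t%d\n' "$i" "$metric" >> results.tsv
  else
    # worse → roll back (the real loop would `git revert` here)
    printf '%d\tREVERT\t%d\n' "$i" "$metric" >> results.tsv
  fi
done
echo "best=$best"                    # prints best=9
```

Running it leaves three kept experiments (metrics 3, 7, 9) and two reverted ones in `results.tsv`; the real loop swaps the variable for `git commit` / `git revert`.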
Before looping, Claude performs a one-time setup, then follows these rules:
| # | Rule |
|---|------|
| 1 | Loop until done — unbounded: forever. Bounded: N times then summarize |
| 2 | Read before write — understand full context before modifying |
| 3 | One change per iteration — atomic changes. If it breaks, you know why |
| 4 | Mechanical verification only — no subjective "looks good." Use metrics |
| 5 | Automatic rollback — failed changes revert instantly |
| 6 | Simplicity wins — equal results + less code = KEEP |
| 7 | Git is memory — experiments are committed with an `experiment:` prefix; `git revert` preserves failed experiments in history; the agent MUST read `git log` + `git diff` before each iteration |
| 8 | When stuck, think harder — re-read, combine near-misses, try radical changes |
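Rule 7 in action: a self-contained demonstration, in a throwaway repo, of how `experiment:` commits plus `git revert` leave failed attempts readable in history. The repo name, file, and commit messages below are invented for illustration.

```shell
# Throwaway repo to show "git is memory" (demo-repo is a made-up name).
set -e
rm -rf demo-repo
git init -q demo-repo
git -C demo-repo config user.email you@example.com
git -C demo-repo config user.name you
echo v1 > demo-repo/metric.txt
git -C demo-repo add metric.txt
git -C demo-repo commit -qm "experiment: try approach A"
echo v2 > demo-repo/metric.txt
git -C demo-repo commit -qam "experiment: try approach B"
git -C demo-repo revert --no-edit HEAD >/dev/null   # B failed verification → revert
git -C demo-repo log --oneline --grep=experiment    # both attempts still visible
```

History now holds three commits: approach A, approach B, and the revert of B — so the next iteration can read what was tried and what failed before picking its change.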
| Command | What it does |
|---------|--------------|
| /autoresearch | Run the autonomous iteration loop (unlimited) |
| Iterations: N | Add to inline config to run exactly N iterations then stop |
| /autoresearch:plan | Interactive wizard: Goal → Scope, Metric, Verify config |
| /autoresearch:security | Autonomous STRIDE + OWASP + red-team security audit |
| /autoresearch:ship | Universal shipping workflow (code, content, marketing, sales, research, design) |
| /autoresearch:debug | Autonomous bug-hunting loop — scientific method + iterative investigation |
| /autoresearch:fix | Autonomous fix loop — iteratively repair errors until zero remain |
| /autoresearch:scenario | Scenario-driven use case generator — explore situations, edge cases, derivative scenarios |
| /autoresearch:predict | Multi-persona prediction — pre-analyze code from 5 expert perspectives before acting |
| /autoresearch:learn | Autonomous documentation engine — scout codebase, generate/update docs, validate, fix loop |
| /autoresearch:reason | Adversarial refinement — blind judge panel converges subjective content through isolated multi-agent debate |
| /autoresearch:probe | Adversarial requirement / assumption interrogation — 8 personas probe user + codebase until net-new constraints saturate, emits ready-to-run autoresearch config |
| Guard: <command> | Optional safety net — must pass for changes to be kept |
All commands use interactive setup when invoked without arguments. Just type the command — the agent will ask you what you need step by step with smart defaults based on your codebase. Power users can skip the wizard by providing flags inline.
**OpenCode users:** Commands use underscore naming (`/autoresearch_debug`, `/autoresearch_fix`, etc.) instead of colons. See OpenCode Quick Start below.

**Codex users:** Invoke the skill via `$autoresearch` mention syntax. Subcommands are keywords: `$autoresearch plan`, `$autoresearch debug`, etc. See Codex Quick Start below.
| I want to... | Use |
|--------------|-----|
| Improve test coverage / reduce bundle size / any metric | /autoresearch (add Iterations: N for bounded runs) |
| Don't know what metric to use | /autoresearch:plan |
| Run a security audit | /autoresearch:security |
| Ship a PR / deployment / release | /autoresearch:ship |
| Optimize without breaking existing tests | Add Guard: npm test |
| Hunt all bugs in a codebase | /autoresearch:debug (add Iterations: 20 for bounded runs) |
| Fix all errors (tests, types, lint) | /autoresearch:fix |
| Debug then auto-fix | /autoresearch:debug --fix |
| Check if something is ready to ship | /autoresearch:ship --checklist-only |
| Explore edge cases for a feature | /autoresearch:scenario |
| Generate test scenarios | /autoresearch:scenario --domain software --format test-scenarios |
| Stress test a user journey | /autoresearch:scenario --depth deep |
| I want expert opinions before I start | /autoresearch:predict |
| Analyze this from multiple angles | /autoresearch:predict --chain debug |
| Generate docs for a new codebase | /autoresearch:learn --mode init |
| Update existing docs after changes | /autoresearch:learn --mode update |
| Check if docs are stale | /autoresearch:learn --mode check |
| Debate an architecture decision | /autoresearch:reason --domain software |
| Refine a pitch or proposal adversarially | /autoresearch:reason --domain business |
| Converge on best design then validate | /autoresearch:reason --chain predict |
| Surface hidden constraints before starting | /autoresearch:probe |
| Pre-flight a fuzzy goal then loop | /autoresearch:probe --chain plan,autoresearch |
| Stress-test requirements adversarially | /autoresearch:probe --adversarial --depth deep |
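For example, a bounded, guarded run could be set up inline roughly like this. The `Iterations:` and `Guard:` fields come from the tables above; the `Goal:` and `Metric:` field names are an assumption based on the `/autoresearch:plan` wizard, so let the interactive setup confirm the exact syntax:

```
/autoresearch
Goal: reduce production bundle size
Metric: bundle size in bytes (lower is better)
Iterations: 20
Guard: npm test
```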
Option A — Plugin install (recommended):
In Claude Code, run:
```
/plugin marketplace add uditgoenka/autoresearch
/plugin install autoresearch@autoresearch
```
That's it. All 11 commands are available after restarting Claude Code.
Note: Start a new Claude Code session after installing. Reference files aren't resolvable in the same session where installation happened — this is a Claude Code platform limitation.
Updating (no reinstall needed):
```
/plugin update autoresearch
```
That pulls the latest version. Run /reload-plugins to activate. No need to uninstall or re-clone.
Option B — Manual copy:
```shell
git clone https://github.com/uditgoenka/autoresearch
```