Autoresearch

Turn Claude Code, OpenCode, or OpenAI Codex into a relentless improvement engine.

Based on Karpathy's autoresearch — constraint + mechanical metric + autonomous iteration = compounding gains.

"Set the GOAL → The agent runs the LOOP → You wake up to results"

You don't need AGI. You need a goal, a metric, and a loop that never quits.

Now supports Claude Code, OpenCode, and OpenAI Codex.


      PLAN              LOOP             DEBUG              FIX            SECURE            SHIP
 ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
 │   Goal   │     │  Modify  │     │   Find   │     │   Fix    │     │  STRIDE  │     │  Stage   │
 │  Metric  │────▶│  Verify  │────▶│   Bugs   │────▶│  Errors  │────▶│  OWASP   │────▶│  Deploy  │
 │  Scope   │     │  Keep/   │     │  Trace   │     │  Repair  │     │  Red     │     │ Release  │
 └──────────┘     │  Discard │     └──────────┘     └──────────┘     │  Team    │     └──────────┘
/autoresearch:    └──────────┘    /autoresearch:    /autoresearch:   └──────────┘    /autoresearch:
  plan            /autoresearch     debug              fix          /autoresearch:      ship
                                                                     security

 ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
 │  Probe   │     │ Scenario │     │ Predict  │     │  Learn   │     │  Reason  │
 │ Require- │     │   Edge   │     │ 5-Expert │     │   Docs   │     │  Debate  │
 │  ments   │     │   Cases  │     │  Swarm   │     │   Gen    │     │ Converge │
 └──────────┘     └──────────┘     └──────────┘     └──────────┘     └──────────┘
/autoresearch:   /autoresearch:   /autoresearch:   /autoresearch:   /autoresearch:
  probe            scenario         predict           learn           reason

Why This Exists

Karpathy's autoresearch demonstrated that a 630-line Python script could autonomously improve ML models overnight — 100 experiments per night — by following simple principles: one metric, constrained scope, fast verification, automatic rollback, git as memory.

Claude Autoresearch generalizes these principles to ANY domain. Not just ML — code, content, marketing, sales, HR, DevOps, or anything with a number you can measure.

How It Works

LOOP (FOREVER or N times):
  1. Review current state + git history + results log
  2. Pick the next change (based on what worked, what failed, what's untried)
  3. Make ONE focused change
  4. Git commit (before verification)
  5. Run mechanical verification (tests, benchmarks, scores)
  6. If improved → keep. If worse → git revert. If crashed → fix or skip.
  7. Log the result
  8. Repeat. Never stop until you interrupt (or N iterations complete).

Every improvement stacks. Every failure auto-reverts. Progress is logged in TSV format.
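The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the skill's actual implementation: the state is an in-memory list rather than a git repository, "discard" stands in for git revert, a higher metric is assumed to be better, and the TSV column names (iteration, metric, status) are hypothetical.

```python
import csv
import io
from typing import Callable, List, TextIO, Tuple


def run_loop(
    state: List[int],
    propose: Callable[[List[int], int], List[int]],
    metric: Callable[[List[int]], float],
    iterations: int,
    log: TextIO,
) -> Tuple[List[int], float]:
    """Toy keep/discard loop. Assumes a higher metric is better."""
    writer = csv.writer(log, delimiter="\t", lineterminator="\n")
    writer.writerow(["iteration", "metric", "status"])  # hypothetical columns
    best = metric(state)  # baseline run, iteration #0
    writer.writerow([0, best, "baseline"])
    for i in range(1, iterations + 1):
        candidate = propose(state, i)  # make ONE focused change
        try:
            score = metric(candidate)  # mechanical verification
        except Exception:
            writer.writerow([i, "", "crash"])  # crashed: log and skip
            continue
        if score > best:
            state, best = candidate, score  # improved: keep
            writer.writerow([i, score, "keep"])
        else:
            # Not improved: drop the candidate (stand-in for git revert).
            writer.writerow([i, score, "discard"])
    return state, best


def bump(state: List[int], i: int) -> List[int]:
    """Hypothetical change generator: nudge one element up or down."""
    new = list(state)
    new[i % len(new)] += 1 if i % 2 else -1
    return new


log = io.StringIO()
final_state, final_score = run_loop([0, 0, 0], bump, sum, 4, log)
print(log.getvalue())
```

Odd-numbered iterations raise the sum and are kept; even-numbered ones lower it and are discarded, so the kept changes stack while the failures leave the state untouched.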

The Setup Phase

Before looping, Claude performs a one-time setup:

1. Read context: reads all in-scope files
2. Define goal: extracts or asks for a mechanical metric
3. Define scope: which files can be modified vs read-only
4. Establish baseline: runs verification on current state (iteration #0)
5. Confirm and go: shows setup, then begins the loop
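One way to picture the scope step is a check that flags any changed file falling outside the writable set. The sketch below assumes scope is expressed as glob patterns; the function name, the patterns, and the file paths are all hypothetical, and the source does not say how the skill stores scope internally.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def out_of_scope(changed: Iterable[str], writable: List[str]) -> List[str]:
    """Return the changed paths that match none of the writable glob patterns."""
    return [
        path
        for path in changed
        if not any(fnmatchcase(path, pattern) for pattern in writable)
    ]


# Hypothetical scope: source and test files are modifiable, the rest is read-only.
writable_globs = ["src/*.py", "tests/*.py"]
violations = out_of_scope(
    ["src/app.py", "README.md", "tests/test_app.py"], writable_globs
)
print(violations)
```

An empty result means the change stayed inside the declared scope; anything else is a candidate for rollback.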

8 Critical Rules

| # | Rule |
|---|------|
| 1 | Loop until done: unbounded runs go forever; bounded runs do N iterations, then summarize |
| 2 | Read before write: understand full context before modifying |
| 3 | One change per iteration: atomic changes. If it breaks, you know why |
| 4 | Mechanical verification only: no subjective "looks good." Use metrics |
| 5 | Automatic rollback: failed changes revert instantly |
| 6 | Simplicity wins: equal results + less code = KEEP |
| 7 | Git is memory: experiments are committed with an experiment: prefix, git revert preserves failed experiments in history, and the agent MUST read git log + git diff before each iteration |
| 8 | When stuck, think harder: re-read, combine near-misses, try radical changes |
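Rules 4 to 6 amount to a small decision function. The sketch below is one possible reading, assuming a higher metric is better and that "less code" is measured in lines; the Result type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    metric: float        # assumption: higher is better
    lines_of_code: int   # assumption: code size measured in lines


def should_keep(before: Result, after: Result) -> bool:
    """Keep a change if it improves the metric, or ties it with less code."""
    if after.metric > before.metric:
        return True  # rule 4: the number went up
    if after.metric == before.metric:
        return after.lines_of_code < before.lines_of_code  # rule 6
    return False  # rule 5: worse result, roll back


baseline = Result(metric=0.80, lines_of_code=120)
print(should_keep(baseline, Result(metric=0.85, lines_of_code=150)))
print(should_keep(baseline, Result(metric=0.80, lines_of_code=100)))
print(should_keep(baseline, Result(metric=0.75, lines_of_code=90)))
```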

Commands

| Command | What it does |
|---------|--------------|
| /autoresearch | Run the autonomous iteration loop (unlimited) |
| Iterations: N | Add to inline config to run exactly N iterations then stop |
| /autoresearch:plan | Interactive wizard: Goal → Scope, Metric, Verify config |
| /autoresearch:security | Autonomous STRIDE + OWASP + red-team security audit |
| /autoresearch:ship | Universal shipping workflow (code, content, marketing, sales, research, design) |
| /autoresearch:debug | Autonomous bug-hunting loop: scientific method + iterative investigation |
| /autoresearch:fix | Autonomous fix loop: iteratively repair errors until zero remain |
| /autoresearch:scenario | Scenario-driven use case generator: explore situations, edge cases, derivative scenarios |
| /autoresearch:predict | Multi-persona prediction: pre-analyze code from 5 expert perspectives before acting |
| /autoresearch:learn | Autonomous documentation engine: scout codebase, generate/update docs, validate, fix loop |
| /autoresearch:reason | Adversarial refinement: blind judge panel converges subjective content through isolated multi-agent debate |
| /autoresearch:probe | Adversarial requirement / assumption interrogation: 8 personas probe user + codebase until net-new constraints saturate, emits ready-to-run autoresearch config |
| Guard: <command> | Optional safety net: must pass for changes to be kept |
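The Guard entry describes a second gate on top of the metric: a change is kept only if it improves the metric and the guard command passes. A minimal sketch of that gate follows, assuming "pass" means a zero exit code (the source does not define it) and using hypothetical function names.

```python
import subprocess
import sys
from typing import Sequence


def guard_passes(command: Sequence[str], timeout: float = 60.0) -> bool:
    """Run the guard command; treat exit code 0 as a pass (an assumption)."""
    try:
        completed = subprocess.run(command, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False  # a missing or hung guard counts as a failure
    return completed.returncode == 0


def keep_change(improved: bool, guard: Sequence[str]) -> bool:
    """Keep only changes that improve the metric AND pass the guard."""
    return improved and guard_passes(guard)


# Stand-in guards built from the current interpreter, so the sketch runs anywhere.
ok_guard = [sys.executable, "-c", "raise SystemExit(0)"]
failing_guard = [sys.executable, "-c", "raise SystemExit(1)"]
print(keep_change(True, ok_guard), keep_change(True, failing_guard))
```

In a real project the guard would be something like the test command from the decision guide, passed as an argument list such as ["npm", "test"].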

All commands use interactive setup when invoked without arguments. Just type the command — the agent will ask you what you need step by step with smart defaults based on your codebase. Power users can skip the wizard by providing flags inline.

OpenCode users: Commands use underscore naming (/autoresearch_debug, /autoresearch_fix, etc.) instead of colons. See OpenCode Quick Start below.

Codex users: Invoke the skill via $autoresearch mention syntax. Subcommands are keywords: $autoresearch plan, $autoresearch debug, etc. See Codex Quick Start below.
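The naming differences between the three agents can be summarized as a simple string mapping. The helpers below are hypothetical and inferred only from the examples in the two notes above; they cover bare command names and do not attempt to translate flags.

```python
def to_opencode(command: str) -> str:
    """Claude Code colon form to OpenCode underscore form."""
    return command.replace(":", "_")


def to_codex(command: str) -> str:
    """Claude Code colon form to Codex mention syntax with a keyword subcommand."""
    name, _, subcommand = command.lstrip("/").partition(":")
    return f"${name} {subcommand}".strip()


for cmd in ["/autoresearch", "/autoresearch:debug", "/autoresearch:plan"]:
    print(cmd, "|", to_opencode(cmd), "|", to_codex(cmd))
```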

Quick Decision Guide

| I want to... | Use |
|--------------|-----|
| Improve test coverage / reduce bundle size / any metric | /autoresearch (add Iterations: N for bounded runs) |
| Figure out which metric to use | /autoresearch:plan |
| Run a security audit | /autoresearch:security |
| Ship a PR / deployment / release | /autoresearch:ship |
| Optimize without breaking existing tests | Add Guard: npm test |
| Hunt all bugs in a codebase | /autoresearch:debug (add Iterations: 20 for bounded runs) |
| Fix all errors (tests, types, lint) | /autoresearch:fix |
| Debug then auto-fix | /autoresearch:debug --fix |
| Check if something is ready to ship | /autoresearch:ship --checklist-only |
| Explore edge cases for a feature | /autoresearch:scenario |
| Generate test scenarios | /autoresearch:scenario --domain software --format test-scenarios |
| Stress test a user journey | /autoresearch:scenario --depth deep |
| Get expert opinions before I start | /autoresearch:predict |
| Analyze this from multiple angles | /autoresearch:predict --chain debug |
| Generate docs for a new codebase | /autoresearch:learn --mode init |
| Update existing docs after changes | /autoresearch:learn --mode update |
| Check if docs are stale | /autoresearch:learn --mode check |
| Debate an architecture decision | /autoresearch:reason --domain software |
| Refine a pitch or proposal adversarially | /autoresearch:reason --domain business |
| Converge on best design then validate | /autoresearch:reason --chain predict |
| Surface hidden constraints before starting | /autoresearch:probe |
| Pre-flight a fuzzy goal then loop | /autoresearch:probe --chain plan,autoresearch |
| Stress-test requirements adversarially | /autoresearch:probe --adversarial --depth deep |

Quick Start

Claude Code

Option A — Plugin install (recommended):

In Claude Code, run:

/plugin marketplace add uditgoenka/autoresearch
/plugin install autoresearch@autoresearch

That's it. All 11 commands are available after restarting Claude Code.

Note: Start a new Claude Code session after installing. Reference files aren't resolvable in the same session where installation happened — this is a Claude Code platform limitation.

Updating (no reinstall needed):

/plugin update autoresearch

That pulls the latest version. Run /reload-plugins to activate. No need to uninstall or re-clone.

Option B — Manual copy:

git clone https://githu
