by evo-hq
A plugin for Claude Code and Codex that turns your codebase into an autoresearch loop — discovers what to measure, instruments the benchmark, then runs tree search with parallel subagents.
# Add to your Claude Code skills
git clone https://github.com/evo-hq/evoA plugin for Claude Code and Codex that optimizes code through experiments. You give it a codebase. It discovers metrics to optimize, sets up the evaluation, and starts running experiments in a loop -- trying things, keeping what improves the score, throwing away what doesn't.
Inspired by Karpathy's autoresearch -- where an LLM runs training experiments autonomously to beat its own best score. Autoresearch is a pure hill climb: try something, keep or revert, repeat on a single branch. Evo adds structure on top of that idea:
evo:discover explores the repo, figures out what to measure, and instruments the evaluation.Common requirements: Python 3.12+, git, uv.
/plugin marketplace add evo-hq/evo
/plugin install evo@evo-hq-evo
No comments yet. Be the first to share your thoughts!
Reload Claude Code. /evo:discover and /evo:optimize become available in any repo.
Codex needs the evo CLI installed globally. Install once, outside Codex:
uv tool install evo-hq-cli
# or: pipx install evo-hq-cli
evo --version # should print: evo-hq-cli x.x.x
Then add the plugin (requires Codex 0.121.0-alpha.2 or newer for the marketplace add command -- npm install -g @openai/codex@alpha if you're still on 0.120.0 stable):
codex marketplace add evo-hq/evo
Open Codex, run /plugins, find evo, install. Skills become available as $evo discover and $evo optimize.
Two skills:
evo:discover -- explores the repo, instruments the benchmark, runs baselineevo:optimize -- runs the optimization loop with parallel subagents until interruptedInvocation syntax differs by host:
/evo:discover, /evo:optimize$evo discover, $evo optimizeevo:optimize accepts optional parameters:
| Parameter | Default | Description |
|-----------|---------|-------------|
| subagents | 5 | Number of parallel subagents per round |
| budget | 5 | Max iterations each subagent can run within its branch |
| stall | 5 | Consecutive rounds with no improvement before auto-stopping |
Example: /evo:optimize subagents=3 budget=10 stall=3 (or $evo optimize subagents=3 budget=10 stall=3 on Codex).
Typical flow:
you: evo:discover
evo: explores repo, instruments benchmark, runs baseline
you: evo:optimize
evo: spawns 5 subagents in parallel, each exploring a different direction
each subagent can run up to 5 iterations within its branch
orchestrator collects results, prunes dead branches, adjusts strategy
repeats until interrupted or stalled
Under the hood, each experiment gets its own git worktree branching from its parent. If the score improves and the gate passes, the experiment is committed. Otherwise it's discarded and the worktree is cleaned up.
Orchestrator (main agent)
- reads state, identifies failure patterns cross-cutting the tree
- writes a structured brief per subagent (objective, parent, boundaries, pointer traces)
- collects results, prunes dead branches, adjusts strategy for next round
Subagent 1 (background, budget: 5 iterations)
- reads traces, analyzes failures in its focus area
- formulates hypothesis, edits target, runs benchmark
- if budget remains and sees a follow-up, iterates on its branch
- returns: what it tried, what worked, what it learned
Subagent 2 (background, budget: 5 iterations)
...up to N subagents in parallel
The dashboard starts automatically when you run evo:discover (or evo init). When it comes up, the agent surfaces the URL in the chat:
Dashboard live: http://127.0.0.1:8080 (pid 12345)
If 8080 is busy, evo auto-increments (8081, 8082, ...) and prints the actual port. You can also start it manually:
uv run --project /path/to/evo evo dashboard --port 8080
The chosen port is persisted to .evo/dashboard.port so repeat runs re-use it.
For working on evo itself (not just using it):
git clone https://github.com/evo-hq/evo
uv run --project /path/to/evo evo status
uv run resolves dependencies on first use -- no pip install step.
Licensed under the Apache License 2.0.