evo

Name: evo
Author: evo-hq

Verified

Turns your codebase into an autoresearch loop: it discovers what to measure, instruments the benchmark, then runs tree search with parallel subagents.

1,337 stars

99 forks

Python

Installation

# Add to your Claude Code skills
git clone https://github.com/evo-hq/evo


Security Report: Verified

Last scanned: 5/9/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-09T06:17:16.863Z",
  "semgrepRan": false,
  "npmAuditRan": true,
  "pipAuditRan": true
}

README.md

Frequently Asked Questions

What is evo?

evo is an open-source AI Agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by evo-hq. It turns your codebase into an autoresearch loop: it discovers what to measure, instruments the benchmark, then runs tree search with parallel subagents. It has 1,337 GitHub stars.

Is evo safe to use?

Yes. evo passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install evo?

Clone the repository with "git clone https://github.com/evo-hq/evo" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is evo written in?

evo is primarily written in Python. It is open-source under evo-hq on GitHub, so you can review or fork the full source.

Are there alternatives to evo?

Yes. SkillsLLM lists many other AI Agents skills you can browse and compare side by side. Open the AI Agents category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh evo against similar tools.


evo

Get started with autoresearch on any codebase - with two simple commands.

Do you want to do more with autoresearch or need a custom, hands-on deployment? Request access to the evo platform or email hello@evo-hq.com.

Try it · Install · How it works · Dashboard · Upgrading

You give it a codebase. It discovers metrics to optimize, sets up the evaluation, and starts running experiments in a loop -- trying things, keeping what improves the score, throwing away what doesn't.

Inspired by Karpathy's autoresearch -- where an LLM runs training experiments autonomously to beat its own best score. Autoresearch is a pure hill climb: try something, keep or revert, repeat on a single branch. Evo adds structure on top of that idea:

Tree search over greedy hill climb. Multiple directions can fork from any committed node, so exploration doesn't collapse to one path.
Parallel semi-autonomous agents. Spawn multiple subagents and run them simultaneously, each in its own git worktree. Each subagent reads traces, formulates hypotheses, and can run multiple iterations within its branch.
Shared state. Failure traces, annotations, and discarded hypotheses are accessible to every agent before it decides what to try next.
Gating. Regression tests or safety checks can be wired up as a gate. Experiments that don't pass get discarded.
Observability. A dashboard to monitor your experiments.
Benchmark discovery. The discover skill explores the repo, figures out what to measure, and instruments the evaluation.
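
The difference between a single-branch hill climb and a tree of committed experiments can be sketched in a few lines of Python. This is an illustration only: `Node`, `commit`, `hill_climb`, and `all_nodes` are hypothetical names, not evo's internals, and the sketch assumes higher scores are better.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One committed experiment (hypothetical structure, not evo's schema)."""
    name: str
    score: float
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


def commit(parent: Node, name: str, score: float) -> Node:
    """Record a child experiment under `parent` and return it."""
    child = Node(name=name, score=score, parent=parent)
    parent.children.append(child)
    return child


def hill_climb(root: Node, candidate_scores: List[float]) -> Node:
    """Single branch: keep a candidate only if it beats the current best."""
    best = root
    for i, score in enumerate(candidate_scores):
        if score > best.score:
            best = commit(best, f"hc-{i}", score)
        # Otherwise the candidate is reverted and nothing can build on it.
    return best


def all_nodes(root: Node) -> List[Node]:
    """Depth-first list of every committed node in the tree."""
    out = [root]
    for child in root.children:
        out.extend(all_nodes(child))
    return out


# Tree search: two directions fork from the same committed parent.
root = Node("baseline", 1.0)
cache = commit(root, "cache-lookups", 1.3)
vectorize = commit(root, "vectorize", 1.2)           # weaker first step, still kept
batched = commit(vectorize, "vectorize+batch", 1.6)  # follow-up on that branch
best = max(all_nodes(root), key=lambda n: n.score)

# Hill climb on the same first two ideas: 1.2 loses to 1.3 and is reverted.
climb_root = Node("baseline", 1.0)
climb_best = hill_climb(climb_root, [1.3, 1.2])
```

In the tree, the weaker `vectorize` branch stays available, so its stronger follow-up can still be found. In the hill climb, that direction is reverted as soon as it loses to the current best.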

Runs on Claude Code, Codex, Cursor, OpenClaw, Hermes, Opencode, or Pi. Experiments run locally or on remote sandboxes — Modal, E2B, Daytona, AWS, Azure, SSH.

Try it

Two commands:

/evo:discover     # one-time code discovery: figures out benchmarks and creates gates against unintended changes
/evo:optimize     # run the loop

discover asks what to optimize, the benchmark command, and the metric direction. Skip the questions by seeding the answer:

/evo:discover make the JSON parser at src/parser.py faster

Then run the loop:

/evo:optimize

evo sizes each round to your benchmark's resource profile — one experiment at a time when a run needs the whole GPU or another exclusive resource, wider when runs are independent — and keeps going until the score stops improving. By default it runs unattended and pushes edits through parallel subagents; say so in plain language if you'd rather it pause after each round or hold to one experiment at a time.

Invocation syntax is host-specific: /evo: on Claude Code, $evo on Codex, the / skill menu on Cursor, and natural language on Hermes, Opencode, OpenClaw, and Pi.

Install

# 1. evo CLI
uv tool install evo-hq-cli

# 2. Host CLI (if you don't already have it)
npm install -g @anthropic-ai/claude-code     # or @openai/codex, openclaw, @earendil-works/pi-coding-agent
# Cursor: install from cursor.com (IDE), or `curl https://cursor.com/install -fsS | bash` for the cursor-agent CLI

# 3. Plugin + host hooks
evo install <host>     # claude-code | codex | cursor | hermes | opencode | openclaw | pi

For remote backends, install with the matching provider extra: uv tool install 'evo-hq-cli[modal]' (or [e2b], [daytona], [aws], [azure], [all]).

Codex hook trust

evo install codex trusts evo's hooks for you. To review them yourself first, pass --no-trust-hooks, then approve via /hooks inside codex.

How it works

Parallel

The orchestrator dispatches subagents in parallel. Each runs in its own isolated workspace, picks up shared state (failure traces, annotations, discarded hypotheses), forms a hypothesis, edits, and runs the benchmark. A subagent with iteration budget remaining continues on its branch within the same round when its prior edit warrants a follow-up.
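
As a rough mental model of one round, the sketch below dispatches a few workers in parallel, each with its own scratch directory and the same snapshot of shared notes. It is a simplification under stated assumptions: `run_subagent` and `run_round` are hypothetical, a temporary directory stands in for an isolated workspace, and the score is a placeholder rather than a real benchmark result.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List, Tuple


def run_subagent(agent_id: int, shared_notes: Tuple[str, ...]) -> Dict[str, object]:
    """Hypothetical subagent: works in its own directory and reports back."""
    with tempfile.TemporaryDirectory(prefix=f"agent-{agent_id}-") as workspace:
        # Stand-in for "form a hypothesis, edit, run the benchmark".
        (Path(workspace) / "hypothesis.txt").write_text(f"idea {agent_id}")
        score = 1.0 + 0.1 * agent_id  # placeholder, not a measured value
        return {"agent": agent_id, "score": score, "notes_seen": len(shared_notes)}


def run_round(num_agents: int, shared_notes: List[str]) -> List[Dict[str, object]]:
    """Dispatch subagents in parallel; each sees the same snapshot of notes."""
    snapshot = tuple(shared_notes)  # immutable copy, so workers cannot race on it
    with ThreadPoolExecutor(max_workers=num_agents) as pool:
        futures = [pool.submit(run_subagent, i, snapshot) for i in range(num_agents)]
        return [f.result() for f in futures]  # results in submission order


results = run_round(3, ["trace: timeout in parse()", "discarded: regex rewrite"])
winner = max(results, key=lambda r: r["score"])
```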

Frontier strategy

After each round, the orchestrator selects which committed branch to extend next. Available strategies:

argmax — extend the highest-scoring branch
top_k — round-robin among the K best
epsilon_greedy — best most of the time, random sometimes
softmax — sample weighted by score
pareto_per_task — keep specialists the aggregate hides, inspired by GEPA

Configure in the dashboard's Frontier tab, which lists each strategy's parameters.
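
To make the strategy names concrete, here is one plausible reading of each in Python. These are textbook formulations written for illustration. The function signatures, the `temperature` and `epsilon` parameters, and the per-task score layout are assumptions, and evo's actual implementations and parameter names may differ.

```python
import math
import random
from typing import Dict, List

Scores = Dict[str, float]  # branch name -> score; higher is better (assumption)


def argmax(scores: Scores) -> str:
    """Extend the highest-scoring branch."""
    return max(scores, key=lambda name: scores[name])


def top_k(scores: Scores, k: int, round_index: int) -> str:
    """Round-robin among the k best branches."""
    best = sorted(scores, key=lambda name: scores[name], reverse=True)[:k]
    return best[round_index % len(best)]


def epsilon_greedy(scores: Scores, epsilon: float, rng: random.Random) -> str:
    """Pick a random branch with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.choice(sorted(scores))
    return argmax(scores)


def softmax(scores: Scores, temperature: float, rng: random.Random) -> str:
    """Sample a branch with probability proportional to exp(score / T)."""
    names = sorted(scores)
    top = max(scores.values())  # subtract the max for numerical stability
    weights = [math.exp((scores[n] - top) / temperature) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def pareto_per_task(per_task: Dict[str, Dict[str, float]]) -> List[str]:
    """Keep every branch that is the best on at least one task."""
    tasks = {task for task_scores in per_task.values() for task in task_scores}
    keep = set()
    for task in tasks:
        keep.add(max(per_task, key=lambda b: per_task[b].get(task, float("-inf"))))
    return sorted(keep)


scores = {"a": 0.9, "b": 0.7, "c": 0.8}
per_task = {
    "generalist": {"t1": 0.7, "t2": 0.7},   # best average
    "specialist": {"t1": 0.95, "t2": 0.2},  # best on t1 only
}
```

With these inputs, `pareto_per_task(per_task)` keeps both branches, whereas ranking by average score alone would only ever extend `generalist`.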

Cross-cutting scans

Between rounds, RLM-inspired scan subagents read trace batches in parallel and surface compound failure patterns: gate-failure intersections, shared root causes across traces. Findings land in shared state, which the next round's subagents read at startup.
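
One simple form of a "gate-failure intersection" is a pair of gates that keep failing together across experiments. The sketch below counts such pairs. It is only an illustration of the idea: the function name, the input layout (experiment name mapped to the set of gates it failed), and the `min_count` threshold are assumptions, and the scan subagents described above read traces with an LLM rather than with a counter.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def gate_failure_intersections(
    failed_gates: Dict[str, Set[str]], min_count: int = 2
) -> List[Tuple[Tuple[str, str], int]]:
    """Pairs of gates that fail together in at least `min_count` experiments."""
    pairs: Counter = Counter()
    for gates in failed_gates.values():
        # Sort so that (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(gates), 2):
            pairs[pair] += 1
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]


traces = {
    "exp-1": {"unit", "latency"},
    "exp-2": {"unit", "latency", "lint"},
    "exp-3": {"lint"},
}
patterns = gate_failure_intersections(traces)
```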

Gates

evo introduces gates: pass/fail checks that run on every experiment. An experiment that fails a gate is discarded even if its score beats the current best. Without gates, the search will find ways to return a constant, skip work, or trade correctness for speed.

Any command that exits zero on pass and non-zero on fail qualifies as a gate: a test suite, an invariant script, a score floor on a held-out slice of the benchmark. Gates inherit down the experiment tree: a gate registered at the root runs on every descendant. Narrower gates can be attached to specific branches.

When discover builds a benchmark from scratch, it attaches a held-out-slice score-floor gate automatically. When the benchmark already exists in the repo, gates are opt-in.
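
The exit-code contract and the inheritance rule can be sketched as follows. This is a minimal model, not evo's implementation: `Experiment`, `effective_gates`, and `passes_gates` are hypothetical names, and each gate is represented as an argv list run with `subprocess.run`.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Experiment:
    """Hypothetical node in the experiment tree."""
    name: str
    parent: Optional["Experiment"] = None
    gates: List[List[str]] = field(default_factory=list)  # one argv list per gate

    def effective_gates(self) -> List[List[str]]:
        """Gates inherited from the root down, followed by this node's own."""
        inherited = self.parent.effective_gates() if self.parent else []
        return inherited + self.gates


def passes_gates(exp: Experiment) -> bool:
    """True only if every inherited and local gate exits with status 0."""
    for argv in exp.effective_gates():
        if subprocess.run(argv, capture_output=True).returncode != 0:
            return False
    return True


# Stand-in gate commands: one that always passes, one that always fails.
ok_gate = [sys.executable, "-c", "raise SystemExit(0)"]
failing_gate = [sys.executable, "-c", "raise SystemExit(1)"]

root = Experiment("root", gates=[ok_gate])
child = Experiment("child", parent=root)  # inherits the root gate
strict = Experiment("strict", parent=root, gates=[failing_gate])  # adds a narrower gate
```

Here `child` runs only the inherited root gate and passes, while `strict` also runs its own gate and would be discarded regardless of its score.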

Where experiments run

Backend	Where	Install
worktree (default)	local git worktree per experiment	included
pool	reuse a fixed set of local workspaces	included
ssh	your own SSH host	included
modal	Modal serverless cloud	`uv tool install 'evo-hq-cli[modal]'`
e2b	E2B cloud sandboxes	`uv tool install 'evo-hq-cli[e2b]'`
daytona	Daytona cloud workspaces	`uv tool install 'evo-hq-cli[daytona]'`
aws	AWS EC2 sandboxes	`uv tool install 'evo-hq-cli[aws]'`
azure	Azure VMs	`uv tool install 'evo-hq-cli[azure]'`

Pick and configure in the dashboard's Backend tab.

Dashboard

The dashboard starts automatically with /evo:discover (or evo init) and prints the URL in chat:

Dashboard live: http://127.0.0.1:8080 (pid 12345)

If 8080 is in use, evo increments to the next free port (8081, 8082, …) and prints it. Subsequent runs reuse the chosen port. Start it manually with:

uv run --project /path/to/evo/plugins/evo evo dashboard --port 8080
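
The port fallback described above (try 8080, then 8081, and so on) is a common pattern. A minimal sketch of it, assuming a plain bind probe on the loopback interface, looks like this. `next_free_port` and its `limit` parameter are hypothetical, and evo's own probing logic may differ.

```python
import socket


def next_free_port(start: int = 8080, host: str = "127.0.0.1", limit: int = 100) -> int:
    """Return the first port >= start that can be bound on `host`."""
    stop = min(start + limit, 65536)  # never probe past the valid port range
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use, try the next one
            return port
    raise RuntimeError(f"no free port in {start}..{stop - 1}")
```

A probe like this is inherently racy: another process can take the port between the check and the real server start, so a production server would normally bind once and keep that socket.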

Upgrading

evo update                           # update CLI + every installed host
evo update <host>                    # update one host (also bumps CLI to match)
evo update <host> --version 0.4.1    # pin to a release

Every evo install / evo update keeps the CLI on PATH in lockstep with the host plugin version it just installed (uv tool install --force evo-hq-cli under the hood). Without a --version pin, that resolves to the latest stable release, so running an unpinned evo install or evo update against a pre-release pulls the CLI back to stable. Pin both sides for an alpha (see Testing a pre-release). The CLI binary, the skill files, and the hook protocol share wire formats, and letting them drift caused silent failures in earlier versions. Editable installs (uv tool install --editable, pip install -e) are detected and left untouched.

See evo update --help for --force, --scope, and additional flags.

Migrating from any pre-0.4.4 version

uv tool install --force evo-hq-cli && evo update --force

--force wipes the host plugin cache and reinstalls, working around anthropics/claude-code#14061: /plugin update returns success but does not replace cached plugin files.

Hooks failing with exit 127

The host lost evo's hook binary. Fixed in 0.5.1; reinstall the host to repair:

uv tool install --force evo-hq-cli && evo install codex --force   # or: evo install claude-code --force

evo doctor <host> confirms the result.

Testing a pre-release (alpha)

uv and pip skip pre-releases by default. To install an alpha, pin both the CLI version and the host plugin tag:

uv tool install --force 'evo-hq-cli==0.4.1a2' && \
  evo update --version 0.4.1-alpha.2 --force

Substitute the target alpha version. The CLI uses PEP 440 form (0.4.1a2); the marketplace tag uses the dash form (v0.4.1-alpha.2).
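
If you script this, the two spellings can be derived from one another. The helper below is a hypothetical convenience, not part of evo. It handles only the alpha form shown above and returns any other version string unchanged, because how evo spells beta or release-candidate tags is not stated here.

```python
import re

# Matches PEP 440 alpha versions such as "0.4.1a2".
_ALPHA = re.compile(r"^(\d+(?:\.\d+)*)a(\d+)$")


def pep440_alpha_to_dash(version: str) -> str:
    """Convert '0.4.1a2' to '0.4.1-alpha.2'; leave other versions untouched."""
    match = _ALPHA.match(version)
    if match is None:
        return version
    release, number = match.groups()
    return f"{release}-alpha.{number}"


cli_version = "0.4.1a2"
plugin_version = pep440_alpha_to_dash(cli_version)
```

With these values, `cli_version` goes to `uv tool install` and `plugin_version` goes to `evo update --version`.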

Telemetry

evo sends anonymous telemetry and usage stats that help us improve evo. Disable it globally anytime:

evo telemetry off

Or for one command/session:

EVO_TELEMETRY=0 evo ...

Dev install

For development on evo:

git clone https://github.com/evo-hq/evo
cd evo
uv tool install --editable plugins/evo

License

Apache-2.0. See LICENSE.

Citation

If you use evo in your work, please cite it (see CITATION.cff):