by greyhaven-ai
A recursive, self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task.
# Add to your Claude Code skills
git clone https://github.com/greyhaven-ai/autocontext
Last scanned: 5/3/2026
{
"issues": [],
"status": "PASSED",
"scannedAt": "2026-05-03T06:26:18.500Z",
"semgrepRan": false,
"npmAuditRan": true,
"pipAuditRan": true
}
autocontext is a harness. You point it at a goal in plain language. It iterates against real evaluation, keeps what worked, throws out what didn't, and produces a structured trace of the work plus the artifacts, playbooks, datasets, and (optionally) a distilled local model that the next agent inherits. Repeated runs get better, not just different.
Documentation: autocontext.ai/docs · quickstart · CLI reference · changelog
The fastest path uses our Pi runtime, a local coding agent that handles its own auth. No API key plumbing, no provider config: install Pi, install autocontext, point one at the other.
uv tool install autocontext==0.8.0
AUTOCONTEXT_AGENT_PROVIDER=pi \
AUTOCONTEXT_PI_COMMAND=pi \
autoctx solve \
"improve customer-support replies for billing disputes" \
--iterations 3
Pi runs locally as a subprocess and emits live traces back into the harness. For a hosted Pi, set AUTOCONTEXT_AGENT_PROVIDER=pi-rpc and AUTOCONTEXT_PI_RPC_ENDPOINT instead.
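If you would rather launch the same run from a script, the shell example above can be sketched in Python. This is a minimal sketch, not part of autocontext: `build_solve_invocation` and `run_solve` are hypothetical helper names, and the only things taken from the README are the `autoctx solve` command, the `--iterations` flag, and the two environment variables.

```python
import os
import subprocess


def build_solve_invocation(goal: str, iterations: int = 3) -> tuple[list[str], dict[str, str]]:
    """Return the argv and environment for the `autoctx solve` call shown above."""
    env = dict(os.environ)
    # The same two variables the shell example sets for a local Pi subprocess.
    env["AUTOCONTEXT_AGENT_PROVIDER"] = "pi"
    env["AUTOCONTEXT_PI_COMMAND"] = "pi"
    argv = ["autoctx", "solve", goal, "--iterations", str(iterations)]
    return argv, env


def run_solve(goal: str, iterations: int = 3) -> None:
    """Run autoctx with the Pi provider; requires `autoctx` and `pi` on PATH."""
    argv, env = build_solve_invocation(goal, iterations)
    # check=True raises CalledProcessError if autoctx exits non-zero.
    subprocess.run(argv, env=env, check=True)
```

Calling `run_solve("improve customer-support replies for billing disputes")` should behave like the shell command, assuming both tools are installed. Keeping the argument construction in its own function makes it easy to log or test the invocation without starting a run.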
Prefer TypeScript? Same surface, same command:
bun add -g autoctx@0.8.0
AUTOCONTEXT_AGENT_PROVIDER=pi bunx autoctx solve \
"improve customer-support replies for billing disputes" \
--iterations 5 --json
Already on Anthropic, OpenAI, Gemini, Mistral, Groq, OpenRouter, Azure, Claude CLI, Codex CLI, or MLX? Set AUTOCONTEXT_AGENT_PROVIDER and the matching credential env var:
AUTOCONTEXT_AGENT_PROVIDER=anthropic \
ANTHROPIC_API_KEY=sk-ant-... \
autoctx solve "..." --iterations 3
See .env.example for every provider's variables, or use the live provider guide at autocontext.ai/docs/providers. Prefer to clone and run a starter? examples/README.md has copy-paste recipes for Python CLI, Claude Code MCP, Python SDK, TypeScript library usage, and the experimental TypeScript agent handler surface.
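A small preflight check can catch a missing variable before a run starts. The sketch below is an assumption-laden illustration, not an autocontext API: `NEEDED_VARS` and `missing_provider_vars` are hypothetical names, and the table only covers the pairings this README states explicitly. For every other provider, take the variable names from .env.example rather than guessing.

```python
import os

# Only the pairings stated in this README. The TypeScript example runs the
# local Pi provider without AUTOCONTEXT_PI_COMMAND, so "pi" needs nothing extra.
NEEDED_VARS = {
    "pi": [],
    "pi-rpc": ["AUTOCONTEXT_PI_RPC_ENDPOINT"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def missing_provider_vars(env: dict[str, str]) -> list[str]:
    """Return the names of variables that are unset or empty for the chosen provider."""
    provider = env.get("AUTOCONTEXT_AGENT_PROVIDER", "")
    if not provider:
        return ["AUTOCONTEXT_AGENT_PROVIDER"]
    # Providers outside the table yield no findings: this sketch cannot vouch for them.
    return [name for name in NEEDED_VARS.get(provider, []) if not env.get(name)]
```

Run it as `missing_provider_vars(dict(os.environ))` and stop early if the list is non-empty.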
If you already work inside a coding agent, you can wire autocontext in once and give the agent a natural-language entry point. Hermes and other terminal-capable agents should start with the CLI-backed skill; MCP remains available for clients that want a tool-catalog protocol.
Pi ships an autocontext skill out of the box. Install the published Pi package and Pi loads natural-language wrappers over live tools such as autocontext_solve_scenario, autocontext_evaluate_output, autocontext_run_improvement_loop, autocontext_run_status, and autocontext_list_scenarios.
pi install npm:pi-autocontext@0.8.0
Pi is on a separate package line: pi-autocontext@0.8.0 depends on autoctx@^0.8.0, matching the current Python and TypeScript 0.8 runtime line.
Then you just ask:
"Solve: improve customer-support replies for billing disputes."
"Judge this output against this rubric and improve it until it scores 0.85."
Claude Code (and any other MCP client) gets the same surface by adding one entry to .claude/settings.json:
{
"mcpServers": {
"autocontext": {
"command": "uv",
"args": ["run", "--directory", "/path/to/autocontext", "autoctx", "mcp-serve"],
"env": { "AUTOCONTEXT_AGENT_PROVIDER": "pi", "AUTOCONTEXT_PI_COMMAND": "pi" }
}
}
}
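If the settings file already holds other entries, you may prefer to merge the server in rather than paste over the file. Here is one way to do that, as a sketch: `autocontext_mcp_entry` and `add_mcp_server` are hypothetical helpers, and the entry itself is copied from the JSON above with the checkout path as a parameter.

```python
import json
from pathlib import Path


def autocontext_mcp_entry(repo_dir: str) -> dict:
    """Build the server entry from the JSON above for a given checkout path."""
    return {
        "command": "uv",
        "args": ["run", "--directory", repo_dir, "autoctx", "mcp-serve"],
        "env": {"AUTOCONTEXT_AGENT_PROVIDER": "pi", "AUTOCONTEXT_PI_COMMAND": "pi"},
    }


def add_mcp_server(settings_path: Path, repo_dir: str) -> dict:
    """Merge the entry into settings_path, keeping any other keys and servers."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    settings.setdefault("mcpServers", {})["autocontext"] = autocontext_mcp_entry(repo_dir)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

For example, `add_mcp_server(Path(".claude/settings.json"), "/path/to/autocontext")` adds or replaces only the `autocontext` server and leaves everything else in the file untouched.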
After that, Python MCP exposes prefixed tools such as autocontext_solve_scenario, autocontext_evaluate_output, autocontext_run_improvement_loop, autocontext_run_status, autocontext_list_scenarios, autocontext_export_skill, and autocontext_search_strategies. It also exposes runtime-session readers as autocontext_list_runtime_sessions, autocontext_get_runtime_session, and autocontext_get_runtime_session_timeline, with unprefixed aliases for parity with TypeScript MCP; Python runtime-backed run and solve role calls populate those logs automatically. The TypeScript package exposes the same capabilities with its documented tool names via bunx autoctx mcp-serve.
Hermes Agent can load a CLI-first skill and inspect Hermes Curator state without MCP:
cd autocontext
uv run autoctx hermes export-skill --output ~/.hermes/skills/autocontext/SKILL.md --json
# Add progressive-disclosure reference files alongside SKILL.md
uv run autoctx hermes export-skill \
--output ~/.hermes/skills/autocontext/SKILL.md \
--with-references --json
uv run autoctx hermes inspect --json
Full integration guide: autocontext.ai/docs/agents and autocontext/docs/agent-integration.md.
Every run leaves a structured record on disk. Replay it, diff it, export it, feed it back into training.
runs/<run_id>/
├── trace.jsonl # every prompt, tool call, and outcome, in order
├── generations/
│ ├── gen_1/
│ │ ├── strategy.json # what the competitor proposed
│ │ ├── analysis.md # what the analyst observed
│ │ └── score.json # how it was evaluated
│ └── gen_2/ ...
├── report.md # human-readable summary of the whole run
└── artifacts/ # files, configs, packages the run produced
knowledge/<scenario>/
├── playbook.md # accumulated lessons that carried forward
├── hints.md # competitor hints that survived the curator
└── tools/ # any helper tools the architect generated
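Because the record is plain files, you can walk it with a few lines of Python. The sketch below assumes only the directory names shown in the layout above (`generations/` containing `gen_N/` folders); `generation_dirs` is a hypothetical helper, and it deliberately does not read `score.json` or `strategy.json`, since their fields are not described here.

```python
import re
from pathlib import Path


def generation_dirs(run_dir: Path) -> list[Path]:
    """Return the gen_N directories under run_dir/generations, ordered by N."""
    found = []
    for child in (run_dir / "generations").iterdir():
        match = re.fullmatch(r"gen_(\d+)", child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    # Sort numerically so gen_10 follows gen_9 rather than gen_1.
    return [path for _, path in sorted(found, key=lambda item: item[0])]
```

From there, `generation_dirs(Path("runs") / run_id)` gives you an ordered list to diff or export, one generation at a time.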
A playbook.md is plain markdown the next run reads as context:
<!-- PLAYBOOK_START -->
## Billing dispute replies
- Always restate the disputed charge in the first sentence; refunds requested without
explicit confirmation cause loops.
- "Pending" charges are not yet billable. Don't promise a refund until status flips
to `posted`. Verified gen_4, regressed in gen_7 when omitted.
- Empathy + specific next step beats empathy alone. Escalation rate dropped from
0.31 to 0.12 once the second sentence named the next-step owner.
<!-- PLAYBOOK_END -->
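The two HTML comment markers make the playbook body easy to pull out of a larger file. A minimal sketch, assuming the markers appear exactly as written above and at most once per file; `extract_playbook` is a hypothetical helper, not an autocontext function.

```python
import re
from typing import Optional

# DOTALL lets the lazy group span multiple lines between the two markers.
PLAYBOOK_BLOCK = re.compile(
    r"<!-- PLAYBOOK_START -->\s*(.*?)\s*<!-- PLAYBOOK_END -->", re.DOTALL
)


def extract_playbook(markdown: str) -> Optional[str]:
    """Return the text between the playbook markers, or None if they are absent."""
    match = PLAYBOOK_BLOCK.search(markdown)
    return match.group(1) if match else None
```

The result is the markdown between the markers with surrounding whitespace trimmed, which you can diff across runs to see which lessons were added or dropped.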
A trace.jsonl line is one event (expanded here across several lines for readability):
{
"ts": "2026-04-28T17:42:11Z",
"gen": 4,
"role": "competitor",
"event": "strategy_proposed",
"score": 0.78,
"tokens_in": 1840,
"tokens_out": 612,
"strategy_id": "s_4f2a"
}
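Since each line is a standalone JSON object, a trace can be summarized without any autocontext code. The sketch below uses only the field names visible in the example event (`gen`, `score`, `tokens_in`, `tokens_out`) and treats all of them as optional, because other event types may not carry them; `summarize_trace` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Iterable


def summarize_trace(lines: Iterable[str]) -> dict:
    """Total token counts and track the best score seen in each generation."""
    summary = defaultdict(lambda: {"tokens_in": 0, "tokens_out": 0, "best_score": None})
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        if "gen" not in event:
            continue  # events without a generation are skipped in this sketch
        gen = summary[event["gen"]]
        gen["tokens_in"] += event.get("tokens_in", 0)
        gen["tokens_out"] += event.get("tokens_out", 0)
        score = event.get("score")
        if score is not None and (gen["best_score"] is None or score > gen["best_score"]):
            gen["best_score"] = score
    return dict(summary)
```

Pass it an open file, for example `summarize_trace(open("runs/<run_id>/trace.jsonl"))` with a real run ID substituted, to get one entry per generation.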
Inspect, replay, or compare any of it:
uv run autoctx list
uv run autoctx status <run_id>
uv run autoctx replay <run_id> --generation 2
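Inside each run, five roles cooperate. The run layout above shows four of them at work: the competitor proposes strategies, the analyst records what it observed, the curator filters which hints survive, and the architect generates helper tools.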
Strategies are evaluated through scenario execution, staged validation, and gating. Weak changes are rolled back. Successful changes accumulate as reusable knowledge that future runs (and future agents) inherit automatically.
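The keep-or-roll-back idea can be illustrated with a generic gated loop. This is a conceptual sketch under stated assumptions, not autocontext's implementation: `gated_improve`, `propose`, `evaluate`, and `min_gain` are all hypothetical, and the real harness adds scenario execution and staged validation that are not modeled here.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def gated_improve(
    initial: S,
    propose: Callable[[S], S],
    evaluate: Callable[[S], float],
    iterations: int,
    min_gain: float = 0.0,
) -> tuple[S, float, list[bool]]:
    """Keep a candidate only if it beats the current best by more than min_gain."""
    best, best_score = initial, evaluate(initial)
    accepted: list[bool] = []
    for _ in range(iterations):
        candidate = propose(best)
        score = evaluate(candidate)
        keep = score > best_score + min_gain
        if keep:
            best, best_score = candidate, score
        # A rejected candidate is simply dropped; the next proposal starts from `best`.
        accepted.append(keep)
    return best, best_score, accepted
```

Each proposal builds on the best state so far rather than on the latest attempt, so a weak change cannot drag later iterations down.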
The full vocabulary (Scenario, Task, Mission, Campaign, Run, Verifier, Knowledge, Artifact, Budget, Policy) lives in the concept docs and docs/concept-model.md.
autocontext can sit alongside your live application and record what your agents do, then turn that into training data. Wrap your existing Anthropic or OpenAI client once:
from anthropic import Anthropic
from autocontext.production_traces import instrument_client
client = instrument_client(Anthropic(), app="billing-bot", env="prod")
# use `client` exactly like before; calls are captured to JSONL with content blocks,
# cache-aware usage, and Anthropic-native outcome taxonomy.
The TypeScript client is wrapped the same way:
import Anthropic from "@anthropic-ai/sdk";
import { instrumentClient } from "autoctx/production-traces";
const client = instrumentClient(new Anthropic(), { app: "billing-bot", env: "prod" });
Then build scoped datasets from the captured traces:
uv run autoctx build-dataset \
--app billing-bot --provider anthropic \
--env prod --outcome success \
--output training/billing.jsonl
And distill them into a smaller local model with MLX (Apple Silicon) or CUDA (Linux GPUs):
uv run autoctx train --scenario support_triage --data training/billing.jsonl --time-budget 300
autocontext is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by greyhaven-ai. It is a recursive, self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task. It has 1,213 GitHub stars.
autocontext passed SkillsLLM's automated security scan (a dependency vulnerability audit plus prompt-injection heuristics) with no high-severity issues. You can read the full report in the Security Report section on this page.
Clone the repository with "git clone https://github.com/greyhaven-ai/autocontext" and add it to your Claude Code skills directory (see the Installation section above).
autocontext is primarily written in Python. It is open-source under greyhaven-ai on GitHub, so you can review or fork the full source.