by kayba-ai
Make your agents recursively self-improve
```shell
# Add to your Claude Code skills
git clone https://github.com/kayba-ai/recursive-improve
```

90% of Claude's code is now written by Claude. Recursive self-improvement is already happening at Anthropic. What if you could do the same for your own agents?
You have an agent. It works, most of the time. But it could be better. Solving harder problems, handling more edge cases, wasting fewer tokens. What if it could improve itself, recursively, every time it runs?
Right now, it can't. Your agent is stateless. Every run starts from scratch. The only way to improve it is to manually improve it. There is no compounding of improvements.
recursive-improve closes this loop:
Your agent runs. Every LLM call is captured. Your coding agent analyzes the traces, identifies common failure patterns across runs, and applies targeted fixes. You run it again. It's better.
```shell
uv tool install "recursive-improve[all] @ git+https://github.com/kayba-ai/recursive-improve.git"
```
Then in your agent's project directory:
```shell
cd /path/to/your/agent
recursive-improve init
```
This creates the /recursive-improve skill files and the eval/traces/ directory.
Add the tracing dependency to your project:
```shell
uv add "recursive-improve @ git+https://github.com/kayba-ai/recursive-improve.git"
```
Two lines. Your agent code stays unchanged; we just observe.
```python
import recursive_improve as ri

ri.patch()  # auto-captures openai, anthropic, litellm calls

with ri.session("./eval/traces") as run:
    result = my_agent("book a flight to Paris")
    run.finish(output=result, success=True)
```
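Under the hood, `ri.patch()` relies on monkey-patching: wrapping a client's call method so every request/response pair is recorded. This is a minimal self-contained sketch of the technique, not the library's actual implementation (`FakeLLMClient` and the `patch` helper below are illustrative assumptions):

```python
import functools

class FakeLLMClient:
    """Stand-in for an SDK client; any callable method can be wrapped this way."""
    def complete(self, prompt):
        return f"echo: {prompt}"

captured = []  # sink for recorded calls

def patch(client, method_name, sink):
    """Replace a client method with a wrapper that records each call."""
    original = getattr(client, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        sink.append({"method": method_name, "args": args,
                     "kwargs": kwargs, "result": result})
        return result

    setattr(client, method_name, wrapper)

client = FakeLLMClient()
patch(client, "complete", captured)
client.complete("book a flight to Paris")
print(captured[0]["result"])  # echo: book a flight to Paris
```

The agent keeps calling the client exactly as before; only the attribute lookup changed, which is why no agent code needs to be edited.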
Already have traces? Drop them in eval/traces/ and skip to step 4.
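Each trace is just a JSON file under eval/traces/. If you generate traces by other means, write them the same way; note that the field names below are illustrative assumptions, not the library's actual trace schema:

```python
import json
import pathlib
import tempfile

# Hypothetical trace shape -- field names are illustrative assumptions,
# not the library's actual schema.
trace = {
    "run_id": "demo-001",
    "calls": [{"provider": "openai",
               "prompt": "book a flight to Paris",
               "response": "Booked.",
               "tokens": 42}],
    "output": "Booked.",
    "success": True,
}

# Using a temp dir here so the sketch is self-contained; in your project
# this would be ./eval/traces/.
traces_dir = pathlib.Path(tempfile.mkdtemp()) / "eval" / "traces"
traces_dir.mkdir(parents=True)
(traces_dir / "demo-001.json").write_text(json.dumps(trace, indent=2))
print(sorted(p.name for p in traces_dir.iterdir()))  # ['demo-001.json']
```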
Open Claude Code or Codex in your project directory:
```
/recursive-improve
```
Clear old traces and run your agent again so the benchmark measures your improved code:
```shell
rm -f eval/traces/*.json
# run your agent the same way as step 3
```
Measure whether your changes actually solved the problems:
```
/benchmark
```
Results are stored in eval/benchmark_results.json and automatically compared against the previous run, using the same dynamic metrics that were generated for your agent.
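The auto-comparison boils down to a per-metric delta between two snapshots. A minimal sketch, assuming hypothetical metric names (the real metrics are generated dynamically for your agent):

```python
# Hypothetical metric snapshots -- names and values are illustrative only.
baseline = {"success_rate": 0.72, "avg_tokens": 5400}
current  = {"success_rate": 0.81, "avg_tokens": 4900}

def compare(before, after):
    """Delta for every metric present in both snapshots."""
    return {k: round(after[k] - before[k], 4) for k in before if k in after}

print(compare(baseline, current))  # {'success_rate': 0.09, 'avg_tokens': -500}
```

A positive delta on a "higher is better" metric (and a negative one on a cost metric like token usage) is the signal that a cycle actually helped.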
CLI alternative:
```shell
recursive-improve benchmark --label "v1-baseline"
recursive-improve benchmark list
```
Start the interactive dashboard to visualize your improvement cycles:
```shell
recursive-improve dashboard          # default: http://localhost:8420
recursive-improve dashboard -p 8080  # custom port
```
Each improvement cycle lives on its own branch. The dashboard shows before/after metrics for every cycle. See exactly what improved, merge the wins, discard the rest.

```
/ratchet
```
An autoresearch-style autonomous loop. It asks you what to optimize, then repeats: improve → run agent → eval → keep or revert. Only improvements survive. Check eval/ratchet_summary.md when you wake up.
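The keep-or-revert logic of the ratchet can be sketched with toy stand-ins for the improve and eval steps (`improve`, `evaluate`, and the scoring objective below are illustrative assumptions, not the skill's implementation):

```python
import random

random.seed(7)  # deterministic toy run

def improve(version):
    """Stand-in for /recursive-improve proposing a code change (hypothetical)."""
    return version + random.choice([-1, 1])

def evaluate(version):
    """Stand-in for running the agent and scoring its traces; higher is better."""
    return -abs(version - 10)  # toy objective: best at version == 10

version = 0
best_score = evaluate(version)
for _ in range(40):
    candidate = improve(version)
    score = evaluate(candidate)
    if score > best_score:  # keep only strict improvements
        version, best_score = candidate, score
    # otherwise: revert, i.e. discard the candidate

print(version, best_score)
```

Because a candidate is kept only when its score strictly improves, the score can never regress across cycles; that one-way property is what makes it a ratchet.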
> [!TIP]
> Want deeper analysis? Kayba offers managed recursive agent improvement at scale, tailored to your agent.
When you run the /recursive-improve skill, it walks through a structured pipeline:
Every fix traces back to a specific insight, linked to a specific metric.
```
your agent ──> ri.patch() + ri.session() ──> eval/traces/*.json
                                                     │
                                                     ▼
                                             /recursive-improve
                                                     │
                                                     ▼
                                      improved agent code ──> repeat
                                                     │
                                                     ▼
                                  benchmark ──> recursive-improve dashboard

┌────────────────────────────────┐
│  /ratchet (autonomous loop)    │
│  improve → run → eval →        │
│  keep or revert → repeat       │
└────────────────────────────────┘
```
- ri.patch(): monkey-patches OpenAI, Anthropic, and LiteLLM clients to capture every call
- ri.session(): context manager that writes structured trace JSON files
- /recursive-improve: Claude Code / Codex skill that analyzes traces and applies fixes
- recursive-improve benchmark: snapshot metric quality, store, and compare over time
- recursive-improve dashboard: web UI to visualize runs and compare branches
- /ratchet: autonomous keep-or-revert loop that runs /recursive-improve repeatedly overnight

Star this repo if you find it useful!
Built with ❤️ by Kayba and the open-source community.