by RyanAlberts
🏆 Ranked list of 100+ agent harnesses. Scored and updated weekly.
# Add to your Claude Code skills
git clone https://github.com/RyanAlberts/best-of-Agent-Harnesses
Last scanned: 6/14/2026
{
"issues": [],
"status": "PASSED",
"scannedAt": "2026-06-14T08:16:38.877Z",
"npmAuditRan": true,
"pipAuditRan": true,
"promptInjectionRan": true
}
A model answers; an agent acts. An agent harness is the runtime that turns one into the other — the model thinks; the harness decides what that thinking is allowed to touch.
Every prior wave of automation was constrained by brittleness: you scripted exact behavior, and when the world deviated, the system broke. Foundation models inverted that problem—they're flexible but directionless, stateless, and disconnected from anything real. The agent harness exists to bridge that gap: it is the orchestration infrastructure that converts a model's per-turn reasoning into sustained, tool-using, error-recovering, goal-directed behavior across time. Architecturally, it plays the role the kernel played in operating systems or the controller played in industrial robotics—mediating between raw capability and a messy environment—but with a critical difference: the "capability" it governs is general-purpose cognition, which means the harness is simultaneously a scheduler, a permission system, a memory manager, and a policy enforcement layer, all under-specified and evolving in real time.
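To make the four roles above concrete, here is a minimal sketch of a harness loop in Python. Everything in it is hypothetical and invented for illustration: the `Harness` class, the `(kind, name, arg)` action tuple, and the scripted stand-in model are assumptions, not the API of any project in this list. It only shows where a step budget (scheduler), an allowlist (permission system), a history list (memory), and a refusal path (policy enforcement) could sit around a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical protocol: a "model" is any callable that maps the history so far
# to ("call", tool_name, argument) or ("done", final_answer, "").
Action = Tuple[str, str, str]
Model = Callable[[List[str]], Action]


@dataclass
class Harness:
    model: Model
    tools: Dict[str, Callable[[str], str]]
    allowed: Set[str]                                  # permission system
    max_steps: int = 5                                 # scheduler: hard turn budget
    history: List[str] = field(default_factory=list)   # memory

    def run(self, goal: str) -> str:
        self.history.append(f"goal: {goal}")
        for _ in range(self.max_steps):
            kind, name, arg = self.model(self.history)
            if kind == "done":
                return name
            if name not in self.allowed or name not in self.tools:
                # Policy enforcement: refuse, record why, let the model retry.
                self.history.append(f"denied: {name}")
                continue
            result = self.tools[name](arg)
            self.history.append(f"{name}({arg}) -> {result}")
        return "stopped: step budget exhausted"


def scripted_model(history: List[str]) -> Action:
    """Stand-in for a real model: tries a forbidden tool, then an allowed one."""
    if not any(h.startswith("denied") for h in history):
        return ("call", "delete_files", "/")
    if not any(h.startswith("add(") for h in history):
        return ("call", "add", "2,3")
    return ("done", history[-1].split("-> ")[1], "")


def add(arg: str) -> str:
    a, b = arg.split(",")
    return str(int(a) + int(b))


harness = Harness(
    model=scripted_model,
    tools={"add": add, "delete_files": lambda path: "deleted"},
    allowed={"add"},  # delete_files is registered but not permitted
)
answer = harness.run("add two numbers")
```

In this sketch the model proposes every action, but the harness alone decides whether a tool actually runs; the denied call leaves a trace in `history` so the next turn can take it into account.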
Better models make harnesses more important: more capabilities mean more failure modes, and production needs retry logic, fallbacks, and validation. Harness quality—not just model quality—determines whether agents actually ship. This list ranks projects by relevance to harness concerns (environment, orchestration, lifecycle, guardrails) and by stars/activity.
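The retry, fallback, and validation concerns named above can be sketched in a few lines of Python. The names here (`validate`, `call_with_recovery`, `primary`, `backup`) and the required `answer` key are assumptions made up for this example; real harnesses differ in how they define a valid output and in which errors they treat as retryable.

```python
import json
from typing import Callable, List


def validate(raw: str) -> dict:
    """Validation: accept only JSON objects that carry an 'answer' key."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict) or "answer" not in data:
        raise ValueError("missing 'answer' key")
    return data


def call_with_recovery(
    models: List[Callable[[str], str]], prompt: str, retries: int = 2
) -> dict:
    """Try each model in order (fallback), each up to `retries` times (retry)."""
    errors = []
    for model in models:
        for attempt in range(1, retries + 1):
            try:
                return validate(model(prompt))
            except ValueError as exc:
                # A real harness would likely also catch transport errors here.
                errors.append(f"{model.__name__} attempt {attempt}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Hypothetical stand-ins for model endpoints.
def primary(prompt: str) -> str:
    return "not json"  # always malformed, so every retry fails validation


def backup(prompt: str) -> str:
    return json.dumps({"answer": prompt.upper()})


result = call_with_recovery([primary, backup], "ping")
```

Here `primary` fails validation on both attempts, so the call falls through to `backup`. Validation runs inside the retry loop on purpose: a malformed output is treated the same way as a failed call.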
Every project in the list, plotted by adoption surface area (the simplicity ↔ capability axis) against GitHub stars. Colors are categories; the largest projects in each tier are labeled.
The same projects placed by how much unsupervised rope they're designed to give (autonomy) and what happens when a run dies (recovery). In the tables below, ★ marks headless-ready projects and ✱ marks durable ones. Both charts regenerate from the list data on every refresh.
Start with the guide, then the head-to-head decision pages — grounded in the same data as the tables below:
Reader's index: pick by what you want to do, not by category. Tag chips (e.g. mcp · memory) next to each row let you cross-filter by capability — see TAGS.md for the full cross-reference.
This list is also published in machine-readable form, so coding agents and research agents can recommend harnesses — not just humans browsing GitHub: