by agent-sh
AI writes code. This automates everything else · 19 plugins, 47 agents, and 40 skills · for Claude Code, OpenCode, Codex, Cursor, Kiro.
# Add to your Claude Code skills
git clone https://github.com/agent-sh/agentsys
AI models can write code. That's not the hard part anymore. The hard part is everything around it: task selection, branch management, code review, artifact cleanup, CI, PR comments, deployment. AgentSys is the runtime that orchestrates agents to handle all of it, with structured pipelines, gated phases, specialized agents, and persistent state that survives session boundaries.
Building custom skills, agents, hooks, or MCP tools? agnix is the CLI + LSP linter that catches config errors before they fail silently - real-time IDE validation, auto suggestions, auto-fix, and 385 rules for Claude Code, Codex, OpenCode, Cursor, Kiro, Copilot, Gemini CLI, Cline, Windsurf, Roo Code, Amp, and more.
AgentSys is an agent orchestration system: 19 plugins, 47 agents, and 40 skills that compose into structured pipelines for software development. Each plugin lives in its own standalone repo under the agent-sh org; agentsys is the marketplace and installer that ties them together.
Each agent has a single responsibility, a specific model assignment, and defined inputs/outputs. Pipelines enforce phase gates so agents can't skip steps. State persists across sessions so work survives interruptions.
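The phase-gate and persistence ideas can be illustrated with a minimal Python sketch. It is not AgentSys's implementation: the phase names, the `Pipeline` class, and the JSON state file are all hypothetical, chosen only to show how a gate can refuse a skipped step and how a later session can resume from saved state.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical phase names, for illustration only.
PHASES = ["discover", "implement", "review", "ship"]


class GateError(RuntimeError):
    """Raised when a phase is attempted out of order."""


class Pipeline:
    def __init__(self, state_file: Path):
        self.state_file = state_file
        # Reload completed phases so an interrupted run resumes where it stopped.
        if state_file.exists():
            self.completed = json.loads(state_file.read_text())["completed"]
        else:
            self.completed = []

    def run_phase(self, name: str) -> None:
        done = len(self.completed)
        expected = PHASES[done] if done < len(PHASES) else None
        if name != expected:
            raise GateError(f"cannot run {name!r}; next allowed phase is {expected!r}")
        # A real agent would do its work here; the sketch only records completion.
        self.completed.append(name)
        self.state_file.write_text(json.dumps({"completed": self.completed}))


state_path = Path(tempfile.mkdtemp()) / "pipeline_state.json"

first_session = Pipeline(state_path)
first_session.run_phase("discover")

# A new session picks up the persisted state.
second_session = Pipeline(state_path)
try:
    second_session.run_phase("ship")  # skipping ahead is rejected by the gate
except GateError as err:
    print(err)
second_session.run_phase("implement")
print(second_session.completed)  # ['discover', 'implement']
```

Keeping the gate check and the state write in one method means a phase is never recorded as complete unless it was allowed to run.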
The system runs on Claude Code, OpenCode, Codex CLI, Cursor, and Kiro. Install via the marketplace or the npm installer, and the plugins are fetched automatically from their repos.
Code does code work. AI does AI work.
Certainty levels exist because not all findings are equal:
| Level | Meaning | Action |
|-------|---------|--------|
| HIGH | Definitely a problem | Safe to auto-fix |
| MEDIUM | Probably a problem | Needs context |
| LOW | Might be a problem | Needs human judgment |
This came from testing on 1,000+ repositories.
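As a rough illustration of how certainty levels could drive handling, the sketch below groups findings by the action each level calls for. The `Finding` type, its fields, and the `triage` function are hypothetical; only the three levels and their actions come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Certainty(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


# Actions taken from the certainty table.
ACTIONS = {
    Certainty.HIGH: "auto-fix",
    Certainty.MEDIUM: "needs context",
    Certainty.LOW: "needs human judgment",
}


@dataclass
class Finding:
    message: str          # hypothetical field
    certainty: Certainty


def triage(findings):
    """Group finding messages by the action their certainty level calls for."""
    buckets = {action: [] for action in ACTIONS.values()}
    for finding in findings:
        buckets[ACTIONS[finding.certainty]].append(finding.message)
    return buckets


findings = [
    Finding("unused import", Certainty.HIGH),
    Finding("possibly dead branch", Certainty.MEDIUM),
    Finding("naming looks inconsistent", Certainty.LOW),
]
result = triage(findings)
print(result["auto-fix"])  # ['unused import']
```

Only the HIGH bucket is a candidate for unattended fixes; the other two buckets are queues for more context or for a person.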
Structured prompts and enriched context do more for output quality than model tier. Benchmarked March 2026 on real tasks (/can-i-help and /onboard against glide-mq), measured with claude -p --output-format json. Models: Claude Opus 4 and Claude Sonnet 4.
Same task, same repo, same prompt ("I want to improve docs"):
| Configuration | Cost | Output tokens | Result quality |
|---------------|------|---------------|----------------|
| Opus, no agentsys | $1.10 | 2,841 | Generic recommendations, no project-specific context |
| Opus + agentsys | $1.95 | 5,879 | Specific recommendations with effort estimates, convention awareness, breaking change detection |
| Sonnet + agentsys | $0.66 | 6,084 | Comparable to Opus + agentsys: specific, actionable, project-aware |
Sonnet + agentsys produced more output with higher specificity than raw Opus - at 40% lower cost.
Once the pipeline provides structured prompts, enriched repo-intel data, and phase-gated workflows, the model does less heavy lifting. The gap between Sonnet and Opus narrows:
| Plugin | Opus | Sonnet | Savings |
|--------|------|--------|---------|
| /onboard | $1.10 | $0.30 | 73% |
| /can-i-help | $1.34 | $0.23 | 83% |
Both models reached the same outcome quality - Sonnet just costs less to get there. The structured pipeline captures most of the gains that would otherwise require a more expensive model.
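The savings figures follow from the listed costs as a relative reduction, (baseline - new) / baseline. A short check, assuming the percentages are rounded to the nearest whole number (the helper name `savings_percent` is ours):

```python
def savings_percent(baseline_cost: float, new_cost: float) -> int:
    """Relative cost reduction, rounded to the nearest whole percent."""
    return round((baseline_cost - new_cost) / baseline_cost * 100)


onboard = savings_percent(1.10, 0.30)      # /onboard: Opus vs Sonnet
can_i_help = savings_percent(1.34, 0.23)   # /can-i-help: Opus vs Sonnet
docs_task = savings_percent(1.10, 0.66)    # raw Opus vs Sonnet + agentsys

print(onboard, can_i_help, docs_task)  # 73 83 40
```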
| Scenario | Model cost | Quality |
|----------|-----------|---------|
| Without agentsys | Need Opus for good results | Depends on model capability |
| With agentsys | Sonnet is sufficient | Pipeline handles the structure, model handles judgment |
The investment shifts from model spend to pipeline design. Better prompts, richer context, enforced phases - these compound in ways that model upgrades alone don't.
| Command | What it does |
|---------|--------------|
| /next-task | Task workflow: discovery, implementation, PR, merge |
| /prepare-delivery | Pre-ship quality gates: deslop, review, validation, docs sync |
| /gate-and-ship | Quality gates then ship (/prepare-delivery + /ship) |
| /agnix | Lint agent configurations (385 rules) |
| /ship | PR creation, CI monitoring, merge |
| /deslop | Clean AI slop patterns |
| /perf | Performance investigation with baselines and profiling |
| /drift-detect | Compare plan vs implementation |
| /audit-project | Multi-agent iterative code review |
| /enhance | Plugin, agent, and prompt analyzers |
| /repo-intel | Unified static analysis - git history, AST symbols, project metadata |
| /sync-docs | Sync documentation with code changes |
| /learn | Research topics, create learning guides |
| /consult | Cross-tool AI consultation |
| /debate | Structured debate between AI tools |
| /web-ctl | Browser automation for AI agents |
| /release | Versioned release with ecosystem detection |
| /skillers | Workflow pattern learning and automation |
| /onboard | Codebase orientation for newcomers |
| /can-i-help | Match contributor skills to project needs |
Each command works standalone. Together, they compose into end-to-end pipelines.
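Composition of this kind, such as /gate-and-ship being /prepare-delivery followed by /ship, can be pictured as chaining steps and stopping at the first failure. The sketch below is an assumption about the shape of that composition, not the actual implementation: the functions are hypothetical stand-ins that only record that they ran.

```python
from typing import Callable, List

Step = Callable[[List[str]], bool]


# Hypothetical stand-ins: each returns True when its step succeeds.
def prepare_delivery(log: List[str]) -> bool:
    log.append("/prepare-delivery")
    return True


def ship(log: List[str]) -> bool:
    log.append("/ship")
    return True


def compose(*steps: Step) -> Step:
    """Build a command that runs steps in order and stops at the first failure."""
    def run(log: List[str]) -> bool:
        # all() short-circuits, so later steps never run after a failed one.
        return all(step(log) for step in steps)
    return run


gate_and_ship = compose(prepare_delivery, ship)

log: List[str] = []
ok = gate_and_ship(log)
print(ok, log)  # True ['/prepare-delivery', '/ship']
```

Each step stays usable on its own, and the composed command adds only ordering and early exit.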
40 skills included across the plugins:
| Category | Skills |
|----------|--------|
| Workflow | discover-tasks, prepare-delivery, check-test-coverage, orchestrate-review, validate-delivery |
| Message Queues | glide-mq-migrate-bee, glide-mq-migrate-bullmq, glide-mq |
| Enhancement | enhance-agent-prompts, enhance-claude-memory, enhance-cross-file, enhance-docs, enhance-hooks, enhance-orchestrator, enhance-plugins, enhance-prompts, enhance-skills |
| Performance | baseline, benchmark, code-paths, investigation-logger, perf-analyzer, profile, theory-gatherer, theory-tester |
| Cleanup | deslop, sync-docs |
| Code Review | audit-project |
| AI Collaboration | consult, debate, learn, recommend, skillers-compact |
| Onboarding | can-i-help, onboard |
| Web | web-auth, web-browse |
| Release | release |
| Analysis | drift-analysis, repo-intel |
External skill plugins (standalone repos, installed separately):
| Category | Skills | Plugin |
|----------|--------|--------|
| Message Queues | glide-mq, glide-mq-migrate-bullmq, glide-mq-migrate-bee | agent-sh/glidemq |
Skills are the reusable implementation units. Agents invoke skills; commands orchestrate agents. When you install a plugin, its skills become available to all agents in that session.
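The layering of commands, agents, and skills can be sketched as a shared registry. Everything here is illustrative: the `Session` and `Agent` classes, the `reviewer` agent, and the lambda bodies are hypothetical, and only the skill names `deslop` and `sync-docs` come from the tables above.

```python
class Session:
    def __init__(self):
        self.skills = {}  # skill name -> callable, shared by all agents

    def install_plugin(self, plugin_skills):
        """Register a plugin's skills for every agent in this session."""
        self.skills.update(plugin_skills)


class Agent:
    def __init__(self, name, session):
        self.name = name
        self.session = session

    def invoke(self, skill_name):
        return self.session.skills[skill_name]()


def run_command(assignments):
    """A command orchestrates agents; each agent invokes one skill."""
    return [agent.invoke(skill) for agent, skill in assignments]


session = Session()
reviewer = Agent("reviewer", session)  # created before the plugin is installed

session.install_plugin({
    "deslop": lambda: "deslop ran",
    "sync-docs": lambda: "sync-docs ran",
})

outputs = run_command([(reviewer, "deslop"), (reviewer, "sync-docs")])
print(outputs)  # ['deslop ran', 'sync-docs ran']
```

Because agents look skills up through the session rather than holding their own copies, a plugin installed mid-session is visible to agents that already exist.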
| Section | What's there |
|---------|--------------|
| The Approach | Why it's built this way |
| Benchmarks | Sonnet + agentsys vs raw Opus |
| Commands | All 20 commands overview |
| Skills | 40 skills across plugins |
| Skill-Only Plugins | glide-mq and other non-command plugins |
| Command Details | Deep dive into each command |
| How Commands Work Together | Standalone vs integrated |
| [Design Phil