agentops

Name: agentops
Author: boshu2

Verified

Independent verification for coding agents. A change isn't done until a different model or a real test checks it, and the verdict is recorded in your repo.

408 stars

40 forks

Installation

# Add to your Claude Code skills
git clone https://github.com/boshu2/agentops

Security Report (Verified)

Last scanned: 5/26/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-26T07:46:23.789Z",
  "semgrepRan": false,
  "npmAuditRan": true,
  "pipAuditRan": true
}

Frequently Asked Questions

What is agentops?

agentops is an open-source AI-agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by boshu2. It provides independent verification for coding agents: a change isn't done until a different model or a real test checks it, and the verdict is recorded in your repo. It has 408 GitHub stars.

Is agentops safe to use?

Yes, as far as the automated checks go. agentops passed SkillsLLM's automated security scan (a dependency vulnerability audit plus prompt-injection heuristics) with no high-severity issues. The report shows that npm audit and pip audit ran but Semgrep did not. You can read the full report in the Security Report section on this page.

How do I install agentops?

Clone the repository with "git clone https://github.com/boshu2/agentops" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is agentops written in?

agentops is primarily written in Go. It is open-source under boshu2 on GitHub, so you can review or fork the full source.

Are there alternatives to agentops?

Yes. SkillsLLM lists many other AI Agents skills you can browse and compare side by side. Open the AI Agents category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh agentops against similar tools.

Related Skills

superpowers

by obra

An agentic skills framework & software development methodology that works.

234,966

opencode-with-claude

flyto-core

AgentOps

Autonomous code validation for coding agents

Coding agents declare "done" on code that is still wrong. AgentOps catches that. Before a change counts as done, something that didn't write it has to check it: a different model, or a test that actually runs. No verdict = not done. It sits on top of the agent you already use (Claude Code, Codex, Cursor, OpenCode).

Install

Pick your runtime and install:

# Claude Code
claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace

# Codex CLI (macOS/Linux/WSL) — OpenCode: install-opencode.sh
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.sh | bash
# Codex CLI (Windows):
irm https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.ps1 | iex

# Gemini / Antigravity
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-agy.sh | bash

# Other skills-compatible agents (Cursor, etc.)
npx skills@latest add boshu2/agentops --cursor -g

The ao CLI is optional but recommended (bookkeeping, retrieval, the release gate):

brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops   # macOS
# Windows: irm https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-ao.ps1 | iex
# Or release binaries / build from source (cli/README.md).

Live skills from a clone (optional). Already have the repo checked out? `ao skills link` symlinks its skills into the live tier of every agent runtime you have installed: ~/.claude/skills, ~/.codex/skills, ~/.gemini/skills (AGY), ~/.cursor/skills, and ~/.pi/skills. The copy-based installers above snapshot the skills at install time; with symlinks, your local edits and every git pull take effect in the next session with no re-copy:

git clone https://github.com/boshu2/agentops && cd agentops
ao skills link              # symlink repo skills into every installed runtime (idempotent, non-destructive)
git pull && ao skills link  # after a pull: mint links for any newly-added skills

This is opt-in: it is the live, edit-in-place tier for people working from a clone, and the installers above remain the copy-based path for everyone else. It never copies or clobbers: existing non-AgentOps skills (e.g. from other marketplaces) are reported as conflicts and left untouched. Pass --dest <dir> to target one specific directory instead.
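
The linking behaviour described here (idempotent, never clobbering, conflicts reported) can be illustrated with a small Python sketch. This is not the `ao skills link` implementation: the function name `link_skills`, the one-subdirectory-per-skill layout, and the shape of the returned report are all assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def link_skills(repo_skills: Path, dest: Path) -> dict:
    """Symlink each skill directory in repo_skills into dest.

    Hypothetical helper: reports which names were newly linked, which were
    already linked to the same source, and which were left alone because
    something else already owns the name.
    """
    result = {"linked": [], "unchanged": [], "conflicts": []}
    dest.mkdir(parents=True, exist_ok=True)
    for skill in sorted(p for p in repo_skills.iterdir() if p.is_dir()):
        target = dest / skill.name
        if target.is_symlink() and target.resolve() == skill.resolve():
            result["unchanged"].append(skill.name)   # already ours: no-op
        elif target.exists() or target.is_symlink():
            result["conflicts"].append(skill.name)   # not ours: never clobber
        else:
            os.symlink(skill, target, target_is_directory=True)
            result["linked"].append(skill.name)
    return result


# Demo in a throwaway directory (no real runtime directories are touched).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "repo" / "validate").mkdir(parents=True)
    (root / "repo" / "council").mkdir()
    (root / "live" / "council").mkdir(parents=True)  # pre-existing foreign skill

    first = link_skills(root / "repo", root / "live")
    second = link_skills(root / "repo", root / "live")  # re-run changes nothing
```

Running the function twice is the point of the demo: the second call creates no new links, and the pre-existing `council` directory is reported as a conflict both times rather than replaced.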

Installs hookless. The only hard requirements are an agent runtime and git; everything else degrades gracefully. Dependencies: docs/dependencies.md · Day-2 ops (update, backup, recovery): docs/install-day2-ops.md.

What you get

A validation membrane. Tests, gates, /pre-mortem, /validate, and /council prove or reject the work before you trust it. No verdict, not done.
A bookkeeper that outlives the session. Work is tracked as beads, and every verdict is bound into a hash-chained provenance ledger: tamper-evident, grep-able, and portable across sessions and models. The record is the proof a change was actually checked — not a memory of one.
An evidence trail that's yours. Every run, decision, and verdict lands in .agents/ in your repo: grep-able, diff-able, portable to whatever model wins next quarter. AgentOps adds no hosted control plane and no telemetry; the corpus lives in your repo, not on our servers. Apache-2.0.
It runs on the agent you already pay for. Claude Code, Codex, Cursor, OpenCode. Same skills, same corpus.
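
To make "hash-chained" and "tamper-evident" concrete, here is a minimal Python sketch of the general technique: each entry stores the hash of the previous entry, so editing any earlier line breaks the chain. The field names (`sha`, `verdict`, `prev`, `hash`) and the helpers `append_entry` and `verify_chain` are hypothetical; the actual AgentOps ledger format is not described in this README.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same content always hashes the same.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(ledger: list, sha: str, verdict: str) -> dict:
    """Append a verdict whose hash covers its content and the previous hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"sha": sha, "verdict": verdict, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    ledger.append(entry)
    return entry


def verify_chain(ledger: list) -> bool:
    """Recompute every hash and check each entry points at its predecessor."""
    prev = GENESIS
    for entry in ledger:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


ledger = []
append_entry(ledger, "abc123", "CONFIRMED")
append_entry(ledger, "def456", "REFUTED")
intact = verify_chain(ledger)

# Rewriting an old verdict no longer matches the hash stored with that entry.
tampered = [dict(e) for e in ledger]
tampered[0]["verdict"] = "REFUTED"
still_valid = verify_chain(tampered)

# One JSON object per line gives the grep-able JSONL shape.
jsonl = "\n".join(json.dumps(e, sort_keys=True) for e in ledger)
```

A chain like this detects edits only relative to a trusted head hash: anyone able to rewrite every line can also recompute every hash, which is why keeping the ledger under version control alongside the code matters.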

> /validate --mixed   # the agent reported this PR done

[membrane] evidence sealed → fresh-context judges, Claude Code + Codex CLI
[claude/judge-1] REFUTE  /login has no rate limit — claimed "covered", isn't
[codex/judge-1]  REFUTE  token-bucket refill lacks jitter under burst
[claude/judge-2] PASS    redis integration follows the repo pattern
Verdict: HOLD — not done. Fix /login limit + refill jitter, then re-verify.
Recorded as a proof artifact — no verdict, not done.
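
The sample run suggests a simple aggregation rule: any REFUTE from a judge holds the change. The Python sketch below encodes that reading. It is an assumption drawn from this single transcript, not the documented /validate or /council logic; `JudgeResult` and `aggregate` are hypothetical names, and treating an empty judge list as HOLD follows the "no verdict, not done" rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeResult:
    judge: str    # e.g. "claude/judge-1"
    verdict: str  # "PASS" or "REFUTE"
    note: str


def aggregate(results: list) -> tuple:
    """Return (overall, blocking_notes) under the assumed any-REFUTE-holds rule."""
    if not results:
        return "HOLD", ["no verdict recorded"]  # no verdict, not done
    blocking = [r.note for r in results if r.verdict == "REFUTE"]
    return ("HOLD", blocking) if blocking else ("PASS", [])


results = [
    JudgeResult("claude/judge-1", "REFUTE", "/login has no rate limit"),
    JudgeResult("codex/judge-1", "REFUTE", "token-bucket refill lacks jitter under burst"),
    JudgeResult("claude/judge-2", "PASS", "redis integration follows the repo pattern"),
]
overall, blocking = aggregate(results)
print(overall, blocking)
```

Under this rule a single passing judge cannot outvote a refutation, which matches the HOLD verdict in the transcript despite one PASS.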

Already installed? Try it in three steps: make a small change and commit it, run ao verify my-first-change, then read the verdict. A model that had no part in writing the change reviews your commit, prints CONFIRMED or REFUTED, and records the result as a line in docs/provenance/ledger.jsonl inside your repo.

The rest is below the fold for anyone who wants the detail.

Skills

Every skill works alone; flows compose them. Full catalog: docs/SKILLS.md · Skill Router.

| Skill | Use it when |
|---|---|
| `/research` | you need codebase context and prior learnings before changing code |
| `/pre-mortem` | you want to pressure-test a plan before building |
| `/rpi` | you want discovery, build, validation, and bookkeeping in one flow |
| `/council` | you want independent judges (optionally Claude and Codex) to return one verdict |
| `/validate` | you want a code-quality and risk review before shipping |
| `/evolve` | you want a goal-driven improvement loop that runs without mutating source |

The `ao` CLI

Repo-native control plane behind the skills. Full reference: CLI commands.

ao verify                 # independent verdict on your latest change
ao gate check --fast      # the release gate before you push
ao provenance show <sha>  # the recorded verdict trail for any commit
ao done <bead-id>         # close tracked work with its verdict attached
ao quick-start            # set up AgentOps in a repo
ao doctor                 # check reviewers, binary, and ledger health

# Experimental (still measuring whether these pay off; see the honest version below):
ao search "query"         # search history and local knowledge
ao lookup --query "topic" # retrieve curated learnings
ao compile                # rebuild the corpus

The whole loop runs in a plain session. No daemon, no scheduler, no cloud. For always-on work, it can hand each task to a background runner instead. Details: docs/3.0.md · operating loop.

The honest version

Proven: independent verification that records a verdict, and a durable, tamper-evident record of it. A change isn't done until something that didn't write it checks it, and that verdict is bound into the provenance ledger. No verdict, not done.

The receipts are public: membrane receipts — every number derived straight from the verdict ledger, none hand-written.

Still measuring: whether the accumulated corpus makes the next session measurably better. We won't claim it until the numbers say so (ADR-0004, ADR-0011).

AgentOps proves the work. It doesn't write the code; your agent still does that, and the cross-checks cost tokens. The .agents/ folder is plain markdown that your agents keep up to date as they go.

When the labs ship their own version of this, your .agents/ folder comes with you. It's in your repo, in plain markdown, Apache-2.0.

What 3.0 is · vs hosted code review · docs index · newcomer guide · architecture · FAQ · upgrading / removed commands · built on the 12-factor doctrine.

Contributing: docs/CONTRIBUTING.md (agents: read AGENTS.md, track work with br). License: Apache-2.0.