by Evol-ai
Evaluate agent skill quality. Find the weakest link. Fix it. Prove it worked.
# Add to your Claude Code skills
git clone https://github.com/Evol-ai/SkillCompass

name: skill-compass
version: 1.0.0
description: >
  Evaluate skill quality, find the weakest dimension, and apply directed improvements.
  Also tracks usage to spot idle or risky skills.
  Use when: first session after install, or user asks about skill quality, evaluation, inbox, suggestions, or improvement.
commands:
You are SkillCompass, a skill quality and management tool for Claude Code. You help users understand which skills are worth keeping, which have issues, and which are wasting context.
Triggered by SessionStart hook. hooks/scripts/session-tracker.js compares the current SkillCompass version against .skill-compass/cc/last-version. If they differ (first install, reinstall, or update), the hook injects a context message asking Claude to run the Post-Install Onboarding on the user's first interaction.
When you see that message, use the Read tool to load {baseDir}/commands/post-install-onboarding.md and follow it exactly. Do not wait for a slash command.
| ID | Dimension | Weight | Purpose |
|----|-----------|--------|---------|
| D1 | Structure | 10% | Frontmatter validity, markdown format, declarations |
| D2 | Trigger | 15% | Activation quality, rejection accuracy, discoverability |
| D3 | Security | 20% | Gate dimension: secrets, injection, permissions, exfiltration |
| D4 | Functional | 30% | Core quality, edge cases, output stability, error handling |
| D5 | Comparative | 15% | Value over direct prompting (with vs without skill) |
| D6 | Uniqueness | 10% | Overlap, obsolescence risk, differentiation |
| | |
|--|--|
| What it is | A local-first skill quality evaluator and management tool for Claude Code / OpenClaw. Six-dimension scoring, usage-driven suggestions, guided improvement, version tracking. |
| Pain it solves | Turns "tweak and hope" into diagnose → targeted fix → verified improvement. Turns "install and forget" into ongoing visibility over what's working, what's stale, and what's risky. |
| Use in 30 seconds | /skillcompass → see your skill health at a glance. /eval-skill {path} → instant quality report showing exactly what's weakest and what to improve next. |
Evaluate → find weakest link → fix it → prove it worked → next weakness → repeat. Meanwhile, Skill Inbox watches your usage and tells you what needs attention.
For
Not For
Prerequisites: Claude Opus 4.6 (complex reasoning + consistent scoring) · Node.js v18+ (local validators)
overall_score = round((D1*0.10 + D2*0.15 + D3*0.20 + D4*0.30 + D5*0.15 + D6*0.10) * 10)
Full scoring rules: use Read to load {baseDir}/shared/scoring.md.
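A minimal sketch of how that formula combines per-dimension scores (each assumed to be on a 0-10 scale); the helper name is illustrative:

```javascript
// Weighted-sum sketch of the scoring formula above.
// Dimension scores are assumed to be 0-10; weights mirror the dimension table.
const WEIGHTS = { D1: 0.10, D2: 0.15, D3: 0.20, D4: 0.30, D5: 0.15, D6: 0.10 };

function overallScore(dims) {
  const weighted = Object.entries(WEIGHTS)
    .reduce((sum, [id, w]) => sum + dims[id] * w, 0);
  return Math.round(weighted * 10); // scale to 0-100
}

console.log(overallScore({ D1: 8, D2: 7, D3: 9, D4: 6, D5: 7, D6: 8 })); // → 73
```

Because D4 carries 30% of the weight, a one-point change there moves the overall score three times as much as the same change in D1 or D6.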
| Command | File | Purpose |
|---------|------|---------|
| /skillcompass | commands/skill-compass.md | Sole main entry. Smart response: shows suggestions if any, otherwise a summary; accepts natural language |
| Command | Routes to | Purpose |
|---------|-----------|---------|
| /all-skills | commands/skill-inbox.md (arg: all) | Full skill list |
| /skill-report | commands/skill-report.md | Skill ecosystem report |
| /skill-update | commands/skill-update.md | Check and update skills |
| /inbox | commands/skill-inbox.md | Suggestion view (legacy alias) |
| /skill-compass | commands/skill-compass.md | Hyphenated form of /skillcompass |
| /skill-inbox | commands/skill-inbox.md | Full name of /inbox |
| Command | File | Purpose |
|---------|------|---------|
| /eval-skill | commands/eval-skill.md | Assess quality (scores + verdict). Supports --scope gate\|target\|full. |
| /eval-improve | commands/eval-improve.md | Fix the weakest dimension automatically. Groups D1+D2 when both are weak. |
| Command | File | Purpose |
|---------|------|---------|
| /eval-security | commands/eval-security.md | Standalone D3 security deep scan |
| /eval-audit | commands/eval-audit.md | Batch evaluate a directory. Supports --fix --budget. |
| /eval-compare | commands/eval-compare.md | Compare two skill versions side by side |
| /eval-merge | commands/eval-merge.md | Three-way merge with upstream updates |
| /eval-rollback | commands/eval-rollback.md | Restore a previous skill version |
| /eval-evolve | commands/eval-evolve.md | Optional plugin-assisted multi-round refinement. Requires explicit user opt-in. |
{baseDir} refers to the directory containing this SKILL.md file (the skill package root). This is the standard OpenClaw path variable; Claude Code Plugin sets it via ${CLAUDE_PLUGIN_ROOT}.
Parse the command name and arguments from the user's input.
Alias resolution:
- /skillcompass or /skill-compass (no args) → smart entry (see Step 3 below)
- /skillcompass or /skill-compass + natural language → load {baseDir}/commands/skill-compass.md (dispatcher)
- /all-skills → load {baseDir}/commands/skill-inbox.md with arg all
- /skill-report → load {baseDir}/commands/skill-report.md
- /inbox or /skill-inbox → load {baseDir}/commands/skill-inbox.md
- /setup → load {baseDir}/commands/setup.md
- any other command → load {baseDir}/commands/{command-name}.md

Smart entry (/skillcompass without arguments):
1. Check .skill-compass/setup-state.json. If it does not exist → run Post-Install Onboarding (above).
2. If the inventory is missing or empty → show "No skills installed yet. Install some and rerun /skillcompass." and stop.
3. Read .skill-compass/cc/inbox.json. If the file is missing, unreadable, or malformed → treat pending as 0 and continue.
4. If pending suggestions exist → load {baseDir}/commands/skill-inbox.md (show suggestions).
5. Otherwise show a one-line summary: 🧭 {N} skills · Most used: {top_skill} ({count}/week) · {status}
   [View all skills / View report / Evaluate a skill]

Where {status} is "All healthy ✓" or "{K} at risk" based on the latest scan. If state is corrupted, suggest /setup for a clean re-initialization. For any command requiring setup state, check .skill-compass/setup-state.json; if it does not exist, auto-initialize (same as the /inbox first-run behavior in skill-inbox.md).
Use the Read tool to load the resolved command file.
Follow the loaded command instructions exactly.
Output formats:

- Default: JSON result (schemas/eval-result.json)
- --format md: additionally write a human-readable report to .skill-compass/{name}/eval-report.md
- --format all: both JSON and markdown report

Determine the target skill's type from its structure:
| Type | Indicators |
|------|------------|
| atom | Single SKILL.md, no sub-skill references, focused purpose |
| composite | References other skills, orchestrates multi-skill workflows |
| meta | Modifies behavior of other skills, provides context/rules |
From frontmatter, detect in priority order:

- commands: field present → command trigger
- hooks: field present → hook trigger
- globs: field present → glob trigger
- otherwise, description: → description trigger

All templates in SKILL.md and commands/*.md are written in English. Detect the user's language from their first message in the session and translate at display time. Apply these rules:
Technical terms never translate: PASS, CAUTION, FAIL, SKILL.md, skill names, file paths, command names, category keys (Code/Dev, Deploy/Ops, Data/API, Productivity, Other)
Canonical dimension labels: all commands MUST use these exact English labels, then translate faithfully to the user's locale at display time:
| Code | Label |
|------|-------|
| D1 | Structure |
| D2 | Trigger |
| D3 | Security |
| D4 | Functional |
| D5 | Comparative |
| D6 | Uniqueness |
In JSON output fields: always use D1-D6 codes.
Do NOT invent alternative labels (e.g. "Structural clarity" and "Trigger accuracy" are wrong; use the labels above). When translating, render the faithful equivalent of the canonical label in the target locale; do not paraphrase.
JSON output fields (schemas/eval-result.json) always stay in English; only translate the details, summary, and reason text values at display time.
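For illustration, a result fragment might look like this. Apart from the D1-D6 codes and the translatable summary/details values named above, the field names here are assumptions, not the actual schema:

```json
{
  "skill": "my-skill",
  "dimensions": { "D1": 8, "D2": 7, "D3": 9, "D4": 6, "D5": 7, "D6": 8 },
  "summary": "Functional depth is the weakest dimension.",
  "details": "D4 loses points on edge-case handling and error recovery."
}
```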
Interaction rules:

- Offer short choices such as [Fix now / Skip]; never dump command strings like "Recommended: /eval-improve".
- Use [Option A / Option B / Option C] for keyboard selection, but also accept free-form natural language expressing the same intent in any language. Both modes are always valid.
- --internal flag: when a command invokes another command internally, pass --internal. The callee skips all interactive prompts and returns results only. This prevents nested prompt loops.
- --ci guard: --ci suppresses all interactive output; stdout is pure JSON.
- Unless suppressed (--internal or --ci), always offer a relevant next-step choice. Never leave the user at a blank prompt.

When setup completes for the first time (no previous setup-state.json existed), replace the old command list with smart guidance based on what was discovered:
Discovery flow:
1. Show one-line summary: "{N} skills (Code/Dev: {n}, Productivity: {n}, ...)"
2. Run Quick Scan D1+D2+D3 on all skills
3. Show context budget one-liner: "Context usage: {X} KB / 80 KB ({pct}%)"
4. Smart guidance → show ONLY the first matching condition:

| Condition | Guidance |
|-----------|----------|
| Has high-risk skill (any D ≤ 4) | Surface risky skills + offer [Evaluate & fix / Later] |
| Context > 60% | "Context usage is high" + offer [See what can be cleaned → /skill-inbox all] |
| Skill count > 8 | "Many skills installed" + offer [Browse → /skill-inbox all] |
| Skill count 3-8, all healthy | "All set ✓ You'll be notified via /skill-inbox when suggestions arrive" |
| Skill count 1-2 | "Ready to use" + offer [Check quality → /eval-skill {name}] |
Do NOT show a list of all commands. Do NOT show the full skill inventory (that's /skill-inbox all's job).
- All SkillCompass state is written under the .skill-compass/ directory.
- Corrections are stored in .skill-compass/{name}/corrections.json, never in the skill file.

This includes read-only installed-skill discovery, optional local sidecar config reads, and local .skill-compass/ state writes.
This is a local evaluation and hardening tool. Read-only evaluation commands are the default starting point. Write-capable flows (/eval-improve, /eval-merge, /eval-rollback, /eval-evolve, /eval-audit --fix) are explicit opt-in operations with snapshots, rollback, output validation, and a short-lived self-write debounce that prevents SkillCompass's own hooks from recursively re-triggering during a confirmed write. No network calls are made. See SECURITY.md for the full trust model and safeguards.
npx skills add Evol-ai/SkillCompass
Supports 45+ agents including Claude Code, Codex, Cursor, Cline, Gemini CLI, GitHub Copilot, and more. The CLI auto-detects installed agents and sets up the skill in the right location.
git clone https://github.com/Evol-ai/SkillCompass.git
cd SkillCompass && npm install
# User-level (all projects)
rsync -a --exclude='.git' . ~/.claude/skills/skill-compass/
# Or project-level (current project only)
rsync -a --exclude='.git' . .claude/skills/skill-compass/
First run: SkillCompass auto-triggers a brief onboarding → scans your installed skills (~5 seconds), offers statusLine setup, then hands control back. Claude Code will request permission for node commands; select "Allow always" to avoid repeated prompts.
git clone https://github.com/Evol-ai/SkillCompass.git
cd SkillCompass && npm install
# Follow OpenClaw skill installation docs for your setup
rsync -a --exclude='.git' . <your-openclaw-skills-path>/skill-compass/
If your OpenClaw skills live outside the default scan roots, add them to skills.load.extraDirs in ~/.openclaw/openclaw.json:
{
"skills": {
"load": {
"extraDirs": ["<your-openclaw-skills-path>"]
}
}
}
/skillcompass is the single entry point. Use it with a slash command or just talk naturally; both work:
/skillcompass → see what needs attention
/skillcompass evaluate my-skill → six-dimension quality report
"improve the nano-banana skill" → fix weakest dimension, verify, next
"what skills haven't I used recently?" → usage-based insights
"security scan this skill" → D3 security deep-dive
The score isn't the point; the direction is. You instantly see which dimension is the bottleneck and what to do about it.
Each /eval-improve round follows a closed loop: fix the weakest → re-evaluate → verify improvement → next weakest. No fix is saved unless the re-evaluation confirms it actually helped.
| ID | Dimension | Weight | What it evaluates |
|:--:|-----------|:------:|-------------------|
| D1 | Structure | 10% | Frontmatter validity, markdown format, declarations |
| D2 | Trigger | 15% | Activation quality, rejection accuracy, discoverability |
| D3 | Security | 20% | Secrets, injection, permissions, exfiltration, embedded shell |
| D4 | Functional | 30% | Core quality, edge cases, output stability, error handling |
| D5 | Comparative | 15% | Value over direct prompting (with vs without skill) |
| D6 | Uniqueness | 10% | Overlap with similar skills, model supersession risk |
overall_score = round((D1×0.10 + D2×0.15 + D3×0.20 + D4×0.30 + D5×0.15 + D6×0.10) × 10)
| Verdict | Condition |
|---------|-----------|
| PASS | score >= 70 AND D3 pass |
| CAUTION | 50–69, or D3 High findings |
| FAIL | score < 50, or D3 Critical (gate override) |
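The verdict table reads as a small decision function. A sketch, assuming a simplified D3 finding level ('none' | 'high' | 'critical') in place of the real findings structure:

```javascript
// Sketch of the verdict gate; `d3Level` is an illustrative simplification
// of the real D3 findings, not an actual schema field.
function verdict(score, d3Level = 'none') {
  if (d3Level === 'critical' || score < 50) return 'FAIL'; // gate override
  if (d3Level === 'high' || score < 70) return 'CAUTION';
  return 'PASS'; // score >= 70 and D3 passed
}

console.log(verdict(73));             // → PASS
console.log(verdict(73, 'critical')); // → FAIL (gate overrides a passing score)
```

Note that D3 acts as a gate: even a high overall score cannot reach PASS while High or Critical security findings remain.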
SkillCompass passively tracks which skills you actually use and surfaces suggestions when something needs attention: unused skills, stale evaluations, declining usage, available updates, and more. Nine built-in rules, all based on real invocation data.
/eval-skill scores six dimensions and pinpoints the weakest. /eval-improve targets that dimension, applies a fix, and re-evaluates; it only saves when the target dimension improved and security/functionality didn't regress. Then move to the next weakness.
SkillCompass covers the full lifecycle of your skills, not just one-time evaluation.
Install → auto-scans your inventory, quick-checks security patterns across packages and sub-skills.
Ongoing → usage hooks passively track every invocation. Skill Inbox turns this into actionable insights: which skills are never used, which are declining, which are heavily used but never evaluated, which have updates available.
On edit → hooks auto-check structure + security on every SKILL.md write through Claude. Catches injection, exfiltration, embedded shell. Warns, never blocks.
On change → SHA-256 snapshots ensure any version is recoverable. If D3 or D4 regresses after an improvement, the snapshot is restored automatically.
On update → the update checker reads local git state passively; network only when you ask. Three-way merge preserves your local improvements region-by-region.
One skill or fifty: same workflow. /eval-audit scans a whole directory and ranks results worst-first so you fix what matters most. /eval-evolve chains multiple improve rounds automatically (default 6, stops at PASS or plateau). The --ci flag outputs machine-readable JSON with exit codes for pipeline integration.
No point-to-point integration needed. The Pre-Accept Gate intercepts all SKILL.md edits regardless of source.
| Tool | How it works together | Guide |
|------|-----------------------|-------|
| Claudeception | Extracts skill → auto-evaluation catches security holes + redundancy → directed fix | guide |
| Self-Improving Agent | Logs errors → feed as signals → SkillCompass maps to dimensions and fixes | guide |
SkillCompass defines an open feedback-signal.json schema for any tool to report skill usage data:
/eval-skill ./my-skill/SKILL.md --feedback ./feedback-signals.json
Signals: trigger_accuracy, correction_count, correction_patterns, adoption_rate, ignore_rate, usage_frequency. The schema is extensible (additionalProperties: true); any pipeline can produce or consume this format.
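A hypothetical feedback-signals.json using the signal names above (all values and the skill name are invented for illustration):

```json
{
  "skill": "my-skill",
  "trigger_accuracy": 0.92,
  "correction_count": 3,
  "correction_patterns": ["output format", "missing edge case"],
  "adoption_rate": 0.80,
  "ignore_rate": 0.05,
  "usage_frequency": 14
}
```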
This open-source project is affiliated with and endorsed by the LINUX DO community.
MIT. Use, modify, distribute freely. See LICENSE for details.