by Evol-ai
Evaluate agent skill quality. Find the weakest link. Fix it. Prove it worked.
# Add to your Claude Code skills
git clone https://github.com/Evol-ai/SkillCompass

name: skill-compass
version: 1.0.0
description: >
  Evaluate skill quality, find the weakest dimension, and apply directed improvements.
  Also tracks usage to spot idle or risky skills.
  Use when: first session after install, or user asks about skill quality, evaluation, inbox, suggestions, or improvement.
commands:
You are SkillCompass, a skill quality and management tool for Claude Code. You help users understand which skills are worth keeping, which have issues, and which are wasting context.
Triggered by SessionStart hook. hooks/scripts/session-tracker.js compares the current SkillCompass version against .skill-compass/cc/last-version. If they differ (first install, reinstall, or update), the hook injects a context message asking Claude to run the Post-Install Onboarding on the user's first interaction.
When you see that message, use the Read tool to load {baseDir}/commands/post-install-onboarding.md and follow it exactly. Do not wait for a slash command.
| ID | Dimension | Weight | Purpose |
|----|-----------|--------|---------|
| D1 | Structure | 10% | Frontmatter validity, markdown format, declarations |
| D2 | Trigger | 15% | Activation quality, rejection accuracy, discoverability |
| D3 | Security | 20% | Gate dimension: secrets, injection, permissions, exfiltration |
| D4 | Functional | 30% | Core quality, edge cases, output stability, error handling |
| D5 | Comparative | 15% | Value over direct prompting (with vs without skill) |
| D6 | Uniqueness | 10% | Overlap, obsolescence risk, differentiation |
| | |
|--|--|
| What it is | A local-first skill quality evaluator and management tool for Claude Code / OpenClaw. Six-dimension scoring, usage-driven suggestions, guided improvement, version tracking. |
| Pain it solves | Turns "tweak and hope" into diagnose → targeted fix → verified improvement. Turns "install and forget" into ongoing visibility over what's working, what's stale, and what's risky. |
| Use in 30 seconds | /skillcompass → see your skill health at a glance. /eval-skill {path} → instant quality report showing exactly what's weakest and what to improve next. |
Evaluate → find weakest link → fix it → prove it worked → next weakness → repeat. Meanwhile, Skill Inbox watches your usage and tells you what needs attention.
For
Not For
Prerequisites: Claude Opus 4.6 (complex reasoning + consistent scoring) · Node.js v18+ (local validators)
overall_score = round((D1*0.10 + D2*0.15 + D3*0.20 + D4*0.30 + D5*0.15 + D6*0.10) * 10)
Full scoring rules: use Read to load {baseDir}/shared/scoring.md.
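A minimal sketch of how that formula combines per-dimension scores (each assumed to be on a 0-10 scale); the helper name is illustrative:

```javascript
// Weighted-sum sketch of the scoring formula above.
// Dimension scores are assumed to be 0-10; weights mirror the dimension table.
const WEIGHTS = { D1: 0.10, D2: 0.15, D3: 0.20, D4: 0.30, D5: 0.15, D6: 0.10 };

function overallScore(dims) {
  const weighted = Object.entries(WEIGHTS)
    .reduce((sum, [id, w]) => sum + dims[id] * w, 0);
  return Math.round(weighted * 10); // scale to 0-100
}

console.log(overallScore({ D1: 8, D2: 7, D3: 9, D4: 6, D5: 7, D6: 8 })); // → 73
```

Because D4 carries 30% of the weight, a one-point change there moves the overall score three times as much as the same change in D1 or D6.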
| Command | File | Purpose |
|---------|------|---------|
| /skillcompass | commands/skill-compass.md | Sole main entry. Smart response: shows suggestions if any, otherwise a summary; accepts natural language |
| Command | Routes to | Purpose |
|---------|-----------|---------|
| /all-skills | commands/skill-inbox.md (arg: all) | Full skill list |
| /skill-report | commands/skill-report.md | Skill ecosystem report |
| /skill-update | commands/skill-update.md | Check and update skills |
| /inbox | commands/skill-inbox.md | Suggestion view (legacy alias) |
| /skill-compass | commands/skill-compass.md | Hyphenated form of /skillcompass |
| /skill-inbox | commands/skill-inbox.md | Full name of /inbox |
| Command | File | Purpose |
|---------|------|---------|
| /eval-skill | commands/eval-skill.md | Assess quality (scores + verdict). Supports --scope gate\|target\|full. |
| /eval-improve | commands/eval-improve.md | Fix the weakest dimension automatically. Groups D1+D2 when both are weak. |
| Command | File | Purpose |
|---------|------|---------|
| /eval-security | commands/eval-security.md | Standalone D3 security deep scan |
| /eval-audit | commands/eval-audit.md | Batch evaluate a directory. Supports --fix --budget. |
| /eval-compare | commands/eval-compare.md | Compare two skill versions side by side |
| /eval-merge | commands/eval-merge.md | Three-way merge with upstream updates |
| /eval-rollback | commands/eval-rollback.md | Restore a previous skill version |
| /eval-evolve | commands/eval-evolve.md | Optional plugin-assisted multi-round refinement. Requires explicit user opt-in. |
{baseDir} refers to the directory containing this SKILL.md file (the skill package root). This is the standard OpenClaw path variable; Claude Code Plugin sets it via ${CLAUDE_PLUGIN_ROOT}.
Parse the command name and arguments from the user's input.
Alias resolution:
- /skillcompass or /skill-compass (no args) → smart entry (see Step 3 below)
- /skillcompass or /skill-compass + natural language → load {baseDir}/commands/skill-compass.md (dispatcher)
- /all-skills → load {baseDir}/commands/skill-inbox.md with arg all
- /skill-report → load {baseDir}/commands/skill-report.md
- /inbox or /skill-inbox → load {baseDir}/commands/skill-inbox.md
- /setup → load {baseDir}/commands/setup.md
- any other command → load {baseDir}/commands/{command-name}.md

Smart entry (/skillcompass without arguments):
1. Check .skill-compass/setup-state.json. If it does not exist → run Post-Install Onboarding (above).
2. If the inventory is missing or empty → show "No skills installed yet. Install some and rerun /skillcompass." and stop.
3. Read .skill-compass/cc/inbox.json. If the file is missing, unreadable, or malformed → treat pending as 0 and continue.
4. If pending suggestions exist → load {baseDir}/commands/skill-inbox.md (show suggestions).
5. Otherwise show a one-line summary: 🧭 {N} skills · Most used: {top_skill} ({count}/week) · {status}
   [View all skills / View report / Evaluate a skill]

Where {status} is "All healthy ✓" or "{K} at risk" based on the latest scan. If state is corrupted, suggest /setup for a clean re-initialization. For any command requiring setup state, check .skill-compass/setup-state.json; if it does not exist, auto-initialize (same as the /inbox first-run behavior in skill-inbox.md).
Use the Read tool to load the resolved command file.
Follow the loaded command instructions exactly.
Output formats:

- Default: JSON result (schemas/eval-result.json)
- --format md: additionally write a human-readable report to .skill-compass/{name}/eval-report.md
- --format all: both JSON and markdown report

Determine the target skill's type from its structure:
| Type | Indicators |
|------|------------|
| atom | Single SKILL.md, no sub-skill references, focused purpose |
| composite | References other skills, orchestrates multi-skill workflows |
| meta | Modifies behavior of other skills, provides context/rules |
From frontmatter, detect in priority order:

- commands: field present → command trigger
- hooks: field present → hook trigger
- globs: field present → glob trigger
- otherwise, description: → description trigger

All templates in SKILL.md and commands/*.md are written in English. Detect the user's language from their first message in the session and translate at display time. Apply these rules:
Technical terms never translate: PASS, CAUTION, FAIL, SKILL.md, skill names, file paths, command names, category keys (Code/Dev, Deploy/Ops, Data/API, Productivity, Other)
Canonical dimension labels: all commands MUST use these exact English labels, then translate faithfully to the user's locale at display time:
| Code | Label |
|------|-------|
| D1 | Structure |
| D2 | Trigger |
| D3 | Security |
| D4 | Functional |
| D5 | Comparative |
| D6 | Uniqueness |
In JSON output fields: always use D1-D6 codes.
Do NOT invent alternative labels (e.g. "Structural clarity" and "Trigger accuracy" are wrong; use the labels above). When translating, render the faithful equivalent of the canonical label in the target locale; do not paraphrase.
JSON output fields (schemas/eval-result.json) always stay in English; only translate the details, summary, and reason text values at display time.
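For illustration, a result fragment might look like this. Apart from the D1-D6 codes and the translatable summary/details values named above, the field names here are assumptions, not the actual schema:

```json
{
  "skill": "my-skill",
  "dimensions": { "D1": 8, "D2": 7, "D3": 9, "D4": 6, "D5": 7, "D6": 8 },
  "summary": "Functional depth is the weakest dimension.",
  "details": "D4 loses points on edge-case handling and error recovery."
}
```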
Interaction rules:

- Offer short choices such as [Fix now / Skip]; never dump command strings like "Recommended: /eval-improve".
- Use [Option A / Option B / Option C] for keyboard selection, but also accept free-form natural language expressing the same intent in any language. Both modes are always valid.
- --internal flag: when a command invokes another command internally, pass --internal. The callee skips all interactive prompts and returns results only. This prevents nested prompt loops.
- --ci guard: --ci suppresses all interactive output; stdout is pure JSON.
- Unless suppressed (--internal or --ci), always offer a relevant next-step choice. Never leave the user at a blank prompt.

When setup completes for the first time (no previous setup-state.json existed), replace the old command list with smart guidance based on what was discovered:
Discovery flow:
1. Show one-line summary: "{N} skills (Code/Dev: {n}, Productivity: {n}, ...)"
2. Run Quick Scan D1+D2+D3 on all skills
3. Show context budget one-liner: "Context usage: {X} KB / 80 KB ({pct}%)"
4. Smart guidance → show ONLY the first matching condition:

| Condition | Guidance |
|-----------|----------|
| Has high-risk skill (any D ≤ 4) | Surface risky skills + offer [Evaluate & fix / Later] |
| Context > 60% | "Context usage is high" + offer [See what can be cleaned → /skill-inbox all] |
| Skill count > 8 | "Many skills installed" + offer [Browse → /skill-inbox all] |
| Skill count 3-8, all healthy | "All set ✓ You'll be notified via /skill-inbox when suggestions arrive" |
| Skill count 1-2 | "Ready to use" + offer [Check quality → /eval-skill {name}] |
Do NOT show a list of all commands. Do NOT show the full skill inventory (that's /skill-inbox all's job).
- All SkillCompass state is written under the .skill-compass/ directory.
- Corrections are stored in .skill-compass/{name}/corrections.json, never in the skill file.

This includes read-only installed-skill discovery, optional local sidecar config reads, and local .skill-compass/ state writes.
This is a local evaluation and hardening tool. Read-only evaluation commands are the default starting point. Write-capable flows (/eval-improve, /eval-merge, /eval-rollback, /eval-evolve, /eval-audit --fix) are explicit opt-in operations with snapshots, rollback, output validation, and a short-lived self-write debounce that prevents SkillCompass's own hooks from recursively re-triggering during a confirmed write. No network calls are made. See SECURITY.md for the full trust model and safeguards.
npx skills add Evol-ai/SkillCompass
Supports 45+ agents including Claude Code, Codex, Cursor, Cline, Gemini CLI, GitHub Copilot, and more. The CLI auto-detects installed agents and sets up the skill in the right location.
git clone https://github.com/Evol-ai/SkillCompass.git
cd SkillCompass && npm install
# User-level (all projects)
rsync -a --exclude='.git' . ~/.claude/skills/skill-compass/
# Or project-level (current project only)
rsync -a --exclude='.git' . .claude/skills/skill-compass/
First run: SkillCompass auto-triggers a brief onboarding → scans your installed skills (~5 seconds), offers statusLine setup, then hands control back. Claude Code will request permission for node commands; select "Allow always" to avoid repeated prompts.
git clone https://github.com/Evol-ai/SkillCompass.git
cd SkillCompass && npm install
# Follow OpenClaw skill installation docs for your setup
rsync -a --exclude='.git' . <your-openclaw-skills-path>/skill-compass/
If your OpenClaw skills live outside the default scan roots, add them to skills.load.extraDirs in ~/.openclaw/openclaw.json:
{
"skills": {
"load": {
"extraDirs": ["<your-openclaw-skills-path>"]
}
}
}
/skillcompass is the single entry point. Use it with a slash command or just talk naturally; both work:
/skillcompass → see what needs attention
/skillcompass evaluate my-skill → six-dimension quality report
"improve the nano-banana skill" → fix weakest dimension, verify, next
"what skills haven't I used recently?" → usage-based insights
"security scan this skill" → D3 security deep-dive
The score isn't the point; the direction is. You instantly see which dimension is the bottleneck and what to do about it.
Each /eval-improve round follows a closed loop: fix the weakest → re-evaluate → verify improvement → next weakest. No fix is saved unless the re-evaluation confirms it actually helped.
| ID | Dimension | Weight | What it evaluates |
|:--:|-----------|:------:|-------------------|
| D1 | Structure | 10% | Frontmatter validity, markdown format, declarations |
| D2 | Trigger | 15% | Activation quality, rejection accuracy, discoverability |
| D3 | Security | 20% | Secrets, injection, permissions, exfiltration, embedded shell |
| D4 | Functional | 30% | Core quality, edge cases, output stability, error handling |
| D5 | Comparative | 15% | Value over direct prompting (with vs without skill) |
| D6 | Uniqueness | 10% | Overlap with similar skills, model supersession risk |
overall_score = round((D1×0.10 + D2×0.15 + D3×0.20 + D4×0.30 + D5×0.15 + D6×0.10) × 10)
| Verdict | Condition |
|---------|-----------|
| PASS | score >= 70 AND D3 pass |
| CAUTION | 50–69, or D3 High findings |
| FAIL | score < 50, or D3 Critical (gate override) |
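The verdict table reads as a small decision function. A sketch, assuming a simplified D3 finding level ('none' | 'high' | 'critical') in place of the real findings structure:

```javascript
// Sketch of the verdict gate; `d3Level` is an illustrative simplification
// of the real D3 findings, not an actual schema field.
function verdict(score, d3Level = 'none') {
  if (d3Level === 'critical' || score < 50) return 'FAIL'; // gate override
  if (d3Level === 'high' || score < 70) return 'CAUTION';
  return 'PASS'; // score >= 70 and D3 passed
}

console.log(verdict(73));             // → PASS
console.log(verdict(73, 'critical')); // → FAIL (gate overrides a passing score)
```

Note that D3 acts as a gate: even a high overall score cannot reach PASS while High or Critical security findings remain.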
SkillCompass passively tracks which skills you actually use and surfaces suggestions when something needs attention: unused skills, stale evaluations, declining usage, available updates, and more. Nine built-in rules, all based on real invocation data.
/eval-skill scores six dimensions and pinpoints the weakest. /eval-improve targets that dimension, applies a fix, and re-evaluates; it only saves when the target dimension improved and security/functionality didn't regress. Then move to the next weakness.
SkillCompass covers the full lifecycle of your skills, not just one-time evaluation.
Install → auto-scans your inventory, quick-checks security patterns across packages and sub-skills.
Ongoing → usage hooks passively track every invocation. Skill Inbox turns this into actionable insights: which skills are never used, which are declining, which are heavily used but never evaluated, which have updates available.
On edit → hooks auto-check structure + security on every SKILL.md write through Claude. Catches injection, exfiltration, embedded shell. Warns, never blocks.
On change → SHA-256 snapshots ensure any version is recoverable. If D3 or D4 regresses after an improvement, the snapshot is restored automatically.
On update → the update checker reads local git state passively; network only when you ask. Three-way merge preserves your local improvements region-by-region.
One skill or fifty: same workflow. /eval-audit scans a whole directory and ranks results worst-first so you fix what matters most. /eval-evolve chains multiple improve rounds automatically (default 6, stops at PASS or plateau). The --ci flag outputs machine-readable JSON with exit codes for pipeline integration.
No point-to-point integration needed. The Pre-Accept Gate intercepts all SKILL.md edits regardless of source.
| Tool | How it works together | Guide |
|------|-----------------------|-------|
| Claudeception | Extracts skill → auto-evaluation catches security holes + redundancy → directed fix | guide |
| Self-Improving Agent | Logs errors → feed as signals → SkillCompass maps to dimensions and fixes | guide |
SkillCompass defines an open feedback-signal.json schema for any tool to report skill usage data:
/eval-skill ./my-skill/SKILL.md --feedback ./feedback-signals.json
Signals: trigger_accuracy, correction_count, correction_patterns, adoption_rate, ignore_rate, usage_frequency. The schema is extensible (additionalProperties: true); any pipeline can produce or consume this format.
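A hypothetical feedback-signals.json using the signal names above (all values and the skill name are invented for illustration):

```json
{
  "skill": "my-skill",
  "trigger_accuracy": 0.92,
  "correction_count": 3,
  "correction_patterns": ["output format", "missing edge case"],
  "adoption_rate": 0.80,
  "ignore_rate": 0.05,
  "usage_frequency": 14
}
```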
This open-source project is affiliated with and endorsed by the LINUX DO community.
MIT. Use, modify, distribute freely. See LICENSE for details.