
skillgrade

by mgechev


"Unit tests" for your agent skills

126 stars
7 forks
TypeScript
Added 3/15/2026
Installation
# Add to your Claude Code skills
git clone https://github.com/mgechev/skillgrade

Skillgrade

The easiest way to evaluate your Agent Skills. It tests whether AI agents correctly discover and use your skills.

See examples/ — superlint (simple) and angular-modern (TypeScript grader).


Quick Start

Prerequisites: Node.js 20+, Docker

npm i -g skillgrade

1. Initialize — go to your skill directory (must have SKILL.md) and scaffold:

cd my-skill/
GEMINI_API_KEY=your-key skillgrade init    # or ANTHROPIC_API_KEY / OPENAI_API_KEY
# Use --force to overwrite an existing eval.yaml

This generates eval.yaml with AI-powered tasks and graders. Without an API key, it creates a well-commented template instead.

2. Edit — customize eval.yaml for your skill (see eval.yaml Reference).

3. Run:

GEMINI_API_KEY=your-key skillgrade --smoke

The agent is auto-detected from your API key: GEMINI_API_KEY → Gemini, ANTHROPIC_API_KEY → Claude, OPENAI_API_KEY → Codex. Override with --agent=claude.
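
The key-to-agent mapping described here can be sketched as a small lookup. This is only an illustration, not skillgrade's actual code: the names `Agent`, `KEY_TO_AGENT`, and `detectAgent` are hypothetical, and the precedence when several keys are set at once is an assumption. Only the three key-to-agent pairs and the existence of an `--agent` override come from the text.

```typescript
type Agent = "gemini" | "claude" | "codex";

// Key-to-agent pairs as documented. The ordering (which key wins when
// several are set) is an assumption made for this sketch.
const KEY_TO_AGENT: ReadonlyArray<readonly [string, Agent]> = [
  ["GEMINI_API_KEY", "gemini"],
  ["ANTHROPIC_API_KEY", "claude"],
  ["OPENAI_API_KEY", "codex"],
];

// Hypothetical helper: returns the explicit override if one is given,
// otherwise the agent for the first non-empty API key, or undefined
// when no known key is set.
function detectAgent(
  env: Record<string, string | undefined>,
  override?: Agent,
): Agent | undefined {
  if (override) return override;
  for (const [key, agent] of KEY_TO_AGENT) {
    if (env[key]) return agent;
  }
  return undefined;
}
```

Under these assumptions, `detectAgent({ ANTHROPIC_API_KEY: "your-key" })` selects `"claude"`, and passing `"gemini"` as the second argument mirrors `--agent=gemini` taking priority over whatever keys are present.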

4. Review:

skillgrade preview          # CLI report
skillgrade preview browser  # web UI → http://localhost:3847

Reports are saved to $TMPDIR/skillgrade/<skill-name>/results/. Override with --output=DIR.

Presets

| Flag | Trials | Use Case |
|------|--------|----------|
| --smoke | 5 | Quick capability check |
| --reliable | 15 | Reliable pass rate estimate |
| --regression | 30 | High-confidence regression detection |
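
As a rough sketch of how a preset could translate into a trial count: the numbers below come from the presets table, while the names `Preset`, `PRESET_TRIALS`, and `resolveTrials` are hypothetical. That an explicit `--trials=N` wins over a preset is inferred from its "Override trial count" description, not confirmed against the implementation.

```typescript
type Preset = "smoke" | "reliable" | "regression";

// Trial counts taken from the presets table.
const PRESET_TRIALS: Record<Preset, number> = {
  smoke: 5,
  reliable: 15,
  regression: 30,
};

// Hypothetical helper: an explicit trial count overrides the preset.
// Returns undefined when neither is given, so the caller can fall back
// to whatever default it reads from its own configuration.
function resolveTrials(preset?: Preset, trials?: number): number | undefined {
  if (trials !== undefined) {
    if (!Number.isInteger(trials) || trials < 1) {
      throw new RangeError("trials must be a positive integer");
    }
    return trials;
  }
  return preset ? PRESET_TRIALS[preset] : undefined;
}
```

With this sketch, `resolveTrials("reliable")` gives 15, and `resolveTrials("reliable", 3)` gives 3.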

Options

| Flag | Description |
|------|-------------|
| --trials=N | Override trial count |
| --parallel=N | Run trials concurrently |
| --agent=gemini\|claude\|codex | Override agent (default: auto-detect from API key) |
| --provider=docker\|local | Override provider |
| --output=DIR | Output directory (default: $TMPDIR/skillgrade) |
| --validate | Verify graders using reference solutions |
| --ci | CI mode: exit non-zero if below threshold |
| --threshold=0.8 | Pass rate threshold for CI mode |
| --preview | Show CLI results after running |
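
The `--ci` and `--threshold` rows describe a simple gate, which might look like the sketch below. The function names `passRate` and `ciExitCode` are hypothetical, and treating each trial as a plain pass/fail boolean is an assumption. The tool's real scoring may be more granular.

```typescript
// Hypothetical helper: fraction of trials that passed.
// An empty run is treated as a 0 pass rate (an assumption).
function passRate(results: boolean[]): number {
  if (results.length === 0) return 0;
  return results.filter(Boolean).length / results.length;
}

// "Exit non-zero if below threshold": a pass rate exactly equal to the
// threshold is not below it, so it yields exit code 0.
function ciExitCode(results: boolean[], threshold = 0.8): number {
  return passRate(results) < threshold ? 1 : 0;
}
```

For example, four passes out of five trials meets a 0.8 threshold, while three out of five does not.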

eval.yaml Reference

version: "1"

# Optional: explicit path to skill directory (defaults to auto-detecting SKILL.md)
# skill: path/to/my-skill

defaults:
  agent: gemini          # gemini | claude | codex
  provider: docker       # docker | local
  trials: 5
  timeout: 300           # seconds
  threshold: 0.8         # for --ci mode
  grader_model: gemini-3-flash-preview  # default LLM grader model
  docker:
    base: node:20-slim
    setup: |             # extra commands run during image build
      apt-get update && apt-get install -y jq
  environment:           # container resource limits
    cpus: 2
    memory_mb: 2048

tasks:
  - name: fix-linting-errors
    instruction: |
      Use the superlint tool to fix coding standard violations in app.js.

    workspace:    ...