# Dependency-Aware Structural Retrieval for Massive Agent Skills

by davidliuk
```
# Add to your Claude Code skills
git clone https://github.com/davidliuk/graph-of-skills
```

Graph of Skills builds a skill graph offline from a library of SKILL.md documents, then retrieves a small, ranked set of relevant skills at task time. Instead of flooding the agent context with an entire skill library, GoS surfaces only the skills most likely to help -- along with their prerequisites and related capabilities.
Retrieval pipeline:
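The core idea can be pictured as a two-stage step: rank skills by embedding similarity to the task, then expand the top hits with their prerequisites from the graph so dependencies land in context first. A minimal illustrative sketch -- the `Skill` shape, `cosine`, and `retrieve` names are assumptions for exposition, not the GoS API:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    embedding: list[float]
    prereqs: list[str] = field(default_factory=list)  # names of prerequisite skills

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_emb: list[float], library: dict[str, Skill], max_skills: int = 5) -> list[Skill]:
    # Stage 1: rank every skill by similarity to the task query.
    ranked = sorted(library.values(),
                    key=lambda s: cosine(query_emb, s.embedding),
                    reverse=True)
    # Stage 2: walk the ranking, surfacing each skill's prerequisites
    # before the skill that depends on them.
    selected: list[Skill] = []
    seen: set[str] = set()
    for skill in ranked:
        if len(selected) >= max_skills:
            break
        for name in [*skill.prereqs, skill.name]:
            if name not in seen and name in library:
                seen.add(name)
                selected.append(library[name])
    return selected
```

The point of stage 2 is the dependency-aware part: a plain vector search would return `mesh_volume` without the `parse_stl` skill it builds on.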
GoS is evaluated on SkillsBench (SB; 87 dockerized coding tasks) and ALFWorld (AW; 134 household games) across three model families. R = average reward (%), T = input tokens, S = runtime (s). ↑ higher is better, ↓ lower is better.
| Model | Method | SB R↑ | SB T↓ | SB S↓ | AW R↑ | AW T↓ | AW S↓ |
|-------|--------|------:|------:|------:|------:|------:|------:|
| Claude Sonnet 4.5 | Vanilla Skills | 25.0 | 967,791 | 465.8 | 89.3 | 1,524,401 | 53.2 |
| | Vector Skills | 19.3 | 894,640 | 357.3 | 93.6 | 28,407 | 37.8 |
| | + GoS | 31.0 | 860,315 | 364.9 | 97.9 | 27,215 | 49.2 |
| MiniMax M2.7 | Vanilla Skills | 17.2 | 942,113 | 580.7 | 47.1 | 2,184,823 | 88.6 |
| | Vector Skills | 10.4 | 852,881 | 552.9 | 50.7 | 66,109 | 73.4 |
| | + GoS | 18.7 | 867,452 | 502.5 | 54.3 | 65,227 | 68.8 |
| GPT-5.2 Codex | Vanilla Skills | 27.4 | 3,187,749 | 686.8 | 89.3 | 1,435,614 | 83.3 |
| | Vector Skills | 21.5 | 1,243,648 | 773.0 | 92.9 | 34,436 | 57.0 |
| | + GoS | 34.4 | 1,379,773 | 715.6 | 93.6 | 46,462 | 64.7 |
GoS achieves the highest reward on every model on both benchmarks while cutting input tokens by up to 56× (ALFWorld, Claude Sonnet 4.5) vs. Vanilla Skills. For scalability and ablation analysis, see the paper.
Prerequisites: uv (recommended) or pip.

```
git clone https://github.com/graph-of-skills/graph-of-skills.git
cd graph-of-skills
uv sync
cp .env.example .env  # then fill in your API keys
```
**OpenAI:**

```
OPENAI_API_KEY=sk-...
# Use the `openai/...` prefix so LiteLLM targets the OpenAI API (omit OPENAI_BASE_URL).
GOS_EMBEDDING_MODEL=openai/text-embedding-3-large
GOS_EMBEDDING_DIM=3072
```

**OpenRouter:**

```
OPENROUTER_API_KEY=<openrouter-key>
OPENAI_BASE_URL=https://openrouter.ai/api/v1
GOS_EMBEDDING_MODEL=openrouter/openai/text-embedding-3-large
GOS_EMBEDDING_DIM=3072
```

**Azure OpenAI:**

```
OPENAI_API_KEY=<azure-api-key>
OPENAI_BASE_URL=https://YOUR-RESOURCE.services.ai.azure.com/openai/v1
# Must match your deployment name in Azure (not necessarily `text-embedding-3-large`).
GOS_EMBEDDING_MODEL=openai/<your-deployment-name>
GOS_EMBEDDING_DIM=<vector-dimension-for-that-model>
```

**Gemini:**

```
GEMINI_API_KEY=<your-key>
GOS_EMBEDDING_MODEL=gemini/gemini-embedding-001
GOS_EMBEDDING_DIM=3072
```
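Whichever provider you pick, `GOS_EMBEDDING_DIM` must agree with the dimension the model actually returns, or later retrieval will run against a mismatched index. A small sanity-check sketch (hypothetical helper, not part of GoS; the expected-dimension table below is an assumption to verify against your provider's docs):

```python
# Assumed defaults for two of the models mentioned above -- verify with your provider.
EXPECTED_DIMS = {
    "openai/text-embedding-3-large": 3072,
    "gemini/gemini-embedding-001": 3072,
}

def check_embedding_env(env: dict[str, str]) -> list[str]:
    """Return a list of problems found in the embedding settings (empty = OK)."""
    problems = []
    model = env.get("GOS_EMBEDDING_MODEL", "")
    dim = env.get("GOS_EMBEDDING_DIM", "")
    if not model:
        problems.append("GOS_EMBEDDING_MODEL is unset")
    if not dim.isdigit():
        problems.append("GOS_EMBEDDING_DIM must be a positive integer")
    elif model in EXPECTED_DIMS and int(dim) != EXPECTED_DIMS[model]:
        problems.append(
            f"{model} is usually {EXPECTED_DIMS[model]}-dimensional, got {dim}"
        )
    return problems
```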
Goal: install the package, pull the published skill libraries, build (or download) a graph workspace, then run retrieval from the shell.
Read next: DATA.md for every download flag and asset size; .env.example for embedding providers. After GoS works locally, use evaluation/README.md for benchmark runners and evaluation/skillsbench/README.md for Harbor-based SkillsBench.
Complete Installation above: clone, `uv sync`, `cp .env.example .env`, and set embedding (and optional LLM) keys. Indexing and retrieval load `.env` from the repo root when you use `uv run gos …`.
The collections `skills_200`, `skills_500`, `skills_1000`, `skills_2000` are directories of SKILL.md files on HuggingFace, not in git. They unpack to `data/skillsets/skills_200/` … `data/skillsets/skills_2000/`.

```
./scripts/download_data.sh --skillsets
```
This tries each archive, skips directories that already have files, and logs `[skip]` if an archive is not yet on the Hub. Gated datasets: `HF_TOKEN=hf_... ./scripts/download_data.sh --skillsets`. Full reference (tasks, workspaces, selective flags): DATA.md.
Tiny smoke test without HuggingFace: index the built-in folder `skills/` (only a few skills) with any `--workspace` path you like.

`--workspace` is where GoS stores the indexed graph (vectors + graph storage). Use the same path for `gos retrieve`, `gos status`, and `gos add`.
For ALFWorld and SkillsBench defaults, keep this mapping (see evaluation/README.md and evaluation/skillsbench/graphskills_benchmark.py):
| Skill tree you index | Recommended --workspace |
|----------------------|---------------------------|
| data/skillsets/skills_200 | data/gos_workspace/skills_200_v1 |
| data/skillsets/skills_500 | data/gos_workspace/skills_500_v1 |
| data/skillsets/skills_1000 | data/gos_workspace/skills_1000_v1 |
| data/skillsets/skills_2000 | data/gos_workspace/skills_2000_v1 |
A. Build locally (needs embedding API; duration grows with library size):

```
mkdir -p data/gos_workspace
uv run gos index data/skillsets/skills_200 \
  --workspace data/gos_workspace/skills_200_v1 --clear
```
Use the matching pair for other sets (e.g. `skills_1000` → `data/gos_workspace/skills_1000_v1`). Embedding model and dimension in `.env` must stay the same for later retrieval (see Configuration).
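The pairing in the table above is mechanical: take the skillset's directory name and append `_v1` under `data/gos_workspace/`. A throwaway helper (hypothetical, not part of GoS) that encodes the convention:

```python
def workspace_for(skillset: str, version: int = 1) -> str:
    """Map a skillset path to its recommended --workspace path."""
    name = skillset.rstrip("/").split("/")[-1]  # e.g. "skills_1000"
    return f"data/gos_workspace/{name}_v{version}"
```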
B. Download a prebuilt workspace (no `gos index`; must match the embedding used to build that archive):

```
./scripts/download_data.sh --workspace
```
See DATA.md for which `gos_workspace_skills_*_v1.tar.gz` files exist on the Hub and how they map to `data/gos_workspace/`.
Retrieve skills for a task:

```
uv run gos retrieve "parse binary STL file, calculate volume and mass" \
  --workspace data/gos_workspace/skills_200_v1 --max-skills 5
```

Check workspace status, or add a new skill to an existing workspace:

```
uv run gos status --workspace data/gos_workspace/skills_200_v1
uv run gos add path/to/NEW_SKILL.md --workspace data/gos_workspace/skills_200_v1
```
GoS ships with a built-in MCP server that gives Claude Code direct access to the skill graph. When you open this project, Claude Code auto-discovers the server via .mcp.json — no manual setup.
Quick start:
```
uv sync                                 # install deps (once)
cp .env.example .env                    # fill in API keys
./scripts/download_data.sh --workspace  # download prebuilt workspaces
```
Then open the project in Claude Code. The graph-of-skills MCP server is ready. Ask naturally:
"Find skills for processing 3D mesh files with GoS, then follow the