by OpenLAIR
Open-World Self-Evolution for LLM Agents — agents that build both their skills and their own verification signals from scratch, with no target-task supervision. (Code coming soon.)
# Add to your Claude Code skills
git clone https://github.com/OpenLAIR/OpenSkill

OpenSkill is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by OpenLAIR. It has 58 GitHub stars.
An agent that builds both its skills and its own verification signals from scratch — using only a task prompt and open-world resources, with no target-task supervision.
> [!NOTE]
> Code is on the way. This repository currently hosts the project overview and release plan. Star ⭐ and watch 👀 the repo to be notified when the code, skills, and benchmark drop. See the roadmap.
Self-evolving agents need to adapt after deployment — but existing methods assume a usable learning loop is already there: curated skills, successful trajectories, or verifier signals. Real open-world deployments may offer none of these, only a task prompt.
OpenSkill studies open-world self-evolution: an agent must build both its skills and its own verification signals from scratch, drawing on open-world resources but no target-task supervision. Target-task supervision is reserved strictly for final evaluation.
Unlike human-curated, LLM-generated, or supervised self-evolution, OpenSkill acquires skills from the open world and verifies them with self-built virtual tasks — making it simultaneously scalable, grounded, and supervision-free. Prior paradigms each miss at least one of these properties.
Given only a task prompt, a base model, tool access, and open-world resources, OpenSkill bootstraps a learning loop from scratch in three stages.
| Stage | Name | What happens |
|---|---|---|
| 01 | Open-world knowledge acquisition | Retrieves task-relevant knowledge and independent verification anchors from docs, repos, papers, and the web — then drafts a structured skill plan. |
| 02 | Leakage-free skill evolution | Drafts skills and refines them in a sandbox against self-built virtual tests grounded in the anchors, fixing bugs and knowledge gaps over up to three rounds. |
| 03 | Zero-shot target evaluation | Deploys the frozen skill to the target agent. Ground-truth tests are unlocked only here, at final evaluation — never during construction. |
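The code has not been released yet, so the following is only a minimal Python sketch of how the stages in the table could fit together. Every name in it (`Skill`, `evolve_skill`, `evaluate`, the `refine` callback, and the toy tests) is hypothetical and not taken from the OpenSkill codebase. Stage 01 retrieval is left out, so the draft and the virtual tests are passed in directly.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_ROUNDS = 3  # the overview states "up to three rounds"

# A test takes the skill text and returns True on pass (hypothetical interface).
SkillTest = Callable[[str], bool]


@dataclass
class Skill:
    """A drafted skill plus bookkeeping (hypothetical structure)."""
    body: str
    rounds_used: int = 0
    frozen: bool = False


def evolve_skill(
    draft: str,
    virtual_tests: List[SkillTest],
    refine: Callable[[str, List[int]], str],
    max_rounds: int = MAX_ROUNDS,
) -> Skill:
    """Stage 02 sketch: refine a draft against self-built virtual tests only."""
    skill = Skill(body=draft)
    for _ in range(max_rounds):
        failing = [i for i, test in enumerate(virtual_tests) if not test(skill.body)]
        if not failing:
            break  # every virtual test passes, so stop early
        skill.body = refine(skill.body, failing)
        skill.rounds_used += 1
    skill.frozen = True  # no further edits after this point
    return skill


def evaluate(skill: Skill, ground_truth_tests: List[SkillTest]) -> float:
    """Stage 03 sketch: ground-truth tests are used only here, on a frozen skill."""
    if not skill.frozen:
        raise ValueError("skill must be frozen before target evaluation")
    passed = sum(test(skill.body) for test in ground_truth_tests)
    return passed / len(ground_truth_tests)


# Toy demo: the "skill" is a string and each refinement fixes one failing test.
toy_tests = [lambda s: "parse" in s, lambda s: "validate" in s]


def toy_refine(body: str, failing: List[int]) -> str:
    additions = {0: " parse", 1: " validate"}
    return body + additions[failing[0]]


demo_skill = evolve_skill("load", toy_tests, toy_refine)
print(demo_skill.body, demo_skill.rounds_used)
```

In this toy run the draft needs two refinement rounds before all virtual tests pass. The point of the structure is the separation of concerns: `evolve_skill` never receives the ground-truth tests, and `evaluate` refuses to run on a skill that is still being edited.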
On SkillsBench (11 domains), OpenSkill beats the strongest closed-world baseline by 8.9 points on Opus 4.6 and 8.8 points on GPT 5.2, and lands within roughly 1 to 3 points of the human upper bound, while honoring the no-supervision constraint.
| Metric | Value |
|---|---|
| Overall pass rate on Opus 4.6 | 43.6% (+8.9 over best baseline) |
| Overall pass rate on GPT 5.2 | 42.1% (+8.8 over best baseline) |
| GT test intents covered by self-built verifier | 88.9% |
| Domains best / tied-best on Opus 4.6 | 8 / 11 |
SkillsBench: overall average pass rate (%). Human is the reference upper bound and is excluded from ranking.
| Target agent | No Skill | Self-Gen | CoT | Skill-Creator | AutoSkill | Memento | OpenSkill | Human |
|---|---|---|---|---|---|---|---|---|
| Opus 4.6 (Claude Code) | 25.5 | 23.9 | 23.9 | 34.7 | 24.7 | 30.1 | 43.6 | 44.5 |
| GPT 5.2 (Codex) | 25.0 | 32.2 | 33.3 | 29.2 | 11.2 | 15.6 | 42.1 | 44.8 |
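As a quick sanity check, the headline margins can be recomputed from the table above. The short Python snippet below copies the reported pass rates into a dictionary (the `RESULTS` and `summarize` names are illustrative only) and derives, for each target agent, the strongest automated baseline, OpenSkill's margin over it, and the remaining gap to the human reference.

```python
# Overall average pass rates (%) copied from the SkillsBench table.
RESULTS = {
    "Opus 4.6 (Claude Code)": {
        "No Skill": 25.5, "Self-Gen": 23.9, "CoT": 23.9, "Skill-Creator": 34.7,
        "AutoSkill": 24.7, "Memento": 30.1, "OpenSkill": 43.6, "Human": 44.5,
    },
    "GPT 5.2 (Codex)": {
        "No Skill": 25.0, "Self-Gen": 32.2, "CoT": 33.3, "Skill-Creator": 29.2,
        "AutoSkill": 11.2, "Memento": 15.6, "OpenSkill": 42.1, "Human": 44.8,
    },
}

# OpenSkill is the method under test; Human is a reference, not a baseline.
EXCLUDED = {"OpenSkill", "Human"}


def summarize(scores):
    """Return (best baseline name, margin over it, gap to human), rounded to 0.1."""
    baselines = {name: value for name, value in scores.items() if name not in EXCLUDED}
    best_name = max(baselines, key=baselines.get)
    margin = round(scores["OpenSkill"] - baselines[best_name], 1)
    gap_to_human = round(scores["Human"] - scores["OpenSkill"], 1)
    return best_name, margin, gap_to_human


for agent, scores in RESULTS.items():
    print(agent, summarize(scores))
```

The strongest baseline differs by agent: Skill-Creator on Opus 4.6 (margin 8.9) and CoT on GPT 5.2 (margin 8.8). The gaps to the human reference come out to 0.9 and 2.7 points respectively.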
Beyond SkillsBench, OpenSkill is also the best automated method on SocialMaze (82.7% / 70.7%) and ScienceWorld (90.0% / 85.3%) across both target agents.
RQ1: Transferability. Skills generated by Opus 4.6 transfer as-is to four weaker models, improving by +5.5 to +14.8 points over no-skill with no model-specific adaptation.
RQ2: Virtual verifier quality. Without ever seeing ground-truth tests, the verifier reaches 80.5% recall against GT-positive outcomes, 60.7% overall agreement, and covers 88.9% of GT test intents.
RQ3: Component contribution. On SocialMaze, reward peaks at three refinement rounds; open-world query and the virtual verifier each improve over a parametric-only baseline and are largely complementary.
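To make the RQ2 metrics concrete, here is a small Python sketch of how recall against GT-positive outcomes and overall agreement could be computed from paired pass/fail verdicts. The definitions are an assumption based on the standard meaning of these terms; the paper's exact protocol may differ. The function name `verifier_metrics` and the example verdicts are made up for illustration.

```python
from typing import Dict, Sequence


def verifier_metrics(virtual: Sequence[bool], ground_truth: Sequence[bool]) -> Dict[str, float]:
    """Compare virtual-verifier verdicts with ground-truth verdicts, outcome by outcome.

    recall:    share of GT-passing outcomes that the virtual verifier also passes
    agreement: share of all outcomes where the two verdicts match
    """
    if len(virtual) != len(ground_truth):
        raise ValueError("verdict lists must have the same length")
    if not virtual:
        raise ValueError("need at least one outcome")

    pairs = list(zip(virtual, ground_truth))
    on_gt_positive = [v for v, g in pairs if g]
    # Recall is undefined when no outcome passes the ground-truth tests.
    recall = sum(on_gt_positive) / len(on_gt_positive) if on_gt_positive else float("nan")
    agreement = sum(v == g for v, g in pairs) / len(pairs)
    return {"recall": recall, "agreement": agreement}


# Made-up verdicts for five outcomes (True = pass).
example = verifier_metrics(
    virtual=[True, True, False, True, False],
    ground_truth=[True, True, True, False, False],
)
print(example)
```

In the made-up example, the verifier passes two of the three GT-passing outcomes and matches the ground truth on three of five outcomes overall. Recall can be well above agreement, as in the reported 80.5% versus 60.7%, when the verifier also passes outcomes that the ground-truth tests reject.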
Releases ship in phases. ⭐ the repo to get notified as each lands.
@misc{yan2026openskillopenworldselfevolutionllm,
title = {OpenSkill: Open-World Self-Evolution for LLM Agents},
author = {Zhiling Yan and Dingjie Song and Hanrong Zhang and Wei Liang and Yuxuan Zhang and Yutong Dai and Lifang He and Philip S. Yu and Ran Xu and Xiang Li and Lichao Sun},
year = {2026},
eprint = {2606.06741},
archivePrefix = {arXiv},
primaryClass = {cs.AI},
url = {https://arxiv.org/abs/2606.06741}
}
Zhiling Yan1,*, Dingjie Song1,*, Hanrong Zhang2, Wei Liang1, Yuxuan Zhang3,4, Yutong Dai5, Lifang He1, Philip S. Yu2, Ran Xu5, Xiang Li6, Lichao Sun1,†
1 Lehigh University · 2 University of Illinois Chicago · 3 University of British Columbia · 4 Vector Institute · 5 Salesforce AI Research · 6 Massachusetts General Hospital & Harvard Medical School
* Equal contribution · † Corresponding author