Agent-Loop-Skills

Name: Agent-Loop-Skills
Author: gaasher


Loop until it's better — drop-in agentic loops (autoresearch, scientific writing, data analysis, code/SQL/prompt optimization, red-teaming) as open-standard Agent Skills. Verification-gated; native on Claude Code, portable across Codex, Cursor & other Skills hosts.

50 stars

6 forks

Python

Installation

# Clone, then copy the loops into your Claude Code skills directory
git clone https://github.com/gaasher/Agent-Loop-Skills
cp -r Agent-Loop-Skills/loops/* ~/.claude/skills/


Frequently Asked Questions

What is Agent-Loop-Skills?

Agent-Loop-Skills is an open-source collection of AI agent skills for coding assistants such as Claude Code, Codex, and Cursor, built by gaasher. It packages drop-in agentic loops (autoresearch, scientific writing, data analysis, code/SQL/prompt optimization, red-teaming) as open-standard Agent Skills. The loops are verification-gated, native on Claude Code, and portable across Codex, Cursor, and other Skills hosts. It has 50 GitHub stars.

Is Agent-Loop-Skills safe to use?

The catalog security scan for Agent-Loop-Skills is still queued.

How do I install Agent-Loop-Skills?

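Clone the repository with "git clone https://github.com/gaasher/Agent-Loop-Skills" and copy its loops into your Claude Code skills directory with "cp -r Agent-Loop-Skills/loops/* ~/.claude/skills/". The README also documents a Claude Code plugin-marketplace install and the standard "npx skills add gaasher/agent-loop-skills" installer for other hosts.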

What programming language is Agent-Loop-Skills written in?

Agent-Loop-Skills is primarily written in Python. It is open source and hosted under gaasher's GitHub account, so you can review or fork the full source.

Are there alternatives to Agent-Loop-Skills?

Yes. SkillsLLM lists many other AI agent skills you can browse and compare side by side, including those in its AI Agents category and in the Related Skills list below.


Related Skills

superpowers

by obra

An agentic skills framework & software development methodology that works.

234,966

claude-design-studio-toolkit CiteCheck

agent-loop-skills

Loop until it's better — drop-in agentic loops, packaged as open-standard Agent Skills.

Autoresearch · scientific writing · data analysis · code/SQL/prompt optimization · red-teaming: each is a generic, reusable loop that you bind to your own task at invocation time and that iterates against a real signal until the work is actually better.

A real run. The tournament-autoresearch loop on a CIFAR-10 model under a fixed 5-epoch budget — competing agents propose a change each step, a self-calibrating judge keeps the winners (green) and discards the regressions (gray): 0.734 → 0.798 val_acc, hands-off, 7 of 11 kept. Full ledger: showcase/tournament-autoresearch. Far from SOTA by design — a deliberately tiny CNN at 5 epochs on a laptop GPU (Apple MPS). The demo is the loop's decision-making, not the absolute accuracy.

Why loops-as-skills

Two ideas collided in late 2025, and this repo lives in the overlap:

Skills became the portable unit. An Agent Skill is just Markdown + a little YAML that an agent loads only when relevant — "maybe a bigger deal than MCP … throw in some text and let the model figure it out" (Simon Willison). One SKILL.md now runs across ~30 hosts (Claude Code, Codex, Cursor, …).
The loop became the program. Karpathy ran ~700 autoresearch experiments in 2 days from one markdown prompt; Geoffrey Huntley's Ralph is, "in its purest form, a Bash loop." Agents get most of their power not from one clever prompt but from iterating against feedback.

This repo makes the loop itself the skill. Instead of task-specific skills, each entry is a generic loop (program · artifact · feedback signal · run ledger · termination) that you bind to your task at invocation time. Paste your goal; the loop proposes a change, runs it in your environment, scores it on a real signal (tests, latency, a metric, a calibrated judge), keeps it only if it's better, logs it, and repeats.
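The propose, run, score, keep-or-revert cycle just described can be reduced to a few lines of Python. This is a minimal sketch, not the repo's implementation (the loops are SKILL.md programs an agent follows): `improve`, `LoopResult`, and the `propose`/`run_and_score` callables are hypothetical names, and "better" is assumed to mean a strictly higher score.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

A = TypeVar("A")  # the artifact being improved: code, a query, a prompt, a config, ...


@dataclass
class LoopResult(Generic[A]):
    best: A
    best_score: float
    ledger: list = field(default_factory=list)  # one dict per attempted step


def improve(
    artifact: A,
    propose: Callable[[A], A],            # hypothetical: returns one changed copy
    run_and_score: Callable[[A], float],  # hypothetical: runs it; higher is better
    max_steps: int,
) -> LoopResult[A]:
    best, best_score = artifact, run_and_score(artifact)  # score the baseline first
    ledger = []
    for step in range(1, max_steps + 1):
        candidate = propose(best)
        score = run_and_score(candidate)
        kept = score > best_score  # keep only strict improvements
        if kept:
            best, best_score = candidate, score
        # "revert": the next proposal starts from the previous best again
        ledger.append({"step": step, "score": score, "kept": kept})
    return LoopResult(best, best_score, ledger)


# Toy usage: nudge an integer toward 10; the score is the negative distance.
res = improve(0, propose=lambda x: x + 3, run_and_score=lambda x: -abs(10 - x), max_steps=5)
print(res.best, res.best_score, [r["kept"] for r in res.ledger])  # 9 -1 [True, True, True, False, False]
```

Here termination is only a step budget; plateau and threshold stops are separate checks layered on top.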

The honest part: unsupervised agent loops are famous for spinning forever and confidently shipping garbage — at 90% per-step accuracy, a 5-step chain fails ~40% of the time. Every loop here is verification-gated: an objective feedback signal decides each step and an explicit termination condition ends it. That discipline — not autonomy for its own sake — is the point. (See Limitations.)
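The ~40% figure follows from compounding, assuming each step succeeds independently with probability $p$ and a chain of $k$ steps succeeds only if every step does:

```latex
P(\text{chain fails}) = 1 - p^{k} = 1 - 0.9^{5} = 1 - 0.59049 = 0.40951 \approx 41\%
```

A verification gate is meant to catch a failed step before the next step builds on it, so errors are reverted rather than compounded.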

How a loop works

flowchart LR
  T["bind your task<br/>(artifact + signal + budget)"] --> P["propose<br/>one change"]
  P --> R["run it in<br/>your env"]
  R --> S{"score<br/>tests · metric · judge"}
  S -->|better| K["keep + log"]
  S -->|worse| X["revert"]
  K --> G{stop?}
  X --> G
  G -->|"plateau · budget · threshold"| B(["best artifact"])
  G -->|no| P
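The three exit conditions on the flowchart's stop edge (plateau, budget, threshold) can be written as one check over the scores logged so far. A sketch in Python; `should_stop` and its `patience`, `max_steps`, and `target` parameters are assumptions for illustration, not settings documented by the repo.

```python
from typing import Optional, Sequence


def should_stop(
    scores: Sequence[float],         # score of every attempt so far, in order; higher is better
    max_steps: int,                  # budget: total attempts allowed
    patience: int,                   # plateau: attempts since the last new best before giving up
    target: Optional[float] = None,  # threshold: stop once the best score reaches this
) -> Optional[str]:
    """Return why the loop should stop ("budget", "threshold", "plateau"), or None to continue."""
    if not scores:
        return None
    if len(scores) >= max_steps:
        return "budget"
    best = max(scores)
    if target is not None and best >= target:
        return "threshold"
    best_at = scores.index(best)  # first attempt that reached the current best
    if len(scores) - 1 - best_at >= patience:
        return "plateau"
    return None


print(should_stop([0.70, 0.74, 0.73, 0.73], max_steps=20, patience=2))  # plateau
```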

Every loop decomposes into the same five ingredients — program (SKILL.md), artifact slot (what's improved), feedback signal (what drives the next step), run ledger (append-only log), and termination (when to stop). Skills ship zero heavy dependencies: your code (a torch trainer, a SQL database, a dataset) runs in your environment via a bound run command; the skill shells out and reads the result. Multi-role loops use spawn-or-degrade — real isolated subagents on Claude Code, the same roles inline elsewhere.
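The bound-run-command pattern (your environment does the heavy work, the skill shells out and reads the result) plus an append-only ledger might look like this in Python. The `METRIC <value>` output line, the JSON-lines ledger format, and the file name are hypothetical conventions chosen for this sketch; each loop's SKILL.md defines its own.

```python
import json
import re
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# Hypothetical contract: the bound command prints a line such as "METRIC 0.798".
METRIC_RE = re.compile(r"^METRIC[ \t]+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)[ \t]*$", re.MULTILINE)


def run_bound_command(cmd: list[str], timeout_s: float = 600.0) -> float:
    """Run the user's command in their environment and read the last METRIC line."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        raise RuntimeError(f"run failed with exit code {proc.returncode}: {proc.stderr[-500:]}")
    values = METRIC_RE.findall(proc.stdout)
    if not values:
        raise ValueError("run produced no METRIC line")
    return float(values[-1])


def append_ledger(path: Path, entry: dict) -> None:
    """Append one JSON object per line; earlier lines are never rewritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"ts": time.time(), **entry}) + "\n")


# Demo: a child Python process stands in for "your trainer".
score = run_bound_command([sys.executable, "-c", "print('epoch 5 done'); print('METRIC 0.798')"])
with tempfile.TemporaryDirectory() as d:
    ledger = Path(d) / "run_ledger.jsonl"
    append_ledger(ledger, {"step": 1, "score": score, "kept": True})
    append_ledger(ledger, {"step": 2, "score": 0.781, "kept": False})
    rows = [json.loads(line) for line in ledger.read_text(encoding="utf-8").splitlines()]
print(score, [r["kept"] for r in rows])  # 0.798 [True, False]
```

Because the skill only sees stdout and an exit code, the trainer, database, or dataset can carry any dependencies without the skill importing them.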

Install

Any one of these installs all the loops:

Claude Code — plugin marketplace (add once, then install):

/plugin marketplace add gaasher/agent-loop-skills
/plugin install agent-loops@agent-loop-skills

Loops install namespaced as agent-loops:<name> (e.g. agent-loops:karpathy).

Any Agent-Skills host — the standard installers:

npx skills add gaasher/agent-loop-skills                   # auto-detects host, installs to the right dir
gh skill install gaasher/agent-loop-skills --agent <host>  # claude-code | codex | cursor | …  (--pin, gh skill update)

Manual — clone, then copy the loops into your host's skills dir (pick the line for your host):

git clone https://github.com/gaasher/agent-loop-skills

cp -r agent-loop-skills/loops/* ~/.agents/skills/   # cross-tool: Codex, Cursor, Pi, OpenClaw, …
cp -r agent-loop-skills/loops/* ~/.claude/skills/   # Claude Code
# Hermes: hermes skills tap add gaasher/agent-loop-skills

Then just describe your task — the host loads the matching loop. Research loops also call the shared literature-search skill; installing everything puts it alongside them, and any loop degrades gracefully (to WebSearch) if it's absent.

Loops in action

Most skill repos tell you what a skill is. Here's what these loops actually do — real Sonnet runs, full ledgers in showcase/.

🧑‍⚖️ `tournament-autoresearch` — competing ideas, a self-calibrating judge

<n> agents pitch competing changes each step; a judge critiques them, picks one, runs it, and recalibrates by comparing its predicted vs realized gain. On a CIFAR-10 SmallCNN under a fixed 5-epoch budget it climbed 0.734 → 0.798 val_acc, keeping 7 of 11 changes and reverting all 4 that regressed — escaping the plateaus a single-thread loop gets stuck on. (That's the run charted up top.) → showcase/tournament-autoresearch
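One way to read "recalibrates by comparing its predicted vs realized gain" is a judge that tracks its own optimism and discounts future estimates accordingly. The sketch below is a hypothetical illustration in Python, not the skill's method: it uses a simple per-kind ratio of realized to predicted gains (a single global factor would rescale every pitch equally and never change the pick), and `Pitch`, `kind`, and `CalibratedJudge` are invented names.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    agent: str
    kind: str              # hypothetical category, e.g. "architecture", "schedule", "augmentation"
    predicted_gain: float  # the judge's raw estimate of the metric improvement


class CalibratedJudge:
    """Hypothetical judge that rescales its raw predictions by its own track record per kind."""

    def __init__(self) -> None:
        self.predicted: dict[str, float] = {}  # kind -> sum of predicted gains for changes run
        self.realized: dict[str, float] = {}   # kind -> sum of realized gains for the same changes

    def factor(self, kind: str) -> float:
        p = self.predicted.get(kind, 0.0)
        return self.realized.get(kind, 0.0) / p if p > 0 else 1.0  # 1.0 until there is history

    def calibrated(self, pitch: Pitch) -> float:
        return pitch.predicted_gain * self.factor(pitch.kind)

    def pick(self, pitches: list[Pitch]) -> Pitch:
        return max(pitches, key=self.calibrated)

    def record(self, pitch: Pitch, realized_gain: float) -> None:
        """After running the picked change, compare predicted vs realized gain."""
        self.predicted[pitch.kind] = self.predicted.get(pitch.kind, 0.0) + pitch.predicted_gain
        self.realized[pitch.kind] = self.realized.get(pitch.kind, 0.0) + realized_gain


judge = CalibratedJudge()
aug = Pitch("agent-1", "augmentation", 0.02)
sched = Pitch("agent-2", "schedule", 0.015)
print(judge.pick([aug, sched]).kind)  # augmentation: no history yet, so raw 0.02 wins
judge.record(aug, -0.01)              # the augmentation change regressed
print(judge.pick([aug, sched]).kind)  # schedule: augmentation is now discounted to -0.01
```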

🔬 `ml-autoresearch` — analysis-first, every change traced to a cause

This loop reads inside each run — gradient flow, dead neurons, the loss curve — and grounds the next change in that evidence rather than guessing: "FC grad 57% vs first conv 3.3% — severe imbalance; 54% dead neurons" → add BatchNorm; "cosine schedule fixed the epoch-3 dip entirely (monotonic!), +0.033". It also reverts what hurts (augmentation, over-aggressive LR). The point isn't a leaderboard number — it's that every accepted change has a measured reason behind it. → showcase/ml-autoresearch
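The two diagnostics quoted above can be computed from quantities most training loops already expose. A framework-free sketch in plain Python, assuming "FC grad 57%" means a layer's share of the summed per-layer gradient norms and "dead neurons" means ReLU units that output zero for every example in a batch; the skill may define both differently. With PyTorch you would feed it per-layer gradient norms and post-ReLU activations captured with forward hooks.

```python
import math


def grad_norm_shares(layer_grads: dict[str, list[float]]) -> dict[str, float]:
    """Each layer's L2 gradient norm as a fraction of the sum over all layers."""
    norms = {name: math.sqrt(sum(g * g for g in grads)) for name, grads in layer_grads.items()}
    total = sum(norms.values())
    return {name: (n / total if total > 0 else 0.0) for name, n in norms.items()}


def dead_fraction(activations: list[list[float]]) -> float:
    """Fraction of units whose post-ReLU output is zero on every example.

    activations[i][j] is unit j's output on example i.
    """
    n_units = len(activations[0])
    dead = sum(1 for j in range(n_units) if all(row[j] == 0.0 for row in activations))
    return dead / n_units


shares = grad_norm_shares({"conv1": [0.3, 0.4], "fc": [3.0, 4.0]})
print({k: round(v, 3) for k, v in shares.items()})  # {'conv1': 0.091, 'fc': 0.909}
print(dead_fraction([[0.0, 1.2, 0.0], [0.0, 0.5, 0.3]]))
```

A lopsided share (most of the gradient at the classifier, almost none at the first conv) and a high dead fraction are the kind of evidence the loop cites before proposing a fix such as BatchNorm.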

📊 `data-analysis` — findings with a number behind every one

Hypothesis → verify, stdlib-only. On a planted dataset it surfaced 3 real findings and correctly refuted 2, with effect sizes matching ground truth and no hallucinations: enterprise vs consumer order value 184.90 vs 109.16 (Cohen's d = 2.13), mobile return rate 32.8% vs 8.2% (RR 4.0) — and it reversed a plausible-but-wrong claim once it spotted a mobile confound. → showcase/data-analysis
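Both effect sizes reported above are standard and, in keeping with the loop's stdlib-only design, need nothing beyond Python's `statistics` and `math` modules. A sketch: Cohen's d with the pooled sample standard deviation, and relative risk as a ratio of proportions. The toy inputs are made up for illustration; only the rate pair reuses the quoted 32.8% and 8.2%.

```python
import math
import statistics


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def relative_risk(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Risk (event rate) in group A divided by risk in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Toy data, not the showcase dataset.
print(round(cohens_d([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]), 2))  # 4.0
print(round(relative_risk(328, 1000, 82, 1000), 1))          # 4.0 (32.8% vs 8.2%)
```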

Loop	What the run did
`optimize-loop`	Correctness-gated speedup: a SQLite query 1,131.75 ms → 1.055 ms (~1,073×), result-set hash matching baseline on every kept iteration; in code mode cut cyclomatic complexity 23 → 15 (nesting 7 → 3) with 13/13 tests green.
`research-proposal`	ScholarEval graded a proposal against the literature; Judge + Reviser iterated grade 45 → 84 (soundness 2→4, contribution 1→4) over 5 rounds.
`scientific-figure`	Same ImageNet top-1-accuracy bar-chart brief, with vs without the loop: a single call truncated the y-axis at 50% and used non-paper numbers; the loop verified every value against the arXiv papers, flagged GoogLeNet's borrowed top-1, and iterated 80 → 96 (PASS).
`red-team`	Against a naive content filter, surfaced all 5 planted weaknesses (case bypass, leetspeak, spacing, synonyms, over-block) — 39 bypasses + 6 over-blocks — with a one-line root-cause fix each.
`power-analysis`	Solved n = 100/group for 80% power via Monte-Carlo, fixed all 6 validity flaws, and emitted a full pre-registration.
`research-question`	Sharpened 5 vague drafts → 3 strong questions (≥75), with real web novelty checks pivoting already-answered questions toward the open sub-problem.
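The `optimize-loop` row describes a correctness gate: a faster query is kept only if its result set hashes to the same value as the baseline's. A self-contained sketch of such a gate using Python's built-in `sqlite3`, `hashlib`, and `time.perf_counter`; the table, the queries, and the choice to hash sorted row reprs (so row order does not matter) are assumptions for illustration, not the loop's actual harness.

```python
import hashlib
import sqlite3
import time


def result_hash(conn: sqlite3.Connection, sql: str) -> str:
    # Hash the sorted row reprs so equal result sets match regardless of row order.
    rows = sorted(repr(row) for row in conn.execute(sql).fetchall())
    return hashlib.sha256("\n".join(rows).encode()).hexdigest()


def best_time_ms(conn: sqlite3.Connection, sql: str, repeats: int = 5) -> float:
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


def gate(conn: sqlite3.Connection, baseline_sql: str, candidate_sql: str) -> bool:
    """Keep the candidate only if it returns the same rows as the baseline AND runs faster."""
    if result_hash(conn, candidate_sql) != result_hash(conn, baseline_sql):
        return False  # a wrong answer is rejected however fast it is
    return best_time_ms(conn, candidate_sql) < best_time_ms(conn, baseline_sql)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(i % 50, float(i % 97)) for i in range(2_000)],
)
baseline = """SELECT DISTINCT o.customer,
                  (SELECT SUM(x.amount) FROM orders AS x WHERE x.customer = o.customer)
               FROM orders AS o"""
candidate = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
wrong = "SELECT customer, MAX(amount) FROM orders GROUP BY customer"

print(gate(conn, baseline, wrong))      # False: different result set
print(gate(conn, baseline, candidate))  # True only if the rewrite also timed faster
```

Checking correctness before timing means a fast but wrong rewrite never reaches the keep decision, which is what "verification-gated" asks of every kept iteration.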

The loops

† = multi-role (real subagents on Claude Code, inline elsewhere). Browse any folder for its SKILL.md.

Loop	Why you'd reach for it
`karpathy`	The minimal baseline — propose, train, keep-if-better, loop. A faithful nod to Karpathy's autoresearch.
`ml-autoresearch` †	Analysis-first: diagnoses each run and grounds the next change in evidence. A `literature` dial adds paper-grounded changes.
`exploratory-autoresearch`	Forces broad exploration via a temperatur