by Green-PT
Honey (I Shrunk the AI) by GreenPT: a cross-tool coding skill that cuts AI coding-agent token usage and LLM API costs — write less code, less prose, and denser agent-to-agent handoffs (−53%, lossless in benchmarks) with no loss of quality. Works with Claude Code, Cursor, GitHub Copilot, Codex, Gemini CLI, Windsurf, Cline & Kiro.
# Add to your Claude Code skills
git clone https://github.com/Green-PT/honey-for-devs
honey-for-devs is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by Green-PT. It has 52 GitHub stars.
Clone the repository with "git clone https://github.com/Green-PT/honey-for-devs" and add it to your Claude Code skills directory (see the Installation section above).
honey-for-devs is primarily written in JavaScript. It is open-source under Green-PT on GitHub, so you can review or fork the full source.
Write less code and say less about it. Honey (I Shrunk the AI) by GreenPT is a cross-tool coding skill that cuts AI coding-agent token usage and LLM API costs by making agents emit less code and less prose without losing correctness. It works with Claude Code, Cursor, GitHub Copilot, Codex, Gemini CLI, Windsurf, Cline, OpenClaw, and Kiro. Three independent levers, applied reflexively: less code, less prose, and denser agent-to-agent handoffs.
Honey combines what Ponytail (minimal code) and Caveman (terse prose) do separately, then goes further:
The level (lite / full / ultra) is chosen reflexively from the request, with no deliberation tax: it never spends reasoning tokens deciding how to comply, because that would defeat the purpose on reasoning models.

Volume is cost. In agentic coding sessions, the volume of generated code and prose is what runs up the bill, and most of it is waste.
This repo ships a reproducible benchmark (bench/) so you don't have
to take the numbers on faith: 23 tasks across three kinds of work — baseline vs
Caveman vs
Ponytail vs Honey — same model, same
prompts, only the skill changes. Correctness is objective (unit tests, structural /
accessibility checks, and lossless round-trip recovery for agent handoffs); quality
is scored by a 4-model cross-family judge panel (median of Opus 4.8 + Sonnet 4.6
cd bench && npm run bench to reproduce.

A single blended number hides the story, because the levers fire differently per task type. Quality is % of baseline (panel median; for handoffs, lossless recovery); tokens are generated output vs baseline:
| Task tier | Caveman | Ponytail | Honey |
|---|---|---|---|
| Code (14 unit-tested tasks) | 101% · −37% | 99% · +24% | 98% · −49% |
| User-facing (7 landing/UI tasks) | 99% · −18% | 95% · −33% | 101% · −6% |
| Agent-to-agent (2 handoff tasks, lossless recovery) | 67% · −23% | 50% · −22% | 100% · −51% |
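
To make the cell format concrete, here is a minimal sketch of how one "quality · tokens" cell could be derived from raw measurements. It assumes quality is the ratio of panel medians and tokens is a relative change in generated output; the committed harness may aggregate differently. The helper names `median` and `cell` are hypothetical, and the numbers in the usage lines are made up for illustration, not benchmark data.

```javascript
// Median of a list of judge scores (the "panel median").
function median(scores) {
  const sorted = [...scores].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Format one table cell as "quality% · ±tokens%", both relative to baseline.
// ASSUMPTION: ratio of medians for quality; the real harness may differ.
function cell(variant, baseline) {
  const quality = Math.round((median(variant.scores) / median(baseline.scores)) * 100);
  const delta = Math.round((variant.tokens / baseline.tokens - 1) * 100);
  const sign = delta > 0 ? "+" : delta < 0 ? "−" : "";
  return `${quality}% · ${sign}${Math.abs(delta)}%`;
}

// Illustrative inputs only (not measured results).
const baseline = { scores: [80, 90, 85, 95], tokens: 1000 };
const variant = { scores: [84, 88, 86, 92], tokens: 510 };
console.log(cell(variant, baseline));
```

A negative token figure means the variant generated less output than baseline; a positive one means it generated more.
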
Honey leads quality where it matters most: it tops the user-facing and agent-to-agent tiers (the quality-separating ones) and stays within judge noise of the pack on saturated code tasks, while cutting tokens where it's safe to.
The same pattern holds on GPT-5.5 (full two-provider table in
bench/results/cross-provider.md): Honey is the
only variant with no test regressions across all three tiers on Opus, and on
both models it keeps top-tier quality while cutting tokens on every tier.
Honey includes ESO, a zero-dependency, schema-first format for agent handoffs. Repeated record keys are emitted once; declared row counts catch truncated messages; JSON-compatible cells preserve types.
The reproducible ESO/TOON/JSON benchmark measures bytes,
two tokenizer estimates, codec speed, and lossless recovery across five agent
handoff shapes. Run it with npm run bench:eso.
printf '%s' '{"from":"reviewer","findings":[{"sev":"H","issue":"expired token"}]}' | eso encode
eso decode < handoff.eso
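
The three properties described above (keys emitted once, a declared row count, JSON-compatible cells) can be illustrated with a small sketch. This is not the real ESO wire format, which this README does not spell out; the layout and the function names `encodeRows` and `decodeRows` are hypothetical, and the sketch assumes every row shares the keys of the first row.

```javascript
// Hypothetical schema-first layout (NOT the actual ESO format):
//   line 1: JSON array of keys, emitted once
//   line 2: declared row count
//   then:   one JSON array of cells per row, so types survive
function encodeRows(rows) {
  const keys = rows.length ? Object.keys(rows[0]) : [];
  const lines = [JSON.stringify(keys), String(rows.length)];
  for (const row of rows) {
    // Missing values become null so every row has the same width.
    lines.push(JSON.stringify(keys.map((k) => row[k] ?? null)));
  }
  return lines.join("\n");
}

function decodeRows(text) {
  const [header, count, ...body] = text.split("\n");
  const keys = JSON.parse(header);
  const declared = Number(count);
  // The declared count is what catches a message cut off in transit.
  if (body.length !== declared) {
    throw new Error(`truncated: declared ${declared} rows, got ${body.length}`);
  }
  return body.map((line) => {
    const cells = JSON.parse(line);
    return Object.fromEntries(keys.map((k, i) => [k, cells[i]]));
  });
}
```

Because each cell is plain JSON, numbers, booleans, and nested values round-trip with their types intact, and `JSON.stringify` escapes embedded newlines so one row always stays on one line.
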
ESO is lossless, for handoffs where every row matters. CCR (Compress-Cache-Retrieve)
is the lossy-but-recoverable lever for the opposite case: a long uniform array you must
read but mostly skim — logs, scan results, event streams. It keeps an informative sample
(endpoints, anomalies/change-points, head/tail), caches the dropped rows locally, and
leaves a <<ccr:HASH N_rows_offloaded>> sentinel. Nothing is lost — retrieve restores
the original by hash on demand.
some-tool | eso crush # → sampled view + sentinel; originals cached in .honey-ccr/
eso retrieve <hash> # → the full original array, verbatim
Validated on a 90-row log (opus-4.8 + gpt-5.5): −82% tokens, crushed-only 96%
answer accuracy, 100% with retrieve — and the lone crushed miss was a refusal, not a
hallucination. Benches: npm run bench:ccr (tokens) and npm run bench:ccr:comprehension
(quality). The honey-ccr skill tells the agent when to reach for it.
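
The crush-then-retrieve idea can be sketched in a few lines. This is an illustration under stated assumptions, not the `eso crush` implementation: the cache is an in-memory `Map` standing in for `.honey-ccr/`, the sample keeps only head, tail, and rows flagged by a caller-supplied `isAnomaly` predicate, and the function names and the 12-character hash length are hypothetical. The sentinel string is modeled on the shape quoted above.

```javascript
const { createHash } = require("node:crypto");

// In-memory stand-in for the local cache (the real tool writes to .honey-ccr/).
const cache = new Map();

// Keep the first/last `keep` rows plus any anomalies; offload the rest.
function crush(rows, { keep = 2, isAnomaly = () => false } = {}) {
  const kept = rows.filter(
    (row, i) => i < keep || i >= rows.length - keep || isAnomaly(row)
  );
  const dropped = rows.length - kept.length;
  if (dropped === 0) return { view: rows, hash: null }; // nothing to offload

  const original = JSON.stringify(rows);
  const hash = createHash("sha256").update(original).digest("hex").slice(0, 12);
  cache.set(hash, original); // originals stay recoverable by hash
  return { view: [...kept, `<<ccr:${hash} ${dropped}_rows_offloaded>>`], hash };
}

// Restore the full original array from the cache.
function retrieve(hash) {
  if (!cache.has(hash)) throw new Error(`unknown hash ${hash}`);
  return JSON.parse(cache.get(hash));
}
```

The sampled view is what the agent reads by default; the sentinel tells it how many rows were offloaded and which hash to pass to `retrieve` if a question needs the full data.
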
Pick Honey when you want the best quality-per-token, especially in Claude Code.
Honey is one always-on core plus a family of on-demand tools. The core is a writing style (it must be the default to pay off); the rest are actions you reach for at a specific moment.
| Name | Kind | What it does |
|---|---|---|
| honey | core skill (always-on) | the three levers, applied reflexively to every response. /honey [lite\|full\|ultra\|off] |
| honey-design | satellite skill | for user-facing UI (landing pages, components): keeps the full rendered polish, cuts tokens by writing the design densely (CSS vars, shared classes, clamp()); same pixels, fewer tokens |
| honey-review | satellite skill | review a diff for over-engineering + over-verbosity; terse delete-list |
| honey-eco | satellite skill | this session's CO₂ / $ / tokens saved, from the committed EcoLogits port |
| honey-gain | satellite skill | the committed benchmark scoreboard (reads bench/results/ at runtime) |
| honey-compress | satellite skill | rewrite a re-read memory file (CLAUDE.md, AGENTS.md) tersely to cut input tokens; backs up the original |
| honey-memory | satellite skill | create + maintain one committed per-project PROJECT.md so agents stop re-discovering the same facts every cold session; stores only stable, not-in-the-code context, kept honest by living in git |
| honey-ccr | satellite skill | crush huge redundant array tool output (logs, scan results) to a sampled view; lossy-but-recoverable via eso crush/retrieve |
| honey-hive | guide skill | decide when to delegate to the hive vs. work inline |
| hive-scout | subagent (haiku, read-only) | locate symbols / callers / configs; returns a compact id-keyed JSON map |
| hive-reviewer | subagent (haiku, read-only) | review a diff/files; returns columnar id-keyed JSON findings |
| hive-builder | subagent (sonnet, ≤2 files) | make a surgical edit under the ladder; returns a compact change-manifest |
The hive is Lever 3 with a runtime: each subagent returns a compressed handoff,
so the result injected back into the orchestrator's context is −44–53% smaller
with zero loss (npm run bench:hive). Live, the skills hold up too — honey −86%,
honey-review −70%, hive-reviewer −43% output tokens at passing correctness
(npm run bench:skills). See bench/hive/RESULTS.md and
bench/skills/RESULTS.md.
On user-facing work — where the core skill spends tokens because polish is the
spec — honey-design keeps the same rendered polish for −19% output tokens vs no
skill (judge 92 vs 90), beating the core skill on both axes across 7 landing-page/UI
tasks. See bench/results/honey-design.md.
Honesty note. Earlier versions of this README quoted 92% / 78% / 73% quality and −57% / −65% / −70% tokens from an unpublished run. Those don't reproduce: the real quality spread is far narrower and the token savings are tier-dependent (and Ponytail adds tokens on simple code). The table above is what the committed bench/ harness actually produces; see bench/results/combined.md for the full