Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
# Add to your Claude Code skills
git clone https://github.com/headroomlabs-ai/headroom
Last scanned: 6/22/2026
{
"issues": [
{
"file": "README.md",
"line": 344,
"type": "remote-install",
"message": "Install command (remote install script piped to a shell — review the source before running): \"curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\"",
"severity": "low"
}
],
"status": "PASSED",
"scannedAt": "2026-06-22T09:48:12.622Z",
"npmAuditRan": true,
"pipAuditRan": true,
"promptInjectionRan": true
}

Headroom compresses everything your AI agent reads (tool outputs, logs, RAG chunks, files, and conversation history) before it reaches the LLM. Same answers, a fraction of the tokens.
- compress(messages) in Python or TypeScript, inline in any app
- headroom proxy --port 8787, zero code changes, any language
- headroom wrap claude|codex|cursor|aider|copilot in one command
- headroom_compress, headroom_retrieve, headroom_stats for any MCP client
- headroom learn: mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md

Your agent / app
(Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
│ prompts · tool outputs · logs · RAG results · files
▼
┌────────────────────────────────────────────────────┐
│ Headroom (runs locally — your data stays here) │
│ ──────────────────────────────────────────────── │
│ CacheAligner → ContentRouter → CCR │
│ ├─ SmartCrusher (JSON) │
│ ├─ CodeCompressor (AST) │
│ └─ Kompress-base (text, HF) │
│ │
│ Cross-agent memory · headroom learn · MCP │
└────────────────────────────────────────────────────┘
│ compressed prompt + retrieval tool
▼
LLM provider (Anthropic · OpenAI · Bedrock · …)
headroom_retrieve if it needs them

→ Architecture · CCR reversible compression · Kompress-v2-base model card
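
To make the routing step in the diagram concrete, here is a minimal sketch of how a content router could pick a compressor by payload type. It is not Headroom's implementation: `detect_kind`, `route`, and the three stand-in compressors are hypothetical names, and the stand-ins only do trivial work (compact JSON, drop code comments, collapse whitespace) in place of SmartCrusher, CodeCompressor, and Kompress.

```python
import ast
import json


def detect_kind(content: str) -> str:
    """Classify a payload as 'json', 'code', or 'text' (hypothetical heuristic)."""
    stripped = content.strip()
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass
    try:
        tree = ast.parse(content)
    except (SyntaxError, ValueError):
        return "text"
    # Only treat the payload as code if it contains definitions or imports.
    code_nodes = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
                  ast.Import, ast.ImportFrom)
    if any(isinstance(node, code_nodes) for node in tree.body):
        return "code"
    return "text"


# Stand-in compressors (assumptions for illustration only).
def compact_json(content: str) -> str:
    return json.dumps(json.loads(content), separators=(",", ":"))


def strip_code_comments(content: str) -> str:
    # Round-tripping through the AST drops comments (Python 3.9+).
    return ast.unparse(ast.parse(content))


def collapse_whitespace(content: str) -> str:
    return " ".join(content.split())


COMPRESSORS = {
    "json": compact_json,
    "code": strip_code_comments,
    "text": collapse_whitespace,
}


def route(content: str) -> tuple[str, str]:
    """Return (kind, compressed content) for one payload."""
    kind = detect_kind(content)
    return kind, COMPRESSORS[kind](content)


if __name__ == "__main__":
    for payload in ('{"a": 1,  "b": [1, 2]}',
                    "def f(x):\n    # add one\n    return x + 1\n",
                    "disk   full\n\n on node-3"):
        print(route(payload))
```

The dispatch table keeps detection separate from compression, so a new content type only needs a detector branch and one more entry in `COMPRESSORS`.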
# 1 — Install
pip install "headroom-ai[all]" # Python
npm install headroom-ai # Node / TypeScript
# 2 — Pick your mode
headroom wrap claude # wrap a coding agent
headroom proxy --port 8787 # drop-in proxy, zero code changes
# or: from headroom import compress # inline library
# 3 — See the savings
headroom perf
Granular extras: [proxy], [mcp], [ml], [code], [memory], [relevance], [image], [agno], [langchain], [evals], [pytorch-mps] (Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps). Requires Python 3.10+.
Savings on real agent workloads:
| Workload | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
| Codebase exploration | 78,502 | 41,254 | 47% |
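
The Savings column follows directly from the Before and After token counts. A short check, using only the numbers in the table above (the function name `savings_pct` is ours, not part of Headroom):

```python
def savings_pct(before: int, after: int) -> float:
    """Percentage of tokens removed: 100 * (before - after) / before."""
    return 100.0 * (before - after) / before


# (before, after) token counts copied from the table above.
workloads = {
    "Code search (100 results)": (17_765, 1_408),
    "SRE incident debugging": (65_694, 5_118),
    "GitHub issue triage": (54_174, 14_761),
    "Codebase exploration": (78_502, 41_254),
}

for name, (before, after) in workloads.items():
    print(f"{name}: {savings_pct(before, after):.0f}%")
```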
Accuracy preserved on standard benchmarks:
| Benchmark | Category | N | Baseline | Headroom | Delta |
|---|---|---|---|---|---|
| GSM8K | Math | 100 | 0.870 | 0.870 | ±0.000 |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |
| SQuAD v2 | QA | 100 | — | 97% | 19% compression |
| BFCL | Tools | 100 | — | 97% | 32% compression |
Reproduce: python -m headroom.evals suite --tier 1 · Full benchmarks & methodology
Everything above shrinks the prompt you send. But you also pay for every token the model writes back — and on Opus-class models output costs 5× input. A lot of that output is waste: "Great, let me…" preambles, re-printing code you just showed it, and deep "thinking" on routine steps like reading a file.
Headroom can trim that too, from the proxy, without you changing any code:
Turn it on:
export HEADROOM_OUTPUT_SHAPER=1 # off by default
headroom proxy --port 8787
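
If you launch the proxy from a Python script rather than a shell, the same two steps can be sketched as below. Only the command and the HEADROOM_OUTPUT_SHAPER variable come from the lines above; the helper `build_proxy_launch` is a hypothetical convenience, and the actual launch is left commented out.

```python
import os


def build_proxy_launch(port: int = 8787, output_shaper: bool = True):
    """Build the command and environment for `headroom proxy` (hypothetical helper)."""
    env = dict(os.environ)
    if output_shaper:
        env["HEADROOM_OUTPUT_SHAPER"] = "1"  # off by default
    else:
        env.pop("HEADROOM_OUTPUT_SHAPER", None)
    cmd = ["headroom", "proxy", "--port", str(port)]
    return cmd, env


cmd, env = build_proxy_launch()
print(" ".join(cmd))

# To actually start the proxy (requires headroom to be installed):
# import subprocess
# proxy = subprocess.Popen(cmd, env=env)
```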
Already running a proxy? These switches are read live on every request, but a proxy that headroom wrap reused (rather than started) would otherwise not see a value you export afterwards, because its environment was snapshotted at launch. headroom wrap now hot-syncs your current settings to the running proxy via a loopback POST /admin/runtime-env, so they take effect immediately with no restart (no cold start, no dropped requests, no lost caches). Set them before you wrap. On a shared proxy these overrides are global: the last explicit setting wins.
Learn the right terseness for you. People don't say how terse they want
answers — they show it (they interrupt long replies, or move on before they
could have read them). headroom learn --verbosity reads your past sessions and
picks the level automatically:
headroom learn --verbosity # preview what it found (dry run)
headroom learn --verbosity --apply # save it; the proxy uses it from now on
See how many output tokens you saved. Output savings are counterfactual — we never see what the model would have written — so Headroom reports an honest estimate with a confidence range, never a made-up number:
headroom output-savings
# Reduction: 31.7% (95% CI 27.7% … 35.7%) [estimated]
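
For readers unfamiliar with the format of that line, here is one common way to turn per-conversation estimates into a point estimate with a 95% confidence interval. This is a generic normal-approximation sketch, not Headroom's estimator (see the write-up linked below for the actual methodology), and the sample values are invented for illustration.

```python
import math
import statistics


def mean_with_ci(samples: list[float], z: float = 1.96):
    """Mean and normal-approximation 95% CI of a list of fractions."""
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - z * sem, mean + z * sem


def format_estimate(samples: list[float]) -> str:
    mean, low, high = mean_with_ci(samples)
    return f"Reduction: {mean:.1%} (95% CI {low:.1%} … {high:.1%}) [estimated]"


# Hypothetical per-conversation reduction estimates (fractions of output tokens).
samples = [0.25, 0.35, 0.30, 0.40, 0.28, 0.32]
print(format_estimate(samples))
```

With few samples a t-distribution or bootstrap interval would be wider and more defensible; the normal approximation is used here only to keep the sketch short.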
Want a measured number instead of an estimate? Leave 10% of conversations
unshaped as a control group: export HEADROOM_OUTPUT_HOLDOUT=0.1. The dashboard
shows an Output Tokens Saved card next to input compression, labelled
measured or estimated with the confidence band.
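
The holdout idea can be sketched in a few lines: assign a fixed fraction of conversations to an unshaped control group, then compare average output tokens between the two groups. How Headroom actually assigns conversations is not described here, so the hash-based assignment, the function names, and the token counts below are all assumptions.

```python
import hashlib
import statistics


def in_holdout(conversation_id: str, fraction: float) -> bool:
    """Deterministically place about `fraction` of conversations in the control group."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < fraction * 2**64


def measured_reduction(control_tokens: list[int], shaped_tokens: list[int]) -> float:
    """Fractional drop in mean output tokens, shaped versus control."""
    return 1.0 - statistics.fmean(shaped_tokens) / statistics.fmean(control_tokens)


# Roughly 10% of ids land in the control group, matching HEADROOM_OUTPUT_HOLDOUT=0.1.
ids = [f"conv-{i}" for i in range(10_000)]
share = sum(in_holdout(cid, 0.1) for cid in ids) / len(ids)
print(f"holdout share: {share:.3f}")

# Hypothetical output-token counts per conversation.
print(f"measured reduction: {measured_reduction([1000, 1200, 800], [700, 800, 600]):.1%}")
```

Hashing the conversation id keeps the assignment stable across requests, so a conversation never switches between shaped and unshaped partway through.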
→ Full write-up incl. the measurement methodology: docs/proposals/output-token-reduction.md
| Agent | headroom wrap | Notes |
|---|---|---|
headroom is an open-source AI Agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by headroomlabs-ai. It compresses tool outputs, logs, files, and RAG chunks before they reach the LLM, for 60-95% fewer tokens with the same answers, and is available as a library, proxy, and MCP server. It has 45,630 GitHub stars.
Yes. headroom passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.
Clone the repository with "git clone https://github.com/headroomlabs-ai/headroom" and add it to your Claude Code skills directory (see the Installation section above).
headroom is primarily written in Python. It is open-source under headroomlabs-ai on GitHub, so you can review or fork the full source.