headroom

Name: headroom
Author: headroomlabs-ai

Verified

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

45,630stars

3,167forks

Python

Added 6/22/2026

View on GitHub Download ZIP Scan for vulnerabilities

30 days in the Featured rail · terms & refunds

AI Agentsagentaianthropicclaude-codecompressioncontext-engineeringcontext-windowcursorfastapilangchainllmmcpopenaiprompt-engineeringproxypythonragtoken-optimizationtokenstypescript

Installation

# Add to your Claude Code skills
git clone https://github.com/headroomlabs-ai/headroom

Getting Started

Guides for using ai agents skills like headroom.

Security ReportVerified

Last scanned: 6/22/2026

{
  "issues": [
    {
      "file": "README.md",
      "line": 344,
      "type": "remote-install",
      "message": "Install command (remote install script piped to a shell — review the source before running): \"curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\"",
      "severity": "low"
    }
  ],
  "status": "PASSED",
  "scannedAt": "2026-06-22T09:48:12.622Z",
  "npmAuditRan": true,
  "pipAuditRan": true,
  "promptInjectionRan": true
}

README.md

Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

What it does

Library — compress(messages) in Python or TypeScript, inline in any app
Proxy — headroom proxy --port 8787, zero code changes, any language
Agent wrap — headroom wrap claude|codex|cursor|aider|copilot in one command
MCP server — headroom_compress, headroom_retrieve, headroom_stats for any MCP client
Cross-agent memory — shared store across Claude, Codex, Gemini, auto-dedup
headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md
Output token reduction — trims what the model writes back (not just what you send): drops ceremony/restated code and skips deep "thinking" on routine steps. See Output token reduction.
Reversible (CCR) — originals are cached for retrieval on demand

How it works (30 seconds)

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ────────────────────────────────────────────────  │
    │  CacheAligner  →  ContentRouter  →  CCR            │
    │                    ├─ SmartCrusher   (JSON)        │
    │                    ├─ CodeCompressor (AST)         │
    │                    └─ Kompress-base  (text, HF)    │
    │                                                    │
    │  Cross-agent memory  ·  headroom learn  ·  MCP     │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)

ContentRouter — detects content type, selects the right compressor
SmartCrusher / CodeCompressor / Kompress-base — compress JSON, AST, or prose
CacheAligner — stabilizes prefixes so provider KV caches actually hit
CCR — stores originals locally; LLM calls headroom_retrieve if it needs them

→ Architecture · CCR reversible compression · Kompress-v2-base model card

Get started (60 seconds)

# 1 — Install
pip install "headroom-ai[all]"          # Python
npm install headroom-ai                 # Node / TypeScript

# 2 — Pick your mode
headroom wrap claude                    # wrap a coding agent
headroom proxy --port 8787              # drop-in proxy, zero code changes
# or: from headroom import compress      # inline library

# 3 — See the savings
headroom perf

Granular extras: [proxy], [mcp], [ml], [code], [memory], [relevance], [image], [agno], [langchain], [evals], [pytorch-mps] (Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps). Requires Python 3.10+.

Proof

Savings on real agent workloads:

Workload	Before	After	Savings
Code search (100 results)	17,765	1,408	92%
SRE incident debugging	65,694	5,118	92%
GitHub issue triage	54,174	14,761	73%
Codebase exploration	78,502	41,254	47%

Accuracy preserved on standard benchmarks:

Benchmark	Category	N	Baseline	Headroom	Delta
GSM8K	Math	100	0.870	0.870	±0.000
TruthfulQA	Factual	100	0.530	0.560	+0.030
SQuAD v2	QA	100	—	97%	19% compression
BFCL	Tools	100	—	97%	32% compression

Reproduce: python -m headroom.evals suite --tier 1 · Full benchmarks & methodology

Output token reduction (cut what the model writes back)

Everything above shrinks the prompt you send. But you also pay for every token the model writes back — and on Opus-class models output costs 5× input. A lot of that output is waste: "Great, let me…" preambles, re-printing code you just showed it, and deep "thinking" on routine steps like reading a file.

Headroom can trim that too, from the proxy, without you changing any code:

Verbosity steering — appends a short "be terse, don't restate context" note to the end of the system prompt (so your prompt cache still hits).
Effort routing — when a turn is just the model resuming after a tool result (a file read, a passing test), it dials the model's thinking effort down. New questions and errors keep full effort.

Turn it on:

export HEADROOM_OUTPUT_SHAPER=1     # off by default
headroom proxy --port 8787

Already running a proxy? These switches are read live on every request, so a proxy that headroom wrap reused (rather than started) would not see a value you export afterwards — its environment was snapshotted at launch. headroom wrap now hot-syncs your current settings to the running proxy via a loopback POST /admin/runtime-env, so they take effect immediately with no restart (no cold start, no dropped requests, no lost caches). Set them before you wrap. On a shared proxy these overrides are global — the last explicit setting wins.

Learn the right terseness for you. People don't say how terse they want answers — they show it (they interrupt long replies, or move on before they could have read them). headroom learn --verbosity reads your past sessions and picks the level automatically:

headroom learn --verbosity            # preview what it found (dry run)
headroom learn --verbosity --apply    # save it; the proxy uses it from now on

See how many output tokens you saved. Output savings are counterfactual — we never see what the model would have written — so Headroom reports an honest estimate with a confidence range, never a made-up number:

headroom output-savings
# Reduction: 31.7%  (95% CI 27.7% … 35.7%)   [estimated]

Want a measured number instead of an estimate? Leave 10% of conversations unshaped as a control group: export HEADROOM_OUTPUT_HOLDOUT=0.1. The dashboard shows an Output Tokens Saved card next to input compression, labelled measured or estimated with the confidence band.

→ Full write-up incl. the measurement methodology: docs/proposals/output-token-reduction.md

Agent compatibility matrix

Agent	`headroom wrap`	Notes

Frequently Asked Questions

What is headroom?

headroom is an open-source ai agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by headroomlabs-ai. Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server. It has 45,630 GitHub stars.

Is headroom safe to use?

Yes. headroom passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install headroom?

Clone the repository with "git clone https://github.com/headroomlabs-ai/headroom" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is headroom written in?

headroom is primarily written in Python. It is open-source under headroomlabs-ai on GitHub, so you can review or fork the full source.

Are there alternatives to headroom?

Yes. SkillsLLM lists many other AI Agents skills you can browse and compare side by side. Open the AI Agents category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh headroom against similar tools.

Agentic AI for Beginners

Build your first AI agent from scratch - tool use, ReAct pattern, memory, deployment

41 minBeginner

Comments (0)

to leave a comment.

No comments yet. Be the first to share your thoughts!

Related Skills

superpowers

by obra

An agentic skills framework & software development methodology that works.

234,966

20,863

Shell

AI Agentsaibrainstorming

View details

Compare

ECC

by affaan-m

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

219,557

33,651

JavaScript

AI Agentsai-agentsanthropic

The agent that grows with you

199,395

35,414

Python

AI Agentsaiai-agent

View details

Compare

n8n

by n8n-io

Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

193,572

58,743

TypeScript

MCP Serversaiapis

View details

Compare

everything-claude-code

by affaan-m

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

185,940

28,768

JavaScript

AI Agentsai-agentsanthropic

View details

Compare

claude-code

by anthropics

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.

120,031

19,897

Shell

AI Agents

View details

Compare

Browse all AI Agents skills

JeecgBoot CowAgent

Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

What it does

Library — compress(messages) in Python or TypeScript, inline in any app
Proxy — headroom proxy --port 8787, zero code changes, any language
Agent wrap — headroom wrap claude|codex|cursor|aider|copilot in one command
MCP server — headroom_compress, headroom_retrieve, headroom_stats for any MCP client
Cross-agent memory — shared store across Claude, Codex, Gemini, auto-dedup
headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md
Output token reduction — trims what the model writes back (not just what you send): drops ceremony/restated code and skips deep "thinking" on routine steps. See Output token reduction.
Reversible (CCR) — originals are cached for retrieval on demand

How it works (30 seconds)

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ────────────────────────────────────────────────  │
    │  CacheAligner  →  ContentRouter  →  CCR            │
    │                    ├─ SmartCrusher   (JSON)        │
    │                    ├─ CodeCompressor (AST)         │
    │                    └─ Kompress-base  (text, HF)    │
    │                                                    │
    │  Cross-agent memory  ·  headroom learn  ·  MCP     │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)

ContentRouter — detects content type, selects the right compressor
SmartCrusher / CodeCompressor / Kompress-base — compress JSON, AST, or prose
CacheAligner — stabilizes prefixes so provider KV caches actually hit
CCR — stores originals locally; LLM calls headroom_retrieve if it needs them

→ Architecture · CCR reversible compression · Kompress-v2-base model card

Get started (60 seconds)

# 1 — Install
pip install "headroom-ai[all]"          # Python
npm install headroom-ai                 # Node / TypeScript

# 2 — Pick your mode
headroom wrap claude                    # wrap a coding agent
headroom proxy --port 8787              # drop-in proxy, zero code changes
# or: from headroom import compress      # inline library

# 3 — See the savings
headroom perf

Proof

Savings on real agent workloads:

Workload	Before	After	Savings
Code search (100 results)	17,765	1,408	92%
SRE incident debugging	65,694	5,118	92%
GitHub issue triage	54,174	14,761	73%
Codebase exploration	78,502	41,254	47%

Accuracy preserved on standard benchmarks:

Benchmark	Category	N	Baseline	Headroom	Delta
GSM8K	Math	100	0.870	0.870	±0.000
TruthfulQA	Factual	100	0.530	0.560	+0.030
SQuAD v2	QA	100	—	97%	19% compression
BFCL	Tools	100	—	97%	32% compression

Reproduce: python -m headroom.evals suite --tier 1 · Full benchmarks & methodology

Output token reduction (cut what the model writes back)

Headroom can trim that too, from the proxy, without you changing any code:

Verbosity steering — appends a short "be terse, don't restate context" note to the end of the system prompt (so your prompt cache still hits).
Effort routing — when a turn is just the model resuming after a tool result (a file read, a passing test), it dials the model's thinking effort down. New questions and errors keep full effort.

Turn it on:

export HEADROOM_OUTPUT_SHAPER=1     # off by default
headroom proxy --port 8787

Already running a proxy? These switches are read live on every request, so a proxy that headroom wrap reused (rather than started) would not see a value you export afterwards — its environment was snapshotted at launch. headroom wrap now hot-syncs your current settings to the running proxy via a loopback POST /admin/runtime-env, so they take effect immediately with no restart (no cold start, no dropped requests, no lost caches). Set them before you wrap. On a shared proxy these overrides are global — the last explicit setting wins.

headroom learn --verbosity            # preview what it found (dry run)
headroom learn --verbosity --apply    # save it; the proxy uses it from now on

headroom output-savings
# Reduction: 31.7%  (95% CI 27.7% … 35.7%)   [estimated]

→ Full write-up incl. the measurement methodology: docs/proposals/output-token-reduction.md

Agent compatibility matrix

Agent	`headroom wrap`	Notes