by agiwhitelist
Local streaming reverse proxy between AI coding agents (Claude Code, Cursor, Codex) and model APIs (Anthropic, OpenAI, Gemini, MiniMax). Meters every token + USD cost, compacts bloated context to cut pay-per-token API spend, and runs shadow-eval to prove quality held. ccusage-style metering + live local dashboard.

    # Add to your Claude Code skills
    git clone https://github.com/agiwhitelist/tokdiet

tokdiet is an open-source AI agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by agiwhitelist. It has 69 GitHub stars.
tokdiet is primarily written in TypeScript. It is open-source under agiwhitelist on GitHub, so you can review or fork the full source.
Your AI agent is paying to send the same file dump five times. tokdiet is a local proxy that sits between your agent and the model API, meters every token, puts your bloated context on a diet — and proves the answer didn't get worse.
ccusage that shrinks the bill — without losing quality.
🌐 Live demo (watch one request lose the weight): agiwhitelist.github.io/tokdiet 📝 Launch write-up + full benchmark methodology: I cut an AI agent's input tokens by 71% and quality held — here's the 66-task benchmark
Every "context optimizer" cuts tokens. The scary question is the one they can't answer:
"If I cut the context, does the model get dumber?"
So we measured it. A 66-task A/B benchmark across 6 categories on a real model (MiniMax‑M3), each task run twice — full context (baseline) vs through tokdiet (governed) — graded against the known answer, repeated ×3 and majority‑voted to cancel model noise:
| | baseline | tokdiet | change |
|---|---|---|---|
| input tokens | 5.07M | 1.46M | −71% |
| quality (66 tasks) | 64/66 | 63/66 | ≈ parity (95–97%) |

198 paired runs · LLM-judge 92% similarity · confirmed on a 2nd model (MiniMax-M2.5: −72%)
−71% tokens, quality on par with baseline. Real requests, real grading — not a mock. The ~1–2 task gap is model nondeterminism plus the model declining to echo a secret — not context loss; the hardest "needle buried in junk" adversarial cases pass, because tokdiet doesn't delete blindly — it pages cold context out recoverably and protects anything on‑topic. Reproduce it yourself: node bench/run.mjs (needs an API key in env).
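The scoring described above (three runs per task, majority-voted, plus the token delta) can be sketched as follows. This is an illustrative reconstruction, not the code in `bench/run.mjs`; the `TaskRuns` shape and the function names are assumptions made for the example.

```typescript
// Hypothetical shape of one benchmark task: pass/fail results for each arm.
type TaskRuns = { baseline: boolean[]; governed: boolean[] };

// A task counts as passed when strictly more than half of its runs passed.
function majorityPass(runs: boolean[]): boolean {
  const passes = runs.filter(Boolean).length;
  return passes * 2 > runs.length;
}

// Percentage change in input tokens; negative means tokens were saved.
function tokenChangePct(baselineTokens: number, governedTokens: number): number {
  return ((governedTokens - baselineTokens) / baselineTokens) * 100;
}

// Number of tasks passed by each arm after majority voting.
function score(tasks: TaskRuns[]): { baseline: number; governed: number } {
  return {
    baseline: tasks.filter((t) => majorityPass(t.baseline)).length,
    governed: tasks.filter((t) => majorityPass(t.governed)).length,
  };
}

// Token figures reported above: 5.07M baseline vs 1.46M through tokdiet.
console.log(Math.round(tokenChangePct(5.07e6, 1.46e6))); // -71
```

Voting over an odd number of runs means a single noisy failure cannot flip a task's result, which is why the benchmark repeats each task three times.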
| | shows your bill | cuts the bill | proves quality held |
|---|---|---|---|
| eyeballing /cost, ccusage | ✅ | ❌ | ❌ |
| manual /compact, hand-pruning context | ❌ | ✅ (blind) | ❌ |
| tokdiet | ✅ | ✅ | ✅ measured + auto safe-mode |
Everyone shows the bill or cuts it blind. tokdiet is the one that cuts it and proves the model didn't get dumber — and stops cutting the moment it might.

    # 1. Start the proxy (and live dashboard); no install needed
    npx tokdiet start

    # 2. Point your agent at the proxy instead of the real API
    export ANTHROPIC_BASE_URL=http://localhost:7787
    export OPENAI_BASE_URL=http://localhost:7787/v1

Now run your agent (Claude Code, Cursor, Codex, your own script) as usual. Traffic flows through tokdiet, gets metered and compacted, and is forwarded upstream unchanged in every way that matters.
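For the "your own script" case, a minimal sketch of aiming an Anthropic Messages API call at the proxy might look like this. `buildProxiedRequest` is a hypothetical helper and the model id is a placeholder; only the default proxy address comes from this README, while the `/v1/messages` path and headers follow Anthropic's public API.

```typescript
interface ProxiedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds a Messages API request that targets the proxy instead of the real API host.
function buildProxiedRequest(baseUrl: string, apiKey: string, prompt: string): ProxiedRequest {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/messages`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey, // the proxy forwards this upstream
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify({
        model: "your-model-id", // placeholder, not a real model name
        max_tokens: 256,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

const req = buildProxiedRequest("http://localhost:7787", "sk-example", "hello");
// With the proxy running, send it with: await fetch(req.url, req.init)
```

Official SDKs typically do the same thing when `ANTHROPIC_BASE_URL` is set, so most agents need no code changes at all.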
Your API key stays with you. tokdiet reads x-api-key / Authorization only to forward them upstream. They are never written to SQLite and never written to any log. And it's fail‑open: if anything inside the governor errors, it falls back to transparent passthrough — the proxy will never break your request or surface its own 5xx.
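The fail-open behaviour can be pictured with a small sketch. It assumes a hypothetical `govern` step standing in for the whole governor pipeline, and the `redactForLog` helper is an illustrative way to keep credentials out of anything that gets logged; neither is tokdiet's actual implementation.

```typescript
type Body = { messages: unknown[] };

// Runs the (hypothetical) governor; on any error the original body is passed through untouched.
function failOpen(body: Body, govern: (b: Body) => Body): { body: Body; governed: boolean } {
  try {
    return { body: govern(body), governed: true };
  } catch {
    return { body, governed: false };
  }
}

// Replaces credential header values before any header map is logged or stored.
function redactForLog(headers: Record<string, string>): Record<string, string> {
  const secret = new Set(["x-api-key", "authorization"]);
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name] = secret.has(name.toLowerCase()) ? "[redacted]" : value;
  }
  return out;
}
```

Catching at the outermost boundary of the governor keeps the failure mode simple: the worst case is an un-compacted request, never a broken one.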
Default ports: proxy `7787`, dashboard `7878`. Override with `--port` / `--dashboard-port`.
tokdiet ships as a Claude Code plugin via its own marketplace:

    /plugin marketplace add agiwhitelist/tokdiet
    /plugin install tokdiet

What the plugin does — and what it doesn't. The plugin ships a lightweight
metering hook plus a /tokdiet command. The hook runs on every tool call
(PreToolUse + PostToolUse) and logs tool I/O byte sizes to
~/.tokdiet/tool-meter.log. It does not save tokens by itself — a plugin
can't set ANTHROPIC_BASE_URL for the Claude Code process, so it can't route
your traffic through the compacting proxy.
The actual token savings come from the proxy. Start it and point Claude Code at it (this is what gives you the ~−71% token reduction):

    npx tokdiet start
    export ANTHROPIC_BASE_URL=http://localhost:7787   # then launch Claude Code from this shell

View metered tokens, cost, and savings any time with npx tokdiet report, or run
/tokdiet inside Claude Code for these instructions.
Claude Code is the flagship use case, and it has two landmines a naive compacting proxy walks straight into. tokdiet handles both:
- Prompt caching. Claude Code marks its stable prompt prefix with `cache_control`; cached input costs ~10% of normal. Rewriting that prefix invalidates the cache and can make a request cost more. tokdiet is cache-aware: it never touches content at or before a `cache_control` breakpoint.
- Extended thinking. Responses can carry signed `thinking` blocks that Anthropic requires returned verbatim; touching one is an instant 400. tokdiet is thinking-safe: signed/thinking blocks are never surfaced or mutated.

Both are covered by regression tests (`tests/cc-compat.test.ts`).
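As a rough illustration of those two rules, the sketch below selects which content blocks a compactor would be allowed to touch. It is not taken from tokdiet's source: the `Block` type is a simplified stand-in for Anthropic content blocks, flattened into a single list, and `compactableIndexes` is a hypothetical name.

```typescript
// Simplified content block; real Anthropic blocks carry more fields.
interface Block {
  type: string; // e.g. "text", "tool_result", "thinking"
  cache_control?: { type: string }; // present on a cache breakpoint
}

// Returns the indexes of blocks that are safe to compact:
// strictly after the last cache_control breakpoint, and never thinking blocks.
function compactableIndexes(blocks: Block[]): number[] {
  let lastBreakpoint = -1;
  blocks.forEach((b, i) => {
    if (b.cache_control) lastBreakpoint = i;
  });

  const out: number[] = [];
  blocks.forEach((b, i) => {
    if (i <= lastBreakpoint) return; // cached prefix: leave byte-identical
    if (b.type === "thinking" || b.type === "redacted_thinking") return; // must round-trip verbatim
    out.push(i);
  });
  return out;
}
```

Everything at or before the last breakpoint stays untouched so the provider-side cache key still matches on the next request.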
A note on honesty: the dollar‑savings story applies to pay‑per‑token API keys (MiniMax, Anthropic API, OpenAI, …). On a flat Claude subscription there are no per‑token charges to cut, so the value there is metering, budgets, and the live dashboard — not dollars.
tokdiet is a streaming reverse proxy. SSE responses are proxied incrementally (never buffered whole), so your agent's tokens still stream in real time.

                                tokdiet (localhost:7787)
    agent ──────────────────────────────────────────────────────────────────► model API
    (Claude   request   ┌───────────┐  ┌───────┐  ┌────────┐  ┌───────────┐   (Anthropic /
     Code,   ─────────► │interceptor│─►│ meter │─►│ budget │─►│ compactor │──► OpenAI /
     Cursor,  raw key   └───────────┘  └───────┘  └────────┘  └─────┬─────┘    Gemini /
     Codex,   forwarded  detect         count      session/         │          MiniMax)
     …)                  provider,      tokens     day / repo       │ dedup / elision /
                         keep body      & cost     limits           ▼ mid-summarize
                         byte-faithful                      ┌───────────────┐
              response                                      │ quality guard │
    ◄───────────────────────────────────────────────────────┤ shadow-eval + │
              streamed back, token-for-token                │ safe-mode     │
                                  ┌──────────────┐          └───────┬───────┘
                                  │ store(SQLite)│◄─────────────────┘
                                  │ + dashboard  │  telemetry, savings, degradation
                                  └──────────────┘

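The "proxied incrementally, never buffered whole" property can be sketched as below: each network chunk is forwarded the moment it arrives, and only the incomplete tail of the current SSE event is held for metering. `createSseObserver` is a hypothetical name, and the sketch assumes events are separated by `\n\n` (the SSE format also permits `\r\n` line endings, which this ignores).

```typescript
// Forwards every chunk immediately and reports which SSE events that chunk completed.
function createSseObserver(forward: (chunk: string) => void) {
  let tail = ""; // only the unfinished event is ever held in memory

  return function push(chunk: string): string[] {
    forward(chunk); // pass through unchanged, before any parsing
    tail += chunk;
    const events = tail.split("\n\n");
    tail = events.pop() ?? ""; // the last piece is incomplete (or empty)
    return events;
  };
}

// Usage: the downstream sink sees chunks as-is; the meter sees whole events.
const sent: string[] = [];
const push = createSseObserver((c) => sent.push(c));
push('data: {"a"');
push(':1}\n\ndata: x');
```

Because forwarding happens before parsing, a bug in the observer can delay metering but cannot delay or alter what the agent receives.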
Blind compaction is "delete and pray." tokdiet treats your context like virtual memory: hot content (recent, pinned, relevant to the current question) stays resident; cold content (stale, redundant) is paged out to a local store as a recoverable stub — not deleted. The full block is kept in SQLite keyed by an id, so it can be audited and (roadmap) paged back in on demand when the model actually needs it.
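A toy version of the paging idea, under stated assumptions: an in-memory `Map` stands in for the SQLite store, the stub text format is invented for the example, and the `hot` flag is supplied by the caller rather than computed from recency or relevance.

```typescript
interface ContextBlock {
  text: string;
  hot: boolean; // assumed to be decided elsewhere (recent, pinned, on-topic)
}

// Small non-cryptographic hash (FNV-1a), used here only to derive a stable id.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// In-memory stand-in for the local store that keeps full blocks by id.
const coldStore = new Map<string, string>();

// Hot blocks stay resident; cold blocks are replaced by a short recoverable stub.
function pageOut(blocks: ContextBlock[]): string[] {
  return blocks.map((b) => {
    if (b.hot) return b.text;
    const id = fnv1a(b.text);
    coldStore.set(id, b.text);
    return `[paged out: id=${id}, ${b.text.length} chars]`;
  });
}

// Recovers the full block for auditing or paging back in.
function pageIn(id: string): string | undefined {
  return coldStore.get(id);
}
```

The important property is that `pageOut` never discards text: every stub carries an id that resolves back to the original block.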
| Mechanism | What it does |
|---|---|
| Shadow‑eval | Re‑runs a sampled fraction of compacted requests against the un‑compacted baseline and scores the divergence (0 = identical, 100 = unrelated). This is the measurement that answers "did quality drop?" |
| Quality budget | A hard ceiling on acceptable measured degradation (qualityBudget.maxDegradationPct, default 2%). As you approach it, the compactor restricts itself to its safest strategies. |
| Safe‑mode | If rolling degradation exceeds the budget, the offending strategy is disabled (per‑strategy) and a safe-mode event fires. Savings stop before quality does. |
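How the three mechanisms fit together can be sketched as a per-strategy rolling tracker. The 2% default comes from `qualityBudget.maxDegradationPct` above; the window size, the use of a plain mean of divergence scores as "degradation", and the `createQualityGuard` name are assumptions made for illustration.

```typescript
// Default from the table above; the window size is an assumed value.
const MAX_DEGRADATION_PCT = 2;
const WINDOW = 20;

function createQualityGuard(maxPct: number = MAX_DEGRADATION_PCT, window: number = WINDOW) {
  const scores = new Map<string, number[]>();
  const disabled = new Set<string>();

  return {
    // Record one shadow-eval divergence score (0 = identical, 100 = unrelated).
    record(strategy: string, divergence: number): void {
      const recent = [...(scores.get(strategy) ?? []), divergence].slice(-window);
      scores.set(strategy, recent);
      const mean = recent.reduce((a, b) => a + b, 0) / recent.length;
      if (mean > maxPct) disabled.add(strategy); // safe-mode for this strategy only
    },
    // A disabled strategy stays off; other strategies are unaffected.
    isEnabled(strategy: string): boolean {
      return !disabled.has(strategy);
    },
  };
}
```

Tracking each strategy separately means one misbehaving strategy can be switched off without giving up the savings from the others.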
… `KEY=VALUE`, URLs, paths, numbers) and storing the full body for recovery. Recent, pinned, and question-relevant results are kept intact.

    tokdiet <command> [flags]   # alias: td

| Command | What it does | Key flags |
|---|---|---|
| `start` | Run the proxy + live dashboard | `--port`, `--dashboard-port`, `--no-dashboard`, `--config <path>` |
| `report` | Print a usage report (or export) | `--since <days>`, `--json`, `--csv <file>`, `--config <path>` |
| `init` | Scaffold `tokdiet.config.json` in the cwd | `--force` |
| `install-claude-plugin` | Install an idempotent Claude Code metering