tokdiet

Name: tokdiet
Author: agiwhitelist

Local streaming reverse proxy between AI coding agents (Claude Code, Cursor, Codex) and model APIs (Anthropic, OpenAI, Gemini, MiniMax). Meters every token + USD cost, compacts bloated context to cut pay-per-token API spend, and runs shadow-eval to prove quality held. ccusage-style metering + live local dashboard.

69 stars

2 forks

TypeScript

Installation

# Add to your Claude Code skills
git clone https://github.com/agiwhitelist/tokdiet

Frequently Asked Questions

What is tokdiet?

tokdiet is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by agiwhitelist. It is a local streaming reverse proxy that sits between AI coding agents (Claude Code, Cursor, Codex) and model APIs (Anthropic, OpenAI, Gemini, MiniMax). It meters every token and its USD cost, compacts bloated context to cut pay-per-token API spend, and runs shadow-eval to prove quality held. It also provides ccusage-style metering and a live local dashboard. It has 69 GitHub stars.

How do I install tokdiet?

Clone the repository with "git clone https://github.com/agiwhitelist/tokdiet" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is tokdiet written in?

tokdiet is primarily written in TypeScript. It is open-source under agiwhitelist on GitHub, so you can review or fork the full source.

tokdiet

Your AI agent is paying to send the same file dump five times. tokdiet is a local proxy that sits between your agent and the model API, meters every token, puts your bloated context on a diet — and proves the answer didn't get worse.

ccusage that shrinks the bill — without losing quality.

tokdiet — −71% tokens, quality = baseline

🌐 Live demo (watch one request lose the weight): agiwhitelist.github.io/tokdiet

📝 Launch write-up + full benchmark methodology: I cut an AI agent's input tokens by 71% and quality held — here's the 66-task benchmark

The proof (this is the whole point)

Every "context optimizer" cuts tokens. The scary question is the one they can't answer:

"If I cut the context, does the model get dumber?"

So we measured it. A 66-task A/B benchmark across 6 categories on a real model (MiniMax‑M3), each task run twice — full context (baseline) vs through tokdiet (governed) — graded against the known answer, repeated ×3 and majority‑voted to cancel model noise:

                       baseline      tokdiet
  input tokens          5.07M    →    1.46M       −71%
  quality (66 tasks)     64/66        63/66        ≈ parity (95–97%)
  ─────────────────────────────────────────────────────────
  198 paired runs · LLM-judge 92% similarity · confirmed on a 2nd model (MiniMax-M2.5: −72%)

−71% tokens, quality on par with baseline. Real requests, real grading — not a mock. The ~1–2 task gap is model nondeterminism plus the model declining to echo a secret — not context loss; the hardest "needle buried in junk" adversarial cases pass, because tokdiet doesn't delete blindly — it pages cold context out recoverably and protects anything on‑topic. Reproduce it yourself: node bench/run.mjs (needs an API key in env).
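The headline numbers follow from simple arithmetic on the table above. The sketch below is not the project's benchmark harness (that is bench/run.mjs); `majority` and `reductionPct` are hypothetical helper names, and the pass/fail arrays are made-up inputs used only to show how three repeats collapse into one verdict.

```typescript
// Hypothetical helpers that mirror the arithmetic behind the table.

// A task counts as passed when more than half of its repeated runs passed.
function majority(passes: boolean[]): boolean {
  return passes.filter(Boolean).length * 2 > passes.length;
}

// Percentage of input tokens removed relative to the baseline.
function reductionPct(baselineTokens: number, governedTokens: number): number {
  return (1 - governedTokens / baselineTokens) * 100;
}

// Pass rate as a whole-number percentage.
function passRatePct(passed: number, total: number): number {
  return Math.round((passed / total) * 100);
}

console.log(Math.round(reductionPct(5.07e6, 1.46e6))); // 71
console.log(passRatePct(64, 66), passRatePct(63, 66)); // 97 95
console.log(majority([true, false, true]));            // true
```

With three repeats per task, a single noisy failure cannot flip a verdict, which is why the 66 tasks produce 198 paired runs.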

How it compares

	shows your bill	cuts the bill	proves quality held
eyeballing `/cost`, ccusage	✅	❌	❌
manual `/compact`, hand-pruning context	❌	✅ (blind)	❌
tokdiet	✅	✅	✅ measured + auto safe-mode

Everyone shows the bill or cuts it blind. tokdiet is the one that cuts it and proves the model didn't get dumber — and stops cutting the moment it might.

Quick start

# 1. Start the proxy (and live dashboard) — no install needed
npx tokdiet start

# 2. Point your agent at the proxy instead of the real API
export ANTHROPIC_BASE_URL=http://localhost:7787
export OPENAI_BASE_URL=http://localhost:7787/v1

Now run your agent (Claude Code, Cursor, Codex, your own script) as usual. Traffic flows through tokdiet, where it is metered and compacted, and is then forwarded upstream with your API key passed along as sent.
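For a hand-written script, pointing at the proxy comes down to swapping the base URL. The sketch below builds (but does not send) an Anthropic Messages API request aimed at the proxy's default port. `buildProxiedRequest`, the placeholder key, and the `model` argument are hypothetical names for illustration, not part of tokdiet.

```typescript
// Hypothetical helper: builds the request a script would send through the proxy.
// Only the base URL differs from a direct call to the Anthropic API.
interface ProxiedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildProxiedRequest(
  baseUrl: string, // e.g. the value of ANTHROPIC_BASE_URL
  apiKey: string,  // your own key, forwarded upstream by the proxy
  model: string,   // whichever model id your account uses
  prompt: string,
): ProxiedRequest {
  return {
    // Strip any trailing slash so the path is joined cleanly.
    url: baseUrl.replace(/\/+$/, "") + "/v1/messages",
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model,
      max_tokens: 256,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

const req = buildProxiedRequest("http://localhost:7787", "sk-placeholder", "your-model-id", "hello");
console.log(req.url); // http://localhost:7787/v1/messages
```

The resulting object can be handed to `fetch(req.url, req)`. Official SDKs that honor `ANTHROPIC_BASE_URL` or `OPENAI_BASE_URL` do the equivalent internally.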

Your API key stays with you. tokdiet reads x-api-key / Authorization only to forward them upstream. They are never written to SQLite and never written to any log. And it's fail‑open: if anything inside the governor errors, it falls back to transparent passthrough — the proxy will never break your request or surface its own 5xx.
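The two guarantees in this paragraph can be sketched in a few lines. This is an illustration of the pattern, not tokdiet's source: `Governor`, `governFailOpen`, and `headersForStorage` are hypothetical names, and the governor is assumed to be a synchronous function over the request body.

```typescript
// A governor is any function that rewrites a request body (hypothetical shape).
type Governor = (body: string) => string;

interface GovernResult {
  body: string;       // what gets forwarded upstream
  compacted: boolean; // false means transparent passthrough
  error?: string;     // kept only so the caller can record what went wrong
}

// Fail-open: if the governor throws, forward the original body untouched.
function governFailOpen(body: string, governor: Governor): GovernResult {
  try {
    const out = governor(body);
    return { body: out, compacted: out !== body };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { body, compacted: false, error: message };
  }
}

// Credentials are needed for forwarding only, so drop them before persisting.
const SECRET_HEADERS = new Set(["x-api-key", "authorization"]);

function headersForStorage(headers: Record<string, string>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!SECRET_HEADERS.has(name.toLowerCase())) safe[name] = value;
  }
  return safe;
}
```

Wrapping the whole governor in one `try` keeps the failure path trivially simple: whatever goes wrong inside, the agent sees the same response it would have seen without the proxy.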

Default ports: proxy 7787, dashboard 7878. Override with --port / --dashboard-port.

Install via Claude Code

tokdiet ships as a Claude Code plugin via its own marketplace:

/plugin marketplace add agiwhitelist/tokdiet
/plugin install tokdiet

What the plugin does — and what it doesn't. The plugin ships a lightweight metering hook plus a /tokdiet command. The hook runs on every tool call (PreToolUse + PostToolUse) and logs tool I/O byte sizes to ~/.tokdiet/tool-meter.log. It does not save tokens by itself — a plugin can't set ANTHROPIC_BASE_URL for the Claude Code process, so it can't route your traffic through the compacting proxy.

The actual token savings come from the proxy. Start it and point Claude Code at it (this is what produces the roughly 71% token reduction):

npx tokdiet start
export ANTHROPIC_BASE_URL=http://localhost:7787   # then launch Claude Code from this shell

View metered tokens, cost, and savings any time with npx tokdiet report, or run /tokdiet inside Claude Code for these instructions.

Works with Claude Code (and it's careful about it)

Claude Code is the flagship use case, and it has two landmines a naive compacting proxy walks straight into. tokdiet handles both:

Prompt caching. Claude Code marks a cached prefix with cache_control; cached input costs ~10% of normal. Rewriting that prefix invalidates the cache and can make a request cost more. tokdiet is cache‑aware — it never touches content at or before a cache_control breakpoint.
Extended thinking. Claude Code sends signed thinking blocks that Anthropic requires returned verbatim; touching one is an instant 400. tokdiet is thinking‑safe — signed/thinking blocks are never surfaced or mutated.

Both are covered by regression tests (tests/cc-compat.test.ts).
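One way to picture both rules is as a filter that decides which content blocks a compactor may even look at. The sketch below assumes a simplified Anthropic-style message shape and takes the last `cache_control` marker in the messages as the boundary. `editablePositions` is a hypothetical name, and real requests can also carry breakpoints on system prompts and tool definitions, which this sketch ignores.

```typescript
// Simplified content block: only the fields this sketch inspects are typed.
interface Block {
  type: string;
  cache_control?: { type: string };
  [key: string]: unknown;
}

interface Message {
  role: string;
  content: Block[];
}

interface Position {
  message: number;
  block: number;
}

// Returns the positions a compactor may rewrite: strictly after the last
// cache breakpoint, and never a thinking block.
function editablePositions(messages: Message[]): Position[] {
  const flat: { pos: Position; block: Block }[] = [];
  messages.forEach((m, mi) =>
    m.content.forEach((b, bi) => flat.push({ pos: { message: mi, block: bi }, block: b })),
  );

  let lastBreakpoint = -1;
  flat.forEach((entry, i) => {
    if (entry.block.cache_control) lastBreakpoint = i;
  });

  return flat
    .filter(
      (entry, i) =>
        i > lastBreakpoint &&
        entry.block.type !== "thinking" &&
        entry.block.type !== "redacted_thinking",
    )
    .map((entry) => entry.pos);
}
```

Everything outside the returned positions is forwarded as received, so the cached prefix stays byte-identical and signed thinking blocks are returned verbatim.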

A note on honesty: the dollar‑savings story applies to pay‑per‑token API keys (MiniMax, Anthropic API, OpenAI, …). On a flat Claude subscription there are no per‑token charges to cut, so the value there is metering, budgets, and the live dashboard — not dollars.

How it works

tokdiet is a streaming reverse proxy. SSE responses are proxied incrementally (never buffered whole), so your agent's tokens still stream in real time.

                            tokdiet (localhost:7787)
   agent  ─────────────────────────────────────────────────────────────►  model API
 (Claude  request    ┌───────────┐  ┌───────┐  ┌────────┐  ┌───────────┐   (Anthropic /
  Code,  ──────────► │interceptor│─►│ meter │─►│ budget │─►│ compactor │──►   OpenAI /
  Cursor, raw key    └───────────┘  └───────┘  └────────┘  └─────┬─────┘      Gemini /
  Codex,  forwarded   detect          count      session/        │ dedup / elision /  MiniMax)
  …)                  provider,       tokens     day / repo      │ mid-summarize
                      keep body        & cost     limits          ▼
                      byte-faithful                          ┌───────────────┐
   response                                                  │ quality guard │
 ◄──────────────────────────────────────────────────────────┤ shadow-eval + │
   streamed back, token-for-token                            │  safe-mode    │
                                          ┌──────────────┐   └───────┬───────┘
                                          │ store(SQLite)│◄──────────┘
                                          │ + dashboard  │  telemetry, savings, degradation
                                          └──────────────┘

Context as virtual memory (the idea)

Blind compaction is "delete and pray." tokdiet treats your context like virtual memory: hot content (recent, pinned, relevant to the current question) stays resident; cold content (stale, redundant) is paged out to a local store as a recoverable stub — not deleted. The full block is kept in SQLite keyed by an id, so it can be audited and (roadmap) paged back in on demand when the model actually needs it.
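A minimal sketch of the paging idea, under stated assumptions: an in-memory `Map` stands in for the SQLite store, "hot" is reduced to "pinned or recent", and the stub text format is invented for illustration. `PageStore`, `pageOutCold`, and `maxAgeTurns` are hypothetical names.

```typescript
interface ContextBlock {
  text: string;
  pinned: boolean;
  ageTurns: number; // how many turns ago this block entered the conversation
}

// Stand-in for the local store: full bodies are kept, keyed by an id.
class PageStore {
  private readonly bodies = new Map<string, string>();
  private nextId = 1;

  pageOut(text: string): string {
    const id = `blk-${this.nextId++}`;
    this.bodies.set(id, text);
    return id;
  }

  pageIn(id: string): string | undefined {
    return this.bodies.get(id);
  }
}

// Hot blocks stay resident; cold blocks are replaced by a recoverable stub.
function pageOutCold(blocks: ContextBlock[], store: PageStore, maxAgeTurns: number): string[] {
  return blocks.map((b) => {
    const hot = b.pinned || b.ageTurns <= maxAgeTurns;
    if (hot) return b.text;
    const id = store.pageOut(b.text);
    return `[paged out: ${id}, ${b.text.length} chars]`;
  });
}
```

Because the stub carries the id, nothing is lost: `store.pageIn(id)` returns the original text for auditing.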

The 3 quality mechanisms

Mechanism	What it does
Shadow‑eval	Re‑runs a sampled fraction of compacted requests against the un‑compacted baseline and scores the divergence (0 = identical, 100 = unrelated). This is the measurement that answers "did quality drop?"
Quality budget	A hard ceiling on acceptable measured degradation (`qualityBudget.maxDegradationPct`, default 2%). As you approach it, the compactor restricts itself to its safest strategies.
Safe‑mode	If rolling degradation exceeds the budget, the offending strategy is disabled (per‑strategy) and a `safe-mode` event fires. Savings stop before quality does.
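The interaction between the quality budget and safe-mode can be sketched as a small per-strategy guard. This is an illustration under assumptions, not the project's implementation: the shadow-eval divergence score is treated directly as a degradation percentage, "rolling" is a mean over a fixed window, and `SafeModeGuard` and `windowSize` are hypothetical names. Only the 2% default comes from the table above.

```typescript
class SafeModeGuard {
  private readonly maxDegradationPct: number;
  private readonly windowSize: number;
  private readonly scores = new Map<string, number[]>();
  private readonly disabled = new Set<string>();

  constructor(maxDegradationPct = 2, windowSize = 20) {
    this.maxDegradationPct = maxDegradationPct;
    this.windowSize = windowSize;
  }

  // Record one shadow-eval score (0 = identical, 100 = unrelated) for a strategy.
  // Returns true if this sample pushed the strategy into safe-mode.
  record(strategy: string, divergence: number): boolean {
    if (this.disabled.has(strategy)) return false;
    const previous = this.scores.get(strategy) ?? [];
    const recent = [...previous, divergence].slice(-this.windowSize);
    this.scores.set(strategy, recent);
    const rolling = recent.reduce((sum, s) => sum + s, 0) / recent.length;
    if (rolling > this.maxDegradationPct) {
      this.disabled.add(strategy);
      return true;
    }
    return false;
  }

  isEnabled(strategy: string): boolean {
    return !this.disabled.has(strategy);
  }
}
```

Tracking scores per strategy means one misbehaving strategy can be switched off while the others keep saving tokens.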

Compaction strategies (safest‑first)

Dedup — loss‑free. When the same large block is re‑pasted across a conversation, keep the freshest copy verbatim and replace earlier copies with a pointer marker. Works on near‑duplicates too (a file re‑pasted with a few lines changed), not just byte‑identical ones.
Elision — recoverable. Page out the bulk of old tool results (file dumps, command output), keeping a preview plus the salient lines (errors, ids, KEY=VALUE, URLs, paths, numbers) and storing the full body for recovery. Recent, pinned, and question‑relevant results are kept intact.
Mid‑summarize (off by default) — summarize mid‑history with a cheap model. Opt‑in (it costs money).
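The first two strategies can be sketched in simplified form. Both functions below are assumptions for illustration: `dedupBlocks` handles exact duplicates only (the near-duplicate matching described above is not attempted), `elide` uses a small hypothetical pattern list for "salient" lines, and the marker text is invented. A real implementation would also store the full body before eliding it.

```typescript
// Dedup: keep the freshest copy of a repeated block, point earlier copies at it.
function dedupBlocks(blocks: string[], minLength = 20): string[] {
  const lastIndex = new Map<string, number>();
  blocks.forEach((b, i) => {
    if (b.length >= minLength) lastIndex.set(b, i);
  });
  return blocks.map((b, i) => {
    const keep = lastIndex.get(b);
    return keep !== undefined && keep !== i ? `[duplicate of block ${keep}]` : b;
  });
}

// Lines worth keeping even when the bulk of a tool result is paged out.
const SALIENT: RegExp[] = [/error/i, /https?:\/\//, /^[A-Z_][A-Z0-9_]*=/];

// Elision: keep a short preview plus salient lines, summarize the rest.
function elide(output: string, previewLines = 2): string {
  const lines = output.split("\n");
  const preview = lines.slice(0, previewLines);
  const salient = lines.slice(previewLines).filter((l) => SALIENT.some((re) => re.test(l)));
  const dropped = lines.length - preview.length - salient.length;
  if (dropped <= 0) return output;
  return [...preview, ...salient, `[${dropped} lines elided]`].join("\n");
}
```

Running dedup before elision matches the safest-first ordering: the loss-free step removes what it can before the recoverable step touches anything.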

Commands

tokdiet <command> [flags]   # alias: td

Command	What it does	Key flags
`start`	Run the proxy + live dashboard	`--port`, `--dashboard-port`, `--no-dashboard`, `--config <path>`
`report`	Print a usage report (or export)	`--since <days>`, `--json`, `--csv <file>`, `--config <path>`
`init`	Scaffold `tokdiet.config.json` in the cwd	`--force`
`install-claude-plugin`	Install an idempotent Claude Code metering