by fkiene
Local proxy that compresses your LLM API requests so you pay less, with no change to the answers. Trims wasted tokens from prompts, history, tool output, and code before they're sent: -31% input / -74% output, measured live. Any provider, no extra model calls. Also an MCP server and embeddable library (Rust, Python, Ruby, Kotlin, Swift, JS/TS).
# Add to your Claude Code skills
git clone https://github.com/fkiene/llmtrim
llmtrim is an open-source AI agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by fkiene. It has 85 GitHub stars.
llmtrim is primarily written in Rust. It is open-source under fkiene on GitHub, so you can review or fork the full source.
You run Claude Code, Codex, Cursor, or your own app. Every time it talks to an LLM, it sends a big blob of text: your system prompt, the tool definitions, the whole conversation history, and the raw output of every command it ran. You pay for every one of those tokens, on every single turn.
A lot of that text is waste. A 200-line build log where only 2 lines are errors. A tool schema resent identically 50 times. A JSON array with 500 near-identical rows. The model doesn't need the bulk of it to answer well, but you're billed for all of it.
llmtrim removes the waste before it's sent. It installs as a local proxy that sits between your tool and the LLM provider. Requests pass through it, get compressed, and continue to the provider. The reply comes back unchanged. Your tool doesn't know it's there; you just get a smaller bill.
before: your tool ───── full request ─────▶ OpenAI / Anthropic / …
◀──────── reply ──────────
after: your tool ──▶ llmtrim ──smaller──▶ OpenAI / Anthropic / …
(on your machine)
◀──────── reply ────────── (same answer)
[!IMPORTANT] It can never make your bill bigger or break a request. Every compression step is re-measured with the provider's real tokenizer; if a step doesn't actually save tokens, it's reverted. If the provider rejects the compressed request, the original is resent verbatim. Worst case is zero savings, never a worse outcome.
Everything runs locally. Nothing is ever sent to us.
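Here is a minimal sketch of that guard. It assumes a whitespace word count as a stand-in for the provider's real tokenizer. The names `compress_guarded`, `send_with_fallback`, and the two toy steps are hypothetical, not llmtrim's actual API:

```rust
/// Stand-in token counter (assumption): counts whitespace-separated words.
/// The real proxy measures with the provider's own tokenizer instead.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// A compression step: any function from text to (hopefully smaller) text.
type Step = fn(&str) -> String;

/// Apply each step in order, adopting its output only if it strictly lowers
/// the token count; otherwise the previous text is kept (the step is "reverted").
fn compress_guarded(input: &str, steps: &[Step]) -> String {
    let mut current = input.to_string();
    for step in steps {
        let candidate = step(&current);
        if count_tokens(&candidate) < count_tokens(&current) {
            current = candidate; // the step paid off, keep it
        }
    }
    current
}

/// Send the compressed body; if the provider rejects it, resend the original verbatim.
fn send_with_fallback<F>(original: &str, compressed: &str, mut send: F) -> Result<String, String>
where
    F: FnMut(&str) -> Result<String, String>,
{
    match send(compressed) {
        Ok(reply) => Ok(reply),
        Err(_) => send(original),
    }
}

/// Toy step (hypothetical): drop a line identical to the one just before it.
fn dedup_adjacent_lines(s: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    for line in s.lines() {
        if out.last() != Some(&line) {
            out.push(line);
        }
    }
    out.join("\n")
}

/// Toy step (hypothetical) that makes things worse, so the guard rejects it.
fn verbose_wrapper(s: &str) -> String {
    format!("Please read the following text carefully: {s}")
}
```

A step that grows the text, like `verbose_wrapper` here, is never adopted, so the output can only shrink or stay the same size.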
Here's one real thing llmtrim does, end to end. An AI agent ran a build, and the bash tool returned a 58-line log. Only two lines matter (the errors), but all 58 get sent to the model and billed.
Before, what the model would receive (58 lines, 4,662 chars):
[2026-06-13T10:02:00Z] INFO compiling module core::worker::task_0 (incremental)
[2026-06-13T10:02:01Z] INFO compiling module core::worker::task_1 (incremental)
[2026-06-13T10:02:02Z] INFO compiling module core::worker::task_2 (incremental)
... 27 more near-identical INFO lines ...
[2026-06-13T10:02:31Z] ERROR src/worker/pool.rs:214: mismatched types: expected `usize`, found `i64`
... 25 more INFO lines ...
[2026-06-13T10:03:01Z] ERROR src/net/conn.rs:88: cannot borrow `buf` as mutable more than once
[2026-06-13T10:03:02Z] INFO build failed, 2 errors
After, what llmtrim sends instead (5 lines, 978 chars, −79%):
[{}] INFO compiling module core::worker::task_{} (incremental) [×30: (10:02:00Z..10:02:29Z step 1s; 0..29)]
[2026-06-13T10:02:31Z] ERROR src/worker/pool.rs:214: mismatched types: expected `usize`, found `i64`
[{}] INFO compiling module core::net::conn_{} (incremental) [×25: 10:02:32Z..10:02:56Z; 0..24]
[2026-06-13T10:03:01Z] ERROR src/net/conn.rs:88: cannot borrow `buf` as mutable more than once
[2026-06-13T10:03:02Z] INFO build failed, 2 errors
Both errors and the summary survive verbatim. The repetitive INFO lines fold into a template plus their values, losslessly, because each run's values are regular (task_0..task_29, conn_0..conn_24). The model still sees exactly what happened; it just costs a fifth as much.
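Here is a rough Rust sketch of the folding idea. It illustrates the technique under simplifying assumptions and is not llmtrim's actual fold. Every run of ASCII digits becomes a `{}` placeholder, so a timestamp turns into several placeholders rather than one field. Input lines must not already contain a literal `{}`. A column is written as a range only when its values step by a constant amount; otherwise every value is listed. The names `fold_log`, `templatize`, and `summarize` are made up for this example, and so is the three-line minimum run length.

```rust
/// Split a line into a template (each run of ASCII digits replaced by `{}`)
/// and the removed digit runs, kept as text so leading zeros survive.
fn templatize(line: &str) -> (String, Vec<String>) {
    let mut template = String::new();
    let mut values = Vec::new();
    let mut digits = String::new();
    for ch in line.chars() {
        if ch.is_ascii_digit() {
            digits.push(ch);
            continue;
        }
        if !digits.is_empty() {
            template.push_str("{}");
            values.push(std::mem::take(&mut digits));
        }
        template.push(ch);
    }
    if !digits.is_empty() {
        template.push_str("{}");
        values.push(digits);
    }
    (template, values)
}

/// Describe one placeholder column compactly but reversibly: "a..b" for a +1
/// sequence, "a..b step d" for another constant positive step, else a comma list.
fn summarize(col: &[String]) -> String {
    // Only canonical integers (no leading zeros) can be folded into a range losslessly.
    let nums: Option<Vec<u64>> = col
        .iter()
        .map(|s| s.parse::<u64>().ok().filter(|n| n.to_string() == *s))
        .collect();
    if let Some(n) = nums {
        if n.len() >= 2 && n[1] > n[0] {
            let step = n[1] - n[0];
            if n.windows(2).all(|w| w[1] > w[0] && w[1] - w[0] == step) {
                let (first, last) = (n[0], n[n.len() - 1]);
                return if step == 1 {
                    format!("{first}..{last}")
                } else {
                    format!("{first}..{last} step {step}")
                };
            }
        }
    }
    col.join(",")
}

/// Fold runs of 3+ consecutive lines that share a template into one line.
fn fold_log(log: &str) -> Vec<String> {
    let parsed: Vec<(&str, String, Vec<String>)> = log
        .lines()
        .map(|l| {
            let (t, v) = templatize(l);
            (l, t, v)
        })
        .collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < parsed.len() {
        // Extend the run while the next line has the same template.
        let mut j = i + 1;
        while j < parsed.len() && parsed[j].1 == parsed[i].1 {
            j += 1;
        }
        let run = &parsed[i..j];
        let width = run[0].2.len();
        if run.len() >= 3 && width > 0 {
            let cols: Vec<String> = (0..width)
                .map(|c| {
                    let col: Vec<String> = run.iter().map(|r| r.2[c].clone()).collect();
                    summarize(&col)
                })
                .collect();
            out.push(format!("{} [×{}: {}]", run[0].1, run.len(), cols.join("; ")));
        } else {
            // Short runs and lines without digits pass through verbatim.
            out.extend(run.iter().map(|r| r.0.to_string()));
        }
        i = j;
    }
    out
}
```

Error lines differ from their neighbours in template, so they always end up in a run of one and pass through untouched.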
If that's useful to you, a ⭐ helps other people find it.
Try it yourself on any request body:
echo '{"model":"gpt-4o","messages":[...]}' | llmtrim compress --provider openai
Log-folding is just one of ten compressors. A different one re-encodes bulky JSON arrays into a compact table, with the same data in a third of the tokens:
before: [{"id":1,"city":"Paris","ok":true},{"id":2,"city":"Lyon","ok":false}, … 200 rows]
after: [200]{id,city,ok}: 1,Paris,true; 2,Lyon,false; … (TOON encoding, lossless)
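Here is a minimal sketch of the table re-encoding. It mirrors the one-line rendering above and is not a complete TOON encoder. It assumes values arrive as already-formatted text and does no quoting or escaping. It therefore gives up (returns `None`) when rows don't share the same keys in the same order, or when a value contains `,` or `;`. A real encoder must also keep JSON types distinguishable, for example the string `"true"` versus the boolean `true`. `encode_table` is a hypothetical name, not llmtrim's API.

```rust
/// Encode uniform records as `[N]{k1,k2,..}: v1,v2; v1,v2; ...`.
/// Returns None when the rows aren't uniform or a value contains a delimiter.
fn encode_table(rows: &[Vec<(&str, &str)>]) -> Option<String> {
    let first = rows.first()?;
    // The header lists each key once instead of repeating it on every row.
    let keys: Vec<&str> = first.iter().map(|(k, _)| *k).collect();
    let mut encoded_rows = Vec::with_capacity(rows.len());
    for row in rows {
        let row_keys: Vec<&str> = row.iter().map(|(k, _)| *k).collect();
        if row_keys != keys {
            return None; // ragged records: keep the original JSON
        }
        if row.iter().any(|(_, v)| v.contains(',') || v.contains(';')) {
            return None; // would need escaping, which this sketch doesn't do
        }
        let vals: Vec<&str> = row.iter().map(|(_, v)| *v).collect();
        encoded_rows.push(vals.join(","));
    }
    Some(format!("[{}]{{{}}}: {}", rows.len(), keys.join(","), encoded_rows.join("; ")))
}
```

The saving comes from the header: `id`, `city`, and `ok` are written once instead of 200 times, along with all the quotes and braces.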
Each compressor fires only where it pays:
| Where the waste is | What llmtrim does |
|---|---|
| Tool output (build logs, diffs, grep dumps, big JSON) | Keep the signal (errors, changes, matches), fold the noise |
| Long context (pasted docs, history) | Rank and keep the chunks relevant to the question; drop the rest |
| Source code | Keep the bodies of relevant functions, reduce the rest to signatures |
| Tool schemas (resent every turn) | Trim descriptions, drop unused tools, keep the cache prefix stable |
| JSON / record arrays | Re-encode to a compact table format, sample huge arrays |
| The model's reply | Ask for terser output where it won't hurt the answer |
Stages run in savings order. Nothing under a `cache_control` marker is ever rewritten.
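Here is one way to picture those two rules, as a sketch only. It orders stages by an estimated saving, then lets each stage rewrite every block except the ones carrying a cache marker. The names `Block`, `Stage`, `est_savings`, and `run_pipeline` are hypothetical. The source doesn't say how llmtrim ranks its stages.

```rust
/// A content block in a request; `cached` marks a block under a cache_control marker.
struct Block {
    text: String,
    cached: bool,
}

/// A named stage with an estimated saving, used here only for ordering (assumption).
struct Stage {
    name: &'static str,
    est_savings: usize,
    apply: fn(&str) -> String,
}

/// Run stages from highest to lowest estimated saving, skipping cached blocks.
/// Returns the order in which the stages ran.
fn run_pipeline(blocks: &mut [Block], stages: &[Stage]) -> Vec<&'static str> {
    let mut order: Vec<&Stage> = stages.iter().collect();
    order.sort_by(|a, b| b.est_savings.cmp(&a.est_savings));
    for stage in &order {
        // Cached blocks are never touched, so the provider's cache prefix stays byte-identical.
        for block in blocks.iter_mut().filter(|b| !b.cached) {
            block.text = (stage.apply)(&block.text);
        }
    }
    order.iter().map(|s| s.name).collect()
}
```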
| Stage | What it does | When it runs |
|---|---|---|
| tool-output | Lossless template fold first, then window logs · diffs · grep · dumps down to errors / changes / matches | tool results |
| cache discipline | Mark + stabilize the invariant prefix (sort tools/schema · OpenAI prompt_cache_key) so it stays cached | tools |
| lexical retrieval | BM25+ ranking with RM3 feedback · TextTiling topic cuts · budgeted non-redundant selection; question protected | long context |
| skeletonization | tree-sitter keeps relevant function bodies, drops the rest to signatures (14 languages) | code |
| serialize + hygiene | Minify JSON, encode record arrays to TOON or CSV, Unicode-normalize | always · lossless |
| json sample | Down-sample huge record arrays: first/last + outliers + a query-biased diverse sample | big JSON |
| dedup | Collapse duplicate + near-duplicate lines (prose only) | always |
| output control | Terse instruction · Chain-of-Draft · token budget · native JSON schema | auto |
| tool layer | Static tool selection + description trimming | tools |
| multimodal | Downscale images to the provider's resolution cap | images |
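The lexical retrieval row can be illustrated with a sketch of plain BM25+ scoring over chunks. It leaves out the RM3 query expansion, TextTiling segmentation, and budgeted selection that the table mentions. The tokenizer (lowercased alphanumeric runs), the IDF variant `ln((N - n + 0.5)/(n + 0.5) + 1)`, and the parameter values in the test are assumptions. `bm25_plus` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Lowercased runs of alphanumeric characters (simplistic tokenizer, assumption).
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Score each chunk against the query with BM25+: the classic BM25 term weight
/// plus a lower bound `delta` for every query term the chunk actually contains.
fn bm25_plus(chunks: &[&str], query: &str, k1: f64, b: f64, delta: f64) -> Vec<f64> {
    let docs: Vec<Vec<String>> = chunks.iter().map(|c| tokenize(c)).collect();
    let n = docs.len() as f64;
    let avgdl = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / n.max(1.0);

    // Document frequency: in how many chunks each term appears.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for d in &docs {
        let mut seen: Vec<&str> = d.iter().map(|s| s.as_str()).collect();
        seen.sort_unstable();
        seen.dedup();
        for t in seen {
            *df.entry(t).or_insert(0) += 1;
        }
    }

    let q_terms = tokenize(query);
    docs.iter()
        .map(|d| {
            let len = d.len() as f64;
            q_terms
                .iter()
                .map(|q| {
                    let tf = d.iter().filter(|t| *t == q).count() as f64;
                    if tf == 0.0 {
                        return 0.0; // absent terms contribute nothing
                    }
                    let nq = *df.get(q.as_str()).unwrap_or(&0) as f64;
                    let idf = ((n - nq + 0.5) / (nq + 0.5) + 1.0).ln();
                    let norm = k1 * (1.0 - b + b * len / avgdl);
                    idf * (tf * (k1 + 1.0) / (tf + norm) + delta)
                })
                .sum::<f64>()
        })
        .collect()
}
```

In a full pipeline, the highest-scoring chunks would then be kept up to a token budget. The values `k1 = 1.2`, `b = 0.75`, `delta = 1.0` used below are common textbook defaults, not llmtrim's settings.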
The default mode, `auto`, switches each stage on only where it pays; `safe` runs only the lossless stages.
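Here is a tiny sketch of how the two modes could gate the stage list. It assumes each stage carries a `lossless` flag, and the names `Mode`, `StageInfo`, and `eligible` are hypothetical. In `auto`, each eligible stage would still have to pass the measure-and-revert check described earlier.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Mode {
    Auto,
    Safe,
}

/// Hypothetical stage descriptor: a name plus whether the stage is lossless.
struct StageInfo {
    name: &'static str,
    lossless: bool,
}

/// Stages allowed to run in `mode`: every stage in Auto, only lossless ones in Safe.
fn eligible(stages: &[StageInfo], mode: Mode) -> Vec<&'static str> {
    stages
        .iter()
        .filter(|s| mode == Mode::Auto || s.lossless)
        .map(|s| s.name)
        .collect()
}
```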
[!NOTE] Works with any tool that routes through `HTTPS_PROXY`: Claude Code, Codex, Cursor, Aider, your own app. GitHub Copilot pins its certificates and can't be intercepted.
# 1. Install (any OS, prebuilt binary, no Rust needed)
npm install -g @llmtrim/cli@latest && llmtrim setup
# 2. Open a new shell. Your AI tools now route through llmtrim automatically.
# 3. Watch the savings add up as you work
llmtrim status --watch
No Node? Use an installer instead:
# Linux / macOS
curl -fsSL https://raw.githubusercontent.com/fkiene/llmtrim/main/install.sh | sh
# Windows (PowerShell)
irm https://raw.githubusercontent.com/fkiene/llmtrim/main/install.ps1 | iex
Or your own package manager, same binary everywhere: brew install fkiene/tap/llmtrim · cargo binstall llmtrim · scoop install llmtrim · docker run ghcr.io/fkiene/llmtrim. Full options in INSTALL.md.
`llmtrim setup` installs a local HTTPS proxy, the same technique as mitmproxy, scoped to LLM APIs. It changes exactly three things (a CA certificate in ~/.llmtrim/, a proxy setting in your shell profile, and a login service), and `llmtrim uninstall` reverses all three. No API keys are stored (it forwards your tool's own auth), and your prompts never touch disk; only an anonymous count of tokens saved is kept. Full threat model: SECURITY.md.