llmtrim

Name: llmtrim
Author: fkiene

Pending

Local proxy that compresses your LLM API requests so you pay less, with no change to the answers. Trims wasted tokens from prompts, history, tool output, and code before they're sent: -31% input / -74% output, measured live. Any provider, no extra model calls. Also an MCP server and embeddable library (Rust, Python, Ruby, Kotlin, Swift, JS/TS).

85stars

5forks

Rust

Installation

# Add to your Claude Code skills
git clone https://github.com/fkiene/llmtrim

Getting Started

Guides for using ai agents skills like llmtrim.

README.md

Frequently Asked Questions

What is llmtrim?

llmtrim is an open-source ai agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by fkiene. Local proxy that compresses your LLM API requests so you pay less, with no change to the answers. Trims wasted tokens from prompts, history, tool output, and code before they're sent: -31% input / -74% output, measured live. Any provider, no extra model calls. Also an MCP server and embeddable library (Rust, Python, Ruby, Kotlin, Swift, JS/TS). It has 85 GitHub stars.

Is llmtrim safe to use?

llmtrim's catalog security scan is still queued. You can run an instant dependency and prompt-injection check now with the "Scan for vulnerabilities" button above.

How do I install llmtrim?

Clone the repository with "git clone https://github.com/fkiene/llmtrim" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is llmtrim written in?

llmtrim is primarily written in Rust. It is open-source under fkiene on GitHub, so you can review or fork the full source.

Are there alternatives to llmtrim?

Yes. SkillsLLM lists many other AI Agents skills you can browse and compare side by side. Open the AI Agents category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh llmtrim against similar tools.

Agentic AI for Beginners

Build your first AI agent from scratch - tool use, ReAct pattern, memory, deployment

41 minBeginner

Comments (0)

to leave a comment.

No comments yet. Be the first to share your thoughts!

Related Skills

superpowers

by obra

An agentic skills framework & software development methodology that works.

234,966

alpaca-skills prompt-refine-skill

What it actually does

You run Claude Code, Codex, Cursor, or your own app. Every time it talks to an LLM, it sends a big blob of text: your system prompt, the tool definitions, the whole conversation history, and the raw output of every command it ran. You pay for every one of those tokens, on every single turn.

A lot of that text is waste. A 200-line build log where only 2 lines are errors. A tool schema resent identically 50 times. A JSON array with 500 near-identical rows. The model doesn't need the bulk of it to answer well, but you're billed for all of it.

llmtrim removes the waste before it's sent. It installs as a local proxy that sits between your tool and the LLM provider. Requests pass through it, get compressed, and continue to the provider. The reply comes back unchanged. Your tool doesn't know it's there; you just get a smaller bill.

  before:  your tool ───── full request ─────▶  OpenAI / Anthropic / …
                    ◀──────── reply ──────────

  after:   your tool ──▶ llmtrim ──smaller──▶  OpenAI / Anthropic / …
                            (on your machine)
                    ◀──────── reply ──────────  (same answer)

[!IMPORTANT] It can never make your bill bigger or break a request. Every compression step is re-measured with the provider's real tokenizer; if a step doesn't actually save tokens, it's reverted. If the provider rejects the compressed request, the original is resent verbatim. Worst case is zero savings, never a worse outcome.

Everything runs locally. Nothing is ever sent to us.

See it on real output

Here's one real thing llmtrim does, end to end. An AI agent ran a build, and the bash tool returned a 58-line log. Only two lines matter (the errors), but all 58 get sent to the model and billed.

Before, what the model would receive (58 lines, 4,662 chars):

[2026-06-13T10:02:00Z] INFO  compiling module core::worker::task_0 (incremental)
[2026-06-13T10:02:01Z] INFO  compiling module core::worker::task_1 (incremental)
[2026-06-13T10:02:02Z] INFO  compiling module core::worker::task_2 (incremental)
... 27 more near-identical INFO lines ...
[2026-06-13T10:02:31Z] ERROR src/worker/pool.rs:214: mismatched types: expected `usize`, found `i64`
... 25 more INFO lines ...
[2026-06-13T10:03:01Z] ERROR src/net/conn.rs:88: cannot borrow `buf` as mutable more than once
[2026-06-13T10:03:02Z] INFO  build failed, 2 errors

After, what llmtrim sends instead (5 lines, 978 chars, −79%):

[{}] INFO compiling module core::worker::task_{} (incremental) [×30: (10:02:00Z..10:02:29Z step 1s; 0..29)]
[2026-06-13T10:02:31Z] ERROR src/worker/pool.rs:214: mismatched types: expected `usize`, found `i64`
[{}] INFO compiling module core::net::conn_{} (incremental) [×25: 10:02:32Z..10:02:56Z; 0..24]
[2026-06-13T10:03:01Z] ERROR src/net/conn.rs:88: cannot borrow `buf` as mutable more than once
[2026-06-13T10:03:02Z] INFO  build failed, 2 errors

Both errors and the summary survive verbatim. The repetitive INFO lines fold into a template plus their values, losslessly, because the range is regular (task_0..task_29). The model still sees exactly what happened; it just costs a fifth as much.

If that's useful to you, a ⭐ helps other people find it.

Try it yourself on any request body:

echo '{"model":"gpt-4o","messages":[...]}' | llmtrim compress --provider openai

Log-folding is just one of ten compressors. A different one re-encodes bulky JSON arrays into a compact table, with the same data in a third of the tokens:

before:  [{"id":1,"city":"Paris","ok":true},{"id":2,"city":"Lyon","ok":false}, … 200 rows]
after:   [200]{id,city,ok}: 1,Paris,true; 2,Lyon,false; …          (TOON encoding, lossless)

Each compressor fires only where it pays:

Where the waste is	What llmtrim does
Tool output (build logs, diffs, grep dumps, big JSON)	Keep the signal (errors, changes, matches), fold the noise
Long context (pasted docs, history)	Rank and keep the chunks relevant to the question; drop the rest
Source code	Keep the bodies of relevant functions, reduce the rest to signatures
Tool schemas (resent every turn)	Trim descriptions, drop unused tools, keep the cache prefix stable
JSON / record arrays	Re-encode to a compact table format, sample huge arrays
The model's reply	Ask for terser output where it won't hurt the answer

Stages run in savings order. Nothing under a cache_control marker is ever rewritten.

Stage	What it does	When it runs
tool-output	Lossless template fold first, then window logs · diffs · grep · dumps down to errors / changes / matches	tool results
cache discipline	Mark + stabilize the invariant prefix (sort tools/schema · OpenAI `prompt_cache_key`) so it stays cached	tools
lexical retrieval	BM25+ ranking with RM3 feedback · TextTiling topic cuts · budgeted non-redundant selection; question protected	long context
skeletonization	tree-sitter keeps relevant function bodies, drops the rest to signatures (14 languages)	code
serialize + hygiene	Minify JSON, encode record arrays to TOON or CSV, Unicode-normalize	always · lossless
json sample	Down-sample huge record arrays: first/last + outliers + a query-biased diverse sample	big JSON
dedup	Collapse duplicate + near-duplicate lines (prose only)	always
output control	Terse instruction · Chain-of-Draft · token budget · native JSON schema	auto
tool layer	Static tool selection + description trimming	tools
multimodal	Downscale images to the provider's resolution cap	images

Default auto switches each stage on only where it pays. safe runs the lossless stages only. Full config →

Get started

[!NOTE] Works with any tool that routes through HTTPS_PROXY: Claude Code, Codex, Cursor, Aider, your own app. GitHub Copilot pins its certificates and can't be intercepted (full list).

# 1. Install (any OS, prebuilt binary, no Rust needed)
npm install -g @llmtrim/cli@latest && llmtrim setup

# 2. Open a new shell. Your AI tools now route through llmtrim automatically.

# 3. Watch the savings add up as you work
llmtrim status --watch

No Node? Use an installer instead:

# Linux / macOS
curl -fsSL https://raw.githubusercontent.com/fkiene/llmtrim/main/install.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/fkiene/llmtrim/main/install.ps1 | iex

Or your own package manager, same binary everywhere: brew install fkiene/tap/llmtrim · cargo binstall llmtrim · scoop install llmtrim · docker run ghcr.io/fkiene/llmtrim. Full options in INSTALL.md.

Is this safe to install?

setup is a local HTTPS proxy, the same technique as mitmproxy, scoped to LLM APIs. It changes exactly three things (a CA certificate in ~/.llmtrim/, a proxy setting in your shell profile, a login service), and llmtrim uninstall reverses all three. No API keys are stored (it forwards your tool's own auth), and your prompts never touch disk; only an anonymous count of tokens saved is kept. Full threat model: SECURITY.md.