Caveman: The Claude Code Skill That Cuts Token Use by 65%
If you have ever watched a Claude Code session blow through your monthly budget, you have probably wondered if all those polite preambles, hedging phrases, and structured explanations are really worth the cost. The answer, according to a Claude Code skill called Caveman, is no.
Caveman compresses model output by up to 65 percent without losing the technical content. It does this with one of the more memorable design choices in the AI tools space: it tells Claude to speak like a caveman.
The numbers
A standard 69-token explanation of a function turns into a 19-token caveman version. Same information. Same accuracy. Roughly three and a half times fewer tokens.
For a single message, that is a rounding error. For an agent that loops 20 times across a long-running task, it is the difference between a five-cent run and a one-cent run. Multiply by ten thousand sessions a month and the math gets serious.
The math is also asymmetric. Output tokens cost two to five times more than input tokens on every major provider. Cutting output is the highest-leverage place to attack token spend.
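To see why output is the highest-leverage target, here is a back-of-envelope cost model. The per-token prices are illustrative assumptions for this sketch, not any provider's published rates:

```python
# Illustrative prices (assumed): output priced at 5x input,
# matching the asymmetry described above.
INPUT_PRICE_PER_MTOK = 3.00    # USD per million input tokens (assumption)
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per million output tokens (assumption)

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single model call at the assumed rates."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# A 20-iteration agent loop: same input each turn, verbose vs. caveman output.
verbose = sum(run_cost(1_000, 69) for _ in range(20))
compressed = sum(run_cost(1_000, 19) for _ in range(20))
```

Because output tokens carry the 5x multiplier, shrinking them moves the total far more than trimming input ever could.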
Four modes
Caveman ships with four intensity levels:
- Lite: trims redundant phrasing, keeps natural sentence structure. Roughly 30 percent reduction. Use this for code reviews and pull request comments where readability still matters.
- Full: drops articles and connectives, keeps the technical core. About 65 percent reduction. The default mode for most agentic work.
- Ultra: maximum compression, nearly telegraphic. 75-80 percent reduction. Use for internal pipelines where the model is talking to itself.
- 文言文 (Classical Chinese): the most extreme mode, drawing on the famously dense classical literary register. Genuinely useful when latency matters more than human readability.
The skill also includes specialized commands: caveman-commit produces terse commit messages, caveman-review returns one-line code review comments, and caveman-compress reduces input tokens by approximately 46 percent before sending to the model.
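The reduction rates quoted for each mode can be turned into a quick budget estimator. The percentages come from the mode descriptions above; the `estimated_tokens` helper is a hypothetical sketch, not part of the skill itself:

```python
# Approximate reduction rates per mode, from the figures quoted above
# (actual ratios vary with content; "ultra" uses the midpoint of 75-80%).
MODE_REDUCTION = {
    "lite": 0.30,
    "full": 0.65,
    "ultra": 0.775,
}

def estimated_tokens(baseline: int, mode: str) -> int:
    """Estimate output tokens remaining after a given Caveman mode."""
    return round(baseline * (1 - MODE_REDUCTION[mode]))
```

Run your typical response length through each mode before committing: a 30 percent cut on a readable code-review comment is often a better trade than an 80 percent cut that a teammate has to decode.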
Why it works
The technique is not magic. It is just disciplined prompt design. Claude's default voice was tuned for human-facing conversation: warm, hedged, structured. That voice is wrong for most agentic use cases where another program reads the output. Caveman gives the model permission to drop the warmth and keep the substance.
The skill teaches a broader lesson about working with LLMs at scale: the model adopts the tone you ask for. If you ask for "professional and helpful", you get verbose output. If you ask for "minimal and information-dense", you get tight output. Most token bloat comes from default tone settings nobody intentionally chose.
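A minimal sketch of that lesson, assuming a generic chat-completion-style request format. The system prompt text here is an illustration of the technique, not Caveman's actual prompt:

```python
# An explicitly terse register requested up front (illustrative wording).
TERSE_SYSTEM = (
    "Minimal, information-dense output only. "
    "No preamble, no hedging, no closing summaries. "
    "Drop articles and connectives when meaning survives."
)

def build_request(user_message: str, terse: bool = True) -> dict:
    """Assemble a chat-completion-style request payload."""
    system = TERSE_SYSTEM if terse else "You are a helpful assistant."
    return {
        "system": system,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 256,  # a hard output cap is a second lever on spend
    }
```

The point is that the tone lives in one place you control, not scattered through every prompt in the codebase.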
When NOT to use Caveman
Caveman is the right tool for agent-to-agent communication, internal pipelines, batch processing, and anything where output is consumed by code. It is the wrong tool when output is consumed by humans.
Customer-facing chatbots should not respond in caveman voice. Documentation generation should not use it. Anything that ends up in a Slack message or an email needs the full register.
The pattern that works in production: use Caveman for the agent's internal reasoning and tool-call arguments, then have a final formatting step that translates the result into normal English for the user. Best of both worlds: cheap reasoning, polished delivery.
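That two-stage pattern can be sketched in a few lines, with `call_model` as a stand-in for a real API client:

```python
from typing import Callable

def run_agent(task: str, call_model: Callable[[str, str], str]) -> str:
    """Cheap compressed reasoning, one polished pass at the end."""
    # Stage 1: internal reasoning in the terse register (cheap tokens).
    terse_result = call_model(
        "Terse output. Drop articles. Keep all technical facts.", task
    )
    # Stage 2: a single formatting pass translates for the human reader.
    return call_model(
        "Rewrite the following in clear, polished English.", terse_result
    )
```

Only the final pass pays the full-register token price; every intermediate loop iteration runs compressed.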
What this means for your stack
If you ship an LLM feature that runs at any meaningful volume, audit one week of your output. Look at how much of the token spend goes to tone and structure that nobody actually reads. The number will surprise you. Caveman is one way to attack that bloat. A custom system prompt that explicitly requests terse output is another.
Both are cheaper than scaling your inference budget linearly with traffic.
Get started
The Caveman skill is open source on GitHub at JuliusBrussee/caveman and installable in any Claude Code workspace. Browse the skill page on SkillsLLM for installation instructions, security scan results, and current statistics.
If reading about token economics made you want to actually build agents that use these techniques, our Agentic AI for Beginners course covers the loop, tool use, memory patterns, and deployment in 41 minutes. Caveman cuts your token bill. Building good agents is what makes the savings matter.