See what Claude Code and Codex actually send to the API — and what each part costs.
# Add to your Claude Code skills
git clone https://github.com/tigerless-labs/cost-xrayGuides for using cli tools skills like cost-xray.
Last scanned: 6/17/2026
{
"issues": [
{
"file": "README.md",
"line": 34,
"type": "remote-install",
"message": "Install command (remote install script piped to a shell — review the source before running): \"curl -fsSL https://raw.githubusercontent.com/tigerless-labs/cost-xray/master/ins\"",
"severity": "low"
}
],
"status": "PASSED",
"scannedAt": "2026-06-17T09:02:38.913Z",
"npmAuditRan": true,
"pipAuditRan": false,
"promptInjectionRan": true
}cost-xray is an open-source cli tools skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by tigerless-labs. See what Claude Code and Codex actually send to the API — and what each part costs. It has 111 GitHub stars.
Yes. cost-xray passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.
Clone the repository with "git clone https://github.com/tigerless-labs/cost-xray" and add it to your Claude Code skills directory (see the Installation section above).
cost-xray is primarily written in Python. It is open-source under tigerless-labs on GitHub, so you can review or fork the full source.
Yes. SkillsLLM lists many other CLI Tools skills you can browse and compare side by side. Open the CLI Tools category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh cost-xray against similar tools.
No comments yet. Be the first to share your thoughts!
Top skills in this category by stars
Most usage tools read local logs. That shows the total cost of a call or session, but it misses the request-time context assembled before the model is invoked: system prompts, tool schemas, MCP blocks, tool results, cache reads/writes, and previous thinking blocks.
cost-xray captures the actual local API traffic for Claude Code and Codex, then attributes tokens and dollars back to the sources inside the request. It shows not just how much a turn cost, but why it cost that much.
curl -fsSL https://raw.githubusercontent.com/tigerless-labs/cost-xray/master/install.sh | bash
The installer asks which agent(s) to capture — Claude Code, Codex, or both — and prompts even under curl … | bash.
COST_XRAY_AGENTS=claude|codex|all../install.sh.Then open a new terminal and run claude / codex exactly as before — capture is automatic, no flags and no base-URL change. It's forward-only: runs started in that new shell are captured, not past history. Open the live TUI from anywhere:
cx
Capture runs as a background service (auto-start on boot, self-healing, port-adaptive) and doesn't change what your agent does, its results, or its cost — pause anytime with cx stop. See docs/install.md for systemd details, GUI agents (Cursor base-URL setup), manual (no-systemd) mode, and troubleshooting.
cx # open the live cost-xray TUI (from any directory)
cx status # services' state, live ports, sessions captured
cx stop # stop monitoring — proxies down; agents run direct (uncaptured)
cx start # resume monitoring
cx restart # restart the proxies (after a config change)
cx install # (re)install — authoritative: installs the chosen agents, removes the rest
cx uninstall # remove services + shell wrappers (keeps captured data)
cx is the whole CLI and works from any directory. Change port/upstream by editing ~/.cost-xray/env, then cx restart. (./run.sh <cmd> from the repo still works too — cx just calls it for you from anywhere.)
In the TUI, drill from the top down: agent → project → session → category → MCP server → tool → per-turn call → the real output. Every cell carries its cache split (read / write / fresh / output $).
| Agent | Status | Capture |
|---|---|---|
| Claude Code | Supported | reverse proxy (base-URL override) — no certificate |
| Codex | Supported | forward proxy + scoped local CA (self-healing wrapper) |
The wire is decoded by a thin per-agent adapter — the only place code forks by agent (docs/architecture.md; per-agent capture + tokenizer notes under docs/providers/). Adding an agent is one small module; anything speaking the Anthropic or OpenAI-Responses wire shape is close to drop-in.
cost-xray traces every token's cost — split into fresh / cache-read / cache-write / output — to the source that caused it, and down to the individual call: the cost of this Read invocation and its output, not a session sum. Everyone else aggregates — a request- or session-level total, at most grouped by tool type ("Read cost $X this session"). As far as we've found, nothing else prices below the tool, call by call.
What is taking space in the context window right now — system prompt, every tool schema, MCP servers, messages, and generated output — decomposed into source-level rows. Prompt caching makes a stable 40k-token schema block cheap on cache read, but it still crowds out the code and conversation that matter; cost-xray shows you the occupancy, not just the bill.
Configured servers and tools that are injected into every request's prefix but never actually called. They pay their tool-schema overhead on every turn — cost-xray flags the dead weight.
Claude's tokenizer is private — no official or open-source tokenizer exists — and calling Anthropic's count_tokens API for everything would add load and hit its limits. So for Claude, cost-xray uses an estimator plus proportional calibration: tiktoken sizes each part, the parts it mis-sizes most (thinking, tool schemas) get targeted corrections — pinned with count_tokens when you're logged in, a fixed ratio otherwise — and the rest is scaled so the calibrated total matches the provider's own usage. The total, and therefore the bill, is exact; only the split between sources in the same request is approximate. We benchmark those residuals continuously (CONTRIBUTING.md) and they're small enough for attribution work.
Coming soon: an opt-in exact mode — every part sized by full count_tokens differencing — manually enabled, for users who need maximum per-source precision.
The wrappers are per-command and self-healing: if the proxy is down, the wrapper restarts it and routes through; if it can't, the agent runs direct — never broken. Stop monitoring anytime with cx stop (agents then run direct).
cost-xray keeps the complete raw API traffic — but a long session re-sends its whole history every turn, so the capture is hugely repetitive. We deduplicate it: each unique block (message, schema, tool result) is stored once, with a tiny per-turn delta. Full per-turn bytes rebuild on demand. Disk stays small even across million-token sessions.
Log-based tools (ccusage, codeburn, and similar) are excellent at local-first session analytics: they read local transcripts, classify turns by tool usage, and price the session by model, day, or task. That answers how much did I spend. cost-xray answers the lower-level question: what bytes did the model actually receive, and which source owns those tokens?
The difference is the data source. Log readers see the transcript after the agent has run — but the system prompt, injected tool schemas, MCP schemas, reminders, and provider-added blocks are assembled at request time and never written to the transcript. In real coding-agent requests, that invisible prefix can be roughly half the context or more. cost-xray reads the raw API request, so it can compute source-level tokens and attribute cost to schemas, MCP servers, tools, and message buckets.
| Question | Log / usage tools | cost-xray wire capture |
|---|---|---|
| How much did the session cost? | Yes | Yes |
| Which task/tool was active? | Yes | Yes |
| System prompt and injected schemas visible? | No | Yes |
| How many tokens does each tool schema occupy? | No, schemas aren't in logs | Yes |
| Which MCP server is dead weight in the prefix? | Estimate | Exact |
| Cache read/write/fresh dollars per source/tool? | No, usage is request-level | Yes, by span and cache boundary |
| Live view of the current request window? | No | Yes |
cost-xray surfaces the data; you read the story. A few patterns worth knowing:
| Signal you see | What it might mean |
|---|---|
| A 40k-token MCP schema block on every turn | A configured server crowding the prefix — drill it to see if any tool was called |
| Cache-read dollars dwarf fresh input | Stable prefix is working; the spend is in what's new each turn |
| Repeated cache-write on the same source | The prefix is being rewritten — something upstream of it changed |
| Thinking tokens dominate output cost | Long reasoning turns; check whether they earned their keep |
| A tool's schema costs more than the tool is ever used | Candidate to drop from the agent's tool-set |
Big tool_result rows on Read/Bash |
Uncapped output bloating the window — cap it at the source |
These are starting points, not verdicts. One experimental session looking odd is fine; the same pattern across weeks of work is a config issue.
cost-xray runs mitmproxy as a local capture hop and keeps all analysis off the request path.
agent ──HTTP──▶ mitmproxy ──HTTPS──▶ model API
│
└── redacted raw request/response → ~/.c