Hook-based token compressor for 5 AI CLI hosts (Claude Code, Copilot CLI, OpenCode, Gemini CLI, Codex CLI). Up to 95% bash compression, signature-mode for code reads, cross-call dedup, MCP server, self-teaching protocol. Zero runtime deps.
# Add to your Claude Code skills
git clone https://github.com/claudioemmanuel/squeez

End-to-end token optimizer for Claude Code, GitHub Copilot CLI, OpenCode, Gemini CLI, and OpenAI Codex CLI. Compresses bash output up to 95%, collapses redundant calls, and injects a terse prompt persona — automatically, with zero new runtime dependencies.
Three install methods — all produce the same result (binary at ~/.claude/squeez/bin/squeez, hooks registered).
curl -fsSL https://raw.githubusercontent.com/claudioemmanuel/squeez/main/install.sh | sh
Windows: requires Git Bash. Run the command above inside Git Bash — PowerShell/CMD are not supported.
# Install globally
npm install -g squeez
# Or run once without installing
npx squeez
Downloads the correct pre-built binary for your platform (macOS universal, Linux x86_64/aarch64, Windows x86_64). Requires Node ≥ 16.
cargo install squeez
Builds from crates.io. Requires Rust stable. On Windows you also need MSVC C++ Build Tools.
squeez setup auto-detects every CLI present on disk and registers the hooks. squeez uninstall removes them. Session data and config.ini are preserved so reinstall is lossless.
| Host | Memory file | Bash wrap | Session memory | Budget inject (Read/Grep) | Notes |
|---|---|---|---|---|---|
| Claude Code | ~/.claude/CLAUDE.md | ✅ native | ✅ native | ✅ native | Restart Claude Code to pick up hooks |
| Copilot CLI | ~/.copilot/copilot-instructions.md | ✅ native | ✅ native | ✅ native | Restart Copilot CLI after setup |
| OpenCode | ~/.config/opencode/AGENTS.md | ✅ native | ✅ native | ✅ native | Plugin at ~/.config/opencode/plugins/squeez.js; MCP tool calls skip hooks (upstream sst/opencode#2319) |
| Gemini CLI | ~/.gemini/GEMINI.md | ✅ native | ✅ native | 🟡 soft via GEMINI.md | BeforeTool rewrite schema pending upstream docs (google-gemini/gemini-cli#25629) |
| Codex CLI | ~/.codex/AGENTS.md | ✅ native | ✅ native | 🟡 soft via AGENTS.md | Codex PreToolUse is Bash-only until upstream expands (openai/codex#18491) |
squeez setup # register into every detected host
squeez setup --host=<slug> # register into one host
squeez uninstall # remove squeez entries from every detected host
squeez uninstall --host=<slug>
Slugs: claude-code / copilot / opencode / gemini / codex.
After install, restart the CLI you use to pick up the new hooks.
squeez uninstall # preserves session data + config.ini
bash ~/.claude/squeez/uninstall.sh # (legacy) full wipe, if the script exists
squeez update # download latest binary + verify SHA256
squeez update --check # check for update without installing
squeez update --insecure # skip checksum (not recommended)
| Feature | Description |
|---------|-------------|
| Bash compression | Intercepts every command via PreToolUse hook, applies smart filter → dedup → grouping → truncation. Up to 95% reduction. |
| Context engine | Detects cross-call redundancy via two paths: exact-hash match (FNV-1a, fast) and fuzzy trigram-shingle Jaccard ≥ 0.85, so whitespace changes, timestamps, and single-line edits no longer defeat dedup. |
| Summarize fallback | Outputs exceeding 500 lines are replaced with a ≤40-line dense summary (top errors, files, test result, tail). Benign outputs get 2× the threshold so successful builds stay verbatim. |
| Adaptive intensity | Responds to session pressure: Full intensity (×0.6 limits) below 80% of the token budget, Ultra (×0.3) above. Earlier versions were always-Ultra. |
| MCP server | squeez mcp runs a JSON-RPC 2.0 server over stdio exposing 13 read-only tools so any MCP-compatible LLM can query session memory directly. Hand-rolled, no mcp.server dependency. |
| Auto-teach payload | squeez protocol (or the squeez_protocol MCP tool) prints a 2.4 KB self-describing payload — the LLM learns squeez's markers and protocol on first call. |
| Caveman persona | Injects an ultra-terse prompt at session start so the model responds with fewer tokens. |
| Memory-file compression | squeez compress-md compresses CLAUDE.md / AGENTS.md / copilot-instructions.md in-place — pure Rust, zero LLM. i18n-aware: set lang = pt (or --lang pt) for pt-BR article/filler/phrase dropping and Unicode-correct matching. |
| Session memory | On SessionStart, injects a structured summary of the previous session: files investigated, learned facts (errors + git events), completed work (builds, test passes), and next steps (unresolved errors, failing tests). Summaries carry temporal validity (valid_from/valid_to). |
| Token tracking | Every PostToolUse result (Bash, Read, Grep, Glob) feeds a SessionContext so squeez knows what the agent has already seen. |
| Token economy | Sub-agent cost tracking (~200K tokens/spawn), burn rate prediction ([budget: ~N calls left]), session efficiency scoring, tool result size budgets. |
| Auto-calibration | squeez calibrate runs benchmarks on install and generates an optimized config.ini (aggressive / balanced / conservative profiles). |
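The MCP server speaks plain JSON-RPC 2.0 over stdio, so you can probe it without an LLM client. A minimal sketch — the request shape follows the standard MCP `tools/list` method; squeez's exact response payload is not shown here:

```shell
# Build a JSON-RPC 2.0 tools/list request for the squeez MCP server.
request='{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
printf '%s\n' "$request"

# Piping it into the server would enumerate the 13 read-only tools:
#   printf '%s\n' "$request" | squeez mcp
```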
squeez optimizes what it can reach — the surfaces exposed by each host's hook API. It cannot fix token leaks outside those surfaces.
| Surface | How | When | Supported hosts |
|---|---|---|---|
| Bash stdout/stderr | PreToolUse wraps command w/ 4-stage pipeline (smart-filter → dedup → grouping → truncation) | Every Bash invocation | all 5 |
| Read / Grep / Glob limits | PreToolUse injects limit / head_limit per read_max_lines / grep_max_results | Every Read/Grep/Glob call | Claude Code, Copilot, OpenCode (hard); Gemini + Codex soft via GEMINI.md / AGENTS.md |
| Agent / Task prompt | PreToolUse compresses tool_input.prompt (markdown-aware, via compress-prompt) | When prompt > agent_prompt_max_tokens | Claude Code (post–v1.8.0) |
| Session memory | SessionStart injects prior session summary + file-access cache | Once per session start | all 5 |
| Markdown viewing | Bash handler routes .md reads through compress-md when auto_compress_md=true | Viewer commands on .md paths | all 5 |
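The per-surface caps above are driven by config.ini. A sketch with illustrative values — only the key names come from this README; the actual layout and defaults are what `squeez calibrate` generates:

```ini
; illustrative sketch — run `squeez calibrate` to generate a tuned config.ini
read_max_lines = 400
grep_max_results = 50
agent_prompt_max_tokens = 2000
auto_compress_md = true
lang = pt
```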
Agent/Task returned output. PostToolUse hooks are observation-only — squeez sees the result after the model has received it. No hook API surface exists to rewrite an Agent's return value. Workaround: keep agent prompts compact (squeez compresses at dispatch time), and use squeez_agent_costs MCP tool to monitor spawn overhead.
Skills & slash-command files. Claude Code loads these into the system prompt before any hook fires. squeez has no visibility into session-start system prompt construction.
User's top-level prompt. squeez runs per tool call, not on user turns.
Tools whose host doesn't expose PreToolUse / BeforeTool. E.g. Codex PreToolUse today only fires on Bash (tracked upstream at openai/codex#18491), so Read/Grep caps for Codex are soft hints in AGENTS.md, not hard injections.
- [budget: ~N calls left] nudges so the user changes behavior before context pressure spikes

squeez cannot automate these, but you can:

- use the squeez_agent_costs MCP tool to track spawn costs, then refactor tasks to batch work
- run squeez compress-md --ultra to drop abbreviations and filler

Measured on macOS (Apple Silicon). Token count = chars / 4 (matches Claude's ~4 chars/token). Run squeez benchmark to reproduce.
| Scenario | Before | After | Reduction | Latency |
|----------|--------|-------|-----------|---------|
| summarize_huge | 82,257 tk | 420 tk | -99% | 55.7 ms |
| repetitive_output | 4,692 tk | 37 tk | -99% | 211 µs |
| high_context_adaptive | 4,418 tk | 52 tk | -99% | 791 µs |
| ps_aux | 40,373 tk | 2,352 tk | -94% | 2.7 ms |
| git_log_200 | 2,692 tk | 289 tk | -89% | 218 µs |
| tsc_errors | 731 tk | 101 tk | -86% | 30 µs |
| cargo_build_noisy | 2,106 tk | 452 tk | -79% | 247 µs |
| docker_logs | 665 tk | 186 tk | -72% | 49 µs |
| find_deep | 424 tk | 134 tk | -68% | 83 µs |
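The chars / 4 token estimate used in the table can be reproduced in plain shell — a rough heuristic, not a real tokenizer:

```shell
# Estimate tokens from byte count, assuming ~4 chars per token.
estimate_tokens() {
  echo $(( $(wc -c < "$1") / 4 ))
}

printf 'x%.0s' $(seq 1 400) > /tmp/sample.txt  # 400-byte file
estimate_tokens /tmp/sample.txt                 # prints 100
```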