by elara-labs
Save 94% on Claude Code tokens. Index your codebase and the AI searches the index instead of reading files. Local MCP server. Free, open source.
# Add to your Claude Code skills
git clone https://github.com/elara-labs/code-context-engine
| Use case | How CCE helps |
|----------|---------------|
| Reduce Claude Code costs | 94% fewer input tokens per session |
| Keep code private | Everything local, no cloud indexing |
| Multi-editor teams | One index across Claude Code, Cursor, VS Code, Gemini CLI |
| Cross-session memory | Decisions and context survive restarts |
| Faster responses | Less context = faster Claude replies |
| Track actual savings | Dollar amounts, not estimates |
uv tool install code-context-engine
cd /path/to/your/project
cce init
That's it. Claude now searches your index instead of reading entire files. No config needed.
Prerequisites: a C/C++ compiler and cmake (needed to build tree-sitter grammars).

| Platform | Setup |
|----------|-------|
| macOS | xcode-select --install (provides compiler and cmake) |
| Ubuntu/Debian | sudo apt install build-essential cmake |
| Fedora/RHEL | sudo dnf install gcc gcc-c++ cmake |
| Windows | Install Visual Studio Build Tools (C++ workload) and CMake |
Tested on all three platforms in CI (macOS, Linux, Windows × Python 3.11/3.12/3.13).
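If you want to sanity-check those prerequisites before indexing, a quick script like the one below works. It only looks for the tools on PATH and is not part of CCE itself; the tool names are the common compiler/driver binaries per platform.

```python
import shutil

# Illustrative pre-flight check (not part of CCE): confirm cmake and a
# C/C++ compiler are visible on PATH before the tree-sitter grammars build.
for tool in ("cmake", "cc", "gcc", "clang", "cl"):
    path = shutil.which(tool)
    print(f"{tool:>5}: {path or 'not found'}")
```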
uv tool install code-context-engine # or: pipx install code-context-engine
cd /path/to/your/project
cce init # index, install hooks, register MCP server
Embedding backends: CCE auto-detects the best available backend. If you have Ollama running, it uses nomic-embed-text with zero extra dependencies. For offline/local embedding without Ollama, install the [local] extra:
uv tool install "code-context-engine[local]" # includes fastembed + ONNX Runtime
Restart your editor. Done. Every question now hits the index instead of re-reading files.
cce init auto-detects your editor and writes the right config:
| Editor | Config written | Instructions |
|--------|---------------|--------------|
| Claude Code | .mcp.json | CLAUDE.md |
| VS Code / Copilot | .vscode/mcp.json | |
| Cursor | .cursor/mcp.json | .cursorrules |
| Gemini CLI | .gemini/settings.json | GEMINI.md |
| OpenAI Codex | ~/.codex/config.toml (user-global, per-project section) | |
| OpenCode | opencode.json | |
| Tabnine | .tabnine/agent/settings.json | TABNINE.md |
Multiple editors in the same project? All get configured in one command.
Codex note: Codex CLI reads MCP servers from ~/.codex/config.toml only; it has no per-project config. cce init adds one [mcp_servers.cce-<project>-<hash>] section per project so multiple projects coexist; cce uninstall removes only the section for the current project.
Example savings summary (my-project, 38 queries, 94% tokens saved):

| | Tokens | Cost |
|---|--------|------|
| Without CCE | 48.0k | $0.14 |
| With CCE | 3.4k | $0.01 |
| Saved | 44.6k | $0.13 |

Cost estimate based on Sonnet input pricing ($3/1M tokens).
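The dollar figures are just the token counts multiplied by that $3 per million input tokens; a quick check:

```python
PRICE_PER_TOKEN = 3 / 1_000_000  # Sonnet input pricing quoted above

for label, tokens in [("Without CCE", 48_000), ("With CCE", 3_400), ("Saved", 44_600)]:
    print(f"{label:>11}: {tokens / 1000:.1f}k tokens = ${tokens * PRICE_PER_TOKEN:.2f}")
```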
Input tokens are 85-95% of your Claude Code bill. CCE cuts them by 94% (benchmarked on FastAPI).
Without CCE: Claude reads payments.py + shipping.py = 45,000 tokens
With CCE: context_search "payment flow" = 800 tokens
| | Without CCE | With CCE |
|---|-------------|----------|
| Session startup | Re-reads files every time | Queries the index |
| Finding a function | Read entire 800-line file | Get the 40-line function |
| Cross-session memory | None | Decisions + code areas persisted |
| Token cost (Sonnet, medium project) | ~$0.14/session | ~$0.04/session |
We benchmarked CCE against FastAPI (53 source files, 180K tokens) with 20 real coding questions. No cherry-picking, no synthetic queries.
Methodology: For each query, "without CCE" means reading the full content of every file the query touches. "With CCE" means the relevant chunks after compression.
Important baseline note: The 94% number is measured against full-file reads, not against what Claude Code actually does. In practice, Claude Code already uses grep, partial file reads, and targeted tools, so the real-world savings compared to normal Claude Code behavior will be lower than 94%. We use full-file as the baseline because it's reproducible and deterministic (no agent behavior variability). The benchmark measures CCE's retrieval efficiency, not a head-to-head comparison with Claude Code's built-in exploration.
| Metric | Result |
|--------|--------|
| Retrieval savings | 94% (83,681 → 4,927 tokens/query) |
| Compression (additional, on retrieved chunks) | 89% (4,927 → 523 tokens/query) |
| Recall@10 (found the right files) | 0.90 |
| Latency p50 | 0.4ms |
| Queries tested | 20 |
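The two headline percentages are plain ratios of the per-query token counts:

```python
def savings(before: int, after: int) -> float:
    """Fraction of tokens removed going from `before` to `after`."""
    return 1 - after / before


print(f"Retrieval:   {savings(83_681, 4_927):.0%}")  # 94%
print(f"Compression: {savings(4_927, 523):.0%}")     # 89%
```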
| Layer | What it does | Savings | Method |
|-------|--------------|---------|--------|
| Retrieval | Full files → relevant code chunks | 94% | measured |
| Chunk Compression | Raw chunks → signatures + docstrings | 89% | measured |
| Grammar | Drops articles/fillers from memory text | 13% | measured |
Output compression (reducing Claude's reply length) provides additional savings (~65% estimated) but is not included in the headline number above.
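To make "signatures + docstrings" concrete: the idea is to keep a function or class header plus the first line of its docstring and drop the body. CCE's indexer builds on tree-sitter grammars (hence the cmake prerequisite above); the sketch below uses Python's ast module purely to illustrate the transformation, not CCE's implementation.

```python
import ast


def compress_chunk(source: str) -> str:
    """Reduce a Python chunk to signatures + docstrings (illustration only)."""
    out = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            out.append(f"class {node.name}:")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            out.append(f"def {node.name}({args}):")
        else:
            continue
        doc = ast.get_docstring(node)
        if doc:
            out.append(f'    """{doc.splitlines()[0]}"""')
    return "\n".join(out)


chunk = '''
def charge(card, amount):
    """Charge a card and return the payment id."""
    validate(card)
    return gateway.submit(card, amount)
'''
print(compress_chunk(chunk))  # keeps the signature and docstring, drops the body
```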
| Repo | Language | Files | Retrieval savings | Recall@10 |
|------|----------|-------|-------------------|-----------|
| FastAPI | Python | 53 | 94% | 0.90 |
| chi | Go | 94 | 76% | 0.67 |
| fiber | Go (monorepo) | 396 | 93% | 0.07 |
Go's shorter files reduce the retrieval headroom (a smaller baseline). Monorepos dilute recall at top-10 (fiber). Middleware queries, where each feature lives in its own file, consistently hit Recall@10 = 1.00.
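Recall@10 is read here as: for each query, the fraction of its relevant files that appear in the top-10 retrieved results, averaged over queries (the code in benchmarks/ is the authoritative definition). A minimal per-query version, with a hypothetical query reusing the payments/shipping example above:

```python
def recall_at_10(relevant: set[str], retrieved: list[str]) -> float:
    """Fraction of a query's relevant files found in the top-10 results."""
    hits = relevant & set(retrieved[:10])
    return len(hits) / len(relevant)


# Hypothetical query: two relevant files, one surfaces in the top 10.
print(recall_at_10({"payments.py", "shipping.py"},
                   ["payments.py", "models.py", "api.py"]))  # 0.5
```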
Reproduce it yourself:
pip install code-context-engine
python benchmarks/run_benchmark.py --repo https://github.com/fastapi/fastapi.git --source-dir fastapi
python benchmarks/run_benchmark.py --repo https://github.com/go-chi/chi.git --source-dir .
Full results in benchmarks/results/. Queries and methodology in benchmarks/.