by ory
Reduce Claude Code, Codex, and OpenCode wall-clock time and token use by up to 50% with open-source, local semantic search. Works for small and large codebases and monorepos! Enterprise-ready and fully compliant via Ollama and SQLite-vec.
# Add to your Claude Code skills

```shell
git clone https://github.com/ory/lumen
```
Claude reads entire files to find what it needs. Lumen gives it a map.
Lumen is a 100% local semantic code search engine for AI coding agents. No API keys, no cloud, no external database: just open-source embedding models (Ollama or LM Studio), SQLite, and your CPU. It ships as a single static binary plus your own local embedding server.
The payoff is measurable and reproducible: across 8 benchmark runs on 8 languages and real GitHub bug-fix tasks, Lumen cuts cost in every single language — up to 39%. Output tokens drop by up to 66%, sessions complete up to 53% faster, and patch quality is maintained in every task. All verified with a transparent, open-source benchmark framework that you can run yourself.
|                        | With Lumen   | Baseline (no Lumen) |
| ---------------------- | ------------ | ------------------- |
| Cost (avg, bug-fix)    | $0.29 (-26%) | $0.40               |
| Time (avg, bug-fix)    | 125s (-28%)  | 174s                |
| Output tokens (avg)    | (-37%)       | 8,323               |
| JavaScript (marked)    | (-33%, -53%) | $0.48, 255s         |
| Rust (toml)            | (-39%, -34%) | $0.61, 310s         |
| PHP (monolog)          | (-27%, -34%) | $0.19, 52s          |
| TypeScript (commander) | (-27%, -33%) | $0.19, 84s          |
| Patch quality          | Maintained   | —                   |
Claude Code asking about the Prometheus codebase. Lumen's `semantic_search` finds the relevant code without reading entire files.
Prerequisites: a local embedding model, pulled via Ollama:

```shell
ollama pull ordis/jina-embeddings-v2-base-code
```

Platform support: Linux, macOS, and Windows. File locking for background indexing coordination uses `flock(2)` on Unix and `LockFileEx` on Windows (via gofrs/flock).
Install:

```shell
/plugin marketplace add ory/claude-plugins
/plugin install lumen@ory
```
That's it. On first session start, Lumen indexes your project and registers a `semantic_search` MCP tool that Claude uses automatically.

Two skills are also available: `/lumen:doctor` (health check) and `/lumen:reindex` (forced re-indexing).
Lumen sits between your codebase and Claude as an MCP server. When a session starts, it walks your project and builds a Merkle tree over file hashes: only changed files get re-chunked and re-embedded. Each file is split into semantic chunks (functions, types, methods) using Go's native AST or tree-sitter grammars for other languages. Chunks are embedded and stored in SQLite + sqlite-vec using cosine-distance KNN for retrieval.
Files → semantic chunks → vector embeddings → SQLite/sqlite-vec → KNN search
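The incremental step can be sketched in plain Go: hash each file's contents, diff against the stored snapshot, and re-embed only what differs. This is a simplified illustration, not Lumen's code; the function names (`hashContent`, `changed`) and the flat map are made up, and the real Merkle-tree layout, which lets whole unchanged subtrees be skipped, is not shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashContent computes the content hash used as a leaf over a file's bytes.
func hashContent(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// changed compares a fresh snapshot of per-file hashes against the stored
// index and returns only the paths that need re-chunking and re-embedding.
func changed(stored, fresh map[string]string) []string {
	var out []string
	for path, h := range fresh {
		if stored[path] != h {
			out = append(out, path)
		}
	}
	return out
}

func main() {
	stored := map[string]string{
		"main.go": hashContent([]byte("package main")),
		"util.go": hashContent([]byte("package util")),
	}
	fresh := map[string]string{
		"main.go": hashContent([]byte("package main")),       // unchanged: skipped
		"util.go": hashContent([]byte("package util // v2")), // edited: re-embedded
	}
	fmt.Println(changed(stored, fresh)) // → [util.go]
}
```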
When Claude needs to understand code, it calls `semantic_search` instead of reading entire files. The index is stored outside your repo (`~/.local/share/lumen/<hash>/index.db`), keyed by project path and model name; different models never share an index.
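The retrieval step is a cosine-distance nearest-neighbor search, which sqlite-vec executes inside SQLite. As a rough illustration of the ranking only (not Lumen's code), here is a brute-force version in plain Go; the chunk names and toy 3-dimensional vectors are invented, and real embedding models emit hundreds of dimensions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type chunk struct {
	name string
	vec  []float64
}

// cosineDist is 1 minus cosine similarity: the distance used to rank
// stored chunk embeddings against the query embedding.
func cosineDist(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// knn returns the names of the k chunks nearest to the query.
func knn(query []float64, chunks []chunk, k int) []string {
	sort.Slice(chunks, func(i, j int) bool {
		return cosineDist(query, chunks[i].vec) < cosineDist(query, chunks[j].vec)
	})
	out := make([]string, 0, k)
	for _, c := range chunks[:k] {
		out = append(out, c.name)
	}
	return out
}

func main() {
	chunks := []chunk{
		{"parseConfig", []float64{0.9, 0.1, 0.0}},
		{"httpHandler", []float64{0.1, 0.9, 0.2}},
		{"loadConfig", []float64{0.8, 0.2, 0.1}},
	}
	query := []float64{1, 0, 0} // pretend embedding of "where is config parsed?"
	fmt.Println(knn(query, chunks, 2)) // → [parseConfig loadConfig]
}
```

In Lumen this ranking happens in-database via sqlite-vec rather than in application code, which is what keeps the whole stack down to SQLite plus one binary.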
Lumen is evaluated using bench-swe: a SWE-bench-style harness that runs Claude on real GitHub bug-fix tasks and measures cost, time, output tokens, and patch quality — with and without Lumen. All results are reproducible: raw JSONL streams, patch diffs, and judge ratings are committed to this repository.
Key results — 8 runs across 8 languages, hard difficulty, real GitHub
issues (ordis/jina-embeddings-v2-base-code, Ollama):
| Language   | Cost Reduction | Time Reduction | Output Token Reduction | Quality        |
| ---------- | -------------- | -------------- | ---------------------- | -------------- |
| Rust       | -39%           | -34%           | -31% (18K → 12K)       | Poor (both)    |
| JavaScript | -33%           | -53%           | -66% (14K → 5K)        | Perfect (both) |
| TypeScript | -27%           | -33%           | -64% (5K → 1.8K)       | Good (both)    |
| PHP        | -27%           | -34%           | -59% (1.9K → 0.8K)     | Good (both)    |
| Ruby       | -24%           | -11%           | -9% (6.1K → 5.6K)      | Good (both)    |
| Python     | -20%           | -29%           | -36% (1.7K → 1.1K)     | Perfect (both) |
| Go         | -12%           | -9%            | -10% (11K → 10K)       | Good (both)    |
| C++        | -8%            | -3%            | +42% (feature task)    | Good (both)    |
Cost was reduced in every language tested. Quality was maintained in every task — zero regressions. JavaScript and TypeScript show the most dramatic efficiency gains: same quality fixes in half the time with two-thirds fewer tokens. Even on tasks too hard for either approach (Rust), Lumen cuts the cost of failure by 39%.
See docs/BENCHMARKS.md for all 8 per-language deep dives, judge rationales, and reproduce instructions.
Supports 12 language families with semantic chunking (9 benchmarked):
| Language | Parser | Extensions | Benchmark status |
| ---------------- | ----------- | ----------------------------------------- | --------------------------------------------- |
| Go | Native AST | .go | Benchmarked: -12% cost, Good quality |
| Python | tree-sitter | .py | Benchmarked: Perfect quality, -36% tokens |
| TypeScript / TSX | tree-sitter | .ts, .tsx | Benchmarked: -64% tokens, -33% time |
| JavaScript / JSX | tree-sitter | .js, .jsx, .mjs | Benchmarked: -66% tokens, -53% time |
| Dart | tree-sitter | .dart | Benchmarked: -76% cost, -82% tokens, -79% time |
| Rust | tree-sitter | .rs | Benchmarked: -39% cost, -34% time |
| Ruby | tree-sitter | .rb | Benchmarked: -24% cost, -11% time |
| PHP | tree-sitter | .php | Benchmarked: -59% tokens, -34% time |
| C / C++ | tree-sitter | .c, .h, .cpp, .cc, .cxx, .hpp | Benchmarked: -8% cost (C++ feature task) |
| Java | tree-sitter | .java | Supported |
| C# | tree-sitter | .cs | Supported |
Go uses the native Go AST parser for the most precise chunks. All other languages use tree-sitter grammars. See docs/BENCHMARKS.md for all 9 per-language benchmark deep dives.
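As a rough sketch of what native-AST chunking looks like (not Lumen's actual chunker, which may label and split declarations differently), the standard `go/parser` package can turn a file into one chunk per top-level declaration; the `chunkNames` helper below is invented for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// chunkNames parses Go source and returns one label per top-level
// declaration: the granularity at which code would be embedded.
func chunkNames(src string) []string {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		panic(err)
	}
	var out []string
	for _, d := range f.Decls {
		switch d := d.(type) {
		case *ast.FuncDecl:
			out = append(out, "func "+d.Name.Name)
		case *ast.GenDecl:
			for _, s := range d.Specs {
				if ts, ok := s.(*ast.TypeSpec); ok {
					out = append(out, "type "+ts.Name.Name)
				}
			}
		}
	}
	return out
}

func main() {
	src := `package demo

type Store struct{ db string }

func (s *Store) Get(k string) string { return s.db + k }

func New() *Store { return &Store{} }
`
	fmt.Println(chunkNames(src)) // → [type Store func Get func New]
}
```

Chunking at declaration boundaries is what lets a query match a single function or type instead of a whole file.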
All configuration is via environment variables:
| Variable | Default | Description |
| ------------------------ | ------------------------ | ------------------------------------------------------------- |
| LUMEN_EMBED_MODEL        | see note ¹               |                                                               |