by iohub
A repository-aware, knowledge-learning local MCP server that understands your codebase in real time, powered by a hybrid semantic + full-text engine. It supports both Claude Code and Codex.
# Add to your Claude Code skills
git clone https://github.com/iohub/codexray
codexray is an open-source MCP server skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by iohub. It has 62 GitHub stars.
Clone the repository with "git clone https://github.com/iohub/codexray" and add it to your Claude Code skills directory (see the Installation section above).
codexray is primarily written in Rust. It is open-source under iohub on GitHub, so you can review or fork the full source.
Code search & knowledge engine for the AI era. Semantic + full-text hybrid search, real-time indexing, and call graph + code vectors + commit vectors + knowledge vectors, unified into one native MCP server.
Built natively for Claude Code/Codex CLI: zero daemon, zero config overhead.
📖 Chinese documentation
Goes beyond keyword matching. Dense vector search understands code intent ("login logic" → authenticateUser), while BM25 full-text search locks in exact matches. Results are fused via RRF and re-ranked by a Cross-Encoder for precision. When the embedding API is unavailable, search falls back gracefully to graph search, so it never breaks.
Call graph + code vectors + commit vectors + knowledge vectors: four dimensions of codebase awareness. Tree-sitter parses the AST of 7 languages to build complete function/class/method relationships.
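To make the call-graph idea concrete, here is a minimal sketch. It is not CodeXray's implementation: it swaps in Python's built-in `ast` module in place of Tree-sitter, handles only direct calls to plain names, and the sample source and helper names (`build_call_graph`, `callers_of`) are hypothetical.

```python
import ast

# Hypothetical sample source to analyze.
SOURCE = '''
def hash_password(pw):
    return pw[::-1]

def lookup(name):
    return "terces"

def authenticate_user(name, pw):
    return lookup(name) == hash_password(pw)
'''

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    graph: dict[str, set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            callees = graph.setdefault(node.name, set())
            for child in ast.walk(node):
                # Only calls of the form name(...); attribute calls are ignored.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
    return graph

def callers_of(graph: dict[str, set[str]], target: str) -> list[str]:
    """Reverse lookup: which functions call `target`?"""
    return sorted(fn for fn, callees in graph.items() if target in callees)

graph = build_call_graph(SOURCE)
print(graph["authenticate_user"])
print(callers_of(graph, "hash_password"))
```

A graph like this can answer name-based questions such as "who calls this function?" without any embedding API, which is the kind of query the graph fallback is described as serving.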
Full build on first run; only changed files are re-processed thereafter (MD5 diff). Auto-indexes on MCP startup and watches for file changes at runtime. Orphaned embeddings are cleaned up automatically, so the index never bloats.
Built specifically for the Claude Code/Codex CLI MCP stdio protocol. Installation registers the MCP server automatically: no manual config, no persistent daemon. It starts and exits with Claude Code, leaving zero residue. All code and data stay local; no SaaS required.
curl -fsSL https://raw.githubusercontent.com/iohub/codexray/main/install.sh | sh
Auto-detects OS/arch/libc, then downloads, installs, and registers the MCP server. Restart Claude Code afterwards and you are done.
First run: `codexray install` auto-launches an interactive setup wizard for the embedding API. This step is optional: CodeXray works out of the box for call graph and name search, and graph search needs no configuration.
curl -L -o codexray.tar.gz https://github.com/iohub/codexray/releases/latest/download/codexray-linux-x64-musl.tar.gz
tar -xzf codexray.tar.gz
./codexray install && rm codexray.tar.gz
Other platforms: replace linux-x64-musl with darwin-arm64, darwin-x64, or linux-x64 from the latest release.
git clone https://github.com/iohub/codexray.git && cd codexray
cargo build --release && ./rust-core/target/release/codexray install
Source files
→ Tree-sitter AST parse (7 languages)
→ Extract functions / classes / methods
→ Build call graph (PetCodeGraph)
→ Batch embed via API (SQLite cache)
→ Store vectors in LanceDB
→ Build BM25 index in Tantivy
→ Save to ~/.codexray/<project_hash>/
Index builds are idempotent and incremental: the first run is a full build, and subsequent runs compare MD5 hashes and re-process only the files that changed.
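The incremental step can be sketched as a hash comparison. This is an illustration under assumptions, not the project's code: files are modeled as an in-memory mapping from path to bytes, and `diff_index` is a hypothetical helper name.

```python
import hashlib

def md5_of(content: bytes) -> str:
    """Hex MD5 digest of a file's contents."""
    return hashlib.md5(content).hexdigest()

def diff_index(previous: dict[str, str], files: dict[str, bytes]):
    """Compare stored hashes against current file contents.

    Returns (changed, removed, new_hashes):
      changed    - new or modified paths that need re-processing
      removed    - paths that disappeared (their embeddings are orphaned)
      new_hashes - the hash table to persist for the next run
    """
    new_hashes = {path: md5_of(data) for path, data in files.items()}
    changed = sorted(p for p, h in new_hashes.items() if previous.get(p) != h)
    removed = sorted(p for p in previous if p not in new_hashes)
    return changed, removed, new_hashes

# First run: no stored hashes, so every file is processed (full build).
run1 = {"a.rs": b"fn a() {}", "b.rs": b"fn b() {}"}
changed1, removed1, hashes1 = diff_index({}, run1)

# Second run: a.rs edited, b.rs deleted, c.rs added.
run2 = {"a.rs": b"fn a() { c() }", "c.rs": b"fn c() {}"}
changed2, removed2, hashes2 = diff_index(hashes1, run2)

# Third run with no edits: nothing to do, which is what makes it idempotent.
changed3, removed3, _ = diff_index(hashes2, run2)
```

The `removed` list is where orphan cleanup would hook in: entries for files that no longer exist can be deleted from the vector and full-text stores.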
CodeXray search pipeline:

                           ┌─────────────────┐
    User query ──────────▶ │ Embedding Model │ ──▶ Query vector
                           └────────┬────────┘
                                    │
            ┌───────────────────────┼───────────────────────┐
            ▼                       ▼                       ▼
    ┌────────────────┐      ┌────────────────┐      ┌────────────────┐
    │  Dense Search  │      │ Sparse Search  │      │  Graph Search  │
    │ (LanceDB ANN)  │      │ (Tantivy BM25) │      │ (PetCodeGraph) │
    └───────┬────────┘      └───────┬────────┘      └───────┬────────┘
            │                       │                       │
            └───────────────────────┼───────────────────────┘
                                    ▼
                         ┌─────────────────────┐
                         │     RRF Fusion      │  ← Reciprocal Rank Fusion
                         │ (Top-20 candidates) │
                         └──────────┬──────────┘
                                    │
                                    ▼
                         ┌─────────────────────┐
                         │      Reranker       │  ← Cross-Encoder fine re-ranking,
                         │  (Qwen3-Reranker)   │    scores each (query, code) pair
                         └──────────┬──────────┘
                                    │
                                    ▼
                         ┌─────────────────────┐
                         │    Final Results    │  ← Top-5 (or Top-N)
                         └─────────────────────┘

| Stage | Technology | Role |
|---|---|---|
| Dense Search | LanceDB + Embedding Model | Semantic vector similarity |
| Sparse Search | Tantivy BM25 | Keyword & token matching |
| RRF Fusion | Reciprocal Rank Fusion | Merge heterogeneous scores fairly |
| Reranker | Cross-Encoder (Qwen3-Reranker-4B) | Full-interaction precision scoring |
| Fallback | PetCodeGraph | Graph-based name search (no API needed) |
If the embedding or reranker API is unavailable, the pipeline falls back gracefully to graph-based name search and BM25-only mode.
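Reciprocal Rank Fusion itself is simple enough to show in a few lines. The sketch below uses the standard formula, score(d) = Σ 1 / (k + rank), with `k=60` and a top-20 cut-off matching the `rrf_k` and `rrf_top_k` values in the sample configuration. The function name, the example result lists, and the alphabetical tie-break are assumptions for illustration, not CodeXray's actual code.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60, top_k: int = 20) -> list[str]:
    """Fuse several ranked lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per item (rank starts at 1), so only
    positions matter and the raw scores of different engines never mix.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id to keep output deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]

# Hypothetical results from the dense (vector) and sparse (BM25) searches.
dense = ["authenticateUser", "loginHandler", "hashPassword"]
sparse = ["loginHandler", "parseLoginForm", "authenticateUser"]

fused = rrf_fuse([dense, sparse])
print(fused)

# If the dense side is unavailable, fusing the remaining list alone
# preserves its order, which mirrors a BM25-only mode.
bm25_only = rrf_fuse([[], sparse])
```

Because RRF works on ranks instead of scores, cosine similarities and BM25 scores can be merged without normalizing either scale, which is what "merge heterogeneous scores fairly" refers to.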
| Mode | When | Trigger |
|---|---|---|
| MCP server | On startup + file changes | codexray install + restart Claude Code |
The MCP server automatically indexes on startup, watches file changes during runtime, and injects CLAUDE.md for tool discovery.
- `~/.codexray/config.json` (global, shared across all projects)
- `~/.codexray/<md5(project_root)>/`
  - `project.json`: Project metadata
  - `graph.bin`: Serialized call graph
  - `embeddings.lance/`: LanceDB vector data
  - `tantivy_bm25/`: BM25 full-text index
  - `file_hashes.json`: MD5 incremental tracking
  - `embedding_hashes.json`: Embedding incremental tracking

No daemon, no HTTP server. Every CLI command is a standalone process.
| Language | Functions | Structs/Classes | Call Graph |
|---|---|---|---|
| Rust | ✓ | ✓ | ✓ |
| Python | ✓ | ✓ | ✓ |
| JavaScript | ✓ | ✓ | ✓ |
| TypeScript | ✓ | ✓ | ✓ |
| Go | ✓ | ✓ | ✓ |
| C/C++ | ✓ | ✓ | ✓ |
| Java | ✓ | ✓ | ✓ |
~/.codexray/config.json:
{
  "embedding": {
    "provider": "openai-compatible",
    "model": "Qwen/Qwen3-Embedding-4B",
    "api_token": "sk-...",
    "api_base_url": "https://api.siliconflow.cn/v1",
    "dimensions": 2560
  },
  "index": {
    "min_code_block_length": 16,
    "enable_reranker": true,
    "hybrid": {
      "enable_bm25": true,
      "bm25_top_k": 100,
      "vector_top_k": 100,
      "rrf_k": 60,
      "rrf_top_k": 20,
      "short_code_threshold": 30,
      "short_code_penalty": 0.5
    },
    "reranker": {
      "enabled": true,
      "model": "Qwen/Qwen3-Reranker-4B",
      "api_token": "sk-...",
      "api_base_url": "https://api.siliconflow.cn/v1/rerank",
      "top_n": 5,
      "candidate_multiplier": 5,
      "timeout_secs": 60
    }
  },
  "installed_hooks": {}
}
| Model | Role | When |
|---|---|---|
| `Qwen/Qwen3-Embedding-4B` | Converts code → vectors for dense search | Index building |
| `Qwen/Qwen3-Reranker-4B` | Scores (query, code) pairs for precision | Search time |
Set these via the interactive wizard on first run, or create the file manually. If the embedding API is unavailable, graph-based search still works.
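For readers scripting against the config file, the following sketch reads the search-related fields from a trimmed copy of the JSON above. The key names come from the sample configuration. The fallback defaults, the rule that both reranker flags must be true, and the `search_settings` helper are assumptions for illustration only.

```python
import json

# Trimmed copy of the sample config; tokens and URLs omitted on purpose.
CONFIG_TEXT = """
{
  "index": {
    "enable_reranker": true,
    "hybrid": {"enable_bm25": true, "rrf_k": 60, "rrf_top_k": 20},
    "reranker": {"enabled": true, "top_n": 5, "candidate_multiplier": 5}
  }
}
"""

def search_settings(config: dict) -> dict:
    """Extract the knobs that shape a search, tolerating missing sections."""
    index = config.get("index", {})
    hybrid = index.get("hybrid", {})
    reranker = index.get("reranker", {})
    return {
        "use_bm25": hybrid.get("enable_bm25", False),
        "rrf_k": hybrid.get("rrf_k", 60),          # assumed default
        "rrf_top_k": hybrid.get("rrf_top_k", 20),  # assumed default
        # Assumption: reranking runs only if both flags are enabled.
        "use_reranker": bool(index.get("enable_reranker", False)
                             and reranker.get("enabled", False)),
        "final_top_n": reranker.get("top_n", 5),   # assumed default
    }

settings = search_settings(json.loads(CONFIG_TEXT))
empty = search_settings({})
print(settings)
```

With an empty config the helper reports BM25 and the reranker as disabled, which loosely corresponds to the graph-only mode described above.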
cd rust-core
# Build
cargo build
# Build release
cargo build --release
# Run tests
cargo test
# Run specific test
cargo test test_build_graph_functionality -- --nocapture
MIT
Built with: Tree-sitter · Petgraph · LanceDB · Tantivy · Tokio · Clap · Axum