by lyonzin
[knowledge-rag] - Drop docs, search instantly from Claude Code — 13 MCP tools, 20 format parsers, hybrid search + reranking. Zero servers, zero API keys, 100% local.
# Add to your Claude Code skills
git clone https://github.com/lyonzin/knowledge-rag
Guides for using mcp servers skills like knowledge-rag.
Last scanned: 6/22/2026
{
"issues": [
{
"file": "README.md",
"line": 129,
"type": "remote-install",
"message": "Install command (remote install script piped to a shell — review the source before running): \"curl -fsSL .../install.sh | bash\"",
"severity": "low"
}
],
"status": "PASSED",
"scannedAt": "2026-06-22T09:50:27.550Z",
"npmAuditRan": true,
"pipAuditRan": false,
"promptInjectionRan": true
}
knowledge-rag is an open-source MCP server skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by lyonzin. It has 100 GitHub stars.
Yes. knowledge-rag passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.
Clone the repository with "git clone https://github.com/lyonzin/knowledge-rag" and add it to your Claude Code skills directory (see the Installation section above).
knowledge-rag is primarily written in Python. It is open-source under lyonzin on GitHub, so you can review or fork the full source.
Yes. SkillsLLM lists many other MCP Servers skills you can browse and compare side by side. Open the MCP Servers category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh knowledge-rag against similar tools.
Drop your PDFs, markdown, code, notebooks — 1800+ files, 39K chunks, indexed in under 3 minutes. Hybrid search (BM25 + semantic vectors + cross-encoder reranking) through 13 MCP tools. Everything runs locally via ONNX. No Docker, no Ollama, no API keys, no data leaves your machine.
pip install knowledge-rag → restart Claude Code → search_knowledge("your query")
13 MCP Tools | Hybrid Search + Reranking | 20 File Formats | Optional NVIDIA GPU | 100% Local
What's New | Supported Formats | Installation | Configuration | API Reference | Architecture
128× faster BM25 search — replaced rank-bm25 full-corpus scan with a custom inverted-index implementation. Only documents containing query terms are scored, using numpy.argpartition for O(n) top-k selection. Adjacent chunk fetching now uses a single batched ChromaDB call instead of N round-trips, and an O(1) reverse lookup (_source_to_docid) eliminates linear scans.
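The idea described above can be illustrated with a minimal sketch. This is not the project's implementation: the class name `InvertedIndexBM25`, the `k1`/`b` defaults, and the pre-tokenized input are all assumptions made for illustration. It shows the two techniques the paragraph names: scoring only documents that appear in the postings of a query term, and picking the top-k with `numpy.argpartition` before sorting just those k.

```python
import math
from collections import Counter, defaultdict

import numpy as np


class InvertedIndexBM25:
    """Toy BM25 index (hypothetical): only docs containing a query term are scored."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.n_docs = len(docs)
        self.doc_len = np.array([len(d) for d in docs], dtype=np.float64)
        self.avgdl = float(self.doc_len.mean())
        # Build postings: term -> (array of doc ids, array of term frequencies)
        raw = defaultdict(list)
        for doc_id, tokens in enumerate(docs):
            for term, tf in Counter(tokens).items():
                raw[term].append((doc_id, tf))
        self.postings = {
            term: (
                np.array([d for d, _ in plist]),
                np.array([f for _, f in plist], dtype=np.float64),
            )
            for term, plist in raw.items()
        }

    def search(self, query_tokens, top_k=3):
        scores = np.zeros(self.n_docs)
        for term in set(query_tokens):
            if term not in self.postings:
                continue  # term absent from the corpus: nothing to score
            doc_ids, tf = self.postings[term]
            df = len(doc_ids)
            idf = math.log(1.0 + (self.n_docs - df + 0.5) / (df + 0.5))
            norm = tf + self.k1 * (
                1.0 - self.b + self.b * self.doc_len[doc_ids] / self.avgdl
            )
            scores[doc_ids] += idf * tf * (self.k1 + 1.0) / norm
        candidates = np.flatnonzero(scores)
        if candidates.size == 0:
            return []
        k = min(top_k, candidates.size)
        # argpartition isolates the k best in linear time; only those k are sorted
        best = candidates[np.argpartition(-scores[candidates], k - 1)[:k]]
        best = best[np.argsort(-scores[best])]
        return [(int(i), float(scores[i])) for i in best]


docs = [["hybrid", "search"], ["vector", "search", "search"], ["unrelated", "text"]]
index = InvertedIndexBM25(docs)
hits = index.search(["search"])
```

Here `hits` contains only the two documents that mention "search"; the third document is never scored.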
Smarter output — two new parameters on search_knowledge:
- snippet_mode (default: true): truncates content to ~500 characters at natural break points, reducing token consumption by ~72%. Adds a content_length field with the original size; use get_document() for full content.
- min_score: filters out results below a normalized relevance threshold (0.0–1.0), removing low-quality noise. The response includes a filtered_by_score count for transparency.

Both parameters are fully backwards-compatible (existing callers see no change in behavior).
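A rough sketch of how the two parameters could shape a response, under stated assumptions: the field names `content_length` and `filtered_by_score` come from the description above, while `make_snippet`, `shape_results`, the break-point heuristics, and the result dictionary layout are hypothetical and not taken from the project's code.

```python
def make_snippet(text, limit=500):
    """Cut text to at most `limit` chars, preferring a natural break point."""
    if len(text) <= limit:
        return text
    window = text[:limit]
    # Try paragraph, sentence, line, then word boundaries (assumed order).
    for sep in ("\n\n", ". ", "\n", " "):
        cut = window.rfind(sep)
        if cut > limit // 2:  # avoid snippets that are too short
            keep = cut + 1 if sep == ". " else cut  # keep the sentence's period
            return window[:keep].rstrip()
    return window  # no usable boundary: hard cut


def shape_results(results, snippet_mode=True, min_score=0.0):
    """Apply score filtering, then optional truncation, to raw results."""
    kept = [r for r in results if r["score"] >= min_score]
    shaped = []
    for r in kept:
        item = dict(r)
        item["content_length"] = len(r["content"])  # original size
        if snippet_mode:
            item["content"] = make_snippet(r["content"])
        shaped.append(item)
    return {"results": shaped, "filtered_by_score": len(results) - len(kept)}


raw = [
    {"score": 0.9, "content": "Sentence one. " * 100},
    {"score": 0.2, "content": "short"},
]
response = shape_results(raw, snippet_mode=True, min_score=0.5)
```

In this sketch the low-scoring result is dropped and counted, and the long result is trimmed at the last sentence boundary inside the 500-character window.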
The server now supports SSE and streamable-http transport modes. Instead of spawning a separate process per client (stdio), a single server process serves all clients with shared resources — 1 embedding model, 1 ChromaDB, 1 query cache.
# config.yaml
server:
  transport: "sse"   # "stdio" | "sse" | "streamable-http"
  host: "127.0.0.1"
  port: 8179
Or via CLI: knowledge-rag --transport sse
Optional enterprise features (all disabled by default):
- /metrics endpoint on separate port

All 13 MCP tools are instrumented with @rate_limited and @instrument decorators, with zero overhead when features are disabled. Default transport remains stdio for full backwards compatibility.
Migration: Existing users need zero changes. SSE mode is opt-in via server.transport: "sse" in config.yaml. See Configuration for details.
Every PR (including dependabot bumps and one-line fixes) is now evaluated against 35+ automated checks spread across 7 pillars before any human review:
| Pillar | What it enforces | Tools |
|---|---|---|
| 1 Security | SAST, secrets, CVEs, supply chain | bandit, semgrep, gitleaks, pip-audit, dependency-review, Snyk, CodeQL, Socket |
| 2 Stability | Flake detection, coverage trend, test count, deterministic runs | pytest-rerunfailures, codecov ±0.5pp, test-count guard |
| 3 Memory Leak | RSS bounded under 1000-query load, no idle bloat | psutil-based baseline tests + nightly 50K-iteration soak |
| 4 Versatility | 9 OS×Python combos, 14 format parsers, 4 config presets, locale tolerance, property-based fuzzing | matrix CI on Linux+Windows+macOS × 3.11+3.12+3.13, Hypothesis |
| 5 Scalability | Performance regression > 10% blocks merge, public bench dashboard | pytest-benchmark, GH Pages chart |
| 6 Versioning | Atomic version sync, API surface diff, conventional commits, CHANGELOG enforcement, backwards compat | griffe-style AST diff, custom guards |
| 7 Quality | Type strictness, docstring coverage, complexity, dead code | mypy strict, interrogate ≥80%, radon, vulture |
Plus a nightly resilience workflow that runs chaos failure-injection (HF down, ChromaDB corruption, watchdog crash, ONNX zero-byte replay), determinism check (full suite × 3), and mutation testing on selected modules.
Read the full philosophy in CONTRIBUTING.md. Report bugs via SECURITY.md or the issue templates.
FastEmbedEmbeddings.__call__ no longer swallows exceptions and returns [[0.0]*dim, ...] when the ONNX model fails to load. That bug pre-existed in master but was silent: ChromaDB happily stored zero embeddings, count() reported normal numbers, smart-reindex skipped them as "already indexed", and queries returned garbage similarity with no error visible. Now raises EmbeddingModelLoadError / EmbeddingError loudly. All v3.8.0 users should upgrade. Full details in Changelog.
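The failure mode above is easier to see in code. The following is a minimal sketch, not the project's `FastEmbedEmbeddings`: the exception names are taken from the paragraph, but their hierarchy, the `LoudEmbeddings` wrapper, and the `load_model` callable are assumptions for illustration. The point is that a load or inference failure raises instead of returning zero vectors that a vector store would accept silently.

```python
class EmbeddingError(RuntimeError):
    """Embedding could not be computed (hierarchy is an assumption)."""


class EmbeddingModelLoadError(EmbeddingError):
    """The underlying model failed to load."""


class LoudEmbeddings:
    """Hypothetical wrapper: never substitutes zero vectors for real output."""

    def __init__(self, load_model):
        # load_model is an assumed zero-argument callable returning a model
        # that maps a list of texts to a list of vectors.
        self._load_model = load_model

    def __call__(self, texts):
        try:
            model = self._load_model()
        except Exception as exc:
            raise EmbeddingModelLoadError("embedding model failed to load") from exc
        try:
            vectors = [list(v) for v in model(texts)]
        except Exception as exc:
            raise EmbeddingError("embedding inference failed") from exc
        # An all-zero vector carries no similarity signal; treat it as a failure.
        if len(vectors) != len(texts) or any(not any(v) for v in vectors):
            raise EmbeddingError("model returned missing or all-zero vectors")
        return vectors


def broken_loader():
    raise OSError("model file is empty")


try:
    LoudEmbeddings(broken_loader)(["hello"])
    outcome = "stored garbage"
except EmbeddingModelLoadError:
    outcome = "failed loudly"
```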
The FastEmbed ONNX model (~200MB resident) now loads on the first query, not at startup. Idle knowledge-rag processes are now genuinely cheap. Why this matters: MCP stdio is one-process-per-client by protocol — multiple Claude Code windows, Claude Desktop + IDE simultaneously, or review/approval flows that open extra connections all spawn their own processes. Before v3.8.0, every one of them paid the full embedding-model cost up front. Now only processes that actually serve queries load the model. Public API is unchanged.
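Deferred loading of this kind is commonly done with a guarded, load-once accessor. The sketch below assumes nothing about the project's internals: `LazyModel` and the `factory` callable are hypothetical names, and the double-checked lock is one common way to keep concurrent first queries from loading the model twice.

```python
import threading


class LazyModel:
    """Hypothetical holder that builds an expensive model on first use only."""

    def __init__(self, factory):
        self._factory = factory  # assumed zero-argument callable
        self._model = None
        self._lock = threading.Lock()

    @property
    def loaded(self):
        return self._model is not None

    def get(self):
        # Fast path: once loaded, no lock is taken.
        if self._model is None:
            with self._lock:
                # Re-check inside the lock so only one thread runs the factory.
                if self._model is None:
                    self._model = self._factory()
        return self._model


load_calls = []


def expensive_factory():
    load_calls.append(1)  # stands in for reading a large ONNX file
    return object()


holder = LazyModel(expensive_factory)
idle_cost = len(load_calls)      # nothing loaded while the process is idle
first = holder.get()             # first query pays the load cost
second = holder.get()            # later queries reuse the same model
```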
For users who measured their setup and want a hard cap of one server per data_dir:
export KNOWLEDGE_RAG_SINGLE_INSTANCE=1
A second instance exits immediately with code 75. OFF by default so multi-client MCP usage continues to work unchanged. Stale-PID recovery + SIGINT/SIGTERM cleanup wired correctly. Full guide in docs/single-instance.md. Sample MCP config in examples/mcp-config-single-instance.json.
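For readers curious how a single-instance guard with stale-PID recovery can work, here is a minimal POSIX-only sketch. It is not the project's code: the lock-file name, `acquire`, `pid_alive`, and `guard` are hypothetical, and only the environment variable and exit code 75 come from the text above. Signal-handler cleanup is omitted for brevity.

```python
import os
import sys

EXIT_ALREADY_RUNNING = 75  # exit code the README reports for a second instance


def pid_alive(pid):
    """POSIX only: signal 0 checks for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire(lock_path):
    """Return True if this process now owns the lock file, False otherwise."""
    for _ in range(2):  # second pass runs after removing a stale lock
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                with open(lock_path) as f:
                    owner = int(f.read().strip() or 0)
            except (OSError, ValueError):
                owner = 0
            if owner > 0 and pid_alive(owner):
                return False  # a live instance already holds the lock
            try:
                os.remove(lock_path)  # stale lock left by a dead process
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False


def guard(data_dir):
    """Exit with code 75 if the opt-in flag is set and another instance runs."""
    if os.environ.get("KNOWLEDGE_RAG_SINGLE_INSTANCE") != "1":
        return  # OFF by default: multi-client usage is unaffected
    if not acquire(os.path.join(data_dir, "server.pid")):  # file name assumed
        sys.exit(EXIT_ALREADY_RUNNING)
```

A real implementation would also need to remove the lock on SIGINT/SIGTERM and handle the race where another process has created the file but not yet written its PID.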
npx -y knowledge-rag # NPM — zero setup, auto-manages Python venv
pip install knowledge-rag # PyPI — classic Python install
curl -fsSL .../install.sh | bash # One-line installer (Linux/macOS/Windows)
docker pull ghcr.io/lyonzin/knowledge-rag # Docker — models pre-downloaded
git clone ... && pip install -r ... # From source
All methods produce the same MCP server. See Installation for full instructions.
--transport CLI
gpu: false), BASE_DIR resolution fix for editable installs
<3.13 upper bound: 3.13 and 3.14 now supported