knowledge-rag

Name: knowledge-rag
Author: lyonzin

Verified

[knowledge-rag] - Drop docs, search instantly from Claude Code — 13 MCP tools, 20 format parsers, hybrid search + reranking. Zero servers, zero API keys, 100% local.

100 stars

17 forks

Python

Installation

# Add to your Claude Code skills
git clone https://github.com/lyonzin/knowledge-rag

Security Report: Verified

Last scanned: 6/22/2026

{
  "issues": [
    {
      "file": "README.md",
      "line": 129,
      "type": "remote-install",
      "message": "Install command (remote install script piped to a shell — review the source before running): \"curl -fsSL .../install.sh | bash\"",
      "severity": "low"
    }
  ],
  "status": "PASSED",
  "scannedAt": "2026-06-22T09:50:27.550Z",
  "npmAuditRan": true,
  "pipAuditRan": false,
  "promptInjectionRan": true
}

README.md

Frequently Asked Questions

What is knowledge-rag?

knowledge-rag is an open-source MCP server skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by lyonzin. It lets you drop in documents and search them instantly from Claude Code, with 13 MCP tools, 20 format parsers, and hybrid search plus reranking. It needs zero servers and zero API keys, and runs 100% locally. It has 100 GitHub stars.

Is knowledge-rag safe to use?

Yes. knowledge-rag passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install knowledge-rag?

Clone the repository with "git clone https://github.com/lyonzin/knowledge-rag" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is knowledge-rag written in?

knowledge-rag is primarily written in Python. It is open-source under lyonzin on GitHub, so you can review or fork the full source.

Are there alternatives to knowledge-rag?

Yes. SkillsLLM lists many other MCP Servers skills you can browse and compare side by side. Open the MCP Servers category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh knowledge-rag against similar tools.

Knowledge RAG

Your docs, your machine, zero cloud. Claude Code searches them natively.

Drop your PDFs, markdown, code, notebooks — 1800+ files, 39K chunks, indexed in under 3 minutes. Hybrid search (BM25 + semantic vectors + cross-encoder reranking) through 13 MCP tools. Everything runs locally via ONNX. No Docker, no Ollama, no API keys, no data leaves your machine.

pip install knowledge-rag → restart Claude Code → search_knowledge("your query")

13 MCP Tools | Hybrid Search + Reranking | 20 File Formats | Optional NVIDIA GPU | 100% Local

What's New in v4.2.0

Search Performance & Output Quality (v4.2.0)

128× faster BM25 search — replaced rank-bm25 full-corpus scan with a custom inverted-index implementation. Only documents containing query terms are scored, using numpy.argpartition for O(n) top-k selection. Adjacent chunk fetching now uses a single batched ChromaDB call instead of N round-trips, and an O(1) reverse lookup (_source_to_docid) eliminates linear scans.
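
The inverted-index idea described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the class name InvertedIndexBM25, the k1 and b defaults, and the whitespace tokenization are all assumptions, and heapq.nlargest stands in for numpy.argpartition so the sketch stays standard-library only.

```python
import heapq
import math
from collections import Counter, defaultdict


class InvertedIndexBM25:
    """Toy BM25 index (hypothetical): only docs containing a query term are scored."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.n_docs = len(docs)
        self.doc_len = [len(tokens) for tokens in docs]
        self.avgdl = sum(self.doc_len) / max(self.n_docs, 1)
        # term -> list of (doc_id, term_frequency)
        self.postings = defaultdict(list)
        for doc_id, tokens in enumerate(docs):
            for term, tf in Counter(tokens).items():
                self.postings[term].append((doc_id, tf))

    def search(self, query_tokens, top_k=3):
        scores = defaultdict(float)
        for term in set(query_tokens):
            plist = self.postings.get(term)
            if not plist:
                continue  # term absent from the corpus: nothing to score
            df = len(plist)
            idf = math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))
            # Walk only the posting list, never the full corpus.
            for doc_id, tf in plist:
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / self.avgdl
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        # Top-k selection without sorting every candidate.
        return heapq.nlargest(top_k, scores.items(), key=lambda kv: kv[1])


docs = [
    "hybrid search with bm25".split(),
    "semantic vectors and reranking".split(),
    "bm25 inverted index search search".split(),
]
index = InvertedIndexBM25(docs)
results = index.search(["bm25", "search"])  # list of (doc_id, score) pairs
```

Document 1 contains neither query term, so it is never touched during scoring. That skipped work is where the speedup over a full-corpus scan comes from.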

Smarter output — two new parameters on search_knowledge:

snippet_mode (default: true) — truncates content to ~500 characters at natural break points, reducing token consumption by ~72%. Adds content_length field with original size; use get_document() for full content.
min_score — filters results below a normalized relevance threshold (0.0–1.0). Eliminates low-quality noise from results. Response includes filtered_by_score count for transparency.

Both parameters are fully backwards-compatible (existing callers see no change in behavior).
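
As a rough illustration of what the two parameters do to a result list, here is a sketch under stated assumptions. The helper names make_snippet and shape_results, the break-point priority, and the shape of each result dict are hypothetical; only snippet_mode, min_score, content_length, and filtered_by_score come from the text above.

```python
def make_snippet(text, limit=500):
    """Cut text to at most `limit` characters, preferring a natural break."""
    if len(text) <= limit:
        return text
    window = text[:limit]
    # Assumed priority: paragraph break, sentence end, line break, word gap.
    for sep in ("\n\n", ". ", "\n", " "):
        cut = window.rfind(sep)
        if cut >= limit // 2:  # do not throw away most of the window
            return window[: cut + len(sep)].rstrip()
    return window  # no usable break point: hard cut


def shape_results(results, snippet_mode=True, min_score=None):
    """Apply score filtering and optional truncation to raw search hits."""
    kept = [r for r in results if min_score is None or r["score"] >= min_score]
    shaped = []
    for r in kept:
        item = dict(r)
        if snippet_mode:
            item["content_length"] = len(r["content"])  # original size
            item["content"] = make_snippet(r["content"])
        shaped.append(item)
    return {"results": shaped, "filtered_by_score": len(results) - len(kept)}


long_text = "Sentence one is here. " * 40
response = shape_results(
    [{"content": long_text, "score": 0.9}, {"content": "short", "score": 0.1}],
    min_score=0.5,
)
```

In this example the low-scoring hit is dropped and counted in filtered_by_score, while the long hit is cut at the last sentence boundary inside the 500-character window.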

Enterprise Concurrent Access — SSE/HTTP Transport (v4.0.0)

The server now supports SSE and streamable-http transport modes. Instead of spawning a separate process per client (stdio), a single server process serves all clients with shared resources — 1 embedding model, 1 ChromaDB, 1 query cache.

# config.yaml
server:
  transport: "sse"        # "stdio" | "sse" | "streamable-http"
  host: "127.0.0.1"
  port: 8179

Or via CLI: knowledge-rag --transport sse

Optional enterprise features (all disabled by default):

Rate limiting: Sliding-window counter, configurable RPM and burst
Prometheus metrics: /metrics endpoint on separate port
Bearer auth: Token validation for SSE/HTTP connections

All 13 MCP tools are instrumented with @rate_limited and @instrument decorators — zero overhead when features are disabled. Default transport remains stdio for full backwards compatibility.
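
One way a decorator can cost nothing when its feature is off is to hand back the original function untouched. The sketch below shows that pattern with a simple sliding-window log, a simpler relative of the sliding-window counter mentioned above. The signature of rate_limited here (enabled, max_calls, window_seconds, clock) and the RateLimitExceeded exception are assumptions for illustration, not the project's actual API.

```python
import functools
import threading
import time
from collections import deque


class RateLimitExceeded(Exception):
    """Hypothetical error raised when the window is full."""


def rate_limited(enabled, max_calls=60, window_seconds=60.0, clock=time.monotonic):
    def decorator(func):
        if not enabled:
            return func  # feature off: no wrapper, so no per-call overhead
        calls = deque()  # timestamps of calls inside the current window
        lock = threading.Lock()

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            now = clock()
            with lock:
                # Drop timestamps that have slid out of the window.
                while calls and now - calls[0] >= window_seconds:
                    calls.popleft()
                if len(calls) >= max_calls:
                    raise RateLimitExceeded(f"{func.__name__}: limit reached")
                calls.append(now)
            return func(*args, **kwargs)

        return wrapper

    return decorator


def plain_tool():
    return "ok"


untouched = rate_limited(enabled=False)(plain_tool)  # same function object
```

Passing the clock as a parameter is a design choice that keeps the limiter testable without sleeping.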

Migration: Existing users need zero changes. SSE mode is opt-in via server.transport: "sse" in config.yaml. See Configuration for details.

Quality Gate — 7-Pillar PR Validation

Every PR (including dependabot bumps and one-line fixes) is now evaluated against 35+ automated checks spread across 7 pillars before any human review:

Pillar	What it enforces	Tools
1 Security	SAST, secrets, CVEs, supply chain	bandit, semgrep, gitleaks, pip-audit, dependency-review, Snyk, CodeQL, Socket
2 Stability	Flake detection, coverage trend, test count, deterministic runs	pytest-rerunfailures, codecov ±0.5pp, test-count guard
3 Memory Leak	RSS bounded under 1000-query load, no idle bloat	psutil-based baseline tests + nightly 50K-iteration soak
4 Versatility	9 OS×Python combos, 14 format parsers, 4 config presets, locale tolerance, property-based fuzzing	matrix CI on Linux+Windows+macOS × 3.11+3.12+3.13, Hypothesis
5 Scalability	Performance regression > 10% blocks merge, public bench dashboard	pytest-benchmark, GH Pages chart
6 Versioning	Atomic version sync, API surface diff, conventional commits, CHANGELOG enforcement, backwards compat	griffe-style AST diff, custom guards
7 Quality	Type strictness, docstring coverage, complexity, dead code	mypy strict, interrogate ≥80%, radon, vulture

Plus a nightly resilience workflow that runs chaos failure-injection (HF down, ChromaDB corruption, watchdog crash, ONNX zero-byte replay), determinism check (full suite × 3), and mutation testing on selected modules.

Read the full philosophy in CONTRIBUTING.md. Report bugs via SECURITY.md or the issue templates.

Critical Hotfix — No More Silent Zero-Vector Corruption (v3.8.1)

FastEmbedEmbeddings.__call__ no longer swallows exceptions and returns [[0.0]*dim, ...] when the ONNX model fails to load. That bug pre-existed in master but was silent: ChromaDB happily stored zero embeddings, count() reported normal numbers, smart-reindex skipped them as "already indexed", and queries returned garbage similarity with no error visible. Now raises EmbeddingModelLoadError / EmbeddingError loudly. All v3.8.0 users should upgrade. Full details in Changelog.
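
The difference between failing silently and failing loudly can be sketched as follows. The exception names match the ones mentioned above, but their definitions here, the embed_loudly helper, the load_model callable, and the explicit all-zero check are assumptions made for illustration rather than the project's code.

```python
class EmbeddingModelLoadError(RuntimeError):
    """Stand-in definition: the model could not be loaded."""


class EmbeddingError(RuntimeError):
    """Stand-in definition: the model loaded but embedding failed."""


def embed_loudly(load_model, texts):
    """Embed texts, raising instead of returning placeholder zero vectors."""
    try:
        model = load_model()
    except Exception as exc:
        raise EmbeddingModelLoadError(f"embedding model failed to load: {exc}") from exc
    try:
        vectors = [list(vec) for vec in model(texts)]
    except Exception as exc:
        raise EmbeddingError(f"embedding failed: {exc}") from exc
    # Extra guard (an assumption of this sketch): refuse all-zero vectors,
    # since a vector store would accept them without complaint.
    for text, vec in zip(texts, vectors):
        if not any(vec):
            raise EmbeddingError(f"all-zero embedding for {text[:40]!r}")
    return vectors


def broken_loader():
    raise OSError("model file is zero bytes")


def working_loader():
    return lambda texts: [[0.1, 0.2, 0.3] for _ in texts]
```

With broken_loader the call raises EmbeddingModelLoadError immediately, so nothing is written to the index and the failure is visible at the point where it happens.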

Lazy-Loaded Embeddings — Cheaper Idle Processes (v3.8.0)

The FastEmbed ONNX model (~200MB resident) now loads on the first query, not at startup. Idle knowledge-rag processes are now genuinely cheap. Why this matters: MCP stdio is one-process-per-client by protocol — multiple Claude Code windows, Claude Desktop + IDE simultaneously, or review/approval flows that open extra connections all spawn their own processes. Before v3.8.0, every one of them paid the full embedding-model cost up front. Now only processes that actually serve queries load the model. Public API is unchanged.

Opt-In Single-Instance Guard (v3.8.0)

For users who measured their setup and want a hard cap of one server per data_dir:

export KNOWLEDGE_RAG_SINGLE_INSTANCE=1

A second instance exits immediately with code 75. OFF by default so multi-client MCP usage continues to work unchanged. Stale-PID recovery + SIGINT/SIGTERM cleanup wired correctly. Full guide in docs/single-instance.md. Sample MCP config in examples/mcp-config-single-instance.json.
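
A PID-file guard with stale-PID recovery might look roughly like the sketch below. Everything apart from the KNOWLEDGE_RAG_SINGLE_INSTANCE variable and exit code 75 is an assumption: the server.pid filename, the function names, and the POSIX-only liveness probe are illustrative. The check is also not atomic, so two processes starting at the same instant could both pass it.

```python
import os
import sys
from pathlib import Path

EXIT_ALREADY_RUNNING = 75


def pid_is_alive(pid):
    """POSIX-only probe: signal 0 checks for existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_single_instance(data_dir, env=os.environ, is_alive=pid_is_alive):
    """Return the PID file path if the guard is on, or None if it is off."""
    if env.get("KNOWLEDGE_RAG_SINGLE_INSTANCE") != "1":
        return None  # guard is opt-in: default behaviour is unchanged
    pid_file = Path(data_dir) / "server.pid"  # hypothetical filename
    if pid_file.exists():
        try:
            owner = int(pid_file.read_text().strip())
        except ValueError:
            owner = None  # unreadable file: treat as stale
        if owner is not None and is_alive(owner):
            sys.exit(EXIT_ALREADY_RUNNING)
    # No live owner: claim (or reclaim) the PID file.
    pid_file.write_text(str(os.getpid()))
    return pid_file
```

A real implementation would also remove the file on SIGINT/SIGTERM, as the paragraph above describes. That cleanup is omitted here to keep the sketch short.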

5 Ways to Install

npx -y knowledge-rag                    # NPM — zero setup, auto-manages Python venv
pip install knowledge-rag               # PyPI — classic Python install
curl -fsSL .../install.sh | bash        # One-line installer (Linux/macOS/Windows)
docker pull ghcr.io/lyonzin/knowledge-rag  # Docker — models pre-downloaded
git clone ... && pip install -r ...     # From source

All methods produce the same MCP server. See Installation for full instructions.

Recent Highlights

v4.0.0 — Enterprise concurrent access: SSE/HTTP transport (1 server → N clients), thread-safe shared state, optional rate limiting + Prometheus metrics, ChromaDB WAL mode, --transport CLI
v3.9.0 — Quality Gate activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
v3.8.1 — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
v3.8.0 — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
v3.6.0 — Multi-language code parsing (C/C++/JS/TS/XML), NPM wrapper, Docker image, automated release pipeline
v3.5.2 — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when gpu: false), BASE_DIR resolution fix for editable installs
v3.5.1 — Remove Python <3.13 upper bound — 3.13 and 3.14 now supported
v3.5.0 — Optional GPU acceleration, suppo