by Ontos-AI
Knowhere extracts, parses, and outputs structured chunks ready for AI Agents and RAG.
```bash
# Add to your Claude Code skills
git clone https://github.com/Ontos-AI/knowhere
```

We're not developing the next MinerU; instead, we're building document memory infrastructure that agents can effectively consume.
Knowhere turns unstructured documents into persistent, navigable memory for AI agents. It handles parsing, hierarchy extraction, multi-modal structuring, and graph construction, giving your agents structured, high-quality context for Agentic RAG, traditional RAG, or any LLM workflow.
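In outline, ingestion composes a handful of stages. The sketch below is our own schematic of that flow; the names and signatures are illustrative, not Knowhere internals:

```python
# Schematic of the ingestion flow -- illustrative, not Knowhere internals.

def parse_to_markdown(path: str) -> str:
    """Convert the raw document to Markdown (e.g. via MinerU or PyMuPDF)."""
    raise NotImplementedError

def extract_hierarchy(markdown: str) -> "Section":
    """Recover the section tree that flat parsing loses."""
    raise NotImplementedError

def link_assets(tree: "Section") -> "Section":
    """Attach extracted images/tables back to their source chunks."""
    raise NotImplementedError

def build_graph(trees: list["Section"]) -> dict:
    """Connect sections across documents into a navigable memory graph."""
    raise NotImplementedError

def ingest(paths: list[str]) -> dict:
    """End-to-end: documents in, navigable memory graph out."""
    trees = [link_assets(extract_hierarchy(parse_to_markdown(p))) for p in paths]
    return build_graph(trees)
```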
> [!TIP]
> Knowhere stands on the shoulders of giants like MinerU and PyMuPDF. We take their output, optimize it, and then build hierarchical structure and multi-modal cross-document graphs on top. The result is a persistent, citable memory layer purpose-built for agent consumption.
> [!NOTE]
> Get started in seconds with Knowhere Cloud. Avoid the complexity of self-deployment: use our managed API at knowhereto.ai and enjoy $5 in free credits upon registration.
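Against the cloud API, usage might look like the sketch below. Note that the client class, method names, and parameters here are hypothetical placeholders, not the actual knowhere-python-sdk surface; consult the SDK docs for the real API.

```python
# Hypothetical usage sketch -- class and method names are placeholders,
# NOT the actual knowhere-python-sdk API.
from knowhere import KnowhereClient  # assumed import path

client = KnowhereClient(api_key="YOUR_API_KEY")  # key from knowhereto.ai

# Ingest a document into the memory store.
doc = client.ingest("whitepaper.pdf")

# Agentic retrieval over the structured memory, with citations back
# to the exact sections and assets the answer came from.
answer = client.query(doc.id, "What were the Q3 revenue drivers?")
print(answer.text, answer.citations)
```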
Knowhere turns raw documents into a structured memory store that AI agents can navigate and cite. The process follows two steps:

1. **Build memory:** parsing, chunking, hierarchy extraction, and graph construction are unified into one outcome: a navigable memory layer for AI agents (sketched below).
2. **Retrieve by navigation:** agents retrieve by navigating memory instead of depending on a single flat vector lookup.
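As a mental model of step 1, think of the memory layer as a section tree rather than a flat chunk list. Here is a minimal, self-contained sketch (our own simplification, not Knowhere's actual schema) that folds Markdown headings into such a tree:

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    title: str
    level: int                                   # 1 for "#", 2 for "##", ...
    text: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)

def build_tree(markdown: str) -> Section:
    """Fold Markdown headings into a navigable section tree."""
    root = Section("ROOT", 0)
    stack = [root]                               # path from root to current section
    for line in markdown.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            node = Section(line.lstrip("# ").strip(), level)
            while stack[-1].level >= level:      # close deeper/sibling sections
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        elif line.strip():
            stack[-1].text.append(line.strip())
    return root

tree = build_tree("# Intro\nOverview text.\n## Scope\nDetails.")
print([c.title for c in tree.children])          # ['Intro']
print(tree.children[0].children[0].title)        # 'Scope'
```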
Knowhere improves the accuracy of AI agents performing tasks (e.g., searching, modifying, and answering) over real-world data. Compared to handing agents raw documents, or the .md/.json files produced by other parsers, Knowhere achieves higher success rates with fewer resources.
(Data generated from internal evaluation across identical agentic RAG tasks.)
> [!NOTE]
> Benchmarks are actively expanding. The comparison above currently covers MinerU as the baseline parser. We are continuously adding more parsing tools and retrieval baselines; stay tuned for updated results.
| Repository | Description |
|---|---|
| knowhere | This repo. Backend API and worker: document ingestion, parsing, graph construction, and retrieval. |
| knowhere-dashboard | The web UI. Connects to the API for the full product experience. |
| knowhere-self-hosted | Docker Compose stack for self-hosted deployments. Packages the API, worker, and dashboard together. |
| knowhere-python-sdk | Official Python SDK for the Knowhere Cloud API. |
| knowhere-node-sdk | Official Node.js SDK for the Knowhere Cloud API. |
Q: Is MinerU strictly required for Knowhere to work?
A: No. While MinerU is currently our default choice for parsing PDFs and PPTs because it performs best in our experiments, any tool that can convert documents to Markdown works. Knowhere's real value lies in what happens alongside and after the initial conversion: memory-oriented parsing optimizations (fixing real-world parser deficiencies), reconstructing the hierarchical structure, normalizing multi-modal assets, and building the cross-document navigation graph.
Q: What are the LLM / VLM dependencies?
A: Knowhere requires standard language models to structure the document memory. By default, it uses DeepSeek (deepseek-chat) for text/table summarization and hierarchy generation, and Qwen-VL (qwen3.5-flash) for image OCR and visual descriptions. However, it is entirely model-agnostic: you can easily configure it to use OpenAI, DashScope (Ali), Zhipu (GLM), or Volcengine (ARK) via environment variables.
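Configuration is environment-driven. The variable names below are assumptions for illustration (check the .env.example files for the keys Knowhere actually reads), but the pattern is the usual provider/model/key triple:

```python
import os

# Variable names are illustrative assumptions -- see apps/*/.env.example
# for the actual keys Knowhere reads.
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "deepseek")    # openai | dashscope | zhipu | ark | deepseek
LLM_MODEL    = os.getenv("LLM_MODEL", "deepseek-chat")  # text/table summaries, hierarchy generation
VLM_MODEL    = os.getenv("VLM_MODEL", "qwen3.5-flash")  # image OCR, visual descriptions
LLM_API_KEY  = os.getenv("LLM_API_KEY", "")
```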
Q: How does Agentic Retrieval differ from traditional RAG?
A: Traditional RAG relies on flat vector similarity, which often retrieves isolated, out-of-context text snippets. Knowhere's Agentic Retrieval instead uses a multi-agent workflow to actively navigate the hierarchical section tree and cross-document graph. Agents read the document structure like a human would, drilling down into relevant sections to find precise, well-contextualized evidence.
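To make the contrast concrete, here is a toy navigation loop in the spirit of that workflow, reusing the `Section` class from the sketch above. It is our simplification: keyword overlap stands in for the LLM agent that actually decides where to drill down.

```python
def navigate(section: Section, query: str, max_depth: int = 3) -> Section:
    """Greedily descend the section tree toward the query.

    Keyword overlap is a toy stand-in for the LLM agent that
    actually chooses which child section to enter.
    """
    terms = set(query.lower().split())
    node = section
    for _ in range(max_depth):
        if not node.children:
            break
        node = max(node.children,
                   key=lambda c: len(terms & set(c.title.lower().split())))
    return node  # a precise, well-contextualized subtree to read and cite
```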
Q: Can it handle multi-modal data like images and tables?
A: Yes. Knowhere extracts inline images and tables, passes them through Vision-Language Models (VLMs) for summarization and feature extraction, and explicitly links them back to their original text chunks. This ensures that agents can retrieve and cite multi-modal assets accurately during inference.
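A simple way to picture that linkage (again our own illustrative schema, not Knowhere's storage format): each asset keeps a VLM-generated summary plus a back-reference to the chunk it appeared in, so a retrieved chunk can surface its figures and a retrieved figure can be cited in context.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    asset_id: str
    kind: str              # "image" | "table"
    uri: str               # where the extracted file lives
    vlm_summary: str       # caption/features produced by the VLM
    source_chunk_id: str   # explicit back-link to the originating text

@dataclass
class Chunk:
    chunk_id: str
    text: str
    asset_ids: list[str] = field(default_factory=list)  # forward links

fig = Asset("img-7", "image", "assets/img-7.png",
            "Bar chart of Q3 revenue by region.", "chunk-42")
chunk = Chunk("chunk-42", "Revenue grew 12% QoQ...", ["img-7"])
```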
**✅ Supported**

`.pdf` `.docx` `.pptx` `.xlsx` `.csv` `.jpg` `.png` `.md` `.txt` `.json`

**⏳ Coming Soon**

`.epub` `.html` `.xml` `.mp4` `.mp3` `.skills.md`

Want to see a new format supported? Adding a parser is a great first contribution. Check out CONTRIBUTING.md to get started.
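The essential contract for a parser is small: anything that turns a file into Markdown can slot in. A hypothetical shape of that contract follows (the real interface lives in this repo; the names here are placeholders):

```python
from pathlib import Path
from typing import Protocol

class Parser(Protocol):
    """Hypothetical parser contract -- see the repo for the real interface."""
    extensions: tuple[str, ...]

    def to_markdown(self, path: Path) -> str: ...

class EpubParser:
    """Skeleton for a new-format contribution."""
    extensions = (".epub",)

    def to_markdown(self, path: Path) -> str:
        # A real implementation would walk the EPUB spine and convert
        # each XHTML chapter to Markdown.
        raise NotImplementedError
```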
Requirements: `uv` and `docker compose`.

```bash
uv sync --all-packages
cp apps/api/.env.example apps/api/.env
cp apps/worker/.env.example apps/worker/.env
```