by introfini
AI semantic search for Zotero, with a built-in MCP server for AI agents (Claude Code, Codex). Find papers by meaning. 100% local and private.
# Add to your Claude Code skills
git clone https://github.com/introfini/ZotSeek
Find similar papers by meaning, not just keywords. 100% local, no data leaves your machine. Now with a built-in MCP server for AI agents.
Status: ✅ Stable release · Zotero 8 & 9 · Transformers.js running locally
New: 🤖 MCP server built in — Claude Code, Codex, and any MCP client can search your library and cite papers with links that open straight to the matched PDF page. Fully local, read-only, opt-in. Set it up in one line →

zotseek-exclude to skip them during indexing

ZotSeek is designed with privacy as a core principle:
| Aspect | Guarantee |
|---|---|
| AI Model | Bundled with the plugin (131MB) — no downloads, no API calls |
| Processing | All AI inference runs locally on your CPU/GPU |
| Your Papers | Only indexes items from your local Zotero library |
| Network | Zero network requests for search or indexing |
| Storage | Embeddings saved locally in zotseek.sqlite in your Zotero data folder |
| Offline | Works completely offline after installation |
What this means:





    flowchart TD
        subgraph INDEX["1️⃣ INDEX"]
            A[📄 Paper] --> B[🤖 AI Model] --> C[768 numbers]
        end
        subgraph SEARCH["2️⃣ SEARCH"]
            D[🔍 Query] --> E[Query → 768 numbers]
            E --> F{Compare all papers}
            F --> G[📊 Ranked results]
        end
        C -.->|stored| F

How it works: Each paper becomes 768 numbers capturing its meaning. To search, we convert your query to numbers and find papers with similar numbers.
When you use "Index Current Collection" or "Update Library Index":
For each paper:

1. Extract title + abstract (Abstract mode), or extract PDF text page by page with exact page numbers (Full Document mode)
2. Split into paragraphs and filter out References/Bibliography
3. Send each chunk to the local AI model (nomic-embed-text-v1.5)
4. The model outputs 768 numbers per chunk (the "embedding")
5. Save embeddings + location metadata to the local database (zotseek.sqlite)

Time: ~3 seconds per chunk
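The chunking part of these steps can be sketched as follows. This is a minimal illustration, not the plugin's code: the `Chunk` type, the `chunk_pages` function, and the heading pattern used to detect the reference list are all assumptions, and the embedding call itself is left out.

```python
import re
from dataclasses import dataclass

# Hypothetical heading pattern; the plugin's real section detection may differ.
REFERENCES_HEADING = re.compile(r"^\s*(references|bibliography)\s*$", re.IGNORECASE)


@dataclass
class Chunk:
    page: int       # 1-based PDF page number
    paragraph: int  # 1-based paragraph index within that page
    text: str


def chunk_pages(pages: list[str]) -> list[Chunk]:
    """Split page texts into paragraph chunks, stopping at the reference list."""
    chunks: list[Chunk] = []
    for page_no, page_text in enumerate(pages, start=1):
        # Paragraphs are assumed to be separated by blank lines.
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]
        for para_no, para in enumerate(paragraphs, start=1):
            if REFERENCES_HEADING.match(para):
                return chunks  # everything after this heading is bibliography
            chunks.append(Chunk(page_no, para_no, para))
    return chunks


pages = [
    "Introduction\n\nTrust in AI systems is studied widely.",
    "Results show a strong effect.\n\nReferences\n\nSmith, J. (2023). A paper.",
]
chunks = chunk_pages(pages)
for c in chunks:
    print(c.page, c.paragraph, c.text)
```

Each chunk keeps its page and paragraph number, which is what would let a search result point back to an exact location.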
When you right-click → "Find Similar Documents":
1. Load the selected paper's embedding
2. Compare against all indexed papers (cached in memory)
3. Rank by semantic similarity
4. Show top results

Time: ~70ms (with cache)
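A rough sketch of the comparison step, assuming cosine similarity as the similarity measure (the text above only says "semantic similarity"). The function names and the tiny 3-number vectors are illustrative; real embeddings have 768 numbers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(selected_id: str, index: dict[str, list[float]], top_k: int = 3):
    """Rank every other paper in the in-memory index against the selected one."""
    query = index[selected_id]
    scored = [
        (paper_id, cosine_similarity(query, vec))
        for paper_id, vec in index.items()
        if paper_id != selected_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index: paper id -> embedding (hypothetical values).
index = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}
results = find_similar("paper_a", index, top_k=2)
print(results)
```

Because the whole index is a plain in-memory mapping here, each lookup is a single pass over the stored vectors, which is the kind of work a cache makes fast.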
The plugin combines semantic search (AI embeddings) with Zotero's keyword search using Reciprocal Rank Fusion (RRF), so conceptual matches and exact-term matches appear in a single ranked list.
| Mode | Best For | How It Works |
|---|---|---|
| 🔗 Hybrid (Recommended) | Most searches | Combines semantic + keyword results |
| 🧠 Semantic Only | Conceptual queries | Finds related papers by meaning |
| 🔤 Keyword Only | Author/year searches | Exact title, author, year matching |

| Query Type | Pure Semantic | Pure Keyword | Hybrid |
|---|---|---|---|
| "trust in AI" | ✅ Great | ❌ Poor | ✅ Great |
| "Smith 2023" | ❌ Poor | ✅ Great | ✅ Great |
| "RLHF" | ⚠️ Maybe | ✅ Exact only | ✅ Both |

| Icon | Meaning |
|---|---|
| 🔗 | Found by BOTH semantic and keyword (high confidence) |
| 🧠 | Found by semantic search only (conceptually related) |
| 🔤 | Found by keyword search only (exact match) |
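Reciprocal Rank Fusion itself is a simple formula: each result earns 1 / (k + rank) from every list it appears in, and the contributions are summed. The sketch below uses the conventional constant k = 60, which is an assumption here, as are the function name and the source labels; the plugin's actual parameters may differ.

```python
def reciprocal_rank_fusion(semantic: list[str], keyword: list[str], k: int = 60):
    """Fuse two ranked lists of paper ids; returns (id, score, source) tuples."""
    scores: dict[str, float] = {}
    sources: dict[str, set[str]] = {}
    for label, ranking in (("semantic", semantic), ("keyword", keyword)):
        for rank, paper_id in enumerate(ranking, start=1):
            scores[paper_id] = scores.get(paper_id, 0.0) + 1.0 / (k + rank)
            sources.setdefault(paper_id, set()).add(label)
    ordered = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    return [
        (pid, scores[pid], "both" if len(sources[pid]) == 2 else next(iter(sources[pid])))
        for pid in ordered
    ]


semantic_hits = ["p1", "p2", "p3"]  # best semantic match first
keyword_hits = ["p2", "p4"]         # best keyword match first
fused = reciprocal_rank_fusion(semantic_hits, keyword_hits)
for paper_id, score, source in fused:
    print(paper_id, round(score, 4), source)
```

A paper found by both searches collects two contributions, so it tends to rise to the top. That matches the "high confidence" meaning of the 🔗 icon.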
The Source column shows which section of the paper matched your query:
| Source | Section Type |
|---|---|
| Abstract | Title + Abstract |
| Methods | Introduction, Background, Methods |
| Results | Results, Discussion, Conclusions |
| Content | Generic (sections not detected) |
Hover any result row to see a tooltip with the exact passage that matched your query, along with its location (page & paragraph), section type, and match score. This lets you judge whether a result is relevant without opening the paper. In Keyword and Hybrid searches the query terms are highlighted inside the passage, and the preview is centered on the first match so the relevant text is always in view. (Pure semantic search has no literal terms to highlight, so the passage is shown without highlighting.)
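One way such a preview could be built is sketched below: find the first query term in the passage, cut a window centered on it, and mark every term inside the window. The function name, the window width, and the `**` markers are illustrative assumptions, not the plugin's actual rendering.

```python
import re


def preview(passage: str, terms: list[str], width: int = 80, mark: str = "**") -> str:
    """Return a window of `passage` centered on the first matching term."""
    if not terms:
        return passage[:width]  # e.g. pure semantic search: nothing to highlight
    # Longer terms first so that overlapping terms match as fully as possible.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    first = pattern.search(passage)
    if first is None:
        return passage[:width]
    center = (first.start() + first.end()) // 2
    start = max(0, center - width // 2)
    end = min(len(passage), start + width)
    window = passage[start:end]
    return pattern.sub(lambda m: f"{mark}{m.group(0)}{mark}", window)


passage = "x" * 100 + " trust in AI " + "y" * 100
snippet = preview(passage, ["trust"], width=40)
print(snippet)
```

With no terms, the sketch falls back to the start of the passage without highlighting.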
When using Full Document indexing mode, you can toggle between two result views:
| Mode | Results | Best For |
|---|---|---|
| By Section (default) | 1 result per paper, best matching section, with the location of that match | Overview of matching papers |
| By Location | Every matching paragraph with exact page & paragraph | Finding specific passages |
By Section - Aggregates all chunks per paper and shows the highest-scoring match. The Location column shows where that best match was found (page & paragraph), so you get one diverse result per paper without losing the exact location.

By Location - Returns every matching paragraph individually with its own score.

In By Location mode, clicking a result opens the PDF to the exact page where the match was found.
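The difference between the two views comes down to how chunk-level hits are grouped. A minimal sketch, assuming each hit carries a paper id, page, paragraph, and score (the `Hit` type and both function names are hypothetical):

```python
from typing import NamedTuple


class Hit(NamedTuple):
    paper: str
    page: int
    paragraph: int
    score: float


def by_location(hits: list[Hit]) -> list[Hit]:
    """Every matching paragraph, best score first."""
    return sorted(hits, key=lambda h: h.score, reverse=True)


def by_section(hits: list[Hit]) -> list[Hit]:
    """One result per paper: its highest-scoring chunk, location included."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.paper not in best or hit.score > best[hit.paper].score:
            best[hit.paper] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


hits = [
    Hit("smith2023", page=4, paragraph=2, score=0.71),
    Hit("smith2023", page=9, paragraph=1, score=0.83),
    Hit("lee2021", page=2, paragraph=5, score=0.78),
]
print(by_section(hits))
print(by_location(hits))
```

In this sketch, By Section keeps the page and paragraph of the winning chunk, so the one-row-per-paper view can still show where the match was found.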
Combine up to 4 search queries to find papers at the intersection of multiple topics:
| Operator | Behavior | Best For |
|---|---|---|
| AND | Papers must match ALL queries | Finding topic intersections |
| OR | Papers can match ANY query | Broadening search with synonyms |

| AND Formula | How It Works | Use When |
|---|---|---|
| Minimum (default) | Uses lowest score across queries | You want strict intersection |
| Product | Geometric mean of scores | Balanced relevance across all queries |
| Average | Arithmetic mean of scores | More lenient matching |
Example: Search for papers about "machine learning" AND "healthcare" AND "ethics" to find AI ethics papers specifically in the medical domain.
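The three AND formulas from the table can be written out directly. This sketch assumes per-query scores between 0 and 1; the function name and the formula keys are illustrative.

```python
import math


def combine_and(scores: list[float], formula: str = "minimum") -> float:
    """Combine per-query scores for one paper under an AND search."""
    if not scores:
        raise ValueError("at least one score is required")
    if formula == "minimum":
        return min(scores)  # strict: the weakest query decides
    if formula == "product":
        return math.prod(scores) ** (1 / len(scores))  # geometric mean
    if formula == "average":
        return sum(scores) / len(scores)  # arithmetic mean
    raise ValueError(f"unknown formula: {formula}")


scores = [0.77, 0.73, 0.68]
for name in ("minimum", "product", "average"):
    print(name, round(combine_and(scores, name), 4))
```

For these three scores, Minimum gives 68%, while Product and Average both round to 73%. A paper that scores 0 on any query also scores 0 under Minimum and Product, but not under Average, which is why Average is the more lenient choice.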
Match column with multiple queries: Shows combined score plus individual per-query scores:
73% (77|73|68) = 73% combined, with 77% for Q1, 73% for Q2, 68% for Q3

For technical details, see docs/SEARCH_ARCHITECTURE.md.
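The Match column format for multiple queries is easy to reproduce. A small sketch, assuming scores are stored as fractions between 0 and 1 (the function name is hypothetical):

```python
def format_match(combined: float, per_query: list[float]) -> str:
    """Render a combined score, plus per-query scores when there are several."""
    combined_pct = f"{round(combined * 100)}%"
    if len(per_query) <= 1:
        return combined_pct
    parts = "|".join(str(round(score * 100)) for score in per_query)
    return f"{combined_pct} ({parts})"


label = format_match(0.73, [0.77, 0.73, 0.68])
print(label)
```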
| Mode | What Gets Indexed | Best For |
|---|---|---|
| Abstract | Title + Abstract | Fast indexing, quick setup |
| Full Document (default) | PDF content split by sections | Deep content search, better results |
Configure via **Zote