ZotSeek

Name: ZotSeek
Author: introfini


AI semantic search for Zotero, with a built-in MCP server for AI agents (Claude Code, Codex). Find papers by meaning. 100% local and private.

166 stars

8 forks

TypeScript

Installation

# Add to your Claude Code skills
git clone https://github.com/introfini/ZotSeek


Security Report

Last scanned: 6/12/2026

{
  "issues": [
    {
      "type": "npm-audit",
      "message": "@protobufjs/utf8: protobufjs has overlong UTF-8 decoding",
      "severity": "medium"
    },
    {
      "type": "npm-audit",
      "message": "brace-expansion: brace-expansion: Zero-step sequence causes process hang and memory exhaustion",
      "severity": "medium"
    },
    {
      "type": "npm-audit",
      "message": "defu: defu: Prototype pollution via `__proto__` key in defaults argument",
      "severity": "high"
    },
    {
      "type": "npm-audit",
      "message": "esbuild: esbuild enables any website to send any requests to the development server and read the response",
      "severity": "medium"
    },
    {
      "type": "npm-audit",
      "message": "lodash: lodash vulnerable to Code Injection via `_.template` imports key names",
      "severity": "high"
    },
    {
      "type": "npm-audit",
      "message": "minimatch: minimatch has a ReDoS via repeated wildcards with non-matching literal in pattern",
      "severity": "high"
    },
    {
      "type": "npm-audit",
      "message": "picomatch: Picomatch: Method Injection in POSIX Character Classes causes incorrect Glob Matching",
      "severity": "high"
    },
    {
      "type": "npm-audit",
      "message": "protobufjs: Arbitrary code execution in protobufjs",
      "severity": "critical"
    },
    {
      "type": "npm-audit",
      "message": "tar: node-tar Vulnerable to Arbitrary File Creation/Overwrite via Hardlink Path Traversal",
      "severity": "high"
    },
    {
      "type": "npm-audit",
      "message": "yaml: yaml is vulnerable to Stack Overflow via deeply nested YAML collections",
      "severity": "medium"
    }
  ],
  "status": "FAILED",
  "scannedAt": "2026-06-12T08:26:14.331Z",
  "npmAuditRan": true,
  "pipAuditRan": true,
  "promptInjectionRan": true
}


Frequently Asked Questions

What is ZotSeek?

ZotSeek is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by introfini. It provides AI semantic search for Zotero, with a built-in MCP server for AI agents (Claude Code, Codex), so you can find papers by meaning. It runs 100% locally and privately. It has 166 GitHub stars.

Is ZotSeek safe to use?

ZotSeek failed SkillsLLM's automated security scan. The npm audit reported 10 dependency issues: 1 critical, 5 high, and 4 medium. Review the Security Report section carefully before using it.

How do I install ZotSeek?

Clone the repository with "git clone https://github.com/introfini/ZotSeek" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is ZotSeek written in?

ZotSeek is primarily written in TypeScript. It is open-source under introfini on GitHub, so you can review or fork the full source.

Are there alternatives to ZotSeek?

Yes. SkillsLLM lists many other AI agent skills that you can browse and compare side by side with ZotSeek.


ZotSeek | AI-Powered Semantic Search & MCP Server for Zotero

Find similar papers by meaning, not just keywords. 100% local, no data leaves your machine. Now with a built-in MCP server for AI agents.

Status: ✅ Stable release · Zotero 8 & 9 · Transformers.js running locally

New: 🤖 MCP server built in. Claude Code, Codex, and any MCP client can search your library and cite papers with links that open straight to the matched PDF page. Fully local, read-only, opt-in.

Features

🔒 100% Local - No data sent to cloud, works completely offline
🧠 True Semantic Search - Find papers by meaning, not just keywords
🤖 AI Agent Access (MCP) - Let Claude Code and other MCP clients search your library, fully local and opt-in
🔍 Find Similar Documents - Right-click any paper → discover related research
📖 Search from PDF Selection - Select text while reading → right-click → find documents about that concept
🔎 Natural Language Search - Search with queries like "machine learning in healthcare"
🔀 Multi-Query Search - Combine up to 4 queries with AND/OR logic to find topic intersections
🔗 Hybrid Search - Combines AI + keyword search for best results
⚡ Lightning Fast - Searches complete in <100ms
📑 Section-Aware - See which section matched (Abstract, Methods, Results)
📄 Matched-Passage Preview - Hover a result to read the exact passage that matched, with query terms highlighted
📍 Passage-Level Location - Jump to exact page & paragraph in Full Document mode
✅ Multi-Select in Results - Select multiple search results, right-click to add to collections
📁 Save Results as Collection - One click saves the full result set into a new Zotero collection so you can revisit the list later without re-running the search
🧩 Selectable Embedding Models - Choose from 4 curated local models, including multilingual options; non-bundled models download once from Hugging Face to your machine
🔄 Auto-Index - Automatically index new papers when you add them to your library
👥 Group Libraries (opt-in) - Extend indexing and search to your Zotero group libraries with the Index scope setting
🗑️ Auto-Cleanup - Embeddings automatically removed when items are deleted or trashed
🚫 Tag-Based Exclusion - Tag items with zotseek-exclude to skip them during indexing
📊 Indexing Status Column - "ZotSeek" column in the item list shows whether each paper is fully indexed, partially indexed (chunk limit hit), out of date, excluded, or not indexed
⏸️ Pause & Cancel - Pause or cancel long-running indexing operations at any time
💾 Crash-Resilient - Checkpoint saving every 10 items, auto-resume on next startup if a bulk run was interrupted, worker recovery on sleep/wake, skips problematic chunks automatically
🔌 Plugin API - Other Zotero plugins can call ZotSeek's search programmatically
⚙️ Configurable - Customize via Zotero Settings → ZotSeek (also accessible from search dialog)
🌐 Localized - UI available in English and Chinese (zh-CN)

Privacy & Security

ZotSeek is designed with privacy as a core principle:

| Aspect | Guarantee |
|---|---|
| AI Model | Default model bundled (~130MB); optional models download once from Hugging Face on demand; no API keys, no subscription |
| Processing | All AI inference runs locally on your CPU/GPU |
| Your Papers | Only indexes items from your local Zotero library |
| Network | Zero network requests for search or indexing |
| Storage | Embeddings saved locally in `zotseek.sqlite` in your Zotero data folder |
| Offline | Works completely offline after installation |

What this means:

Your research never leaves your machine
No cloud services, no API keys, no subscriptions
No telemetry or usage tracking
Uninstalling the plugin removes all ZotSeek data

More Screenshots

Screenshots (images not included): Find Similar Documents, Find Similar Results, Context Menu, PDF Selection Context Menu, Settings Panel, Indexing Progress.

How It Works

The Big Picture

flowchart TD
    subgraph INDEX["1️⃣ INDEX"]
        A[📄 Paper] --> B[🤖 AI Model] --> C[768 numbers]
    end
    
    subgraph SEARCH["2️⃣ SEARCH"]
        D[🔍 Query] --> E[Query → 768 numbers]
        E --> F{Compare all papers}
        F --> G[📊 Ranked results]
    end
    
    C -.->|stored| F

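How it works: Each paper (or each chunk of a paper, in Full Document mode) becomes an embedding: 768 numbers that capture its meaning. To search, ZotSeek converts your query into 768 numbers the same way and ranks papers by how similar their numbers are to the query's.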

Step-by-Step Process

1️⃣ Indexing Your Library

When you use "Index Current Collection" or "Update Library Index":

For each paper:
  1. Extract title + abstract (Abstract mode)
     — OR —
     Extract PDF text page-by-page with exact page numbers (Full Document mode)
  2. Split into paragraphs, filter out References/Bibliography
  3. Send to local AI model (nomic-embed-text-v1.5)
  4. Model outputs 768 numbers per chunk (the "embedding")
  5. Save embeddings + location metadata to local database (zotseek.sqlite)

Time: ~3 seconds per chunk
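
Step 2 of the indexing process above can be sketched roughly as follows. This is a minimal illustration, not ZotSeek's actual code: the `Chunk` type, the `chunkPages` function, the blank-line paragraph boundary, and the heading pattern are all assumptions made for the example. It only produces the chunks and their page/paragraph locations; embedding them is a separate step.

```typescript
// Hypothetical types and names; not ZotSeek's actual internals.
interface Chunk {
  page: number;      // 1-based page number
  paragraph: number; // 1-based paragraph index within the page
  text: string;
}

// Assumed pattern for the heading that opens the reference list.
const REFERENCES_HEADING = /^(references|bibliography)\s*$/i;

// Split page texts into paragraph chunks and stop at the reference list.
function chunkPages(pages: string[]): Chunk[] {
  const chunks: Chunk[] = [];
  for (let p = 0; p < pages.length; p++) {
    // Assumption: blank lines separate paragraphs.
    const paragraphs = pages[p]
      .split(/\n\s*\n/)
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
    for (let i = 0; i < paragraphs.length; i++) {
      // Everything from the references heading onward is skipped.
      if (REFERENCES_HEADING.test(paragraphs[i])) return chunks;
      chunks.push({ page: p + 1, paragraph: i + 1, text: paragraphs[i] });
    }
  }
  return chunks;
}
```

Keeping the page and paragraph numbers on every chunk is what makes passage-level locations possible later in search results.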

2️⃣ Finding Similar Documents

When you right-click → "Find Similar Documents":

  1. Load the selected paper's embedding
  2. Compare against all indexed papers (cached in memory)
  3. Rank by semantic similarity
  4. Show top results

Time: ~70ms (with cache)
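
The four steps above amount to a nearest-neighbour ranking over embeddings. The sketch below assumes cosine similarity as the metric, which is a common choice for text embeddings; the README does not name the metric ZotSeek uses. `IndexedPaper` and `findSimilar` are hypothetical names for illustration.

```typescript
// Cosine similarity between two equal-length vectors (assumed metric).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory index entry.
interface IndexedPaper {
  id: string;
  embedding: number[];
}

// Rank every other paper by similarity to the selected one, keep the top K.
function findSimilar(
  selected: IndexedPaper,
  index: IndexedPaper[],
  topK: number
): { id: string; score: number }[] {
  return index
    .filter((p) => p.id !== selected.id)
    .map((p) => ({ id: p.id, score: cosineSimilarity(selected.embedding, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Because the comparison is a single pass over vectors already held in memory, it needs no network access and no model inference at query time for this feature.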

Hybrid Search

The plugin combines semantic search (AI embeddings) with Zotero's keyword search using Reciprocal Rank Fusion (RRF) for optimal results.
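
As a rough illustration of Reciprocal Rank Fusion: each result list contributes 1 / (k + rank) to a document's fused score, so documents that rank well in both lists rise to the top. The sketch below uses k = 60, the value commonly used for RRF; the constant ZotSeek uses is not stated here, and `reciprocalRankFusion` is a hypothetical name.

```typescript
// Fuse several ranked lists of document IDs (best first) into one ranking.
// k = 60 is the conventional RRF constant, assumed here for illustration.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score
  );
}

// Example: "B" appears in both lists, so it outranks "A",
// which is first in the semantic list only.
const semanticRanking = ["A", "B", "C"];
const keywordRanking = ["B", "D"];
const fusedRanking = reciprocalRankFusion([semanticRanking, keywordRanking]);
```

RRF works on ranks rather than raw scores, which is why it can merge a similarity score and a keyword match without having to put them on a common scale.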

Search Modes

| Mode | Best For | How It Works |
|---|---|---|
| 🔗 Hybrid (Recommended) | Most searches | Combines semantic + keyword results |
| 🧠 Semantic Only | Conceptual queries | Finds related papers by meaning |
| 🔤 Keyword Only | Author/year searches | Exact title, author, year matching |

Why Hybrid Search?

| Query Type | Pure Semantic | Pure Keyword | Hybrid |
|---|---|---|---|
| "trust in AI" | ✅ Great | ❌ Poor | ✅ Great |
| "Smith 2023" | ❌ Poor | ✅ Great | ✅ Great |
| "RLHF" | ⚠️ Maybe | ✅ Exact only | ✅ Both |

Result Indicators

| Icon | Meaning |
|---|---|
| 🔗 | Found by BOTH semantic and keyword (high confidence) |
| 🧠 | Found by semantic search only (conceptually related) |
| 🔤 | Found by keyword search only (exact match) |

Section-Aware Results

The Source column shows which section of the paper matched your query:

| Source | Section Type |
|---|---|
| Abstract | Title + Abstract |
| Methods | Introduction, Background, Methods |
| Results | Results, Discussion, Conclusions |
| Content | Generic (sections not detected) |

Matched-Passage Preview

Hover any result row to see a tooltip with the exact passage that matched your query, along with its location (page & paragraph), section type, and match score. This lets you judge whether a result is relevant without opening the paper. In Keyword and Hybrid searches the query terms are highlighted inside the passage, and the preview is centered on the first match so the relevant text is always in view. (Pure semantic search has no literal terms to highlight, so the passage is shown without highlighting.)
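
The centering and highlighting behaviour described above could look roughly like this. It is a sketch under assumptions: `buildPreview`, the `width` parameter, and the `<mark>` wrapper are hypothetical choices for illustration, not the plugin's actual tooltip code.

```typescript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Cut an excerpt of roughly `width` characters centered on the first
// query-term match and wrap every match in <mark> tags.
function buildPreview(passage: string, terms: string[], width = 80): string {
  const usable = terms.filter((t) => t.length > 0).map(escapeRegExp);
  // Semantic-only search has no literal terms: show the text unhighlighted.
  if (usable.length === 0) return passage.slice(0, width);

  const pattern = new RegExp(`(${usable.join("|")})`, "gi");
  const first = passage.search(pattern);
  // Center on the first match, or start at the beginning if nothing matches.
  const start = first === -1 ? 0 : Math.max(0, first - Math.floor(width / 2));
  const excerpt = passage.slice(start, start + width);
  return excerpt.replace(pattern, "<mark>$1</mark>");
}
```

Matching is case-insensitive, and the highlighted text keeps the casing of the passage rather than the casing of the query.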

Result Granularity (Full Document Mode)

When using Full Document indexing mode, you can toggle between two result views:

| Mode | Results | Best For |
|---|---|---|
| By Section (default) | 1 result per paper, best matching section, with the location of that match | Overview of matching papers |
| By Location | Every matching paragraph with exact page & paragraph | Finding specific passages |

By Section - Aggregates all chunks per paper and shows the highest-scoring match. The Location column shows where that best match was found (page & paragraph), so you get one diverse result per paper without losing the exact location.

By Location - Returns every matching paragraph individually with its own score.

In By Location mode, clicking a result opens the PDF to the exact page where the match was found.
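
The difference between the two views can be sketched as two ways of post-processing the same list of chunk-level matches. `ChunkHit`, `bySection`, and `byLocation` are hypothetical names used for illustration only.

```typescript
// Hypothetical shape of one chunk-level match.
interface ChunkHit {
  paperId: string;
  page: number;
  paragraph: number;
  score: number;
}

// By Location: every matching paragraph is its own result, best first.
function byLocation(hits: ChunkHit[]): ChunkHit[] {
  return [...hits].sort((a, b) => b.score - a.score);
}

// By Section: keep only the highest-scoring chunk per paper, so each paper
// appears once but still carries the page and paragraph of its best match.
function bySection(hits: ChunkHit[]): ChunkHit[] {
  const best = new Map<string, ChunkHit>();
  for (const hit of hits) {
    const current = best.get(hit.paperId);
    if (!current || hit.score > current.score) best.set(hit.paperId, hit);
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}
```

Taking the maximum per paper, rather than an average over its chunks, means one strongly matching paragraph is enough for a long paper to surface.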

Multi-Query Search

Combine up to 4 search queries to find papers at the intersection of multiple topics:

  1. Click the "+" button next to the search field to add more queries
  2. Choose AND or OR to combine results
  3. For AND mode, select a combination formula

| Operator | Behavior | Best For |
|---|---|---|
| AND | Papers must match ALL queries | Finding topic intersections |
| OR | Papers can match ANY query | Broadening search with synonyms |

| AND Formula | How It Works | Use When |
|---|---|---|
| Minimum (default) | Uses lowest score across queries | You want strict intersection |
| Product | Geometric mean of scores | Balanced relevance across all queries |
| Average | Arithmetic mean of scores | More lenient matching |

Example: Search for papers about "machine learning" AND "healthcare" AND "ethics" to find AI ethics papers specifically in the medical domain.

Match column with multiple queries: Shows combined score plus individual per-query scores:

73% (77|73|68) = 73% combined, with 77% for Q1, 73% for Q2, 68% for Q3
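
The three AND formulas and the Match column format can be sketched as below. `combineAnd` and `formatMatch` are hypothetical names, and scores are assumed to be fractions between 0 and 1. For the per-query scores in the example above (77, 73, 68), the Average and Product formulas both round to 73%, while Minimum would give 68%.

```typescript
type AndFormula = "minimum" | "product" | "average";

// Combine per-query scores (each in 0..1) for one paper in AND mode.
function combineAnd(scores: number[], formula: AndFormula = "minimum"): number {
  if (scores.length === 0) throw new Error("at least one score is required");
  switch (formula) {
    case "minimum":
      // Strict intersection: the weakest query decides.
      return Math.min(...scores);
    case "product":
      // Geometric mean: n-th root of the product of n scores.
      return Math.pow(scores.reduce((p, s) => p * s, 1), 1 / scores.length);
    case "average":
      // Arithmetic mean: more lenient toward one weak query.
      return scores.reduce((sum, s) => sum + s, 0) / scores.length;
  }
}

// Render the Match column: combined score plus per-query scores.
function formatMatch(combined: number, scores: number[]): string {
  const pct = (x: number) => Math.round(x * 100);
  return `${pct(combined)}% (${scores.map(pct).join("|")})`;
}

const exampleScores = [0.77, 0.73, 0.68];
const averageLabel = formatMatch(combineAnd(exampleScores, "average"), exampleScores);
// averageLabel is "73% (77|73|68)"
```

The geometric mean sits between the other two: a single near-zero score pulls it down sharply, as with Minimum, but strong scores on the remaining queries still count.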

For technical details,