pdf-reader-mcp

Name: pdf-reader-mcp
Author: SylphxAI

Verified

📄 The PDF intelligence layer for AI agents — Agent Document Twin, evidence-first extraction, visual crops, OCR provenance, trust reports, and benchmark-gated releases. MCP server for Claude, Cursor, VS Code, and any MCP client.

815 stars

70 forks

TypeScript

Installation

# Add to your Claude Code skills
git clone https://github.com/SylphxAI/pdf-reader-mcp


Security Report: Verified

Last scanned: 5/8/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-08T05:59:11.857Z",
  "semgrepRan": false,
  "npmAuditRan": true,
  "pipAuditRan": true
}

README.md

Frequently Asked Questions

What is pdf-reader-mcp?

pdf-reader-mcp is an open-source MCP server built by SylphxAI and listed here as an AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT. It describes itself as the PDF intelligence layer for AI agents: Agent Document Twin, evidence-first extraction, visual crops, OCR provenance, trust reports, and benchmark-gated releases. It works with Claude, Cursor, VS Code, and any MCP client, and it has 815 GitHub stars.

Is pdf-reader-mcp safe to use?

Yes. pdf-reader-mcp passed SkillsLLM's automated security scan, a dependency vulnerability audit plus prompt-injection heuristics, with no high-severity issues. The report on this page lists no issues and records that the npm and pip audits ran and that Semgrep did not. You can read the full report in the Security Report section on this page.

How do I install pdf-reader-mcp?

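The README's Quick Start registers it as an MCP server in Claude Code with "claude mcp add pdf-reader -- npx @sylphx/pdf-reader-mcp". Alternatively, clone the repository with "git clone https://github.com/SylphxAI/pdf-reader-mcp" and add it to your Claude Code skills directory (see the Installation section above).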

What programming language is pdf-reader-mcp written in?

pdf-reader-mcp is primarily written in TypeScript. It is open-source under SylphxAI on GitHub, so you can review or fork the full source.

Are there alternatives to pdf-reader-mcp?

Yes. SkillsLLM lists many other AI Agents skills you can browse and compare side by side. Open the AI Agents category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh pdf-reader-mcp against similar tools.


Related Skills

superpowers

by obra

An agentic skills framework & software development methodology that works.

234,966

Developers Also Liked

Based on votes and bookmarks from developers who liked this skill

Agent-Reach

by Panniantong

Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees.

54,563


📄 PDF Reader MCP

Your agent read the PDF. Did it read the truth?

The most-starred PDF MCP server on GitHub. One call turns any PDF into an Agent Document Twin — structured text, tables, trust signals, and source evidence you can search, crop, and cite.

Local-first · One smart read_pdf call · Evidence with page + bbox · 397 tests · 39/39 release-gate checks

⭐ Star this repo if agents should cite PDFs with proof, not guess from plain text.

The problem

PDFs are not text files. They are layout, pixels, tables, hidden text, scanned pages, and reading order that breaks the moment you flatten them.

Most PDF tools give agents a text dump. Tables disappear. Scanned pages go blank. Hidden text sneaks in. Citations become guesses. Then the model hallucinates — confidently.

PDF Reader MCP is built for the moment your agent needs to prove an answer, not just sound plausible.

Why not a plain text dump?

Typical PDF path	PDF Reader MCP
Dump text into context	Return markdown, chunks, tables, and a linked document map
"Trust the summary"	Page numbers, bounding boxes, crop IDs, and render evidence
Hope tables survived	Cells, geometry, confidence, warnings, continuation hints
Scanned pages silently empty	OCR path with word boxes and provenance
No idea what is risky	Trust report for hidden text, spoofing, unsafe links, injection-like content
Ship and pray	39/39 SOTA release-gate checks on every version

Full capability matrix: comparison guide.

See it work

Install once. Call once.

claude mcp add pdf-reader -- npx @sylphx/pdf-reader-mcp

{
  "sources": [{ "path": "/absolute/path/to/report.pdf" }]
}

read_pdf inspects the PDF, picks the extraction route, and returns the Agent Document Twin — no manual include_* flags required:

{
  "auto_read": {
    "workflow": "digital_text_route",
    "selected_arguments": {
      "include_markdown": true,
      "include_tables": true,
      "include_chunks": true,
      "include_trust_report": true,
      "include_document_map": true
    }
  },
  "markdown": "# Annual Report 2026\n\n## Executive Summary\n\n...",
  "tables": [
    {
      "page": 5,
      "cells": [
        { "row": 0, "col": 0, "text": "Quarter", "bbox": [72, 650, 180, 670] },
        { "row": 0, "col": 1, "text": "Revenue", "bbox": [200, 650, 300, 670] }
      ],
      "confidence": 0.95
    }
  ],
  "trust_report": { "risk_level": "low", "findings": [] }
}

Abbreviated shape — see full example and workflows.
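
The flat `cells` list in the response above can be turned back into rows and columns on the client side. The sketch below is illustrative only: the `TableCell` and `ExtractedTable` types mirror just the fields shown in the abbreviated example, and `tableToGrid` is a hypothetical helper, not part of the package.

```typescript
// Types cover only the fields visible in the abbreviated response above.
interface TableCell {
  row: number;
  col: number;
  text: string;
  bbox: [number, number, number, number];
}

interface ExtractedTable {
  page: number;
  cells: TableCell[];
  confidence: number;
}

// Hypothetical helper: rebuild a dense row-major grid from the flat cell list.
// Positions with no cell are filled with an empty string.
function tableToGrid(table: ExtractedTable): string[][] {
  const rowCount = table.cells.reduce((max, c) => Math.max(max, c.row + 1), 0);
  const colCount = table.cells.reduce((max, c) => Math.max(max, c.col + 1), 0);
  const grid: string[][] = Array.from({ length: rowCount }, () =>
    Array.from({ length: colCount }, () => ""),
  );
  for (const cell of table.cells) {
    grid[cell.row][cell.col] = cell.text;
  }
  return grid;
}

// The two header cells from the example response.
const sampleTable: ExtractedTable = {
  page: 5,
  cells: [
    { row: 0, col: 0, text: "Quarter", bbox: [72, 650, 180, 670] },
    { row: 0, col: 1, text: "Revenue", bbox: [200, 650, 300, 670] },
  ],
  confidence: 0.95,
};

const sampleGrid = tableToGrid(sampleTable); // [["Quarter", "Revenue"]]
```

Keeping the original `bbox` values alongside the grid lets an agent still point back at the source region for any cell it quotes.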

Search with search_pdf, then verify the source region:

{
  "sources": [{ "path": "/absolute/path/to/report.pdf" }],
  "query": "revenue recognition",
  "max_matches_per_source": 10
}

Use the returned page and bounding box with pdf_evidence (render_page or extract_regions) when the agent needs visual proof before citing.
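
Before requesting a crop, it can help to pad the matched box slightly so surrounding labels stay visible. The sketch below is an assumption-heavy illustration: it treats a bounding box as `[x0, y0, x1, y1]` in page units (consistent with the sample values above, but not confirmed here), uses US Letter dimensions of 612 by 792 as a stand-in page size, and `padBBox` is a hypothetical helper. It does not show the `pdf_evidence` request shape; see the API reference for that.

```typescript
// Assumed layout: [x0, y0, x1, y1] in page units.
type BBox = [number, number, number, number];

// Hypothetical helper: grow a box by `margin` on every side,
// clamped so it never extends past the page edges.
function padBBox(
  bbox: BBox,
  margin: number,
  pageWidth: number,
  pageHeight: number,
): BBox {
  const [x0, y0, x1, y1] = bbox;
  return [
    Math.max(0, x0 - margin),
    Math.max(0, y0 - margin),
    Math.min(pageWidth, x1 + margin),
    Math.min(pageHeight, y1 + margin),
  ];
}

// Page size is an assumption for illustration (US Letter in points).
const padded = padBBox([72, 650, 180, 670], 10, 612, 792); // [62, 640, 190, 680]
const clamped = padBBox([5, 5, 610, 790], 10, 612, 792); // [0, 0, 612, 792]
```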

Evidence-first PDF workflow

Why agents use it

Need	What you get
Read the document	Markdown, JSON, HTML, page text, metadata, chunks, and semantic AST.
Prove the answer	Page numbers, bounding boxes, evidence IDs, region crops, and source renders.
Handle scanned PDFs	Rendered pages routed through configured OCR providers with word boxes and provenance.
Recover tables	Selectable-text and OCR-derived tables with cells, geometry, confidence, warnings, and continuation hints.
See what text extraction misses	Visual page evidence, focused crops, and configured visual-region provider adapters.
Protect the agent	Trust reports for hidden text, prompt-injection-like content, visual spoofing, unsafe links, and redaction.
Route accessibility work	Tagged-PDF coverage, tag-visible coverage, headings, images, forms, links, permissions, and page grades.
Ship with proof	CI, package smoke, deterministic quality benchmarks, provider artifacts, and release gates.
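
One way an agent might act on the trust report is to hold back a citation whenever the report is not clean. This is a sketch of a possible policy, not documented behavior: the `TrustReport` type covers only the two fields shown in the example response (`risk_level` and `findings`), `"low"` is the only risk level that appears in this README, and `needsReview` is a hypothetical helper.

```typescript
// Only the fields visible in the example response are modeled here.
interface TrustReport {
  risk_level: string;
  findings: unknown[];
}

// Hypothetical policy: anything other than a "low" risk level, or any
// finding at all, is escalated for review before the agent cites the PDF.
function needsReview(report: TrustReport): boolean {
  return report.risk_level !== "low" || report.findings.length > 0;
}

const cleanReport: TrustReport = { risk_level: "low", findings: [] };
const cleanNeedsReview = needsReview(cleanReport); // false
```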

Quick Start

Claude Code

claude mcp add pdf-reader -- npx @sylphx/pdf-reader-mcp

Claude Desktop

Add this to claude_desktop_config.json:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
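
If claude_desktop_config.json already lists other servers, the entry above needs to be merged in rather than pasted over the file. The sketch below shows one way to do that on an already-parsed config object. The `ClaudeDesktopConfig` type and `addPdfReader` helper are hypothetical, and reading or writing the file itself is left out.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

// Only `mcpServers` is modeled; other top-level keys pass through untouched.
interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: return a new config with the pdf-reader entry added,
// leaving the input object and any existing servers unchanged.
function addPdfReader(config: ClaudeDesktopConfig): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "pdf-reader": { command: "npx", args: ["@sylphx/pdf-reader-mcp"] },
    },
  };
}

// Example input with an unrelated, made-up server entry.
const existingConfig: ClaudeDesktopConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};
const mergedConfig = addPdfReader(existingConfig);
```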

Any MCP Client

npx @sylphx/pdf-reader-mcp

Node.js >=22.13 is required. The default package works without downloading OCR models, vision models, Ollama, LM Studio, llama.cpp, or cloud credentials.
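
A launcher script can check the Node.js requirement before spawning the server. The sketch below is a hypothetical pre-flight helper, not something the package ships; it takes the version string as a parameter so it can be fed `process.versions.node` at the call site.

```typescript
// Hypothetical helper: true when `version` (e.g. "22.13.0" or "v22.13.0")
// is at least minMajor.minMinor. Unparseable input is treated as too old.
function meetsMinimumNode(version: string, minMajor = 22, minMinor = 13): boolean {
  const parts = version.replace(/^v/, "").split(".");
  const major = Number.parseInt(parts[0] ?? "", 10);
  const minor = Number.parseInt(parts[1] ?? "", 10);
  if (Number.isNaN(major) || Number.isNaN(minor)) return false;
  return major > minMajor || (major === minMajor && minor >= minMinor);
}

const okVersion = meetsMinimumNode("22.13.0"); // true
const oldVersion = meetsMinimumNode("v22.12.1"); // false
```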

Docker

# Pre-built image from GitHub Container Registry
docker run --rm -i -v /path/to/pdfs:/workspace ghcr.io/sylphxai/pdf-reader-mcp

# Or build locally
docker build -t pdf-reader-mcp . && \
  docker run --rm -i -v /path/to/pdfs:/workspace pdf-reader-mcp

Need Cursor, VS Code, Windsurf, Cline, Warp, HTTP transport, Docker customization, or filesystem sandboxing? See the installation guide.

MCP Tool Surface

Tool	Use it when the agent needs to...
`read_pdf`	Use first. With only `sources`, it auto-inspects and reads the PDF in one call; with explicit `include_*` options, it runs precise manual extraction.
`search_pdf`	Search selectable text and optional OCR text with snippets, offsets, boxes, and provenance.
`pdf_evidence`	One focused evidence tool for `inspect`, `render_page`, `extract_regions`, `ocr_pages`, and `analyze_regions` operations.

Full request and response details live in the API reference.

Agents can force auto: false for precise manual extraction, or use auto_detail: "fast", "balanced", or "full" to control output depth without learning dozens of switches.
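
The two calling styles can be kept apart with a pair of small builders. This is a sketch under assumptions: the field names (`sources`, `auto`, `auto_detail`, `include_*`) come from this README, but the `ReadPdfArgs` type lists only a few of the `include_*` flags, and `autoRead` and `manualRead` are hypothetical helpers. Check the API reference for the full argument schema.

```typescript
type AutoDetail = "fast" | "balanced" | "full";

interface PdfSource {
  path: string;
}

// Partial view of the read_pdf arguments; only a few include_* flags are listed.
interface ReadPdfArgs {
  sources: PdfSource[];
  auto?: boolean;
  auto_detail?: AutoDetail;
  include_markdown?: boolean;
  include_tables?: boolean;
  include_chunks?: boolean;
}

type IncludeFlags = Pick<
  ReadPdfArgs,
  "include_markdown" | "include_tables" | "include_chunks"
>;

// Auto route: only `sources`, optionally with an output-depth hint.
function autoRead(path: string, detail?: AutoDetail): ReadPdfArgs {
  const args: ReadPdfArgs = { sources: [{ path }] };
  if (detail !== undefined) args.auto_detail = detail;
  return args;
}

// Manual route: force auto off and pass explicit include_* flags.
function manualRead(path: string, flags: IncludeFlags): ReadPdfArgs {
  return { sources: [{ path }], auto: false, ...flags };
}

const quickArgs = autoRead("/absolute/path/to/report.pdf", "fast");
const tableArgs = manualRead("/absolute/path/to/report.pdf", { include_tables: true });
```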

Agent Document Twin

The Agent Document Twin is the main reason to use this project instead of a plain text extractor. It keeps the document readable by agents while preserving the evidence needed to verify the answer.

Layer	Output
Lossless PDF layer	Text runs, lines, words, characters, fonts, transforms, page geometry, metadata coverage, outlines, forms, attachments, annotations, permissions, and structure signals where available.
Visual layer	Page renders, region crops, crop provenance, visual candidates, OCR source renders, and provider-normalized visual evidence.
Semantic layer	Page, section, paragraph, list, caption, header, footer, table, image, chart, formula, figure, and diagram nodes where available.
Evidence layer	Stable IDs, page ranges, bounding boxes, crop IDs, confidence, warnings, and extraction method provenance.
Agent layer	Markdown, JSON, HTML, citation chunks, routing plans, trust report, accessibility report, and document map indexes.

Example: Read With Evidence

{
  "sources": [{ "path": "/absolute/path/to/report.pdf" }],
  "include_markdown": true,
  "include_chunks": true,
  "include_tables": true,
  "include_text_layer": true,
  "include_document_map": true,
  "include_document_ast": true,
  "include_trust_report": true,
  "include_accessibility_report": true
}

Provider-Enabled Intelligence

The current package stays local-first. The roadmap target is a Rust MCP server with the same public tool contract, plus optional deployment-controlled providers for OCR and visual enrichment.

Capability	Default behavior	Enable with
Selectable-text PDFs	Works out of the box	No extra dependency
Rendering and crops	Works out of the box	No extra dependency
Trust and accessibility reports	Works out of the box	No extra dependency
OCR for scanned pages	Provider-ready	`MCP_PDF_OCR_*`
Visual table/chart/formula/figure/image enrichment	Provider-ready	`MCP_PDF_REGION_ANALYSIS_*`

Supported visual provider paths include local commands, local HTTP servers, Ollama, OpenAI-compatible endpoints, LM Studio, and llama.cpp. Request payloads cannot choose arbitrary executables or arbitrary provider URLs; providers are configured by the deployment environment.

# Example shape only. Point these at your own local OCR command.
export MCP_PDF_OCR_COMMAND="tesseract"
export MCP_PDF_OCR_ARGS_JSON='["{input}", "stdout", "tsv"]'
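
To make the `{input}` placeholder concrete, the sketch below shows how an argument template of this shape could be expanded into a final argument list. How the server actually performs the substitution is not described in this section, so treat this as an illustration only; `expandOcrArgs` and the sample image path are hypothetical.

```typescript
// Hypothetical helper: parse a JSON array of argument templates and replace
// every "{input}" placeholder with the path of the rendered page image.
function expandOcrArgs(argsJson: string, inputPath: string): string[] {
  const parsed: unknown = JSON.parse(argsJson);
  if (!Array.isArray(parsed) || !parsed.every((arg) => typeof arg === "string")) {
    throw new Error("MCP_PDF_OCR_ARGS_JSON must be a JSON array of strings");
  }
  return (parsed as string[]).map((arg) => arg.split("{input}").join(inputPath));
}

// The template from the example above, with a made-up input path.
const ocrArgs = expandOcrArgs('["{input}", "stdout", "tsv"]', "/tmp/page-1.png");
// ["/tmp/page-1.png", "stdout", "tsv"]
```

Passing arguments as an array rather than a single shell string means a path containing spaces or quotes stays one argument.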

See the guide and API reference for provider configuration details.

Release Proof

Claims are backed by