by SylphxAI
Production-ready MCP server for PDF processing - 5-10x faster with parallel processing and 94%+ test coverage
# Add to your Claude Code skills
git clone https://github.com/SylphxAI/pdf-reader-mcp
Production-ready PDF processing server for AI agents
5-10x faster parallel processing • Y-coordinate content ordering • 94%+ test coverage • 103 tests passing
PDF Reader MCP is a production-ready Model Context Protocol server that gives AI agents PDF processing capabilities: it extracts text, images, and metadata, processes pages in parallel, and isolates errors per page.
The Problem:
// Traditional PDF processing
- Sequential page processing (slow)
- No natural content ordering
- Complex path handling
- Poor error isolation
The Solution:
// PDF Reader MCP
- 5-10x faster parallel processing ⚡
- Y-coordinate based ordering
- Flexible path support (absolute/relative) 🎯
- Per-page error resilience 🛡️
- 94%+ test coverage ✅
Result: Production-ready PDF processing that scales.
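Y-coordinate based ordering can be pictured as sorting extracted items by their position on the page. The following is a minimal sketch, assuming each item carries `x` and `y` in PDF user space (origin at the bottom-left, so a larger `y` is higher on the page). The `ContentItem` type and `orderByPosition` function are hypothetical names for illustration, not the server's actual implementation.

```typescript
// Hypothetical shape for one extracted item; not the server's real type.
interface ContentItem {
  kind: "text" | "image";
  x: number; // horizontal position in PDF units
  y: number; // vertical position; PDF origin is bottom-left
  value: string;
}

// Return a new array ordered top-to-bottom (descending y),
// then left-to-right (ascending x) for items on the same line.
function orderByPosition(items: ContentItem[]): ContentItem[] {
  return items.slice().sort((a, b) => {
    if (a.y !== b.y) return b.y - a.y;
    return a.x - b.x;
  });
}
```

Sorting text and images together in one pass is what keeps an image between the paragraphs that surround it, rather than appended after all the text.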
Real-world performance from production testing:
| Operation | Ops/sec | Performance | Use Case |
|-----------|---------|-------------|----------|
| Error handling | 12,933 | ⚡⚡⚡⚡⚡ | Validation & safety |
| Extract full text | 5,575 | ⚡⚡⚡⚡ | Document analysis |
| Extract page | 5,329 | ⚡⚡⚡⚡ | Single page ops |
| Multiple pages | 5,242 | ⚡⚡⚡⚡ | Batch processing |
| Metadata only | 4,912 | ⚡⚡⚡ | Quick inspection |

| Document | Sequential | Parallel | Speedup |
|----------|-----------|----------|---------|
| 10-page PDF | ~2s | ~0.3s | 5-8x faster |
| 50-page PDF | ~10s | ~1s | 10x faster |
| 100+ pages | ~20s | ~2s | Linear scaling with CPU cores |
Benchmarks vary based on PDF complexity and system resources.
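The difference between the sequential and parallel columns comes from starting every page extraction at once instead of awaiting pages one by one. Below is a minimal sketch of that pattern with per-page error isolation. It assumes a hypothetical `extractPage` callback supplied by the caller; `PageResult` and `extractPages` are illustrative names, not the server's code. Note that `Promise.all` gives concurrency on one thread; scaling across CPU cores would need something more, such as worker threads.

```typescript
// Outcome for a single page: either extracted text or an error message.
type PageResult =
  | { page: number; ok: true; text: string }
  | { page: number; ok: false; error: string };

// Start all page extractions concurrently. Each page catches its own
// failure, so one bad page never rejects the whole batch.
async function extractPages(
  pages: number[],
  extractPage: (page: number) => Promise<string>, // hypothetical callback
): Promise<PageResult[]> {
  return Promise.all(
    pages.map(async (page): Promise<PageResult> => {
      try {
        return { page, ok: true, text: await extractPage(page) };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { page, ok: false, error: message };
      }
    }),
  );
}
```

Results come back in the same order as the requested pages, so a caller can report failed pages individually while still using the text from the rest.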
claude mcp add pdf-reader -- npx @sylphx/pdf-reader-mcp
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
code --add-mcp '{"name":"pdf-reader","command":"npx","args":["@sylphx/pdf-reader-mcp"]}'
npx @sylphx/pdf-reader-mcp
Add to your Windsurf MCP config:
{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
Add to Cline's MCP settings:
{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
Command: npx, Args: @sylphx/pdf-reader-mcp
npx -y @smithery/cli install @sylphx/pdf-reader-mcp --client claude
# Quick start - zero installation
npx @sylphx/pdf-reader-mcp
# Or install globally
npm install -g @sylphx/pdf-reader-mcp
{
  "sources": [{
    "path": "documents/report.pdf"
  }],
  "include_full_text": true,
  "include_metadata": true,
  "include_page_count": true
}
Result:
{
  "sources": [{
    "path": "documents/manual.pdf",
    "pages": "1-5,10,15-20"
  }],
  "include_full_text": true
}
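A page specification such as "1-5,10,15-20" mixes ranges and single pages. The sketch below shows one way to expand it into explicit page numbers. `parsePages` is a hypothetical helper, and its choices (1-based pages, rejecting malformed or reversed ranges, de-duplicating) are assumptions; the server's own parsing rules may differ.

```typescript
// Expand a page spec such as "1-5,10,15-20" or [1, 2, 3] into a sorted,
// de-duplicated list of 1-based page numbers.
function parsePages(spec: string | number[]): number[] {
  const parts = typeof spec === "string" ? spec.split(",") : spec.map(String);
  const pages = new Set<number>();
  for (const raw of parts) {
    const part = raw.trim();
    // Accept "N" or "N-M", digits only.
    const match = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (!match) throw new Error('Invalid page spec: "' + part + '"');
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error('Invalid page range: "' + part + '"');
    }
    for (let p = start; p <= end; p++) pages.add(p);
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

With this helper, `parsePages("1-5,10,15-20")` yields twelve page numbers, and the array form `[1, 2, 3]` passes through the same validation.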
// Windows - Both formats work!
{
  "sources": [{
    "path": "C:\\Users\\John\\Documents\\report.pdf"
  }],
  "include_full_text": true
}
// Unix/Mac
{
  "sources": [{
    "path": "/home/user/documents/contract.pdf"
  }],
  "include_full_text": true
}
No more "Absolute paths are not allowed" errors!
{
  "sources": [{
    "path": "presentation.pdf",
    "pages": [1, 2, 3]
  }],
  "include_images": true,
  "include_full_text": true
}
Response includes:
{
  "sources": [
    { "path": "C:\\Reports\\Q1.pdf", "pages": "1-10" },
    { "path": "/home/user/Q2.pdf", "pages": "1-10" },
    { "url": "https://example.com/Q3.pdf" }
  ],
  "include_full_text": true
}
⚡ All PDFs processed in parallel automatically!
// ✅ Windows
{ "path": "C:\\Users\\John\\Documents\\report.pdf" }
{ "path": "C:/Users/John/Documents/report.pdf" }
// ✅ Unix/Mac
{ "path": "/home/john/documents/report.pdf" }
{ "path": "/Users/john/Documents/report.pdf" }
// ✅ Relative (still works)
{ "path": "documents/report.pdf" }
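The three path styles above can be told apart with a couple of simple checks. This is a rough sketch only: `classifyPath` and `PathKind` are hypothetical names, the rules are an assumption about how such paths might be recognised, and edge cases such as UNC paths (`\\server\share`) are deliberately not handled.

```typescript
type PathKind = "windows-absolute" | "unix-absolute" | "relative";

// Classify a source path by its leading characters.
function classifyPath(p: string): PathKind {
  // Drive letter followed by a backslash or forward slash: C:\ or C:/
  if (/^[A-Za-z]:[\\/]/.test(p)) return "windows-absolute";
  // Leading forward slash: /home/... or /Users/...
  if (p.charAt(0) === "/") return "unix-absolute";
  // Anything else is treated as relative to the working directory.
  return "relative";
}
```

Inside a JSON request the Windows backslash form must be escaped (`"C:\\Users\\..."`), which is why the forward-slash form is often the more convenient of the two.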
Other Improvements:
v1.2.0 - Content Ordering
v1.1.0 - Image Extraction & Performance
read_pdf Tool
The single tool that handles all PDF operations.
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| sources | Array | List of PDF sources to process | Required |
| include_full_text | boolean | Extract full text content | false |
| include_metadata | boolean | Extract PDF metadata | true |
| include_page_count | boolean | Include total page count | true |
| include_images | boolean | Extract embedded images | false |
{
  path?: string;              // Local file path (absolute or relative)
  url?: string;               // HTTP/HTTPS URL to PDF
  pages?: string | number[];  // Pages to extract: "1-5,10" or [1,2,3]
}
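The parameter table and source shape above can be combined into a typed request with the documented defaults filled in. The sketch below is illustrative: `PdfSource`, `ReadPdfArgs`, and `withDefaults` are hypothetical names, and the rule that each source sets exactly one of `path` or `url` is an assumption that the table does not state.

```typescript
interface PdfSource {
  path?: string;
  url?: string;
  pages?: string | number[];
}

interface ReadPdfArgs {
  sources: PdfSource[];
  include_full_text?: boolean;
  include_metadata?: boolean;
  include_page_count?: boolean;
  include_images?: boolean;
}

// Fill in the defaults listed in the parameter table and run basic checks.
function withDefaults(args: ReadPdfArgs): Required<ReadPdfArgs> {
  if (args.sources.length === 0) {
    throw new Error("`sources` is required and must not be empty");
  }
  for (const source of args.sources) {
    const hasPath = typeof source.path === "string";
    const hasUrl = typeof source.url === "string";
    // Assumption: a source is either a local path or a URL, never both.
    if (hasPath === hasUrl) {
      throw new Error("Each source needs exactly one of `path` or `url`");
    }
  }
  return {
    sources: args.sources,
    include_full_text: args.include_full_text ?? false,
    include_metadata: args.include_metadata ?? true,
    include_page_count: args.include_page_count ?? true,
    include_images: args.include_images ?? false,
  };
}
```

Calling `withDefaults({ sources: [{ path: "large.pdf" }] })` produces a metadata-and-page-count request, since those two flags default to true while text and image extraction default to false.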
Metadata only (fast):
{
  "sources": [{ "path": "large.pdf" }],
  "include_metadata": true,
  "include_page_count": true,
  "include_full_text": false
}
From URL:
{
  "sources": [{
    "url": "https://arxiv.org/pdf/2301.00001.pdf"
  }],
  "include_full_text": true
}
Page ranges:
{