by blazickjp
A Model Context Protocol server for searching and analyzing arXiv papers
# Add to your Claude Code skills
git clone https://github.com/blazickjp/arxiv-mcp-serverNo comments yet. Be the first to share your thoughts!
π Enable AI assistants to search and access arXiv papers through a simple MCP interface.
The ArXiv MCP Server provides a bridge between AI assistants and arXiv's research repository through the Model Context Protocol (MCP). It allows AI models to search for papers and access their content in a programmatic way.
π€ Contribute β’ π Report Bug
Paper content retrieved from arXiv is untrusted external input.
When an AI assistant downloads or reads a paper through this server, the paper's text is passed directly into the model's context. A maliciously crafted paper could embed adversarial instructions designed to hijack the AI's behavior β for example, instructing it to exfiltrate data, invoke other tools with unintended arguments, or override system-level instructions. This is a known class of attack described by OWASP as LLM01: Prompt Injection and by the OWASP Agentic AI framework as AG01: Prompt Injection in LLM-Integrated Systems.
To install ArXiv Server for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install arxiv-mcp-server --client claude
Important β use
uv tool install, notuv pip installRunning
uv pip install arxiv-mcp-serverinstalls the package into the current virtual environment but does not place thearxiv-mcp-serverexecutable on yourPATH. You must useuv tool installso that uv creates an isolated environment and exposes the executable globally:
uv tool install arxiv-mcp-server
After this, the arxiv-mcp-server command will be available on your PATH.
PDF fallback (older papers): Most arXiv papers have an HTML version which the base install handles automatically. For older papers that only have a PDF, the server needs the
[pdf]extra (pymupdf4llm). Install it with:uv tool install 'arxiv-mcp-server[pdf]'
You can verify it with:
arxiv-mcp-server --help
If you previously ran uv pip install arxiv-mcp-server and the command is
missing, uninstall it and re-install with uv tool install as shown above.
For development:
# Clone and set up development environment
git clone https://github.com/blazickjp/arxiv-mcp-server.git
cd arxiv-mcp-server
# Create and activate virtual environment
uv venv
source .venv/bin/activate
# Install with test dependencies (development only β no global executable)
uv pip install -e ".[test]"
This repository now includes a Codex plugin manifest at .codex-plugin/plugin.json
and a portable MCP config at .mcp.json so Codex-oriented tooling can discover
the server without inventing its own install recipe.
The Codex integration uses the same stdio launch path documented elsewhere in this README:
{
"mcpServers": {
"arxiv": {
"command": "uvx",
"args": ["arxiv-mcp-server"]
}
}
}
If your Codex client supports plugin manifests, point it at
./.codex-plugin/plugin.json. If it only supports raw MCP configuration, use
./.mcp.json directly.
Add this configuration to your MCP client config file:
{
"mcpServers": {
"arxiv-mcp-server": {
"command": "uv",
"args": [
"tool",
"run",
"arxiv-mcp-server",
"--storage-path", "/path/to/paper/storage"
]
}
}
}
For Development:
{
"mcpServers": {
"arxiv-mcp-server": {
"command": "uv",
"args": [
"--directory",
"path/to/cloned/arxiv-mcp-server",
"run",
"arxiv-mcp-server",
"--storage-path", "/path/to/paper/storage"
]
}
}
}
arXiv papers are user-generated, untrusted content. Paper text returned by this server may contain prompt injection attempts β crafted text designed to manipulate an AI assistant's behavior. Treat all paper content as untrusted input.
In production environments, apply appropriate sandboxing and avoid feeding raw paper content into agentic pipelines that have access to sensitive tools or data without review. See SECURITY.md for the full security policy.
The typical workflow for deep paper research is:
search_papers β download_paper β read_paper
list_papers shows what you have locally. semantic_search searches across your local collection.
Search arXiv with optional category, date, and boolean filters. Enforces arXiv's 3-second rate limit automatically. If rate limited, wait 60 seconds before retrying.
result = await call_tool("search_papers", {
"query": "\"KAN\" OR \"Kolmogorov-Arnold Networks\"",
"max_results": 10,
"date_from": "2024-01-01",
"categories": ["cs.LG", "cs.AI"],
"sort_by": "date" # or "relevance" (default)
})
Supported categories include cs.AI, cs.LG, cs.CL, cs.CV, cs.NE, stat.ML, math.OC, quant-ph, eess.SP, and more. See tool description for the full list.
Download a paper by its arXiv ID. Tries HTML first, falls back to PDF. Stores the paper locally for read_paper and semantic_search.
result = await call_tool("download_paper", {
"paper_id": "2401.12345"
})
For older papers that only have a PDF, install the
[pdf]extra:uv tool install 'arxiv-mcp-server[pdf]'
List all papers downloaded locally. Returns arXiv IDs only β use read_paper to access content.
result = await call_tool("list_papers", {})
Read the full text of a locally downloaded paper in markdown. Requires download_paper to be called first.
result = await call_tool("read_paper", {
"paper_id": "2401.12345"
})
The server offers specialized prompts to help analyze academic papers: