# An MCP Server for Ollama

by rawveg

Supercharge your AI assistant with local LLM access.

```shell
# Add to your Claude Code skills
git clone https://github.com/rawveg/ollama-mcp
```
An MCP (Model Context Protocol) server that exposes the complete Ollama SDK as MCP tools, enabling seamless integration between your local LLM models and MCP-compatible applications like Claude Desktop and Cline.
Features • Installation • Available Tools • Configuration • Retry Behavior • Development
This MCP server gives Claude the tools to interact with Ollama - but you'll get even more value by also installing the Ollama Skill from the Skillsforge Marketplace:
The Ollama Skill teaches Claude:
Install both for the complete experience:
Result: Claude doesn't just have the car, it knows how to drive!
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
```json
{
  "mcpServers": {
    "ollama": {
      "command": "npx",
      "args": ["-y", "ollama-mcp"]
    }
  }
}
```
```shell
npm install -g ollama-mcp
```
Add to your Cline MCP settings (cline_mcp_settings.json):
```json
{
  "mcpServers": {
    "ollama": {
      "command": "npx",
      "args": ["-y", "ollama-mcp"]
    }
  }
}
```
Model Management:

| Tool | Description |
|------|-------------|
| ollama_list | List all available local models |
| ollama_show | Get detailed information about a specific model |
| ollama_pull | Download models from Ollama library |
| ollama_push | Push models to Ollama library |
| ollama_copy | Create a copy of an existing model |
| ollama_delete | Remove models from local storage |
| ollama_create | Create custom models from Modelfile |
Model Execution:

| Tool | Description |
|------|-------------|
| ollama_ps | List currently running models |
| ollama_generate | Generate text completions |
| ollama_chat | Interactive chat with models (supports tools/functions) |
| ollama_embed | Generate embeddings for text |
Web Tools:

| Tool | Description |
|------|-------------|
| ollama_web_search | Search the web with customizable result limits (requires OLLAMA_API_KEY) |
| ollama_web_fetch | Fetch and parse web page content (requires OLLAMA_API_KEY) |
Note: Web tools require an Ollama Cloud API key. They connect to
https://ollama.com/api for web search and fetch operations.
| Variable | Default | Description |
|----------|---------|-------------|
| OLLAMA_HOST | http://127.0.0.1:11434 | Ollama server endpoint (use https://ollama.com for cloud) |
| OLLAMA_API_KEY | - | API key for Ollama Cloud (required for web tools and cloud models) |
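The two variables in the table map directly onto environment lookups with a fallback default. As a hypothetical TypeScript sketch (illustrative only, not taken from this repo's source):

```typescript
// Hypothetical sketch of resolving the environment variables above.
// Only the variable names OLLAMA_HOST and OLLAMA_API_KEY come from the
// table; the surrounding code is illustrative.
const host: string = process.env.OLLAMA_HOST ?? "http://127.0.0.1:11434";
const apiKey: string | undefined = process.env.OLLAMA_API_KEY;

if (!apiKey) {
  // Web tools (ollama_web_search, ollama_web_fetch) need the key;
  // purely local model operations do not.
  console.log(`No OLLAMA_API_KEY set; using ${host} without cloud features`);
}
```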
```json
{
  "mcpServers": {
    "ollama": {
      "command": "npx",
      "args": ["-y", "ollama-mcp"],
      "env": {
        "OLLAMA_HOST": "http://localhost:11434"
      }
    }
  }
}
```
To use Ollama's cloud platform with web search and fetch capabilities:
```json
{
  "mcpServers": {
    "ollama": {
      "command": "npx",
      "args": ["-y", "ollama-mcp"],
      "env": {
        "OLLAMA_HOST": "https://ollama.com",
        "OLLAMA_API_KEY": "your-ollama-cloud-api-key"
      }
    }
  }
}
```
Cloud Features:
- ollama_web_search (requires API key)
- ollama_web_fetch (requires API key)

Get your API key: Visit ollama.com to sign up and obtain your API key.
You can use both local and cloud models by pointing to your local Ollama instance while providing an API key:
```json
{
  "mcpServers": {
    "ollama": {
      "command": "npx",
      "args": ["-y", "ollama-mcp"],
      "env": {
        "OLLAMA_HOST": "http://127.0.0.1:11434",
        "OLLAMA_API_KEY": "your-ollama-cloud-api-key"
      }
    }
  }
}
```
This configuration uses your local Ollama instance for model operations while enabling the cloud-backed web search and fetch tools.
The MCP server includes intelligent retry logic for handling transient failures when communicating with Ollama APIs:
Web Tools (ollama_web_search and ollama_web_fetch):
- Honors the Retry-After header when provided by the API
- Falls back to exponential backoff with jitter when Retry-After is not present

The server handles the standard HTTP Retry-After header in two formats:
1. Delay-Seconds Format:

```
Retry-After: 60
```

Waits exactly 60 seconds before retrying.
2. HTTP-Date Format:

```
Retry-After: Wed, 21 Oct 2025 07:28:00 GMT
```

Calculates the delay until the specified timestamp.
When Retry-After is not provided or invalid:
```
random(0, min(initialDelay × 2^attempt, maxDelay))
```

Example retry delays (assuming initialDelay = 1s and maxDelay = 30s): attempt 0 waits up to 1s, attempt 1 up to 2s, attempt 2 up to 4s, and so on, capped at 30s.
Retried Errors (transient failures):

- 500, 502, 503, 504 server errors
- Rate-limit responses that carry a Retry-After header
Non-Retried Errors (permanent failures):

- Other 4xx client errors (for example 400 Bad Request or 401 Unauthorized), which will not succeed on retry
The retry mechanism ensures robust handling of temporary API issues while respecting server-provided retry guidance and preventing excessive request rates. Transient 5xx errors (500, 502, 503, 504) are safe to retry for the idempotent POST operations used by ollama_web_search and ollama_web_fetch. Individual requests timeout after 30 seconds to prevent indefinitely hung connections.
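The Retry-After parsing and jittered backoff described above can be sketched in TypeScript as follows. This is an illustrative reimplementation with hypothetical function names and default delays, not the server's actual source:

```typescript
// Hypothetical sketch of the retry policy described above.
// Returns a delay in milliseconds, or null if the header is absent/invalid
// (in which case the caller falls back to jittered backoff).
function parseRetryAfter(header: string | null, now: number = Date.now()): number | null {
  if (header === null || header.trim() === "") return null;
  // Delay-seconds format, e.g. "Retry-After: 60"
  const seconds = Number(header);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  // HTTP-date format, e.g. "Retry-After: Wed, 21 Oct 2025 07:28:00 GMT"
  const dateMs = Date.parse(header);
  if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - now);
  return null;
}

// random(0, min(initialDelay * 2^attempt, maxDelay)); the 1s/30s defaults
// are assumptions for illustration, not confirmed values from this server.
function jitteredBackoff(attempt: number, initialDelayMs = 1000, maxDelayMs = 30000): number {
  const cap = Math.min(initialDelayMs * 2 ** attempt, maxDelayMs);
  return Math.random() * cap;
}
```

A caller would try `parseRetryAfter` first and use `jitteredBackoff(attempt)` only when it returns null.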
MCP clients can invoke, for example:

```json
{
  "tool": "ollama_chat",
  "arguments": {
    "model": "llama3.2:latest",
    "messages": [
      { "role": "user", "content": "Explain quantum computing" }
    ]
  }
}
```
```json
{
  "tool": "ollama_embed",
  "arguments": {
    "model": "nomic-embed-text",
    "input": ["Hello world", "Embeddings are great"]
  }
}
```
```json
{
  "tool": "ollama_web_search",
  "arguments": {
    "query": "latest AI developments",
    "max_results": 5
  }
}
```
This server uses a hot-swap autoloader pattern:
```
src/
├── index.ts        # Entry point (27 lines)
├── server.ts       # MCP server creation
├── autoloader.ts   # Dynamic tool discovery
└── tools/          # Tool implementations
    ├── chat.ts     # Each exports toolDefinition
    ├── generate.ts
    └── ...
```
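A tool module under src/tools/ might look roughly like this. The tool name and the exact toolDefinition fields here are hypothetical, shown only to illustrate the autoloader pattern, not copied from this repo:

```typescript
// Hypothetical tool module for src/tools/. Each real module exports a
// `toolDefinition` object that the autoloader discovers at startup.
const toolDefinition = {
  name: "ollama_ping", // illustrative tool, not part of the real tool set
  description: "Check that the configured Ollama server is reachable",
  inputSchema: { type: "object", properties: {} },
  handler: async (): Promise<string> => {
    const host = process.env.OLLAMA_HOST ?? "http://127.0.0.1:11434";
    const res = await fetch(host); // Ollama answers GET / with a status banner
    return `Ollama at ${host} responded with HTTP ${res.status}`;
  },
};
```

Dropping a file like this into src/tools/ is all that is needed; the autoloader picks it up without any changes to server.ts.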
Key Benefits:

- New tools dropped into src/tools/ are discovered automatically; no central registration required

```shell
# Clone repository
git clone https://github.com/rawveg/ollama-mcp.git
cd ollama-mcp

# Install dependencies
npm install

# Build project
np
```