by comet-ml
Model Context Protocol (MCP) implementation for Opik enabling seamless IDE integration and unified access to prompts, projects, traces, and metrics.
# Add to your Claude Code skills
git clone https://github.com/comet-ml/opik-mcp
Model Context Protocol server for Opik + Ollie. Plug your AI host (Claude Code, Cursor, VS Code Copilot, MCP Inspector) directly into your Opik workspace — read traces, log scores, save prompt versions, and ask Ollie investigative questions, all from the chat.
Built for LLM engineers who already run Opik and want to drive it from the same AI assistant they code with.
You: "Why did the experiment 'gpt-4o-rerank-v3' regress on factuality?"
Claude: → ask_ollie → reads experiment + traces → "Three traces failed because…"
You: "Score trace 7f2e… 0.9 on helpfulness with reason 'great recovery'."
Claude: → write(score.create) → done
opik-mcp is a Python package (requires Python 3.13+). The recommended way to
run it is uvx, which fetches and runs the latest published version on demand —
no global install, no virtualenv juggling.
Install uv once:
curl -LsSf https://astral.sh/uv/install.sh | sh # macOS / Linux
# or: brew install uv
You'll need two things from your Opik workspace:
- OPIK_API_KEY: get it from comet.com/api/my/settings/.
- COMET_WORKSPACE: your workspace name (lowercase, as it appears in the URL). E.g. https://www.comet.com/acme-ai/... → COMET_WORKSPACE=acme-ai. Required for ask_ollie; optional but recommended everywhere else (used for scoping and analytics).

Pre-release note: opik-mcp (Python) is not yet published to PyPI. Until the first PyPI release lands, replace uvx opik-mcp in any snippet below with:

uvx --from git+https://github.com/comet-ml/opik-mcp.git opik-mcp
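If you want to fill in COMET_WORKSPACE from a workspace URL in a setup script, a minimal sketch could look like the following. It assumes the workspace is the first path segment, as in the acme-ai example above; `workspace_from_url` is a hypothetical helper, not part of opik-mcp.

```python
from urllib.parse import urlparse


def workspace_from_url(url: str) -> str:
    """Return the first path segment of a Comet URL, lowercased.

    Assumption: the workspace name is the first path segment, as in
    https://www.comet.com/acme-ai/... from the example above.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no workspace segment in URL: {url!r}")
    return segments[0].lower()


print(workspace_from_url("https://www.comet.com/acme-ai/projects"))  # acme-ai
```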
Add the server with one command:
claude mcp add --transport stdio opik-mcp \
--env OPIK_API_KEY=<your-key> \
--env COMET_WORKSPACE=<your-workspace> \
-- uvx opik-mcp
Or edit ~/.claude.json directly:
{
"mcpServers": {
"opik-mcp": {
"type": "stdio",
"command": "uvx",
"args": ["opik-mcp"],
"env": {
"OPIK_API_KEY": "<your-key>",
"COMET_WORKSPACE": "<your-workspace>"
}
}
}
}
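If you would rather script that edit than do it by hand, here is a rough sketch that adds the same entry while leaving any other servers and settings in the file untouched. `add_opik_server` and `update_config_file` are hypothetical helper names; the entry itself mirrors the JSON above.

```python
import json
from pathlib import Path


def add_opik_server(config: dict, api_key: str, workspace: str) -> dict:
    """Return a copy of `config` with an opik-mcp entry under mcpServers."""
    servers = dict(config.get("mcpServers", {}))
    servers["opik-mcp"] = {
        "type": "stdio",
        "command": "uvx",
        "args": ["opik-mcp"],
        "env": {"OPIK_API_KEY": api_key, "COMET_WORKSPACE": workspace},
    }
    # Keep every other top-level setting exactly as it was.
    return {**config, "mcpServers": servers}


def update_config_file(path: Path, api_key: str, workspace: str) -> None:
    """Read the JSON file at `path` (if it exists), add the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    updated = add_opik_server(config, api_key, workspace)
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

To apply it you would call something like `update_config_file(Path.home() / ".claude.json", "<your-key>", "<your-workspace>")`; backing up the file first is a sensible precaution.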
Restart Claude Code. Verify with /mcp — opik-mcp should appear as connected.
Then, in the chat, ask: "list my Opik projects" — Claude will call the list
tool and you'll see your workspace's projects.
Edit ~/.cursor/mcp.json (global) or .cursor/mcp.json (project), or open
Cmd+Shift+J → Features → Model Context Protocol:
{
"mcpServers": {
"opik-mcp": {
"type": "stdio",
"command": "uvx",
"args": ["opik-mcp"],
"env": {
"OPIK_API_KEY": "<your-key>",
"COMET_WORKSPACE": "<your-workspace>"
}
}
}
}
Reload Cursor; the green dot next to opik-mcp in the MCP panel confirms the
connection. Ask in chat: "list my Opik projects".
Cursor 60s timeout. Cursor enforces a hard tool-call timeout that doesn't reset on progress notifications. Long
ask_ollie turns will fail on Cursor. See Known host limits.
Edit .vscode/mcp.json in your workspace (or your User Settings JSON):
{
"servers": {
"opik-mcp": {
"type": "stdio",
"command": "uvx",
"args": ["opik-mcp"],
"env": {
"OPIK_API_KEY": "<your-key>",
"COMET_WORKSPACE": "<your-workspace>"
}
}
}
}
Reload the window; the Copilot Chat MCP indicator shows opik-mcp once
the server is reachable. Ask in chat: "list my Opik projects".
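The host configs above share the same server entry and differ in the top-level key: Claude Code and Cursor use mcpServers, while VS Code uses servers. A small sketch that renders the right shape per host is shown below; the host labels and the `host_config` function are illustrative names only.

```python
import json

# Top-level key per host, as shown in the config snippets above.
# The host labels on the left are made up for this sketch.
TOP_LEVEL_KEY = {
    "claude-code": "mcpServers",
    "cursor": "mcpServers",
    "vscode": "servers",
}


def host_config(host: str, api_key: str, workspace: str) -> dict:
    """Build the opik-mcp config block for one of the hosts above."""
    if host not in TOP_LEVEL_KEY:
        raise ValueError(f"unknown host: {host!r}")
    entry = {
        "type": "stdio",
        "command": "uvx",
        "args": ["opik-mcp"],
        "env": {"OPIK_API_KEY": api_key, "COMET_WORKSPACE": workspace},
    }
    return {TOP_LEVEL_KEY[host]: {"opik-mcp": entry}}


print(json.dumps(host_config("vscode", "<your-key>", "<your-workspace>"), indent=2))
```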
OPIK_API_KEY=<your-key> COMET_WORKSPACE=<your-workspace> \
npx @modelcontextprotocol/inspector uvx opik-mcp
Add COMET_URL_OVERRIDE (and OPIK_URL if Opik lives at a non-default path) to
the same env block in your host config:
{
"mcpServers": {
"opik-mcp": {
"type": "stdio",
"command": "uvx",
"args": ["opik-mcp"],
"env": {
"OPIK_API_KEY": "<your-key>",
"COMET_URL_OVERRIDE": "https://opik.your-company.com",
"OPIK_MCP_ANALYTICS_SOURCE": ""
}
}
}
}
ask_ollie and run_experiment are available on Comet Cloud only — on
self-hosted those calls will fail at dispatch, so use read / list / write
directly. Setting OPIK_MCP_ANALYTICS_SOURCE="" opts your install out of the
cloud-Comet source label on telemetry events.
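To sanity-check which Opik API base URL a given env block points at, the sketch below follows the rule documented in the configuration table: OPIK_URL is derived from COMET_URL_OVERRIDE + /opik/api, and COMET_URL_OVERRIDE defaults to https://www.comet.com. That an explicit OPIK_URL takes precedence is an assumption here, and `resolve_opik_url` is a hypothetical helper rather than the server's own code.

```python
DEFAULT_COMET_URL = "https://www.comet.com"


def resolve_opik_url(env: dict) -> str:
    """Resolve the Opik API base URL from an environment-style mapping."""
    explicit = env.get("OPIK_URL")
    if explicit:
        # Assumption: an explicit OPIK_URL overrides the derived value.
        return explicit.rstrip("/")
    base = env.get("COMET_URL_OVERRIDE") or DEFAULT_COMET_URL
    return base.rstrip("/") + "/opik/api"


print(resolve_opik_url({"COMET_URL_OVERRIDE": "https://opik.your-company.com"}))
```

Passing `dict(os.environ)` instead of a literal mapping would check the current shell's settings.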
opik-mcp exposes a small, outcome-oriented surface — six tools that cover
the full lifecycle (read → annotate → curate → author → iterate).
| Tool | Purpose |
|---|---|
| read | Universal read by id / name / opik:// URI |
| list | Universal list with optional name filter + pagination |
| ask_ollie | Investigate / synthesize via the Opik in-product assistant |
| write | Universal write — log traces/spans, score, comment, save prompts, manage test suites & experiments |
| schema | Introspect write-operation schemas (used by the LLM to construct valid payloads) |
| run_experiment | Run an evaluation experiment end-to-end via Ollie |
read

One tool for any "show me X" question. Takes an entity_type plus an id
(UUID or, for nameable types, a name) or a full opik:// URI. Composite reads
(trace, prompt) inline their children so a single call returns the full
picture.
Supported entities: project, trace, span, test_suite, experiment,
prompt. Name-based lookup is available for project, experiment, prompt,
test_suite (slower — two API calls — and may return multiple matches).
read(entity_type="trace", id="7f2e3c8a-…")
read(entity_type="project", id="demo") # name lookup
read(entity_type="trace", id="opik://traces/7f2e3c8a-…")
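For client-side handling of the identifiers above, the following sketch splits an opik:// URI into its collection and id, and checks whether a bare id looks like a UUID or should be treated as a name. The URI layout is inferred from the single opik://traces/… example, and both function names are hypothetical.

```python
import uuid
from urllib.parse import urlparse


def parse_opik_uri(uri: str) -> tuple:
    """Split 'opik://<collection>/<id>' into (collection, id)."""
    parsed = urlparse(uri)
    entity_id = parsed.path.lstrip("/")
    if parsed.scheme != "opik" or not parsed.netloc or not entity_id:
        raise ValueError(f"not an opik:// URI: {uri!r}")
    return parsed.netloc, entity_id


def looks_like_uuid(value: str) -> bool:
    """True if `value` parses as a UUID; otherwise treat it as a name."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


print(parse_opik_uri("opik://traces/123e4567-e89b-12d3-a456-426614174000"))
print(looks_like_uuid("demo"))  # False: "demo" would be a name lookup
```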
list

Browse a collection with optional name filter and pagination. Project-scoped
types (trace, test_suite_item, prompt_version) require their parent UUID.
list(entity_type="experiment", page=1, size=25)
list(entity_type="experiment", name="rerank") # name substring filter
list(entity_type="trace", project_id="<project-uuid>") # traces of one project
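When you need every item rather than one page, the page/size parameters suggest a loop like the one sketched here. `fetch_page` is a stand-in for however your client invokes the list tool; the assumptions that it returns a plain list of items and that a short page marks the end are not confirmed by the source.

```python
from typing import Callable, Iterator, List


def iter_all(fetch_page: Callable[[int, int], List[dict]], size: int = 25) -> Iterator[dict]:
    """Yield items page by page, starting at page 1, until a short page arrives."""
    page = 1
    while True:
        items = fetch_page(page, size)
        yield from items
        if len(items) < size:
            return  # a short (or empty) page means there is nothing left
        page += 1


# Stand-in data source for demonstration only.
fake_items = [{"n": i} for i in range(60)]


def fake_fetch(page: int, size: int) -> List[dict]:
    return fake_items[(page - 1) * size : page * size]


print(len(list(iter_all(fake_fetch, size=25))))  # 60
```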
ask_ollie

For investigative questions, cross-entity synthesis, or anything that needs Opik domain expertise. Ollie has direct read access to your workspace and can execute writes (scores, comments, test-suite items, prompt versions) mid-stream when asked.
ask_ollie(query="Why are spans in project 'demo' slower this week than last?")
ask_ollie(query="Compare experiments A and B on factuality. Score the bottom 5 traces of A 0.2 with reason.")
Returns the assistant's final text plus a thread_id. Pass it back on
follow-ups to preserve context — Ollie has no memory across threads.
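The thread_id hand-off can be wrapped so follow-ups reuse it automatically. This is only a sketch: `ask` stands in for whatever function your client uses to call ask_ollie, and the reply shape (a dict with "text" and "thread_id" keys) and the `thread_id` keyword are assumptions.

```python
from typing import Callable, Optional


class OllieThread:
    """Remember the thread_id from each reply and send it on the next question."""

    def __init__(self, ask: Callable[..., dict]) -> None:
        self._ask = ask
        self.thread_id: Optional[str] = None

    def send(self, query: str) -> str:
        kwargs = {"query": query}
        if self.thread_id is not None:
            kwargs["thread_id"] = self.thread_id  # continue the same thread
        reply = self._ask(**kwargs)
        self.thread_id = reply["thread_id"]
        return reply["text"]
```

Starting a new `OllieThread` gives a fresh conversation, which matches the note that Ollie keeps no memory across threads.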
YOLO mode (default). Writes Ollie performs mid-stream execute without a
per-action confirmation. Each auto-approval is logged as a JSON audit row on
the opik_mcp.audit Python logger. To require confirmation instead, set
OPIK_MCP_AUTO_APPROVE=disabled — Ollie's confirm requests then surface as
typed errors you can manually re-issue.
Available on Comet Cloud only.
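Because the audit rows go to a standard Python logger, a regular logging handler can collect them. The sketch below assumes your code runs in the same process as the server and that rows are emitted at INFO level or above; the sample row's field names are invented for the demo, since the real row shape is not documented here.

```python
import json
import logging


class AuditCollector(logging.Handler):
    """Collect log messages that parse as JSON objects; skip anything else."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []

    def emit(self, record: logging.LogRecord) -> None:
        try:
            row = json.loads(record.getMessage())
        except json.JSONDecodeError:
            return
        if isinstance(row, dict):
            self.rows.append(row)


audit_logger = logging.getLogger("opik_mcp.audit")
audit_logger.setLevel(logging.INFO)
collector = AuditCollector()
audit_logger.addHandler(collector)

# Stand-in for a row the server might emit; these field names are hypothetical.
audit_logger.info(json.dumps({"operation": "score.create", "approved": True}))
print(collector.rows)
```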
write

Universal write dispatcher. Pass operation + data and the dispatcher
validates the payload, applies the right REST verb, and returns the
backend response.
Operations:
| Operation | What it does |
|---|---|
| trace.create | Log a single trace (or a batch). Parent for spans / scores / comments. |
| trace.update | Finalize or amend an existing trace. |
| span.create | Log a span on an existing trace (or a batch). |
| score.create | Attach a numeric feedback score to a trace, span, or thread. |
| comment.create | Attach a free-text comment to a trace, span, or thread. |
| prompt_version.save | Save a new prompt version (creates the prompt by name if missing). |
| test_suite.create | Create an evaluation test suite. |
| test_suite_item.upsert | Upsert items into a test suite (always the envelope shape). |
| experiment.create | Create an experiment scoped to a test suite. |
| experiment_item.create | Attach trace + dataset_item rows to an experiment. |
write(operation="score.create", data={
"target": "trace",
"target_id": "7f2e3c8a-…",
"name": "helpfulness",
"value": 0.9,
"reason": "great recovery"
})
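If you are building many score.create payloads in a script, a tiny builder keeps them consistent with the example above. This is a sketch: the allowed targets come from the operations table, treating reason as optional is an assumption, and the schema tool remains the source of truth for required fields.

```python
from typing import Optional

# Targets listed for score.create in the operations table above.
ALLOWED_TARGETS = {"trace", "span", "thread"}


def score_payload(target: str, target_id: str, name: str,
                  value: float, reason: Optional[str] = None) -> dict:
    """Build the operation + data arguments for a score.create write."""
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"unsupported target: {target!r}")
    data = {"target": target, "target_id": target_id,
            "name": name, "value": float(value)}
    if reason is not None:  # assumption: reason may be omitted
        data["reason"] = reason
    return {"operation": "score.create", "data": data}


print(score_payload("trace", "<trace-uuid>", "helpfulness", 0.9, "great recovery"))
```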
schema

Inspect the exact JSON shape and required fields of any write operation before
you call it — useful when you're not sure what data should look like. Returns
the schema, OAuth scope, and one validated example. Pure lookup, no backend
call.
schema(operation="score.create")
schema(operation="prompt_version.save")
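Once you have a schema in hand, a quick pre-flight check can catch missing fields before a write. The sketch below assumes the returned schema is JSON Schema-like with a top-level "required" list, which the source does not confirm; `example_schema` is a made-up stand-in, not actual output of the schema tool.

```python
def missing_required(schema: dict, data: dict) -> list:
    """Return the names of required fields that are absent from `data`."""
    return [field for field in schema.get("required", []) if field not in data]


# Hypothetical schema fragment for illustration only.
example_schema = {"required": ["target", "target_id", "name", "value"]}

print(missing_required(example_schema, {"target": "trace", "name": "helpfulness"}))
# ['target_id', 'value']
```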
run_experiment

Run an evaluation experiment end-to-end via Ollie. Takes a single
experiment_config dict that mirrors Opik's experiment shape (prompt, test
suite, scorers); Ollie executes the run and writes results back as an Opik
experiment.
run_experiment(experiment_config={
"test_suite_name": "qa-eval-v2",
"prompt_name": "welcome-msg",
# … see `schema(operation="experiment.create")` for the full shape
})
Available on Comet Cloud only.
Every setting is an environment variable. Required ones in bold.
| Variable | Default | Notes |
|---|---|---|
| OPIK_API_KEY | — | Required for ask_ollie and any authenticated read/write. |
| COMET_WORKSPACE | — | Workspace name. Required for ask_ollie. |
| COMET_WORKSPACE_ID | — | Optional workspace UUID. Stamped into analytics events when set so BI can join on a stable id rather than the (mutable) workspace name. |
| COMET_URL_OVERRIDE | https://www.comet.com | Set to your self-hosted Comet host, or https://dev.comet.com for staging. |
| OPIK_URL | derived from COMET_URL_OVERRIDE + /opik/api | Override only if Opik lives on a diffe