DINO-X MCP Server


The official DINO-X MCP Server, powered by the DINO-X and Grounding DINO models, brings fine-grained object detection and image understanding to your multimodal applications.

<p align="center"> <video width="800" controls> <source src="https://dds-frontend.oss-cn-shenzhen.aliyuncs.com/dinox-mcp/dinox-mcp-en-overveiw.mp4" type="video/mp4"> Your browser does not support the video tag. </video> </p>

Why DINO-X MCP?

With DINO-X MCP, you get:

Fine-Grained Understanding: Full-image detection, object detection, and region-level descriptions.
Structured Outputs: Object categories, counts, locations, and attributes for VQA and multi-step reasoning tasks.
Composable: Works with other MCP servers to build end-to-end visual agents or automation pipelines.
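
To make the "Structured Outputs" point concrete, here is a minimal sketch of how a downstream step might consume detection results for counting and simple spatial reasoning. The `Detection` shape, its field names, and the bounding-box convention are assumptions made for illustration. They are not the server's actual response schema, so adapt them to whatever your MCP client actually returns.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Detection:
    """One detected object. Hypothetical shape, not the server's real schema."""
    category: str
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    bbox: Tuple[float, float, float, float]
    attributes: Dict[str, str] = field(default_factory=dict)


def count_by_category(detections: List[Detection]) -> Counter:
    """Count how many objects of each category were detected."""
    return Counter(d.category for d in detections)


def bbox_center(d: Detection) -> Tuple[float, float]:
    """Return the (x, y) center of a detection's bounding box."""
    x_min, y_min, x_max, y_max = d.bbox
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def leftmost(detections: List[Detection], category: str) -> Optional[Detection]:
    """Return the detection of `category` whose center is furthest left, if any."""
    matches = [d for d in detections if d.category == category]
    return min(matches, key=lambda d: bbox_center(d)[0], default=None)


# Made-up sample data standing in for a parsed tool result.
detections = [
    Detection("dog", (40, 60, 200, 300), {"color": "brown"}),
    Detection("dog", (320, 80, 480, 310), {"color": "white"}),
    Detection("ball", (220, 260, 260, 300)),
]

counts = count_by_category(detections)
first_dog = leftmost(detections, "dog")
```

With results in this form, a question such as "what color is the dog on the left?" reduces to a lookup on `first_dog.attributes`, and "how many dogs are there?" to `counts["dog"]`. This is the kind of multi-step reasoning that structured outputs are meant to support.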

Transport Modes

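DINO-X MCP supports two transport modes: Streamable HTTP, offered as the official hosted service (recommended), and STDIO, which runs the NPM package locally.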
