<div align="center">

# MCPBench

The evaluation benchmark on MCP servers, by modelscope
[![Documentation][docs-image]][docs-url] [![Package License][package-license-image]][package-license-url]

</div>
MCPBench is an evaluation framework for MCP Servers. It supports the evaluation of three types of servers: Web Search, Database Query, and GAIA, and is compatible with both local and remote MCP Servers. The framework primarily evaluates different MCP Servers (such as Brave Search, DuckDuckGo, etc.) in terms of task completion accuracy, latency, and token consumption under the same LLM and Agent configurations. Here is the evaluation report.
<img src="assets/figure1.png" alt="MCPBench Overview" width="600"/>

<hr>

The implementation refers to LangProBe: a Language Programs Benchmark.
Big thanks to Qingxu Fu for the initial implementation!
## News

- Sep. 1, 2025: The Modelscope AI hackathon will be held on Sep. 23rd, ref: https://modelscope.cn/active/aihackathon-mcp-agent
- Apr. 29, 2025
- Apr. 14, 2025: We are proud to announce that MCPBench is now open-sourced.

## Installation

The framework requires Python >= 3.11, Node.js, and jq.
```bash
git clone https://github.com/modelscope/MCPBench
cd MCPBench
conda create -n mcpbench python=3.11 -y
conda activate mcpbench
pip install -r requirements.txt
```
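If any of these steps fail, it may help to confirm the prerequisites listed above are actually available; a minimal check could look like this (the versions your system reports will differ):

```bash
# Confirm the prerequisites are on PATH.
python --version   # expect 3.11 or newer inside the mcpbench env
node --version     # Node.js is needed to run npx-based MCP servers
jq --version       # jq is required by the framework
```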
Please first determine the type of MCP server you want to use. Then write the following configuration:
```json
{
    "mcp_pool": [
        {
            "name": "firecrawl",
            "run_config": [
                {
                    "command": "npx -y firecrawl-mcp",
                    "args": "FIRECRAWL_API_KEY=xxx",
                    "port": 8005
                }
            ]
        }
    ]
}
```
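Since `mcp_pool` is a list, it presumably allows several servers to be described in one config file, which matches MCPBench's goal of comparing servers under the same LLM and Agent settings. The sketch below reuses the firecrawl entry and adds a second, purely hypothetical entry (its name, command, and environment variable are illustrative, not taken from the project):

```json
{
    "mcp_pool": [
        {
            "name": "firecrawl",
            "run_config": [
                {
                    "command": "npx -y firecrawl-mcp",
                    "args": "FIRECRAWL_API_KEY=xxx",
                    "port": 8005
                }
            ]
        },
        {
            "name": "example-search",
            "run_config": [
                {
                    "command": "npx -y example-search-mcp",
                    "args": "EXAMPLE_API_KEY=xxx",
                    "port": 8006
                }
            ]
        }
    ]
}
```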
Save this config file in the `configs` folder and launch it using:

```bash
sh launch_mcps_as_sse.sh YOUR_CONFIG_FILE
```
For example, save the above configuration as `configs/firecrawl.json` and launch it using:

```bash
sh launch_mcps_as_sse.sh firecrawl.json
```
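The example config binds the firecrawl server to port 8005, so once the launch script is running you can sanity-check that something is listening on that port. This is a generic TCP probe, not a command provided by MCPBench; use whatever port you set in `run_config`:

```bash
# Probe the port declared in run_config (8005 in the example above).
# Any port-checking tool works; nc is just one option.
nc -z localhost 8005 && echo "MCP server is listening on port 8005"
```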