<div align="center">

# MCPBench

The evaluation benchmark on MCP servers, by modelscope
[![Documentation][docs-image]][docs-url] [![Package License][package-license-image]][package-license-url]

</div>
MCPBench is an evaluation framework for MCP Servers. It supports the evaluation of three types of servers: Web Search, Database Query, and GAIA, and is compatible with both local and remote MCP Servers. The framework primarily evaluates different MCP Servers (such as Brave Search, DuckDuckGo, etc.) in terms of task completion accuracy, latency, and token consumption under the same LLM and Agent configurations. Here is the evaluation report.
<img src="assets/figure1.png" alt="MCPBench Overview" width="600"/>

<hr>

The implementation refers to LangProBe: a Language Programs Benchmark.
Big thanks to Qingxu Fu for the initial implementation!
## News

- Sep. 1, 2025: The Modelscope AI hackathon will be held on Sep. 23rd, ref: https://modelscope.cn/active/aihackathon-mcp-agent
- Apr. 29, 2025
- Apr. 14, 2025: We are proud to announce that MCPBench is now open-sourced.

## Installation

The framework requires Python >= 3.11, Node.js, and jq.
```bash
git clone https://github.com/modelscope/MCPBench
cd MCPBench
conda create -n mcpbench python=3.11 -y
conda activate mcpbench
pip install -r requirements.txt
```
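If any of these steps fail, it may help to confirm the prerequisites listed above are actually available; a minimal check could look like this (the versions your system reports will differ):

```bash
# Confirm the prerequisites are on PATH.
python --version   # expect 3.11 or newer inside the mcpbench env
node --version     # Node.js is needed to run npx-based MCP servers
jq --version       # jq is required by the framework
```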
Please first determine the type of MCP server you want to use. Then write the following configuration:
```json
{
    "mcp_pool": [
        {
            "name": "firecrawl",
            "run_config": [
                {
                    "command": "npx -y firecrawl-mcp",
                    "args": "FIRECRAWL_API_KEY=xxx",
                    "port": 8005
                }
            ]
        }
    ]
}
```
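Since `mcp_pool` is a list, it presumably allows several servers to be described in one config file, which matches MCPBench's goal of comparing servers under the same LLM and Agent settings. The sketch below reuses the firecrawl entry and adds a second, purely hypothetical entry (its name, command, and environment variable are illustrative, not taken from the project):

```json
{
    "mcp_pool": [
        {
            "name": "firecrawl",
            "run_config": [
                {
                    "command": "npx -y firecrawl-mcp",
                    "args": "FIRECRAWL_API_KEY=xxx",
                    "port": 8005
                }
            ]
        },
        {
            "name": "example-search",
            "run_config": [
                {
                    "command": "npx -y example-search-mcp",
                    "args": "EXAMPLE_API_KEY=xxx",
                    "port": 8006
                }
            ]
        }
    ]
}
```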
Save this config file in the `configs` folder and launch it using:

```bash
sh launch_mcps_as_sse.sh YOUR_CONFIG_FILE
```
For example, save the above configuration as `configs/firecrawl.json` and launch it using:

```bash
sh launch_mcps_as_sse.sh firecrawl.json
```
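The example config binds the firecrawl server to port 8005, so once the launch script is running you can sanity-check that something is listening on that port. This is a generic TCP probe, not a command provided by MCPBench; use whatever port you set in `run_config`:

```bash
# Probe the port declared in run_config (8005 in the example above).
# Any port-checking tool works; nc is just one option.
nc -z localhost 8005 && echo "MCP server is listening on port 8005"
```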