by QuantaAlpha
Repo-level benchmark for real-world Code Agents: from repo understanding → env setup → incremental dev/bug-fixing → task delivery, with a cost-aware α metric.
2025.09.19 🎉 Excited to announce that our papers have been accepted to <u>NeurIPS 2025</u> — RepoMaster as a Spotlight (≈3.2%) and SE-Agent as a Poster (≈24.52%)!
2025.08.28 🎉 We open-sourced RepoMaster — an AI agent that leverages GitHub repos to solve complex real-world tasks.
2025.08.26 🎉 We open-sourced GitTaskBench — a repo-level benchmark & tooling suite for real-world tasks.
2025.08.10 🎉 We open-sourced SE-Agent — a self-evolution trajectory framework for multi-step reasoning.
🔗 Ecosystem: RepoMaster · GitTaskBench · SE-Agent · Team Homepage
The ultimate vision for AI agents is to enable users to accomplish real-world tasks simply by describing their needs in natural language—leaving all planning and execution to the agent, which delivers the final results autonomously.
⚠️ While existing benchmarks evaluate various agent capabilities, few focus on tasks that reflect genuine real-world practicality, especially those requiring comprehensive understanding and use of full-scale project repositories.
👋 To address this gap, we introduce GitTaskBench. Our benchmark focuses on tasks whose complexity and practical value demand leveraging repository-level code, mirroring how developers solve real problems using...