by microsoft
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking multi-modal AI agents.
# Add to your Claude Code skills
git clone https://github.com/microsoft/WindowsAgentArena

Windows Agent Arena (WAA) 🪟 is a scalable Windows AI agent platform for testing and benchmarking multi-modal, desktop AI agents. WAA provides researchers and developers with a reproducible and realistic Windows OS environment for AI research, where agentic AI workflows can be tested across a diverse range of tasks.
WAA supports deploying agents at scale on Azure ML cloud infrastructure, running multiple agents in parallel and delivering benchmark results for hundreds of tasks in minutes rather than days.
<div align="center"> <video src="https://github.com/user-attachments/assets/e0a8d88d-d28a-493d-b74f-2455f36c21f1" alt="waa_intro"> </div>

To run at the harder difficulty, change `diff_lvl="normal"` to `diff_lvl="hard"` in `src/win-arena-container/start_client.sh`. Under the harder difficulty, agents must in many tasks also learn to initialize the task themselves (e.g., finding and opening the right program/application for the task) rather than have the task set up for them by the task config.

./run-local.sh --som-origin mixed-omni --gpu-enabled true
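
The difficulty switch described above is a one-line edit to `start_client.sh`, which can be scripted. The following is a minimal sketch, assuming the script contains a literal `diff_lvl="normal"` assignment as the text states. The helper name `set_diff_lvl` is hypothetical and not part of WAA.

```shell
# Hypothetical helper (not part of WAA): set the difficulty level in a script file.
# Assumes the file contains an assignment of the form diff_lvl="<word>".
set_diff_lvl() {
    file="$1"    # path to the script, e.g. src/win-arena-container/start_client.sh
    level="$2"   # "normal" or "hard"
    tmp="$(mktemp)"
    # Rewrite the diff_lvl="..." assignment to the requested level.
    # Writing back with cat (instead of mv) keeps the file's permissions intact.
    sed "s/diff_lvl=\"[a-z]*\"/diff_lvl=\"$level\"/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example call, run from the repository root:
#   set_diff_lvl src/win-arena-container/start_client.sh hard
```

Matching any lowercase value rather than only `normal` means the same call can switch the setting back, at the cost of assuming that no other `diff_lvl="..."` string in the file should stay unchanged.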