by showlab
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
# Add to your Claude Code skills
git clone https://github.com/showlab/Awesome-GUI-Agent
A curated list of papers, projects, and resources for multi-modal Graphical User Interface (GUI) agents.
CONTRIBUTIONS WELCOME!
🔥 This project is actively maintained, and we welcome your contributions. If you have any suggestions, such as missing papers or information, please feel free to open an issue or submit a pull request.
🤖 Try our Awesome-Paper-Agent. Provide an arXiv URL and it automatically returns a formatted entry, like this:
User:
https://arxiv.org/abs/2312.13108
GPT:
+ [AssistGUI: Task-Oriented Desktop Graphical User Interface Automation](https://arxiv.org/abs/2312.13108) (Dec. 2023)
[](https://github.com/showlab/assistgui)
[](https://arxiv.org/abs/2312.13108)
[](https://showlab.github.io/assistgui/)
You can then copy this information directly into your pull request.
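If you would rather build the first line of an entry by hand, here is a minimal sketch of the entry format shown in the example. It is not how Awesome-Paper-Agent works. `format_entry` is a hypothetical helper, and it assumes the listed date is the month encoded in a new-style arXiv identifier (`YYMM.NNNNN`), which may not match the date you want for every paper. The `Apr.` and `Sep.` labels are also assumptions, extrapolated from the month style used in this list.

```python
import re

# Month labels in the style used by this list (e.g. "Dec. 2023", "May. 2020").
# "Apr." and "Sep." are assumed; the other labels appear in the list itself.
MONTHS = ["Jan.", "Feb.", "Mar.", "Apr.", "May.", "Jun.",
          "Jul.", "Aug.", "Sep.", "Oct.", "Nov.", "Dec."]

# New-style arXiv identifiers look like YYMM.NNNNN (4 or 5 digits after the dot).
ARXIV_ABS = re.compile(r"arxiv\.org/abs/(\d{2})(\d{2})\.(\d{4,5})")


def format_entry(title: str, arxiv_url: str) -> str:
    """Hypothetical helper: build one list entry from a title and an arXiv URL."""
    match = ARXIV_ABS.search(arxiv_url)
    if match is None:
        raise ValueError(f"not a new-style arXiv abs URL: {arxiv_url}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in arXiv identifier: {arxiv_url}")
    # The two-digit year is expanded with a fixed "20" prefix.
    return f"+ [{title}]({arxiv_url}) ({MONTHS[month - 1]} 20{year:02d})"


if __name__ == "__main__":
    print(format_entry(
        "AssistGUI: Task-Oriented Desktop Graphical User Interface Automation",
        "https://arxiv.org/abs/2312.13108",
    ))
```

The sketch covers only the title line. Code, paper, and project-page links still need to be added by hand, and any venue (for example "CVPR 2024") has to be appended manually.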
⭐ If you find this repository useful, please give it a star.
Quick Navigation: [Datasets / Benchmarks] [Models / Agents] [Surveys] [Projects]
Datasets / Benchmarks
World of Bits: An Open-Domain Platform for Web-Based Agents (Aug. 2017, ICML 2017)
A Unified Solution for Structured Web Data Extraction (Jul. 2011, SIGIR 2011)
Rico: A Mobile App Dataset for Building Data-Driven Design Applications (Oct. 2017)
Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration (Feb. 2018, ICLR 2018)
Mapping Natural Language Instructions to Mobile UI Action Sequences (May. 2020, ACL 2020)
WebSRC: A Dataset for Web-Based Structural Reading Comprehension (Jan. 2021, EMNLP 2021)
AndroidEnv: A Reinforcement Learning Platform for Android (May. 2021)
A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility (Feb. 2022)
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI (May. 2022)
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents (Jul. 2022)
Language Models can Solve Computer Tasks (Mar. 2023)
Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction (May. 2023)
Mind2Web: Towards a Generalist Agent for the Web (Jun. 2023)
Android in the Wild: A Large-Scale Dataset for Android Device Control (Jul. 2023)
WebArena: A Realistic Web Environment for Building Autonomous Agents (Jul. 2023)
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models (Nov. 2023)
AssistGUI: Task-Oriented Desktop Graphical User Interface Automation (Dec. 2023, CVPR 2024)
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks (Jan. 2024, ACL 2024)
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web (Feb. 2024)
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue (Feb. 2024)
On the Multi-turn Instruction Following for Conversational Web Agents (Feb. 2024)
AgentStudio: A Toolkit for Building General Virtual Agents (Mar. 2024)