by showlab
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
Contributions welcome!
🔥 This project is actively maintained, and we welcome your contributions. If you have any suggestions, such as missing papers or information, please feel free to open an issue or submit a pull request.
🤖 Try our Awesome-Paper-Agent: just provide an arXiv URL, and it will automatically return formatted information like this:
User:
https://arxiv.org/abs/2312.13108
GPT:
+ [AssistGUI: Task-Oriented Desktop Graphical User Interface Automation](https://arxiv.org/abs/2312.13108) (Dec. 2023)
[GitHub](https://github.com/showlab/assistgui)
[arXiv](https://arxiv.org/abs/2312.13108)
[Project page](https://showlab.github.io/assistgui/)
You can then copy this information directly into your pull request.
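If you prefer to build an entry by hand, the following is a minimal Python sketch that produces the same layout. It is a hypothetical helper and not the Awesome-Paper-Agent itself. It makes several assumptions:

- You already know the paper title, since fetching metadata from arXiv is out of scope here.
- The "(Mon. YYYY)" date is taken from the YYMM prefix of a new-style arXiv identifier. That prefix is the month the identifier was assigned.
- The link labels `GitHub`, `arXiv` and `Project page` are placeholders for the badges the README displays.

The names `arxiv_date` and `format_entry` are illustrative only.

```python
import re

# Month abbreviations in the style used by this list ("Dec. 2023"); "May" needs no period.
_MONTHS = ["Jan.", "Feb.", "Mar.", "Apr.", "May", "Jun.",
           "Jul.", "Aug.", "Sep.", "Oct.", "Nov.", "Dec."]

# New-style arXiv identifiers (April 2007 onward) look like YYMM.NNNN or YYMM.NNNNN,
# optionally followed by a version suffix such as "v2". Old-style IDs are not matched.
_ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{2})(\d{2})\.\d{4,5}(?:v\d+)?",
                       re.IGNORECASE)


def arxiv_date(url: str) -> str:
    """Return the month encoded in a new-style arXiv URL, e.g. 'Dec. 2023'."""
    match = _ARXIV_ID.search(url)
    if match is None:
        raise ValueError(f"not a new-style arXiv URL: {url}")
    year = 2000 + int(match.group(1))
    month = int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in arXiv ID: {url}")
    return f"{_MONTHS[month - 1]} {year}"


def format_entry(title, url, code_url=None, project_url=None):
    """Build a list entry; link labels are hypothetical stand-ins for the README's badges."""
    lines = [f"+ [{title}]({url}) ({arxiv_date(url)})"]
    if code_url:
        lines.append(f"[GitHub]({code_url})")
    lines.append(f"[arXiv]({url})")
    if project_url:
        lines.append(f"[Project page]({project_url})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_entry(
        "AssistGUI: Task-Oriented Desktop Graphical User Interface Automation",
        "https://arxiv.org/abs/2312.13108",
        code_url="https://github.com/showlab/assistgui",
        project_url="https://showlab.github.io/assistgui/",
    ))
```

For the AssistGUI example, the first printed line matches the first line of the formatted output shown above. Old-style identifiers such as `cs/0701001` are rejected rather than guessed. The derived date reflects the arXiv submission, so it may differ from a later conference date such as "CVPR 2024".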
⭐ If you find this repository useful, please give it a star.
Quick Navigation: [Datasets / Benchmarks] [Models / Agents] [Surveys] [Projects]
World of Bits: An Open-Domain Platform for Web-Based Agents (Aug. 2017, ICML 2017)
A Unified Solution for Structured Web Data Extraction (Jul. 2011, SIGIR 2011)
Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration (Feb. 2018, ICLR 2018)
Mapping Natural Language Instructions to Mobile UI Action Sequences (May 2020, ACL 2020)
WebSRC: A Dataset for Web-Based Structural Reading Comprehension (Jan. 2021, EMNLP 2021)
Language Models can Solve Computer Tasks (Mar. 2023)
Mind2Web: Towards a Generalist Agent for the Web (Jun. 2023)
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models (Nov. 2023)
AssistGUI: Task-Oriented Desktop Graphical User Interface Automation (Dec. 2023, CVPR 2024)
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks (Jan. 2024, ACL 2024)
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web (Feb. 2024)