by Core-Mate
OpenGUI is an Android GUI agent framework for phone-use AI that can see, plan, and operate real mobile apps through the GUI.
# Add to your Claude Code skills
git clone https://github.com/Core-Mate/OpenGUI
Demo video coming soon.
The first demo will show OpenGUI operating a real Android app on an Android device, including screen understanding, tapping, typing, and returning a structured result.
The fastest way to try OpenGUI is to let Claude Code or Codex bootstrap it for you.
Read ./skills/open-gui-bootstrap/SKILL.md and help me run OpenGUI. Only ask me for phone-side actions.
You will need:
OpenGUI will use the repository scripts to start the backend and install the Android client:
cd server
./start.sh
cd client
./start.sh
After the backend and Android client are running, send a first task:
cd server
pnpm opengui -- devices --json
pnpm opengui -- do "Observe the current Android screen and summarize what you see" --json
Manual setup guide: docs/get-started.md.
[2026.5.16] Added Codex / Claude Code remote control with a local REST API, pnpm opengui -- ... CLI, and the open-gui-remote-control Skill for dispatching Android app tasks from coding agents.
[2026.5.9] Added a Discord IM channel for remote Android task dispatch, including prefix commands, slash commands, allowlists, and guild-scoped command registration.
[2026.5.7] Hardened local startup to avoid common PostgreSQL and Redis port conflicts during Docker-based backend setup.
[2026.5.1] Improved backend onboarding with .env.example, startup checks, and graph-agent VLM environment configuration.
OpenGUI provides an Android GUI agent stack for screen understanding, task planning, action execution, review, and recovery.
You can use the same repository in five practical ways:
Operate mainstream Android apps: let AI handle mobile tasks inside X, Reddit, Hacker News, Telegram, WeChat, Weibo, Xiaohongshu, and other Android apps on a real phone.
Run shipped workflows: the repository already includes a runnable backend, Android client, standby dispatch path, and a set of built-in task capabilities.
Let Claude or Codex bootstrap it for you: point the model at skills/open-gui-bootstrap/SKILL.md, describe the goal in plain language, and let it handle setup, build, install, and local debugging.
Let Codex control Android apps: after OpenGUI is running, point Codex or Claude Code at skills/open-gui-remote-control/SKILL.md to list devices, dispatch tasks, and track executions through the local CLI.
Operate phones as remote workers: dispatch tasks through Feishu, Telegram, Discord, or REST API, keep devices on standby, and get structured results back from the backend.
Plan Supervisor maintains task state and continuation, Executor Graph runs screenshot, vision, action, and call-user loops on top of live device state, and Summarizer closes the run with a structured result.
OpenGUI is built as a mobile operator system with explicit orchestration layers.
The source code currently exposes these pieces:
server/apps/backend/src/modules/graph-agent/graph/mobile-agent.graph.ts for the main graph
server/apps/backend/src/modules/graph-agent/graph/executor.graph.ts for the device-side execution loop
server/apps/backend/src/common/ws/standby.gateway.ts for standby device dispatch
client/core_network/.../StandbySocketManager.kt for persistent device standby connections
client/core_accessibility/.../GestureService.kt for Android-side action execution

| Dimension | Typical phone-agent demo | OpenGUI |
|---|---|---|
| Execution model | Short interactive loop | Main graph plus executor subgraph |
| Task state | Usually local and session-bound | Task state managed in the backend graph |
| Device path | Often laptop-driven control | Android client with standby and execution sockets |
| Model usage | One model does most of the work | Planning and VLM paths can be split across providers |
| Remote operation | Optional add-on | Feishu, Telegram, Discord, REST API, and standby dispatch are built into the backend |
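The layering described above (a supervisor that tracks task state, an executor loop over screenshot, vision, and action, and a structured result at the end) can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the code in mobile-agent.graph.ts or executor.graph.ts: `GuiAction`, `DeviceDriver`, `VisionModel`, `RunResult`, `executeStep`, and `runTask` are hypothetical names invented for this example.

```typescript
// Hypothetical types: they mirror the roles named in the text, not the repository's real API.
type GuiAction =
  | { kind: "tap"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "call_user"; question: string }
  | { kind: "done"; note: string };

interface DeviceDriver {
  screenshot(): string; // stand-in for an image or a screen description
  perform(action: GuiAction): void;
}

// Stand-in for the VLM: looks at the screen and proposes the next action for a step.
type VisionModel = (screen: string, step: string) => GuiAction;

interface RunResult {
  completedSteps: string[];
  needsUser: string | null;
}

// Executor loop: screenshot -> vision -> action, until the step is done,
// the model asks for the user, or the turn budget runs out.
function executeStep(
  device: DeviceDriver,
  vision: VisionModel,
  step: string,
  maxTurns = 10,
): GuiAction {
  for (let turn = 0; turn < maxTurns; turn++) {
    const action = vision(device.screenshot(), step);
    if (action.kind === "done" || action.kind === "call_user") {
      return action;
    }
    device.perform(action);
  }
  return { kind: "call_user", question: `Step "${step}" did not finish in ${maxTurns} turns` };
}

// Supervisor-style outer loop: keeps task state across steps and
// closes the run with a structured result.
function runTask(device: DeviceDriver, vision: VisionModel, plan: string[]): RunResult {
  const completedSteps: string[] = [];
  for (const step of plan) {
    const outcome = executeStep(device, vision, step);
    if (outcome.kind === "call_user") {
      return { completedSteps, needsUser: outcome.question };
    }
    completedSteps.push(step);
  }
  return { completedSteps, needsUser: null };
}
```

Keeping the executor loop separate from the outer loop is what lets task state survive a single step failing or pausing for the user, which is the split the table above calls "main graph plus executor subgraph".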
Start with skills/open-gui-bootstrap/SKILL.md.
The intended flow is simple:
It should only stop for:
After the backend and Android client are running, use skills/open-gui-remote-control/SKILL.md to let Codex or Claude Code control the phone through the local CLI:
cd server
pnpm opengui -- devices --json
pnpm opengui -- do "Observe the current Android screen and summarize what you see" --json
pnpm opengui -- status <executionId> --json
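A coding agent or script can chain these commands: dispatch a task, then poll its status until it finishes. The sketch below shows one way to do that, under loud assumptions: the JSON field names `executionId` and `status` and the terminal state values are guesses made for illustration, since the CLI's output schema is not documented here. The `CliRunner` function is injected so the sketch does not depend on any particular process API.

```typescript
// Hypothetical: runs `pnpm <args>` inside the server directory and resolves with stdout.
type CliRunner = (args: string[]) => Promise<string>;

// Assumed JSON shape; the real field names printed by the CLI may differ.
interface ExecutionStatus {
  executionId: string;
  status: string;
}

// Argument lists for the three documented subcommands.
const devicesArgs = (): string[] => ["opengui", "--", "devices", "--json"];
const doArgs = (task: string): string[] => ["opengui", "--", "do", task, "--json"];
const statusArgs = (executionId: string): string[] =>
  ["opengui", "--", "status", executionId, "--json"];

// Assumed terminal states; adjust to whatever the backend actually reports.
const TERMINAL_STATES = new Set(["completed", "failed", "cancelled"]);

// Dispatch a task, then poll `status` until a terminal state or the poll budget is spent.
async function dispatchAndWait(
  runCli: CliRunner,
  task: string,
  maxPolls = 30,
  delayMs = 2000,
): Promise<ExecutionStatus> {
  const started = JSON.parse(await runCli(doArgs(task))) as ExecutionStatus;
  let current = started;
  for (let i = 0; i < maxPolls && !TERMINAL_STATES.has(current.status); i++) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    current = JSON.parse(await runCli(statusArgs(started.executionId))) as ExecutionStatus;
  }
  return current;
}
```

In a real script, `runCli` could wrap Node's `execFile` from `node:child_process`, calling `pnpm` with the returned argument list and `cwd` set to the server directory.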
Recommended profiles:
Use the latest Claude Opus model family across planning, supervision, review, and vision when you want the strongest overall quality.
This is the easiest way to get the best execution quality, and it is the most expensive path.
Use Qwen 3.6 Plus for text-side roles such as Planner and Supervisor, and use Doubao Pro for the VLM side.
This usually preserves the overall system shape while lowering model cost by roughly 10x to 15x compared with an all-Opus setup, depending on task length, screenshot volume, and token mix.
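To see why the saving depends on token mix and screenshot volume, it helps to write the cost of a run as a sum over the text side and the VLM side. The sketch below does that arithmetic. Every price in it is a made-up placeholder in arbitrary units, not a real provider rate, and `RolePrice`, `TokenUsage`, `roleCost`, `runCost`, and `costRatio` are hypothetical names.

```typescript
// Placeholder price per million tokens, in arbitrary units (NOT real provider pricing).
interface RolePrice {
  perMillionInput: number;
  perMillionOutput: number;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Cost of one side (text or VLM) for a run.
function roleCost(price: RolePrice, usage: TokenUsage): number {
  return (
    (usage.inputTokens * price.perMillionInput +
      usage.outputTokens * price.perMillionOutput) / 1e6
  );
}

// Total run cost = text-side cost (Planner, Supervisor) + VLM-side cost (screenshots).
function runCost(
  textPrice: RolePrice,
  vlmPrice: RolePrice,
  textUsage: TokenUsage,
  vlmUsage: TokenUsage,
): number {
  return roleCost(textPrice, textUsage) + roleCost(vlmPrice, vlmUsage);
}

// How many times cheaper the split profile is than the single-model profile.
function costRatio(
  singlePrice: RolePrice,
  splitTextPrice: RolePrice,
  splitVlmPrice: RolePrice,
  textUsage: TokenUsage,
  vlmUsage: TokenUsage,
): number {
  const single = runCost(singlePrice, singlePrice, textUsage, vlmUsage);
  const split = runCost(splitTextPrice, splitVlmPrice, textUsage, vlmUsage);
  return single / split;
}

// Placeholder inputs for experimentation; replace with your providers' actual rates
// and with token counts measured from your own runs.
const singleModel: RolePrice = { perMillionInput: 10, perMillionOutput: 50 };
const textModel: RolePrice = { perMillionInput: 1, perMillionOutput: 5 };
const vlmModel: RolePrice = { perMillionInput: 0.5, perMillionOutput: 2.5 };
const textUsage: TokenUsage = { inputTokens: 200000, outputTokens: 20000 };
const vlmUsage: TokenUsage = { inputTokens: 800000, outputTokens: 40000 };
console.log(costRatio(singleModel, textModel, vlmModel, textUsage, vlmUsage));
```

Because the ratio is a weighted mix of the per-role price gaps, a screenshot-heavy run moves it toward the VLM price gap and a text-heavy run moves it toward the text-model price gap.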
Recommended prompts:
Read ./skills/open-gui-bootstrap/SKILL.md and help me run OpenGUI. Only ask me for phone-side actions.
Read ./skills/open-gui-bootstrap/SKILL.md and bootstrap OpenGUI with the latest Claude Opus model family for planning, supervision, review, and vision.
Read ./skills/open-gui-bootstrap/SKILL.md and set up OpenGUI with Qwen 3.6 Plus for Planner and Supervisor, and Doubao Pro for VLM execution.
Read ./skills/open-gui-bootstrap/SKILL.md and use my existing model APIs to get OpenGUI w