by eggbrid2
Open Android AI agent runtime for phone control, app automation, VLM screen reading, skill routing, mini apps, and Mihomo VPN workflows.
# Add to your Claude Code skills

```shell
git clone https://github.com/eggbrid2/mobileClaw
```

MobileClaw is an experimental Android app for running LLM agents on a real phone. It sits at the intersection of Android automation, mobile AI agents, accessibility-based phone control, on-device Python tools, multi-agent workflows, and VPN/proxy operations.
The idea is simple: a mobile agent should not just chat about your device. It should be able to observe the screen, choose the right tools, act through Android capabilities, create new workflows, and keep enough memory to improve across tasks.
Most mobile AI apps are chat surfaces. MobileClaw is closer to a small operating layer for agents.
A user request is turned into a scoped task. The task gets a role, a short plan, a filtered tool set, and an execution loop. That shape is the core of the project:
user goal -> task type -> role scheduler -> planner -> allowed skills -> observe -> act -> verify
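That shape can be sketched in a few lines of Python. All names below are illustrative stand-ins, not MobileClaw's actual classes or API; a real classifier and planner would be LLM calls rather than keyword checks.

```python
# Illustrative sketch of the routing shape above; names are hypothetical.
def classify(goal):
    # Task type from naive keyword matching (the real TaskClassifier is smarter).
    goal_l = goal.lower()
    if "search" in goal_l or "look up" in goal_l:
        return "WEB_RESEARCH"
    if "vpn" in goal_l:
        return "VPN_CONTROL"
    return "PHONE_CONTROL"

# Per-task tool visibility: only the allowed skills are exposed to the model.
ALLOWED = {
    "PHONE_CONTROL": ["see_screen", "tap", "scroll", "input_text"],
    "WEB_RESEARCH": ["web_search", "fetch_url"],
    "VPN_CONTROL": ["vpn_control"],
}

def run_task(goal, llm, tools, max_steps=10):
    task = classify(goal)
    skills = {n: tools[n] for n in ALLOWED[task] if n in tools}  # filtered tool set
    history = []
    for _ in range(max_steps):
        step = llm(goal, task, history)          # plan / choose the next action
        if step["tool"] == "done":
            return step.get("result")            # agent judged the goal verified
        obs = skills[step["tool"]](**step.get("args", {}))  # act
        history.append((step, obs))              # observe, then loop
    return None
```

The key design point is that `skills` is built per task, so a web-research turn simply cannot see `tap` or `vpn_control`.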
This matters because phone automation fails quickly when every tool is always available. MobileClaw keeps phone control, web research, file work, app building, image generation, VPN control, skill management, and code execution in different task modes.
The project is still moving fast. Some pieces are stable enough to use daily; some are research-grade and need device-specific fixes. The code is open because this kind of Android agent needs real devices, real ROM quirks, and real users to become good.
- `see_screen`, which captures a screenshot, marks interactive targets, and returns coordinates for direct action.
- A screenshot fallback when XML is empty or misleading, especially for Flutter, React Native, WebView, and game-like UIs.
- Background tools: `bg_launch`, `bg_read_screen`, `bg_screenshot`, `bg_stop`.
- `TaskClassifier` maps requests into task types such as PHONE_CONTROL, WEB_RESEARCH, APP_BUILD, VPN_CONTROL, SKILL_MANAGEMENT, and CODE_EXECUTION.
- `TaskPlanner` makes a planning call before tool execution.
- `TaskToolPolicy` controls which tools are visible for each task.
- `RoleScheduler` chooses from built-in and user-created roles.
- `AgentRuntime` runs a ReAct-style loop with repeated-perception guards, screenshot context trimming, structured observations, and task events.

Built-in roles include:
Roles are not just personas. They can declare preferred task types, keywords, scheduler priority, forced skills, and model overrides. User-created roles participate in the same scheduler.
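A role scheduler of this kind can be sketched as a scoring function. The field names below (`task_types`, `keywords`, `priority`) are assumptions for illustration, not MobileClaw's actual role schema:

```python
# Hypothetical role records with preferred task types, keywords, and a
# scheduler priority, as described above. Field names are illustrative.
def pick_role(roles, task_type, goal):
    """Score each role and return the best match for this task."""
    goal_l = goal.lower()
    def score(role):
        s = role.get("priority", 0)
        if task_type in role.get("task_types", []):
            s += 10                                   # preferred task type dominates
        s += sum(2 for kw in role.get("keywords", []) if kw in goal_l)
        return s
    return max(roles, key=score)
```

Because user-created roles are plain records in the same list, they compete with built-in roles under identical rules.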
MobileClaw has a native skill registry with injection levels:
Built-in skill groups include:
- `see_screen`, `screenshot`, `read_screen`, `tap`, `scroll`, `input_text`, `navigate`, `list_apps`.
- `web_search`, `fetch_url`, hidden WebView browsing, page content extraction, JavaScript execution.
- `vpn_control`.

Dynamic skills can be Python or HTTP definitions saved under app storage. Native and shell skills are intentionally not generated by the agent through the normal meta-skill path.
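As a hedged illustration of what a dynamic Python skill could look like, here is a hypothetical skill record plus a tiny executor; the real persistence format and sandboxing in MobileClaw may differ:

```python
# Hypothetical dynamic-skill record plus a minimal executor.
# The actual on-disk format MobileClaw saves under app storage may differ.
word_count_skill = {
    "name": "word_count",
    "type": "python",
    "description": "Count the words in a text argument.",
    "params": ["text"],
    "code": "result = len(text.split())",
}

def run_python_skill(skill, **kwargs):
    """Execute a skill's code with its arguments; return the `result` name."""
    scope = dict(kwargs)                 # arguments become local names
    exec(skill["code"], {}, scope)       # no sandboxing in this sketch
    return scope.get("result")
```

Keeping skills as data rather than code files is what lets the agent create them from chat without touching the native or shell paths.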
MobileClaw has two app-building paths:
- Claw JavaScript bridge for HTTP, SQLite, Python, shell, memory, config, files, clipboard, device info, app launch, URL opening, sharing, and asking the agent.

Both are created from chat through skills. Mini apps are good for fast web-like tools. AI Pages are better when a workflow should feel native.
MobileClaw includes a VPN stack designed for Android agent use:
- `MATCH`, `GLOBAL`.
- `VpnService` creates the TUN interface.
- `hev-socks5-tunnel` bridges Android TUN traffic to mihomo.

This stack does not use Xray. mihomo handles the proxy protocols; hev is kept because Android still needs a TUN-to-SOCKS bridge.
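The pieces above line up with a Clash/mihomo-style config roughly like the following. This is a hedged sketch with a placeholder node; the fields MobileClaw's `MihomoConfigBuilder` actually emits may differ:

```yaml
# mihomo exposes a local proxy port; hev-socks5-tunnel forwards packets
# from the Android VpnService TUN device into it.
mixed-port: 7890          # local SOCKS/HTTP entry for the TUN bridge
proxies:
  - name: example-node    # placeholder, not a real server
    type: ss
    server: 203.0.113.10
    port: 8388
    cipher: aes-256-gcm
    password: example
proxy-groups:
  - name: GLOBAL
    type: select
    proxies: [example-node]
rules:
  - MATCH,GLOBAL          # everything falls through to the GLOBAL group
```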
app/src/main/java/com/mobileclaw
├─ agent
│ ├─ TaskSession.kt task types, task plans, tool policy
│ ├─ AgentRuntime.kt ReAct loop and task events
│ ├─ AgentContext.kt prompt construction
│ ├─ Role.kt built-in roles and role metadata
│ └─ RoleScheduler.kt automatic role routing
├─ skill
│ ├─ SkillRegistry.kt registration, injection levels, overrides
│ ├─ SkillLoader.kt dynamic Python/HTTP skill persistence
│ ├─ builtin/ native skills
│ └─ executor/ Python, HTTP, shell executors
├─ perception
│ ├─ ClawAccessibilityService.kt
│ ├─ ScreenshotController.kt
│ ├─ ActionController.kt
│ ├─ VirtualDisplayManager.kt
│ └─ ClawIME.kt
├─ ui
│ ├─ ChatScreen.kt main chat
│ ├─ GroupChatScreen.kt multi-agent group chat
│ ├─ DynamicUiRenderer.kt inline generated UI blocks
│ ├─ MiniAppActivity.kt WebView mini apps
│ └─ aipage/ native AI page runtime
├─ vpn
│ ├─ VpnManager.kt
│ ├─ ClashParser.kt
│ ├─ MihomoConfigBuilder.kt
│ ├─ MihomoProcess.kt
│ └─ ClawVpnService.kt
├─ memory
│ ├─ SemanticMemory.kt
│ ├─ EpisodicMem