by agents-io
PokeClaw (PocketClaw) — first on-device AI that controls your Android phone. Gemma 4, no cloud, no API key. Poke is short for Pocket.
# Add to your Claude Code skills
git clone https://github.com/agents-io/PokeClaw
PokeClaw, also known as PocketClaw, is an open-source Android app for AI phone automation.
It can run Gemma 4 on-device for local, private phone control, and it also supports optional cloud models when you want stronger reasoning for harder tasks.
The current public build is a local-first prototype for turning an Android phone into an AI-operated device.
In Local mode, model execution stays inside your device. No account or API key is required for Local mode.
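The Local mode rule above can be read as a small backend-selection step. The following is a minimal Python sketch, not PokeClaw's actual code: `Settings`, `use_cloud`, `api_key`, and `pick_backend` are hypothetical names, and falling back to local when no key is present is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    """Hypothetical user settings; field names are assumptions for this sketch."""
    use_cloud: bool = False         # the user must opt in to cloud explicitly
    api_key: Optional[str] = None   # only needed for the optional cloud path


def pick_backend(settings: Settings) -> str:
    """Return "cloud" only when the user opted in and supplied a key.

    Every other combination resolves to "local", so Local mode never
    depends on an account or an API key.
    """
    if settings.use_cloud and settings.api_key:
        return "cloud"
    return "local"


default_backend = pick_backend(Settings())
opted_in_backend = pick_backend(Settings(use_cloud=True, api_key="example-key"))
```

With default settings the sketch picks the local backend; the cloud backend is selected only after an explicit opt-in with a key.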
Everyone else: Phone → Internet → Cloud API → Internet → Phone
💳 Credit card needed, API key required. Monthly bill attached.
PokeClaw local: Phone → LLM → Phone
Local-first when you want it. Optional cloud when you need it.
AI can control your phone, with local-first execution and optional cloud help.
The current public build is open-source and already handles real chat, task, and automation flows on Android.
Monitor a WhatsApp contact and auto-reply:
https://github.com/user-attachments/assets/5a43d4d5-458a-4eea-a0a5-58d113255741
☝️ Auto-reply demo: PokeClaw monitors messages from Mom, reads what she said, and replies based on context using the on-device LLM. Watch in higher resolution on YouTube
Context-aware WhatsApp auto-reply:
https://github.com/user-attachments/assets/5c2966c5-04e6-4b22-8d66-11915ae62096
☝️ Context demo: Mom asks "what did I tell you to bring?" The AI opens the chat, reads the full conversation on screen, sees the earlier message about wine, and replies correctly. This is the difference between context-aware and context-free replies.
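The difference between the two reply styles comes down to what the model is shown. Here is a minimal Python sketch under stated assumptions: `Message` and `build_reply_prompt` are hypothetical names, the chat lines are made up to mirror the demo, and the real app may assemble its prompt quite differently.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    text: str


def build_reply_prompt(visible_chat, context_aware=True):
    """Build a reply prompt from the messages currently visible on screen.

    context_aware=True includes the earlier messages; False passes only the
    newest message, which is what a context-free auto-reply would see.
    """
    if not visible_chat:
        raise ValueError("visible_chat must contain at least one message")
    latest = visible_chat[-1]
    earlier = visible_chat[:-1] if context_aware else []
    parts = ["You are replying on the user's behalf."]
    if earlier:
        parts.append("Earlier messages visible on screen:")
        parts.extend(f"- {m.sender}: {m.text}" for m in earlier)
    parts.append(f"Newest message from {latest.sender}: {latest.text}")
    parts.append("Write a short reply.")
    return "\n".join(parts)


# Illustrative conversation (invented for this sketch).
chat = [
    Message("Mom", "Please bring a bottle of wine tonight."),
    Message("Me", "Will do!"),
    Message("Mom", "What did I tell you to bring?"),
]
with_context = build_reply_prompt(chat)
without_context = build_reply_prompt(chat, context_aware=False)
```

Only `with_context` contains the earlier wine message, so only that prompt gives a model what it needs to answer Mom's question.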
https://github.com/user-attachments/assets/89999dd8-a1be-49ad-9419-60c2b38f6374
Why is the "hi" demo slow? That clip was recorded on a CPU-only Android device with no usable GPU or NPU path. Running Gemma 4 E2B on pure CPU takes about 45 seconds to warm up. On stronger phones it is much faster:
- Google Tensor G3/G4 (Pixel 8, Pixel 9)
- Snapdragon 8 Gen 2/3 (Galaxy S24, OnePlus 12)
- Dimensity 9200/9300 (recent MediaTek flagships)
- Snapdragon 7+ Gen 2+ (mid-range with GPU)
On these devices, warmup drops to seconds. Same model, better hardware.
I'm building this solo. When Gemma 4 landed with native tool calling on LiteRT-LM, I wanted to know whether a phone could become a real on-device agent instead of just another chatbot. PokeClaw is the result.
The interesting part is not just chatting with a local model. The interesting part is getting a local model to read the screen, choose tools, operate apps, keep task state, and finish real phone workflows. That is exactly what this project is built for.
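The read-choose-act cycle described above can be sketched as a small loop. This is an illustrative Python sketch, not the project's implementation: `TaskState`, `run_task`, the tool names, and the scripted policy standing in for the model are all hypothetical, and the "phone" is a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class TaskState:
    goal: str
    history: list = field(default_factory=list)  # (tool, args, result) per step
    done: bool = False


def run_task(goal, read_screen, choose_action, tools, max_steps=10):
    """Observe the screen, let the policy pick a tool, act, and record state."""
    state = TaskState(goal=goal)
    for _ in range(max_steps):           # step budget so a stuck task ends
        screen = read_screen()
        name, args = choose_action(state, screen)
        if name == "finish":
            state.done = True
            break
        result = tools[name](**args)
        state.history.append((name, args, result))
    return state


# A fake phone and a scripted policy stand in for the device and the model.
phone = {"screen": "home", "sent": []}


def read_screen():
    return phone["screen"]


def open_chat(contact):
    phone["screen"] = f"chat:{contact}"
    return "opened"


def send_message(text):
    phone["sent"].append(text)
    return "sent"


def scripted_policy(state, screen):
    if screen == "home":
        return "open_chat", {"contact": "Mom"}
    if not state.history or state.history[-1][0] != "send_message":
        return "send_message", {"text": "On my way"}
    return "finish", {}


final = run_task(
    "Tell Mom I'm on my way",
    read_screen,
    scripted_policy,
    {"open_chat": open_chat, "send_message": send_message},
)
```

The step budget and the recorded history are the parts worth noticing: they bound a task that never finishes and give the policy its own past actions to reason over.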
PokeClaw already supports fully on-device automation with Gemma 4 and optional cloud models for stronger task execution. The current focus is broader device support, more generic skills, more local model options, and a cleaner public release path.
If you hit something interesting, open an issue. Real device reports are how this gets better fast.
PokeClaw is not just a chat app with a few phone-control tricks glued on top.
At its core, it is becoming a mobile agent harness:
That distinction matters. The long-term goal is not to hardcode one-off app flows forever. The goal is to build the strongest practical harness for AI phone agents on Android, then ship product experiences on top of that foundation.
That is also why the project invests so heavily in:
That direction affects how bugs are prioritized. PokeClaw should fix deterministic harness, runtime, hardware, storage, accessibility, foreground-service, signing, and QA-runner problems before drilling into one flaky model task.
Examples of harness problems worth fixing immediately:
Examples that should usually be treated as model-performance or exploratory-agent limits unless logs prove otherwise:
Prompts, tools, skills, and playbooks should stay generic. Add structure when it improves a reusable class of tasks; avoid one-off prompt hacks or coordinate scripts just to make a single demo pass.
There are already strong mobile-agent frameworks for developers, benchmarks, and cloud/desktop-controlled devices. DroidRun/Mobilerun, minitap/mobile-use, Mobile-Agent, and AppAgent-style systems are useful references, especially for planning, UI observation, benchmark design, and failure recovery. They are not the same deployment category as PokeClaw:
PokeClaw should not try to become another PC/SDK/ADB-driven mobile automation framework. Its product lane is different:
External automation tools are still useful. Tasker, MacroDroid, Locale, and similar apps are good deterministic trigger engines. PokeClaw should let them trigger an AI task, but PokeClaw should remain the phone-resident execution harness.
👉 Try the interactive demo on our landing page — click through every screen without installing anything.
The model picks the right tool, fills in the parameters, and executes. You don't configure anything per-app. It just reads the screen and acts.
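One way to picture "picks the tool, fills in the parameters, and executes" is a registry that validates a model-emitted call before running it. The sketch below is a hedged Python illustration: the JSON call shape, the `tool` decorator, `dispatch`, and the `tap` and `input_text` tools are assumptions for this example, not PokeClaw's real tool set.

```python
import inspect
import json

TOOLS = {}


def tool(fn):
    """Register a function under its own name so calls can refer to it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def tap(text: str) -> str:
    # Hypothetical tool: a real one would act on the screen element.
    return f"tapped '{text}'"


@tool
def input_text(text: str) -> str:
    # Hypothetical tool: a real one would type into the focused field.
    return f"typed '{text}'"


def dispatch(raw_call: str) -> str:
    """Validate a JSON tool call against the registry, then execute it."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return f"error: invalid JSON: {exc}"
    name = call.get("name")
    fn = TOOLS.get(name)
    if fn is None:
        return f"error: unknown tool {name!r}"
    try:
        # bind() raises TypeError on missing or unexpected arguments.
        bound = inspect.signature(fn).bind(**call.get("arguments", {}))
    except TypeError as exc:
        return f"error: bad arguments for {name}: {exc}"
    return fn(*bound.args, **bound.kwargs)


ok = dispatch('{"name": "tap", "arguments": {"text": "Send"}}')
unknown = dispatch('{"name": "swipe_up", "arguments": {}}')
bad_args = dispatch('{"name": "tap", "arguments": {"label": "Send"}}')
```

Returning errors as strings instead of raising lets a loop feed the failure back to the model so it can correct the call on the next step.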
These are tasks we have already run end-to-end during on-device QA.
Every number below comes from repeated trials on a physical Pixel 8 Pro running release builds. No cherry-picked runs, no emulators. The full verified task list and tier breakdown are in thoughts/verified-task-capabilities.md.
| Task | Result | Rounds | What h