Vision-based desktop automation skills for OpenClaw agents on macOS. See, learn, click: any app.
# Add to your Claude Code skills
git clone https://github.com/Fzkuji/GUIClaw

You ARE the agent loop. Every GUI task follows this flow, in order:
INTENT MATCH → OBSERVE → ENSURE APP READY → ACT → VERIFY → SAVE WORKFLOW → REPORT
Each step has detailed instructions in its own skill file:
| Step | Skill | When to read |
|------|-------|-------------|
| Observe | skills/gui-observe/SKILL.md | Before any action: screenshot, OCR, identify state |
| Learn | skills/gui-learn/SKILL.md | App not in memory, or match rate < 80% |
| Act | skills/gui-act/SKILL.md | Clicking, typing, sending messages, waiting for UI |
| Memory | skills/gui-memory/SKILL.md | Visual memory: profiles, components, pages, CRUD, cleanup |
| Workflow | skills/gui-workflow/SKILL.md | Intent matching, saving/replaying workflows, meta-workflows |
| Setup | skills/gui-setup/SKILL.md | First-time setup on a new machine |
Read the relevant sub-skill when you reach that step. You don't need to read all of them upfront.
All GUI operations go through scripts/agent.py. Do not call app_memory.py or gui_agent.py directly.
source ~/gui-agent-env/bin/activate
# Core operations
python3 scripts/agent.py open --app AppName
python3 scripts/agent.py learn --app AppName
python3 scripts/agent.py detect --app AppName
python3 scripts/agent.py click --app AppName --component ButtonName
python3 scripts/agent.py list --app AppName
python3 scripts/agent.py read_screen --app AppName
python3 scripts/agent.py wait_for --app AppName --component X
python3 scripts/agent.py cleanup --app AppName
python3 scripts/agent.py navigate --url "https://example.com"
python3 scripts/agent.py workflows --app AppName
python3 scripts/agent.py all_workflows
# Messaging
python3 scripts/agent.py send_message --app WeChat --contact "John" --message "see you tomorrow"
python3 scripts/agent.py read_messages --app WeChat --contact "John"
Also built in: a wait_for command (template-match polling, no blind clicks), mandatory timing & token delta reporting, and a multi-window fix (selects the largest window).

You: "Send a message to John in WeChat saying see you tomorrow"
OBSERVE → Screenshot, identify current state
├── Current app: Finder (not WeChat)
└── Action: need to switch to WeChat
STATE → Check WeChat memory
├── Learned before? Yes (24 components)
├── OCR visible text: ["Chat", "Cowork", "Code", "Search", ...]
├── State identified: "initial" (89% match)
└── Components for this state: 18 → use these for matching
NAVIGATE → Find contact "John"
├── Template match search_bar → found (conf=0.96) → click
├── Paste "John" into search field (clipboard → Cmd+V)
├── OCR search results → found → click
└── New state: "click:John" (chat opened)
VERIFY → Confirm correct chat opened (see the pre-verify sketch after this walkthrough)
├── OCR chat header → "John" ✓
└── Wrong contact? → ABORT
ACT → Send message
├── Click input field (template match)
├── Paste "see you tomorrow" (clipboard → Cmd+V)
└── Press Enter
CONFIRM → Verify message sent
├── OCR chat area → "see you tomorrow" visible ✓
└── Done
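The VERIFY step above is what prevents messaging the wrong person. A minimal sketch of that guard, built on the read_screen command (agent.py's send_message performs its own verification; this only illustrates the idea, and the "John" example is from the walkthrough above):

```python
import subprocess
import sys

def chat_shows_contact(app: str, contact: str) -> bool:
    """OCR the current window and check that the expected contact name is on screen."""
    result = subprocess.run(
        ["python3", "scripts/agent.py", "read_screen", "--app", app],
        capture_output=True, text=True,
    )
    return contact in result.stdout

if not chat_shows_contact("WeChat", "John"):
    sys.exit("Wrong contact open - aborting before anything is sent.")
```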
agent.py automatically handles the low-level mechanics; the loop itself is your job. The steps, each with its own detail file:
INTENT MATCH
→ Details: skills/gui-workflow/SKILL.md
Match the user request to saved workflows before doing anything. If matched, use the workflow steps as the plan. If not, proceed and save after success.

OBSERVE
→ Details: skills/gui-observe/SKILL.md
Screenshot, identify the current state. Record session_status for token reporting.

ENSURE APP READY
→ Details: skills/gui-learn/SKILL.md
Check if the app is in memory. If not → learn. If match rate < 80% → re-learn. This is YOUR responsibility; do not wait for the user. (A decision sketch follows these steps.)

LEARN (when needed)
→ Details: skills/gui-learn/SKILL.md
Detect all components (YOLO + OCR), identify them, filter, and save to memory. Privacy check: delete personal info.

ACT
→ Details: skills/gui-act/SKILL.md
Execute clicks, typing, sending. Pre-verify before every click. Pre-verify the contact before every message send.

VERIFY
→ Details: skills/gui-act/SKILL.md
Screenshot after every action. Did the expected change happen? If not → re-observe.

SAVE WORKFLOW
→ Details: skills/gui-workflow/SKILL.md
Save successful multi-step sequences for future replay.
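A sketch of the ENSURE APP READY decision. The helper names and the output parsing are assumptions; the real match-rate figure comes from agent.py's detection output, whose exact format may differ:

```python
import re
import subprocess

AGENT = "scripts/agent.py"

def run(*args: str) -> str:
    """Run an agent.py subcommand and return its stdout."""
    result = subprocess.run(["python3", AGENT, *args], capture_output=True, text=True)
    return result.stdout

def ensure_app_ready(app: str, threshold: float = 0.80) -> None:
    """Learn the app if it is unknown, or re-learn if the state match rate is too low."""
    listing = run("list", "--app", app)
    if not listing.strip():          # assumption: empty output means no memory yet
        run("learn", "--app", app)
        return
    detect_out = run("detect", "--app", app)
    # Assumption: detection output contains a figure like "89% match".
    m = re.search(r"(\d+)\s*%\s*match", detect_out)
    if m is None or int(m.group(1)) / 100 < threshold:
        run("learn", "--app", app)   # memory is missing or stale: re-learn
```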
REPORT
Every GUI task ends with a report:
⏱ 45.2s | +10k tokens (85k→95k) | 3 screenshots, 2 clicks, 1 learn
Compare the session_status recorded during the initial OBSERVE step with the value now.
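A sketch of how the timing and token-delta line could be assembled. TaskReport is a hypothetical helper, not part of agent.py; session_status is assumed to expose a token count at the start of the task:

```python
import time

class TaskReport:
    """Collects timing, token delta, and tool-call counts for the final report line."""

    def __init__(self, tokens_at_start: int):
        self.t0 = time.time()
        self.tokens_at_start = tokens_at_start
        self.counts: dict[str, int] = {}

    def bump(self, kind: str) -> None:
        """Count a tool call, e.g. bump('screenshots') after every screenshot."""
        self.counts[kind] = self.counts.get(kind, 0) + 1

    def render(self, tokens_now: int) -> str:
        elapsed = time.time() - self.t0
        delta_k = (tokens_now - self.tokens_at_start) // 1000
        tools = ", ".join(f"{n} {k}" for k, n in self.counts.items())
        return (f"{elapsed:.1f}s | +{delta_k}k tokens "
                f"({self.tokens_at_start // 1000}k→{tokens_now // 1000}k) | {tools}")

# report = TaskReport(tokens_at_start=85_000)
# ... report.bump("screenshots") after each screenshot ...
# print(report.render(tokens_now=95_000))
```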
Do not use shell shortcuts (open <url>, osascript "tell app to set URL", CLI tools) to manipulate app state. The only allowed system calls are activate (bring window to front), screencapture (take screenshot), and cliclick (click/type after visual detection provides coordinates). These rules exist because of real bugs.
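All three allowed calls are plain macOS binaries, so thin wrappers suffice. A sketch; activate is shown via AppleScript's activate verb, which is an assumption about how the window is brought forward, and coordinates must always come from visual detection:

```python
import subprocess

def activate(app: str) -> None:
    """Bring the app's window to the front."""
    subprocess.run(["osascript", "-e", f'tell application "{app}" to activate'], check=True)

def screenshot(path: str = "/tmp/screen.png") -> str:
    """Silent full-screen capture with the built-in screencapture tool."""
    subprocess.run(["screencapture", "-x", path], check=True)
    return path

def click(x: int, y: int) -> None:
    """Click at coordinates produced by visual detection (never guessed)."""
    subprocess.run(["cliclick", f"c:{x},{y}"], check=True)
```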
→ Details: skills/gui-memory/SKILL.md
Visual memory stores app profiles, components, page fingerprints, workflows. See gui-memory for directory structure, profile schema, CRUD operations, and cleanup rules.
gui-agent/
├── SKILL.md          # This file (main orchestrator)
├── skills/           # Sub-skills (read on demand)
│   ├── gui-observe/SKILL.md
│   ├── gui-learn/SKILL.md
│   ├── gui-act/SKILL.md
│   ├── gui-memory/SKILL.md
│   ├── gui-workflow/SKILL.md
│   └── gui-setup/SKILL.md
├── scripts/          # Core scripts
│   └── agent.py, ui_detector.py, app_memory.py, gui_agent.py, template_match.py
├── memory/           # Visual memory (gitignored)
│   ├── apps/<appname>/
│   └── meta_workflows/
├── actions/          # Atomic operations
├── docs/
└── README.md
Example: run a malware scan in CleanMyMac X
OBSERVE → Screenshot → CleanMyMac X not in foreground → activate
├── Get main window bounds (largest window, skip status bar panels)
└── OCR window content → identify current state
STATE → Check memory for CleanMyMac X
├── OCR visible text: ["Smart Scan", "Malware Removal", "Privacy", ...]
├── State identified: "initial" (92% match)
└── Know which components to match: 21 components
NAVIGATE → Click "Malware Removal" in sidebar
├── Find element in window (exact match, filter by window bounds)
├── Click → new state: "click:Malware_Removal"
└── OCR confirms new state (87% match)
ACT → Click "Scan" button
├── Find "Scan" (exact match, bottom position → prevents matching "Deep Scan")
└── Click → scan starts
POLL → Wait for completion (event-driven, no fixed sleep; see the polling sketch after this example)
├── Every 2s: screenshot → OCR check for "No threats"
└── Target found → proceed immediately
CONFIRM → "No threats found" ✓
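A minimal sketch of that event-driven wait, in the spirit of the built-in wait_for command, here polling read_screen rather than template matching:

```python
import subprocess
import time

def read_screen(app: str) -> str:
    """OCR the app's window through the agent.py gateway and return the raw text."""
    result = subprocess.run(
        ["python3", "scripts/agent.py", "read_screen", "--app", app],
        capture_output=True, text=True,
    )
    return result.stdout

def wait_for_text(app: str, target: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll every `interval` seconds until `target` appears on screen, or time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if target in read_screen(app):   # e.g. "No threats" → proceed immediately
            return True
        time.sleep(interval)
    return False

# wait_for_text("CleanMyMac X", "No threats")
```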
Example: check GPU utilization in a JupyterLab terminal (Chrome)
OBSERVE → Screenshot → Chrome is open
└── Identify target: JupyterLab tab
NAVIGATE → Find JupyterLab tab in browser
├── OCR tab bar or use bookmarks
└── Click to switch
EXPLORE → Multiple terminal tabs visible
├── Screenshot terminal area
├── LLM vision analysis → identify which tab has nvitop
└── Click the correct tab
READ → Screenshot terminal content
├── LLM reads GPU utilization table
└── Report: "8 GPUs, 7 at 100% → experiment running" ✓
Example: kill the GlobalProtect process via Activity Monitor
OBSERVE → Screenshot current state
└── Neither GlobalProtect nor Activity Monitor in foreground
ACT → Launch both apps
├── open -a "GlobalProtect"
└── open -a "Activity Monitor"
EXPLORE → Screenshot Activity Monitor window
├── LLM vision → "Network tab active, search field empty at top-right"
└── Decide: click search field first
ACT → Search for process
├── Click search field (identified by explore)
├── Paste "GlobalProtect" (clipboard → Cmd+V, never cliclick type; see the clipboard sketch after this example)
└── Wait for filter results
VERIFY → Process found in list → select it
ACT → Kill process
├── Click stop button (X) in toolbar
└── Confirmation dialog appears
VERIFY → Click "Force Quit"
CONFIRM → Screenshot → process list empty → terminated ✓
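The clipboard paste used above can be done with pbcopy plus a Cmd+V keystroke. A sketch; the cliclick key-combo syntax shown is an assumption about one way to send Cmd+V, not necessarily how agent.py does it:

```python
import subprocess

def paste_text(text: str) -> None:
    """Put text on the clipboard with pbcopy, then press Cmd+V to paste it."""
    subprocess.run(["pbcopy"], input=text.encode("utf-8"), check=True)
    # kd:cmd / ku:cmd hold and release Command around typing "v" (assumed syntax).
    subprocess.run(["cliclick", "kd:cmd", "t:v", "ku:cmd"], check=True)

# Click the target field first (coordinates from template match), then:
# paste_text("GlobalProtect")
```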
1. Clone & install
git clone https://github.com/Fzkuji/GUIClaw.git
cd GUIClaw
bash scripts/setup.sh
2. Grant accessibility permissions
System Settings → Privacy & Security → Accessibility → Add Terminal / OpenClaw
3. Enable in OpenClaw (recommended)
Add to ~/.openclaw/openclaw.json:
{ "skills": { "entries": { "gui-agent": { "enabled": true } } } }
Then just chat with your agent; it reads SKILL.md and handles everything automatically.
First time → YOLO detects everything (~4 seconds):
YOLO: 43 icons, OCR: 34 text elements → 24 fixed UI components saved
Every time after → instant template match (~0.3 seconds):
✓ search_bar_icon (202,70) conf=1.0
✓ emoji_button (354,530) conf=1.0
✓ sidebar_contacts (85,214) conf=1.0
| Detector | Speed | Finds | Why |
|----------|-------|-------|-----|
| GPA-GUI-Detector | 0.3s | Icons, buttons | Finds gray-on-gray icons others miss |
| Apple Vision OCR | 1.6s | Text (CN + EN) | Best Chinese OCR, pixel-accurate |
| Template Match | 0.3s | Known components | 100% accuracy after first learn |
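Template matching is the cheap path: once a component crop is on disk, OpenCV can relocate it in a fresh screenshot. A minimal sketch of the idea (not the project's template_match.py):

```python
import cv2

def find_component(screenshot_path: str, template_path: str, threshold: float = 0.85):
    """Return (x, y, confidence) at the centre of the best match, or None below threshold."""
    screen = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    templ = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    scores = cv2.matchTemplate(screen, templ, cv2.TM_CCOEFF_NORMED)
    _, confidence, _, top_left = cv2.minMaxLoc(scores)
    if confidence < threshold:
        return None
    h, w = templ.shape
    return top_left[0] + w // 2, top_left[1] + h // 2, confidence

# find_component("/tmp/screen.png", "memory/apps/wechat/components/search_bar.png")
```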
Each app gets its own visual memory with a click-graph state model.
memory/apps/
├── wechat/
│   ├── profile.json             # Components + click-graph states
│   ├── components/              # Cropped UI element images
│   │   ├── search_bar.png
│   │   ├── emoji_button.png
│   │   └── ...
│   ├── workflows/               # Saved task sequences
│   │   └── send_message.json
│   └── pages/
│       └── main_annotated.jpg
├── cleanmymac_x/
│   ├── profile.json
│   ├── components/
│   ├── workflows/
│   │   └── smart_scan_cleanup.json
│   └── pages/
├── claude/
│   ├── profile.json
│   ├── components/
│   ├── workflows/
│   │   └── check_usage.json
│   └── pages/
└── google_chrome/
    ├── profile.json
    ├── components/
    └── sites/                   # Per-website memory
        ├── 12306_cn/
        └── github_com/
The UI is modeled as a graph of states. Each state is defined by which components are visible on screen.
profile.json structure:
{
"app": "Claude",
"window_size": [1512, 828],
"components": {
"Search": { "type": "icon", "rel_x": 115, "rel_y": 143, "icon_file": "components/Search.png", ... },
"Settings": { ... }
},
"states": {
"initial": {
"visible": ["Chat_tab", "Cowork_tab", "Code_tab", "Search", "Ideas", ...],
"description": "Main app view when first opened"
},
"click:Settings": {
"trigger": "Settings",
"trigger_pos": [63, 523],
"visible": ["Chat_tab", "Account", "Billing", "Usage", "General", ...],
"disappeared": ["Ideas", "Customize", ...],
"description": "Settings page"
},
"click:Usage": {
"trigger": "Usage",
"visible": ["Chat_tab", "Account", "Billing", "Usage", "Developer", ...],
"description": "Settings > Usage tab"
}
}
}
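A sketch of state identification against a profile like the one above: compare what is currently visible with each state's visible list and keep the best match ratio (the actual scoring in the memory scripts may differ):

```python
import json

def identify_state(profile_path: str, visible_now: set[str]) -> tuple[str, float]:
    """Return the state whose `visible` list best matches what is currently on screen."""
    with open(profile_path) as f:
        profile = json.load(f)

    best_state, best_ratio = "unknown", 0.0
    for name, state in profile["states"].items():
        expected = set(state["visible"])
        ratio = len(expected & visible_now) / len(expected) if expected else 0.0
        if ratio > best_ratio:
            best_state, best_ratio = name, ratio
    return best_state, best_ratio

# identify_state("memory/apps/claude/profile.json",
#                {"Chat_tab", "Account", "Billing", "Usage", "General"})
# → something like ("click:Settings", 0.89)  (illustrative numbers)
```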
How it works:
1. The first visit to an app runs full detection (learn).
2. Each click that changes the UI creates a click:ComponentName state.
3. To identify the current state, the text visible on screen is compared against each state's visible list → the highest match ratio wins.
4. A component can appear in several states (Chat_tab is visible in initial, click:Settings, click:Usage).
Why this works: