by jfarcand
MCP server for controlling a real iPhone via macOS iPhone Mirroring, and any macOS app. Screenshot, tap, swipe, and type from any MCP client.
# Add to your Claude Code skills
git clone https://github.com/jfarcand/mirroir-mcp
Give your AI eyes, hands, and a real iPhone. An MCP server that lets any AI agent see the screen, tap what it needs, and figure the rest out — through macOS iPhone Mirroring. Experimental support for macOS windows. 32 tools, any MCP client.
/bin/bash -c "$(curl -fsSL https://mirroir.dev/get-mirroir.sh)"
or via npx:
npx -y mirroir-mcp install
or via Homebrew:
brew tap jfarcand/tap && brew install mirroir-mcp
The first time you take a screenshot, macOS will prompt for Screen Recording and Accessibility permissions. Grant both.
claude mcp add --transport stdio mirroir -- npx -y mirroir-mcp
In VS Code, install from the MCP server gallery: search @mcp mirroir in the Extensions view, or add to .vscode/mcp.json:
{
"servers": {
"mirroir": {
"type": "stdio",
"command": "npx",
"args": ["-y", "mirroir-mcp"]
}
}
}
Add to .cursor/mcp.json in your project root:
{
"mcpServers": {
"mirroir": {
"command": "npx",
"args": ["-y", "mirroir-mcp"]
}
}
}
codex mcp add mirroir -- npx -y mirroir-mcp
Or add to ~/.codex/config.toml:
[mcp_servers.mirroir]
command = "npx"
args = ["-y", "mirroir-mcp"]
git clone https://github.com/jfarcand/mirroir-mcp.git
cd mirroir-mcp
./mirroir.sh
Use the full path to the binary in your .mcp.json: <repo>/.build/release/mirroir-mcp.
Every interaction follows the same loop: observe, reason, act. describe_screen gives the AI every text element with tap coordinates (eyes). The LLM decides what to do next (brain). tap, type_text, swipe execute the action (hands) — then it loops back to observe. No scripts, no coordinates, just intent.
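The loop described above can be sketched in a few lines of Python. This is an illustration only, not mirroir's implementation: `run_loop`, `Element`, `Action`, `FakePhone`, and `open_settings` are hypothetical names, the fake phone stands in for describe_screen and tap, and the toy policy stands in for the LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Element:
    text: str
    x: int
    y: int


@dataclass
class Action:
    kind: str    # "tap", "type_text", or "swipe"
    value: str   # label to tap, or text to type


def run_loop(
    observe: Callable[[], List[Element]],
    decide: Callable[[List[Element]], Optional[Action]],
    act: Callable[[Action, List[Element]], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat observe -> decide -> act until decide returns None."""
    taken: List[Action] = []
    for _ in range(max_steps):
        elements = observe()        # eyes: stands in for describe_screen
        action = decide(elements)   # brain: stands in for the LLM
        if action is None:          # nothing left to do
            break
        act(action, elements)       # hands: stands in for tap / type_text / swipe
        taken.append(action)
    return taken


class FakePhone:
    """Two-screen stand-in for a mirrored iPhone (hypothetical)."""

    def __init__(self) -> None:
        self.screen = "home"
        self.taps: List[Tuple[int, int]] = []

    def observe(self) -> List[Element]:
        if self.screen == "home":
            return [Element("Settings", 100, 200)]
        return [Element("General", 80, 300)]

    def act(self, action: Action, elements: List[Element]) -> None:
        # Resolve the label to coordinates from the latest observation.
        target = next(e for e in elements if e.text == action.value)
        self.taps.append((target.x, target.y))
        self.screen = "settings"


def open_settings(elements: List[Element]) -> Optional[Action]:
    """Toy policy: tap "Settings" if it is visible, otherwise stop."""
    if any(e.text == "Settings" for e in elements):
        return Action("tap", "Settings")
    return None
```

Calling `run_loop(phone.observe, open_settings, phone.act)` on a fresh `FakePhone` performs one tap and then stops, because the second observation no longer shows "Settings". The policy names a label and the coordinates come from the most recent observation, which is the sense in which the agent works from intent rather than hardcoded positions.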
mirroir can explore any iOS app blindly, but it works better when you tell it what to expect. Write an APP.md file and mirroir reads it before exploration starts:
---
app: Santé
archetype: dashboard
obstacle_mode: auto
---
## Structure
Dashboard with 4 tabs: Résumé, Partage, Parcourir, Profil.
## Résumé Tab
- Summary cards for health metrics that drill down to charts
- Cards often show "Aucune donnée" on test devices
## Obstacles
- Health Access permission → tap "Autoriser"
- Notification permission → tap "Ne pas autoriser"
## Skip
- Supprimer les données de Santé
- Réinitialiser
What the code actually uses today:
- archetype overrides recipe auto-detection
- obstacles are auto-dismissed when obstacle_mode: auto
- skip merges with permissions.json.skipElements
- tabs (inline or as a section) are injected as high-priority targets
- Structure + tab body + Tips become AI context in generated skills
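To make the file layout concrete, here is a minimal Python sketch that splits an APP.md into its front-matter fields and its `## ` sections. It is not mirroir's loader: `parse_app_md` is a hypothetical helper, it assumes flat `key: value` front matter between two `---` lines, and it ignores the resolution rules covered by the APP.md specification.

```python
from typing import Dict, List, Optional, Tuple


def parse_app_md(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Split APP.md text into front-matter fields and '## ' sections."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    meta: Dict[str, str] = {}
    sections: Dict[str, List[str]] = {}
    i = 0
    if lines and lines[0] == "---":
        i = 1
        while i < len(lines) and lines[i] != "---":
            if ":" in lines[i]:
                key, _, value = lines[i].partition(":")
                meta[key.strip()] = value.strip()
            i += 1
        i += 1  # skip the closing '---'
    current: Optional[str] = None
    for line in lines[i:]:
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return meta, sections


SAMPLE = """---
app: Santé
archetype: dashboard
obstacle_mode: auto
---
## Structure
Dashboard with 4 tabs: Résumé, Partage, Parcourir, Profil.
## Skip
- Supprimer les données de Santé
- Réinitialiser
"""

meta, sections = parse_app_md(SAMPLE)
# Strip the leading "- " from list items to get bare element labels.
skip = [item[2:] if item.startswith("- ") else item for item in sections["Skip"]]
```

Here `meta["archetype"]` holds the value that overrides auto-detection, and `skip` holds the labels that would be merged with the skip list from permissions.json.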
See the APP.md specification for the complete field list, loader resolution rules, and the permission-system bridge. Three levels of patterns work together — elements (what rows look like), screens (what the page layout means), and apps (what the developer knows). Patterns & Skills covers the full system.
Paste any of these into Claude Code, Claude Desktop, ChatGPT, Cursor, or any MCP client:
Open Messages, find my conversation with Alice, and send "running 10 min late".
Open Calendar, create a new event called "Dentist" next Tuesday at 2pm.
Open my Expo Go app, tap "LoginDemo", test the login screen with
test@example.com / password123. Screenshot after each step.
Start recording, open Settings, scroll to General > About, stop recording.
describe_screen is the AI's eyes. Three backends work together to give the agent a complete picture of what's on screen — text, icons, and semantic UI structure.
The default backend uses Apple's Vision framework to detect every text element on screen and return exact tap coordinates. This is fast, local, and requires no API keys or external services.
Text-only OCR misses non-text UI elements — buttons, toggles, tab bar icons, activity rings. Drop a YOLO CoreML model (.mlmodelc) in ~/.mirroir-mcp/models/ and the server auto-detects it at startup, merging icon detection results with OCR text. The AI gets tap targets for elements that text-only OCR cannot see.
| Mode | ocrBackend setting | Behavior |
|------|---------------------|----------|
| Auto-detect (default) | "auto" | Uses Vision + YOLO if a model is installed, Vision only otherwise |
| Vision only | "vision" | Apple Vision OCR text only |
| YOLO only | "yolo" | CoreML element detection only |
| Both | "both" | Always merge both backends (falls back to Vision if no model) |
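The table can be read as a small decision function. The Python sketch below encodes it under stated assumptions: `resolve_backends` is a hypothetical name, not a function in mirroir-mcp, and the table does not say what "yolo" does when no model is installed, so that case is passed through unchanged.

```python
from typing import List


def resolve_backends(ocr_backend: str, model_installed: bool) -> List[str]:
    """Return the detection backends that would run for a given setting."""
    if ocr_backend == "vision":
        return ["vision"]
    if ocr_backend == "yolo":
        # Behavior without an installed model is not covered by the table.
        return ["yolo"]
    if ocr_backend in ("auto", "both"):
        # Both rows end up on Vision alone when no model is available.
        return ["vision", "yolo"] if model_installed else ["vision"]
    raise ValueError(f"unknown ocrBackend: {ocr_backend!r}")
```

As written in the table, "auto" and "both" select the same backends in either case; the difference is one of intent, since "both" asks for a merge and treats Vision-only as a fallback.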
Instead of local OCR, describe_screen can send the screenshot to an AI vision model that identifies UI elements semantically — cards, tabs, buttons, icons, navigation structure — not just raw text. This produces richer context for the agent, especially on screens with complex layouts.
The embacle runtime is embedded directly into the mirroir-mcp binary via Rust FFI. describe_screen calls the embedded runtime in-process — no separate server, no network round-trip, no additional setup. The FFI layer (EmbacleFFI.swift → libembacle.a) handles initialization, chat completion requests, and memory management across the Swift/Rust boundary.
embacle routes vision requests through already-authenticated CLI tools (GitHub Copilot, Claude Code) so there is no separate API key to manage. If you have a Copilot or Claude Code subscription, you already have access.
brew tap dravr-ai/tap
brew install embacle # CLI tools (embacle-server, embacle-mcp)
brew install embacle-ffi # Rust FFI static library (libembacle.a)
Then rebuild mirroir-mcp from source (or reinstall via Homebrew) so the binary links against libembacle.a:
# From source
swift build -c release
# Or via Homebrew (rebuilds automatically)
brew reinstall mirroir-mcp
When the embacle FFI is linked into the binary, screenDescriberMode defaults to "auto", which resolves to vision mode. No settings change is required: install embacle-ffi, rebuild, and describe_screen starts using AI vision.
To force local OCR even when embacle is available, explicitly set screenDescriberMode to "ocr":
// .mirroir-mcp/settings.json
{
"screenDescriberMode": "ocr"
}
See Configuration for all available settings.
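The mode selection described above can be sketched as follows. This is a hedged illustration, not the server's code: `resolve_describer_mode` is a hypothetical name, the settings are passed in as a JSON string rather than read from disk, and falling back to "ocr" when embacle is not linked is an assumption inferred from the text.

```python
import json


def resolve_describer_mode(settings_json: str, embacle_linked: bool) -> str:
    """Pick the describe_screen mode from settings and build capabilities."""
    settings = json.loads(settings_json) if settings_json.strip() else {}
    mode = settings.get("screenDescriberMode", "auto")
    if mode == "auto":
        # Assumption: without the embedded runtime, auto means local OCR.
        return "vision" if embacle_linked else "ocr"
    return mode  # an explicit setting such as "ocr" is used as given
```

With no settings file and embacle linked, the result is "vision"; with `{"screenDescriberMode": "ocr"}` the result is "ocr" regardless of how the binary was built.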
When you find yourself repeating the same agent workflow, capture it as a skill. Skills are SKILL.md files — numbered steps the AI follows, adapting to layout changes and unexpected dialogs. Steps like Tap "Email" use OCR — no hardcoded coordinates.
Place files in ~/.mirroir-mcp/skills/ (global) or <cwd>/.mirroir-mcp/skills/ (project-local).
Describe your app's structure to guide exploration — see Describe Your App above and the full APP.md specification. Place APP.md files in ~/.mirroir-mcp/skills/ or the mirroir-skills repo at patterns/apps/.
---
version: 1
name: Commute ETA Notification
app: Waze, Messages
tags: ["workflow", "cross-app"]
---
## Steps
1. Launch **Waze**
2. Wait for "Où va-t-on ?" to appear
3. Tap "Où va-t-on ?"
4. Wait for "${DESTINATION:-Travail}" to appear
5. Tap "${DESTINATION:-Travail}"
6. Wait for "Y aller" to appear
7. Tap "Y aller"
8. Wait for "min" to appear
9. Remember: Read the commute time and ETA.
10. Press Home
11. Launch **Messages**
12. Tap "New Message"
13. Type "${RECIPIENT}" and select the contact
14. Type "On my way! ETA {eta}"
15. Press **Return**
16. Screenshot: "message_sent"
${VAR} placeholders resolve from environment variables. Use ${VAR:-default} to supply a fallback value.
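A minimal Python sketch of this substitution is shown below. `resolve_placeholders` is a hypothetical helper, not mirroir's resolver, and two behaviors are assumptions: the fallback applies only when the variable is absent from the environment, and a placeholder with neither a value nor a default is left in place.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}; single-brace tokens like {eta} are ignored.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def resolve_placeholders(step: str, env: Mapping[str, str]) -> str:
    """Replace ${VAR} and ${VAR:-default} in a skill step using env."""

    def substitute(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        return match.group(0)  # assumption: keep unresolved placeholders visible

    return _PLACEHOLDER.sub(substitute, step)
```

In practice `env` would be `os.environ`. Applied to step 5 of the skill above with an empty environment, the step becomes Tap "Travail"; with DESTINATION set to another label, that label is tapped instead.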
Install ready-to-use skills from jfarcand/mirroir-skills:
git clone https://github.com/jfarcand/mirroir-skills ~/.mirroir-mcp/skills
The generate_skill tool lets an AI agent explore an app and produce SKILL.md files. It uses [breadth-first search](https://en.wikipedia.org/wiki/