by sonichi
Summon your AI superpower — voice, vision, and autonomous action
It shares your screen, joins your meetings, makes phone calls, and builds itself.
It belongs entirely to you.
Named after Stands from JoJo's Bizarre Adventure — a personal spirit that fights on your behalf. Like a Stand, Sutando starts unnamed. As it learns your style and earns real capabilities, it names itself and generates its own avatar — your Stand, unique to you.
In this demo, the user controls their Mac entirely from a phone call: sharing the screen to Zoom, recording a narrated video, adding subtitles, and playing it back. No keyboard, no mouse.
Talk while you work. You're looking at a doc. You say "make this paragraph shorter." Sutando sees your screen, rewrites the paragraph, and replaces the original text directly.
Join meetings for you. "Join my 2pm call." It reads your calendar and joins — Zoom via the desktop app, Google Meet via the browser — with computer audio. It can also dial in by phone when you ask. It takes screenshots to identify participants, does live research when someone asks a question, and writes you a summary when the call ends. Meeting access is gated — it messages you on Telegram asking for approval before enabling task delegation.
Make calls for you. "Call her and leave a message." Sutando looks up the contact, dials the number, has the conversation, and reports back — while you keep working. It can even make concurrent calls while in a meeting.
Work from your phone. Call Sutando and say "summon." It opens Zoom with screen sharing — join from your phone to see its screen in real time. "What's on my screen?" — it takes a screenshot and tells you. "Fix the typo in that file" — done. You scroll, switch apps, navigate — all by voice while walking around.
Get better on its own. When you're not giving it tasks, Sutando runs an autonomous build loop — it monitors its own health, detects patterns in how you work, discovers new skills, and builds missing capabilities. Most of Sutando's code was written this way. It learns from your corrections and adapts over time.
Remember everything — and act on it. You have an idea while walking. Say it out loud. Sutando captures it, tags it, and saves it as a searchable note. If there's something actionable, it starts working on it right away or queues it for the next free cycle.
Reach you anywhere. Voice, Zoom, Google Meet, Telegram, Discord, web, phone, or email — same agent, same memory, any channel.
Scale across machines. Plug in a second Mac and Sutando sets it up — the original agent opens a Discord channel, sends setup commands, and migrates services. The new machine handles phone calls 24/7 while your laptop stays portable. No migration scripts needed — the two agents coordinate the handoff themselves.
This is an early-stage project. Honest status:
| | Count | Details |
|---|---|---|
| Verified working | 30 | Voice, screen capture, notes, calendar, reminders, contacts, browser, phone calls, meeting dial-in, task delegation, pattern detection, health check, dashboard, Telegram, Discord, multi-machine migration, onboarding tutorial, and more |
| Needs external setup | 3 | Twilio (phone), Telegram bot, Discord bot |
We're looking for contributors to help test and harden these capabilities. If you try something and it breaks, open an issue.
You ──voice──► Voice agent ──────────┐
 │                                   │  file bridge
 ├──telegram──► Telegram bridge ─────┤  tasks/ ──► Core agent
 │                                   │  ◄── results/   │
 ├──discord───► Discord bridge ──────┤                 │ uses anything:
 │                                   │                 ▼ email, calendar, browser,
 └──browser───► Web client ──────────┘  speaks /         files, phone, reminders...
                                        replies
Two processes work together: the voice agent and the core agent.
They communicate through files: the voice agent writes tasks, the core agent executes them and writes the results back, and the voice agent speaks the answer.
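The file handoff described above can be sketched in a few shell functions. This is a minimal illustration, not Sutando's actual implementation: the `tasks/` and `results/` directory names come from the diagram, while the function names (`submit_task`, `process_tasks`, `read_result`), the one-file-per-task layout, and the `.txt` naming are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical file-bridge sketch. Names and file layout are assumptions,
# not taken from the Sutando source.

# submit_task BRIDGE_DIR TEXT: write one task file and print its id.
submit_task() {
  local bridge="$1" text="$2" id tmp
  mkdir -p "$bridge/tasks" "$bridge/results"
  id="task-$(date +%s)-$$-$RANDOM"
  tmp="$bridge/tasks/.$id.tmp"
  printf '%s\n' "$text" > "$tmp"
  # Rename into place so a reader never sees a half-written task.
  mv "$tmp" "$bridge/tasks/$id.txt"
  printf '%s\n' "$id"
}

# process_tasks BRIDGE_DIR HANDLER: run HANDLER on every pending task,
# store its output under results/, then remove the task file.
process_tasks() {
  local bridge="$1" handler="$2" task id tmp
  mkdir -p "$bridge/tasks" "$bridge/results"
  for task in "$bridge"/tasks/*.txt; do
    [ -e "$task" ] || continue   # the glob matched nothing
    id="$(basename "$task" .txt)"
    tmp="$bridge/results/.$id.tmp"
    "$handler" "$(cat "$task")" > "$tmp"
    mv "$tmp" "$bridge/results/$id.txt"
    rm -f "$task"
  done
}

# read_result BRIDGE_DIR ID: print the result if it exists, else return 1.
read_result() {
  local bridge="$1" id="$2"
  [ -f "$bridge/results/$id.txt" ] && cat "$bridge/results/$id.txt"
}
```

In this sketch the writer side would call `id=$(submit_task ./bridge "what's on my screen?")` and later poll `read_result ./bridge "$id"`, while the executor side runs `process_tasks ./bridge some_handler`. Writing to a temporary name and renaming within the same directory is what lets the two sides share files without locks.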
Prerequisites:
- Claude Code (run `claude` once to complete login)
- Node.js 22+ (`brew install node`)
- fswatch (`brew install fswatch`)

# Clone
git clone https://github.com/sonichi/sutando.git
cd sutando
# Configure (minimum: GEMINI_API_KEY is required)
cp .env.example .env
# Edit .env — add your GEMINI_API_KEY (from Google AI Studio)
# Start everything
bash src/startup.sh
This starts all services (voice agent, web client, dashboard, API, Sutando menu bar app) and opens http://localhost:8080 in your browser. The autonomous loop starts automatically — click Connect and start talking. Look for S in your menu bar — it provides shortcuts (⌃C context drop, ⌃V voice toggle, ⌃M mute) plus Open Core (Claude Code terminal) and Open Dashboard (status page).
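Because `GEMINI_API_KEY` is the one required setting, a quick pre-flight check on `.env` can catch a missing or empty key before any service starts. The sketch below is hypothetical: the helper names `env_value` and `require_env_key` are not part of Sutando, and the parsing handles only simple `KEY=value` lines with optional surrounding quotes.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a .env file. Helper names are assumptions.

# env_value FILE KEY: print the value assigned to KEY, or return 1 if absent.
env_value() {
  local file="$1" key="$2" line value
  [ -f "$file" ] || return 1
  # Use the last uncommented KEY=... line, allowing an "export " prefix.
  line="$(grep -E "^(export[[:space:]]+)?${key}=" "$file" | tail -n 1)"
  [ -n "$line" ] || return 1
  value="${line#*=}"
  # Strip one pair of surrounding quotes, if present.
  case "$value" in
    \"*\") value=${value#\"}; value=${value%\"} ;;
    \'*\') value=${value#\'}; value=${value%\'} ;;
  esac
  printf '%s\n' "$value"
}

# require_env_key FILE KEY: succeed only if KEY is set to a non-empty value.
require_env_key() {
  local value
  value="$(env_value "$1" "$2")" || return 1
  [ -n "$value" ]
}
```

One way to use it would be `require_env_key .env GEMINI_API_KEY && bash src/startup.sh`, so startup is skipped when the key is missing.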
Note:
`startup.sh` runs Claude Code with `--dangerously-skip-permissions`, giving Sutando full system access (file operations, terminal commands, browser control). This is required for autonomous operation but means you should review what it does. All actions are logged. Keep the terminal window accessible: you may need to respond there when Claude Code runs out of quota or prompts for input (e.g., CLI commands, permission confirmations).
macOS permissions — grant these on first run (System Settings → Privacy & Security):
claude and node

Try saying:
Verify your setup (optional):
bash src/verify-setup.sh
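What `verify-setup.sh` actually checks is not shown here. As an illustration of one kind of check a setup verifier might run, this hypothetical helper reports which command-line tools are missing from `PATH`. The function name `missing_tools` is an assumption; the tool names in the usage line are the ones this README mentions.

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check. Not taken from verify-setup.sh.

# missing_tools NAME...: print each command not found on PATH.
# Succeeds only when every listed command is available.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, `missing_tools claude node fswatch ffmpeg || echo "install the tools listed above"` prints any absent tool names before the reminder.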
Troubleshooting:
- Check `logs/voice-agent.log` for errors. Common causes:
  - `GEMINI_API_KEY` not set or invalid in `.env`; get one at ai.google.dev
  - Run `lsof -i :9900` to check what is using port 9900
- `npm install` failed? Make sure Node.js 22+ is installed: `node --version`
- A shell-exported `GEMINI_API_KEY` overriding `.env`? Run `unset GEMINI_API_KEY`, then restart
- `screencapture -v` needs a TTY. Sutando uses ffmpeg instead; make sure it's installed: `brew install ffmpeg`
- Run `bash src/restart.sh`; this kills all services and restarts fresh

Shutting down:
bash src/restart.sh # stops all services (voice agent, web client, API, bridges, etc.)
pkill -f "src/Sutando" # stop the menu bar app
Exiting startup.sh alone does NOT stop background services. Always use restart.sh (or kill-all.sh if available) to cleanly shut everything down.
Uninstalling:
- `bash src/restart.sh && pkill -f "src/Sutando"`
- `rm -rf ~/Desktop/sutando` (or wherever you cloned it)
- `rm -rf ~/.claude/projects/*sutando*`
- `node_modules/`: deleted with the repo
- `brew uninstall imsg wacli` if installed

These unlock more capabilities. Add to `.env` when ready:
| Integration | What it unlocks | Setup |
|-------------|----------------|-------|
| Gmail | Read/send/search email from voice | gws auth setup --login (OAuth, no app password) |
| Twilio + ngrok | Phone calls, SMS, meeting dial-in, task delegation via phone | twilio.com (~$1/mo) + brew install ngrok |
| Telegram | Message Sutando from your phone | Create bot via @BotFather, then /telegram:configure <token> |
| Discord | Message Sutando from Discord (DM + channel @mentions) | Developer portal, then /discord:configure <token> |
| Claude for Chrome | Browser automation — navigate, read pages, fill forms, interact with web apps | Install extension, log in with the same account as Claude Code |
| Sutando app (menu bar) | Global shortcuts: ⌃C context drop, ⌃V voice toggle, ⌃M mute | Auto-launches via startup.sh |
| Capability | Script | Status |
|-----------|--------|--------|
| Voice conversation | voice-agent.ts | Verified |
| Task delegation (voice → Claud