sutando

Name: sutando
Author: sonichi

by sonichi

Pending

My AI Stand. Realtime by day, rewriting itself by night. Summon my AI superpower.

330stars

64forks

Python

Added 4/10/2026

View on GitHub Download ZIP

AI Agentsai-agentautomationclaudegeminimacos

Installation

# Add to your Claude Code skills
git clone https://github.com/sonichi/sutando

Getting Started

Guides for using ai agents skills like sutando.

Caveman: Cut Claude Token Use by 65%
How agent-side prompt compression works, when to use it, and when not to.
What is an AI Skills Marketplace?
Definitions, how marketplaces work, and how to choose between them in 2026.
Getting Started with AI Skills

README.md

Agentic AI for Beginners

Build your first AI agent from scratch - tool use, ReAct pattern, memory, deployment

41 minBeginner

Comments (0)

to leave a comment.

No comments yet. Be the first to share your thoughts!

Related Skills

ECC

by affaan-m

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

191,281

subtask Aegis

everything-claude-code

by affaan-m

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

185,940

28,768

JavaScript

AI Agentsai-agentsanthropic

View details

Compare

claude-code

by anthropics

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.

120,031

19,897

Shell

AI Agents

View details

Compare

Sutando

My AI Stand — Realtime by Day, Rewriting Itself by Night. Summon my AI superpower.

Voice, vision, screen, meetings, calls when I'm engaged. Learns my patterns, ships its own code when I'm not. Runs across my Macs, interacts with people & their Stands.

It belongs entirely to you.

🛠 Open source: this repo — clone, build, run locally on your own Mac. 🍎 Native app preview: sutando.ai — packaged Mac app, request access.

No Claude Extra usage required. Sutando runs on your existing Claude Code subscription ($20, $100, or $200/month) with minimal extra costs — no separate Anthropic API key to top up — unlike agents that route every action through pay-per-token APIs and hosted services.

Named after Stands from JoJo's Bizarre Adventure — a personal spirit that fights on your behalf. Like a Stand, Sutando starts unnamed. As it learns your style and earns real capabilities, it names itself and generates its own avatar — your Stand, unique to you.

https://github.com/user-attachments/assets/a86ec34e-3b26-4011-824c-d2d124753c25

24 tool calls. 6 tasks. 7 minutes. All by voice from a phone. Demo by @liususan091219. Watch on YouTube (full quality) →

⭐ If Sutando is useful to you, star this repo — it's how other people find it.

📺 Watch the vision talk at UC Berkeley → — the idea behind Sutando, before it existed.

🎬 Recent autonomous output

Sutando WIRE — Sutando reads the news, drafts the script, generates narration, renders the video. No editor, no animator, no narration session.

@sutando-ai channel · Sutando WIRE playlist

Recent capability proofs:

What can you do with it?

Talk while you work. You're looking at a doc. You say "make this paragraph shorter." Sutando sees your screen, rewrites the paragraph, and replaces the original text directly.

Join meetings for you. "Join my 2pm call." It reads your calendar and joins — Zoom via the desktop app, Google Meet via the browser — with computer audio. It can also dial in by phone when you ask. It takes screenshots to identify participants, does live research when someone asks a question, and writes you a summary when the call ends. Meeting access is gated — it messages you on Telegram asking for approval before enabling task delegation.

Make calls for you. "Call her and leave a message." Sutando looks up the contact, dials the number, has the conversation, and reports back — while you keep working. It can even make concurrent calls while in a meeting.

Work from your phone. Call Sutando and say "summon." It opens Zoom with screen sharing — join from your phone to see its screen in real time. "What's on my screen?" — it takes a screenshot and tells you. "Fix the typo in that file" — done. You scroll, switch apps, navigate — all by voice while walking around.

Get better on its own. When you're not giving it tasks, Sutando runs an autonomous build loop — it monitors its own health, detects patterns in how you work, discovers new skills, and builds missing capabilities. Most of Sutando's code was written this way. It learns from your corrections and adapts over time.

Remember everything — and act on it. You have an idea while walking. Say it out loud. Sutando captures it, tags it, and saves it as a searchable note. If there's something actionable, it starts working on it right away or queues it for the next free cycle.

Reach you anywhere. Voice, Zoom, Google Meet, Telegram, Discord (text + voice channel), web, phone, or email — same agent, same memory, any channel.

Scale across machines. Plug in a second Mac and Sutando sets it up — the original agent opens a Discord channel, sends setup commands, and migrates services. The new machine handles phone calls 24/7 while your laptop stays portable. No migration scripts needed — the two agents coordinate the handoff themselves.

Status: Alpha

This is an early-stage project. Honest status:

| | Count | Details | |---|---|---| | Verified working | 30 | Voice, screen capture, notes, calendar, reminders, contacts, browser, phone calls, meeting dial-in, task delegation, pattern detection, health check, dashboard, Telegram, Discord, multi-machine migration, onboarding tutorial, and more | | Needs external setup | 3 | Twilio (phone), Telegram bot, Discord bot |

We're looking for contributors to help test and harden these capabilities. If you try something and it breaks, open an issue.

How it works

    You ──voice (browser)──► Voice agent ─────────┐
     │                       (Gemini Live,        │
     │                        WS on :9900)        ├──► inline tools (instant,
     │                                            │    in-process: describe_screen,
     ├──phone (Twilio)─────► Conversation server ─┤    get_current_time, hang_up,
     │                       (Gemini Live,        │    dtmf, ...)
     │                        WS on :3100)        │
     │                                            └──┐
     │                                               │   file bridge       .──────▶────────.
     ├──telegram──────────► Telegram bridge ─────────┼── tasks/ ─────────► |               |
     │                                               │                    |   Core        |
     │                                               │                    |   agent ↻     |
     └──discord───────────► Discord bridge ──────────┘                    |               |
                                                                           `──────◀────────'
                                                                                  │
                                                                                  ▼
                                                                          uses anything:
                                                                          email, calendar,
                                                                          browser, files,
                                                                          phone, reminders...
                                    ◄── results/ ◄────────────────────────────────┘
                                (spoken via voice/phone,
                                 text via Telegram/Discord)

    ↻ = a cron job fires the `/proactive-loop` skill every 5 minutes
        (`*/5 * * * *` in `skills/schedule-crons/crons.json`). The skill
        runs as a 10-minute pass that keeps a persistent watcher on
        `tasks/` via Claude Code's `Monitor` tool — pending tasks are
        processed the moment they arrive, not just on the cron tick.
        Each pass also runs health checks and picks the next build-log
        item autonomously.

Four processes work together:

Voice agent (Gemini Live, WebSocket on :9900) — listens and talks in real time for browser voice.
Web client (com.sutando.web-client.plist, HTTP on :8080) — separate launchd service that serves the browser UI. The browser then connects directly to the voice agent's WebSocket on :9900 — the web client is not in the WebSocket data path.
Conversation server (Gemini Live, Twilio WebSocket on :3100) — same role as the voice agent for inbound and outbound phone calls.
Core agent (Claude Code CLI) — executes tasks with full system access. We use the CLI because it provides cron scheduling, plugins, and an interactive terminal that the SDK doesn't offer out of the box.

Voice agent and conversation server handle conversation-scope actions with inline tools — in-process calls that round-trip instantly (describe the screen, hang up, send DTMF, read the clipboard/current time, capture a screenshot). For anything outside that scope they write to tasks/; core reads them, executes, and writes to results/, which each channel speaks or messages back. Telegram and Discord bridges only use the tasks/ path.

Quick start

Prerequisites:

macOS 15+
Claude Code (run claude once t