Open-source browser-based voice chat for AI assistants. Self-hosted, private, free. Whisper STT + ElevenLabs TTS. Works with OpenAI, Claude, or custom agents.
```shell
# Add to your Claude Code skills
git clone https://github.com/Purple-Horizons/openclaw-voice
```

Browser-based voice chat for OpenClaw agents. Adds voice chat capability to your OpenClaw agent: users can speak to your agent via a web browser and hear responses in real time.
Add to your openclaw.json:

```json
{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": { "enabled": true }
      }
    }
  },
  "agents": {
    "list": [
      {
        "id": "voice",
        "workspace": "/path/to/your/workspace",
        "model": "anthropic/claude-sonnet-4-5"
      }
    ]
  }
}
```
```shell
# Clone and setup
git clone https://github.com/Purple-Horizons/openclaw-voice.git
cd openclaw-voice
uv sync  # or pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with your OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_TOKEN, ELEVENLABS_API_KEY

# Run
PYTHONPATH=. python -m src.server.main
```
Open http://localhost:8765 in your browser.
For HTTPS (required for mobile mic), use Tailscale Funnel or your own SSL.
| Variable | Required | Description |
|----------|----------|-------------|
| OPENCLAW_GATEWAY_URL | Yes* | OpenClaw gateway URL (e.g., http://localhost:18789) |
| OPENCLAW_GATEWAY_TOKEN | Yes* | Gateway auth token |
| ELEVENLABS_API_KEY | Recommended | For high-quality TTS |
| OPENAI_API_KEY | Fallback | Used if gateway not configured |
*Required for full agent integration. Falls back to direct OpenAI if not set.
For production, enable auth:

```
OPENCLAW_REQUIRE_AUTH=true
OPENCLAW_MASTER_KEY=your-secret-key
```
Then generate user keys via `POST /api/keys`.
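The request and response schema for `/api/keys` isn't documented here, so the sketch below only builds the request with the standard library, assuming the master key is sent as a Bearer token (an assumption; check the server code for the actual auth scheme):

```python
import json
import urllib.request

def build_key_request(base_url: str, master_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/keys request.

    Bearer auth with the master key is assumed, not documented.
    """
    return urllib.request.Request(
        url=f"{base_url}/api/keys",
        method="POST",
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        data=json.dumps({}).encode(),  # payload shape unknown; empty body here
    )

req = build_key_request("http://localhost:8765", "your-secret-key")
# Send with urllib.request.urlopen(req) once the server is running.
```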
- `src/server/main.py` - FastAPI server
- `src/server/stt.py` - Speech-to-text (Whisper)
- `src/server/tts.py` - Text-to-speech (ElevenLabs/Chatterbox)
- `src/server/backend.py` - AI backend (gateway or OpenAI)
- `src/client/index.html` - Browser UI

MIT · Built by Purple Horizons
Open-source browser-based voice interface for AI assistants.
Talk to your AI like you talk to Alexa, but self-hosted, private, and connected to your own agent.

Website: openclawvoice.com
| Feature | Description |
|---------|-------------|
| Local STT | Whisper runs locally via faster-whisper. Your voice never leaves your machine. |
| Streaming TTS | ElevenLabs with sentence-by-sentence streaming. Hear responses while they generate. |
| Voice Activity Detection | Silero VAD filters background noise. Works in noisy environments. |
| Smart Text Cleaning | Strips markdown, hashtags, URLs before TTS. No more "hash hash". |
| Any AI Backend | OpenAI, Claude, or full OpenClaw agent with memory and tools. |
| Browser-Based | No app install. Works on desktop and mobile. |
| Continuous Mode | Hands-free conversation. Auto-listens after each response. |
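To illustrate the "smart text cleaning" idea, here is a minimal sketch of stripping markdown, URLs, and hashtags before handing text to TTS, so the voice never reads "hash hash" aloud. The regexes are illustrative; the actual rules live in `src/server/tts.py` and may differ:

```python
import re

def clean_for_tts(text: str) -> str:
    """Strip markup that sounds bad when spoken aloud."""
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)      # fenced code blocks
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)  # heading markers
    text = re.sub(r"[*_`]+", "", text)                          # emphasis marks
    text = re.sub(r"https?://\S+", "", text)                    # URLs
    text = re.sub(r"#\w+", "", text)                            # hashtags
    return re.sub(r"\s+", " ", text).strip()                    # collapse whitespace

print(clean_for_tts("## Hello **world**! See https://example.com #demo"))
# → Hello world! See
```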
```shell
# Clone
git clone https://github.com/Purple-Horizons/openclaw-voice.git
cd openclaw-voice

# Install
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Optional: Install VAD support (better noise handling)
pip install torch torchaudio

# Configure (create .env from example)
cp .env.example .env
# Edit .env with your API keys

# Run
PYTHONPATH=. python -m src.server.main

# Open http://localhost:8765
```
One-liner for AI agents to self-install:

```shell
cd /tmp && \
git clone https://github.com/Purple-Horizons/openclaw-voice.git && \
cd openclaw-voice && \
python3 -m venv .venv && source .venv/bin/activate && \
pip install -r requirements.txt torch torchaudio && \
PYTHONPATH=. ELEVENLABS_API_KEY="$ELEVENLABS_API_KEY" OPENAI_API_KEY="$OPENAI_API_KEY" \
nohup python -m src.server.main > /tmp/voice-server.log 2>&1 &
```
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| ELEVENLABS_API_KEY | Yes* | - | ElevenLabs API key for TTS |
| OPENAI_API_KEY | Yes* | - | OpenAI API key (if not using gateway) |
| OPENCLAW_GATEWAY_URL | No | - | OpenClaw gateway URL for full agent |
| OPENCLAW_GATEWAY_TOKEN | No | - | Gateway auth token |
| OPENCLAW_PORT | No | 8765 | Server port |
| OPENCLAW_STT_MODEL | No | base | Whisper model size |
| OPENCLAW_STT_DEVICE | No | auto | Device: auto, cpu, cuda, mps |
| OPENCLAW_REQUIRE_AUTH | No | false | Require API keys for clients |
*One of OPENAI_API_KEY or OPENCLAW_GATEWAY_URL required.
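The fallback rule above (prefer the gateway when configured, otherwise direct OpenAI) can be sketched as a small selection function. The function name and return values are illustrative, not the server's actual API:

```python
def pick_backend(env: dict) -> str:
    """Choose the AI backend per the precedence described above."""
    if env.get("OPENCLAW_GATEWAY_URL"):
        return "gateway"   # full OpenClaw agent: memory, tools, persona
    if env.get("OPENAI_API_KEY"):
        return "openai"    # direct OpenAI fallback
    raise RuntimeError("Set OPENCLAW_GATEWAY_URL or OPENAI_API_KEY")

print(pick_backend({"OPENAI_API_KEY": "sk-test"}))  # → openai
```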
| Model | Speed | Quality | VRAM | Best For |
|-------|-------|---------|------|----------|
| tiny | Fastest | Fair | ~400MB | Quick testing |
| base | Fast | Good | ~1GB | Default. Good balance. |
| small | Medium | Better | ~2GB | Clearer transcription |
| medium | Slower | Great | ~5GB | Accuracy priority |
| large-v3-turbo | Slow | Best | ~6GB | Maximum accuracy |
| Backend | Type | Quality | Latency | Notes |
|---------|------|---------|---------|-------|
| ElevenLabs | Cloud | Excellent | ~500ms | Default. Streaming supported. |
| Chatterbox | Local | Very Good | ~1s | MIT license, voice cloning |
| XTTS-v2 | Local | Excellent | ~1s | Voice cloning supported |
| Mock | Local | None | 0ms | For testing (silence) |
ElevenLabs uses `eleven_turbo_v2_5` for the fastest response.
Connect to your full OpenClaw agent (same memory, tools, and persona as text chat):
```
# .env
OPENCLAW_GATEWAY_URL=http://localhost:18789
OPENCLAW_GATEWAY_TOKEN=your-token
ELEVENLABS_API_KEY=your-key
```
Add to your openclaw.json:

```json
{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": { "enabled": true }
      }
    }
  },
  "agents": {
    "list": [
      {
        "id": "voice",
        "workspace": "/path/to/workspace",
        "model": "anthropic/claude-sonnet-4-5"
      }
    ]
  }
}
```
```
+-----------+    WebSocket    +---------------------------------------+
|  Browser  |<--------------->|             Voice Server              |
| (mic/spk) |                 |                                       |
+-----------+                 |  +---------+   +----+   +----------+  |
                              |  | Whisper |-->| AI |-->|ElevenLabs|  |
                              |  |  (STT)  |   |    |   |  (TTS)   |  |
                              |  +---------+   +----+   +----------+  |
                              |    [VAD]               [streaming]    |
                              +---------------------------------------+
```
Streaming Flow: mic audio → VAD → Whisper (STT) → AI backend (streaming text) → sentence-by-sentence TTS (ElevenLabs) → audio chunks streamed back to the browser.
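The sentence-by-sentence streaming step can be sketched as a generator that buffers incoming text chunks from the AI and yields complete sentences as soon as they appear, so audio synthesis starts before the full response finishes. This is a simplified illustration, not the code in `src/server/tts.py`:

```python
import re

def stream_sentences(chunks):
    """Yield complete sentences as streaming text chunks arrive."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # A sentence ends at . ! or ? followed by whitespace.
            match = re.search(r"(.+?[.!?])\s+", buffer)
            if not match:
                break
            yield match.group(1)
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains at end of stream

chunks = ["Hello the", "re! How a", "re you? I am fine."]
print(list(stream_sentences(chunks)))
# → ['Hello there!', 'How are you?', 'I am fine.']
```

Each yielded sentence would be handed to the TTS backend immediately, which is what makes the response audible while later sentences are still being generated.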
Mobile browsers require HTTPS for microphone access. Options:
Tailscale Funnel (easiest):

```shell
tailscale funnel 8765
# Access via https://your-machine.tailnet-name.ts.net
```
nginx + Let's Encrypt:

```nginx
server {
    listen 443 ssl;
    server_name voice.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:8765;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
Connect to `ws://localhost:8765/ws`:

```javascript
// Start recording
{ "type": "start_listening" }

// Send audio (base64 PCM float32, 16kHz)
{ "type": "audio", "data": "base64..." }

// Stop recording
{ "type": "stop_listening" }

// Receive events:
{ "type": "transcript", "text": "...", "final": true }
{ "type": "response_chunk", "text": "..." }                     // Streaming text
{ "type": "audio_chunk", "data": "...", "sample_rate": 24000 }  // Streaming audio
{ "type": "response_complete", "text": "..." }                  // Full response
{ "type": "vad_status", "speech_detected": true }               // VAD feedback
```
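Building the `audio` message payload amounts to packing raw PCM float32 samples and base64-encoding them. A minimal sketch, assuming little-endian byte order (the protocol above specifies float32 at 16 kHz but not endianness; check `src/client` for the exact encoding):

```python
import base64
import struct

def encode_audio(samples):
    """Base64-encode a list of float32 PCM samples (little-endian assumed)."""
    raw = struct.pack(f"<{len(samples)}f", *samples)
    return base64.b64encode(raw).decode("ascii")

# Three samples of 16 kHz PCM, ready to send over the WebSocket.
msg = {"type": "audio", "data": encode_audio([0.0, 0.5, -0.5])}
```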
MIT License. See LICENSE.

Made by Purple Horizons