Open-source browser-based voice chat for AI assistants. Self-hosted, private, free. Whisper STT + ElevenLabs TTS. Works with OpenAI, Claude, or custom agents.
```shell
# Add to your Claude Code skills
git clone https://github.com/Purple-Horizons/openclaw-voice
```

Browser-based voice chat for OpenClaw agents. Adds voice chat capability to your OpenClaw agent: users can speak to your agent via a web browser and hear responses in real time.
Add to your openclaw.json:

```json
{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": { "enabled": true }
      }
    }
  },
  "agents": {
    "list": [
      {
        "id": "voice",
        "workspace": "/path/to/your/workspace",
        "model": "anthropic/claude-sonnet-4-5"
      }
    ]
  }
}
```
```shell
# Clone and setup
git clone https://github.com/Purple-Horizons/openclaw-voice.git
cd openclaw-voice
uv sync  # or pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with your OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_TOKEN, ELEVENLABS_API_KEY

# Run
PYTHONPATH=. python -m src.server.main
```
Open http://localhost:8765 in your browser.
For HTTPS (required for mobile mic), use Tailscale Funnel or your own SSL.
| Variable | Required | Description |
|----------|----------|-------------|
| OPENCLAW_GATEWAY_URL | Yes* | OpenClaw gateway URL (e.g., http://localhost:18789) |
| OPENCLAW_GATEWAY_TOKEN | Yes* | Gateway auth token |
| ELEVENLABS_API_KEY | Recommended | For high-quality TTS |
| OPENAI_API_KEY | Fallback | Used if gateway not configured |
*Required for full agent integration. Falls back to direct OpenAI if not set.
For production, enable auth:

```
OPENCLAW_REQUIRE_AUTH=true
OPENCLAW_MASTER_KEY=your-secret-key
```
Then generate user keys via `POST /api/keys`.
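The request and response schema for `/api/keys` isn't documented here, so the sketch below only builds the request with the standard library, assuming the master key is sent as a Bearer token (an assumption; check the server code for the actual auth scheme):

```python
import json
import urllib.request

def build_key_request(base_url: str, master_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/keys request.

    Bearer auth with the master key is assumed, not documented.
    """
    return urllib.request.Request(
        url=f"{base_url}/api/keys",
        method="POST",
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        data=json.dumps({}).encode(),  # payload shape unknown; empty body here
    )

req = build_key_request("http://localhost:8765", "your-secret-key")
# Send with urllib.request.urlopen(req) once the server is running.
```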
- `src/server/main.py` - FastAPI server
- `src/server/stt.py` - Speech-to-text (Whisper)
- `src/server/tts.py` - Text-to-speech (ElevenLabs/Chatterbox)
- `src/server/backend.py` - AI backend (gateway or OpenAI)
- `src/client/index.html` - Browser UI

MIT · Built by Purple Horizons
Open-source browser-based voice interface for AI assistants.
Talk to your AI like you talk to Alexa, but self-hosted, private, and connected to your own agent.

Website: openclawvoice.com
| Feature | Description |
|---------|-------------|
| Local STT | Whisper runs locally via faster-whisper. Your voice never leaves your machine. |
| Streaming TTS | ElevenLabs with sentence-by-sentence streaming. Hear responses while they generate. |
| Voice Activity Detection | Silero VAD filters background noise. Works in noisy environments. |
| Smart Text Cleaning | Strips markdown, hashtags, URLs before TTS. No more "hash hash". |
| Any AI Backend | OpenAI, Claude, or full OpenClaw agent with memory and tools. |
| Browser-Based | No app install. Works on desktop and mobile. |
| Continuous Mode | Hands-free conversation. Auto-listens after each response. |
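To illustrate the "smart text cleaning" idea, here is a minimal sketch of stripping markdown, URLs, and hashtags before handing text to TTS, so the voice never reads "hash hash" aloud. The regexes are illustrative; the actual rules live in `src/server/tts.py` and may differ:

```python
import re

def clean_for_tts(text: str) -> str:
    """Strip markup that sounds bad when spoken aloud."""
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)      # fenced code blocks
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)  # heading markers
    text = re.sub(r"[*_`]+", "", text)                          # emphasis marks
    text = re.sub(r"https?://\S+", "", text)                    # URLs
    text = re.sub(r"#\w+", "", text)                            # hashtags
    return re.sub(r"\s+", " ", text).strip()                    # collapse whitespace

print(clean_for_tts("## Hello **world**! See https://example.com #demo"))
# → Hello world! See
```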
```shell
# Clone
git clone https://github.com/Purple-Horizons/openclaw-voice.git
cd openclaw-voice

# Install
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Optional: Install VAD support (better noise handling)
pip install torch torchaudio

# Configure (create .env from example)
cp .env.example .env
# Edit .env with your API keys

# Run
PYTHONPATH=. python -m src.server.main

# Open http://localhost:8765
```
One-liner for AI agents to self-install:

```shell
cd /tmp && \
git clone https://github.com/Purple-Horizons/openclaw-voice.git && \
cd openclaw-voice && \
python3 -m venv .venv && source .venv/bin/activate && \
pip install -r requirements.txt torch torchaudio && \
PYTHONPATH=. ELEVENLABS_API_KEY="$ELEVENLABS_API_KEY" OPENAI_API_KEY="$OPENAI_API_KEY" \
nohup python -m src.server.main > /tmp/voice-server.log 2>&1 &
```
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| ELEVENLABS_API_KEY | Yes* | - | ElevenLabs API key for TTS |
| OPENAI_API_KEY | Yes* | - | OpenAI API key (if not using gateway) |
| OPENCLAW_GATEWAY_URL | No | - | OpenClaw gateway URL for full agent |
| OPENCLAW_GATEWAY_TOKEN | No | - | Gateway auth token |
| OPENCLAW_PORT | No | 8765 | Server port |
| OPENCLAW_STT_MODEL | No | base | Whisper model size |
| OPENCLAW_STT_DEVICE | No | auto | Device: auto, cpu, cuda, mps |
| OPENCLAW_REQUIRE_AUTH | No | false | Require API keys for clients |
*One of OPENAI_API_KEY or OPENCLAW_GATEWAY_URL required.
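The fallback rule above (prefer the gateway when configured, otherwise direct OpenAI) can be sketched as a small selection function. The function name and return values are illustrative, not the server's actual API:

```python
def pick_backend(env: dict) -> str:
    """Choose the AI backend per the precedence described above."""
    if env.get("OPENCLAW_GATEWAY_URL"):
        return "gateway"   # full OpenClaw agent: memory, tools, persona
    if env.get("OPENAI_API_KEY"):
        return "openai"    # direct OpenAI fallback
    raise RuntimeError("Set OPENCLAW_GATEWAY_URL or OPENAI_API_KEY")

print(pick_backend({"OPENAI_API_KEY": "sk-test"}))  # → openai
```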
| Model | Speed | Quality | VRAM | Best For |
|-------|-------|---------|------|----------|
| tiny | Fastest | Fair | ~400MB | Quick testing |
| base | Fast | Good | ~1GB | Default. Good balance. |
| small | Medium | Better | ~2GB | Clearer transcription |
| medium | Slower | Great | ~5GB | Accuracy priority |
| large-v3-turbo | Slow | Best | ~6GB | Maximum accuracy |
| Backend | Type | Quality | Latency | Notes |
|---------|------|---------|---------|-------|
| ElevenLabs | Cloud | Excellent | ~500ms | Default. Streaming supported. |
| Chatterbox | Local | Very Good | ~1s | MIT license, voice cloning |
| XTTS-v2 | Local | Excellent | ~1s | Voice cloning supported |
| Mock | Local | None | 0ms | For testing (silence) |
ElevenLabs uses `eleven_turbo_v2_5` for the fastest response.
Connect to your full OpenClaw agent (same memory, tools, and persona as text chat):
```
# .env
OPENCLAW_GATEWAY_URL=http://localhost:18789
OPENCLAW_GATEWAY_TOKEN=your-token
ELEVENLABS_API_KEY=your-key
```
Add to your openclaw.json:

```json
{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": { "enabled": true }
      }
    }
  },
  "agents": {
    "list": [
      {
        "id": "voice",
        "workspace": "/path/to/workspace",
        "model": "anthropic/claude-sonnet-4-5"
      }
    ]
  }
}
```
```
+-----------+    WebSocket    +---------------------------------------+
|  Browser  |<--------------->|             Voice Server              |
| (mic/spk) |                 |                                       |
+-----------+                 |  +---------+   +----+   +----------+  |
                              |  | Whisper |-->| AI |-->|ElevenLabs|  |
                              |  |  (STT)  |   |    |   |  (TTS)   |  |
                              |  +---------+   +----+   +----------+  |
                              |    [VAD]               [streaming]    |
                              +---------------------------------------+
```
Streaming Flow: mic audio → VAD → Whisper (STT) → AI backend (streaming text) → sentence-by-sentence TTS (ElevenLabs) → audio chunks streamed back to the browser.
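The sentence-by-sentence streaming step can be sketched as a generator that buffers incoming text chunks from the AI and yields complete sentences as soon as they appear, so audio synthesis starts before the full response finishes. This is a simplified illustration, not the code in `src/server/tts.py`:

```python
import re

def stream_sentences(chunks):
    """Yield complete sentences as streaming text chunks arrive."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # A sentence ends at . ! or ? followed by whitespace.
            match = re.search(r"(.+?[.!?])\s+", buffer)
            if not match:
                break
            yield match.group(1)
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains at end of stream

chunks = ["Hello the", "re! How a", "re you? I am fine."]
print(list(stream_sentences(chunks)))
# → ['Hello there!', 'How are you?', 'I am fine.']
```

Each yielded sentence would be handed to the TTS backend immediately, which is what makes the response audible while later sentences are still being generated.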
Mobile browsers require HTTPS for microphone access. Options:
Tailscale Funnel (easiest):

```shell
tailscale funnel 8765
# Access via https://your-machine.tailnet-name.ts.net
```
nginx + Let's Encrypt:

```nginx
server {
    listen 443 ssl;
    server_name voice.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:8765;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
Connect to `ws://localhost:8765/ws`:

```javascript
// Start recording
{ "type": "start_listening" }

// Send audio (base64 PCM float32, 16kHz)
{ "type": "audio", "data": "base64..." }

// Stop recording
{ "type": "stop_listening" }

// Receive events:
{ "type": "transcript", "text": "...", "final": true }
{ "type": "response_chunk", "text": "..." }                     // Streaming text
{ "type": "audio_chunk", "data": "...", "sample_rate": 24000 }  // Streaming audio
{ "type": "response_complete", "text": "..." }                  // Full response
{ "type": "vad_status", "speech_detected": true }               // VAD feedback
```
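Building the `audio` message payload amounts to packing raw PCM float32 samples and base64-encoding them. A minimal sketch, assuming little-endian byte order (the protocol above specifies float32 at 16 kHz but not endianness; check `src/client` for the exact encoding):

```python
import base64
import struct

def encode_audio(samples):
    """Base64-encode a list of float32 PCM samples (little-endian assumed)."""
    raw = struct.pack(f"<{len(samples)}f", *samples)
    return base64.b64encode(raw).decode("ascii")

# Three samples of 16 kHz PCM, ready to send over the WebSocket.
msg = {"type": "audio", "data": encode_audio([0.0, 0.5, -0.5])}
```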
MIT License. See LICENSE.

Made by Purple Horizons