Open-source AI assistant ecosystem with MCP integrations, multimodal workflows, IoT support, and cross-platform voice interaction.
py-xiaozhi is a lightweight, cross-platform, multimodal AI interaction framework built on Python's async architecture. It supports real-time voice streaming, vision-language tasks, and IoT device control. It runs on Windows, macOS, and Linux desktops and on ARM embedded platforms (Raspberry Pi, Horizon Robotics RDK, Jetson Nano), connecting Large Language Models to physical hardware out of the box.
Evolved from the xiaozhi-esp32 firmware project. Officially adopted by D-Robotics (xiaozhi-in-rdk) as an upstream dependency.

Zero to Xiaozhi Client (Video Tutorial)
py-xiaozhi/
├── main.py # Application entry point
├── src/
│ ├── activation/ # Device activation
│ ├── audio_codecs/ # Audio codecs
│ ├── audio_processing/ # Wake word detection
│ ├── bootstrap/ # Application bootstrap & dependency injection
│ ├── constants/ # Constants
│ ├── core/ # Core infrastructure (event bus, state management, task management, etc.)
│ ├── logging/ # Logging subsystem
│ ├── mcp/ # MCP tool system
│ │ ├── mcp_server.py # MCP server
│ │ └── tools/ # Tool modules (music/camera/screenshot/app/weather/volume)
│ ├── plugins/ # Plugin system (audio, UI, MCP, wake word, shortcuts)
│ ├── protocols/ # Communication protocols (WebSocket/MQTT)
│ ├── ui/ # User interface
│ │ ├── gui/ # PySide6 + QML graphical interface
│ │ ├── cli/ # Command line interface
│ │ └── gpio/ # GPIO embedded interface
│ └── utils/ # Utility functions
├── libs/ # Third-party native libraries
│ ├── libopus/ # Opus audio codec library
│ └── webrtc_apm/ # WebRTC audio processing module
├── models/ # Wake word models
├── assets/ # Static resources
├── scripts/ # Auxiliary scripts
├── documents/ # VitePress documentation site
├── pyproject.toml # Project configuration
└── build.json # Build configuration
# Clone project
git clone https://github.com/huangjunsen0406/py-xiaozhi.git
cd py-xiaozhi
# Base install (CLI / GPIO mode)
uv sync # Recommended (uv users)
# or: pip install -e . # pip users
# GUI mode (extra: PySide6 + qasync)
uv sync --extra gui # Recommended (uv users)
# or: pip install -e '.[gui]' # pip users
# Full development environment (GUI + test / packaging tools)
uv sync --extra gui --group dev
# Code formatting
./format_code.sh
# Run program - GUI mode (default; requires gui extra)
python main.py
# Run program - CLI mode (base install is enough)
python main.py --mode cli
# Specify communication protocol
python main.py --protocol websocket # WebSocket (default)
python main.py --protocol mqtt # MQTT protocol
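A minimal sketch of how the `--mode` and `--protocol` flags above could be parsed and mapped to a protocol implementation. This is an illustration, not the project's actual `main.py`: the class names, the `connect()` method, the `PROTOCOLS` registry, and the `start()` helper are all hypothetical, and only the flag names, their values, and the defaults (GUI mode, WebSocket) come from the commands above. Other modes may exist beyond the two shown.

```python
import argparse
import asyncio
from abc import ABC, abstractmethod


class Protocol(ABC):
    """Hypothetical base class; the project's real interface may differ."""

    @abstractmethod
    async def connect(self) -> str:
        """Open the connection and return a short status string."""


class WebSocketProtocol(Protocol):
    async def connect(self) -> str:
        await asyncio.sleep(0)  # stand-in for a real, non-blocking handshake
        return "websocket connected"


class MqttProtocol(Protocol):
    async def connect(self) -> str:
        await asyncio.sleep(0)  # stand-in for a real, non-blocking handshake
        return "mqtt connected"


# Hypothetical registry: maps each --protocol value to its implementation.
PROTOCOLS = {"websocket": WebSocketProtocol, "mqtt": MqttProtocol}


def parse_args(argv=None):
    """Parse the two flags shown above; defaults follow the README."""
    parser = argparse.ArgumentParser(description="illustrative launcher")
    parser.add_argument("--mode", choices=["gui", "cli"], default="gui")
    parser.add_argument("--protocol", choices=sorted(PROTOCOLS), default="websocket")
    return parser.parse_args(argv)


async def start(args) -> str:
    """Instantiate the selected protocol and await its connection."""
    protocol = PROTOCOLS[args.protocol]()
    status = await protocol.connect()
    return f"{args.mode}: {status}"


# Equivalent to: python main.py --mode cli --protocol mqtt
print(asyncio.run(start(parse_args(["--mode", "cli", "--protocol", "mqtt"]))))
# prints "cli: mqtt connected"
```

Keeping the flag values and the registry keys in one dictionary means that adding a transport only requires a new subclass and one registry entry, and `argparse` rejects unknown values before any connection is attempted.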
- async/await syntax, avoid blocking operations
- ConfigManager for unified configuration access
- src/mcp/tools/ directory
- Protocol abstract base class
- src/plugins/

                      +----------------+
                      |                |
                      v                |
+------+ Wake/Button  +------------+   |   +------------+
| IDLE | -----------> | CONNECTING | --+-> | LISTENING  |
+------+              +------------+       +------------+
   ^                                             |
   |                                             | Voice Recognition Complete
   |                  +------------+