py-xiaozhi

Name: py-xiaozhi
Author: huangjunsen0406

Verified

Open-source AI assistant ecosystem with MCP integrations, multimodal workflows, IoT support, and cross-platform voice interaction.

3,396 stars

710 forks

Python

Installation

# Add to your Claude Code skills
git clone https://github.com/huangjunsen0406/py-xiaozhi


Security Report: Verified

Last scanned: 4/21/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-04-21T06:04:09.282Z",
  "semgrepRan": false,
  "npmAuditRan": true,
  "pipAuditRan": false
}

README.md

Frequently Asked Questions

What is py-xiaozhi?

py-xiaozhi is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by huangjunsen0406. It is an AI assistant ecosystem with MCP integrations, multimodal workflows, IoT support, and cross-platform voice interaction. It has 3,396 GitHub stars.

Is py-xiaozhi safe to use?

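py-xiaozhi passed SkillsLLM's automated security scan (a dependency vulnerability audit plus prompt-injection heuristics) with no issues reported. The report on this page shows that only the npm audit ran; semgrepRan and pipAuditRan are both false, so the scan says little about the project's Python dependencies. You can read the full report in the Security Report section on this page.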

How do I install py-xiaozhi?

Clone the repository with "git clone https://github.com/huangjunsen0406/py-xiaozhi" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is py-xiaozhi written in?

py-xiaozhi is primarily written in Python. It is open-source under huangjunsen0406 on GitHub, so you can review or fork the full source.


py-xiaozhi


❤️ Sponsors

GitDo.net

Thanks to GitDo.net for sponsoring this project! GitDo.net is an AI API aggregation platform: one API key for Claude, Gemini, GPT, and other major models. Direct connection, no proxy needed, stable and efficient. An ideal choice for enterprise-grade AI programming.

Token能量站

Thanks to Token能量站 (Factory.pub) for sponsoring this project! It provides API relay for GPT, Grok, Claude, and other major models: stable, reliable, and affordable.

良心AI

Thanks to 良心AI for sponsoring this project! It provides API relay for GPT, Claude, Gemini, and other major models: direct connection, stable, and high-speed.

About

py-xiaozhi is a lightweight, cross-platform multi-modal AI interaction framework built on Python's async architecture. It supports real-time voice streaming, vision-language tasks, and IoT device control. Deployable across Windows, macOS, Linux desktops, and ARM embedded platforms (Raspberry Pi, Horizon Robotics RDK, Jetson Nano), it bridges the gap between Large Language Models and physical hardware — out of the box.

py-xiaozhi evolved from the xiaozhi-esp32 firmware project and has been officially adopted by D-Robotics (xiaozhi-in-rdk) as an upstream dependency.

Related Projects

xiaozhi-desktop — Electron desktop client with AEC echo cancellation, Live2D, floating window modes, and Windows / macOS installers

Demo

Bilibili Demo Video

Key Features

Real-time Voice AI — Opus codec with auto frame detection (RFC 6716 TOC parsing), async streaming, sub-20ms latency
Multi-modal Vision — Camera capture + vision-language model integration for image understanding and scene perception
MCP Tool Ecosystem — Modular JSON-RPC 2.0 tool server: music player, camera, screenshot, app management, weather, volume control
Cross-platform Deployment — Windows 10+ / macOS 10.15+ / Linux (x86_64 & ARM), optimized for Raspberry Pi and edge boards
Multiple UI Modes — PySide6 + QML GUI / CLI / GPIO, adapting to desktop, headless server, and embedded environments
Offline Wake Word — Sherpa-ONNX based on-device keyword spotting with custom wake word support
IoT & Embodied AI Ready — GPIO interface for robotics control, hardware actuation, and sensor integration
WebSocket / MQTT — Dual protocol communication with WSS/TLS encrypted transmission and auto-reconnection
Plugin Architecture — Event-driven async design, clean dependency injection, extensible plugin system
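
The feature list mentions auto frame detection through RFC 6716 TOC parsing. As a rough illustration of what that involves, the sketch below decodes the first byte of an Opus packet into its coding mode, frame duration, channel flag, and frame-count code. The names `OpusToc` and `parse_opus_toc` are hypothetical and are not taken from the py-xiaozhi source; only the bit layout and duration table come from RFC 6716.

```python
from dataclasses import dataclass

# Frame durations in milliseconds, as tabulated in RFC 6716 Section 3.1.
_SILK_MS = (10.0, 20.0, 40.0, 60.0)    # configs 0..11
_HYBRID_MS = (10.0, 20.0)              # configs 12..15
_CELT_MS = (2.5, 5.0, 10.0, 20.0)      # configs 16..31


@dataclass(frozen=True)
class OpusToc:
    """Decoded TOC byte (hypothetical helper type for this sketch)."""
    mode: str        # "SILK", "Hybrid", or "CELT"
    frame_ms: float  # duration of each frame in the packet
    stereo: bool     # True when the stereo flag is set
    code: int        # frame-count code, 0..3


def parse_opus_toc(packet: bytes) -> OpusToc:
    """Decode the TOC byte that starts every Opus packet."""
    if not packet:
        raise ValueError("empty Opus packet")
    toc = packet[0]
    config = toc >> 3                 # top 5 bits: configuration number
    stereo = bool((toc >> 2) & 0x1)   # next bit: mono (0) or stereo (1)
    code = toc & 0x3                  # low 2 bits: frames-per-packet code
    if config < 12:
        mode, frame_ms = "SILK", _SILK_MS[config % 4]
    elif config < 16:
        mode, frame_ms = "Hybrid", _HYBRID_MS[config % 2]
    else:
        mode, frame_ms = "CELT", _CELT_MS[config % 4]
    return OpusToc(mode, frame_ms, stereo, code)
```

With the frame duration known, a receiver can derive the number of samples per frame for its sample rate (for example, `frame_ms * 16` samples at 16 kHz) instead of hard-coding a frame size.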

System Requirements

Basic Requirements

Python Version: 3.10 - 3.12
Operating System: Windows 10+, macOS 10.15+, Linux
Audio Devices: Microphone and speaker devices
Network Connection: Stable internet connection (for AI services and online features)

Recommended Configuration

Memory: At least 4GB RAM (8GB+ recommended)
Processor: Modern CPU with AVX instruction set support
Storage: At least 2GB available disk space (for model files and cache)
Audio: Audio devices supporting 16kHz sampling rate

Optional Feature Requirements

Voice Wake-up: Requires downloading Sherpa-ONNX speech recognition models
Camera Features: Requires camera device and OpenCV support

Read This First

Read the project documentation carefully for startup tutorials and file descriptions
The main branch has the latest code; after each update, reinstall the dependencies manually so that any newly added packages are installed

Zero to Xiaozhi Client (Video Tutorial)

Technical Architecture

Core Architecture Design

Event-Driven Architecture: Built on the asyncio event loop to support highly concurrent processing
Layered Design: Clear separation of application layer, protocol layer, and UI layer
Dependency Injection: Component lifecycle managed via bootstrap container
Plugin System: Audio, UI, MCP tools and other components loaded via plugin system
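
To make the event-driven plugin idea above more concrete, here is a minimal sketch of an asyncio event bus with one plugin subscribed to it. Everything in it (`EventBus`, `EchoPlugin`, the `"audio.frame"` topic) is a hypothetical name chosen for illustration; the project's real bootstrap container and plugin interfaces may look quite different.

```python
import asyncio
from typing import Any, Awaitable, Callable

Handler = Callable[[Any], Awaitable[None]]


class EventBus:
    """Tiny publish/subscribe hub (illustrative, not the project's class)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    async def publish(self, topic: str, payload: Any) -> None:
        # Run every handler registered for the topic concurrently.
        handlers = self._handlers.get(topic, [])
        await asyncio.gather(*(handler(payload) for handler in handlers))


class EchoPlugin:
    """Hypothetical plugin that records each payload it receives."""

    def __init__(self) -> None:
        self.seen: list[Any] = []

    def setup(self, bus: EventBus) -> None:
        bus.subscribe("audio.frame", self.on_frame)

    async def on_frame(self, payload: Any) -> None:
        self.seen.append(payload)


async def demo() -> list[Any]:
    bus = EventBus()
    plugin = EchoPlugin()
    plugin.setup(bus)  # the bus is injected rather than imported globally
    await bus.publish("audio.frame", b"\x00\x01")
    return plugin.seen
```

Passing the bus into `setup` is a simple form of dependency injection: the plugin never constructs or looks up its collaborators, which keeps it easy to test in isolation.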

Key Technical Components

Audio Processing: Opus codec, real-time resampling
Speech Recognition: Sherpa-ONNX offline models, wake word recognition
Protocol Communication: WebSocket/MQTT dual protocol support, encrypted transmission, auto-reconnection
Configuration System: Hierarchical configuration, dot notation access, dynamic updates

Performance Optimization

Async First: Fully asynchronous architecture that avoids blocking operations
Memory Management: Smart caching, garbage collection
Audio Optimization: 5ms low-latency processing, queue management, streaming transmission
Concurrency Control: Task pool management, semaphore control, thread safety
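
The semaphore-based concurrency control mentioned above can be sketched as follows. This is a generic asyncio pattern shown under the assumption that jobs are plain coroutine factories; `run_bounded` and `demo` are hypothetical names, not functions from the project.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def run_bounded(
    jobs: Iterable[Callable[[], Awaitable[T]]], limit: int
) -> list[T]:
    """Run all jobs, but let at most `limit` of them execute at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits here while `limit` jobs are running
            return await job()

    # gather preserves the order of the input jobs in its result list.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo(limit: int = 2, count: int = 6) -> tuple[list[int], int]:
    active = 0
    peak = 0

    def make_job(i: int) -> Callable[[], Awaitable[int]]:
        async def job() -> int:
            nonlocal active, peak
            active += 1
            peak = max(peak, active)   # track the highest concurrency seen
            await asyncio.sleep(0.01)  # stand-in for real I/O
            active -= 1
            return i * i
        return job

    results = await run_bounded([make_job(i) for i in range(count)], limit)
    return results, peak
```

Running `asyncio.run(demo())` returns the squared values in submission order along with the peak number of simultaneously running jobs, which never exceeds `limit`.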

Security Mechanisms

Encrypted Communication: WSS/TLS encryption, certificate verification
Device Authentication: Dual protocol activation, device fingerprint recognition
Access Control: Tool permission management, API access control
Error Isolation: Exception isolation, fault recovery, graceful degradation
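
One way to read "exception isolation" and "graceful degradation" is that a failure in one component should not stop the others from starting. The sketch below shows that idea with `asyncio.gather(..., return_exceptions=True)`. The function `start_all` and the component names are hypothetical; the project's own recovery logic is not shown in this document.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("isolation_sketch")


async def start_all(
    components: dict[str, Callable[[], Awaitable[None]]]
) -> dict[str, bool]:
    """Start every component; report which ones came up instead of failing fast."""
    names = list(components)
    results = await asyncio.gather(
        *(components[name]() for name in names),
        return_exceptions=True,  # collect exceptions instead of raising the first
    )
    status: dict[str, bool] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            logger.warning("component %s failed to start: %r", name, result)
            status[name] = False
        else:
            status[name] = True
    return status


async def demo() -> dict[str, bool]:
    async def start_audio() -> None:
        await asyncio.sleep(0)  # pretend to open an audio device

    async def start_camera() -> None:
        raise RuntimeError("no camera attached")  # simulated hardware failure

    return await start_all({"audio": start_audio, "camera": start_camera})
```

The caller can then decide how to degrade, for example by continuing in voice-only mode when the camera entry is `False`.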

Development Guide

Project Structure

py-xiaozhi/
├── main.py                     # Application entry point
├── src/
│   ├── activation/             # Device activation
│   ├── audio_codecs/           # Audio codecs
│   ├── audio_processing/       # Wake word detection
│   ├── bootstrap/              # Application bootstrap & dependency injection
│   ├── constants/              # Constants
│   ├── core/                   # Core infrastructure (event bus, state management, task management, etc.)
│   ├── logging/                # Logging subsystem
│   ├── mcp/                    # MCP tool system
│   │   ├── mcp_server.py       # MCP server
│   │   └── tools/              # Tool modules (music/camera/screenshot/app/weather/volume)
│   ├── plugins/                # Plugin system (audio, UI, MCP, wake word, shortcuts)
│   ├── protocols/              # Communication protocols (WebSocket/MQTT)
│   ├── ui/                     # User interface
│   │   ├── gui/                # PySide6 + QML graphical interface
│   │   ├── cli/                # Command line interface
│   │   └── gpio/               # GPIO embedded interface
│   └── utils/                  # Utility functions
├── libs/                       # Third-party native libraries
│   ├── libopus/                # Opus audio codec library
│   └── webrtc_apm/             # WebRTC audio processing module
├── models/                     # Wake word models
├── assets/                     # Static resources
├── scripts/                    # Auxiliary scripts
├── documents/                  # VitePress documentation site
├── pyproject.toml              # Project configuration
└── build.json                  # Build configuration

Development Environment Setup

# Clone project
git clone https://github.com/huangjunsen0406/py-xiaozhi.git
cd py-xiaozhi

# Base install (CLI / GPIO mode)
uv sync                                    # Recommended (uv users)
# or: pip install -e .                    # pip users

# GUI mode (extra: PySide6 + qasync)
uv sync --extra gui                        # Recommended (uv users)
# or: pip install -e '.[gui]'             # pip users

# Full development environment (GUI + test / packaging tools)
uv sync --extra gui --group dev

# Code formatting
./format_code.sh

# Run program - GUI mode (default; requires gui extra)
python main.py

# Run program - CLI mode (base install is enough)
python main.py --mode cli

# Specify communication protocol
python main.py --protocol websocket  # WebSocket (