by OpenDemon
🎬 Fully Automated AI Video Agent · One sentence in, a finished subtitled video out · Local Deployment
# Add to your Claude Code skills
git clone https://github.com/OpenDemon/Pilipili-AutoVideo
Last scanned: 5/30/2026
{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-30T15:52:50.459Z",
  "npmAuditRan": true,
  "pipAuditRan": false
}
Pilipili-AutoVideo is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by OpenDemon. It has 180 GitHub stars.
Pilipili-AutoVideo passed SkillsLLM's automated security scan (a dependency vulnerability audit plus prompt-injection heuristics) with no high-severity issues. The full report is in the Security Report section on this page.
Clone the repository with "git clone https://github.com/OpenDemon/Pilipili-AutoVideo" and add it to your Claude Code skills directory (see the Installation section above).
Pilipili-AutoVideo is primarily written in Python. It is open-source under OpenDemon on GitHub, so you can review or fork the full source.
📹 Demo — Replace this line with a GIF or video recording of the full workflow: topic input → scene review → final video output.
docs/demo.gif (to be recorded; see Contributing)
Pilipili-AutoVideo (噼哩噼哩) is a fully local, end-to-end AI video agent. Describe your video in one sentence — the system automatically handles script planning → keyframe image generation → TTS voiceover → video clip generation → FFmpeg assembly → subtitle burning, delivering a complete MP4 with subtitles and a CapCut/JianYing draft project for final human touch-ups.
Key differentiators from similar tools (LibTV, Huobao Drama):
duration; audio and video are always perfectly aligned
┌─────────────────────────────────────────────────────────────┐
│ Pilipili-AutoVideo Architecture │
├─────────────────────────────────────────────────────────────┤
│ Frontend React 19 + TailwindCSS · 3-panel Studio · WS │
├─────────────────────────────────────────────────────────────┤
│ API Layer FastAPI · WebSocket · REST · LangGraph Workflow │
├──────────────┬──────────────┬──────────────┬────────────────┤
│ Brain Layer │ Vision Layer│ Motion Layer│ Voice Layer │
│ DeepSeek │ Nano Banana │ Kling 3.0 │ MiniMax TTS │
│ Kimi │ (Gemini 3 │ Seedance │ Speech 2.8 HD │
│ MiniMax LLM │ Pro Image) │ 1.5 Pro │ │
│ Gemini │ │ │ │
├──────────────┴──────────────┴──────────────┴────────────────┤
│ Assembly Python + FFmpeg · xfade transitions · WhisperX │
├─────────────────────────────────────────────────────────────┤
│ Draft Layer pyJianYingDraft · Auto CapCut/JianYing Draft │
├─────────────────────────────────────────────────────────────┤
│ Memory Mem0 · Local SQLite · Style Preference Twin │
└─────────────────────────────────────────────────────────────┘
| Layer | Technology | Description |
|---|---|---|
| Brain (LLM) | DeepSeek / Kimi / MiniMax / Gemini | Script generation, scene breakdown, metadata |
| Vision (Image) | Nano Banana (Gemini 3 Pro Image) | 4K keyframe lock, subject consistency foundation |
| Motion (Video) | Kling 3.0 / Seedance 1.5 Pro | Dual-engine smart routing, I2V generation |
| Voice (TTS) | MiniMax Speech 2.8 HD | Best-in-class Chinese TTS, voice cloning support |
| Assembly | Python + FFmpeg + WhisperX | xfade transitions + subtitle burning + audio mix |
| Draft | pyJianYingDraft | Auto-generate CapCut/JianYing draft projects |
| Memory | Mem0 (local SQLite / cloud sync) | Style preference digital twin |
| Backend | Python 3.10+ + FastAPI + LangGraph | Async workflow orchestration, WebSocket push |
| Frontend | React 19 + TailwindCSS + Wouter | 3-panel studio, no mock data |
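The xfade transitions in the Assembly row rest on one piece of arithmetic: each crossfade overlaps two clips, so every later transition starts earlier by the fade length. The sketch below builds a `filter_complex` string for FFmpeg's `xfade` filter from a list of clip durations. It illustrates the general technique and is not taken from `modules/assembler.py`; the function name `xfade_filter` and the default fade length are assumptions made for illustration.

```python
def xfade_filter(durations, fade=0.5, transition="fade"):
    """Chain N clips with FFmpeg's xfade filter.

    Returns (filter_complex_string, total_length_in_seconds).
    Hypothetical helper: the real assembler may build its filter differently.
    """
    if len(durations) < 2:
        raise ValueError("need at least two clips to crossfade")
    if any(d <= fade for d in durations):
        raise ValueError("every clip must be longer than the fade")

    parts = []
    prev_label = "[0:v]"
    elapsed = durations[0]  # length of the merged stream so far
    for i in range(1, len(durations)):
        # The transition starts `fade` seconds before the merged stream ends.
        offset = elapsed - fade
        out_label = f"[v{i}]"
        parts.append(
            f"{prev_label}[{i}:v]xfade=transition={transition}"
            f":duration={fade:.3f}:offset={offset:.3f}{out_label}"
        )
        prev_label = out_label
        # The overlap is shared, so the merged length grows by d - fade.
        elapsed = offset + durations[i]
    return ";".join(parts), elapsed


graph, total = xfade_filter([4.0, 5.0, 3.0], fade=1.0)
print(graph)
print(total)
```

With three clips of 4, 5, and 3 seconds and a 1-second fade, the merged video is 10 seconds long rather than 12, because two overlaps of 1 second each are shared between neighbouring clips.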
| Software | Version | Notes |
|---|---|---|
| Python | 3.10+ | Backend runtime |
| Node.js | 18+ | Frontend build |
| FFmpeg | 4.0+ | Video assembly (required) |
| Docker | 20.0+ | Container deployment (optional) |
macOS:
brew install ffmpeg
Ubuntu / Debian:
sudo apt update && sudo apt install ffmpeg
Windows: download from ffmpeg.org and add it to PATH.
Verify the installation:
ffmpeg -version
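The same check can be scripted, which is handy before a long generation run. This is a minimal sketch, not part of the project: the names `parse_ffmpeg_version` and `ffmpeg_ok` are hypothetical, and the 4.0 minimum comes from the requirements table. Builds whose version string is not numeric (for example git snapshots) are reported as unknown and treated as failing the check.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_version(first_line):
    """Return (major, minor) from the first line of `ffmpeg -version`, or None."""
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", first_line)
    return (int(match.group(1)), int(match.group(2))) if match else None


def ffmpeg_ok(minimum=(4, 0)):
    """True if ffmpeg is on PATH and reports a numeric version >= minimum."""
    path = shutil.which("ffmpeg")
    if path is None:
        return False
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    version = parse_ffmpeg_version(lines[0]) if lines else None
    # Unknown (non-numeric) versions are treated as not satisfying the check.
    return version is not None and version >= minimum


if __name__ == "__main__":
    print("FFmpeg OK" if ffmpeg_ok() else "FFmpeg missing, too old, or unrecognised")
```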
# 1. Clone the repository
git clone https://github.com/OpenDemon/Pilipili-AutoVideo.git
cd Pilipili-AutoVideo
# 2. Install Python dependencies
pip install -r requirements.txt
# 3. Copy config template
cp configs/config.example.yaml configs/config.yaml
Edit configs/config.yaml:
llm:
  provider: deepseek          # deepseek | kimi | minimax | gemini
  api_key: "sk-xxxx"
image_gen:
  provider: nano_banana
  api_key: "AIzaSy-xxxx"      # Google AI Studio Key
video_gen:
  default_engine: kling       # kling | seedance | auto
  kling:
    api_key: "xxxx"
    api_secret: "xxxx"
  seedance:
    api_key: "xxxx"
tts:
  provider: minimax
  api_key: "xxxx"
  group_id: "xxxx"
memory:
  provider: local             # local | mem0_cloud
  # mem0_api_key: "m0-xxxx"   # Fill in for cloud sync
💡 You can also configure API keys visually at http://localhost:3000/settings, with no YAML editing required.
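A common failure mode is starting a run with a template value such as `"xxxx"` still in place. The sketch below checks a configuration dictionary (as it would look after loading `configs/config.yaml` with any YAML parser) against the option lists given in the template comments. It is an illustration only: the project's own validation lives in `core/config.py` and may differ, and the names `validate_config`, `ALLOWED`, and `ENGINE_KEYS` are hypothetical.

```python
ALLOWED = {
    ("llm", "provider"): {"deepseek", "kimi", "minimax", "gemini"},
    ("video_gen", "default_engine"): {"kling", "seedance", "auto"},
    ("memory", "provider"): {"local", "mem0_cloud"},
}
REQUIRED = [
    ("llm", "api_key"),
    ("image_gen", "api_key"),
    ("tts", "api_key"),
    ("tts", "group_id"),
]
ENGINE_KEYS = {
    "kling": [("video_gen", "kling", "api_key"), ("video_gen", "kling", "api_secret")],
    "seedance": [("video_gen", "seedance", "api_key")],
}


def lookup(cfg, path):
    """Walk nested dicts along `path`; return None if any key is missing."""
    node = cfg
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def validate_config(cfg):
    """Return a list of human-readable problems (empty list means OK)."""
    problems = []
    for path, allowed in ALLOWED.items():
        value = lookup(cfg, path)
        if value not in allowed:
            problems.append(f"{'.'.join(path)}: expected one of {sorted(allowed)}, got {value!r}")

    # Only demand credentials for the engine(s) that will actually be used.
    engine = lookup(cfg, ("video_gen", "default_engine"))
    engines = list(ENGINE_KEYS) if engine == "auto" else [engine] if engine in ENGINE_KEYS else []
    required = REQUIRED + [p for name in engines for p in ENGINE_KEYS[name]]

    for path in required:
        value = lookup(cfg, path)
        if not value or "xxxx" in str(value):
            problems.append(f"{'.'.join(path)}: missing or still a placeholder")
    return problems
```

Returning a list instead of raising on the first error lets a caller report every unfinished field in one pass.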
# Basic usage
python cli/main.py run --topic "Cyberpunk Mars colony, 60 seconds, cold color palette"
# Specify engine
python cli/main.py run \
--topic "Ancient palace romance story" \
--engine seedance \
--duration 90 \
--add-subtitles
# List past projects
python cli/main.py list
# Help
python cli/main.py --help
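For batch jobs, the `run` command can be assembled programmatically. The helper below only uses the flags shown in the examples above (`--topic`, `--engine`, `--duration`, `--add-subtitles`); the function name `build_run_command` is hypothetical, and which engine values the CLI accepts should be confirmed with `python cli/main.py --help`.

```python
import sys


def build_run_command(topic, engine=None, duration=None, add_subtitles=False):
    """Build the argv list for `python cli/main.py run ...`.

    Passing a list (not a shell string) avoids quoting problems with
    topics that contain spaces, commas, or non-ASCII characters.
    """
    cmd = [sys.executable, "cli/main.py", "run", "--topic", topic]
    if engine is not None:
        cmd += ["--engine", engine]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if add_subtitles:
        cmd.append("--add-subtitles")
    return cmd


if __name__ == "__main__":
    topics = ["Cyberpunk Mars colony, 60 seconds, cold color palette",
              "Ancient palace romance story"]
    for topic in topics:
        print(build_run_command(topic, engine="seedance", duration=90, add_subtitles=True))
```

Each list can be handed to `subprocess.run(cmd, check=True)` from the repository root to run the jobs one after another.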
# Start backend
python cli/main.py server
# In another terminal, start frontend
cd frontend
pnpm install && pnpm dev
# Visit http://localhost:3000
# Copy environment variables
cp .env.example .env
# Edit .env with your API keys
# Start all services
docker-compose up -d
# Visit http://localhost:3000
Pilipili-AutoVideo/
├── api/
│ └── server.py # FastAPI backend + WebSocket
├── cli/
│ └── main.py # Click CLI entrypoint
├── core/
│ └── config.py # Global config (Pydantic Settings)
├── modules/
│ ├── llm.py # LLM script generation (multi-provider)
│ ├── image_gen.py # Nano Banana keyframe generation
│ ├── tts.py # MiniMax TTS + duration measurement
│ ├── video_gen.py # Kling 3.0 / Seedance 1.5 I2V
│ ├── assembler.py # FFmpeg assembly + subtitle burning
│ ├── jianying_draft.py # CapCut/JianYing draft generation
│ └── memory.py # Mem0 memory system
├── frontend/ # React 19 frontend (3-panel studio)
├── skills/
│ └── SKILL.md # Skill packaging spec
├── configs/
│ ├── config.example.yaml # Config template
│ └── config.yaml # Local config (gitignored)
├── tests/
│ └── test_pipeline.py # Unit tests (18 test cases)
├── data/
│ ├── outputs/ # Generated videos and drafts
│ └── memory/ # Memory database
├── docker-compose.yml
├── Dockerfile.backend
├── requirements.txt
└── pyproject.toml
The core workflow is orchestrated by LangGraph in the following stages:
User Input
│
▼
① Script Generation (LLM)
│ DeepSeek/Kimi expands one sentence into a structured storyboard
│ Each scene: voiceover text, visual description, motion description,
│ duration, transition, camera motion
│
▼
② Scene Review (optional human step)
│ Web UI shows scene list; user can edit each scene before confirming
│ CLI mode: auto-approved
│
▼
③ Parallel Generation (Keyframe Images + TTS Voiceover)
│ Nano Banana generates 4K keyframe images for each scene in parallel
│ MiniMax TTS generates voiceover for each scene, measuring exact ms duration
│
▼
④ Video Generation (Image-to-Video)
│ Uses keyframe as first frame, voiceover duration as video duration
│ Kling 3.0 (action/product) or Seedance 1.5 (narrative/multi-character)
│
▼
⑤ Assembly (FFmpeg)
│ xfade transitions + background music mixing + WhisperX subtitle burning
│
▼
⑥ Draft Export (CapCut/JianYing)
│ Auto-generates draft project preserving all scene assets and timeline
│
▼
⑦ Memory Update (Mem0)
After user rating, system learns style preferences for future generations
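Stages ③ and ④ carry the core idea: keyframe and voiceover are produced concurrently, and the measured voiceover length then fixes the clip length. The sketch below shows that ordering with plain `asyncio` rather than LangGraph. Everything in it is a stand-in: the two generator coroutines fake the API calls, the `kind` field used for routing is a hypothetical scene attribute, and the 250 ms per character figure exists only to make the example deterministic.

```python
import asyncio


async def generate_keyframe(scene):
    """Stand-in for the image API call; returns a fake file name."""
    await asyncio.sleep(0)
    return f"keyframe_{scene['id']}.png"


async def generate_voiceover(scene):
    """Stand-in for the TTS call; returns (file name, duration in ms)."""
    await asyncio.sleep(0)
    duration_ms = len(scene["voiceover"]) * 250  # fake measurement
    return f"voice_{scene['id']}.mp3", duration_ms


def pick_engine(scene):
    """Routing rule from stage 4: Kling for action/product, else Seedance."""
    return "kling" if scene["kind"] in {"action", "product"} else "seedance"


async def prepare_scene(scene):
    # Stage 3: image and audio for one scene are generated concurrently.
    keyframe, (audio, duration_ms) = await asyncio.gather(
        generate_keyframe(scene), generate_voiceover(scene)
    )
    # Stage 4 input: the voiceover length becomes the requested clip length.
    return {
        "id": scene["id"],
        "first_frame": keyframe,
        "audio": audio,
        "video_seconds": duration_ms / 1000,
        "engine": pick_engine(scene),
    }


async def plan_clips(scenes):
    """Prepare all scenes in parallel; results keep the input order."""
    return await asyncio.gather(*(prepare_scene(s) for s in scenes))


scenes = [
    {"id": 1, "voiceover": "a" * 8, "kind": "product"},
    {"id": 2, "voiceover": "a" * 12, "kind": "narrative"},
]
for plan in asyncio.run(plan_clips(scenes)):
    print(plan)
```

Because the clip length is derived from the audio and not chosen up front, the assembly stage does not need to stretch or trim either track to keep them in sync.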
| Dimension | LibTV | Huobao Drama | Pilipili |
|---|---|---|---|
| Interaction | Node canvas, manual trigger | Form-based, step-by-step | Natural language, one sentence |
| Audio-Video Sync | Manual editing | Not explicitly supported | **Measure TTS duration → control video duration** |