by OpenDemon
🎬 Fully Automated AI Video Agent · Generate a Subtitled Video from One Sentence · Local Deployment
# Add to your Claude Code skills
git clone https://github.com/OpenDemon/Pilipili-AutoVideo

Guides for using AI agent skills like Pilipili-AutoVideo.
简体中文 · English · 繁體中文 · 日本語 · 한국어
📹 Demo — Replace this line with a GIF or video recording of the full workflow: topic input → scene review → final video output.
docs/demo.gif (to be recorded — see Contributing)
Pilipili-AutoVideo (噼哩噼哩) is a fully local, end-to-end AI video agent. Describe your video in one sentence — the system automatically handles script planning → keyframe image generation → TTS voiceover → video clip generation → FFmpeg assembly → subtitle burning, delivering a complete MP4 with subtitles and a CapCut/JianYing draft project for final human touch-ups.
Key differentiators from similar tools (LibTV, Huobao Drama):
- Audio-driven duration — the TTS voiceover is generated and measured first, and each video clip is generated to match that exact duration, so audio and video are always perfectly aligned.
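The duration alignment described above can be sketched with the standard-library `wave` module. This is a minimal illustration, not the project's actual code: the idea is that the measured length of the TTS audio drives the requested length of the generated video clip.

```python
import wave


def audio_duration_ms(path: str) -> int:
    """Measure a WAV file's exact duration in milliseconds."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return round(frames * 1000 / rate)


if __name__ == "__main__":
    # Write 1.5 s of silence at 16 kHz mono, 16-bit, as a stand-in for TTS output.
    with wave.open("scene_01.wav", "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 24000)

    # The video clip for this scene would then be generated at exactly this length.
    print(audio_duration_ms("scene_01.wav"))  # 1500
```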
┌─────────────────────────────────────────────────────────────┐
│ Pilipili-AutoVideo Architecture │
├─────────────────────────────────────────────────────────────┤
│ Frontend React 19 + TailwindCSS · 3-panel Studio · WS │
├─────────────────────────────────────────────────────────────┤
│ API Layer FastAPI · WebSocket · REST · LangGraph Workflow │
├──────────────┬──────────────┬──────────────┬────────────────┤
│ Brain Layer │ Vision Layer│ Motion Layer│ Voice Layer │
│ DeepSeek │ Nano Banana │ Kling 3.0 │ MiniMax TTS │
│ Kimi │ (Gemini 3 │ Seedance │ Speech 2.8 HD │
│ MiniMax LLM │ Pro Image) │ 1.5 Pro │ │
│ Gemini │ │ │ │
├──────────────┴──────────────┴──────────────┴────────────────┤
│ Assembly Python + FFmpeg · xfade transitions · WhisperX │
├─────────────────────────────────────────────────────────────┤
│ Draft Layer pyJianYingDraft · Auto CapCut/JianYing Draft │
├─────────────────────────────────────────────────────────────┤
│ Memory Mem0 · Local SQLite · Style Preference Twin │
└─────────────────────────────────────────────────────────────┘
| Layer | Technology | Description |
| :--- | :--- | :--- |
| Brain (LLM) | DeepSeek / Kimi / MiniMax / Gemini | Script generation, scene breakdown, metadata |
| Vision (Image) | Nano Banana (Gemini 3 Pro Image) | 4K keyframe lock, subject consistency foundation |
| Motion (Video) | Kling 3.0 / Seedance 1.5 Pro | Dual-engine smart routing, I2V generation |
| Voice (TTS) | MiniMax Speech 2.8 HD | Best-in-class Chinese TTS, voice cloning support |
| Assembly | Python + FFmpeg + WhisperX | xfade transitions + subtitle burning + audio mix |
| Draft | pyJianYingDraft | Auto-generate CapCut/JianYing draft projects |
| Memory | Mem0 (local SQLite / cloud sync) | Style preference digital twin |
| Backend | Python 3.10+ + FastAPI + LangGraph | Async workflow orchestration, WebSocket push |
| Frontend | React 19 + TailwindCSS + Wouter | 3-panel studio, no mock data |
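As a rough illustration of the Assembly row, an xfade crossfade between two clips is expressed as an FFmpeg filtergraph. The helper below only builds the command line; the file names, durations, and transition settings are placeholders, not the project's real assembler API.

```python
def build_xfade_cmd(clip_a: str, clip_b: str, out: str,
                    dur_a: float, fade: float = 0.5) -> list[str]:
    """Build an ffmpeg command that crossfades clip_b over the tail of clip_a."""
    # The xfade transition starts `fade` seconds before clip_a ends.
    offset = dur_a - fade
    filtergraph = (
        f"[0:v][1:v]xfade=transition=fade:duration={fade}:offset={offset}[v]"
    )
    return [
        "ffmpeg", "-y",
        "-i", clip_a, "-i", clip_b,
        "-filter_complex", filtergraph,
        "-map", "[v]",
        out,
    ]


cmd = build_xfade_cmd("scene_01.mp4", "scene_02.mp4", "merged.mp4", dur_a=6.0)
print(" ".join(cmd))
```

For a multi-scene video the assembler would chain one xfade per cut, each offset computed from the accumulated clip durations.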
| Software | Version | Notes |
| :--- | :--- | :--- |
| Python | 3.10+ | Backend runtime |
| Node.js | 18+ | Frontend build |
| FFmpeg | 4.0+ | Video assembly (required) |
| Docker | 20.0+ | Container deployment (optional) |
macOS:
brew install ffmpeg
Ubuntu / Debian:
sudo apt update && sudo apt install ffmpeg
Windows: Download from ffmpeg.org and add to PATH. Verify:
ffmpeg -version
# 1. Clone the repository
git clone https://github.com/OpenDemon/Pilipili-AutoVideo.git
cd Pilipili-AutoVideo
# 2. Install Python dependencies
pip install -r requirements.txt
# 3. Copy config template
cp configs/config.example.yaml configs/config.yaml
Edit configs/config.yaml:
llm:
  provider: deepseek          # deepseek | kimi | minimax | gemini
  api_key: "sk-xxxx"

image_gen:
  provider: nano_banana
  api_key: "AIzaSy-xxxx"      # Google AI Studio Key

video_gen:
  default_engine: kling       # kling | seedance | auto
  kling:
    api_key: "xxxx"
    api_secret: "xxxx"
  seedance:
    api_key: "xxxx"

tts:
  provider: minimax
  api_key: "xxxx"
  group_id: "xxxx"

memory:
  provider: local             # local | mem0_cloud
  # mem0_api_key: "m0-xxxx"   # Fill in for cloud sync
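The structure of configs/config.yaml maps naturally onto typed settings objects. Below is a minimal, illustrative sketch using stdlib dataclasses; the real project uses Pydantic Settings (see core/config.py), and the `from_dict` loader and field subset here are assumptions for demonstration, with the raw dict standing in for the result of parsing the YAML file.

```python
from dataclasses import dataclass


@dataclass
class LLMConfig:
    provider: str   # deepseek | kimi | minimax | gemini
    api_key: str


@dataclass
class VideoGenConfig:
    default_engine: str  # kling | seedance | auto


@dataclass
class AppConfig:
    llm: LLMConfig
    video_gen: VideoGenConfig

    @classmethod
    def from_dict(cls, raw: dict) -> "AppConfig":
        # `raw` would come from e.g. yaml.safe_load(open("configs/config.yaml"))
        return cls(
            llm=LLMConfig(**raw["llm"]),
            video_gen=VideoGenConfig(
                default_engine=raw["video_gen"]["default_engine"]
            ),
        )


raw = {
    "llm": {"provider": "deepseek", "api_key": "sk-xxxx"},
    "video_gen": {"default_engine": "kling"},
}
cfg = AppConfig.from_dict(raw)
print(cfg.llm.provider)  # deepseek
```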
💡 You can also configure API keys visually at http://localhost:3000/settings — no YAML editing required.
# Basic usage
python cli/main.py run --topic "Cyberpunk Mars colony, 60 seconds, cold color palette"
# Specify engine
python cli/main.py run \
--topic "Ancient palace romance story" \
--engine seedance \
--duration 90 \
--add-subtitles
# List past projects
python cli/main.py list
# Help
python cli/main.py --help
# Start backend
python cli/main.py server
# In another terminal, start frontend
cd frontend
pnpm install && pnpm dev
# Visit http://localhost:3000
# Copy environment variables
cp .env.example .env
# Edit .env with your API keys
# Start all services
docker-compose up -d
# Visit http://localhost:3000
Pilipili-AutoVideo/
├── api/
│ └── server.py # FastAPI backend + WebSocket
├── cli/
│ └── main.py # Click CLI entrypoint
├── core/
│ └── config.py # Global config (Pydantic Settings)
├── modules/
│ ├── llm.py # LLM script generation (multi-provider)
│ ├── image_gen.py # Nano Banana keyframe generation
│ ├── tts.py # MiniMax TTS + duration measurement
│ ├── video_gen.py # Kling 3.0 / Seedance 1.5 I2V
│ ├── assembler.py # FFmpeg assembly + subtitle burning
│ ├── jianying_draft.py # CapCut/JianYing draft generation
│ └── memory.py # Mem0 memory system
├── frontend/ # React 19 frontend (3-panel studio)
├── skills/
│ └── SKILL.md # Skill packaging spec
├── configs/
│ ├── config.example.yaml # Config template
│ └── config.yaml # Local config (gitignored)
├── tests/
│ └── test_pipeline.py # Unit tests (18 test cases)
├── data/
│ ├── outputs/ # Generated videos and drafts
│ └── memory/ # Memory database
├── docker-compose.yml
├── Dockerfile.backend
├── requirements.txt
└── pyproject.toml
The core workflow is orchestrated by LangGraph in the following stages:
User Input
│
▼
① Script Generation (LLM)
│ DeepSeek/Kimi expands one sentence into a structured storyboard
│ Each scene: voiceover text, visual description, motion description,
│ duration, transition, camera motion
│
▼
② Scene Review (optional human step)
│ Web UI shows scene list; user can edit each scene before confirming
│ CLI mode: auto-approved
│
▼
③ Parallel Generation (Keyframe Images + TTS Voiceover)
│ Nano Banana generates 4K keyframe images for each scene in parallel
│ MiniMax TTS generates voiceover for each scene, measuring exact ms duration
│
▼
④ Video Generation (Image-to-Video)
│ Uses keyframe as first frame, voiceover duration as video duration
│ Kling 3.0 (action/product) or Seedance 1.5 (narrative/multi-character)
│
▼
⑤ Assembly (FFmpeg)
│ xfade transitions + background music mixing + WhisperX subtitle burning
│
▼
⑥ Draft Export (CapCut/JianYing)
│ Auto-generates draft project preserving all scene assets and timeline
│
▼
⑦ Memory Update (Mem0)
After user rating, system learns style preferences for future generations
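The staged flow above can be sketched as a linear state machine. The plain-Python sketch below mirrors how a LangGraph graph threads shared state through the stages; the stage names follow the diagram, but the function bodies are stubs, not the project's implementation.

```python
from typing import Callable

State = dict  # shared workflow state passed between stages


def script_generation(state: State) -> State:
    # ① LLM expands the one-sentence topic into a structured storyboard (stubbed).
    state["scenes"] = [{"text": f"scene for: {state['topic']}", "duration_ms": None}]
    return state


def tts_and_keyframes(state: State) -> State:
    # ③ TTS renders each scene's voiceover and measures its exact duration (stubbed).
    for scene in state["scenes"]:
        scene["duration_ms"] = 1500
    return state


def video_generation(state: State) -> State:
    # ④ Each clip is generated to match its scene's measured audio duration.
    state["clips"] = [s["duration_ms"] for s in state["scenes"]]
    return state


PIPELINE: list[Callable[[State], State]] = [
    script_generation,
    tts_and_keyframes,
    video_generation,
]


def run(topic: str) -> State:
    state: State = {"topic": topic}
    for stage in PIPELINE:
        state = stage(state)
    return state


result = run("Cyberpunk Mars colony")
print(result["clips"])  # [1500]
```

In the real system the stages run under LangGraph's async orchestration, with the optional human review step (②) gating progression between ① and ③.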
| Dimension | LibTV | Huobao Drama | Pilipili |
| :--- | :---: | :---: | :---: |
| Interaction | Node canvas, manual trigger | Form-based, step-by-step | Natural language, one sentence |
| Audio-Video Sync | Manual editing | Not explicitly supported | **Measure TTS duration → control video duration** |