by OpenDemon
🎬 Fully Automated AI Video Agent · One sentence in, subtitled video out · Local Deployment
```bash
# Add to your Claude Code skills
git clone https://github.com/OpenDemon/Pilipili-AutoVideo
```

简体中文 · English · 繁體中文 · 日本語 · 한국어
📹 Demo · Replace this line with a GIF or video recording of the full workflow: topic input → scene review → final video output.

`docs/demo.gif` (to be recorded; see Contributing)
Pilipili-AutoVideo is a fully local, end-to-end AI video agent. Describe your video in one sentence and the system automatically handles script planning → keyframe image generation → TTS voiceover → video clip generation → FFmpeg assembly → subtitle burning, delivering a complete MP4 with subtitles plus a CapCut/JianYing draft project for final human touch-ups.
Key differentiators from similar tools (LibTV, Huobao Drama):

- Measured TTS duration controls video duration, so audio and video are always perfectly aligned
```
┌─────────────────────────────────────────────────────────────┐
│               Pilipili-AutoVideo Architecture               │
├─────────────────────────────────────────────────────────────┤
│ Frontend    React 19 + TailwindCSS · 3-panel Studio · WS    │
├─────────────────────────────────────────────────────────────┤
│ API Layer   FastAPI · WebSocket · REST · LangGraph Workflow │
├──────────────┬──────────────┬──────────────┬────────────────┤
│ Brain Layer  │ Vision Layer │ Motion Layer │ Voice Layer    │
│  DeepSeek    │ Nano Banana  │  Kling 3.0   │  MiniMax TTS   │
│  Kimi        │ (Gemini 3    │  Seedance    │  Speech 2.8 HD │
│  MiniMax LLM │  Pro Image)  │   1.5 Pro    │                │
│  Gemini      │              │              │                │
├──────────────┴──────────────┴──────────────┴────────────────┤
│ Assembly    Python + FFmpeg · xfade transitions · WhisperX  │
├─────────────────────────────────────────────────────────────┤
│ Draft Layer pyJianYingDraft · Auto CapCut/JianYing Draft    │
├─────────────────────────────────────────────────────────────┤
│ Memory      Mem0 · Local SQLite · Style Preference Twin     │
└─────────────────────────────────────────────────────────────┘
```
| Layer | Technology | Description |
| :--- | :--- | :--- |
| Brain (LLM) | DeepSeek / Kimi / MiniMax / Gemini | Script generation, scene breakdown, metadata |
| Vision (Image) | Nano Banana (Gemini 3 Pro Image) | 4K keyframe lock, subject consistency foundation |
| Motion (Video) | Kling 3.0 / Seedance 1.5 Pro | Dual-engine smart routing, I2V generation |
| Voice (TTS) | MiniMax Speech 2.8 HD | Best-in-class Chinese TTS, voice cloning support |
| Assembly | Python + FFmpeg + WhisperX | xfade transitions + subtitle burning + audio mix |
| Draft | pyJianYingDraft | Auto-generate CapCut/JianYing draft projects |
| Memory | Mem0 (local SQLite / cloud sync) | Style preference digital twin |
| Backend | Python 3.10+ + FastAPI + LangGraph | Async workflow orchestration, WebSocket push |
| Frontend | React 19 + TailwindCSS + Wouter | 3-panel studio, no mock data |
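The table above mentions "dual-engine smart routing" between Kling and Seedance but does not spell out the logic. The sketch below illustrates how such a router might look; the function name and keyword heuristics are illustrative assumptions, not the project's actual code:

```python
# Hypothetical sketch of dual-engine routing between Kling and Seedance.
# The keyword heuristics are assumptions for illustration only.

NARRATIVE_HINTS = {"story", "dialogue", "romance", "characters", "drama"}

def pick_engine(scene_description: str, default: str = "auto") -> str:
    """Route a scene to a video engine.

    Per the table above, Kling 3.0 favours action/product shots and
    Seedance 1.5 favours narrative/multi-character scenes.
    """
    if default in ("kling", "seedance"):
        return default  # user forced an engine via config or CLI flag
    words = set(scene_description.lower().split())
    return "seedance" if words & NARRATIVE_HINTS else "kling"

print(pick_engine("Ancient palace romance story"))  # seedance
print(pick_engine("Product close-up fast action"))  # kling
```

The real router presumably weighs richer signals (scene length, character count, camera motion), but the shape is the same: a per-scene decision that the `auto` engine setting delegates to.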
| Software | Version | Notes |
| :--- | :--- | :--- |
| Python | 3.10+ | Backend runtime |
| Node.js | 18+ | Frontend build |
| FFmpeg | 4.0+ | Video assembly (required) |
| Docker | 20.0+ | Container deployment (optional) |
macOS:

```bash
brew install ffmpeg
```

Ubuntu / Debian:

```bash
sudo apt update && sudo apt install ffmpeg
```

Windows: Download from ffmpeg.org and add to PATH. Verify:

```bash
ffmpeg -version
```
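The assembler drives FFmpeg through its command line, using the `xfade` filter for transitions. As a rough, hedged sketch of how a cross-fade command between two clips could be constructed (file names and timings are illustrative; the project's real `modules/assembler.py` may build its filtergraph differently):

```python
# Sketch: build an FFmpeg command that cross-fades two clips via xfade.
# Paths, fade duration, and offset below are illustrative only.

def build_xfade_cmd(clip_a: str, clip_b: str, out: str,
                    fade_s: float, offset_s: float) -> list[str]:
    """Return an ffmpeg argv that blends clip_a into clip_b with a fade."""
    filtergraph = (
        f"[0:v][1:v]xfade=transition=fade:"
        f"duration={fade_s}:offset={offset_s}[v]"
    )
    return [
        "ffmpeg", "-y",
        "-i", clip_a, "-i", clip_b,
        "-filter_complex", filtergraph,
        "-map", "[v]",
        out,
    ]

cmd = build_xfade_cmd("scene1.mp4", "scene2.mp4", "out.mp4", 0.5, 4.5)
print(" ".join(cmd))
```

In practice such a command would be executed with `subprocess.run(cmd, check=True)` once FFmpeg is on PATH; `offset` must be the start time of the fade within the first clip, which is why the exact clip durations matter.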
```bash
# 1. Clone the repository
git clone https://github.com/OpenDemon/Pilipili-AutoVideo.git
cd Pilipili-AutoVideo

# 2. Install Python dependencies
pip install -r requirements.txt

# 3. Copy config template
cp configs/config.example.yaml configs/config.yaml
```
Edit configs/config.yaml:
```yaml
llm:
  provider: deepseek        # deepseek | kimi | minimax | gemini
  api_key: "sk-xxxx"

image_gen:
  provider: nano_banana
  api_key: "AIzaSy-xxxx"    # Google AI Studio Key

video_gen:
  default_engine: kling     # kling | seedance | auto
  kling:
    api_key: "xxxx"
    api_secret: "xxxx"
  seedance:
    api_key: "xxxx"

tts:
  provider: minimax
  api_key: "xxxx"
  group_id: "xxxx"

memory:
  provider: local           # local | mem0_cloud
  # mem0_api_key: "m0-xxxx" # Fill in for cloud sync
```
💡 You can also configure API keys visually at http://localhost:3000/settings; no YAML editing required.
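The project tree below lists `core/config.py` as Pydantic-Settings based. As a stdlib-only sketch of the shape the parsed YAML takes (dataclasses stand in for the real Pydantic models; field names mirror the template above, everything else is an assumption):

```python
from dataclasses import dataclass

# Stdlib stand-in for the Pydantic Settings models in core/config.py.
# Field names mirror configs/config.example.yaml; this is a sketch,
# not the project's actual code.

@dataclass
class LLMConfig:
    provider: str        # deepseek | kimi | minimax | gemini
    api_key: str

@dataclass
class VideoGenConfig:
    default_engine: str  # kling | seedance | auto

# Roughly what yaml.safe_load("configs/config.yaml") would yield:
raw = {
    "llm": {"provider": "deepseek", "api_key": "sk-xxxx"},
    "video_gen": {"default_engine": "kling"},
}

llm = LLMConfig(**raw["llm"])
video = VideoGenConfig(**raw["video_gen"])
print(llm.provider, video.default_engine)  # deepseek kling
```

The real settings class additionally validates the nested `kling`/`seedance`/`tts`/`memory` sections; the point here is just that each top-level YAML key maps onto one typed model.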
```bash
# Basic usage
python cli/main.py run --topic "Cyberpunk Mars colony, 60 seconds, cold color palette"

# Specify engine
python cli/main.py run \
  --topic "Ancient palace romance story" \
  --engine seedance \
  --duration 90 \
  --add-subtitles

# List past projects
python cli/main.py list

# Help
python cli/main.py --help
```
```bash
# Start backend
python cli/main.py server

# In another terminal, start frontend
cd frontend
pnpm install && pnpm dev

# Visit http://localhost:3000
```
```bash
# Copy environment variables
cp .env.example .env

# Edit .env with your API keys

# Start all services
docker-compose up -d

# Visit http://localhost:3000
```
```
Pilipili-AutoVideo/
├── api/
│   └── server.py            # FastAPI backend + WebSocket
├── cli/
│   └── main.py              # Click CLI entrypoint
├── core/
│   └── config.py            # Global config (Pydantic Settings)
├── modules/
│   ├── llm.py               # LLM script generation (multi-provider)
│   ├── image_gen.py         # Nano Banana keyframe generation
│   ├── tts.py               # MiniMax TTS + duration measurement
│   ├── video_gen.py         # Kling 3.0 / Seedance 1.5 I2V
│   ├── assembler.py         # FFmpeg assembly + subtitle burning
│   ├── jianying_draft.py    # CapCut/JianYing draft generation
│   └── memory.py            # Mem0 memory system
├── frontend/                # React 19 frontend (3-panel studio)
├── skills/
│   └── SKILL.md             # Skill packaging spec
├── configs/
│   ├── config.example.yaml  # Config template
│   └── config.yaml          # Local config (gitignored)
├── tests/
│   └── test_pipeline.py     # Unit tests (18 test cases)
├── data/
│   ├── outputs/             # Generated videos and drafts
│   └── memory/              # Memory database
├── docker-compose.yml
├── Dockerfile.backend
├── requirements.txt
└── pyproject.toml
```
The core workflow is orchestrated by LangGraph in the following stages:
```
User Input
    │
    ▼
① Script Generation (LLM)
   DeepSeek/Kimi expands one sentence into a structured storyboard.
   Each scene: voiceover text, visual description, motion description,
   duration, transition, camera motion.
    │
    ▼
② Scene Review (optional human step)
   Web UI shows the scene list; the user can edit each scene before confirming.
   CLI mode: auto-approved.
    │
    ▼
③ Parallel Generation (Keyframe Images + TTS Voiceover)
   Nano Banana generates 4K keyframe images for each scene in parallel.
   MiniMax TTS generates voiceover for each scene, measuring exact ms duration.
    │
    ▼
④ Video Generation (Image-to-Video)
   Uses the keyframe as the first frame and the voiceover duration as the
   video duration. Kling 3.0 (action/product) or Seedance 1.5
   (narrative/multi-character).
    │
    ▼
⑤ Assembly (FFmpeg)
   xfade transitions + background music mixing + WhisperX subtitle burning.
    │
    ▼
⑥ Draft Export (CapCut/JianYing)
   Auto-generates a draft project preserving all scene assets and the timeline.
    │
    ▼
⑦ Memory Update (Mem0)
   After user rating, the system learns style preferences for future generations.
```
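The handoff from stage ③ to stage ④ hinges on measuring the voiceover's exact length so it can drive the video clip's duration. A stdlib sketch of that measurement (the real `modules/tts.py` may do this differently, e.g. by reading duration from the TTS API response):

```python
import io
import wave

def wav_duration_ms(wav_bytes: bytes) -> int:
    """Exact duration of a WAV clip in milliseconds (drives video length)."""
    with wave.open(io.BytesIO(wav_bytes)) as w:
        return round(w.getnframes() * 1000 / w.getframerate())

# Demo: synthesize one second of silence (16 kHz, mono, 16-bit PCM)
# to stand in for a TTS voiceover clip.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 16-bit samples
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)  # 16000 frames = 1 second

print(wav_duration_ms(buf.getvalue()))  # 1000
```

Because the millisecond count feeds directly into the I2V request's duration parameter, the generated clip can never drift out of sync with its narration, which is the comparison table's "Measure TTS duration → control video duration" point.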
| Dimension | LibTV | Huobao Drama | Pilipili |
| :--- | :---: | :---: | :---: |
| Interaction | Node canvas, manual trigger | Form-based, step-by-step | Natural language, one sentence |
| Audio-Video Sync | Manual editing | Not explicitly supported | **Measure TTS duration → control video duration** |