Pilipili-AutoVideo

Name: Pilipili-AutoVideo
Author: OpenDemon

Verified

🎬 Fully Automated AI Video Agent · One Sentence to a Finished Video with Subtitles · Local Deployment

180 stars

27 forks

Python

Installation

# Add to your Claude Code skills
git clone https://github.com/OpenDemon/Pilipili-AutoVideo


Security Report (Verified)

Last scanned: 5/30/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-30T15:52:50.459Z",
  "npmAuditRan": true,
  "pipAuditRan": false
}

README.md

Frequently Asked Questions

What is Pilipili-AutoVideo?

Pilipili-AutoVideo is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by OpenDemon. It is a fully automated, locally deployed AI video agent that turns a one-sentence prompt into a finished video with subtitles. It has 180 GitHub stars.

Is Pilipili-AutoVideo safe to use?

Yes. Pilipili-AutoVideo passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install Pilipili-AutoVideo?

Clone the repository with "git clone https://github.com/OpenDemon/Pilipili-AutoVideo" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is Pilipili-AutoVideo written in?

Pilipili-AutoVideo is primarily written in Python. It is open-source under OpenDemon on GitHub, so you can review or fork the full source.

Are there alternatives to Pilipili-AutoVideo?

Yes. SkillsLLM lists many other AI Agents skills that you can browse and compare with Pilipili-AutoVideo.


🎬 Pilipili-AutoVideo · 噼哩噼哩

Fully Automated AI Video Agent · Local Deployment · One Sentence to Final Cut

简体中文 · English · 繁體中文 · 日本語 · 한국어

📹 Demo — Replace this line with a GIF or video recording of the full workflow: topic input → scene review → final video output. docs/demo.gif (to be recorded — see Contributing)

📖 Overview

Pilipili-AutoVideo (噼哩噼哩) is a fully local, end-to-end AI video agent. Describe your video in one sentence — the system automatically handles script planning → keyframe image generation → TTS voiceover → video clip generation → FFmpeg assembly → subtitle burning, delivering a complete MP4 with subtitles and a CapCut/JianYing draft project for final human touch-ups.

Key differentiators from similar tools (LibTV, Huobao Drama):

Absolute Audio-Video Sync: TTS voiceover is generated first and its exact millisecond duration is measured, then used to control video duration — audio and video are always perfectly aligned
Keyframe Lock Strategy: Nano Banana generates a 4K keyframe image first, then Image-to-Video (I2V) produces the clip — ensuring consistently high visual quality with no subject drift
Digital Twin Memory: Mem0-powered memory system learns your style preferences over time, injecting your creative habits into every new generation
Skill Integration: The entire workflow is packaged as a standard Skill, callable by any AI Agent
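The audio-first sync idea above can be illustrated with a minimal sketch. It assumes the voiceover is available as PCM WAV bytes and uses only the standard library; the function names and the `padding_ms` tail are hypothetical and are not taken from the project's `tts.py`.

```python
import io
import wave


def wav_duration_ms(wav_bytes: bytes) -> int:
    """Return the duration of a PCM WAV payload in whole milliseconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return round(wav.getnframes() * 1000 / wav.getframerate())


def clip_seconds_for(voice_ms: int, padding_ms: int = 200) -> float:
    """Video length that covers the voiceover plus a short tail.

    padding_ms is an illustrative assumption, not a project setting.
    """
    return (voice_ms + padding_ms) / 1000


def make_silent_wav(duration_ms: int, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV, standing in for real TTS output."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * duration_ms // 1000))
    return buf.getvalue()


if __name__ == "__main__":
    audio = make_silent_wav(2500)
    voice_ms = wav_duration_ms(audio)
    print(voice_ms, clip_seconds_for(voice_ms))
```

Because the clip length is derived from the measured audio instead of a guessed value, the video request for each scene can never be shorter than its narration.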

🎯 Core Features

🤖 Natural Language Driven: One sentence → full video, no manual node operations required
🎨 Premium Visual Quality: Nano Banana keyframe lock + Kling 3.0 / Seedance 1.5 dual-engine, exceptional subject consistency
🔊 Perfect Audio-Video Sync: Measure voiceover duration first, control video duration accordingly — never misaligned
✂️ CapCut/JianYing Draft Export: AI handles 90%, you fine-tune the last 10% in CapCut
🧠 Gets Smarter Over Time: Mem0 memory system learns your aesthetic preferences with every project
🔌 Agent-Callable: Packaged as a standard Skill, seamlessly integrates into larger automation workflows

🛠️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                  Pilipili-AutoVideo Architecture             │
├─────────────────────────────────────────────────────────────┤
│  Frontend    React 19 + TailwindCSS · 3-panel Studio · WS   │
├─────────────────────────────────────────────────────────────┤
│  API Layer   FastAPI · WebSocket · REST · LangGraph Workflow │
├──────────────┬──────────────┬──────────────┬────────────────┤
│  Brain Layer │  Vision Layer│  Motion Layer│  Voice Layer   │
│  DeepSeek    │  Nano Banana │  Kling 3.0   │  MiniMax TTS   │
│  Kimi        │  (Gemini 3   │  Seedance    │  Speech 2.8 HD │
│  MiniMax LLM │   Pro Image) │  1.5 Pro     │                │
│  Gemini      │              │              │                │
├──────────────┴──────────────┴──────────────┴────────────────┤
│  Assembly    Python + FFmpeg · xfade transitions · WhisperX  │
├─────────────────────────────────────────────────────────────┤
│  Draft Layer pyJianYingDraft · Auto CapCut/JianYing Draft    │
├─────────────────────────────────────────────────────────────┤
│  Memory      Mem0 · Local SQLite · Style Preference Twin     │
└─────────────────────────────────────────────────────────────┘

| Layer | Technology | Description |
| --- | --- | --- |
| Brain (LLM) | DeepSeek / Kimi / MiniMax / Gemini | Script generation, scene breakdown, metadata |
| Vision (Image) | Nano Banana (Gemini 3 Pro Image) | 4K keyframe lock, subject consistency foundation |
| Motion (Video) | Kling 3.0 / Seedance 1.5 Pro | Dual-engine smart routing, I2V generation |
| Voice (TTS) | MiniMax Speech 2.8 HD | Best-in-class Chinese TTS, voice cloning support |
| Assembly | Python + FFmpeg + WhisperX | xfade transitions + subtitle burning + audio mix |
| Draft | pyJianYingDraft | Auto-generate CapCut/JianYing draft projects |
| Memory | Mem0 (local SQLite / cloud sync) | Style preference digital twin |
| Backend | Python 3.10+ + FastAPI + LangGraph | Async workflow orchestration, WebSocket push |
| Frontend | React 19 + TailwindCSS + Wouter | 3-panel studio, no mock data |
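The Assembly layer's xfade transitions depend on one piece of arithmetic: each cross-fade must start at an offset measured on the already merged timeline. The sketch below builds an FFmpeg `-filter_complex` string for a chain of clips. It is an illustration under assumptions (a single `fade` transition type, the same fade length everywhere, hypothetical function names) and not the project's `assembler.py`.

```python
def xfade_filtergraph(durations: list[float], fade: float = 0.5) -> str:
    """Chain N video inputs with FFmpeg's xfade filter.

    durations: length of each input clip in seconds, in input order.
    fade: cross-fade length in seconds (assumed identical for every cut).
    """
    if len(durations) < 2:
        raise ValueError("need at least two clips to cross-fade")
    parts = []
    prev = "[0:v]"
    elapsed = durations[0]  # length of the timeline merged so far
    for i in range(1, len(durations)):
        offset = elapsed - fade  # the fade starts `fade` seconds before the end
        out = f"[v{i}]"
        parts.append(
            f"{prev}[{i}:v]xfade=transition=fade:"
            f"duration={fade:.3f}:offset={offset:.3f}{out}"
        )
        prev = out
        elapsed = offset + durations[i]  # the overlap shortens the total
    return ";".join(parts)


def total_length(durations: list[float], fade: float = 0.5) -> float:
    """Final video length: every cut overlaps by one fade."""
    return sum(durations) - fade * (len(durations) - 1)


if __name__ == "__main__":
    print(xfade_filtergraph([4.0, 3.0, 5.0]))
    print(total_length([4.0, 3.0, 5.0]))
```

Each cross-fade shortens the merged video by one fade length, so the voiceover track needs the same offsets if audio and picture are to stay aligned. xfade also expects its inputs to share resolution, frame rate, and pixel format.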

🚀 Quick Start

📋 Requirements

| Software | Version | Notes |
| --- | --- | --- |
| Python | 3.10+ | Backend runtime |
| Node.js | 18+ | Frontend build |
| FFmpeg | 4.0+ | Video assembly (required) |
| Docker | 20.0+ | Container deployment (optional) |

Install FFmpeg

macOS:

brew install ffmpeg

Ubuntu / Debian:

sudo apt update && sudo apt install ffmpeg

Windows: Download from ffmpeg.org and add it to PATH.

Verify the installation on any platform:

ffmpeg -version

Clone & Install

# 1. Clone the repository
git clone https://github.com/OpenDemon/Pilipili-AutoVideo.git
cd Pilipili-AutoVideo

# 2. Install Python dependencies
pip install -r requirements.txt

# 3. Copy config template
cp configs/config.example.yaml configs/config.yaml

Configure API Keys

Edit configs/config.yaml:

llm:
  provider: deepseek          # deepseek | kimi | minimax | gemini
  api_key: "sk-xxxx"

image_gen:
  provider: nano_banana
  api_key: "AIzaSy-xxxx"      # Google AI Studio Key

video_gen:
  default_engine: kling       # kling | seedance | auto
  kling:
    api_key: "xxxx"
    api_secret: "xxxx"
  seedance:
    api_key: "xxxx"

tts:
  provider: minimax
  api_key: "xxxx"
  group_id: "xxxx"

memory:
  provider: local             # local | mem0_cloud
  # mem0_api_key: "m0-xxxx"  # Fill in for cloud sync

💡 You can also configure API keys visually at http://localhost:3000/settings — no YAML editing required.
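A quick sanity check on the parsed config can catch typos before any paid API call is made. The sketch below assumes the YAML has already been loaded into a plain dict (for example with a YAML parser of your choice) and checks only the option values listed in the template comments above. The helper name `config_problems` is hypothetical, and the project's own validation in `core/config.py` may work differently.

```python
# Allowed values copied from the comments in the config template.
ALLOWED = {
    ("llm", "provider"): {"deepseek", "kimi", "minimax", "gemini"},
    ("video_gen", "default_engine"): {"kling", "seedance", "auto"},
    ("memory", "provider"): {"local", "mem0_cloud"},
}


def config_problems(cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        value = (cfg.get(section) or {}).get(key)
        if value not in allowed:
            problems.append(
                f"{section}.{key}={value!r} (expected one of {sorted(allowed)})"
            )
    # Flag API keys that are empty or still the "xxxx" placeholder.
    for section in ("llm", "image_gen", "tts"):
        api_key = str((cfg.get(section) or {}).get("api_key", ""))
        if not api_key or "xxxx" in api_key:
            problems.append(f"{section}.api_key is missing or still a placeholder")
    return problems


if __name__ == "__main__":
    example = {
        "llm": {"provider": "deepseek", "api_key": "sk-xxxx"},
        "image_gen": {"provider": "nano_banana", "api_key": "real-key"},
        "video_gen": {"default_engine": "kling"},
        "tts": {"provider": "minimax", "api_key": "real-key"},
        "memory": {"provider": "local"},
    }
    for line in config_problems(example):
        print(line)
```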

Option 1: CLI (Recommended for debugging)

# Basic usage
python cli/main.py run --topic "Cyberpunk Mars colony, 60 seconds, cold color palette"

# Specify engine
python cli/main.py run \
  --topic "Ancient palace romance story" \
  --engine seedance \
  --duration 90 \
  --add-subtitles

# List past projects
python cli/main.py list

# Help
python cli/main.py --help
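For batch jobs, the CLI can be driven from a small script. The sketch below only assembles the argument list from the flags shown above (`--topic`, `--engine`, `--duration`, `--add-subtitles`); the helper name `build_run_command` is hypothetical, and it assumes the script is run from the repository root.

```python
import subprocess
import sys


def build_run_command(topic, engine=None, duration=None, add_subtitles=False):
    """Assemble the argv for `python cli/main.py run` from optional settings."""
    cmd = [sys.executable, "cli/main.py", "run", "--topic", topic]
    if engine is not None:
        cmd += ["--engine", engine]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if add_subtitles:
        cmd.append("--add-subtitles")
    return cmd


def run_batch(topics, **options):
    """Run the CLI once per topic and stop at the first failure."""
    for topic in topics:
        subprocess.run(build_run_command(topic, **options), check=True)


if __name__ == "__main__":
    print(build_run_command("Ancient palace romance story",
                            engine="seedance", duration=90, add_subtitles=True))
```

Passing the arguments as a list avoids shell quoting problems with topics that contain spaces or punctuation.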

Option 2: Web UI (Recommended for daily use)

# Start backend
python cli/main.py server

# In another terminal, start frontend (requires pnpm)
cd frontend
pnpm install && pnpm dev

# Visit http://localhost:3000

Option 3: Docker Compose (Recommended for production)

# Copy environment variables
cp .env.example .env
# Edit .env with your API keys

# Start all services
docker-compose up -d

# Visit http://localhost:3000

📦 Project Structure

Pilipili-AutoVideo/
├── api/
│   └── server.py           # FastAPI backend + WebSocket
├── cli/
│   └── main.py             # Click CLI entrypoint
├── core/
│   └── config.py           # Global config (Pydantic Settings)
├── modules/
│   ├── llm.py              # LLM script generation (multi-provider)
│   ├── image_gen.py        # Nano Banana keyframe generation
│   ├── tts.py              # MiniMax TTS + duration measurement
│   ├── video_gen.py        # Kling 3.0 / Seedance 1.5 I2V
│   ├── assembler.py        # FFmpeg assembly + subtitle burning
│   ├── jianying_draft.py   # CapCut/JianYing draft generation
│   └── memory.py           # Mem0 memory system
├── frontend/               # React 19 frontend (3-panel studio)
├── skills/
│   └── SKILL.md            # Skill packaging spec
├── configs/
│   ├── config.example.yaml # Config template
│   └── config.yaml         # Local config (gitignored)
├── tests/
│   └── test_pipeline.py    # Unit tests (18 test cases)
├── data/
│   ├── outputs/            # Generated videos and drafts
│   └── memory/             # Memory database
├── docker-compose.yml
├── Dockerfile.backend
├── requirements.txt
└── pyproject.toml

🎬 Workflow Deep Dive

The core workflow is orchestrated by LangGraph in the following stages:

User Input
  │
  ▼
① Script Generation (LLM)
  │  DeepSeek/Kimi expands one sentence into a structured storyboard
  │  Each scene: voiceover text, visual description, motion description,
  │              duration, transition, camera motion
  │
  ▼
② Scene Review (optional human step)
  │  Web UI shows scene list; user can edit each scene before confirming
  │  CLI mode: auto-approved
  │
  ▼
③ Parallel Generation (Keyframe Images + TTS Voiceover)
  │  Nano Banana generates 4K keyframe images for each scene in parallel
  │  MiniMax TTS generates voiceover for each scene, measuring exact ms duration
  │
  ▼
④ Video Generation (Image-to-Video)
  │  Uses keyframe as first frame, voiceover duration as video duration
  │  Kling 3.0 (action/product) or Seedance 1.5 (narrative/multi-character)
  │
  ▼
⑤ Assembly (FFmpeg)
  │  xfade transitions + background music mixing + WhisperX subtitle burning
  │
  ▼
⑥ Draft Export (CapCut/JianYing)
  │  Auto-generates draft project preserving all scene assets and timeline
  │
  ▼
⑦ Memory Update (Mem0)
     After user rating, system learns style preferences for future generations
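Stages ③ and ④ can be sketched with plain `asyncio` in place of LangGraph. Everything here is a stand-in: the generator functions are stubs that return fake values, the `kind` tag on a scene is a hypothetical field, and the routing rule only mirrors the "action/product" versus "narrative/multi-character" split described above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Scene:
    voiceover: str
    visual: str
    kind: str  # hypothetical tag, e.g. "action", "product", "narrative"


async def generate_keyframe(scene: Scene) -> str:
    """Stub for the image step; a real version would call the image API."""
    await asyncio.sleep(0)
    return f"keyframe:{scene.visual}"


async def generate_voiceover(scene: Scene) -> int:
    """Stub for the TTS step; returns a fake duration in milliseconds."""
    await asyncio.sleep(0)
    return 60 * len(scene.voiceover)


def pick_engine(scene: Scene) -> str:
    """Route action/product scenes to Kling and everything else to Seedance."""
    return "kling" if scene.kind in {"action", "product"} else "seedance"


async def prepare_scene(scene: Scene) -> dict:
    # Stage 3: the keyframe and the voiceover do not depend on each other,
    # so both requests run concurrently.
    keyframe, voice_ms = await asyncio.gather(
        generate_keyframe(scene), generate_voiceover(scene)
    )
    # Stage 4 input: first frame, target duration, and the chosen engine.
    return {"keyframe": keyframe, "duration_ms": voice_ms,
            "engine": pick_engine(scene)}


async def prepare_all(scenes: list[Scene]) -> list[dict]:
    """Prepare every scene concurrently; results keep the scene order."""
    return await asyncio.gather(*(prepare_scene(s) for s in scenes))


if __name__ == "__main__":
    demo = [Scene("hello", "mars base", "action"),
            Scene("hi", "palace courtyard", "narrative")]
    for plan in asyncio.run(prepare_all(demo)):
        print(plan)
```

Video generation has to wait for both results of stage ③ because it needs the keyframe as its first frame and the measured voiceover length as its target duration.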

🆚 Comparison

Dimension	LibTV	Huobao Drama	Pilipili
Interaction	Node canvas, manual trigger	Form-based, step-by-step	Natural language, one sentence
Audio-Video Sync	Manual editing	Not explicitly supported	Measure TTS duration → control video duration