by Agents365-ai
AI-powered video podcast creation skill for coding agents. Supports Bilibili & YouTube, multi-language (zh-CN/en-US), 6 TTS engines (Edge/Azure/ElevenLabs/OpenAI/Doubao/CosyVoice), 4K Remotion rendering.
# Add to your Claude Code skills
git clone https://github.com/Agents365-ai/video-podcast-makerGuides for using ai agents skills like video-podcast-maker.
name: video-podcast-maker description: Use when the user gives a topic and wants an automated video podcast created, or asks to learn visual design patterns from a reference video/image. Produces 4K video via research → script → TTS → Remotion → MP4 + BGM. argument-hint: "[topic]" effort: high
allowed-tools intentionally omitted — the workflow uses Bash, Read, Write, Edit,author: Agents365-ai category: Content Creation version: 2.0.0 created: 2025-01-27 updated: 2026-04-03 bilibili: https://space.bilibili.com/441831884 github: https://github.com/Agents365-ai/video-podcast-maker
dependencies is informational only — Claude Code does not auto-load skillsremotion-best-practices per the bodydependencies:
REQUIRED: Load Remotion Best Practices First
This skill depends on
remotion-best-practices. You MUST invoke it before proceeding:Invoke the skill/tool named: remotion-best-practices
Open your coding agent and say: "Make a video podcast about $ARGUMENTS"
Or invoke directly: /video-podcast-maker AI Agent tutorial
Last scanned: 5/8/2026
{
"issues": [],
"status": "PASSED",
"scannedAt": "2026-05-08T05:58:18.127Z",
"semgrepRan": false,
"npmAuditRan": true,
"pipAuditRan": false
}Automated pipeline to create professional video podcasts from a topic. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and WeChat Channels with multi-language output (zh-CN, en-US). Combines research, script generation, multi-engine TTS (Edge/Azure/Doubao/CosyVoice), Remotion video rendering, and FFmpeg audio mixing.
Works with: Claude Code · OpenClaw (ClawHub) · OpenCode · Codex — any coding agent that supports SKILL.md
Publish to: Bilibili · YouTube · Xiaohongshu · Douyin · WeChat Channels
No coding required! Just describe your topic in plain language — the coding agent guides you through each step interactively. You make creative decisions, the agent handles all the technical details. Creating your first video podcast is easier than you think.
Note: This project is still under active development and may not be fully mature yet. We are continuously iterating and improving. Your feedback and suggestions are greatly appreciated — feel free to open an issue or reach out!
No comments yet. Be the first to share your thoughts!
Extract visual design patterns from reference videos or images and apply them to new video compositions. Skip this section unless the user provides a reference video/image or asks to save/list/delete style profiles.
→ See references/design-learning.md for commands, reference-library management, style-profile management, and integration with Pre-workflow / Step 9.
Agent behavior: Check for updates at most once per day (throttled by timestamp file).
Before any shell command that reads files from this skill, resolve SKILL_DIR to the directory containing SKILL.md.
If your agent exposes a built-in skill directory variable such as ${CLAUDE_SKILL_DIR}, you may map it to SKILL_DIR.
SKILL_DIR="${SKILL_DIR:-${CLAUDE_SKILL_DIR}}"
STAMP="${SKILL_DIR}/.last_update_check"
NOW=$(date +%s)
LAST=$(cat "$STAMP" 2>/dev/null || echo 0)
if [ ! -d "${SKILL_DIR}/.git" ]; then
echo "MANUAL_INSTALL"
elif [ $((NOW - LAST)) -gt 86400 ]; then
timeout 5 git -C "${SKILL_DIR}" fetch --quiet 2>/dev/null || true
LOCAL=$(git -C "${SKILL_DIR}" rev-parse HEAD 2>/dev/null)
REMOTE=$(git -C "${SKILL_DIR}" rev-parse origin/main 2>/dev/null)
echo "$NOW" > "$STAMP"
if [ -n "$LOCAL" ] && [ -n "$REMOTE" ] && [ "$LOCAL" != "$REMOTE" ]; then
echo "UPDATE_AVAILABLE"
else
echo "UP_TO_DATE"
fi
else
echo "SKIPPED_RECENT_CHECK"
fi
git -C "${SKILL_DIR}" pull. No → continue..git directory — skill was installed via tarball/zip/cp): Continue silently. Auto-update is disabled; the user must reinstall manually to update.Agent behavior: Run the pre-flight check below before Step 1. SKILL_DIR must already be resolved (see Auto Update Check section above for resolution rules).
SKILL_DIR="${SKILL_DIR:-${CLAUDE_SKILL_DIR}}"
python3 "${SKILL_DIR}/scripts/check_prereqs.py"
If MISSING reported, see README.md for full setup instructions (install commands, API key setup, Remotion project init). The check is backend-aware: backend is resolved as TTS_BACKEND env var → user_prefs.json (global.tts.backend) → edge default, then only env vars required by that backend are validated.
Automated pipeline for professional Bilibili horizontal knowledge videos from a topic.
Target: Bilibili horizontal video (16:9)
- Resolution: 3840×2160 (4K) or 1920×1080 (1080p)
- Style: Clean white (default)
Tech stack: Coding agent + TTS backend + Remotion + FFmpeg
| Parameter | Horizontal (16:9) | Vertical (9:16) | |-----------|-------------------|-----------------| | Resolution | 3840×2160 (4K) | 2160×3840 (4K) | | Frame rate | 30 fps | 30 fps | | Encoding | H.264, 16Mbps | H.264, 16Mbps | | Audio | AAC, 192kbps | AAC, 192kbps | | Duration | 1-15 min | 60-90s (highlight) |
Agent behavior: Detect user intent at workflow start:
Full pipeline with sensible defaults. Mandatory stop at Step 9:
| Step | Decision | Auto Default | |------|----------|-------------| | 3 | Title position | top-center | | 5 | Media assets | Skip (text-only animations) | | 7 | Thumbnail method | Remotion-generated (16:9 + 4:3) | | 9 | Outro animation | Pre-made MP4 (white/black by theme) | | 9 | Preview method | Remotion Studio (mandatory) | | 12 | Subtitles | Skip | | 14 | Cleanup | Auto-clean temp files |
Users can override any default in their initial request:
Prompts at each decision point. Activated by:
Hard constraints for video production. Visual design remains the agent's creative freedom within these rules:
| Rule | Requirement |
|------|-------------|
| Single Project | All videos under videos/{name}/ in user's Remotion project. NEVER create a new project per video. |
| 4K Output | 3840×2160, use scale(2) wrapper over 1920×1080 design space |
| Content Width | ≥85% of screen width |
| Bottom Safe Zone | Bottom 100px reserved for subtitles |
| Audio Sync | All animations driven by timing.json timestamps |
| Thumbnail | MUST generate 16:9 (1920×1080) AND 4:3 (1200×900). Centered layout, title ≥120px, icons ≥120px, fill most of canvas. See design-guide.md. |
| Font | PingFang SC / Noto Sans SC for Chinese text |
| Studio Before Render | MUST launch remotion studio for user review. NEVER render 4K until user explicitly confirms ("render 4K", "render final"). |
Load these files on demand — do NOT load all at once:
timing.json format.All scripts in ${SKILL_DIR}/scripts/ are reachable through one hierarchical entry point:
python3 ${SKILL_DIR}/scripts/cli.py --help # list resources
python3 ${SKILL_DIR}/scripts/cli.py <resource> --help # list actions
python3 ${SKILL_DIR}/scripts/cli.py <resource> <action> --help # forwards to underlying script
python3 ${SKILL_DIR}/scripts/cli.py schema [<method>] # JSON parameter schema
Routes (all forward to the existing scripts; direct invocations keep working):
tts run|validate, verify, audit beats, shorts gen, design list|show|delete|add, prereqs, prefs get|migrate|backend|bgm-path, schema [<method>].
The dispatcher is optional — every snippet in this file uses the direct script path for back-compat. Agents that prefer one entry point with discoverable help should prefer cli.py.
Before launching Studio for Step 9 review:
python3 ${SKILL_DIR}/scripts/audit_beat_sync.py <Video.tsx> <timing.json>
Prints a beat-vs-narration alignment table and flags drift > 1.5s. Especially important for kinetic-typography style where each beat must hit a specific narration moment.
After Step 13, before declaring the video done:
python3 ${SKILL_DIR}/scripts/verify_output.py videos/{name}/
Auto-fixes common omissions (creates final_video.mp4 if missing; disable with --no-fix), validates resolution/codec/duration, checks audio-timing drift, sanity-checks publish_info.md. Exit 0 = green light to publish; exit 2 = warnings only, still publishable. For machine-readable output add --format json (auto when piped). Full flag reference in references/workflow-publish.md.
project-root/ # Remotion project root
├── src/remotion/ # Remotion source
│ ├── compositions/ # Video composition definitions
│ ├── Root.tsx # Remotion entry
│ └── index.ts # Exports
│
├── public/ # Remotion default (unused — use --public-dir videos/{name}/)
│
├── videos/{video-name}/ # Video project assets
│ ├── topic_definition.md # Step 1
│ ├── topic_research.md # Step 2
│ ├── podcast.txt # Step 4: narration script
│ ├── podcast_audio.wav # Step 8: TTS audio
│ ├── podcast_audio.srt # Step 8: subtitles
│ ├── timing.json # Step 8: timeline
│ ├── thumbnail_*.png # Step 7
│ ├── output.mp4 # Step 10
│ ├── video_with_bgm.mp4 # Step 11
│ ├── final_video.mp4 # Step 12: final output
│ └── bgm.mp3 # Background music
│
└── remotion.config.ts
Important: Always use
--public-dirand full output path for Remotion render:npx remotion render src/remotion/index.ts CompositionId videos/{name}/output.mp4 --public-dir videos/{name}/
Video name {video-name}: lowercase English, hyphen-separated (e.g., reference-manager-comparison)
Section name {section}: lowercase English, underscore-separated, matches [SECTION:xxx]
Thumbnail naming (16:9 AND 4:3 both required):
| Type | 16:9 | 4:3 |
|------|------|-----|
| Remotion | thumbnail_remotion_16x9.png | thumbnail_remotion_4x3.png |
| AI | thumbnail_ai_16x9.png | thumbnail_ai_4x3.png |
Use --public-dir videos/{name}/ for all Remotion commands. Each video's assets (timing.json, podcast_audio.wav, bgm.mp3) stay in its own directory — no copying to public/ needed. This enables parallel renders of different videos.
# All render/studio/still commands use --public-dir
npx remotion studio src/remotion/index.ts --public-dir videos/{name}/
npx remotion render src/remotion/index.ts CompositionId videos/{name}/output.mp4 --public-dir videos/{name}/ --video-bitrate 16M
npx remotion still src/remotion/index.ts Thumbnail16x9 videos/{name}/thumbnail.png --public-dir videos/{name}/
At Step 1 start, use your agent's task tracker (Claude Code TaskCreate / Codex todo list / equivalent) to create one task per step. Mark in_progress on start, completed on finish. Files in videos/{name}/ (e.g. podcast.txt, timing.json, output.mp4) act as the durable record of what completed — if a session is interrupted, inspect the directory to determine where to resume.
1. Define topic direction → topic_definition.md
2. Research topic → topic_research.md
3. Design video sections (5-7 chapters)
4. Write narration script → podcast.txt
4.5. Pronunciation pre-flight (zh-CN only) → videos/{name}/phonemes.json
5. Collect media assets → media_manifest.json
6. Generate publish info (Part 1) → publish_info.md
7. Generate thumbnails (16:9 + 4:3) → thumbnail_*.png
8. Generate TTS audio → podcast_audio.wav, timing.json
9. Create Remotion composition + Studio preview (mandatory stop)
10. Render 4K video (only on user request) → output.mp4
11. Mix background music → video_with_bgm.mp4
12. Add subtitles (optional) → final_video.mp4
13. Complete publish info (Part 2) → chapter timestamps
14. Verify output & cleanup
15. Generate vertical shorts (optional) → shorts/
After Step 8 (TTS):
podcast_audio.wav exists and plays correctlytiming.json has all sections with correct timestampspodcast_audio.srt encoding is UTF-8After Step 10 (Render):
output.mp4 resolution is 3840x2160See CLAUDE.md for the full command reference (TTS, Remotion, FFmpeg, shorts generation).
Skill learns and applies preferences automatically. See references/troubleshooting.md for commands and learning details.
| File | Purpose |
|------|---------|
| user_prefs.json | Learned preferences (auto-created from template) |
| user_prefs.template.json | Default values |
| prefs_schema.json | JSON schema definition |
Final = merge(Root.tsx defaults < global < topic_patterns[type] < current instructions)
| Command | Effect | |---------|--------| | "show preferences" | Show current preferences | | "reset preferences" | Reset to defaults | | "save as X default" | Save to topic_patterns |
Full reference: Read references/troubleshooting.md on errors, preference questions, or BGM options.
timing.jsonVideo.tsx, Root.tsx, Thumbnail.tsx, podcast.txt) for quick project scaffoldingBilibili:
MM:SS format for B站 chaptersYouTube:
Xiaohongshu (小红书):
#话题# format (double hash), 5-10 tagsDouyin (抖音):
#话题 format (single hash), 3-8 tagsWeChat Channels (微信视频号):
#话题 format (single hash), 3-8 tags
podcast.txt, repeatedlyThis section is for you, the human — not the agent. Every downstream step — TTS narration, subtitles, section transitions, animation timing, final cut — is derived from this single
podcast.txt. A weak script renders into 4K garbage. No amount of polish downstream saves it.The AI-generated draft is a starting point, nothing more. Do these yourself — don't hand them off to the AI:
- Mentally read it as the narrator. Treat each sentence as one breath — if a line forces you to "catch your breath" or backtrack to parse, fix it. Where you stumble silently is where TTS stumbles audibly.
- Revise at least three times.
- Pass 1: typos, awkward phrasing, tongue-twisters
- Pass 2: cut filler, cut throat-clearing intros ("So today we're going to talk about…"), cut redundancy
- Pass 3: tune rhythm — where to pause, where to break a long sentence, which word carries the stress
- Read each
[SECTION:xxx]block end-to-end. Confirm each section opens with a hook and lands a clean transition into the next — not a bullet-point dump.- Audit numbers, proper nouns, and English terms separately. ~90% of TTS mispronunciations live here. If pronunciation is wrong, add it to
phonemes.json; if it just sounds awkward, rewrite it.- Know your length budget. Estimate ~280 zh-CN chars/min or ~150 en words/min. A 5–10 min video means ~1400–2800 chars / 750–1500 words. Don't pad to fill time.
The only acceptance test: read through it once in your head — does any line make you wince? If yes, don't move on to Step 8 (TTS) yet. Otherwise you're just rendering 4K of something even you don't want to hear.
This skill depends on remotion-best-practices and works alongside other optional skills:
| Software | Version | Purpose | |----------|---------|---------| | macOS / Linux | - | Tested on macOS, Linux compatible | | Python | 3.8+ | TTS script, automation | | Node.js | 18+ | Remotion video rendering | | FFmpeg | 4.0+ | Audio/video processing |
# macOS
brew install ffmpeg node python3
# Ubuntu/Debian
sudo apt install ffmpeg nodejs python3 python3-pip
# Python dependencies
pip install azure-cognitiveservices-speech dashscope edge-tts requests
Important: This skill requires a Remotion project as the foundation.
Understanding the components:
| Component | Source | Purpose |
|-----------|--------|---------|
| Remotion Project | npx create-video | Base framework with src/, public/, package.json |
| video-podcast-maker | SKILL.md workflow | Workflow orchestration (this skill) |
# Step 1: Create a new Remotion project (base framework)
npx create-video@latest my-video-project
cd my-video-project
npm i # Install Remotion dependencies
# Step 2: Verify installation
npx remotion studio # Should open browser preview
If you already have a Remotion project:
cd your-existing-project
npm install remotion @remotion/cli @remotion/player zod
| Service | Purpose | Get Key |
|---------|---------|---------|
| Azure Speech | TTS audio generation (high quality) | Azure Portal → Speech Services |
| Volcengine Doubao Speech | TTS audio generation (alternative backend) | Volcengine Console |
| Aliyun CosyVoice | TTS audio generation (alternative backend) | Aliyun Bailian |
| Edge TTS | TTS audio generation (default, free, no key needed) | pip install edge-tts |
| ElevenLabs | TTS audio generation (highest quality English) | ElevenLabs |
| Google Cloud TTS | TTS audio generation (wide language support) | Google Cloud Console |
| OpenAI | TTS audio generation (simple API) | OpenAI Platform |
| Google Gemini | AI thumbnail generation (optional) | AI Studio |
| Aliyun Dashscope | AI thumbnail - Chinese optimized (optional) | Aliyun Bailian |
Add to ~/.zshrc or ~/.bashrc:
# TTS Backend: edge (default, free), azure, doubao, cosyvoice, elevenlabs, google, openai
export TTS_BACKEND="edge" # Default (free), or "azure" / "doubao" / "cosyvoice" / "elevenlabs" / "google" / "openai"
# Azure TTS (high quality)
export AZURE_SPEECH_KEY="your-azure-speech-key"
export AZURE_SPEECH_REGION="eastasia"
# Volcengine Doubao TTS (alternative backend)
export VOLCENGINE_APPID="your-volcen