by Agents365-ai
AI-powered video podcast creation skill for coding agents. Supports Bilibili & YouTube, multi-language (zh-CN/en-US), 7 TTS engines (Edge/Azure/ElevenLabs/OpenAI/Google/Doubao/CosyVoice), 4K Remotion rendering.
```
# Add to your Claude Code skills
git clone https://github.com/Agents365-ai/video-podcast-maker
```

```
name: video-podcast-maker
description: Use when user provides a topic and wants an automated video podcast created, OR when user wants to learn/analyze video design patterns from reference videos. Handles research, script writing, TTS audio synthesis, Remotion video creation, and final MP4 output with background music. Also supports design learning from reference videos (learn command), style profile management, and a design reference library. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and WeChat Channels platforms with independent language configuration (zh-CN, en-US).
argument-hint: "[topic]"
effort: high
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, WebFetch, WebSearch, Agent
author: Agents365-ai
category: Content Creation
version: 2.0.0
created: 2025-01-27
updated: 2026-04-03
bilibili: https://space.bilibili.com/441831884
github: https://github.com/Agents365-ai/video-podcast-maker
dependencies:
```
REQUIRED: Load Remotion Best Practices First

This skill depends on remotion-best-practices. You MUST invoke it before proceeding:

Skill tool: skill="remotion-best-practices"
Open Claude Code and say: "Make a video podcast about $ARGUMENTS"
Or invoke directly: /video-podcast-maker AI Agent tutorial
Extract visual design patterns from reference videos or images, store them in a searchable library, and apply them to new video compositions.
```
# Learn from images (Claude Vision analyzes design patterns)
python3 learn_design.py ./screenshot1.png ./screenshot2.png

# Learn from a local video (ffmpeg extracts frames automatically)
python3 learn_design.py ./reference.mp4

# Learn from a URL (Playwright captures screenshots; experimental)
python3 learn_design.py https://www.bilibili.com/video/BV1xx411c7mD

# Save with a named profile and tags
python3 learn_design.py ./reference.mp4 --profile "tech-minimal" --tags "tech,minimal,dark"
```
Automated pipeline to create professional video podcasts from a topic. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and WeChat Channels with multi-language output (zh-CN, en-US). Combines research, script generation, multi-engine TTS (Edge/Azure/ElevenLabs/OpenAI/Google/Doubao/CosyVoice), Remotion video rendering, and FFmpeg audio mixing.
Works with: Claude Code · OpenClaw (ClawHub) · OpenCode · Codex (any coding agent that supports SKILL.md)

Publish to: Bilibili · YouTube · Xiaohongshu · Douyin · WeChat Channels
No coding required! Just describe your topic in plain language, and the coding agent guides you through each step interactively. You make the creative decisions; the agent handles the technical details. Creating your first video podcast is easier than you think.
Note: This project is still under active development and may not be fully mature yet. We are continuously iterating and improving. Your feedback and suggestions are greatly appreciated; feel free to open an issue or reach out!
```
references list            # List all stored references (auto-cleans orphaned entries)
references show <id>       # Show full design report for a reference
references delete <id>     # Delete a reference and its files
profiles list              # List all saved style profiles
profiles show <name>       # Show profile props_override
profiles delete <name>     # Delete a style profile
profiles create <name>     # Create a new style profile interactively
```
When the user provides a reference video or image alongside a video creation request, extract design patterns before Step 1 and apply them as session overrides. See references/workflow-steps.md, Pre-workflow section, for the full extraction flow.

Before choosing visual design in Step 9, check for matching style profiles or reference library entries. Apply the best match as a starting point for Remotion composition props. See references/workflow-steps.md, Step 9 Style Profile Integration, for the priority chain.
Agent behavior: Check for updates at most once per day (throttled by timestamp file):
```
STAMP="${CLAUDE_SKILL_DIR}/.last_update_check"
NOW=$(date +%s)
LAST=$(cat "$STAMP" 2>/dev/null || echo 0)
if [ $((NOW - LAST)) -gt 86400 ]; then
  timeout 5 git -C "$CLAUDE_SKILL_DIR" fetch --quiet 2>/dev/null || true
  LOCAL=$(git -C "$CLAUDE_SKILL_DIR" rev-parse HEAD 2>/dev/null)
  REMOTE=$(git -C "$CLAUDE_SKILL_DIR" rev-parse origin/main 2>/dev/null)
  echo "$NOW" > "$STAMP"
  if [ -n "$LOCAL" ] && [ -n "$REMOTE" ] && [ "$LOCAL" != "$REMOTE" ]; then
    echo "UPDATE_AVAILABLE"
  else
    echo "UP_TO_DATE"
  fi
else
  echo "SKIPPED_RECENT_CHECK"
fi
```

If UPDATE_AVAILABLE is reported, ask the user whether to update. Yes → run `git -C "$CLAUDE_SKILL_DIR" pull`. No → continue.

Dependency check:

```
( missing=""; node -v >/dev/null 2>&1 || missing="$missing node"; python3 --version >/dev/null 2>&1 || missing="$missing python3"; ffmpeg -version >/dev/null 2>&1 || missing="$missing ffmpeg"; [ -n "$AZURE_SPEECH_KEY" ] || missing="$missing AZURE_SPEECH_KEY"; if [ -n "$missing" ]; then echo "MISSING:$missing"; else echo "ALL_OK"; fi )
```
If MISSING reported above, see README.md for full setup instructions (install commands, API key setup, Remotion project init).
Automated pipeline for professional Bilibili horizontal knowledge videos from a topic.
Target: Bilibili horizontal video (16:9)
- Resolution: 3840×2160 (4K) or 1920×1080 (1080p)
- Style: Clean white (default)
Tech stack: Claude + Azure TTS + Remotion + FFmpeg
| Parameter | Horizontal (16:9) | Vertical (9:16) |
|-----------|-------------------|-----------------|
| Resolution | 3840×2160 (4K) | 2160×3840 (4K) |
| Frame rate | 30 fps | 30 fps |
| Encoding | H.264, 16 Mbps | H.264, 16 Mbps |
| Audio | AAC, 192 kbps | AAC, 192 kbps |
| Duration | 1-15 min | 60-90 s (highlight) |
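For reference, the horizontal-column settings in the table map roughly onto these FFmpeg flags (a sketch only; `input.mp4` and `narration.wav` are placeholder names, since the skill's real pipeline renders through Remotion first). The command is printed here rather than executed:

```shell
# Sketch: FFmpeg flags matching the export table (horizontal 4K).
# input.mp4 / narration.wav are hypothetical placeholder files.
cat <<'EOF'
ffmpeg -i input.mp4 -i narration.wav \
  -c:v libx264 -b:v 16M -r 30 -s 3840x2160 \
  -c:a aac -b:a 192k \
  output.mp4
EOF
```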
Agent behavior: Detect user intent at workflow start:
Full pipeline with sensible defaults. Only 1 mandatory stop:
| Step | Decision | Auto Default |
|------|----------|--------------|
| 3 | Title position | top-center |
| 5 | Media assets | Skip (text-only animations) |
| 7 | Thumbnail method | Remotion-generated (16:9 + 4:3) |
| 9 | Outro animation | Pre-made MP4 (white/black by theme) |
| 9 | Preview method | Preview render (720p, self-validates) |
| 12 | Subtitles | Skip |
| 14 | Cleanup | Auto-clean temp files |
Users can override any default in their initial request:
Prompts at each decision point. Activated by:
Planned feature (not yet implemented). Currently, workflow progress is tracked via Claude's conversation context. If a session is interrupted, re-invoke the skill and Claude will check existing files in videos/{name}/ to determine where to resume.
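The resume check can be sketched as a simple filesystem probe (illustrative only; `videos/demo` is a hypothetical directory, and the step mapping follows the per-step output files listed in the project layout):

```shell
# Sketch: find the furthest-along artifact and report where to resume.
dir="videos/demo"   # hypothetical video directory
if   [ -f "$dir/final_video.mp4" ];   then echo "RESUME: done (Step 12+)"
elif [ -f "$dir/output.mp4" ];        then echo "RESUME: Step 11 (mix BGM)"
elif [ -f "$dir/podcast_audio.wav" ]; then echo "RESUME: Step 9 (Remotion composition)"
elif [ -f "$dir/podcast.txt" ];       then echo "RESUME: Step 8 (TTS)"
else echo "RESUME: Step 1 (define topic)"
fi
```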
Hard constraints for video production; visual design remains Claude's creative freedom:
| Rule | Requirement |
|------|-------------|
| Single Project | All videos under videos/{name}/ in user's Remotion project. NEVER create a new project per video. |
| 4K Output | 3840×2160, use scale(2) wrapper over a 1920×1080 design space |
| Content Width | ≥85% of screen width |
| Bottom Safe Zone | Bottom 100px reserved for subtitles |
| Audio Sync | All animations driven by timing.json timestamps |
| Thumbnail | MUST generate 16:9 (1920×1080) AND 4:3 (1200×900). Title ≥80px bold, high contrast. |
| Font | PingFang SC / Noto Sans SC for Chinese text |
Claude loads these files on demand; do NOT load them all at once:
```
project-root/                      # Remotion project root
├── src/remotion/                  # Remotion source
│   ├── compositions/              # Video composition definitions
│   ├── Root.tsx                   # Remotion entry
│   └── index.ts                   # Exports
│
├── public/                        # Remotion default (unused; use --public-dir videos/{name}/)
│
├── videos/{video-name}/           # Video project assets
│   ├── workflow_state.json        # Workflow progress
│   ├── topic_definition.md        # Step 1
│   ├── topic_research.md          # Step 2
│   ├── podcast.txt                # Step 4: narration script
│   ├── podcast_audio.wav          # Step 8: TTS audio
│   ├── podcast_audio.srt          # Step 8: subtitles
│   ├── timing.json                # Step 8: timeline
│   ├── thumbnail_*.png            # Step 7
│   ├── output.mp4                 # Step 10
│   ├── video_with_bgm.mp4         # Step 11
│   ├── final_video.mp4            # Step 12: final output
│   └── bgm.mp3                    # Background music
│
└── remotion.config.ts
```
Important: Always use --public-dir and a full output path for Remotion render:

```
npx remotion render src/remotion/index.ts CompositionId videos/{name}/output.mp4 --public-dir videos/{name}/
```
Video name {video-name}: lowercase English, hyphen-separated (e.g., reference-manager-comparison)
Section name {section}: lowercase English, underscore-separated, matches [SECTION:xxx]
Thumbnail naming (16:9 AND 4:3 both required):
| Type | 16:9 | 4:3 |
|------|------|-----|
| Remotion | thumbnail_remotion_16x9.png | thumbnail_remotion_4x3.png |
| AI | thumbnail_ai_16x9.png | thumbnail_ai_4x3.png |
Use --public-dir videos/{name}/ for all Remotion commands. Each video's assets (timing.json, podcast_audio.wav, bgm.mp3) stay in its own directory; no copying to public/ is needed. This enables parallel renders of different videos.
```
# All render/studio/still commands use --public-dir
npx remotion studio src/remotion/index.ts --public-dir videos/{name}/
npx remotion render src/remotion/index.ts CompositionId videos/{name}/output.mp4 --public-dir videos/{name}/ --video-bitrate 16M
npx remotion still src/remotion/index.ts Thumbnail16x9 videos/{name}/thumbnail.png --public-dir videos/{name}/
```
At Step 1 start:
- Create videos/{name}/workflow_state.json
- Use TaskCreate to create tasks per step. Mark in_progress on start, completed on finish.
- Keep workflow_state.json AND TaskUpdate in sync.

1. Define topic direction → topic_definition.md
2. Research topic → topic_research.md
3. Design video sections (5-7 chapters)
4. Write narration script → podcast.txt
5. Collect media assets → media_manifest.json
6. Generate publish info (Part 1) → publish_info.md
7. Generate thumbnails (16:9 + 4:3) → thumbnail_*.png
8. Generate TTS audio → podcast_audio.wav, timing.json
9. Create Remotion composition + Studio preview
10. Render 4K video → output.mp4
11. Mix background music → video_with_bgm.mp4
12. Add subtitles (optional) → final_video.mp4
13. Complete publish info (Part 2) → chapter timestamps
14. Verify output & cleanup
15. Generate vertical shorts (optional) → shorts/
After Step 8 (TTS):
- podcast_audio.wav exists and plays correctly
- timing.json has all sections with correct timestamps
- podcast_audio.srt encoding is UTF-8

After Step 10 (Render):
- output.mp4 resolution is 3840×2160

See CLAUDE.md for the full command reference (TTS, Remotion, FFmpeg, shorts generation).
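The resolution check after Step 10 can be scripted. A sketch (`check_res` is a hypothetical helper; the commented ffprobe invocation assumes ffprobe is installed and the rendered file exists):

```shell
# Hypothetical helper: compare a WxH string against the required 4K spec.
check_res() {
  if [ "$1" = "3840x2160" ]; then echo "RESOLUTION_OK"; else echo "RESOLUTION_MISMATCH: $1"; fi
}

# In the real workflow the string would come from ffprobe, e.g.:
#   res=$(ffprobe -v error -select_streams v:0 \
#         -show_entries stream=width,height -of csv=s=x:p=0 videos/{name}/output.mp4)
#   check_res "$res"
check_res "3840x2160"
```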
Skill learns and applies preferences automatically. See references/troubleshooting.md for commands and learning details.
| File | Purpose |
|------|---------|
| user_prefs.json | Learned preferences (auto-created from template) |
| user_prefs.template.json | Default values |
| prefs_schema.json | JSON schema definition |
Final = merge(Root.tsx defaults < global < topic_patterns[type] < current instructions)
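The merge chain above can be illustrated with jq (assumed installed; the keys and values here are made up for the example). Later layers override earlier ones:

```shell
# Four preference layers, lowest to highest priority (example values only)
echo '{"theme":"white","fps":30}' > defaults.json   # Root.tsx defaults
echo '{"theme":"dark"}'           > global.json     # global preferences
echo '{"bitrate":"16M"}'          > topic.json      # topic_patterns[type]
echo '{"theme":"light"}'          > current.json    # current instructions

# jq's * operator merges objects with the right-hand side winning
jq -c -s '.[0] * .[1] * .[2] * .[3]' defaults.json global.json topic.json current.json
rm -f defaults.json global.json topic.json current.json
```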
| Command | Effect |
|---------|--------|
| "show preferences" | Show current preferences |
| "reset preferences" | Reset to defaults |
| "save as X default" | Save to topic_patterns |
Full reference: Read references/troubleshooting.md on errors, preference questions, or BGM options.
Template files (timing.json, Video.tsx, Root.tsx, Thumbnail.tsx, podcast.txt) for quick project scaffolding.

Bilibili:
- MM:SS format for B站 chapters

YouTube:

Xiaohongshu (小红书):
- #topic# format (double hash), 5-10 tags

Douyin (抖音):
- #topic format (single hash), 3-8 tags

WeChat Channels (微信视频号):
- #topic format (single hash), 3-8 tags
This skill depends on remotion-best-practices and works alongside other optional skills:
| Software | Version | Purpose |
|----------|---------|---------|
| macOS / Linux | - | Tested on macOS, Linux compatible |
| Python | 3.8+ | TTS script, automation |
| Node.js | 18+ | Remotion video rendering |
| FFmpeg | 4.0+ | Audio/video processing |
```
# macOS
brew install ffmpeg node python3

# Ubuntu/Debian
sudo apt install ffmpeg nodejs python3 python3-pip

# Python dependencies
pip install azure-cognitiveservices-speech dashscope edge-tts requests
```
Important: This skill requires a Remotion project as the foundation.
Understanding the components:
| Component | Source | Purpose |
|-----------|--------|---------|
| Remotion Project | npx create-video | Base framework with src/, public/, package.json |
| video-podcast-maker | Claude Code skill | Workflow orchestration (this skill) |
```
# Step 1: Create a new Remotion project (base framework)
npx create-video@latest my-video-project
cd my-video-project
npm i   # Install Remotion dependencies

# Step 2: Verify installation
npx remotion studio   # Should open browser preview
```
If you already have a Remotion project:
```
cd your-existing-project
npm install remotion @remotion/cli @remotion/player zod
```
| Service | Purpose | Get Key |
|---------|---------|---------|
| Azure Speech | TTS audio generation (high quality) | Azure Portal → Speech Services |
| Volcengine Doubao Speech | TTS audio generation (alternative backend) | Volcengine Console |
| Aliyun CosyVoice | TTS audio generation (alternative backend) | Aliyun Bailian |
| Edge TTS | TTS audio generation (default, free, no key needed) | pip install edge-tts |
| ElevenLabs | TTS audio generation (highest quality English) | ElevenLabs |
| Google Cloud TTS | TTS audio generation (wide language support) | Google Cloud Console |
| OpenAI | TTS audio generation (simple API) | OpenAI Platform |
| Google Gemini | AI thumbnail generation (optional) | AI Studio |
| Aliyun Dashscope | AI thumbnail - Chinese optimized (optional) | Aliyun Bailian |
Add to ~/.zshrc or ~/.bashrc:
```
# TTS Backend: edge (default, free), azure, doubao, cosyvoice, elevenlabs, google, openai
export TTS_BACKEND="edge"

# Azure TTS (high quality)
export AZURE_SPEECH_KEY="your-azure-speech-key"
export AZURE_SPEECH_REGION="eastasia"

# Volcengine Doubao TTS (alternative backend)
export VOLCENGINE_APPID="your-volcengine-appid"
export VOLCENGINE_ACCESS_TOKEN="your-volcengine-access-token"
export VOLCENGINE_CLUSTER="volcano_tts"        # Default cluster, adjust per console config
export VOLCENGINE_VOICE_TYPE="BV001_streaming" # Adjust per console voice options

# Aliyun CosyVoice TTS (alternative backend) + AI thumbnails
export DASHSCOPE_API_KEY="your-dashscope-api-key"

# Optional: Edge TTS voice override
export EDGE_TTS_VOICE="zh-CN-XiaoxiaoNeural"

# ElevenLabs TTS
export ELEVENLABS_API_KEY="your-elevenlabs-api-key"

# Google Cloud TTS
export GOOGLE_TTS_API_KEY="your-google-tts-api-key"

# OpenAI TTS
export OPENAI_API_KEY="your-openai-api-key"

# Optional: Google Gemini for AI thumbnails
export GEMINI_API_KEY="your-gemini-api-key"
```
Then reload: source ~/.zshrc
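After reloading, a quick sanity check can confirm the selected backend's key variable is set (a sketch; it only verifies the variable is non-empty, not that the key is valid):

```shell
# Sketch: verify the selected TTS backend has its key set.
backend="${TTS_BACKEND:-edge}"
case "$backend" in
  edge)       echo "OK: edge needs no API key" ;;
  azure)      [ -n "$AZURE_SPEECH_KEY" ]        && echo "OK: azure"      || echo "MISSING: AZURE_SPEECH_KEY" ;;
  doubao)     [ -n "$VOLCENGINE_ACCESS_TOKEN" ] && echo "OK: doubao"     || echo "MISSING: VOLCENGINE_ACCESS_TOKEN" ;;
  cosyvoice)  [ -n "$DASHSCOPE_API_KEY" ]       && echo "OK: cosyvoice"  || echo "MISSING: DASHSCOPE_API_KEY" ;;
  elevenlabs) [ -n "$ELEVENLABS_API_KEY" ]      && echo "OK: elevenlabs" || echo "MISSING: ELEVENLABS_API_KEY" ;;
  google)     [ -n "$GOOGLE_TTS_API_KEY" ]      && echo "OK: google"     || echo "MISSING: GOOGLE_TTS_API_KEY" ;;
  openai)     [ -n "$OPENAI_API_KEY" ]          && echo "OK: openai"     || echo "MISSING: OPENAI_API_KEY" ;;
  *)          echo "UNKNOWN_BACKEND: $backend" ;;
esac
```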
This skill is designed for use with Claude Code or OpenCode. Simply tell Claude:
"Create a video podcast about [your topic]"
Claude will guide you through the entire workflow automatically.
Tip: The quality of the first-pass output depends heavily on the model's capabilities: the more capable the model, the better the results. In our testing, both Codex and Claude Code produce excellent videos on the first try, and OpenCode paired with GLM-5 also delivers solid results. If the initial output isn't perfect, preview it in Remotion Studio and ask the coding agent to keep refining until you're satisfied.
Before rendering the final video, use Remotion Studio to preview and visually edit styles:
```
npx remotion studio src/remotion/index.ts
```

This opens a browser-based editor where you can preview compositions and tweak styles before the final render.