by the-ai-merge
An MCP Multimodal AI Agent with eyes and ears!
# Add to your Claude Code skills

```bash
git clone https://github.com/the-ai-merge/multimodal-agents-course
```

Tired of tutorials that just walk you through connecting an existing MCP server to Claude Desktop?
Yeah, us too.
That's why we built Kubrick AI, an MCP Multimodal Agent for video processing tasks. Yes! You read that right.
💡 Agents + Video Processing ... and MCP!
This course is a collaboration between The Neural Maze and Neural Bits (from now on, "The Neural Bros"), and it's built for developers who want to go beyond the basics and build serious, production-ready AI Systems. In particular, you'll:
- Learn how to build an MCP server for video processing using Pixeltable and FastMCP (see the server sketch after this list)
- Design a custom, Groq-powered agent, connected to your MCP server with its own MCP client
- Integrate your agentic system with Opik for full observability and prompt versioning
- Learn how to use Pixeltable for multimodal data processing and stateful agents (sketched below)
- Create complex MCP servers using FastMCP: expose resources, prompts, and tools
- Apply prompt versioning to your MCP server (instead of defining the prompts in the Agent API)
- Learn how to implement custom MCP clients for your agents (client sketch below)
- Implement an MCP Tool Agent from scratch, using Llama 4 Scout and Maverick as the LLMs (agent loop sketch below)
- Use Opik for MCP prompt versioning (Opik sketch below)
- Learn how to implement custom tracing and monitoring with Opik
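To make the list above concrete, here is a minimal sketch of a FastMCP server exposing a tool, a resource, and a prompt. The `get_video_clip` tool and the `videos://catalog` URI are illustrative names we made up for this sketch, not the course's actual API:

```python
# Minimal FastMCP server sketch: one tool, one resource, one prompt.
from fastmcp import FastMCP

mcp = FastMCP("kubrick-video-server")

@mcp.tool()
def get_video_clip(video_path: str, start_s: float, end_s: float) -> str:
    """Extract a clip between two timestamps (hypothetical tool).

    A real implementation would delegate to Pixeltable / ffmpeg.
    """
    return f"clip of {video_path} from {start_s}s to {end_s}s"

@mcp.resource("videos://catalog")
def list_videos() -> str:
    """Expose the indexed video catalog as a readable resource."""
    return "movie.mp4, lecture.mp4"

@mcp.prompt()
def describe_scene() -> str:
    """A reusable prompt template served by the MCP server itself."""
    return "Describe what happens in the selected video clip, scene by scene."

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```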
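On the agent side, a custom MCP client built on the official `mcp` Python SDK talks to that server over stdio. A minimal sketch, assuming the server above is saved as `server.py`:

```python
# Minimal MCP client sketch using the official `mcp` Python SDK.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover what the server offers, then invoke a tool.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

            result = await session.call_tool(
                "get_video_clip",
                arguments={"video_path": "movie.mp4", "start_s": 0.0, "end_s": 5.0},
            )
            print(result.content)

asyncio.run(main())
```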
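The Tool Agent itself boils down to a loop: send the conversation plus tool schemas to the LLM, execute whatever tool calls come back, append the results, repeat. A sketch of one iteration using Groq's OpenAI-compatible chat API; the Llama 4 Scout model ID is Groq's at the time of writing and may change:

```python
# One iteration of a tool-calling agent loop on Groq.
import json

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_video_clip",
        "description": "Extract a clip from a video between two timestamps.",
        "parameters": {
            "type": "object",
            "properties": {
                "video_path": {"type": "string"},
                "start_s": {"type": "number"},
                "end_s": {"type": "number"},
            },
            "required": ["video_path", "start_s", "end_s"],
        },
    },
}]

messages = [{"role": "user", "content": "Show me the first 5 seconds of movie.mp4"}]

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",  # or the Maverick variant
    messages=messages,
    tools=tools,
)

choice = response.choices[0].message
if choice.tool_calls:
    for call in choice.tool_calls:
        args = json.loads(call.function.arguments)
        # In the full agent, this is where the MCP client's call_tool() runs.
        print(f"Model requested {call.function.name} with {args}")
```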
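For the video-processing side, Pixeltable stores videos as table rows and can derive a frames view from them with its built-in `FrameIterator`. A minimal sketch, assuming a local `movie.mp4`:

```python
# Index a video with Pixeltable and split it into frames.
import pixeltable as pxt
from pixeltable.iterators import FrameIterator

videos = pxt.create_table("videos", {"video": pxt.Video})
videos.insert([{"video": "movie.mp4"}])

# A view that materializes one row per extracted frame (1 frame per second).
frames = pxt.create_view(
    "frames",
    videos,
    iterator=FrameIterator.create(video=videos.video, fps=1),
)

print(frames.select(frames.frame_idx).collect())
```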
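Finally, Opik covers both observability pieces mentioned above: a prompt library that versions a prompt whenever its text changes (which is what lets prompts live with the MCP server instead of the Agent API), and a `@track` decorator for tracing. A short sketch; the prompt name here is made up:

```python
# Opik prompt versioning and tracing in a few lines.
import opik
from opik import track

client = opik.Opik()

# Registering the same name with new text creates a new version automatically.
prompt = client.create_prompt(
    name="kubrick-scene-description",
    prompt="Describe what happens in {{clip}}, scene by scene.",
)

@track  # every call becomes a traced span in the Opik UI
def describe_clip(clip: str) -> str:
    return prompt.format(clip=clip)

print(describe_clip("movie.mp4"))
```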
🚀 No shortcuts. No fluff. Let's learn by doing.
By completing this course, you'll learn how to design and build Agents that understand multimodal data (images, video, audio, and text) within a single system.

After finishing it, you'll have built your own Kubrick Agent, a HAL-themed spin-off that plays the role of a new set of eyes and ears:
<video src="https://github.com/user-attachments/assets/ef77c2a9-1a77-4f14-b2dd-e759c3f6db72"></video>
Kub...