by iusztinpaul
Hands-on workshop: Build a multi-agent AI system from scratch — Deep Research Agent + Writing Workflow served as MCP servers. Includes code, slides, and video
```shell
# Add to your Claude Code skills
git clone https://github.com/iusztinpaul/designing-real-world-ai-agents-workshop
```

A hands-on workshop, presented at the AI Engineering Conference Europe, building a multi-agent AI system with two MCP servers: a Deep Research Agent and a LinkedIn Writing Workflow. Both connect to a harness such as Claude Code or Cursor.
🎬 Full workshop available on YouTube ↓
📑 Slides here.
Built as a lightweight companion to the Agentic AI Engineering Course, which covers 34 lessons and three end-to-end portfolio projects. This workshop distills the core agentic patterns into a ~2-hour hands-on build.
Deep Research Agent — An MCP server that runs deep research using Gemini with Google Search grounding and native YouTube video analysis:
```
user topic → [deep_research] × N → analyze_youtube_video (if URLs) → [deep_research gap-fill] → compile_research → research.md
```
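The research pipeline can be sketched as a plain Python loop. This is a conceptual sketch only: `deep_research`, `analyze_youtube_video`, and `compile_research` are stubs standing in for the server's Gemini-backed tools, and the control flow mirrors the diagram above.

```python
import re

# Stubs standing in for the server's Gemini-backed tool implementations.
def deep_research(query: str) -> str:
    return f"findings for: {query}\nhttps://youtu.be/example"

def analyze_youtube_video(url: str) -> str:
    return f"transcript summary of {url}"

def compile_research(notes: list[str]) -> str:
    return "# Research Brief\n\n" + "\n\n".join(notes)

def run_deep_research(topic: str, queries: list[str]) -> str:
    notes = [deep_research(q) for q in queries]        # [deep_research] × N
    # Analyze any YouTube URLs surfaced by the search passes.
    urls = {u for n in notes for u in re.findall(r"https://youtu\.be/\S+", n)}
    notes += [analyze_youtube_video(u) for u in sorted(urls)]
    notes.append(deep_research(f"gaps in research on {topic}"))  # gap-fill pass
    return compile_research(notes)                     # → research.md

brief = run_deep_research("AI agent architecture", ["single vs multi-agent"])
```

The real server replaces each stub with a grounded Gemini call; the orchestration shape stays the same.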
LinkedIn Writing Workflow — An MCP server that generates LinkedIn posts with an evaluator-optimizer loop:
```
research.md + guideline → generate post → [review → edit] × N → post.md → generate image
```
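The evaluator-optimizer loop reduces to a generate step followed by N review/edit cycles. A minimal sketch, with `generate`, `review`, and `edit` as stubs for the underlying LLM calls and `N_ROUNDS` matching the three cycles used in the example run:

```python
N_ROUNDS = 3  # review/edit cycles used in the workshop example

# Stubs standing in for the LLM-backed writer, evaluator, and optimizer.
def generate(research: str, guideline: str) -> str:
    return f"v0 draft based on {guideline!r}"

def review(draft: str, guideline: str) -> str:
    return "tighten the hook; cut filler"

def edit(draft: str, feedback: str) -> str:
    return draft + f"\n[edited: {feedback}]"

def write_post(research: str, guideline: str) -> str:
    draft = generate(research, guideline)
    for _ in range(N_ROUNDS):              # evaluator-optimizer loop
        feedback = review(draft, guideline)
        draft = edit(draft, feedback)
    return draft                           # → post.md

post = write_post("research.md contents", "12 agents → 1 hook")
```

Each cycle feeds the evaluator's critique back into the optimizer, which is what turns the v0 draft into the tighter v3 shown below.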
Both servers expose tools, resources, and prompts via the Model Context Protocol, letting any MCP-compatible harness orchestrate the workflow.
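Conceptually, each server maintains a registry of named tools that the harness discovers and invokes. The stdlib-only sketch below illustrates that pattern; it is not the actual MCP SDK (the real servers speak the Model Context Protocol over a transport such as stdio), and the tool bodies are placeholders:

```python
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so a harness can discover and call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def deep_research(query: str) -> str:
    return f"findings for {query}"

@tool
def generate_post(research: str) -> str:
    return f"post drafted from {len(research)} chars of research"

# A harness first lists the available tools, then calls one by name:
available = sorted(TOOLS)
result = TOOLS["deep_research"]("agent architecture")
```

MCP standardizes exactly this discovery-then-invoke handshake, which is why any compatible harness can drive both servers without custom glue.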
Here's a real run through the full pipeline — from a topic seed to a publish-ready LinkedIn post with an AI-generated image.
We planned 12 AI agents and shipped 1. It worked better. Sounds crazy, right? But it's a common story.
A client built an AI marketing chatbot. Their initial design had dozens of agents: orchestrator, validators, spam prevention. It failed.
A single agent with tools won. Tasks were tightly coupled. One brain maintained context. Tools were still specialized.
This is the core mistake. People jump to complex multi-agent setups too fast.
Think of AI system design as a spectrum:
...
A single agent works for most cases. But it has limits. Too many tools? You hit "context rot." Past ~10-20 tools, LLMs degrade at tool selection. They get overwhelmed. Information gets lost in the middle.
So, when do you actually need multi-agent?
...
The simplest system that reliably solves the problem is always the best system. Don't overengineer your AI agents. Build simple first.
What's the most complex agent architecture you've simplified? Tell me below.
Read the full post
A short research brief with 2-3 questions and reference links:
# Research Topic: AI Agent Architecture — When Less Is More
## Key Questions
1. Why do single-agent architectures with smart tools outperform multi-agent systems?
2. What are the only legitimate reasons to adopt a multi-agent architecture?
## References
- Stop Overengineering: Workflows vs AI Agents Explained (YouTube)
- From 12 Agents to 1 (DecodingAI article)
`research.md` — The agent runs multiple Gemini-grounded search queries and analyzes YouTube videos, then compiles everything into a structured research brief with sources.
The full research.md for this example is ~20k tokens across 2 queries and 1 video transcript.
A short brief describing the post angle, audience, and key points:
# LinkedIn Post Guideline
## Topic
Why most AI teams should use 1 agent instead of 12.
## Angle
Open with the counterintuitive "12 agents → 1" hook. Introduce the complexity
spectrum. End with a clear mental model.
## Target Audience
AI engineers and technical leads building LLM-powered applications.
## Key Points
- A team planned 12 agents but shipped 1 — it worked better.
- The spectrum: workflows → single agent + tools → multi-agent. Stay left.
- "Context rot": past ~10-20 tools, LLMs degrade at tool selection.
- Only 4 valid reasons for multi-agent.
## Tone
Direct, opinionated, engineer-to-engineer. No fluff.
The evaluator-optimizer loop generates a draft, then runs 3 rounds of review + edit:
v0 — Initial draft
We planned 12 AI agents. We shipped 1.
Sounds crazy, right? But it's a common story.
A client wanted an AI chatbot for marketing content: emails, SMS, promos. Their initial design had dozens of specialized agents: orchestrator, analyzers, validators, spam prevention.
In practice? A single agent with tools won. Tasks were tightly coupled, sequential. Splitting it created information silos and handoff errors. [...]
The simplest system that reliably solves the problem is always the best system.
v3 — After 3 review/edit cycles
We planned 12 AI agents and shipped 1. It worked better.
A client built an AI marketing chatbot. Their initial design had dozens of agents: orchestrator, validators, spam prevention. It failed.
A single agent with tools won. Tasks were tightly coupled. One brain maintained context. Tools were still specialized.
Stay as far left as possible. Move right only when forced. [...]
The simplest system that reliably solves the problem is always the best system.
Harness engineering isn't just a new term for prompt engineering. It's where AI is heading.
Agents got useful enough for code and tools, but they weren't reliable. They'd repeat mistakes. The bottleneck shifted from code generation to consistent, reliable behavior in real systems.
Think of it this way: prompt engineering is what to ask. Context engineering is what to send the model. Harness engineering is how the whole thing operates. It's the environment around the model, beyond just tokens.
Car analogy: the model is the engine. Context is the fuel. The harness is the rest of the car: steering, brakes, lane boundaries. It prevents crashes.
A harness includes tools, permissions, state, tests, logs, retries, checkpoints, guardrails, and evals.
Stop hoping the model improves. Engineer its environment. The burden shifts to us, the builders, to prevent repeat mistakes.
I use self-reflection in my Claude Code setup. The agent learns what I liked, saving tokens and time.
Real companies are already doing this. Anthropic's long-running agents externalize memory into artifacts. OpenAI built a 1M-line product with zero manual code using structured docs and agent-to-agent reviews. Stripe agents merge 1K+ PRs weekly within isolated environments. LangChain moved a coding agent from outside the top 30 to top 5 on Terminal Bench 2.0 by changing only the harness. Same model, better system.
This isn't just for coding agents. This is the new way software gets built.
The programmer's job is shifting: less writing code, more designing habitats for agents to work without issues. Think machine-readable docs, evals, sandboxes, permission boundaries, and structural tests.
Reliability is the real work. Not just prompting.
LLMs are heading into systems, workflows, and harnesses. Value comes from orchestration, constraints, and feedback loops, not just a single prompt. The future isn't one genius model. It's models in well-engineered environments.
That's why harness engineering matters. It's what happens when you stop demoing intelligence and start shipping it.
Want to learn more? I explain it all in my latest video: https://youtu.be/zYerCzIexCg

What's your biggest challenge building reliable agent systems right now?