by iusztinpaul
Hands-on workshop: Build a multi-agent AI system from scratch — Deep Research Agent + Writing Workflow served as MCP servers. Includes code, slides, and video
```shell
# Add to your Claude Code skills
git clone https://github.com/iusztinpaul/designing-real-world-ai-agents-workshop
```

A hands-on workshop, presented at the AI Engineering Conference Europe, building a multi-agent AI system with two MCP servers: a Deep Research Agent and a LinkedIn Writing Workflow. Both connect to a harness such as Claude Code or Cursor.
🎬 Full workshop available on YouTube ↓
📑 Slides here.
Built as a lightweight companion to the Agentic AI Engineering Course, which covers 34 lessons and three end-to-end portfolio projects. This workshop distills the core agentic patterns into a ~2-hour hands-on build.
Deep Research Agent — An MCP server that runs deep research using Gemini with Google Search grounding and native YouTube video analysis:
```
user topic → [deep_research] × N → analyze_youtube_video (if URLs) → [deep_research gap-fill] → compile_research → research.md
```
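The research pipeline can be sketched as a plain Python loop. This is a conceptual sketch only: `deep_research`, `analyze_youtube_video`, and `compile_research` are stubs standing in for the server's Gemini-backed tools, and the control flow mirrors the diagram above.

```python
import re

# Stubs standing in for the server's Gemini-backed tool implementations.
def deep_research(query: str) -> str:
    return f"findings for: {query}\nhttps://youtu.be/example"

def analyze_youtube_video(url: str) -> str:
    return f"transcript summary of {url}"

def compile_research(notes: list[str]) -> str:
    return "# Research Brief\n\n" + "\n\n".join(notes)

def run_deep_research(topic: str, queries: list[str]) -> str:
    notes = [deep_research(q) for q in queries]        # [deep_research] × N
    # Analyze any YouTube URLs surfaced by the search passes.
    urls = {u for n in notes for u in re.findall(r"https://youtu\.be/\S+", n)}
    notes += [analyze_youtube_video(u) for u in sorted(urls)]
    notes.append(deep_research(f"gaps in research on {topic}"))  # gap-fill pass
    return compile_research(notes)                     # → research.md

brief = run_deep_research("AI agent architecture", ["single vs multi-agent"])
```

The real server replaces each stub with a grounded Gemini call; the orchestration shape stays the same.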
LinkedIn Writing Workflow — An MCP server that generates LinkedIn posts with an evaluator-optimizer loop:
```
research.md + guideline → generate post → [review → edit] × N → post.md → generate image
```
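The evaluator-optimizer loop reduces to a generate step followed by N review/edit cycles. A minimal sketch, with `generate`, `review`, and `edit` as stubs for the underlying LLM calls and `N_ROUNDS` matching the three cycles used in the example run:

```python
N_ROUNDS = 3  # review/edit cycles used in the workshop example

# Stubs standing in for the LLM-backed writer, evaluator, and optimizer.
def generate(research: str, guideline: str) -> str:
    return f"v0 draft based on {guideline!r}"

def review(draft: str, guideline: str) -> str:
    return "tighten the hook; cut filler"

def edit(draft: str, feedback: str) -> str:
    return draft + f"\n[edited: {feedback}]"

def write_post(research: str, guideline: str) -> str:
    draft = generate(research, guideline)
    for _ in range(N_ROUNDS):              # evaluator-optimizer loop
        feedback = review(draft, guideline)
        draft = edit(draft, feedback)
    return draft                           # → post.md

post = write_post("research.md contents", "12 agents → 1 hook")
```

Each cycle feeds the evaluator's critique back into the optimizer, which is what turns the v0 draft into the tighter v3 shown below.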
Both servers expose tools, resources, and prompts via the Model Context Protocol, letting any MCP-compatible harness orchestrate the workflow.
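Conceptually, each server maintains a registry of named tools that the harness discovers and invokes. The stdlib-only sketch below illustrates that pattern; it is not the actual MCP SDK (the real servers speak the Model Context Protocol over a transport such as stdio), and the tool bodies are placeholders:

```python
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so a harness can discover and call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def deep_research(query: str) -> str:
    return f"findings for {query}"

@tool
def generate_post(research: str) -> str:
    return f"post drafted from {len(research)} chars of research"

# A harness first lists the available tools, then calls one by name:
available = sorted(TOOLS)
result = TOOLS["deep_research"]("agent architecture")
```

MCP standardizes exactly this discovery-then-invoke handshake, which is why any compatible harness can drive both servers without custom glue.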
Here's a real run through the full pipeline — from a topic seed to a publish-ready LinkedIn post with an AI-generated image.
We planned 12 AI agents and shipped 1. It worked better. Sounds crazy, right? But it's a common story.
A client built an AI marketing chatbot. Their initial design had dozens of agents: orchestrator, validators, spam prevention. It failed.
A single agent with tools won. Tasks were tightly coupled. One brain maintained context. Tools were still specialized.
This is the core mistake. People jump to complex multi-agent setups too fast.
Think of AI system design as a spectrum:
...
A single agent works for most cases. But it has limits. Too many tools? You hit "context rot." Past ~10-20 tools, LLMs degrade at tool selection. They get overwhelmed. Information gets lost in the middle.
So, when do you actually need multi-agent?
...
The simplest system that reliably solves the problem is always the best system. Don't overengineer your AI agents. Build simple first.
What's the most complex agent architecture you've simplified? Tell me below.
Read the full post
A short research brief with 2-3 questions and reference links:
# Research Topic: AI Agent Architecture — When Less Is More
## Key Questions
1. Why do single-agent architectures with smart tools outperform multi-agent systems?
2. What are the only legitimate reasons to adopt a multi-agent architecture?
## References
- Stop Overengineering: Workflows vs AI Agents Explained (YouTube)
- From 12 Agents to 1 (DecodingAI article)
`research.md` — The agent runs multiple Gemini-grounded search queries and analyzes YouTube videos, then compiles everything into a structured research brief with sources.
The full research.md for this example is ~20k tokens across 2 queries and 1 video transcript.
A short brief describing the post angle, audience, and key points:
# LinkedIn Post Guideline
## Topic
Why most AI teams should use 1 agent instead of 12.
## Angle
Open with the counterintuitive "12 agents → 1" hook. Introduce the complexity
spectrum. End with a clear mental model.
## Target Audience
AI engineers and technical leads building LLM-powered applications.
## Key Points
- A team planned 12 agents but shipped 1 — it worked better.
- The spectrum: workflows → single agent + tools → multi-agent. Stay left.
- "Context rot": past ~10-20 tools, LLMs degrade at tool selection.
- Only 4 valid reasons for multi-agent.
## Tone
Direct, opinionated, engineer-to-engineer. No fluff.
The evaluator-optimizer loop generates a draft, then runs 3 rounds of review + edit:
v0 — Initial draft
We planned 12 AI agents. We shipped 1.
Sounds crazy, right? But it's a common story.
A client wanted an AI chatbot for marketing content: emails, SMS, promos. Their initial design had dozens of specialized agents: orchestrator, analyzers, validators, spam prevention.
In practice? A single agent with tools won. Tasks were tightly coupled, sequential. Splitting it created information silos and handoff errors. [...]
The simplest system that reliably solves the problem is always the best system.
v3 — After 3 review/edit cycles
We planned 12 AI agents and shipped 1. It worked better.
A client built an AI marketing chatbot. Their initial design had dozens of agents: orchestrator, validators, spam prevention. It failed.
A single agent with tools won. Tasks were tightly coupled. One brain maintained context. Tools were still specialized.
Stay as far left as possible. Move right only when forced. [...]
The simplest system that reliably solves the problem is always the best system.
Harness engineering isn't just a new term for prompt engineering. It's where AI is heading.
Agents got useful enough for code and tools, but they weren't reliable. They'd repeat mistakes. The bottleneck shifted from code generation to consistent, reliable behavior in real systems.
Think of it this way: prompt engineering is what to ask. Context engineering is what to send the model. Harness engineering is how the whole thing operates. It's the environment around the model, beyond just tokens.
Car analogy: the model is the engine. Context is the fuel. The harness is the rest of the car: steering, brakes, lane boundaries. It prevents crashes.
A harness includes tools, permissions, state, tests, logs, retries, checkpoints, guardrails, and evals.
Stop hoping the model improves. Engineer its environment. The burden shifts to us, the builders, to prevent repeat mistakes.
I use self-reflection in my Claude Code setup. The agent learns what I liked, saving tokens and time.
Real companies are already doing this. Anthropic's long-running agents externalize memory into artifacts. OpenAI built a 1M-line product with zero manual code using structured docs and agent-to-agent reviews. Stripe agents merge 1K+ PRs weekly within isolated environments. LangChain moved a coding agent from outside the top 30 to top 5 on Terminal Bench 2.0 by changing only the harness. Same model, better system.
This isn't just for coding agents. This is the new way software gets built.
The programmer's job is shifting: less writing code, more designing habitats for agents to work without issues. Think machine-readable docs, evals, sandboxes, permission boundaries, and structural tests.
Reliability is the real work. Not just prompting.
LLMs are heading into systems, workflows, and harnesses. Value comes from orchestration, constraints, and feedback loops, not just a single prompt. The future isn't one genius model. It's models in well-engineered environments.
That's why harness engineering matters. It's what happens when you stop demoing intelligence and start shipping it.
Want to learn more? I explain it all in my latest video: https://youtu.be/zYerCzIexCg

What's your biggest challenge building reliable agent systems right now?