by walkinglabs
An official-style beginner tutorial on harness engineering, from 0 to 1.
# Add to your Claude Code skills
git clone https://github.com/walkinglabs/learn-harness-engineering
This course is available in 13 languages: English, 简体中文, 繁體中文, 日本語, 한국어, Español, Français, Русский, Deutsch, العربية, Tiếng Việt, Oʻzbekcha, Türkçe.
Learn Harness Engineering is a course dedicated to the engineering of AI coding agents. We have deeply studied and synthesized the most advanced Harness Engineering theories and practices in the industry. Our core references include:
Quick start? The `skills/harness-creator/` skill can help you scaffold a production-grade harness (AGENTS.md, feature lists, init.sh, verification workflows) for your own project in minutes.
A comprehensive course outline and introduction to core philosophies, providing a clear path to get started.

Deep dives into real-world pain points and hands-on projects (like Project 01) for an immersive learning experience.

Templates and reference configurations designed to solve common pitfalls in multi-turn AI agent development, such as context loss and premature task completion.

The repository now includes a PDF build pipeline for the course content.
- Run `npm run pdf:build` to generate the currently configured PDF coursebooks locally.
- Generated files are written to `artifacts/pdfs/`.
- Run `npm run screenshots:readme` if you want to refresh the README preview images.
- `release-course-pdfs.yml` can build the PDFs and publish them to GitHub Releases.

There's a hard truth most people learn the hard way: the strongest model in the world will still fail on real engineering tasks if you don't build a proper environment around it.
You've probably seen this yourself. You give Claude or GPT a task in your repo. It starts well — reads files, writes code, looks productive. Then something goes wrong. It skips a step. It breaks a test. It says "done" but nothing actually works. You spend more time cleaning up than if you'd done it yourself.
This isn't a model problem. It's a harness problem.
The evidence is clear. Anthropic ran a controlled experiment: same model (Opus 4.5), same prompt ("build a 2D retro game editor"). Without a harness, it spent $9 in 20 minutes and produced something that didn't work. With a full harness (planner + generator + evaluator), it spent $200 in 6 hours and built a game you could actually play. The model didn't change. The harness did.
OpenAI reported the same thing with Codex: in a well-harnessed repository, the same model goes from "unreliable" to "reliable." Not a marginal improvement — a qualitative shift.
This course teaches you how to build that environment.
THE HARNESS PATTERN
====================
You --> give task --> Agent reads harness files --> Agent executes
|
harness governs every step:
|
+--> Instructions: what to do, in what order
+--> Scope: one feature at a time, no overreach
+--> State: progress log, feature list, git history
+--> Verification: tests, lint, type-check, smoke runs
+--> Lifecycle: init at start, clean state at end
|
v
Agent stops only when
verification passes
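The "stop only when verification passes" rule in the diagram can be sketched in Python. This is a minimal sketch under assumptions, not the course's implementation: `run_checks`, `work_until_verified`, and the `attempt` callback are hypothetical names, and the verification commands stand in for whatever test, lint, and type-check commands your project actually uses.

```python
import subprocess
from typing import Callable, Optional, Sequence

Command = Sequence[str]


def run_checks(commands: Sequence[Command]) -> bool:
    """Return True only if every verification command exits with status 0."""
    for cmd in commands:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode != 0:
            return False  # the first failing check blocks completion
    return True


def work_until_verified(
    attempt: Callable[[int], None],
    commands: Sequence[Command],
    max_rounds: int = 5,
) -> Optional[int]:
    """Call attempt(round_no) until all checks pass.

    Returns the round in which verification passed, or None if the
    budget of max_rounds ran out first.
    """
    for round_no in range(1, max_rounds + 1):
        attempt(round_no)  # hypothetical stand-in for one agent work step
        if run_checks(commands):
            return round_no
    return None
```

Passing something like `[["npm", "test"]]` as `commands` would gate completion on a test suite. That command is an assumption about your project, not something the course prescribes. The point of the design is that "done" is decided by exit codes, not by the agent's own claim.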
Harness engineering is about building a complete working environment around the model so it produces reliable results. It's not about writing better prompts. It's about designing the system the model operates inside.
A harness has five subsystems:
┌──────────────────────────────────────────────────────────────┐
│                         THE HARNESS                          │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐  │
│  │ Instructions │  │ State        │  │ Verification       │  │
│  │              │  │              │  │                    │  │
│  │ AGENTS.md    │  │ progress.md  │  │ tests + lint       │  │
│  │ CLAUDE.md    │  │ feature_list │  │ type-check         │  │
│  │ feature_list │  │ git log      │  │ smoke runs         │  │
│  │ docs/        │  │ session hand │  │ e2e pipeline       │  │
│  └──────────────┘  └──────────────┘  └────────────────────┘  │
│                                                              │
│  ┌──────────────┐  ┌──────────────────────────────────────┐  │
│  │ Scope        │  │ Session Lifecycle                    │  │
│  │              │  │                                      │  │
│  │ one feature  │  │ init.sh at start                     │  │
│  │ at a time    │  │ clean-state checklist at end         │  │
│  │ definition   │  │ handoff note for next session        │  │
│  │ of done      │  │ commit only when safe to resume      │  │
│  └──────────────┘  └──────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
The MODEL decides what code to write.
The HARNESS governs when, where, and how it writes it.
The harness doesn't make the model smarter.
It makes the model's output reliable.
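One way to make the file-backed parts of this picture concrete is a small check for the harness files named in the diagram. This is a sketch under assumptions: the file names come from the diagram, but placing them at the repository root, the `SUBSYSTEM_FILES` mapping, and the `missing_harness_files` helper are hypothetical and not part of the course repository. Scope and verification are left out because they are rules and commands rather than single files.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from subsystem to files expected at the repo root.
# File names are taken from the diagram; the root-level layout is assumed.
SUBSYSTEM_FILES: Dict[str, List[str]] = {
    "instructions": ["AGENTS.md"],
    "state": ["progress.md"],
    "lifecycle": ["init.sh"],
}


def missing_harness_files(repo_root: Path) -> Dict[str, List[str]]:
    """Map each subsystem to the expected files that are absent."""
    missing: Dict[str, List[str]] = {}
    for subsystem, names in SUBSYSTEM_FILES.items():
        absent = [name for name in names if not (repo_root / name).exists()]
        if absent:
            missing[subsystem] = absent
    return missing
```

Calling `missing_harness_files(Path("."))` from a project root returns an empty dict when all three files are present, and otherwise names the subsystem each missing file belongs to.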
Each subsystem has one job: