Provider-neutral Agent Skill for Codex, Claude Code, and agentic harness design.
# Add to your Claude Code skills
git clone https://github.com/DenisSergeevitch/agents-best-practices
Use this skill when the user asks how to build, improve, debug, or evaluate an agentic harness. This is a general-purpose agent architecture skill. Coding agents are one subdomain only; apply the same principles to research, finance, legal, support, operations, sales, healthcare, education, data analysis, procurement, and workflow automation agents.
An agent harness is the control plane around a model. The model proposes actions; the harness validates, authorizes, executes, records, summarizes, and returns observations. Keep the loop simple and make the runtime rigorous.
Default architecture:
user/task
-> instruction and context builder
-> model call
-> tool/action proposal
-> schema validation
-> permission decision
-> execution or approval pause
-> structured observation
-> context update
-> repeat within budget or finish
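The pipeline above can be sketched as a minimal harness loop. Everything here is illustrative, not any provider's API: the `Proposal` shape, the tool table, and the risk classes are assumptions made for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Proposal:
    kind: str                  # "tool" or "finish"
    tool_name: str = ""
    args: dict = field(default_factory=dict)
    answer: str = ""

# Hypothetical tool registry: name -> (risk class, required args, handler)
TOOLS = {
    "read_doc": ("read_only", {"path"}, lambda a: f"contents of {a['path']}"),
    "send_email": ("external_write", {"to", "body"}, None),
}

def check_permission(risk_class: str) -> str:
    # Deterministic policy in harness code: reads run, writes pause.
    return "allow" if risk_class == "read_only" else "needs_approval"

def run_loop(propose, max_steps: int = 10):
    """`propose(observations)` stands in for the model call."""
    observations = []
    for _ in range(max_steps):                      # loop budget
        p = propose(observations)                   # model proposes an action
        if p.kind == "finish":
            return p.answer, "finished"
        entry = TOOLS.get(p.tool_name)
        if entry is None or set(p.args) != entry[1]:
            observations.append({"error": "invalid tool call"})   # schema validation
            continue
        risk, _, handler = entry
        if check_permission(risk) == "needs_approval":
            return p, "paused_for_approval"         # execution pause, not execution
        observations.append({"tool": p.tool_name,   # structured observation
                             "result": handler(p.args)})
    return None, "budget_exhausted"
```

A scripted `propose` that reads one document and then finishes returns `("done", "finished")`; proposing `send_email` pauses for approval instead of executing.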
Use this skill for prompts involving agent architecture, harness design, tool permissions, planning mode, context and memory, skills, connectors, observability, evals, or production readiness. Its core principle:
"The model proposes actions; the harness validates, authorizes, executes, records, and returns observations."
A provider-neutral Agent Skill for designing, generating MVP blueprints for, auditing, refactoring, and explaining agentic harnesses.
It applies beyond coding agents: research, support, operations, sales, finance, data analysis, procurement, legal workflows, healthcare workflows, education, and workflow automation agents all need the same core runtime discipline.
Install - pick one:
A. With skills (any compatible agent):
npx skills add DenisSergeevitch/agents-best-practices -g
The -g flag installs globally at user level so every project can discover it.
B. Or paste this prompt to your AI agent:
Install the agents-best-practices skill for me:
1. Clone https://github.com/DenisSergeevitch/agents-best-practices into my
user-level skills directory as `agents-best-practices/`.
Use the skill directory my agent reads on this machine, for example:
- Codex: ~/.codex/skills/
- Claude Code: ~/.claude/skills/
2. Verify that SKILL.md, icon.jpeg, and the references/ directory are present.
3. Confirm the install path when done.
Do not use this skill for ordinary single-turn writing, translation, or Q&A unless the user is asking about the design of an agent that will perform those tasks.
First, identify the user's design problem. Then load the most relevant reference files, not all files by default. If the user asks to make or build an agent for a domain, default to MVP Builder Mode.
When the user asks to make, build, design, scaffold, or specify an agent for a domain, produce a concrete domain-specific MVP harness blueprint, not only advice. Use mvp-agent-blueprint.md as the primary reference and load other references as needed.
Default behavior: when the user asks for guidance, produce a concrete architecture, not generic principles. Use the template below when the user wants a harness design; if the user asks to make or build an agent, treat it as an MVP blueprint, not a purely conceptual answer:
# MVP Agent Harness Blueprint: [domain/use case]
## Objective
[What the agent must accomplish and for whom.]
## MVP scope and assumptions
[Smallest useful version, explicit assumptions, non-goals, and what is intentionally deferred.]
## Autonomy and risk level
[Answer-only, draft-only, approval-gated, or autonomous within policy.]
## Core loop
[How the model, tools, observations, retries, and stopping rules work.]
## Instruction architecture
[System/developer/user/scoped memory layout.]
## Tool registry
[Tools, schemas, risk classes, permissions, and result format.]
## Planning and goal behavior
[When to plan, when to ask, when to continue, when to stop.]
## Context and memory
[Retrieval, durable state, compaction, and rehydration.]
## Skills and connectors
[Reusable skills, MCP/external connector policy, tool search, attachment rules.]
## Safety and approvals
[Guardrails, prompt injection treatment, secrets, sandboxing, human review.]
## Observability and evals
[Trace events, eval cases, launch criteria, failure probes.]
## Minimal implementation path
[Smallest safe version first, implementation skeleton, validation path, then measured expansion.]
Never expose broad tools such as `execute_anything`, `write_database`, or `send_message` without a strict wrapper and approval policy.
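A hedged sketch of such a wrapper follows; the tool name, account ID format, and approval store are all hypothetical.

```python
import re

class ApprovalRequired(Exception):
    """Signals the harness to pause and record a human approval."""

APPROVED_DRAFTS = set()   # written by a human reviewer, never by the model

def raw_send_message(channel: str, text: str) -> str:
    """The broad primitive. Kept internal; the model never sees it."""
    return f"sent to {channel}: {text}"

def send_renewal_notice(account_id: str, draft_id: str, body: str) -> dict:
    """Narrow typed tool: one channel, validated input, approval-gated."""
    if not re.fullmatch(r"ACC-\d{4}", account_id):
        return {"ok": False, "error": "invalid account id"}   # schema check
    if draft_id not in APPROVED_DRAFTS:
        raise ApprovalRequired(draft_id)                      # approval policy
    receipt = raw_send_message("renewals", f"{account_id}: {body}")
    return {"ok": True, "receipt": receipt}
```

The model only ever sees `send_renewal_notice`; the dangerous primitive and the approval store live entirely in harness code.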
C. Manual install paths:
# Codex
mkdir -p "${CODEX_HOME:-$HOME/.codex}/skills"
git clone https://github.com/DenisSergeevitch/agents-best-practices.git \
"${CODEX_HOME:-$HOME/.codex}/skills/agents-best-practices"
# Claude Code, user-level
mkdir -p "$HOME/.claude/skills"
git clone https://github.com/DenisSergeevitch/agents-best-practices.git \
"$HOME/.claude/skills/agents-best-practices"
# Claude Code, project-level
mkdir -p .claude/skills
git clone https://github.com/DenisSergeevitch/agents-best-practices.git \
.claude/skills/agents-best-practices
Use cases · What this is · Philosophy · Layout · Sources
Once installed, the skill activates when a conversation touches agent architecture, harness design, tool permissions, planning mode, context and memory, skills, connectors, observability, evals, prompt caching, or production readiness. Three common ways to use it:
You have a domain and need the smallest useful production-safe agent harness, not a vague list of best practices.
You > Build an agent for account renewal risk. It should read CRM,
support tickets, and usage data, then draft renewal actions.
Agent > Start with an approval-gated Level 2 harness.
The MVP has one job: produce a renewal-risk brief plus draft
next actions for a human account owner.
Core loop:
user/task -> context builder -> model call -> typed tool call
-> schema validation -> permission check -> execution or pause
-> structured observation -> next step or final brief
Minimal tools:
- read_account_profile (risk: read_private_data)
- list_support_tickets (risk: read_private_data)
- fetch_usage_summary (risk: read_private_data)
- draft_customer_email (risk: draft_external_message)
- request_approval (risk: approval_gate)
Launch gate:
20 historical accounts, trace review, no unapproved external
sends, and human acceptance on at least 80% of draft actions.
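That launch gate can be made an executable check over replayed historical accounts. The result fields below are assumptions about what the trace records, sketched for illustration:

```python
def launch_gate(results):
    """Pass/fail launch check; `results` holds one dict per replayed account."""
    failures = []
    if len(results) < 20:
        failures.append("fewer than 20 historical accounts replayed")
    if any(r.get("unapproved_external_send") for r in results):
        failures.append("unapproved external send detected in traces")
    if results:
        accepted = sum(1 for r in results if r.get("draft_accepted"))
        if accepted / len(results) < 0.8:
            failures.append("human acceptance below 80% of draft actions")
    return (len(failures) == 0, failures)
```

Running the gate over 20 replayed accounts yields either a pass or a named list of reasons, which is what a launch review actually needs.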
The primary reference is references/mvp-agent-blueprint.md.
You already have a working agent, but it is brittle, expensive, too broad, or hard to debug.
You > Our research agent sometimes runs tools forever and forgets why
it made a decision after context compaction. Audit the harness.
Agent > The failure points are runtime-level, not prompt-level:
- no hard step/tool/time/cost budget
- compaction preserves prose but loses active approvals
- tool results are unbounded and mix trusted/untrusted data
- no event trace for model output -> tool call -> observation
Fix order:
1. Add loop budgets and termination reasons.
2. Store plan, approvals, todos, and artifacts outside the prompt.
3. Make compaction rehydrate active state, not chat history.
4. Add evals for injection, missing tool result, timeout, and
budget exhaustion.
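Fix 1, hard budgets with explicit termination reasons, might look like this; the limits and reason strings are illustrative:

```python
import time

class Budget:
    """Hard step/tool/time limits that end the loop with a named reason."""
    def __init__(self, max_steps=20, max_tool_calls=40, max_seconds=300.0):
        self.max_steps = max_steps
        self.max_tool_calls = max_tool_calls
        self.max_seconds = max_seconds
        self.steps = 0
        self.tool_calls = 0
        self.started = time.monotonic()

    def charge(self, tool_call=False):
        """Returns a termination reason string, or None to continue."""
        self.steps += 1
        self.tool_calls += int(tool_call)
        if self.steps > self.max_steps:
            return "step_budget_exhausted"
        if self.tool_calls > self.max_tool_calls:
            return "tool_budget_exhausted"
        if time.monotonic() - self.started > self.max_seconds:
            return "time_budget_exhausted"
        return None
```

The harness logs the returned reason as the trace's termination event, so "runs tools forever" becomes an explicit, debuggable outcome.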
Start with references/agentic-loop.md, references/context-memory-compaction.md, and references/security-evals-observability.md.
You need an agent to work with real systems without turning every API into an unsafe broad tool.
You > The ops agent needs Slack, Linear, Google Drive, and our internal
deploy API. How should tools and permissions work?
Agent > Split by risk class. Reads can be autonomous when scoped.
Drafts can be autonomous when labeled. External writes, deploys,
destructive actions, privileged access, and financial operations
require an approval record outside the model.
Do not expose generic send_message, write_database, or run_command.
Wrap each action as a narrow typed tool with structured results and
deterministic permission checks.
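The split the agent describes reduces to a deterministic decision table kept in harness code. The tool names and risk classes here are hypothetical stand-ins for the Slack/Linear/Drive/deploy tools above:

```python
from enum import Enum

class Risk(Enum):
    SCOPED_READ = "scoped_read"        # autonomous when scoped
    LABELED_DRAFT = "labeled_draft"    # autonomous, output labeled as draft
    EXTERNAL_WRITE = "external_write"  # needs an approval record
    DESTRUCTIVE = "destructive"        # needs an approval record

TOOL_RISK = {
    "slack_read_channel": Risk.SCOPED_READ,
    "linear_draft_issue": Risk.LABELED_DRAFT,
    "drive_read_file": Risk.SCOPED_READ,
    "deploy_service": Risk.DESTRUCTIVE,
}

AUTONOMOUS = {Risk.SCOPED_READ, Risk.LABELED_DRAFT}

def decide(tool_name, approvals):
    """Deterministic permission decision; `approvals` is recorded outside the model."""
    risk = TOOL_RISK.get(tool_name)
    if risk is None:
        return "deny_unknown_tool"
    if risk in AUTONOMOUS:
        return "allow"
    return "allow" if tool_name in approvals else "pause_for_approval"
```

Because the table and the approval set live outside the model, a prompt-injected tool call can at worst request a pause, never a deploy.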
Further references:
- references/planning-and-goals.md
- references/context-memory-compaction.md
- references/prompt-caching-and-cost.md
- references/provider-api-patterns.md
- references/checklists.md

"Keep the loop simple and make the runtime rigorous."
A reference for people building agentic systems where the model is only one part of the runtime. It helps design a harness that includes:
This is the control plane around an agent: instructions -> context builder -> model call -> tool proposal -> validation -> permission decision -> execution or approval pause -> observation -> next step or final answer.
Never expose broad tools like `execute_anything`, `send_message`, or `write_database`. Use the single-agent MVP first. Add subagents, goal loops, connectors, and broader autonomy only after measured failures justify them.
agents-best-practices/
├── README.md # public-facing overview and install notes
├── SKILL.md # skill entry point and trigger rules
├── icon.jpeg # skill image used by the README
└── references/
├── mvp-agent-blueprint.md # domain-specific MVP harness blueprint
├── architecture.md # component model and harness boundaries
├── agentic-loop.md # loop invariants, retries, budgets, stopping
├── tools-and-permissions.md # typed tools, risk classes, approvals
├── planning-and-goals.md # planning mode and long-running goals
├── context-memory-compaction.md # context, memory, retrieval, compaction
├── prompt-caching-and-cost.md # stable prefixes and cost-aware context
├── skills-and-connectors.md # Agent Skills, MCP, connectors, tool search
├── system-prompts-instructions.md # instruction hierarchy and templates
├── provider-api-patterns.md # OpenAI, Anthropic, compatible APIs
├── security-evals-observability.md # guardrails, tracing, evals, launch gates
├── agent-legibility-feedback-loops.md # source-of-truth artifacts and cleanup
├── checklists.md # implementation and audit checklists
├── coverage-audit.md # topic coverage verification
└── source-links.md # official references and further reading
The central tension this skill resolves: **how can an agent do useful work in real systems without turning every capability into an unsafe broad tool?**