by VILA-Lab
A Systematic Analysis and Discussion of Claude Code for Designing Today's and Future AI Agent Systems
# Add to your Claude Code skills
git clone https://github.com/VILA-Lab/Dive-into-Claude-Code

A comprehensive source-level architectural analysis of Claude Code (v2.1.88, ~1,900 TypeScript files, ~512K lines of code), combined with a curated collection of community analyses, a design-space guide for agent builders, and cross-system comparisons.
[!TIP] TL;DR -- Only 1.6% of Claude Code's codebase is AI decision logic. The other 98.4% is deterministic infrastructure -- permission gates, context management, tool routing, and recovery logic. The agent loop is a simple while-loop; the real engineering complexity lives in the systems around it. This repo dissects that architecture and distills it into actionable design guidance for anyone building AI agent systems.
| If you are a... | Start here | Then read |
|:----------------|:-----------|:----------|
| Agent Builder | Build Your Own Agent | Architecture Deep Dive |
| Security Researcher | Safety and Permissions | Architecture: Safety Layers |
| Product Manager | Key Highlights | Values and Principles |
| Researcher | Full Paper (arXiv) | Community Resources |
1,884 files · ~512K lines · v2.1.88 · 7 safety layers · 5 compaction stages · 54 tools · 27 hook events · 4 extension mechanisms · 7 permission modes
Claude Code answers four design questions that every production coding agent must face:
| Question | Claude Code's Answer |
|:---------|:---------------------|
| Where does reasoning live? | Model reasons; harness enforces. ~1.6% AI, 98.4% infrastructure. |
| How many execution engines? | One queryLoop for all interfaces (CLI, SDK, IDE). |
| Default safety posture? | Deny-first: deny > ask > allow. Strictest rule wins. |
| Binding resource constraint? | ~200K (older models) / 1M (Claude 4.6 series) context window. 5 compaction layers before every model call. |
The system decomposes into 7 components (User → Interfaces → Agent Loop → Permission System → Tools → State & Persistence → Execution Environment) across 5 architectural layers.
[!NOTE] For the full architectural deep dive -- 7 safety layers, 9-step turn pipeline, 5-layer compaction, and more -- see docs/architecture.md.
The architecture traces from 5 human values through 13 design principles to implementation:
| Value | Core Idea |
|:------|:----------|
| Human Decision Authority | Humans retain control via a principal hierarchy. When a 93% prompt-approval rate revealed approval fatigue, the response was to restructure boundaries, not to add more warnings. |
| Safety, Security, Privacy | The system protects even when human vigilance lapses. 7 independent safety layers. |
| Reliable Execution | Does what was meant. Gather-act-verify loop. Graceful recovery. |
| Capability Amplification | "A Unix utility, not a product." 98.4% is deterministic infrastructure enabling the model. |
| Contextual Adaptability | CLAUDE.md hierarchy, graduated extensibility, trust trajectories that evolve over time. |
| Principle | Design Question |
|:----------|:----------------|
| Deny-first with human escalation | Should unrecognized actions be allowed, blocked, or escalated? |
| Graduated trust spectrum | Fixed permission level, or spectrum users traverse over time? |
| Defense in depth | Single safety boundary, or multiple overlapping ones? |
| Externalized programmable policy | Hardcoded policy, or externalized configs with lifecycle hooks? |
| Context as scarce resource | Single-pass truncation or graduated pipeline? |
| Append-only durable state | Mutable state, snapshots, or append-only logs? |
| Minimal scaffolding, maximal harness | Invest in scaffolding or operational infrastructure? |
| Values over rules | Rigid procedures or contextual judgment with deterministic guardrails? |
| Composable multi-mechanism extensibility | One API or layered mechanisms at different costs? |
| Reversibility-weighted risk assessment | Same oversight for all, or lighter for reversible actions? |
| Transparent file-based config and memory | Opaque DB, embeddings, or user-visible files? |
| Isolated subagent boundaries | Shared context/permissions, or isolation? |
| Graceful recovery and resilience | Fail hard, or recover silently? |
The paper also applies a sixth evaluative lens -- long-term capability preservation -- citing evidence that developers in AI-assisted conditions score 17% lower on comprehension tests.
The core is a ReAct-pattern while-loop: assemble context → call model → dispatch tools → check permissions → execute → repeat. It is implemented as an AsyncGenerator that yields streaming events.
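The loop shape described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Claude Code implementation: the `Harness` interface, the `LoopEvent` union, and the `agentLoop` function are hypothetical names invented for this example, and only two of the stop conditions are modeled.

```typescript
// Hypothetical types; none of these names come from the Claude Code source.
type ToolCall = { name: string; input: string };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type LoopEvent =
  | { kind: "model"; text: string }
  | { kind: "tool"; name: string; output: string }
  | { kind: "denied"; name: string }
  | { kind: "stop"; reason: "no_tool_use" | "max_turns" };

// Everything outside the loop is injected, so the loop itself stays small.
interface Harness {
  callModel(history: string[]): Promise<ModelReply>;
  isAllowed(call: ToolCall): boolean;
  runTool(call: ToolCall): Promise<string>;
}

async function* agentLoop(
  prompt: string,
  harness: Harness,
  maxTurns = 10,
): AsyncGenerator<LoopEvent> {
  const history: string[] = [prompt]; // assembled context, grown each turn
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await harness.callModel(history);
    history.push(reply.text);
    yield { kind: "model", text: reply.text };

    // Stop condition: the model requested no tools.
    if (reply.toolCalls.length === 0) {
      yield { kind: "stop", reason: "no_tool_use" };
      return;
    }
    for (const call of reply.toolCalls) {
      // The permission gate runs before any execution.
      if (!harness.isAllowed(call)) {
        history.push(`denied: ${call.name}`);
        yield { kind: "denied", name: call.name };
        continue;
      }
      const output = await harness.runTool(call);
      history.push(output);
      yield { kind: "tool", name: call.name, output };
    }
  }
  // Stop condition: turn budget exhausted.
  yield { kind: "stop", reason: "max_turns" };
}
```

A caller consumes the events with `for await (const event of agentLoop(prompt, harness))`, which is what lets a CLI, SDK, or IDE front end render progress while the loop is still running.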
Before every model call, five compaction shapers run sequentially (cheapest first): Budget Reduction → Snip → Microcompact → Context Collapse → Auto-Compact.
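A cheapest-first pipeline of this kind can be sketched as an ordered list of functions over the message history. The sketch below is illustrative only: `Shaper`, `shapeContext`, and the two placeholder stages are hypothetical, character counts stand in for tokens, and skipping later stages once the context fits is an assumption of this example rather than something the source states.

```typescript
type Message = { role: "user" | "assistant" | "tool"; text: string };
type Shaper = { name: string; apply: (msgs: Message[]) => Message[] };

// Crude size estimate: characters stand in for tokens (assumption).
const size = (msgs: Message[]): number =>
  msgs.reduce((n, m) => n + m.text.length, 0);

function shapeContext(
  msgs: Message[],
  shapers: Shaper[],
  budget: number,
): { msgs: Message[]; applied: string[] } {
  const applied: string[] = [];
  let current = msgs;
  for (const shaper of shapers) {
    // Assumption: later, costlier stages are skipped once the context fits.
    if (size(current) <= budget) break;
    current = shaper.apply(current);
    applied.push(shaper.name);
  }
  return { msgs: current, applied };
}

// Two placeholder stages, ordered cheapest first. Bodies are illustrative only.
const truncateToolOutput: Shaper = {
  name: "budget-reduction",
  apply: (msgs) =>
    msgs.map((m) => (m.role === "tool" ? { ...m, text: m.text.slice(0, 20) } : m)),
};
const dropOldest: Shaper = {
  name: "snip",
  apply: (msgs) => msgs.slice(-2), // keep only the two most recent messages
};
```

Ordering the stages by cost means an expensive step, such as a model-generated summary, would only run when the cheap, lossless-ish trims were not enough.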
9-step pipeline per turn: Settings resolution → State init → Context assembly → 5 pre-model shapers → Model call → Tool dispatch → Permission gate → Tool execution → Stop condition
Two execution paths:
- StreamingToolExecutor -- begins executing tools as they stream in (latency optimization)
- runTools -- classifies tools as concurrent-safe or exclusive

Recovery: Max output token escalation (3 retries), reactive compaction (once per turn), prompt-too-long handling, streaming fallback, fallback model
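One way to act on a concurrent-safe versus exclusive classification is to group tool calls into batches. The sketch below is an assumption about how such a scheduler could look, not a description of `runTools` itself: `PendingCall`, `partition`, and `runBatches` are hypothetical names, and the batching policy (consecutive safe calls together, each exclusive call alone) is chosen for illustration.

```typescript
type PendingCall = { name: string; concurrentSafe: boolean };

// Group consecutive concurrent-safe calls; each exclusive call gets its own batch.
function partition(calls: PendingCall[]): PendingCall[][] {
  const batches: PendingCall[][] = [];
  let open: PendingCall[] = [];
  for (const call of calls) {
    if (call.concurrentSafe) {
      open.push(call);
      continue;
    }
    if (open.length > 0) batches.push(open); // flush the safe calls seen so far
    batches.push([call]); // an exclusive call runs alone
    open = [];
  }
  if (open.length > 0) batches.push(open);
  return batches;
}

// Batches run one after another; calls inside a batch run concurrently.
async function runBatches<T>(
  calls: PendingCall[],
  exec: (call: PendingCall) => Promise<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (const batch of partition(calls)) {
    results.push(...(await Promise.all(batch.map((call) => exec(call)))));
  }
  return results;
}
```

Under this policy, two reads followed by an edit and another read would run as three batches, so the edit never overlaps with a read on either side while results still come back in call order.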
5 stop conditions: No tool use, max turns, context overflow, hook intervention, explicit abort
7 permission modes form a graduated trust spectrum: plan → default → acceptEdits → auto (ML classifier) → dontAsk → bypassPermissions (+ internal bubble).
Deny-first: A broad deny always overrides a narrow allow. 7 independent safety layers from tool pre-filtering through shell sandboxing to hook interception. Permissions are never restored on resume -- trust is re-established per session.
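The "strictest rule wins" ordering (deny > ask > allow) can be sketched as a fold over all matching rules. This is a minimal illustration: `Rule`, `matches`, and `evaluate` are hypothetical, the trailing-`*` pattern syntax is invented for the example and is not Claude Code's rule syntax, and escalating unmatched actions to `ask` is an assumption.

```typescript
type Decision = "deny" | "ask" | "allow";
type Rule = { pattern: string; decision: Decision };

// Hypothetical matcher: a trailing "*" matches any suffix; otherwise exact match.
function matches(pattern: string, action: string): boolean {
  return pattern.endsWith("*")
    ? action.startsWith(pattern.slice(0, -1))
    : pattern === action;
}

// Higher number means stricter, so deny > ask > allow.
const STRICTNESS: Record<Decision, number> = { allow: 0, ask: 1, deny: 2 };

function evaluate(rules: Rule[], action: string): Decision {
  let result: Decision | undefined;
  for (const rule of rules) {
    if (!matches(rule.pattern, action)) continue;
    // Keep the strictest matching decision, regardless of rule order or specificity.
    if (result === undefined || STRICTNESS[rule.decision] > STRICTNESS[result]) {
      result = rule.decision;
    }
  }
  // Assumption: an action that no rule recognizes is escalated to the user.
  return result ?? "ask";
}
```

With the rules `{ pattern: "git status", decision: "allow" }` and `{ pattern: "git *", decision: "deny" }`, `evaluate` returns `"deny"` for `"git status"`: specificity does not rescue the narrow allow.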
[!WARNING] Shared failure modes: Defense-in-depth degrades when layers share constraints. Per-subcommand parsing causes event-loop starvation -- commands exceeding 50 subcommands bypass security analysis entirely to prevent the REPL from freezing.
Authorization pipeline: Pre-filtering (strip denied tools) → PreToolUse hooks → Deny-first rule evaluation → Permission handler (4 branche