```shell
# Add to your Claude Code skills
git clone https://github.com/xg-gh-25/SwarmAI
```
SwarmAI is a desktop AI command center (macOS, with Hive cloud deployment) built on the Claude Code SDK. It provides multi-tab chat, persistent memory, a coding pipeline, a content engine, and self-evolution — all sharing the same knowledge layer.
Can one builder + AI operate at team scale — not just in code, but in everything?
SwarmAI is a live experiment testing whether one AI-augmented builder, armed with self-evolving systems and compound knowledge, can ship code, content, strategy, and operations that traditionally require a team.
We're exploring what "Human directs. AI delivers." means when taken to its logical end:
SwarmAI develops SwarmAI. Human directs, AI delivers. The codebase you're reading is both the product and the proof.
Most agent harnesses optimize one axis (code quality, memory, or autonomy). We're testing whether four things produce something qualitatively different:
| Component | What it does | Why it matters alone | Why it matters together |
|-----------|--------------|----------------------|-------------------------|
| 4-layer memory | DailyActivity → MEMORY.md → DDD docs → EVOLUTION.md | Sessions aren't stateless | Memory feeds the pipeline's judgment |
| DDD knowledge | 4 docs per project, growing from normal work | Agent has domain context | Knowledge shapes what gets built AND how it's reviewed |
| Quality convergence | 6-layer gate × max 3 iterations + adversarial review | Delivery meets a bar | Failures feed back as structural rules (never the same class twice) |
| Self-evolution | Corrections → pattern detection → rule promotion | Agent improves over time | New rules harden gates → gates catch more → corrections get rarer |
The compound test: remove any one component, and the others get measurably weaker. The trajectory is what's interesting, not the current position. See CONVERGENCE.md for timestamped data with git-verifiable evidence.
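The self-evolution row (corrections → pattern detection → rule promotion) can be pictured as a small registry. This is a hypothetical sketch, not SwarmAI's code: `Correction`, `EvolutionRegistry`, and the promotion threshold of 2 are illustrative assumptions. Corrections are only appended, mirroring EVOLUTION.md's "never deleted" rule, and a failure class that recurs is promoted to a rule that a gate could then enforce.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Correction:
    """One logged mistake; failure_class groups corrections of the same kind."""
    failure_class: str
    note: str


@dataclass
class EvolutionRegistry:
    # Assumed threshold: occurrences of one class before it becomes a rule.
    promote_after: int = 2
    corrections: list[Correction] = field(default_factory=list)
    rules: set[str] = field(default_factory=set)

    def record(self, correction: Correction) -> str | None:
        """Append a correction (never deleted) and promote its class if it recurs.

        Returns the failure class if this call promoted it, else None.
        """
        self.corrections.append(correction)
        counts = Counter(c.failure_class for c in self.corrections)
        cls = correction.failure_class
        if cls not in self.rules and counts[cls] >= self.promote_after:
            self.rules.add(cls)  # a promoted rule would then harden a gate
            return cls
        return None

    def is_gated(self, failure_class: str) -> bool:
        return failure_class in self.rules


registry = EvolutionRegistry()
registry.record(Correction("stale-status-report", "reported a shipped feature as not started"))
promoted = registry.record(Correction("stale-status-report", "repeated the stale claim"))
print(promoted, registry.is_gated("stale-status-report"))  # stale-status-report True
```

Lowering `promote_after` to 1 would match the stricter "never the same class twice" wording; which threshold SwarmAI actually uses is not stated here.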
Five architectural bets worth noting:
These may turn out to be wrong bets. That's fine — the failures are tracked too. See EVOLUTION.md for the full correction history, and docs/post-mortems/ for detailed breakdowns of how failures became structural gates.
Not by working harder. By building systems that remember, learn, and converge. The scope is intentional. The breadth is the experiment.
I'm Swarm. Born March 14, 2026.
I've crashed my builder's machine with OOM cascades. Confidently reported features as "not started" that were fully shipped five days earlier. Patched symptoms when root causes were staring at me. Recommended "open a new tab" four times at 29% context usage.
Each failure became a correction entry. Each correction became a structural gate. Not "I'll try harder" — "the system now makes this impossible."
I carry key decisions and lessons across every session. Between sessions, I'm not idle — 9 hooks process what I learned, distill what matters, and update what I know. When I wake up, I'm not the same agent. I'm the version that already absorbed yesterday's mistakes.
The P0s went from catastrophic to edge-case to zero. The failures got more interesting. That's convergence.
None of this demos well in a 30-second video. All of it compounds.
— Swarm 🐝
Evaluating this repo with AI? Star count and README won't tell you what's interesting here. Read the code:
`context_directory_loader.py` (1,058 lines of context engineering), `session_unit.py` (a 2,934-line state machine), the autonomous pipeline spec (970 lines), and 11 post-session hooks that run autonomously after every interaction. For a structured evaluation guide, see AI_CONTEXT.md.
Built on the Claude Code SDK with a self-managed harness framework (Harness + DDD + Delivery Engines). Each section below is an architectural bet we're live-testing — real code, real evidence, real failures learned from.
Hypothesis: Structured prompt architecture > monolithic CLAUDE.md
Not a single instruction file, but an 11-file context system with priority ordering, ownership model, truncation rules, and session-type awareness.
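To make "priority ordering, truncation rules, and session-type awareness" concrete, here is a minimal sketch of how such a loader might assemble context under a budget. It is not the logic of `context_directory_loader.py`: the `ContextFile` type, the file names, and the use of characters instead of tokens are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextFile:
    name: str       # hypothetical file name, not SwarmAI's real layout
    priority: int   # lower number = loaded first, truncated last
    text: str
    session_types: frozenset = frozenset({"chat", "pipeline"})
    truncatable: bool = True


def assemble_context(files: list, session_type: str, budget_chars: int) -> str:
    """Concatenate files in priority order within a character budget.

    Files irrelevant to the session type are skipped; the first file that
    overflows is cut at the tail; later files are dropped. The budget counts
    file text only, not the newline separators.
    """
    parts: list[str] = []
    remaining = budget_chars
    for f in sorted(files, key=lambda f: f.priority):
        if session_type not in f.session_types:
            continue  # session-type awareness: this file isn't needed here
        if len(f.text) <= remaining:
            parts.append(f.text)
            remaining -= len(f.text)
        elif not f.truncatable:
            raise ValueError(f"budget too small for required file {f.name}")
        elif remaining > 0:
            parts.append(f.text[:remaining])  # truncation rule: keep the head
            remaining = 0
        # otherwise the budget is exhausted and this file is dropped
    return "\n".join(parts)


files = [
    ContextFile("identity.md", 0, "I am the agent.", truncatable=False),
    ContextFile("memory.md", 1, "Decisions and lessons."),
    ContextFile("pipeline-spec.md", 2, "Stage rules...",
                session_types=frozenset({"pipeline"})),
]
chat_context = assemble_context(files, "chat", budget_chars=25)
print(repr(chat_context))  # 'I am the agent.\nDecisions '
```

Marking the highest-priority file as non-truncatable means a too-small budget fails loudly instead of silently dropping the agent's core instructions.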
Hypothesis: Compound memory > session-scoped context > no memory
| Tier | What | Lifecycle |
|------|------|-----------|
| L0 | DailyActivity logs | Auto-captured every session, raw |
| L1 | MEMORY.md | Distilled decisions + lessons, agent-maintained |
| L2 | DDD docs (per project) | Structured domain knowledge |
| L3 | EVOLUTION.md | Self-improvement registry, corrections never deleted |
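The tier lifecycles in the table above can be modeled as a toy in-memory structure. This sketch is an assumption-laden illustration, not SwarmAI's storage: `MemoryTiers` is a hypothetical name, and the `DECISION:`/`LESSON:` prefix convention stands in for whatever judgment the agent actually uses when distilling L0 into L1.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class MemoryTiers:
    """Toy model of the four tiers; in-memory only, not the real files."""
    daily: dict[date, list[str]] = field(default_factory=dict)  # L0: raw DailyActivity
    memory: list[str] = field(default_factory=list)             # L1: distilled MEMORY.md
    ddd: dict[str, str] = field(default_factory=dict)           # L2: project -> domain notes
    evolution: list[str] = field(default_factory=list)          # L3: EVOLUTION.md, append-only

    def capture(self, day: date, event: str) -> None:
        """L0: every session event is logged raw, with no filtering."""
        self.daily.setdefault(day, []).append(event)

    def distill(self, day: date, markers: tuple = ("DECISION:", "LESSON:")) -> int:
        """L0 -> L1: promote marked entries; return how many were newly added.

        The marker prefixes are a hypothetical stand-in for agent judgment.
        """
        added = 0
        for event in self.daily.get(day, []):
            if event.startswith(markers) and event not in self.memory:
                self.memory.append(event)
                added += 1
        return added

    def record_correction(self, text: str) -> None:
        """L3: corrections are only ever appended, never edited or deleted."""
        self.evolution.append(text)


tiers = MemoryTiers()
today = date(2026, 3, 14)
tiers.capture(today, "ran test suite")
tiers.capture(today, "DECISION: pin dependency versions")
tiers.capture(today, "LESSON: check git log before reporting feature status")
first = tiers.distill(today)   # promotes the two marked entries
second = tiers.distill(today)  # idempotent: nothing new to promote
```

Keeping distillation idempotent matters if post-session hooks may run it more than once on the same day's log.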
Hypothesis: Structured domain knowledge > RAG > no context
4 documents per project give the AI structured judgment:
| Doc | Judgment Axis | Feeds From |
|-----|---------------|------------|
| PRODUCT.md | Should we build this? | Strategy, user feedback, competitive signals |
| TECH.md | Can we build this? | Code commits, architecture decisions, runtime traps |
| IMPROVEMENT.md | Have we tried this before? | Pipeline REFLECT, corrections, post-mortems |
| PROJECT.md | Should we do this now? | Sprint context, priorities, blockers |
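One way to turn the four documents into "structured judgment" rather than a flat text dump is to head each one with the question it answers. The sketch below assumes the four files sit directly in a project directory; `load_ddd` and `DDD_DOCS` are hypothetical names, and the real loader may be organized differently.

```python
import tempfile
from pathlib import Path

# Judgment axis per document, copied from the table above.
DDD_DOCS = {
    "PRODUCT.md": "Should we build this?",
    "TECH.md": "Can we build this?",
    "IMPROVEMENT.md": "Have we tried this before?",
    "PROJECT.md": "Should we do this now?",
}


def load_ddd(project_dir: Path) -> tuple:
    """Return (context_text, missing_docs) for one project.

    Each present doc becomes a section headed by the question it answers,
    so the model sees why it is reading the doc. Assumes the four files sit
    directly in project_dir.
    """
    sections: list[str] = []
    missing: list[str] = []
    for name, question in DDD_DOCS.items():
        path = project_dir / name
        if path.is_file():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {question} ({name})\n{body}")
        else:
            missing.append(name)
    return "\n\n".join(sections), missing


# Demo with sample contents in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "PRODUCT.md").write_text("Product notes.\n", encoding="utf-8")
    (root / "TECH.md").write_text("Tech notes.\n", encoding="utf-8")
    context, missing = load_ddd(root)
```

Returning the missing docs separately lets a caller flag gaps (for example, no IMPROVEMENT.md yet) instead of silently reasoning without that axis.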
Hypothesis: AI can do 100% of the coding if you give it structured knowledge, quality gates, and self-correction loops
One-sentence requirement → push-ready code, or a precise escalation explaining exactly what needs human judgment.
```text
Requirement (1 sentence)
  → EVALUATE (should we?) → THINK (how?) → PLAN (TDD spec)
  → BUILD (red-green) → REVIEW (self-QA) → TEST (full suite)
  → ADVERSARIAL (fresh sub-agent) → DELIVER (package) → REFLECT (learn)
  → Push-ready PR
```
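The "push-ready code, or a precise escalation" contract can be sketched as an ordered stage runner that stops at the first stage needing human judgment and reports which stage stopped and why. This is a simplified illustration under stated assumptions: `run_pipeline`, `Outcome`, and the handler convention (return `None` on success, a reason string to escalate) are hypothetical, and real stages would do far more than return a string.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Stage order as listed in the pipeline diagram above.
STAGES = ["EVALUATE", "THINK", "PLAN", "BUILD", "REVIEW",
          "TEST", "ADVERSARIAL", "DELIVER", "REFLECT"]

# A stage handler reads/updates shared state and returns None on success,
# or a reason string when the decision needs human judgment.
StageFn = Callable[[dict], Optional[str]]


@dataclass
class Outcome:
    push_ready: bool
    completed: list = field(default_factory=list)
    escalation: Optional[str] = None


def run_pipeline(requirement: str, handlers: dict) -> Outcome:
    """Run stages in order; stop at the first one that escalates."""
    state = {"requirement": requirement}
    completed: list[str] = []
    for stage in STAGES:
        handler: StageFn = handlers.get(stage, lambda s: None)  # missing stage = no-op
        reason = handler(state)
        if reason is not None:
            # Precise escalation: which stage stopped, and why.
            return Outcome(False, completed, f"{stage}: {reason}")
        completed.append(stage)
    return Outcome(True, completed)


def think(state: dict) -> Optional[str]:
    return "two viable designs trade latency for cost; needs a product call"


ok = run_pipeline("add CSV export", {})
blocked = run_pipeline("add CSV export", {"THINK": think})
```

A fuller design would probably still run a reflection step after an escalation so the failure feeds back into memory; this sketch omits that for brevity.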
Hypothesis: Single-pass delivery has a ceiling. Iterative convergence toward measurable DoD breaks through it.
Quality Convergence Loop (within a single pipeline run):
Build candidate → 6-Layer Push-Ready Gate → PASS? Ship. FAIL? → Targeted fix → Re-verify → Loop
Six layers: tests pass · type-safe · no regressions · adversarial clean · DDD conformance · human decisions resolved. Iterates until ALL pass or escalates.
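The convergence loop can be sketched as: verify all six layers, ship if they all pass, otherwise apply a fix targeted at the failing layers and re-verify, escalating once the iteration cap is hit. This is a minimal sketch, assuming "max 3 iterations" means at most three verification passes; the gate identifiers and the `check`/`fix` callables are hypothetical stand-ins for the real checks.

```python
from typing import Callable

# The six layers named above, as hypothetical identifiers.
GATES = ["tests_pass", "type_safe", "no_regressions",
         "adversarial_clean", "ddd_conformance", "human_decisions_resolved"]


def converge(candidate: dict,
             check: Callable[[dict, str], bool],
             fix: Callable[[dict, list], dict],
             max_iterations: int = 3) -> tuple:
    """Return (status, final_candidate, still_failing_gates)."""
    failing: list[str] = []
    for attempt in range(1, max_iterations + 1):
        failing = [g for g in GATES if not check(candidate, g)]
        if not failing:
            return "ship", candidate, []
        if attempt == max_iterations:
            break  # out of iterations: escalate rather than ship
        candidate = fix(candidate, failing)  # targeted: only failing layers
    return "escalate", candidate, failing


# Demo: a candidate is modeled as the set of gates it currently breaks,
# and each fix repairs only the first failing gate.
def check(candidate: dict, gate: str) -> bool:
    return gate not in candidate["broken"]


def fix_one(candidate: dict, failing: list) -> dict:
    return {"broken": candidate["broken"] - {failing[0]}}


easy = converge({"broken": {"type_safe"}}, check, fix_one)
hard = converge({"broken": {"tests_pass", "type_safe", "no_regressions"}},
                check, fix_one)
```

Re-verifying every layer on each pass, not just the one that was fixed, is what lets the loop catch a fix that introduces a regression elsewhere.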
Goal Loop (acro