by alinaqi
What started as an opinionated Claude Code setup kit is now an autonomous AI engineering command center
# Add to your Claude Code skills
git clone https://github.com/alinaqi/maggy
Last scanned: 5/15/2026
{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-15T06:57:03.521Z",
  "semgrepRan": false,
  "npmAuditRan": true,
  "pipAuditRan": true
}
maggy is an open-source AI agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by alinaqi. What started as an opinionated Claude Code setup kit is now an autonomous AI engineering command center. It has 697 GitHub stars.
maggy passed SkillsLLM's automated security scan (a dependency vulnerability audit plus prompt-injection heuristics) with no high-severity issues. You can read the full report in the Security Report section on this page.
Clone the repository with "git clone https://github.com/alinaqi/maggy" and add it to your Claude Code skills directory (see the Installation section above).
maggy is primarily written in Python. It is open-source under alinaqi on GitHub, so you can review or fork the full source.
Turn Claude Code into a self-reviewing, test-enforced engineering system that remembers context across sessions — then route work across 13 models from a single dashboard.
Claude Bootstrap is an installable config pack (skills, hooks, rules, templates) for Claude Code. Maggy is the optional local server that adds multi-model routing, a web dashboard, intent-driven protocols, and plugin orchestration. Both live in this repo. Start with Bootstrap; add Maggy when you need the harness.
1100+ tests. 67 skills. 15 MCP tools. Used daily across production codebases.
| | Claude Bootstrap | Maggy Harness |
|---|---|---|
| What it is | Skills, hooks, rules installed into ~/.claude/ | Local FastAPI server + web dashboard |
| Install time | ~30 seconds | ~5 minutes (Python 3.11+; API keys optional) |
| Requires | Claude Code (also works with Codex, Kimi, Gemini CLI) | Everything in Bootstrap + Python + optional Docker |
| You get | TDD enforcement, 67 skills, quality gates, ADR reviews, iCPG, Mnemos memory | All of Bootstrap + 13-tier routing, skill protocols, Telos testing, Cortex MCP, plugins, dashboard |
git clone https://github.com/alinaqi/maggy.git
cd maggy && ./install.sh
Your next Claude Code session picks it up automatically.
pipx install maggy-harness # or: pip install maggy-harness
maggy bootstrap # installs skills, hooks, ~/bin model wrappers, plugins
maggy serve # auto-configures from your local repos,
# then opens the dashboard at localhost:8080
(or from source: cd maggy && ./install.sh && maggy serve)
No API keys required to start — Maggy runs in local mode and, on first launch,
discovers your local git repos and opens the dashboard pointed at them. Add
GITHUB_TOKEN / ANTHROPIC_API_KEY later only if you want GitHub sync or
API-model features. See GETTING_STARTED.md for details.
Routing a task:
You: "review the auth middleware for timing attacks"
→ Blast score: 8/10 (security + architecture)
→ Routed to: Claude (Tier 11)
→ ADR gate: found docs/adr/0003-jwt-strategy.md → injected as context
→ Review runs with full architectural context
Skill Protocol execution:
You: "push to git"
→ Intent matched: git-push protocol
→ ✅ lint (2.1s)
→ ✅ typecheck (4.3s)
→ ✅ tests (11.2s)
→ ✅ stage
→ ✅ commit [AI-generated: "fix: resolve token refresh race condition"]
→ ✅ push
Fatigue-aware memory:
Session fatigue: 0.61 (PRE-SLEEP)
→ Mnemos: auto-checkpoint written
→ Micro-consolidation: 3 ResultNodes compressed
→ iCPG context injected: 2 ReasonNodes, 1 constraint
→ Context freed: ~18k tokens
You're using Claude Code. It's impressive — but:
| Layer | What it does |
|---|---|
| 67 skills | Python, TypeScript, React, React Native, Flutter, Supabase, Firebase, Stripe, Playwright, security, ADRs, cross-agent delegation |
| TDD enforcement | Stop hooks — tests must pass before Claude considers a task done |
| Quality gates | Max 20 lines/function, 3 params, 2 nesting levels. Enforced per file |
| iCPG | Intent-Augmented Code Property Graph. Stores why code exists. 6-dimension drift detection. Prevents duplicate implementations |
| Mnemos | Task-scoped memory with 4-dimension fatigue model. Survives context compaction with typed checkpoints |
| ADR enforcement | Non-trivial changes require an Architectural Decision Record. Missing one? Reverse-engineered from git history |
| Agent teams | 6 agents: Lead, Quality, Security, Review, Merger, Feature |
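
The per-file limits in the Quality gates row (20 lines per function, 3 parameters, 2 nesting levels) could be checked roughly as follows. This is a minimal sketch using Python's standard `ast` module, not the hooks shipped in this repo; `check_source` and `nesting_depth` are hypothetical names, and the choice of which statements count as a nesting level is an assumption.

```python
import ast

# Limits quoted in the Quality gates row; everything else here is illustrative.
MAX_LINES, MAX_PARAMS, MAX_NESTING = 20, 3, 2
# Assumption: these statement types each count as one nesting level.
BLOCKS = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def nesting_depth(node, current=0):
    """Deepest chain of nested control-flow blocks below `node`."""
    deepest = current
    for child in ast.iter_child_nodes(node):
        step = 1 if isinstance(child, BLOCKS) else 0
        deepest = max(deepest, nesting_depth(child, current + step))
    return deepest


def check_source(source):
    """Return one message per violated limit for each function in `source`."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        param_count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        measured = {
            "lines": (node.end_lineno - node.lineno + 1, MAX_LINES),
            "params": (param_count, MAX_PARAMS),
            "nesting": (nesting_depth(node), MAX_NESTING),
        }
        for metric, (value, limit) in measured.items():
            if value > limit:
                violations.append(f"{node.name}: {metric} {value} > {limit}")
    return violations
```

In this sketch an `elif` counts as an extra level because Python parses it as an `if` nested inside the `else` branch, and blocks inside a nested function also count toward the enclosing function. A real gate would probably want to handle both cases differently.
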
| System | What it does |
|---|---|
| 13-Tier Routing | Semantic blast score (1–10) routes to cheapest capable model. Local Qwen3 classifier → DeepSeek (~80% of tasks) → Kimi → Gemini → Grok → Codex → Claude. Budget-capped with auto-demotion. Routing details |
| Skill Protocols | YAML-defined workflows in maggy/skills/protocols/. "Push to git" → lint → test → stage → commit → push. Drop a .yaml to add your own |
| Telos | Testing beyond TDD. Three planes: Conformance × Validation × Integrity. A zero in any plane collapses the total score. Details |
| Cortex MCP | Code intelligence: 10 edge types, cyclomatic complexity, FTS5 search, bidirectional traversal. 15 tools, single SQLite DB. Benchmarks |
| Polyphony | Docker-isolated parallel agent execution. Second session auto-provisions a workspace. Spec |
| Engram | Cross-session memory. 7 amnesia types. Persists architectural knowledge across weeks |
| Council PR Review | Multi-model council reviews a GitHub PR from the dashboard — deterministic mega-PR chunking, a static gate (tsc/ruff) as ground truth, and an adversarial refute pass that kills false positives. Extensible per-language skills (Python/TS/Go/Rust/Java/C#/Ruby/PHP + drop-in more). pip install maggy-harness[review] |
| Plugins | Drop-in system. Ships with: Build-in-Public (auto-posts to LinkedIn/X), Telos, GitHub/Asana/Monday providers |
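
The Skill Protocols row describes an ordered workflow in which each step must succeed before the next one runs. The sketch below illustrates that control flow only. It is not the YAML protocol engine in maggy/skills/protocols/: `Step`, `run_protocol`, and `make_git_push` are hypothetical names, and the steps are stand-in callables rather than real lint, test, or git commands.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    run: Callable[[], bool]  # hypothetical: returns True when the step succeeds


def run_protocol(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failure.

    Returns (succeeded, names of the steps that completed).
    """
    completed = []
    for step in steps:
        if not step.run():
            return False, completed
        completed.append(step.name)
    return True, completed


def make_git_push(tests_pass: bool) -> List[Step]:
    """Step order taken from the Skill Protocols row; outcomes are simulated."""
    outcomes = {"lint": True, "test": tests_pass, "stage": True,
                "commit": True, "push": True}
    # Bind each outcome as a default argument so every lambda keeps its own value.
    return [Step(name, lambda ok=ok: ok) for name, ok in outcomes.items()]
```

With failing tests, `run_protocol(make_git_push(False))` stops after `lint`, so nothing is staged, committed, or pushed.
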
Every message is scored 1–10 for complexity and classified by task type. The cheapest capable model wins.
| Tier | Model | Role |
|---|---|---|
| T0 | Qwen3 (local) | Classification, triage, free bulk ops |
| T1 | Gemini Flash-Lite | Bulk extraction, CIG pipelines |
| T2 | DeepSeek Flash | Docs, tests, scaffolding |
| T3 | Gemini Flash | Multimodal, vision, audio |
| T4 | DeepSeek Pro | Complex coding, multi-file refactors |
| T5 | Gemini CLI | Multi-file agentic coding |
| T6 | AGY | End-to-end implementation (git + code + test) |
| T7 | Kimi | Long-context analysis, routing alt |
| T8 | Gemini Pro Search | Deep research, Google grounding, 2M context |
| T9 | Grok | Competitor intel, deep reasoning |
| T10 | Codex | Bulk generation, security-sensitive tasks |
| T11 | Claude Sonnet | Quality-critical code, complex debugging |
| T12 | Claude Opus | Architecture, security review, ADR decisions |
Routing is semantic (Qwen3 as local classifier), fatigue-aware, budget-capped, and cascading.
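
A rough way to picture "cheapest capable model wins" with budget-capped demotion is sketched below. The tier names come from the table above, but the blast-score ceilings, the relative costs, and the `route` function are assumptions made for illustration. The real router also uses a Qwen3 classifier and fatigue signals, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    name: str
    max_blast: int  # assumed: highest blast score this tier is trusted with
    cost: float     # assumed: relative cost per task


# A subset of the 13 tiers, with invented ceilings and costs.
TIERS: List[Tier] = [
    Tier("T0 Qwen3 (local)", 2, 0.0),
    Tier("T2 DeepSeek Flash", 4, 1.0),
    Tier("T4 DeepSeek Pro", 6, 3.0),
    Tier("T11 Claude Sonnet", 8, 10.0),
    Tier("T12 Claude Opus", 10, 30.0),
]


def route(blast: int, budget_left: float) -> Tier:
    """Pick the cheapest tier that can handle `blast`; demote if it is unaffordable."""
    if not 1 <= blast <= 10:
        raise ValueError("blast score must be between 1 and 10")
    capable = [t for t in TIERS if t.max_blast >= blast]
    cheapest = min(capable, key=lambda t: t.cost)
    if cheapest.cost <= budget_left:
        return cheapest
    # Auto-demotion: fall back to the most capable tier the budget still allows.
    affordable = [t for t in TIERS if t.cost <= budget_left]
    return max(affordable, key=lambda t: t.max_blast, default=TIERS[0])
```

Under these made-up numbers, a blast score of 8 goes to Claude Sonnet when the budget allows it and is demoted to DeepSeek Pro when only 5 cost units remain.
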
We've added first-class support for srooter, an Anthropic/OpenAI-compatible LLM gateway that routes your requests across models (Claude, MiniMax, DeepSeek, Kimi, Gemini, Grok, local Qwen) transparently — intent-based routing, budget caps, fallbacks, and a usage dashboard, without changing your tools.
Recommended with Maggy, Claude Code, or Codex. Point any of them at the gateway and your traffic is routed for you — no per-tool config:
# Claude Code (or Codex) → srooter
export ANTHROPIC_BASE_URL="https://www.srooter.ai/anthropic" # or your local gateway
export ANTHROPIC_API_KEY="<your-srooter-key>"
claude # now routed through srooter
Pick the model you "follow" once with /model-config — Maggy, the route-task hooks, and srooter all honor the same choice. Trivial asks stay on the cheap/local tier; real coding goes to your primary model (e.g. MiniMax-M2.5).
Standard TDD tells you if your code passes tests. Telos tells you if your code fulfills its intent.
IFS (Intent Fidelity Scale) = F1 × F2 × F3
F1 — Conformance: passed / total tests (pytest / vitest)
F2 — Validation: drift severity (Cortex drift_events)
F3 — Integrity: IF-3 orphan symbols (no reason edges)
IF-4 empty contracts (no pre/post/invariants)
IF-6 stale reasons (proposed >7d, never fulfilled)
IF-7 scope sprawl (reason scopes >10 files)
A zero in any plane collapses IFS to zero. 100% test pass rate with severe architectural drift = score of 0. This is intentional. See the Telos RFC.
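
The multiplicative collapse can be seen in a small sketch. Only the product F1 × F2 × F3 and the definition of F1 as passed / total tests come from the text above. How Telos turns drift severity and the IF-3 to IF-7 findings into F2 and F3 is not specified here, so the hypothetical `ifs` function simply takes them as values already normalised to the range 0 to 1.

```python
def ifs(passed: int, total: int, validation: float, integrity: float) -> float:
    """Intent Fidelity Scale as the product F1 * F2 * F3.

    F1 is passed / total tests. `validation` (F2) and `integrity` (F3) are
    assumed to be pre-computed scores in [0, 1].
    """
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need 0 <= passed <= total and total > 0")
    for score in (validation, integrity):
        if not 0.0 <= score <= 1.0:
            raise ValueError("plane scores must lie in [0, 1]")
    conformance = passed / total
    return conformance * validation * integrity


# All tests pass, but severe drift drives the validation plane to zero.
collapsed = ifs(passed=120, total=120, validation=0.0, integrity=0.9)
```

Because the planes are multiplied rather than averaged, `collapsed` is 0.0 even though every test passed.
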
.claude/
  skills/       # 67 skills: Python, TS, React, security, mobile, databases
  hooks/        # TDD enforcement, quality gates, Mnemos lifecycle
  rules/        # Conditional rules by file glob
  templates/    # settings.json, CLAUDE.md, ADR template, PR template
maggy/
  maggy/
    pipeline/   # Unified ChatPipeline orchestrator
    skills/     # Skill injection + YAML protocol engine