maggy

Name: maggy
Author: alinaqi

Verified

What started as an opinionated Claude Code setup kit is now an autonomous AI engineering command center

697 stars

57 forks

Python

Installation

# Add to your Claude Code skills
git clone https://github.com/alinaqi/maggy

Security Report (Verified)

Last scanned: 5/15/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-15T06:57:03.521Z",
  "semgrepRan": false,
  "npmAuditRan": true,
  "pipAuditRan": true
}

README.md

Frequently Asked Questions

What is maggy?

maggy is an open-source AI agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by alinaqi. What started as an opinionated Claude Code setup kit is now an autonomous AI engineering command center. It has 697 GitHub stars.

Is maggy safe to use?

Yes. maggy passed SkillsLLM's automated security scan, which combines a dependency vulnerability audit with prompt-injection heuristics, and no issues were reported. According to the report, the npm and pip dependency audits ran, but Semgrep did not. You can read the full report in the Security Report section on this page.

How do I install maggy?

Clone the repository with "git clone https://github.com/alinaqi/maggy" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is maggy written in?

maggy is primarily written in Python. It is open source and hosted under the alinaqi account on GitHub, so you can review or fork the full source.

Are there alternatives to maggy?

Yes. SkillsLLM lists many other AI Agents skills you can browse and compare side by side. Open the AI Agents category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh maggy against similar tools.

Related Skills

superpowers

by obra

An agentic skills framework & software development methodology that works.

234,966

codanna

OpenJudge

Claude Bootstrap + Maggy

Turn Claude Code into a self-reviewing, test-enforced engineering system that remembers context across sessions — then route work across 13 models from a single dashboard.

Claude Bootstrap is an installable config pack (skills, hooks, rules, templates) for Claude Code. Maggy is the optional local server that adds multi-model routing, a web dashboard, intent-driven protocols, and plugin orchestration. Both live in this repo. Start with Bootstrap; add Maggy when you need the harness.

1100+ tests. 67 skills. 15 MCP tools. Used daily across production codebases.

Who This Is For

Solo engineers using Claude Code who want TDD enforcement, quality gates, and memory that survives context compaction — without changing their workflow
Teams routing work across Claude, DeepSeek, Kimi, Gemini, and Codex from a single dashboard with cost-aware model selection
Platform engineers building AI-assisted developer tooling who need a reference implementation with intent tracking, protocol execution, and plugin architecture

Choose Your Path

	Claude Bootstrap	Maggy Harness
What it is	Skills, hooks, rules installed into `~/.claude/`	Local FastAPI server + web dashboard
Install time	~30 seconds	~5 minutes (Python 3.11+, API keys)
Requires	Claude Code (also works with Codex, Kimi, Gemini CLI)	Everything in Bootstrap + Python + optional Docker
You get	TDD enforcement, 67 skills, quality gates, ADR reviews, iCPG, Mnemos memory	All of Bootstrap + 13-tier routing, skill protocols, Telos testing, Cortex MCP, plugins, dashboard

Bootstrap — 30-second install

git clone https://github.com/alinaqi/maggy.git
cd maggy && ./install.sh

Your next Claude Code session picks it up automatically.

Full Harness — zero-config

pipx install maggy-harness   # or: pip install maggy-harness
maggy bootstrap              # installs skills, hooks, ~/bin model wrappers, plugins
maggy serve                  # auto-configures from your local repos,
                             # then opens the dashboard at localhost:8080

(or from source: cd maggy && ./install.sh && maggy serve)

No API keys required to start — Maggy runs in local mode and, on first launch, discovers your local git repos and opens the dashboard pointed at them. Add GITHUB_TOKEN / ANTHROPIC_API_KEY later only if you want GitHub sync or API-model features. See GETTING_STARTED.md for details.

What It Looks Like in Practice

Routing a task:

You: "review the auth middleware for timing attacks"
→ Blast score: 8/10 (security + architecture)
→ Routed to: Claude (Tier 11)
→ ADR gate: found docs/adr/0003-jwt-strategy.md → injected as context
→ Review runs with full architectural context
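The ADR gate step in the example above can be pictured as a relevance lookup over `docs/adr/`. The sketch below is a minimal illustration, assuming ADRs are Markdown files and that relevance means shared words between the task and the ADR text; `find_relevant_adrs`, `tokenize`, and the `STOPWORDS` list are hypothetical names, not Maggy's actual implementation.

```python
import re
from pathlib import Path

# Hypothetical filler words ignored when matching tasks to ADRs.
STOPWORDS = {"the", "for", "and", "with", "review", "check"}


def tokenize(text: str) -> set[str]:
    """Lowercase alphabetic words of 3+ letters, minus filler words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) >= 3 and w not in STOPWORDS}


def find_relevant_adrs(task: str, adr_dir: Path, min_overlap: int = 1) -> list[Path]:
    """Return ADR files sharing at least `min_overlap` words with the task, best match first."""
    task_words = tokenize(task)
    scored = []
    for adr in sorted(adr_dir.glob("*.md")):
        overlap = len(task_words & tokenize(adr.read_text(encoding="utf-8")))
        if overlap >= min_overlap:
            scored.append((overlap, adr))
    # Highest overlap first; the sort is stable, so ties keep filename order.
    scored.sort(key=lambda pair: -pair[0])
    return [adr for _, adr in scored]
```

A matched file would then be read and prepended to the review prompt as context; a real gate would likely use the classifier or embeddings rather than word overlap.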

Skill Protocol execution:

You: "push to git"
→ Intent matched: git-push protocol
→ ✅ lint       (2.1s)
→ ✅ typecheck   (4.3s)
→ ✅ tests       (11.2s)
→ ✅ stage
→ ✅ commit      [AI-generated: "fix: resolve token refresh race condition"]
→ ✅ push
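The protocol run above is an ordered pipeline that halts at the first failing step. Here is a minimal sketch of that control flow. It assumes a protocol is a name, a list of trigger phrases, and a list of step names, and it uses a plain dict in place of a parsed YAML file so it needs only the standard library. `GIT_PUSH`, `match_protocol`, `run_protocol`, and the schema are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stand-in for a parsed maggy/skills/protocols/*.yaml file;
# the real schema may differ.
GIT_PUSH = {
    "name": "git-push",
    "triggers": ["push to git", "git push"],
    "steps": ["lint", "typecheck", "tests", "stage", "commit", "push"],
}


@dataclass
class RunResult:
    protocol: str
    completed: list[str] = field(default_factory=list)
    failed_step: Optional[str] = None


def match_protocol(message: str, protocols: list[dict]) -> Optional[dict]:
    """Return the first protocol with a trigger phrase contained in the message."""
    text = message.lower()
    for proto in protocols:
        if any(trigger in text for trigger in proto["triggers"]):
            return proto
    return None


def run_protocol(proto: dict, actions: dict[str, Callable[[], bool]]) -> RunResult:
    """Run steps in order and stop at the first step whose action returns False."""
    result = RunResult(protocol=proto["name"])
    for step in proto["steps"]:
        if not actions[step]():
            result.failed_step = step
            break
        result.completed.append(step)
    return result
```

In this sketch, each action is a zero-argument callable that returns success, so a failing test run leaves `stage`, `commit`, and `push` unexecuted.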

Fatigue-aware memory:

Session fatigue: 0.61 (PRE-SLEEP)
→ Mnemos: auto-checkpoint written
→ Micro-consolidation: 3 ResultNodes compressed
→ iCPG context injected: 2 ReasonNodes, 1 constraint
→ Context freed: ~18k tokens

The Problem This Solves

You're using Claude Code. It's impressive — but:

It picks the most expensive model for everything, including trivial tasks
Context fills up, state is lost, you re-explain yourself every session
There's no enforcement: code quality, test coverage, and ADR compliance only happen if you remember to ask
Running multiple agents on the same repo causes file conflicts
You have no visibility into what Claude is actually doing inside your codebase

What Bootstrap Gives You

Layer	What it does
67 skills	Python, TypeScript, React, React Native, Flutter, Supabase, Firebase, Stripe, Playwright, security, ADRs, cross-agent delegation
TDD enforcement	Stop hooks — tests must pass before Claude considers a task done
Quality gates	Max 20 lines/function, 3 params, 2 nesting levels. Enforced per file
iCPG	Intent-Augmented Code Property Graph. Stores why code exists. 6-dimension drift detection. Prevents duplicate implementations
Mnemos	Task-scoped memory with 4-dimension fatigue model. Survives context compaction with typed checkpoints
ADR enforcement	Non-trivial changes require an Architectural Decision Record. If one is missing, it is reverse-engineered from git history
Agent teams	6 agents: Lead, Quality, Security, Review, Merger, Feature

What Maggy Adds

System	What it does
13-Tier Routing	Semantic blast score (1–10) routes each task to the cheapest capable model. Local Qwen3 classifier → DeepSeek (~80% of tasks) → Kimi → Gemini → Grok → Codex → Claude. Budget-capped with auto-demotion. See the routing details
Skill Protocols	YAML-defined workflows in `maggy/skills/protocols/`. "Push to git" → lint → test → stage → commit → push. Drop a `.yaml` to add your own
Telos	Testing beyond TDD. Three planes: Conformance × Validation × Integrity. A zero in any plane collapses the total score. See the details
Cortex MCP	Code intelligence: 10 edge types, cyclomatic complexity, FTS5 search, bidirectional traversal. 15 tools, single SQLite DB. See the benchmarks
Polyphony	Docker-isolated parallel agent execution. A second session auto-provisions its own workspace. See the spec
Engram	Cross-session memory. 7 amnesia types. Persists architectural knowledge across weeks
Council PR Review	Multi-model council reviews a GitHub PR from the dashboard — deterministic mega-PR chunking, a static gate (tsc/ruff) as ground truth, and an adversarial refute pass that kills false positives. Extensible per-language skills (Python/TS/Go/Rust/Java/C#/Ruby/PHP + drop-in more). `pip install maggy-harness[review]`
Plugins	Drop-in system. Ships with: Build-in-Public (auto-posts to LinkedIn/X), Telos, GitHub/Asana/Monday providers
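"Deterministic mega-PR chunking" means that the same PR always splits into the same review chunks, so council runs are reproducible. This is one way to get that property: a greedy packer that visits files in sorted path order. The per-chunk line budget, the input shape (path mapped to changed-line count), and the name `chunk_pr` are all assumptions for illustration.

```python
def chunk_pr(files: dict[str, int], max_lines: int = 400) -> list[list[str]]:
    """Greedily pack changed files (path -> changed lines) into review chunks.

    Files are visited in sorted path order, so the same PR always yields the
    same chunks regardless of how the diff was listed. A single file larger
    than `max_lines` gets a chunk of its own rather than being split.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for path in sorted(files):
        size = files[path]
        # Close the current chunk if this file would push it over budget.
        if current and used + size > max_lines:
            chunks.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        chunks.append(current)
    return chunks
```

Sorting by path also tends to keep files from the same directory in the same chunk, which gives each reviewer related context.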

Model Routing

Every message is scored 1–10 for complexity and classified by task type. The cheapest capable model wins.

Tier	Model	Role
T0	Qwen3 (local)	Classification, triage, free bulk ops
T1	Gemini Flash-Lite	Bulk extraction, CIG pipelines
T2	DeepSeek Flash	Docs, tests, scaffolding
T3	Gemini Flash	Multimodal, vision, audio
T4	DeepSeek Pro	Complex coding, multi-file refactors
T5	Gemini CLI	Multi-file agentic coding
T6	AGY	End-to-end implementation (git + code + test)
T7	Kimi	Long-context analysis, routing alt
T8	Gemini Pro Search	Deep research, Google grounding, 2M context
T9	Grok	Competitor intel, deep reasoning
T10	Codex	Bulk generation, security-sensitive tasks
T11	Claude Sonnet	Quality-critical code, complex debugging
T12	Claude Opus	Architecture, security review, ADR decisions

Routing is semantic (Qwen3 as local classifier), fatigue-aware, budget-capped, and cascading.
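The "cheapest capable model wins" rule and budget-capped demotion can be sketched as a selection over a tier table. This is a simplified illustration. It uses a subset of the tiers above, with made-up capability ceilings and relative costs, and it takes the blast score as given rather than computing it with the Qwen3 classifier. `Tier`, `TIERS`, and `route` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_score: int  # hardest blast score this tier is trusted with (made-up)
    cost: float     # relative cost per task (made-up)


# Hypothetical subset of the tier table; the numbers are illustrative only.
TIERS = [
    Tier("T0 Qwen3 (local)", 2, 0.0),
    Tier("T2 DeepSeek Flash", 5, 1.0),
    Tier("T4 DeepSeek Pro", 7, 3.0),
    Tier("T11 Claude Sonnet", 9, 15.0),
    Tier("T12 Claude Opus", 10, 75.0),
]


def route(score: int, budget_left: float, tiers: list[Tier] = TIERS) -> Tier:
    """Pick the cheapest tier able to handle `score`, demoting if it is over budget."""
    by_cost = sorted(tiers, key=lambda t: t.cost)
    capable = [t for t in by_cost if t.max_score >= score]
    if capable and capable[0].cost <= budget_left:
        return capable[0]
    # Auto-demotion: no capable tier fits the budget, so fall back to the most
    # capable tier still affordable (the cheapest tier if nothing is).
    affordable = [t for t in by_cost if t.cost <= budget_left] or by_cost[:1]
    return max(affordable, key=lambda t: t.max_score)
```

With these made-up numbers, a blast score of 8 with budget to spare routes to Claude Sonnet (T11), as in the auth-middleware example earlier. The same task with only 5.0 budget left is demoted to DeepSeek Pro.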

Gateway routing with srooter — www.srooter.ai

We've added first-class support for srooter, an Anthropic/OpenAI-compatible LLM gateway that routes your requests across models (Claude, MiniMax, DeepSeek, Kimi, Gemini, Grok, local Qwen) transparently — intent-based routing, budget caps, fallbacks, and a usage dashboard, without changing your tools.

Recommended with Maggy, Claude Code, or Codex. Point any of them at the gateway and your traffic is routed for you — no per-tool config:

# Claude Code (or Codex) → srooter
export ANTHROPIC_BASE_URL="https://www.srooter.ai/anthropic"   # or your local gateway
export ANTHROPIC_API_KEY="<your-srooter-key>"
claude        # now routed through srooter

Pick the model you "follow" once with /model-config — Maggy, the route-task hooks, and srooter all honor the same choice. Trivial asks stay on the cheap/local tier; real coding goes to your primary model (e.g. MiniMax-M2.5).

Telos: Testing Beyond TDD

Standard TDD tells you if your code passes tests. Telos tells you if your code fulfills its intent.

IFS (Intent Fidelity Scale) = F1 × F2 × F3

F1 — Conformance:  passed / total tests            (pytest / vitest)
F2 — Validation:   drift severity                  (Cortex drift_events)
F3 — Integrity:    IF-3 orphan symbols              (no reason edges)
                   IF-4 empty contracts             (no pre/post/invariants)
                   IF-6 stale reasons               (proposed >7d, never fulfilled)
                   IF-7 scope sprawl                (reason scopes >10 files)

A zero in any plane collapses IFS to zero. 100% test pass rate with severe architectural drift = score of 0. This is intentional. See the Telos RFC.
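The multiplicative collapse can be shown directly. In this sketch, F1 is computed as in the table (passed / total). The mappings for the other two planes are assumptions, not taken from the Telos RFC: F2 is taken as `1 - drift_severity` for a severity normalized to [0, 1], and F3 as the mean of per-check scores for IF-3, IF-4, IF-6, and IF-7. All function names are hypothetical.

```python
from math import prod
from statistics import fmean


def conformance(passed: int, total: int) -> float:
    """F1: share of tests passing (assumed 0 when there are no tests)."""
    return passed / total if total else 0.0


def validation(drift_severity: float) -> float:
    """F2 (assumed mapping): 1.0 with no drift, 0.0 at maximum severity 1.0."""
    return 1.0 - min(max(drift_severity, 0.0), 1.0)


def integrity(check_scores: dict[str, float]) -> float:
    """F3 (assumed): mean of per-check scores in [0, 1], keyed by IF-3, IF-4, ..."""
    return fmean(check_scores.values()) if check_scores else 1.0


def ifs(*planes: float) -> float:
    """IFS = F1 x F2 x F3; a product, so a zero in any plane zeroes the score."""
    return prod(planes)
```

Calling `ifs(conformance(10, 10), validation(1.0), integrity(scores))` reproduces the case above: every test passes, but maximum drift drives F2, and therefore IFS, to zero.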

Repo Structure

.claude/
  skills/       # 67 skills — Python, TS, React, security, mobile, databases
  hooks/        # TDD enforcement, quality gates, Mnemos lifecycle
  rules/        # Conditional rules by file glob
  templates/    # settings.json, CLAUDE.md, ADR template, PR template

maggy/
  maggy/
    pipeline/   # Unified ChatPipeline orchestrator
    skills/     # Skill injection + YAML protocol engine