by mlhher
Orchestrate an entire AI dev team on 5GB VRAM. Ephemeral subagents, exact-match diffs. Single static binary, any model. Zero config, zero context bloat.
# Add to your Claude Code skills
git clone https://github.com/mlhher/late-cli
Last scanned: 5/29/2026
{
"issues": [],
"status": "PASSED",
"scannedAt": "2026-05-29T07:57:42.131Z",
"semgrepRan": false,
"npmAuditRan": true,
"pipAuditRan": true
}
late-cli is an open-source AI agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by mlhher. Orchestrate an entire AI dev team on 5GB VRAM. Ephemeral subagents, exact-match diffs. Single static binary, any model. Zero config, zero context bloat. It has 352 GitHub stars.
Yes. late-cli passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.
Clone the repository with "git clone https://github.com/mlhher/late-cli" and add it to your Claude Code skills directory (see the Installation section above).
late-cli is primarily written in Go. It is open-source under mlhher on GitHub, so you can review or fork the full source.
Every other coding agent floods its own context with edits, retries and implementation details until the model loses the thread. Late delegates all of that to ephemeral subagents — isolated contexts that execute one task and are destroyed. The orchestrator sees only plans and outcomes, never the mess. Single static binary, zero dependencies, any model.
Drop into any project and start building. Get to your first prompt in less than 10 seconds.
# Linux / macOS
brew tap mlhher/late && brew install late
# Universal Fallback (Linux / macOS / Windows WSL)
curl -sfL https://raw.githubusercontent.com/mlhher/late-cli/main/install.sh | bash
cd your-project
late
Other Installation Methods
- Arch Linux: `yay -S late-cli-bin`
- Linux / macOS / Native Windows: Download the latest binary and drop it in your PATH. (macOS manual download: if blocked, run `xattr -d com.apple.quarantine /path/to/late`)

Connecting to Cloud Models? Local models (llama.cpp on `:8080`, the default for llama-server) work out of the box. No configuration required. For cloud providers (DeepSeek, Claude, Gemini, OpenRouter), set your `OPENAI_BASE_URL`, `OPENAI_API_KEY`, and `OPENAI_MODEL` environment variables.
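As a rough illustration of the configuration rule above, here is a minimal Python sketch of how an OpenAI-compatible client might resolve the three variables and fall back to a local llama-server when none are set. The helper name `resolve_endpoint`, the `/v1` path suffix, and the placeholder model name are assumptions made for this example; this is not Late's actual configuration code.

```python
import os


def resolve_endpoint(env=None):
    """Return (base_url, api_key, model) from the OPENAI_* variables.

    Hypothetical helper: the fallback values below are assumptions,
    not Late's documented defaults.
    """
    env = os.environ if env is None else env
    # Assumed fallback: a llama-server listening locally on port 8080.
    base_url = env.get("OPENAI_BASE_URL", "http://localhost:8080/v1")
    # Local servers typically need no key, so default to an empty string.
    api_key = env.get("OPENAI_API_KEY", "")
    # Placeholder model name, used only when nothing is configured.
    model = env.get("OPENAI_MODEL", "local-model")
    return base_url.rstrip("/"), api_key, model


if __name__ == "__main__":
    print(resolve_endpoint({}))
```

Passing an explicit dictionary instead of reading `os.environ` keeps the sketch easy to try without touching your shell.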
Lead Architect forming a plan and spawning an atomic subagent for a surgical edit.
| | Late | Claude Code | OpenCode | The Weekly Clone |
|---|---|---|---|---|
| Workflow | Autonomous Orchestration | Manual toggling | Manual toggling | Blind execution/Manual toggling |
| Implementations | Ephemeral subagents (Context destroyed) | Floods main context window | Floods main context window | Floods main context window |
| KV-Cache | Ruthless KV cache management | Brute-force context dumping | Brute-force context dumping | Brute-force context dumping |
| System Prompt | ~1,000 tokens (Always planning workflow) | 10,000+ tokens | 10,000+ tokens | ~300-1000+ tokens (No-workflow lobotomy) |
| Dependencies | Zero-dependency static binary | Node.js | Node.js | Python / Node.js |
| Setup required | None (OOTB llama-server support) | Anthropic OAuth / Sign-in | Mandatory JSON tweaking | Flavor of the week JSON/YAML/TOML configs |
| Built For | Builders wanting 10x throughput | Enterprise expense accounts | Tinkering with settings | Chasing GitHub stars |
"The same model feels smarter with Late." — Reddit
"Late-CLI is mindblowing... I'm shocked that the token usage is so minimal, I keep expecting a big bill from DeepSeek's API." — GitHub Discussions
Outperforming Claude Code and Codex for Local LLM Workflows — Agent Native
Built with Late: Late is primarily developed inside Late itself.
Works with Claude, DeepSeek, Qwen, Gemma (including thinking support for Gemma), and any OpenAI-compatible API. See the Quickstart Guide for hybrid model routing, keybindings, MCP setup, Skills and more.
Standard coding agents do all their work, whether it's planning, implementing, retrying failed edits, or self-healing, in one shared context window. Every retry, every failed implementation, every repair loop pollutes the context the model reasons from. It degrades. You blame the model. The model is fine.
Late separates concerns. A lean orchestrator (~1,000 token system prompt) reads your codebase, forms a plan, and delegates individual implementation tasks to ephemeral subagents. Each subagent gets a fresh isolated context containing only its one task and nothing else. When it completes, that context is destroyed. The orchestrator only ever sees outcomes.
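The separation described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Late's implementation: the `Orchestrator` class, the `delegate` method, and the `noisy_worker` stand-in are all hypothetical names, and a real subagent would call a model rather than loop over canned retries.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Orchestrator:
    """Keeps only instructions, plans, and outcomes in its context."""
    context: List[str] = field(default_factory=list)

    def delegate(self, task: str, worker: Callable[[str, List[str]], str]) -> str:
        # Fresh, isolated scratch context containing only this one task.
        scratch: List[str] = [f"task: {task}"]
        outcome = worker(task, scratch)
        # The scratch list is dropped when this method returns;
        # only the outcome is recorded in the orchestrator's context.
        self.context.append(f"outcome[{task}]: {outcome}")
        return outcome


def noisy_worker(task: str, scratch: List[str]) -> str:
    """Stand-in for a subagent that retries several times before succeeding."""
    for attempt in range(1, 4):
        scratch.append(f"attempt {attempt}: edit failed, retrying")
    scratch.append("attempt 4: edit applied")
    return "done after 4 attempts"


if __name__ == "__main__":
    orch = Orchestrator()
    orch.context.append("plan: rename helper, then update callers")
    for task in ["rename helper", "update callers"]:
        orch.delegate(task, noisy_worker)
    # One plan line plus one outcome per task; no retry noise.
    print(len(orch.context))
```

The design point is that the retry log lives only in `scratch`, so the orchestrator's context grows by one line per task regardless of how many attempts the subagent needed.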
Late manages the KV cache and context window carefully, leaving more room for reasoning. The orchestrator's context grows only from what matters: your instructions and the agent's decisions. Everything the subagent did to get there is gone with it. This is why the same model feels sharper in Late. It reasons from signal, not noise.
- search/replace blocks with autonomous self-healing on mismatch. Edits fail loud. We never silently corrupt your files.
- [y/N]. Features Session, Project, and Global trust scopes with TTL decay.

Built to create engineering leverage, not to supply free infrastructure for AI startups.
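To make the search/replace behaviour mentioned above concrete, here is a minimal Python sketch of an exact-match edit that fails loud on mismatch. The names `apply_edit` and `EditMismatch`, and the rule that the search block must match exactly once, are assumptions for this illustration; the source does not describe Late's actual matching rules or its self-healing loop.

```python
class EditMismatch(Exception):
    """Raised when a search block does not match exactly once."""


def apply_edit(source: str, search: str, replace: str) -> str:
    """Replace `search` with `replace` only if it occurs exactly once."""
    if not search:
        raise EditMismatch("empty search block")
    count = source.count(search)
    if count != 1:
        # Fail loud: refuse to guess rather than write a wrong edit.
        raise EditMismatch(f"expected 1 exact match, found {count}")
    return source.replace(search, replace, 1)


if __name__ == "__main__":
    original = "a = 1\nb = 2\n"
    print(apply_edit(original, "b = 2", "b = 3"))
    try:
        apply_edit(original, "c = 9", "c = 0")
    except EditMismatch as err:
        # A caller could re-read the file and retry here.
        print(f"edit rejected: {err}")
```

Because a mismatch raises instead of returning a partial result, the file content is never changed unless the edit matched unambiguously.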
Free for builders: Use Late freely to write code for any project, including commercial ones. Your output is yours.
Commercial restrictions: You may not monetize Late itself. Wrapping the orchestration engine into a paid service or deploying it as enterprise infrastructure requires a commercial agreement.
Late converts to GPLv2 on February 21, 2030. Full license in LICENSE.