by hadihonarvar
Self-hosted LLM gateway. One Go binary turns your Macs and Linux boxes into a private inference cluster — multi-machine routing, sharding via llama.cpp-RPC, per-user keys + quotas + audit, OpenAI- and Anthropic-compatible APIs behind one endpoint. Point Cursor / Claude Code / Aider / SDKs at it.
# Add to your Claude Code skills
git clone https://github.com/hadihonarvar/flock
flock is an open-source API integration skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by hadihonarvar. It has 52 GitHub stars.
flock is primarily written in Go. It is open-source under hadihonarvar on GitHub, so you can review or fork the full source.
Self-hosted AI for your team. One endpoint. Your hardware.
flockllm.com · GitHub · Maintained by Hadi Honarvar Nazari · Apache-2.0
Flock is the self-hosted control plane for LLMs. One Go binary turns your Macs and Linux boxes into a private inference cluster — multi-machine routing, per-user keys, daily quotas, full audit log, and a built-in admin dashboard, behind one endpoint that speaks both the OpenAI and Anthropic APIs.
Engine-agnostic: bring Ollama, vLLM, MLX-LM, or llama.cpp-RPC. Run open-weight models (Qwen, Llama, DeepSeek, …) on your own hardware, shard a giant model across several machines via llama.cpp-RPC, and transparently fall back to paid Claude / GPT only when you choose.
Point Cursor, Claude Code, Aider, Continue, or any OpenAI/Anthropic SDK at Flock. It just works.
┌──────────────────────────────────────────────────────────────┐
│ YOUR USE CASES │
│ (the tools your team already uses) │
└──────────────────────────────────────────────────────────────┘
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Cursor │ │ Claude │ │ Aider │ │ Custom │ │ curl │
│ │ │ Code │ │ │ │ Python │ │ scripts │
│ │ │ │ │ │ │ SDK │ │ │
└────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
│ OpenAI │ Anthropic │ OpenAI │ Either │ HTTP
└────────────┴────────────┴────────────┴────────────┘
│
│ ONE URL · ONE API KEY
▼
╔══════════════════════════════════════════════════════════════════════╗
║ ⬢ ⬢ ⬢ FLOCK ⬢ ⬢ ⬢ ║
║ (this is what we built) ║
║ ════════════════════════════════════════════════════════════════ ║
║ Gateway OpenAI + Anthropic on /v1/chat/completions ║
║ per-user keys · daily quotas · full audit log ║
║ admin dashboard at :8080 ║
║ ║
║ Router Same model on N nodes → load-balance ║
║ Different models per node → route by placement ║
║ Model bigger than any node → split via llama.cpp-RPC ║
║ Claude / GPT requested → proxy to vendor ║
║ Engine error or timeout → retry catalog fallback chain ║
╚═════════════════════════════╤════════════════════════════════════════╝
│
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Engines │ │ Engines │ │ Egress │
│ (any mix) │ │ (any mix) │ │ proxy │
│ • Ollama │ │ • Ollama │ │ │
│ • vLLM │ │ • vLLM │ │ api.anthro- │
│ • MLX-LM │ │ • MLX-LM │ │ pic.com │
│ • llama.cpp│ │ • llama.cpp│ │ api.openai │
└──────┬──────┘ └──────┬──────┘ │ .com │
│ │ └──────┬──────┘
▼ ▼ ▼
┌──────────────────────────────────────────────────────────────────────┐
│ UNDERLYING LLMs / WEIGHTS │
│ │
│ YOUR HARDWARE VENDOR APIs │
│ • Mac Studio · Mac Mini • Claude (Anthropic) │
│ • Linux + RTX GPU • GPT, o3, o4 (OpenAI) │
│ │
│ 41 curated catalog models (Qwen 3.6, GLM, Each request routed │
│ gpt-oss, Llama 4, Gemma 4, DeepSeek V4, to EITHER your hard- │
│ Kimi K2.6, Nemotron 3 Ultra, vision + ware OR a vendor — │
│ embedding models) you pay vendors only │
│ + any HuggingFace or Ollama model. when YOU chose to. │
└──────────────────────────────────────────────────────────────────────┘
One-sentence version: Flock is the layer that lets your tools talk to any LLM — open-weight on your hardware, or hosted Claude / GPT — through one URL and one API key, with the team controls (quotas, audit, per-user keys) that the raw vendor APIs don't give you.
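The router rules in the diagram can be sketched as a small decision function. This is an illustration only, not Flock's actual implementation: the `Node` and `Router` types, the node names, the name-prefix test for vendor models, and the memory-based sharding check are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count

# Assumption: vendor-hosted models are recognised by name prefix.
VENDOR_PREFIXES = ("claude", "gpt")


@dataclass
class Node:
    name: str
    memory_gb: float
    models: set = field(default_factory=set)


class Router:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._turn = count()  # round-robin counter used for load balancing

    def route(self, model, size_gb):
        """Return (strategy, target names) for one request."""
        if model.startswith(VENDOR_PREFIXES):
            return "proxy", ["vendor"]  # Claude / GPT requested
        hosts = [n.name for n in self.nodes if model in n.models]
        if len(hosts) > 1:  # same model on N nodes
            return "load-balance", [hosts[next(self._turn) % len(hosts)]]
        if len(hosts) == 1:  # model placed on exactly one node
            return "placement", hosts
        biggest = max((n.memory_gb for n in self.nodes), default=0)
        total = sum(n.memory_gb for n in self.nodes)
        if biggest < size_gb <= total:  # too big for one node, fits the cluster
            return "shard", [n.name for n in self.nodes]
        return "unavailable", []


def with_fallback(chain, call):
    """Try each model in the fallback chain until one call succeeds."""
    last_error = None
    for model in chain:
        try:
            return call(model)
        except (RuntimeError, TimeoutError) as err:  # engine error or timeout
            last_error = err
    raise RuntimeError("every model in the fallback chain failed") from last_error
```

With two hypothetical nodes that both hold the same model, successive `route` calls alternate between them; a model that no single node can hold but the cluster can is reported as a `shard` across every node.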
Flock is engine-agnostic. The quickest path uses Ollama as the local engine — but vLLM, MLX-LM, and llama.cpp-RPC all work. See Prerequisites — read first below for the alternatives.
# 1. install Flock
curl -fsSL https://raw.githubusercontent.com/hadihonarvar/flock/main/installer/install.sh | sh
export PATH="$HOME/.local/bin:$PATH" # if the installer says so
# 2. install an engine (pick one) — Ollama is the simplest default
brew install --cask ollama && open -a Ollama
# alternatives: pip install mlx-lm · or run llama.cpp's llama-server · or run vLLM in Docker
# 3. start Flock with a tiny model (~1 GB, fast download)
FLOCK_DEFAULT_MODEL=llama-3.2-1b flock up
Option A — .deb / .rpm package (recommended for Debian / Ubuntu / Raspbian / QNAP / Asustor / Fedora / RHEL):
# Debian / Ubuntu / Raspbian (arm64 example — also amd64)
curl -LO https://github.com/hadihonarvar/flock/releases/latest/download/flock_VERSION_linux_arm64.deb
sudo dpkg -i flock_VERSION_linux_arm64.deb
# Binary at /usr/bin/flock, catalog at /usr/share/flock/catalog
# Recommends llama.cpp for sharding — install via apt if you want it.
# Fedora / RHEL / CentOS
sudo rpm -i https://github.com/hadihonarvar/flock/releases/latest/download/flock_VERSION_linux_amd64.rpm
(Replace VERSION with the latest from Releases. The package version stays current via your distro's normal upgrade path — flock update also works as an in-place binary swap for non-package installs.)
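If you script the download, the VERSION and architecture substitution can be done programmatically. The sketch below only reproduces the file-name pattern shown in the commands above; the `package_url` helper and its alias table are hypothetical, the exact form of the version string should be taken from the Releases page, and only the arm64/amd64 `.deb` and amd64 `.rpm` combinations are confirmed by this document.

```python
import platform

RELEASE_BASE = "https://github.com/hadihonarvar/flock/releases/latest/download"

# Map common `uname -m` style values to the arch names used in the file names.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def package_url(version, fmt="deb", machine=None):
    """Build the release URL for a .deb or .rpm package.

    `version` is passed through unchanged; `machine` defaults to the
    architecture of the host running this script.
    """
    if fmt not in ("deb", "rpm"):
        raise ValueError(f"unsupported package format: {fmt}")
    machine = (machine or platform.machine()).lower()
    arch = ARCH_ALIASES.get(machine)
    if arch is None:
        raise ValueError(f"no known package for architecture: {machine}")
    return f"{RELEASE_BASE}/flock_{version}_linux_{arch}.{fmt}"
```

Passing the result to `curl -LO` gives the same download as the manual commands.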
Option B — install.sh (works everywhere; drops binary in ~/.local/bin/ and catalog in ~/.flock/catalog/):
# 1. install Flock
curl -fsSL https://raw.githubusercontent.com/hadihonarvar/flock/main/installer/install.sh | sh
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc
# 2. install an engine (pick one) — Ollama is the simplest default
curl -fsSL https://ollama.com/install.sh | sh && sudo systemctl enable --now ollama
# alternatives: vLLM in Docker for NVIDIA · llama.cpp's llama-server · MLX-LM (Apple Silicon only)
# 3. start Flock with a tiny model (~1 GB, fast download)
FLOCK_DEFAULT_MODEL=llama-3.2-1b flock up
💡 Not sure which engine to install? Run flock doctor after step 1; it inspects your hardware and tells you the single command to run.
Flock prints something like:
✔ default model: llama-3.2-1b
✔ engine: ollama at http://127.0.0.1:11434
Flock is ready.
API: http://localhost:8080/v1
Admin API key: sk-orc-xK9p…
Every command supports --help — flock <cmd> --help prints usage, flags, and examples.
Copy that admin key. In another terminal:
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-orc-xK9p…" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hi in 5 words"}]}'
You should see a JSON response with a 5-word reply. 🎉
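The same request can be made from Python with only the standard library. This is a minimal sketch, assuming Flock returns the usual OpenAI chat-completion shape (`choices[0].message.content`); the helper names `build_chat_request`, `first_reply`, and `chat` are made up for this example.

```python
import json
import urllib.request

FLOCK_BASE_URL = "http://localhost:8080/v1"  # the API URL Flock prints at startup


def build_chat_request(api_key, prompt, model="auto"):
    """Build (without sending) a POST to the chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{FLOCK_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_reply(response_json):
    """Extract the assistant text from an OpenAI-style response body."""
    # Assumption: the response follows the OpenAI chat-completion schema.
    return response_json["choices"][0]["message"]["content"]


def chat(api_key, prompt, model="auto", timeout=60):
    """Send the prompt to Flock and return the assistant's reply text."""
    request = build_chat_request(api_key, prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return first_reply(json.load(response))
```

With Flock running, `chat("<your admin key>", "hi in 5 words")` should return the same reply text the curl call shows inside its JSON.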
Or use the web dashboard: open http://localhost:8080 and paste the admin key.
Or wire up Claude Code: in any terminal where you use Claude Code, set:
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_AUTH_TOKEN=sk-orc-xK9p…
claude
…and Claude Code talks to your local model instead of paying for the API.
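A script can reuse the same two environment variables to reach the Anthropic-compatible side directly. The sketch below follows the public Anthropic Messages API (`POST /v1/messages`, a required `max_tokens`, an `anthropic-version` header). That Flock serves this exact path, accepts the bearer token here, and accepts `"auto"` as the model name on this route are assumptions not confirmed above; the helper names are hypothetical.

```python
import json
import os
import urllib.request


def build_messages_request(prompt, model="auto", max_tokens=256):
    """Build (without sending) an Anthropic Messages API request."""
    base_url = os.environ.get("ANTHROPIC_BASE_URL", "http://localhost:8080")
    token = os.environ.get("ANTHROPIC_AUTH_TOKEN", "")
    body = {
        "model": model,  # assumption: "auto" is also valid on this route
        "max_tokens": max_tokens,  # required by the Messages API
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/messages",  # assumed path on Flock
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


def reply_text(response_json):
    """Join the text blocks of a Messages API response body."""
    blocks = response_json.get("content", [])
    return "".join(b["text"] for b in blocks if b.get("type") == "text")
```

Sending the request with `urllib.request.urlopen` and passing the decoded JSON to `reply_text` yields the model's answer as a single string.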
If something breaks, run flock doctor — it tells you exactly what to fix. Common issues are in the Troubleshooting installation section.
| Status | Beta — single-node verified end-to-end (curl, dashboard, CLI); multi-node routing has in-process E2E coverage (internal/controlplane/two_node_e2e_test.go); real two-machine verification via the 30-sec smoke script + [manual walkthrough](docs/TW |