flock

Name: flock
Author: hadihonarvar

Self-hosted LLM gateway. One Go binary turns your Macs and Linux boxes into a private inference cluster — multi-machine routing, sharding via llama.cpp-RPC, per-user keys + quotas + audit, OpenAI- and Anthropic-compatible APIs behind one endpoint. Point Cursor / Claude Code / Aider / SDKs at it.

52 stars

1 fork

Installation

# Add to your Claude Code skills
git clone https://github.com/hadihonarvar/flock

Frequently Asked Questions

What is flock?

flock is an open-source, self-hosted LLM gateway built by hadihonarvar, listed here as an API integration skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT. It has 52 GitHub stars.

How do I install flock?

Clone the repository with "git clone https://github.com/hadihonarvar/flock" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is flock written in?

flock is primarily written in Go. It is open-source under hadihonarvar on GitHub, so you can review or fork the full source.

Flock

Self-hosted AI for your team. One endpoint. Your hardware.

flockllm.com · GitHub · Maintained by Hadi Honarvar Nazari · Apache-2.0

Flock is the self-hosted control plane for LLMs. One Go binary turns your Macs and Linux boxes into a private inference cluster — multi-machine routing, per-user keys, daily quotas, full audit log, and a built-in admin dashboard, behind one endpoint that speaks both the OpenAI and Anthropic APIs.

Engine-agnostic: bring Ollama, vLLM, MLX-LM, or llama.cpp-RPC. Run open-weight models (Qwen, Llama, DeepSeek, …) on your own hardware, shard a giant model across several machines via llama.cpp-RPC, and transparently fall back to paid Claude / GPT only when you choose.

Point Cursor, Claude Code, Aider, Continue, or any OpenAI/Anthropic SDK at Flock. It just works.

🗺️ Where Flock sits

           ┌──────────────────────────────────────────────────────────────┐
           │                       YOUR USE CASES                         │
           │             (the tools your team already uses)               │
           └──────────────────────────────────────────────────────────────┘
                  │           │          │             │            │
                  ▼           ▼          ▼             ▼            ▼
            ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
            │  Cursor  │ │  Claude  │ │  Aider   │ │  Custom  │ │   curl   │
            │          │ │   Code   │ │          │ │ Python   │ │  scripts │
            │          │ │          │ │          │ │   SDK    │ │          │
            └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
                 │  OpenAI    │ Anthropic  │  OpenAI    │  Either    │  HTTP
                 └────────────┴────────────┴────────────┴────────────┘
                                          │
                                          │   ONE URL · ONE API KEY
                                          ▼
      ╔══════════════════════════════════════════════════════════════════════╗
      ║                  ⬢ ⬢ ⬢   FLOCK   ⬢ ⬢ ⬢                              ║
      ║                  (this is what we built)                             ║
      ║  ════════════════════════════════════════════════════════════════    ║
      ║  Gateway     OpenAI + Anthropic on /v1/chat/completions              ║
      ║              per-user keys · daily quotas · full audit log           ║
      ║              admin dashboard at :8080                                ║
      ║                                                                      ║
      ║  Router      Same model on N nodes  → load-balance                   ║
      ║              Different models per node → route by placement          ║
      ║              Model bigger than any node → split via llama.cpp-RPC    ║
      ║              Claude / GPT requested → proxy to vendor                ║
      ║              Engine error or timeout  → retry catalog fallback chain ║
      ╚═════════════════════════════╤════════════════════════════════════════╝
                                    │
              ┌─────────────────────┼─────────────────────┐
              ▼                     ▼                     ▼
       ┌─────────────┐       ┌─────────────┐       ┌─────────────┐
       │   Engines   │       │   Engines   │       │   Egress    │
       │  (any mix)  │       │  (any mix)  │       │   proxy     │
       │  • Ollama   │       │  • Ollama   │       │             │
       │  • vLLM     │       │  • vLLM     │       │ api.anthro- │
       │  • MLX-LM   │       │  • MLX-LM   │       │ pic.com     │
       │  • llama.cpp│       │  • llama.cpp│       │ api.openai  │
       └──────┬──────┘       └──────┬──────┘       │ .com        │
              │                     │              └──────┬──────┘
              ▼                     ▼                     ▼
      ┌──────────────────────────────────────────────────────────────────────┐
      │                    UNDERLYING LLMs / WEIGHTS                         │
      │                                                                      │
      │   YOUR HARDWARE                              VENDOR APIs             │
      │   • Mac Studio · Mac Mini                    • Claude (Anthropic)    │
      │   • Linux + RTX GPU                          • GPT, o3, o4 (OpenAI)  │
      │                                                                      │
      │   41 curated catalog models (Qwen 3.6, GLM,   Each request routed   │
      │   gpt-oss, Llama 4, Gemma 4, DeepSeek V4,     to EITHER your hard-  │
      │   Kimi K2.6, Nemotron 3 Ultra, vision +       ware OR a vendor —    │
      │   embedding models)                           you pay vendors only  │
      │   + any HuggingFace or Ollama model.          when YOU chose to.    │
      └──────────────────────────────────────────────────────────────────────┘

One-sentence version: Flock is the layer that lets your tools talk to any LLM — open-weight on your hardware, or hosted Claude / GPT — through one URL and one API key, with the team controls (quotas, audit, per-user keys) that the raw vendor APIs don't give you.
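The five router rules in the diagram can be read as a small decision procedure. The sketch below is a toy illustration of that reading, not Flock's actual router (which is written in Go). The names `Node`, `Router`, `vendor_models`, `model_size_gb`, and the strategy strings are all hypothetical, and the sharding check (model larger than the biggest node but within the cluster total) is an assumption about when splitting would make sense.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    name: str
    models: set[str]   # models already placed on this node
    memory_gb: int     # memory available for weights (illustrative unit)


@dataclass
class Router:
    nodes: list[Node]
    model_size_gb: dict[str, int]   # hypothetical size table for unplaced models
    vendor_models: set[str]         # names that should be proxied to a vendor API
    _rr: count = field(default_factory=count)  # round-robin counter

    def route(self, model: str) -> tuple[str, list[str]]:
        """Return (strategy, node names) for one request."""
        # Claude / GPT requested -> proxy to vendor
        if model in self.vendor_models:
            return ("proxy-to-vendor", [])
        hosts = [n for n in self.nodes if model in n.models]
        # Same model on N nodes -> load-balance (round-robin here)
        if len(hosts) > 1:
            pick = hosts[next(self._rr) % len(hosts)]
            return ("load-balance", [pick.name])
        # Model placed on exactly one node -> route by placement
        if len(hosts) == 1:
            return ("route-by-placement", [hosts[0].name])
        # Model bigger than any node but within the cluster total -> shard
        size = self.model_size_gb.get(model)
        biggest = max((n.memory_gb for n in self.nodes), default=0)
        total = sum(n.memory_gb for n in self.nodes)
        if size is not None and biggest < size <= total:
            return ("shard-via-rpc", [n.name for n in self.nodes])
        return ("unroutable", [])


def call_with_fallback(chain, call):
    """Engine error or timeout -> try the next model in the fallback chain."""
    last_error = None
    for model in chain:
        try:
            return call(model)
        except (RuntimeError, TimeoutError) as err:
            last_error = err
    raise RuntimeError("all models in the fallback chain failed") from last_error
```

Vendor models are matched against an explicit set rather than by name prefix, because open-weight models such as gpt-oss would otherwise be mistaken for hosted GPT.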

🚀 Try it in 60 seconds

Flock is engine-agnostic. The quickest path uses Ollama as the local engine, but vLLM, MLX-LM, and llama.cpp-RPC all work. See "Prerequisites (read first)" below for the alternatives.

🍎 macOS (Apple Silicon — M1/M2/M3/M4)

# 1. install Flock
curl -fsSL https://raw.githubusercontent.com/hadihonarvar/flock/main/installer/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"   # if the installer says so

# 2. install an engine (pick one) — Ollama is the simplest default
brew install --cask ollama && open -a Ollama
# alternatives: pip install mlx-lm  ·  or run llama.cpp's llama-server  ·  or run vLLM in Docker

# 3. start Flock with a tiny model (~1 GB, fast download)
FLOCK_DEFAULT_MODEL=llama-3.2-1b flock up

🐧 Linux (x86_64 or arm64) — including Raspberry Pi, NAS, edge boxes

Option A — .deb / .rpm package (recommended for Debian / Ubuntu / Raspbian / QNAP / Asustor / Fedora / RHEL):

# Debian / Ubuntu / Raspbian (arm64 example — also amd64)
curl -LO https://github.com/hadihonarvar/flock/releases/latest/download/flock_VERSION_linux_arm64.deb
sudo dpkg -i flock_VERSION_linux_arm64.deb
# Binary at /usr/bin/flock, catalog at /usr/share/flock/catalog
# Recommends llama.cpp for sharding — install via apt if you want it.

# Fedora / RHEL / CentOS
sudo rpm -i https://github.com/hadihonarvar/flock/releases/latest/download/flock_VERSION_linux_amd64.rpm

(Replace VERSION with the latest from Releases. The package version stays current via your distro's normal upgrade path — flock update also works as an in-place binary swap for non-package installs.)
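If you script the install, the VERSION substitution can be done programmatically. The helper below is a minimal sketch that only reproduces the file-name pattern shown in the commands above; `package_url` and `ARCH_ALIASES` are hypothetical names, and the exact form of the version string (for example, whether it carries a leading "v") is not specified here, so pass it exactly as it appears on the Releases page.

```python
import platform

# URL prefix taken from the install commands above.
RELEASE_BASE = "https://github.com/hadihonarvar/flock/releases/latest/download"

# Map common `uname -m` style names to the arch suffix used in the package names.
ARCH_ALIASES = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def package_url(version: str, fmt: str, machine: str = "") -> str:
    """Build the download URL for a .deb or .rpm package.

    `version` replaces VERSION in the file name; `machine` defaults to the
    architecture of the machine running this script.
    """
    if fmt not in ("deb", "rpm"):
        raise ValueError(f"unsupported package format: {fmt!r}")
    arch = ARCH_ALIASES.get((machine or platform.machine()).lower())
    if arch is None:
        raise ValueError(f"unsupported architecture: {machine or platform.machine()!r}")
    return f"{RELEASE_BASE}/flock_{version}_linux_{arch}.{fmt}"
```

For example, `package_url(version, "deb", "aarch64")` yields the arm64 .deb URL used in the Debian example, with your version filled in.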

Option B — install.sh (works everywhere; drops binary in ~/.local/bin/ and catalog in ~/.flock/catalog/):

# 1. install Flock
curl -fsSL https://raw.githubusercontent.com/hadihonarvar/flock/main/installer/install.sh | sh
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc

# 2. install an engine (pick one) — Ollama is the simplest default
curl -fsSL https://ollama.com/install.sh | sh && sudo systemctl enable --now ollama
# alternatives: vLLM in Docker for NVIDIA  ·  llama.cpp's llama-server  ·  MLX-LM (Apple Silicon only)

# 3. start Flock with a tiny model (~1 GB, fast download)
FLOCK_DEFAULT_MODEL=llama-3.2-1b flock up

💡 Not sure which engine to install? Run flock doctor after step 1 — it inspects your hardware and tells you the single command to run.

What you should see (both platforms)

Flock prints something like:

✔ default model: llama-3.2-1b
✔ engine: ollama at http://127.0.0.1:11434
  Flock is ready.
  API:    http://localhost:8080/v1
  Admin API key:   sk-orc-xK9p…

Every command supports --help — flock <cmd> --help prints usage, flags, and examples.

Copy that admin key. In another terminal:

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-orc-xK9p…" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hi in 5 words"}]}'

You should see a JSON response with a 5-word reply. 🎉
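The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the endpoint returns the usual OpenAI chat-completions response shape (`choices[0].message.content`); the function names `build_chat_request` and `chat` are illustrative, not part of Flock.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, prompt: str,
                       model: str = "auto") -> urllib.request.Request:
    """Build the POST request that the curl example above sends."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(base_url: str, api_key: str, prompt: str,
         model: str = "auto", timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body; adjust the lookup if yours differs.
    """
    req = build_chat_request(base_url, api_key, prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With Flock running, `chat("http://localhost:8080/v1", "<your admin key>", "hi in 5 words")` should return the reply as a string.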

Or use the web dashboard: open http://localhost:8080 and paste the admin key.

Or wire up Claude Code: in any terminal where you use Claude Code, set:

export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_AUTH_TOKEN=sk-orc-xK9p…
claude

…and Claude Code talks to your local model instead of paying for the API.
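If you would rather not export these variables shell-wide, one option is to set them only for the Claude Code process. The sketch below does that from Python; `claude_env` and `run_claude` are hypothetical helper names, and it assumes the `claude` executable is on your PATH. The two variable names are the ones shown above.

```python
import os
import subprocess


def claude_env(base_url: str, token: str, base_env=None) -> dict:
    """Return a copy of the environment with the two Flock overrides applied."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url.rstrip("/")
    env["ANTHROPIC_AUTH_TOKEN"] = token
    return env


def run_claude(base_url: str, token: str, extra_args=()) -> int:
    """Launch Claude Code pointed at Flock; other shells keep their own settings."""
    result = subprocess.run(["claude", *extra_args], env=claude_env(base_url, token))
    return result.returncode
```

Calling `run_claude("http://localhost:8080", "<your Flock key>")` starts a session against Flock without touching the parent shell's environment.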

If something breaks, run flock doctor — it tells you exactly what to fix. Common issues are in the Troubleshooting installation section.


Status	Beta — single-node verified end-to-end (curl, dashboard, CLI); multi-node routing has in-process E2E coverage (`internal/controlplane/two_node_e2e_test.go`); real two-machine verification via the 30-sec smoke script + [manual walkthrough](docs/TW