Run Claude Code 100% on-device with local AI on Apple Silicon. MLX-native Anthropic-API server, 65 tok/s Qwen 3.5 122B, Llama 3.3 70B, Gemma 4 31B. Private, offline, airgap-ready. Built for NDA / legal / healthcare workflows.
# Add to your Claude Code skills
```shell
git clone https://github.com/nicedreamzapp/claude-code-local
```

🧩 This repo is the BRAIN of a 4-part local-first ambient-computing stack:
Brain (here) · 🎤 Ears+Mouth · 🖐 Hands · 📱 Phone. Each repo stands alone; together they take Claude Code off the keyboard and off the screen. Jump to the stack diagram →
🖥️ More of my open-source software: nicedreamzwholesale.com/software
We started with one model. Now we ship a roster. Same MLX server, same Anthropic API, swap one env var and you swap the brain.
| | 🟢 Gemma 4 31B | 🟠 Llama 3.3 70B ⭐ | 🔵 Qwen 3.5 122B |
|---|:---:|:---:|:---:|
| Nickname | The Quick One | The Wise One | The Beast |
| Build | 4-bit IT abliterated | 8-bit affine abliterated | 4-bit MoE (A10B) |
| Speed | ~15 tok/s | ~7 tok/s | 65 tok/s 🏆 |
| Params | 31 B dense | 71 B dense | 122 B / 10 B active |
| RAM | ~18 GB | ~75 GB | ~75 GB |
| Disk | 18 GB | 75 GB | 65 GB |
| Best at | Daily coding, fits 64 GB Mac | Hardest reasoning, dense 8-bit weights | Max throughput, active sparsity |
| Uploaded by us? | ❌ | ⭐ Yes (HF) | ❌ |
| Launcher | Gemma 4 Code.command | Llama 70B.command | Claude Local.command |
| Min RAM to run | 32 GB | 96 GB | 96 GB |
💡 Fun fact: Qwen wins raw speed because it's an MoE: only 10B of its 122B params activate per token. Llama 70B is the slowest and the smartest because every token runs through all 71B dense weights at 8-bit. Gemma is the lightweight champ that fits where the others can't.
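The fun fact can be sanity-checked with back-of-envelope arithmetic: on Apple Silicon, decode speed is roughly bound by how many weight bytes must be streamed per token, so an MoE that activates 10B of 122B params reads far less memory than a dense 70B model. The numbers below are rough estimates derived from the table above, not benchmarks:

```python
def bytes_per_token(active_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight GB streamed from memory per decoded token.

    active_params_b: parameters touched per token, in billions.
    bits_per_weight: quantization width.
    """
    return active_params_b * bits_per_weight / 8  # billions of bytes = GB

# Qwen 3.5 122B MoE: ~10B active params, 4-bit weights
qwen = bytes_per_token(10, 4)    # 5.0 GB/token
# Llama 3.3 70B dense: all ~71B params, 8-bit weights
llama = bytes_per_token(71, 8)   # 71.0 GB/token
# Gemma 4 31B dense, 4-bit
gemma = bytes_per_token(31, 4)   # 15.5 GB/token

# Predicted Qwen-vs-Llama speedup is ~14x; the table's measured
# 65 vs ~7 tok/s (~9x) lands in the same ballpark once overheads bite.
print(qwen, llama, gemma, llama / qwen)
```

This is a streaming-bandwidth sketch only; it ignores attention compute, KV-cache reads, and MoE routing cost, which is why the measured ratio is smaller than the predicted one.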
The Llama 3.3 70B in this lineup isn't from a generic mirror: we packaged and uploaded our own 8-bit abliterated MLX build to HuggingFace, so anyone running this repo can pull it with one command:
```shell
MLX_MODEL=divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx \
bash scripts/start-mlx-server.sh
```
| Spec | Detail |
|---|---|
| 🤗 HuggingFace | divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx |
| 📊 Quant | 8-bit affine, group size 64 |
| 💾 Disk | ~75 GB (15 safetensors shards) |
| 🧠 Params | 71 B dense |
| 📏 Context | 128 K tokens |
| 🔓 Abliteration base | huihui-ai abliterated build of Meta's Llama 3.3 70B Instruct (what abliteration means) |
| 🛠 MLX conversion + 8-bit pack | by us, chosen to preserve quality over minimal footprint |
⚠️ Use it responsibly. "Abliterated" suppresses the model's built-in refusal direction so it doesn't refuse benign-but-edgy requests. It is not a general capability upgrade, and you remain bound by the upstream Llama 3.3 license.
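Because the server speaks the Anthropic Messages API, any Anthropic-style client can talk to it once it's running. A minimal sketch using only the standard library, assuming the server listens on localhost and exposes the standard `/v1/messages` route (the port is illustrative; check your launcher for the real one):

```python
import json
import urllib.request

def build_request(prompt: str, model: str,
                  base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build an Anthropic Messages API request aimed at the local MLX server."""
    payload = {
        "model": model,
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={"content-type": "application/json"},
        method="POST",
    )

req = build_request(
    "Write a haiku about airgapped inference.",
    model="divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx",
)
# with urllib.request.urlopen(req) as resp:   # uncomment with the server running
#     print(json.load(resp))
```

The payload shape (`model`, `max_tokens`, `messages`) is the standard Anthropic Messages schema; whether the local server requires an `x-api-key` header is an implementation detail of this repo, so none is sent here.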
Four ways to run the lineup. Each one is a double-clickable launcher in launchers/.
| Mode | What it does | Launcher |
|---|---|---|
| 🤖 Code | Run Claude Code with a local model: same UX, no API key | Claude Local.command, Gemma 4 Code.command, Llama 70B.command |
| 🌐 Browser | Local AI controls a real Brave browser via Chrome DevTools | Browser Agent.command |
| 🎤 Hands-Free Voice | Speak in, hear replies in your cloned voice; full loop, 100% on-device | Narrative Gemma.command + NarrateClaude |
| 📱 Phone | iMessage in → text/image/video out, full pipeline | ~/.claude/imessage-*.sh |
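For the Code mode, the launchers presumably do the equivalent of pointing Claude Code's Anthropic client at the local server before starting it. A minimal sketch, assuming the standard `ANTHROPIC_BASE_URL` override and a server on port 8080 (both are assumptions; check the `.command` scripts for the real values):

```shell
# Point Claude Code at the local MLX server instead of api.anthropic.com.
# Port and dummy key are illustrative; the launchers set the real values.
export ANTHROPIC_BASE_URL="http://localhost:8080"
export ANTHROPIC_API_KEY="local-dummy-key"   # local server can ignore it; the CLI wants one set
claude
```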
Your MacBook has a powerful GPU built right into the chip. This project uses that GPU to run massive AI models, the same kind that power ChatGPT and Claude, entirely on your computer.

- 🚫 No internet needed
- 💰 No monthly subscription
- 🔒 No one sees your code or data
- ✅ Full Claude Code experience: write code, edit files, manage projects, control your browser, or run a full hands-free voice session where you speak every question and hear every reply in your own cloned voice (both directions on-device)
```
📱 You (Mac or Phone)
   ↓
🤖 C
```