Run Claude Code 100% on-device with local AI on Apple Silicon. MLX-native Anthropic-API server, 65 tok/s Qwen 3.5 122B, Llama 3.3 70B, Gemma 4 31B. Private, offline, airgap-ready. Built for NDA / legal / healthcare workflows.
# Add to your Claude Code skills
```shell
git clone https://github.com/nicedreamzapp/claude-code-local
```

🧩 This repo is the BRAIN of a 4-part local-first ambient-computing stack:
Brain (here) · 🎤 Ears+Mouth · 🖐 Hands · 📱 Phone. Each repo stands alone; together they take Claude Code off the keyboard and off the screen. Jump to the stack diagram →
🖥️ More of my open-source software: nicedreamzwholesale.com/software
We started with one model. Now we ship a roster. Same MLX server, same Anthropic API, swap one env var and you swap the brain.
| | 🟢 Gemma 4 31B | 🟠 Llama 3.3 70B ⭐ | 🔵 Qwen 3.5 122B |
|---|:---:|:---:|:---:|
| Nickname | The Quick One | The Wise One | The Beast |
| Build | 4-bit IT abliterated | 8-bit affine abliterated | 4-bit MoE (A10B) |
| Speed | ~15 tok/s | ~7 tok/s | 65 tok/s 🏆 |
| Params | 31 B dense | 71 B dense | 122 B / 10 B active |
| RAM | ~18 GB | ~75 GB | ~75 GB |
| Disk | 18 GB | 75 GB | 65 GB |
| Best at | Daily coding, fits 64 GB Mac | Hardest reasoning, dense 8-bit weights | Max throughput, active sparsity |
| Uploaded by us? | ❌ | ⭐ Yes (HF) | ❌ |
| Launcher | Gemma 4 Code.command | Llama 70B.command | Claude Local.command |
| Min RAM to run | 32 GB | 96 GB | 96 GB |
💡 Fun fact: Qwen wins raw speed because it's an MoE: only 10B of its 122B params activate per token. Llama 70B is the slowest and the smartest because every token runs through all 71B dense weights at 8-bit. Gemma is the lightweight champ that fits where the others can't.
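The fun fact can be sanity-checked with back-of-envelope arithmetic: on Apple Silicon, decode speed is roughly bound by how many weight bytes must be streamed per token, so an MoE that activates 10B of 122B params reads far less memory than a dense 70B model. The numbers below are rough estimates derived from the table above, not benchmarks:

```python
def bytes_per_token(active_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight GB streamed from memory per decoded token.

    active_params_b: parameters touched per token, in billions.
    bits_per_weight: quantization width.
    """
    return active_params_b * bits_per_weight / 8  # billions of bytes = GB

# Qwen 3.5 122B MoE: ~10B active params, 4-bit weights
qwen = bytes_per_token(10, 4)    # 5.0 GB/token
# Llama 3.3 70B dense: all ~71B params, 8-bit weights
llama = bytes_per_token(71, 8)   # 71.0 GB/token
# Gemma 4 31B dense, 4-bit
gemma = bytes_per_token(31, 4)   # 15.5 GB/token

# Predicted Qwen-vs-Llama speedup is ~14x; the table's measured
# 65 vs ~7 tok/s (~9x) lands in the same ballpark once overheads bite.
print(qwen, llama, gemma, llama / qwen)
```

This is a streaming-bandwidth sketch only; it ignores attention compute, KV-cache reads, and MoE routing cost, which is why the measured ratio is smaller than the predicted one.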
The Llama 3.3 70B in this lineup isn't from a generic mirror: we packaged and uploaded our own 8-bit abliterated MLX build to HuggingFace, so anyone running this repo can pull it with one command:
```shell
MLX_MODEL=divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx \
bash scripts/start-mlx-server.sh
```
| Spec | Detail |
|---|---|
| 🤗 HuggingFace | divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx |
| 📊 Quant | 8-bit affine, group size 64 |
| 💾 Disk | ~75 GB (15 safetensors shards) |
| 🧠 Params | 71 B dense |
| 📏 Context | 128 K tokens |
| 🔓 Abliteration base | huihui-ai abliterated build of Meta's Llama 3.3 70B Instruct (what abliteration means) |
| 🛠 MLX conversion + 8-bit pack | by us, chosen to preserve quality over minimal footprint |
⚠️ Use it responsibly. "Abliterated" suppresses the model's built-in refusal direction so it doesn't refuse benign-but-edgy requests. It is not a general capability upgrade, and you remain bound by the upstream Llama 3.3 license.
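Because the server speaks the Anthropic Messages API, any Anthropic-style client can talk to it once it's running. A minimal sketch using only the standard library, assuming the server listens on localhost and exposes the standard `/v1/messages` route (the port is illustrative; check your launcher for the real one):

```python
import json
import urllib.request

def build_request(prompt: str, model: str,
                  base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build an Anthropic Messages API request aimed at the local MLX server."""
    payload = {
        "model": model,
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={"content-type": "application/json"},
        method="POST",
    )

req = build_request(
    "Write a haiku about airgapped inference.",
    model="divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx",
)
# with urllib.request.urlopen(req) as resp:   # uncomment with the server running
#     print(json.load(resp))
```

The payload shape (`model`, `max_tokens`, `messages`) is the standard Anthropic Messages schema; whether the local server requires an `x-api-key` header is an implementation detail of this repo, so none is sent here.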
Four ways to run the lineup. Each one is a double-clickable launcher in launchers/.
| Mode | What it does | Launcher |
|---|---|---|
| 🤖 Code | Run Claude Code with a local model: same UX, no API key | Claude Local.command, Gemma 4 Code.command, Llama 70B.command |
| 🌐 Browser | Local AI controls a real Brave browser via Chrome DevTools | Browser Agent.command |
| 🎤 Hands-Free Voice | Speak in, hear replies in your cloned voice; full loop, 100% on-device | Narrative Gemma.command + NarrateClaude |
| 📱 Phone | iMessage in → text/image/video out, full pipeline | ~/.claude/imessage-*.sh |
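For the Code mode, the launchers presumably do the equivalent of pointing Claude Code's Anthropic client at the local server before starting it. A minimal sketch, assuming the standard `ANTHROPIC_BASE_URL` override and a server on port 8080 (both are assumptions; check the `.command` scripts for the real values):

```shell
# Point Claude Code at the local MLX server instead of api.anthropic.com.
# Port and dummy key are illustrative; the launchers set the real values.
export ANTHROPIC_BASE_URL="http://localhost:8080"
export ANTHROPIC_API_KEY="local-dummy-key"   # local server can ignore it; the CLI wants one set
claude
```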
Your MacBook has a powerful GPU built right into the chip. This project uses that GPU to run massive AI models, the same kind that power ChatGPT and Claude, entirely on your computer.

- 🚫 No internet needed
- 💰 No monthly subscription
- 🔒 No one sees your code or data
- ✅ Full Claude Code experience: write code, edit files, manage projects, control your browser, or run a full hands-free voice session where you speak every question and hear every reply in your own cloned voice (both directions on-device)
```
📱 You (Mac or Phone)
   ↓
🤖 C
```