The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.
# Add to your Claude Code skills
git clone https://github.com/raullenchai/Rapid-MLX
| Your Mac | Model | Speed (tokens/sec) | What works |
|:---|:---:|:---:|:---:|
| 16 GB MacBook Air | Qwen3.5-4B | 160 tok/s | Chat, coding, tools |
| 32+ GB Mac Mini / Studio | Nemotron-Nano 30B | 141 tok/s | 🆕 Fastest 30B, 100% tools |
| 32+ GB Mac Mini / Studio | Qwen3.6-35B | 95 tok/s | 256 experts, 262K context |
| 64 GB Mac Mini / Studio | Qwen3.5-35B | 83 tok/s | Best balance of smart + fast |
| 96+ GB Mac Studio / Pro | Qwen3.5-122B | 57 tok/s | Frontier-level intelligence |
| 128+ GB Mac Studio Ultra | 🆕 DeepSeek V4 Flash 158B-A13B | 31-56 tok/s | Day-0 frontier MoE, 1M context |
Step 1 — Install (pick one):
# Homebrew (recommended — just works, no Python version issues)
brew install raullenchai/rapid-mlx/rapid-mlx
# pip (requires Python 3.10+ — macOS ships 3.9, so install Python first if needed)
pip install rapid-mlx
# Or one-liner with auto-setup (installs Python if needed)
curl -fsSL https://raullenchai.github.io/Rapid-MLX/install.sh | bash
Vision/multimodal models (Gemma 4, Qwen-VL, etc.) need extras: `pip install 'rapid-mlx[vision]'`. Text-only install is ~460 MB; vision adds ~322 MB. See Optional Extras for the full list.
"No matching distribution" error? Your Python is too old. Run
python3 --version— if it says 3.9, install a newer Python:brew install python@3.12thenpython3.12 -m pip install rapid-mlx
Step 2 — Serve a model:
rapid-mlx serve qwen3.5-4b
First run downloads the model (~2.5 GB) — you'll see a progress bar. Wait for Ready: http://localhost:8000/v1.
Want vision?
`pip install 'rapid-mlx[vision]'`, then `rapid-mlx serve gemma-4-26b` (~14 GB).
Step 3 — Chat (open a second terminal tab):
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"default","messages":[{"role":"user","content":"Say hello"}]}'
That's it — you now have an OpenAI-compatible AI server on localhost:8000. Point any app at http://localhost:8000/v1 and it just works.
Tip: Run `rapid-mlx models` to see all available model aliases. For a smaller/faster model, try `rapid-mlx serve qwen3.5-9b` (~5 GB).
From source (for development):
git clone https://github.com/raullenchai/Rapid-MLX.git
cd Rapid-MLX && pip install -e .
Vision models (adds torch + torchvision, ~2.5 GB extra):
pip install 'rapid-mlx[vision]'
Audio (TTS/STT via mlx-audio):
pip install 'rapid-mlx[audio]'
Try it with Python (make sure the server is running, then pip install openai):
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed") # any value works, no real key needed
response = client.chat.completions.create(
model="default",
messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
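Tool calling goes through the same OpenAI-compatible endpoint. A minimal sketch (the get_weather tool and its schema are hypothetical, added here only for illustration; the tools parameter uses the standard OpenAI function-calling format):
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Hypothetical tool definition in the standard OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# If the model decided to call the tool, the parsed call shows up here.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)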
| Harness | Type | Notes |
|---------|------|-------|
| Hermes Agent | Agent | 62 tools, multi-turn (test) |
| PydanticAI | Framework | Typed agents, structured output (test) |
| LangChain | Framework | ChatOpenAI, tools, streaming (test) |
| smolagents | Framework | CodeAgent + ToolCallingAgent (test) |
| OpenClaude (Anthropic SDK) | Agent | CLAUDE_CODE_USE_OPENAI=1 (test) |
| Aider | Agent | CLI edit-and-commit, architect mode (test) |
| Goose | Agent | Ollama provider via OLLAMA_HOST |
| Claw Code | Agent | OpenAI & Anthropic endpoints |
| Client | Status | Setup |
|--------|--------|-------|
| Cursor | Compatible | Settings → OpenAI Base URL |
| Continue.dev | Compatible | VS Code / JetBrains extension |
| LibreChat | Tested | Docker (test) |
| Open WebUI | Tested | Docker (test) |
| Any OpenAI-compatible app | Compatible | Point at http://localhost:8000/v1 |
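For the frameworks above, setup is usually just a base URL swap. A minimal LangChain sketch (assumes the langchain-openai package is installed; the model name default matches the server started earlier):
from langchain_openai import ChatOpenAI

# Point LangChain's standard OpenAI chat model at the local server.
llm = ChatOpenAI(
    model="default",
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # any value works, no real key needed
)

print(llm.invoke("Say hello").content)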
MHI measures how well a model works with a specific agent harness. It combines three dimensions:
| Dimension | Weight | What it measures | Source |
|---|---|---|---|
| Tool Calling | 50% | Can the model+harness execute function calls correctly? | rapid-mlx agents --test |
| HumanEval | 30% | Can the model generate correct code? | HumanEval (10 tasks) |
| MMLU | 20% | Does the harness degrade base knowledge? | tinyMMLU (10 tasks) |
MHI = 0.50 × ToolCalling + 0.30 × HumanEval + 0.20 × MMLU (scale 0-100)
| Model | Best MHI | Best Harness | Tool Calling |
|---|---|---|---|
| Qwopus 27B | 92 | All (Hermes, PydanticAI, LangChain, smolagents) | 100% |
| Qwen3.5 27B | 82 | Hermes / PydanticAI / LangChain | 100% |
| Llama 3.3 70B | 83 | smolagents (text-based) | 100% |
| Nemotron Nano 30B | 59 | PydanticAI / LangChain | 91-93% |
| Gemma 4 26B | 62 | Hermes / smolagents | 100% |
Run `rapid-mlx agents` to see all supported agents and `python3 scripts/mhi_eval.py` to compute MHI on your own setup.
| Model + Harness | Tool Calling | HumanEval | MMLU | MHI |
|---|---|---|---|---|
| Qwopus 27B + Hermes | 100% | 80% | 90% | 92 |
| Qwopus 27B + PydanticAI | 100% | 80% | 90% | 92 |
| Qwen3.5 27B + Hermes | 100% | 40% | 100% | 82 |
| Llama 3.3 70B + smolagents | 100% | 50% | 90% | 83 |
| DeepSeek-R1 32B + smolagents | 100% | 30% | 100% | 79 |
| Gemma 4 26B + Hermes | 100% | 0% | 60% | 62 |
| Nemotron Nano 30B + PydanticAI | 93% | 0% | 60% | 59 |
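As a worked example of the formula, here is the top row recomputed (numbers taken from the table above):
# Qwopus 27B + Hermes: Tool Calling 100, HumanEval 80, MMLU 90
mhi = 0.50 * 100 + 0.30 * 80 + 0.20 * 90
print(mhi)  # 50.0 + 24.0 + 18.0 = 92.0, matching the MHI column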
Quick setup for popular apps:
Cursor: Settings → Models → Add Model:
OpenAI API Base: http://localhost:8000/v1
API Key: not-needed
Model name: default (or qwen3.5-9b — either works)
Cursor's agent/composer mode uses tool calls automatically — Rapid-MLX handles them natively with Qwen3.5 models, no extra flags needed.
Claw Code:
export OPENAI_BASE_URL=http://localhost:8000/v1
export OPENAI_API_KEY=not-needed
claw --model "openai/default" prompt "summarize this repo"
OpenClaude:
CLAUDE_CODE_USE_OPENAI=1 OPENAI_BASE_URL=http://localhost:8000/v1 \
OPENAI_API_KEY=not-needed OPENAI_MODEL=default openclaude -p "hello"
Hermes Agent (~/.hermes/config.yaml):
model:
provider: "custom"
default: "default"
base_url: "http://localhost:8000/v1"
context_length: 32768
Goose:
GOOSE_PROVIDER=ollama OLLAMA_HOST=http://localhost:8000 \
GOOSE_MODEL=default goose run --text "hello"
Claude Code:
OPENAI_BASE_URL=http://localhost:8000/v1 claude