Name: vmlx
Author: jjang-ai

JANG 2-bit destroys MLX 4-bit on MiniMax M2.5:

| Quantization | MMLU (200q) | Size | |---|---|---| | JANG_2L (2-bit) | 74% | 89 GB | | MLX 4-bit | 26.5% | 120 GB | | MLX 3-bit | 24.5% | 93 GB | | MLX 2-bit | 25% | 68 GB |

Adaptive mixed-precision keeps critical layers at higher precision. Scores at jangq.ai. Models at JANGQ-AI.

Quickstart

Install from PyPI

Published on PyPI as vmlx -- install and run in one command:

# Recommended: uv (fast, no venv hassle)
brew install uv
uv tool install vmlx
vmlx serve mlx-community/Qwen3-8B-4bit

# Or: pipx (isolates from system Python)
brew install pipx
pipx install vmlx
vmlx serve mlx-community/Qwen3-8B-4bit

# Or: pip in a virtual environment
python3 -m venv ~/.vmlx-env && source ~/.vmlx-env/bin/activate
pip install vmlx
vmlx serve mlx-community/Qwen3-8B-4bit

Note: On macOS 14+, bare pip install fails with "externally-managed-environment". Use uv, pipx, or a venv.

The vMLX inference server is now running at http://0.0.0.0:8000 with an OpenAI + Anthropic compatible API. Works with any model from mlx-community -- thousands of models ready to go.

Or download the desktop app

Get MLX Studio -- a native macOS app with chat UI, model management, image generation, and developer tools. No terminal required. Just download the DMG and drag to Applications.

vmlx

Quickstart

Install from PyPI

Or download the desktop app

Related Skills

Use with OpenAI SDK

Use with Anthropic SDK

Use with curl

Model Support

Features

Inference Engine

Distributed Inference (Multi-Mac)

5-Layer Cache Architecture

Tool Calling

Reasoning / Thinking Mode

A