# llama.cpp

by ggml-org

LLM inference in C/C++
## Hot topics

- Models downloaded with `-hf` are now stored in the standard Hugging Face cache directory, enabling sharing with other HF tools
- The gpt-oss model with native MXFP4 format has been added (PR, in collaboration with NVIDIA)
- llama-server: #12898 (see documentation)

## Quick start

Getting started with llama.cpp is straightforward. Here are several ways to install it on your machine:

- Install llama.cpp using brew, nix or winget

Once installed, you'll need a model to work with. Head to the Obtaining and quantizing models section to learn more.
Example commands:

```sh
# Use a local model file
llama-cli -m my_model.gguf

# Or download and run a model directly from Hugging Face
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF

# Launch OpenAI-compatible API server
llama-server -hf ggml-org/gemma-3-1b-it-GGUF
```
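Once `llama-server` is running, any OpenAI-compatible client can talk to it. Below is a minimal sketch using only the Python standard library, assuming the server is listening on its default port 8080 (the `model` field and the helper names here are illustrative; `llama-server` serves whichever model it was launched with):

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "default") -> dict:
    # Build an OpenAI-style chat-completions payload.
    # llama-server largely ignores the model field and uses the
    # model it was started with.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str, base_url: str = "http://localhost:8080") -> str:
    # POST to the OpenAI-compatible endpoint exposed by llama-server.
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Say hello in one word."))
```

Because the endpoint follows the OpenAI API shape, existing OpenAI client libraries can also be pointed at the same base URL instead of hand-rolling requests.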
The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide
range of hardware - locally and in the cloud.
The llama.cpp project is the main playground for developing new features for the ggml library.
Typically, finetunes of the supported base models are supported as well.
Instructions for adding support for new models: HOWTO-add-model.md