by mohitsoni48
Run any local LLM engine, auto-tuned to your GPU — polished web UI + OpenAI/Anthropic-compatible API. Point Claude Code at your own machine in one command. No Electron, no Python, offline-first.
# Add to your Claude Code skills
git clone https://github.com/mohitsoni48/TurboLLM
TurboLLM is an open-source API integration skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by mohitsoni48. It has 51 GitHub stars.
Clone the repository with "git clone https://github.com/mohitsoni48/TurboLLM" and add it to your Claude Code skills directory (see the Installation section above).
TurboLLM is primarily written in TypeScript. It is open-source under mohitsoni48 on GitHub, so you can review or fork the full source.
npx turbollm
That one command starts a local daemon, opens a browser UI, and serves your models over an API any tool can talk to. TurboLLM is the performance & bleeding-edge layer for local LLMs — built for people who today hand-compile forks and hunt forums for the right flags.
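As a rough illustration of what "an API any tool can talk to" means for an OpenAI-compatible server, the sketch below builds a chat completion request. The base URL, port, and model name are placeholders, not values taken from TurboLLM. The `/v1/chat/completions` path follows the usual OpenAI convention and is assumed here rather than confirmed for this project.

```typescript
// One message in an OpenAI-style chat conversation.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Plain description of the HTTP request, so it can be inspected before sending.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// ASSUMPTION: baseUrl and model are hypothetical; use whatever your local
// instance actually reports. The path is the conventional OpenAI one.
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
): HttpRequest {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/v1/chat/completions`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages, stream: false }),
  };
}

// Placeholder host, port, and model name for illustration only.
const req = buildChatRequest("http://localhost:8080/", "my-local-model", [
  { role: "user", content: "Say hello in five words." },
]);
console.log(req.url);
```

The returned object can be handed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` on a runtime that provides `fetch`.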
Local-LLM tools make two choices for you, and both cost you performance:
-c, -ngl, --n-cpu-moe, KV type, threads, flash-attn, draft models) that make
the difference between 20 and 80 tokens/sec on the same hardware.

TurboLLM does the opposite:
llama-server-compatible binary: a
build you compiled, a community fork, or the one it auto-provisions for your GPU. It probes
the binary's real capabilities and adapts the UI to them. This is the whole point.

Same GPU (RTX 5070 Ti 16 GB), same model, same 200K context: measured generation speed. TurboLLM is faster than LM Studio on the very same official llama.cpp, and faster still when you run a community fork LM Studio can't.
① On official llama.cpp, TurboLLM is faster. It auto-provisions a GPU-native engine build (CUDA 13 for Blackwell here) and tunes expert-offload to the layer, so at the same KV-cache quant it beats LM Studio's bundled runtime:
| Qwen3.6-35B-A3B · 200K | TurboLLM | LM Studio | Speed-up |
|---|---|---|---|
| official llama.cpp · q4_0 | 74.7 t/s | 61.0 t/s | 1.2× |
| official llama.cpp · q8_0 | 72.3 t/s | ~66 t/s* | 1.1× |
② Run a faster engine and pull far ahead. Because TurboLLM runs any engine, you can drop in
the TurboQuant fork — a llama.cpp fork with a low-bit turbo4 KV cache that LM Studio simply
can't load — in one click. On a large-KV model it delivers q8_0-level quality at more than
double the speed:
| Qwen3.6-27B · 200K · matched quality | TurboLLM + TurboQuant | LM Studio | Speed-up |
|---|---|---|---|
| turbo4 vs q8_0 | 24.6 t/s | 11.4 t/s | 2.2× |
Same run, 1.7× faster prefill too (1288 vs 757 tok/s).
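The speed-up figures quoted above are plain throughput ratios rounded to one decimal place. A minimal sketch of that arithmetic, using only the numbers reported in this section (the helper name `speedup` is ours, and the ~66 t/s baseline is treated as exactly 66):

```typescript
// Ratio of two throughput measurements, rounded to one decimal place.
function speedup(fasterTps: number, baselineTps: number): number {
  if (baselineTps <= 0) {
    throw new RangeError("baseline throughput must be positive");
  }
  return Math.round((fasterTps / baselineTps) * 10) / 10;
}

// Measurements as reported in the tables and the prefill note.
const reported = [
  { label: "official llama.cpp, q4_0 (generation)", turbo: 74.7, lmStudio: 61.0 },
  { label: "official llama.cpp, q8_0 (generation)", turbo: 72.3, lmStudio: 66 },
  { label: "turbo4 vs q8_0 (generation)", turbo: 24.6, lmStudio: 11.4 },
  { label: "turbo4 vs q8_0 (prefill)", turbo: 1288, lmStudio: 757 },
];

for (const row of reported) {
  console.log(`${row.label}: ${speedup(row.turbo, row.lmStudio).toFixed(1)}x`);
}
```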
*LM Studio's q8_0 mildly spilled VRAM at its best offload. A low-bit KV cache helps most
when the cache is large; TurboLLM's auto-tuner and on-screen measured t/s pick the fastest engine +
config for each model, so you don't have to.
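To make "pick the fastest engine + config" concrete, here is a small sketch that turns a candidate configuration into `llama-server` arguments and selects the candidate with the highest measured throughput. This is not TurboLLM's tuner. The `EngineConfig` type and both helpers are hypothetical, and apart from the two t/s figures taken from the first table, every value (layer counts, threads, context size in tokens, model path) is a placeholder.

```typescript
// Hypothetical subset of tunables, each mapped to a llama-server flag below.
interface EngineConfig {
  ctxSize: number;      // -c
  gpuLayers: number;    // -ngl
  cpuMoeLayers: number; // --n-cpu-moe
  kvCacheType: string;  // --cache-type-k and --cache-type-v
  threads: number;      // -t
}

interface Measurement {
  config: EngineConfig;
  tokensPerSec: number;
}

// Build the argument list for one candidate configuration.
function toServerArgs(modelPath: string, c: EngineConfig): string[] {
  return [
    "-m", modelPath,
    "-c", String(c.ctxSize),
    "-ngl", String(c.gpuLayers),
    "--n-cpu-moe", String(c.cpuMoeLayers),
    "--cache-type-k", c.kvCacheType,
    "--cache-type-v", c.kvCacheType,
    "-t", String(c.threads),
  ];
}

// Return the measurement with the highest generation throughput.
function pickFastest(runs: Measurement[]): Measurement {
  if (runs.length === 0) throw new Error("no measurements to compare");
  return runs.reduce((best, r) => (r.tokensPerSec > best.tokensPerSec ? r : best));
}

// Placeholder settings; only the tokensPerSec values come from the table above.
const base = { ctxSize: 200000, gpuLayers: 99, cpuMoeLayers: 20, threads: 8 };
const runs: Measurement[] = [
  { config: { ...base, kvCacheType: "q4_0" }, tokensPerSec: 74.7 },
  { config: { ...base, kvCacheType: "q8_0" }, tokensPerSec: 72.3 },
];

const fastest = pickFastest(runs);
const args = toServerArgs("model.gguf", fastest.config);
console.log(args.join(" "));
```

Ranking by measured throughput rather than by a heuristic mirrors the point made above: the fastest setting depends on the model and on how large the KV cache is.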
The headline — running any engine, including community forks — has its own section below. Everything else is grouped here; each summary is the gist, expand for the detail:
.gguf link (model-author sites, mirrors, private servers); it disk-space-checks and downloads through the same manager.

temperature / top_k / top_p / min_p. No recommendation → your sampling is left untouched.

-ngl), MoE CPU-offload (--n-cpu-moe), parallel slots, KV-cache quant type (incl. low-bit on supporting forks), CPU threads, flash attention, and speculative decoding (NextN / MTP / draft).

web_search (Tavily), fetch_url, and sandboxed run_code,