FunASR

Name: FunASR
Author: modelscope

Verified

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

19,078 stars

1,916 forks

Python

Installation

# Add to your Claude Code skills
git clone https://github.com/modelscope/FunASR


Security Report: Verified

Last scanned: 5/25/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-25T08:20:43.936Z",
  "semgrepRan": false,
  "npmAuditRan": true,
  "pipAuditRan": true
}

README.md

Frequently Asked Questions

What is FunASR?

FunASR is an open-source speech recognition toolkit built by modelscope, listed on SkillsLLM as an MCP Servers skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT. It offers 170x realtime transcription, 50+ languages, speaker diarization, emotion detection, streaming, and an OpenAI-compatible API. It has 19,078 GitHub stars.

Is FunASR safe to use?

Yes. FunASR passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install FunASR?

Clone the repository with "git clone https://github.com/modelscope/FunASR" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is FunASR written in?

FunASR is primarily written in Python. It is open-source under modelscope on GitHub, so you can review or fork the full source.

Are there alternatives to FunASR?

Yes. SkillsLLM lists many other MCP Servers skills that you can browse and compare side by side with FunASR.




Quick Start

No local setup? Open the Colab quickstart to transcribe a public sample or upload your own audio in a browser.

pip install torch torchaudio
pip install funasr

Flagship model — Fun-ASR-Nano (LLM-ASR, 31 languages; the default recommendation, needs a GPU):

from funasr import AutoModel

model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", device="cuda")
result = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav")
print(result[0]["text"])
# 欢迎大家来体验达摩院推出的语音识别模型。

On CPU (or for multilingual + emotion in one pass), use SenseVoice. Paired with the fsmn-vad and cam++ models, as in the call below, it also returns speaker labels and timestamps:

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")  # use device="cpu" if you don't have a GPU
result = model.generate(
    input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
    batch_size_s=300,
)

# One call returns VAD segments with speaker id + timestamps — render them however you like:
for seg in result[0]["sentence_info"]:
    print(f"[{seg['start']/1000:.1f}s] Speaker {seg['spk']}: {rich_transcription_postprocess(seg['sentence'])}")

Output — structured text with speaker labels, timestamps, and punctuation:

[0.6s] Speaker 0: 欢迎大家来体验达摩院推出的语音识别模型

That's it. One pipeline, one call: VAD segmentation, speech recognition, punctuation, and speaker diarization all happen automatically.
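The per-segment loop above prints one line per VAD segment. For longer recordings it can be easier to read one line per speaker turn. The following is a minimal sketch, not part of FunASR: `format_segments` is a hypothetical helper, it relies only on the `start`, `spk`, and `sentence` keys used in the loop above, and the sample data is invented for illustration.

```python
from typing import Dict, List


def format_segments(sentence_info: List[Dict]) -> List[str]:
    """Merge consecutive segments from the same speaker, one line per turn."""
    turns: List[Dict] = []
    for seg in sentence_info:
        if turns and turns[-1]["spk"] == seg["spk"]:
            # Same speaker as the previous segment: extend the current turn.
            turns[-1]["sentence"] += " " + seg["sentence"]
        else:
            # New speaker: start a new turn at this segment's start time (ms).
            turns.append(
                {"start": seg["start"], "spk": seg["spk"], "sentence": seg["sentence"]}
            )
    return [
        f"[{t['start'] / 1000:.1f}s] Speaker {t['spk']}: {t['sentence']}"
        for t in turns
    ]


# Invented sample in the same shape as result[0]["sentence_info"].
sample = [
    {"start": 600, "spk": 0, "sentence": "Hello there."},
    {"start": 2100, "spk": 0, "sentence": "How are you?"},
    {"start": 4300, "spk": 1, "sentence": "Fine, thanks."},
]

for line in format_segments(sample):
    print(line)
```

The helper joins merged segments with a space, which suits English; for Chinese or Japanese output an empty separator is probably the better choice.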

Scale & deploy the flagship

At scale, accelerate Fun-ASR-Nano with vLLM (batch processing):

from funasr.auto.auto_model_vllm import AutoModelVLLM

model = AutoModelVLLM(model="FunAudioLLM/Fun-ASR-Nano-2512", tensor_parallel_size=1)
results = model.generate(["audio1.wav", "audio2.wav"], language="auto")

Deploy as API server: funasr-server --device cuda → OpenAI-compatible endpoint at localhost:8000

Use with AI agents: MCP Server for Claude/Cursor · OpenAI API for LangChain/Dify/AutoGen
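To illustrate what "OpenAI-compatible" could mean for a client, here is a hedged sketch that builds (but does not send) a transcription request using only the standard library. The `/v1/audio/transcriptions` path and the `model` and `file` form fields are borrowed from the OpenAI API and are assumptions about what `funasr-server` accepts; the model name `"funasr"` is a placeholder, and `build_transcription_request` is a hypothetical helper.

```python
import uuid
from urllib.request import Request


def build_transcription_request(
    audio_bytes: bytes,
    filename: str,
    base_url: str = "http://localhost:8000",
    model: str = "funasr",  # placeholder; check which model names the server expects
) -> Request:
    """Build a multipart/form-data POST in the OpenAI transcription style."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        audio_bytes,
        f"--{boundary}--".encode(),
        b"",
    ])
    return Request(
        f"{base_url}/v1/audio/transcriptions",  # assumed path, taken from the OpenAI API
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Dummy bytes stand in for a real WAV file here.
request = build_transcription_request(b"RIFF", "audio.wav")
print(request.get_method(), request.full_url)
```

With a server running, passing the returned object to `urllib.request.urlopen` would send it. In practice an OpenAI client library pointed at the local base URL is likely the simpler route.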

Why FunASR?

Whisper is a single model; FunASR is a toolkit — you pick the right model per job: Fun-ASR-Nano (flagship LLM-ASR, GPU, 340x realtime with vLLM, 31 languages), SenseVoice (CPU-friendly, + emotion & audio events), Paraformer (low-latency streaming). The table shows what the toolkit delivers vs one Whisper model — each capability is labelled with the model that provides it:

Capability	FunASR (toolkit)	Whisper	Cloud APIs
Top speed	340x realtime (Fun-ASR-Nano + vLLM)	13x realtime	~1x realtime
Speaker ID	✅ Built-in	❌ Needs pyannote	✅ Extra cost
Emotion	✅ via SenseVoice	❌	❌
Languages	50+ (Qwen3-ASR 52, Nano 31)	57	Varies
Streaming	✅ WebSocket (Paraformer)	❌	✅
CPU viable	✅ 17x realtime (SenseVoice)	❌ Too slow	N/A
Self-hosted	✅ MIT license	✅ MIT license	❌ Cloud only
Cost	Free	Free	$0.006/min+
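To make the speed and cost rows concrete, the arithmetic below converts a realtime factor into wall-clock time per hour of audio, and a per-minute price into a bulk cost. It is only a back-of-the-envelope sketch: the factors and the $0.006/min price are the figures quoted in the table, real throughput depends on hardware and audio, and the function names are illustrative.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to transcribe audio at a given realtime factor."""
    return audio_seconds / realtime_factor


def cloud_cost_usd(audio_seconds: float, price_per_minute: float = 0.006) -> float:
    """Cost of transcribing audio at a flat per-minute price."""
    return audio_seconds / 60 * price_per_minute


ONE_HOUR = 3600.0

# Realtime factors as quoted in the comparison table.
for name, factor in [
    ("Fun-ASR-Nano + vLLM", 340),
    ("SenseVoice on CPU", 17),
    ("Whisper", 13),
]:
    seconds = processing_seconds(ONE_HOUR, factor)
    print(f"{name}: {seconds:.1f} s per hour of audio")

print(f"Cloud API: ${cloud_cost_usd(1000 * ONE_HOUR):.2f} per 1,000 hours")
```

At the quoted figures, one hour of audio takes roughly 11 seconds at 340x and a little under five minutes at 13x, while 1,000 hours at $0.006/min comes to about $360.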

Trying FunASR for the first time? Use the Colab quickstart before setting up a local environment. Choosing a first model? Start with the model selection guide. Planning a switch from Whisper or a cloud ASR provider? Use the migration guide and benchmark example to test representative audio, map features, and roll out safely.

Installation

# Install the released package from PyPI
pip install funasr

# Or install from source
git clone https://github.com/modelscope/FunASR.git && cd FunASR
pip install -e ./

Requirements: Python ≥ 3.8. Install PyTorch + torchaudio first (pytorch.org), then pip install funasr.
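The requirements above can be checked before loading any model. The following is a small sketch using only the standard library; `python_ok` and `missing_packages` are hypothetical helper names, and the check confirms only that the packages are importable, not that their versions are compatible.

```python
import sys
from importlib.util import find_spec

# Packages named in the installation instructions.
REQUIRED_PACKAGES = ("torch", "torchaudio", "funasr")


def python_ok(minimum=(3, 8)) -> bool:
    """True when the running interpreter meets the minimum Python version."""
    return sys.version_info >= minimum


def missing_packages(packages=REQUIRED_PACKAGES) -> list:
    """Names of the top-level packages that cannot be found for import."""
    return [name for name in packages if find_spec(name) is None]


if not python_ok():
    print("Python 3.8 or newer is required")
for name in missing_packages():
    print(f"Missing package: {name}")
```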

Model Zoo

Model	Task	Languages	Params	Links
Fun-ASR-Nano	ASR + timestamps	31 languages	800M	⭐ 🤗
SenseVoiceSmall	ASR + emotion + events	zh/en/ja/ko/yue	234M	⭐ 🤗
Paraformer-zh	ASR + timestamps	zh/en	220M	⭐ 🤗
Paraformer-zh-streaming	Streaming ASR	zh/en	220M	⭐ 🤗
Qwen3-ASR	ASR, 52 languages	multilingual	1.7B	usage
GLM-ASR-Nano	ASR, 17 languages	multilingual	1.5B	usage
Whisper-large-v3	ASR + translation	multilingual	1550M	usage
Whisper-large-v3-turbo	ASR + translation	multilingual	809M	usage
ct-punc	Punctuation	zh/en	290M	⭐ 🤗
fsmn-vad	VAD	zh/en	0.4M	⭐ 🤗
cam++	Speaker diarization	—	7.2M	⭐ 🤗
emotion2vec+large	Emotion recognition	—	300M	⭐ 🤗
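The "right model per job" idea from the Why FunASR? section can be sketched as a small selection function. This is a deliberately simplified illustration, not an official recommendation: `choose_model` is a hypothetical helper, it looks only at GPU availability and streaming needs, and the model identifiers are the ones used in the examples on this page.

```python
def choose_model(has_gpu: bool, streaming: bool = False) -> str:
    """Pick a model identifier from two coarse requirements."""
    if streaming:
        # Low-latency streaming; the Model Zoo lists zh/en for this model.
        return "paraformer-zh-streaming"
    if has_gpu:
        # Flagship LLM-ASR model, described above as needing a GPU.
        return "FunAudioLLM/Fun-ASR-Nano-2512"
    # CPU-friendly model that also reports emotion and audio events.
    return "iic/SenseVoiceSmall"


print(choose_model(has_gpu=True))
print(choose_model(has_gpu=False))
print(choose_model(has_gpu=True, streaming=True))
```

A real decision would also weigh language coverage, since the models in the table differ widely in the languages they support.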

Usage

Full examples with parameter docs: Tutorial →

from funasr import AutoModel

# Chinese production (VAD + ASR + punctuation + speaker)
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++", device="cuda")
result = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav", hotword="关键词 20")


# Streaming real-time (feed audio chunk by chunk)
import soundfile as sf
model = AutoModel(model="paraformer-zh-streaming", device="cuda")
audio, sr = sf.read("speech.wav", dtype="float32")   # 16 kHz mono
chunk_size = [0, 10, 5]                               # 600 ms chunks
chunk_stride = chunk_size[1] * 960
cache = {}
n_chunks = (len(audio) - 1) // chunk_stride + 1
for i in range(n_chunks):
    chunk = audio[i * chunk_stride : (i + 1) * chunk_stride]
    res = model.generate(input=chunk, cache=cache, is_final=(i == n_chunks - 1),
                         chunk_size=chunk_size, encoder_chunk_look_back=4, decoder_chunk_look_back=1)
    if res[0]["text"]:
        print(res[0]["text"], end="", flush=True)

# Emotion recognition
model = AutoModel(model="emotion2vec_plus_large", device="cuda")
result = model.generate(input="audio.wav", granularity="utterance")
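The chunk arithmetic in the streaming example is easy to misread, so here it is in isolation. The sketch assumes 16 kHz mono audio, as in the example, where 960 samples correspond to 60 ms; `chunk_plan` is a hypothetical helper that only mirrors the stride and chunk-count formulas above, and the `(0, 8, 4)` setting is included purely to show how the numbers scale.

```python
SAMPLE_RATE = 16_000    # Hz, as assumed by the streaming example
SAMPLES_PER_UNIT = 960  # 60 ms of audio at 16 kHz


def chunk_plan(num_samples: int, chunk_size=(0, 10, 5)):
    """Return (stride in samples, chunk length in ms, number of chunks)."""
    stride = chunk_size[1] * SAMPLES_PER_UNIT
    chunk_ms = stride * 1000 // SAMPLE_RATE
    # Ceiling division, written the same way as in the streaming loop.
    n_chunks = (num_samples - 1) // stride + 1
    return stride, chunk_ms, n_chunks


five_seconds = 5 * SAMPLE_RATE
print(chunk_plan(five_seconds))             # default setting
print(chunk_plan(five_seconds, (0, 8, 4)))  # smaller chunks, lower latency
```

With the default setting each chunk is 9,600 samples (600 ms), so five seconds of audio is fed in nine calls, the last one shorter than the rest and flagged with `is_final=True` in the loop above.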

CLI (Agent-Friendly)

# Transcribe audio (simplest)
funasr audio.wav

# JSON output (for AI agents)
funasr audio.wav --output-format json

# SRT subtitles
funasr audio.wav --output-format srt --output-dir ./subs

# Speaker diarization + timestamps
funasr audio.wav --spk --timestamps -f json

# Choose model and language
funasr audio.wav --model paraformer --language zh

# Batch transcribe
funasr *.w