by sysprog21
A linguistic linter for Traditional Chinese (zh-TW)
# Add to your Claude Code skills
git clone https://github.com/sysprog21/zhtw-mcpA linguistic linter for Traditional Chinese (zh-TW) that enforces Taiwan Ministry of Education (MoE) standards on vocabulary, punctuation, and character shapes. It plugs into AI coding assistants through the Model Context Protocol (MCP) and catches Mainland Chinese (zh-CN) regional drift before it reaches the user.
The tool enforces three official Taiwan standards:
Over 1000 vocabulary rules and 15 casing rules are compiled into the binary. For ambiguous terms, the server asks the AI assistant it runs inside for help deciding -- no extra API keys required.
In the late Qing dynasty, scholars had to express Western concepts in a writing system with no native vocabulary for them. Whether coining new words or importing translations via Japanese (和製漢語), they assembled a literary system under enormous time pressure. Many translated terms were inconsistent, ambiguous, or contradictory. The Chinese-speaking world has lived with these deficiencies for over a century.
The PRC simplification effort reduced not just stroke counts but vocabulary precision. Terms that should vary by domain got flattened into single catch-all translations. Many PRC translations were coined hastily: if a term worked in one context, it spread uncritically to others.
No comments yet. Be the first to share your thoughts!
AI language models learn from web text where Simplified Chinese vastly outweighs Traditional Chinese (roughly 2.6:1 in CC-100). Major datasets like CulturaX do not even track Traditional Chinese separately. A FAccT 2025 study confirmed that most models favor zh-CN terminology when asked to write zh-TW. The output looks plausible but is not how people in Taiwan actually write.
This goes beyond character conversion. The same word often means different things across the strait:
| English | zh-CN | zh-TW | Why it matters | |---------|-------|-------|----------------| | concurrency | 並發 | 並行 | In zh-CN, 並行 means "parallel" -- a different concept entirely | | parallel | 並行 | 平行 | zh-CN 並行 = "parallel"; in Taiwan, 並行 = "concurrent" | | process (OS) | 進程 | 行程 | 進程 in Taiwan means "progress," not an OS process | | file / document | 文件 / 文檔 | 檔案 / 文件 | 文件 in China = "file"; in Taiwan = "document" | | render | 渲染 | 算繪 | 渲染 in Taiwan = "exaggerate" (a painting technique) ...