goldenmatch

Name: goldenmatch
Author: benseverndev-oss

Verified

Zero-config entity resolution & record linkage. The zero-tuning Fellegi-Sunter path beats hand-tuned Splink head-to-head and scales from a CSV to a verified 100M-row dedupe in 9.2 min. Fuzzy/exact/probabilistic + PPRL + LLM + identity graph. Python + edge-safe TypeScript (WASM), SQL-native in Postgres & DuckDB, MCP/REST + dbt/Airflow.

125 stars

13 forks

Python

Installation

# Add to your Claude Code skills
git clone https://github.com/benseverndev-oss/goldenmatch


Security Report (Verified)

Last scanned: 6/11/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-06-11T08:49:58.054Z",
  "npmAuditRan": true,
  "pipAuditRan": true,
  "promptInjectionRan": true
}


Frequently Asked Questions

What is goldenmatch?

goldenmatch is an open-source MCP server skill, built by benseverndev-oss, for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT. It provides zero-config entity resolution and record linkage; the description at the top of this page summarizes its features. It has 125 GitHub stars.

Is goldenmatch safe to use?

Yes. goldenmatch passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install goldenmatch?

Clone the repository with "git clone https://github.com/benseverndev-oss/goldenmatch" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is goldenmatch written in?

goldenmatch is primarily written in Python. It is open-source under benseverndev-oss on GitHub, so you can review or fork the full source.

Are there alternatives to goldenmatch?

Yes. SkillsLLM lists many other MCP Servers skills that you can browse and compare side by side with goldenmatch.


goldenmatch category: MCP Servers. Topics: data-cleaning, data-engineering, data-matching, data-quality, deduplication, entity-resolution, fellegi-sunter, fuzzy-matching, knowledge-graph, llm, master-data-management, mcp-server, polars, pprl, python, record-linkage, rust, splink, typescript, zero-config

Related Skills

ECC by affaan-m (233,400 stars): The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

Popular in MCP Servers

Top skills in this category by stars

Scrapling by D4Vinci (71,275 stars, 7,072 forks; Python): 🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

Developers Also Liked

Based on votes and bookmarks from developers who liked this skill

awesome-nsfw-ai by ristponex: 🔞 Curated list of uncensored AI models for NSFW content generation: video, image, and text models with API access.

multi-llm-mcp

gopher-mcp

hermes-agent by NousResearch (220,566 stars, 41,997 forks; Python): The agent that grows with you

n8n by n8n-io (198,027 stars, 59,632 forks; TypeScript): Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

everything-claude-code by affaan-m (185,940 stars, 28,768 forks; JavaScript): The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

cc-switch by farion1231 (121,205 stars, 8,146 forks; Rust): A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Grok Build & Hermes Agent. Only official website: ccswitch.io

An open-source AI agent that brings the power of Gemini directly into your terminal (106,181 stars, 14,314 forks; TypeScript).

TrendRadar by sansan0 (60,883 stars, 24,807 forks; Python): ⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts.🎯 Say goodbye to information overload: your AI public-opinion monitoring assistant and trending-topic filtering tool! Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI news filtering + AI translation + AI analysis briefings pushed straight to your phone; it also supports the MCP architecture, enabling AI natural-language conversational analysis, sentiment insights, trend prediction, and more. Supports Docker, with data self-hosted locally or in the cloud. Smart push notifications via WeChat/Feishu/DingTalk/Telegram/email/ntfy/bark/slack and other channels.

cc-gateway by motiful (2,982 stars, 505 forks; TypeScript): AI API identity gateway: reverse proxy that normalizes device fingerprints and telemetry for privacy-preserving API proxying

claude-skill-social-post by Hao0321 (570 stars, 128 forks): A Claude Code skill by Hao (駱君昊) that learns your Facebook voice and auto-posts to FB / IG / Threads / X with a 14-day content calendar. Mega-viral validated: 80K reach / 448 likes / 500 comments on first post. Includes Day 2 flop postmortem.

facebook-ads-library-mcp by talknerdytome-labs (192 stars; Python): MCP Server for Facebook ADs Library - Get instant answers from FB's ad library

Bloom by Li-Evan (209 stars; Python): Hire a private AI tutor for anything: it reads how you actually learn and teaches the next lesson just for you. Bloom's 2-Sigma research as a Claude Code skill + self-hostable web app · Chinese-first Socratic AI tutor

Golden Suite

Zero-config entity resolution that scales — dedupe & match messy records from a laptop CSV to 100M+ rows. No training data, no tuning.

The headline package, GoldenMatch, does the matching with fuzzy, exact, probabilistic (Fellegi-Sunter), and LLM scoring. Out of the box it reaches 96.4% F1 on DBLP-ACM and beats hand-tuned Splink, with identical results in Python, edge-safe TypeScript, and SQL. It even runs on unstructured input: extract records from PDFs and images, then dedupe. Around it sits a full data-quality suite (Check, Flow, Analysis, Pipe, InferMap) with a Rust layer for Postgres / DuckDB and optional WebAssembly acceleration behind the TS ports.
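Entity-resolution F1 can be measured over record pairs or over clusters, and the bake-off figures later in this README quote both (pairwise F1 and cluster B³). As a hedged illustration only, here is a minimal generic evaluator for both metrics; it is not goldenmatch's shared evaluator, and it assumes the predicted and true labelings cover the same record ids. The helper names are hypothetical.

```python
from itertools import combinations


def _clusters(labels: dict[str, str]) -> dict[str, set[str]]:
    """Group record ids by their cluster id."""
    groups: dict[str, set[str]] = {}
    for rec, cid in labels.items():
        groups.setdefault(cid, set()).add(rec)
    return groups


def _pairs(labels: dict[str, str]) -> set[frozenset[str]]:
    """Every unordered pair of records that share a cluster."""
    return {
        frozenset(pair)
        for members in _clusters(labels).values()
        for pair in combinations(sorted(members), 2)
    }


def _f1(precision: float, recall: float) -> float:
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def pairwise_f1(pred: dict[str, str], truth: dict[str, str]) -> float:
    """F1 over record pairs: a predicted pair is correct if the truth also links it."""
    p, t = _pairs(pred), _pairs(truth)
    tp = len(p & t)
    if tp == 0:
        return 0.0
    return _f1(tp / len(p), tp / len(t))


def b_cubed_f1(pred: dict[str, str], truth: dict[str, str]) -> float:
    """B-cubed F1: per-record precision/recall of its predicted cluster, averaged."""
    pc, tc = _clusters(pred), _clusters(truth)
    records = list(truth)
    precision = recall = 0.0
    for r in records:
        predicted, actual = pc[pred[r]], tc[truth[r]]
        overlap = len(predicted & actual)
        precision += overlap / len(predicted)
        recall += overlap / len(actual)
    return _f1(precision / len(records), recall / len(records))
```

For truth `{"a": "1", "b": "1", "c": "1", "d": "2"}` and prediction `{"a": "x", "b": "x", "c": "y", "d": "y"}`, pairwise F1 is 2/5 = 0.4 while B³ F1 is 12/17 (about 0.71): B³ gives partial credit per record, so the two numbers are not interchangeable.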

Made for GraphRAG, too: entity resolution is the stage knowledge-graph pipelines handle worst, because the same entity scatters across documents as duplicate surface forms. GoldenMatch drops into neo4j-graphrag / LlamaIndex / Graphiti as the resolution stage (goldenmatch-kg), or builds a KG straight from text with that resolution at its core (goldengraph). See Knowledge graphs.
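To make "duplicate surface forms" concrete: once some matcher decides which pairs of mentions co-refer, the entities are the connected components of that match graph. The sketch below uses a union-find over pairwise decisions; the `normalize` rule (lowercasing, dropping punctuation and a few corporate suffixes) is a hypothetical stand-in for a real scorer, not goldenmatch's resolution logic, and mentions are assumed to be distinct strings.

```python
import re
from itertools import combinations
from typing import Callable

# Hypothetical normalization rule, for illustration only.
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and drop common corporate suffixes."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


class UnionFind:
    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve(mentions: list[str], is_match: Callable[[str, str], bool]) -> list[list[str]]:
    """Cluster mentions as connected components of the pairwise match graph."""
    uf = UnionFind()
    for m in mentions:
        uf.find(m)  # register every mention, including singletons
    for a, b in combinations(mentions, 2):
        if is_match(a, b):
            uf.union(a, b)
    clusters: dict[str, list[str]] = {}
    for m in mentions:
        clusters.setdefault(uf.find(m), []).append(m)
    return list(clusters.values())
```

With `is_match=lambda a, b: normalize(a) == normalize(b)`, the mentions `["Acme Corp.", "ACME Corporation", "acme inc", "Globex Ltd", "Globex"]` resolve to two entities. Connected components are transitive, so a single bad pairwise link can merge two real entities; that is the usual argument for cluster-level checks on top of pairwise scores.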

Verified at scale: 100,000,000 records deduped in 9.2 min on a Ray cluster — recall-complete across any partitioning, 0.36 GB driver footprint.
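One standard way to make blocking-based candidate generation "recall-complete across any partitioning" is to route every record by a stable hash of its blocking key, so each block lands whole in exactly one partition. The sketch below shows that property on a toy dataset; it is an assumption about the general technique, not a description of how goldenmatch's Ray path actually partitions, and `candidate_pairs` / `partitioned_pairs` are hypothetical names.

```python
import zlib
from collections import defaultdict
from itertools import combinations
from typing import Callable

Record = tuple[str, dict[str, str]]  # (record id, field values)
BlockKey = Callable[[dict[str, str]], str]


def candidate_pairs(records: list[Record], block_key: BlockKey) -> set[frozenset[str]]:
    """All pairs of records that share a blocking key."""
    blocks: dict[str, list[str]] = defaultdict(list)
    for rid, fields in records:
        blocks[block_key(fields)].append(rid)
    return {frozenset(p) for ids in blocks.values() for p in combinations(ids, 2)}


def partitioned_pairs(records: list[Record], block_key: BlockKey, n_partitions: int) -> set[frozenset[str]]:
    """Route records by a stable hash of the block key, pair within each partition, and union."""
    parts: list[list[Record]] = [[] for _ in range(n_partitions)]
    for rid, fields in records:
        # crc32 is stable across processes; Python's built-in hash() of str is salted per process.
        idx = zlib.crc32(block_key(fields).encode("utf-8")) % n_partitions
        parts[idx].append((rid, fields))
    out: set[frozenset[str]] = set()
    for part in parts:
        out |= candidate_pairs(part, block_key)
    return out
```

Because no block is ever split, the union of per-partition pairs equals the unpartitioned candidate set for any partition count; round-robin or random partitioning would not give that guarantee.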

# Dedupe a CSV in 30 seconds — zero config, writes <timestamp>_golden.csv.
# Add --tui to review interactively, --output-all for every artifact.
pip install goldenmatch && goldenmatch dedupe customers.csv

# From Python — zero-config, returns golden records
python -c "import goldenmatch as gm; gm.dedupe('customers.csv').golden.write_csv('deduped.csv')"

npm install goldenmatch     # TypeScript / edge runtimes
pip install golden-suite    # the WHOLE suite (Check + Flow + Match + Analysis + Pipe + InferMap) + native

v3.5.0 — New date scorer for date fields (#1858). jaro_winkler scores unrelated ISO birthdays 0.80+ (the fixed YYYY-MM-DD shape + shared digit alphabet dominate), so it can't tell a typo from a different person. The date scorer compares dates by Damerau-Levenshtein over the canonical digits — a typo scores 0.90, an unrelated date 0.00 — with a levenshtein fallback for non-ISO input. Cross-surface (Python, native kernel, TypeScript), and a preflight check warns when a name-oriented scorer sits on a date field.
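The contrast above can be reproduced with a small sketch: Jaro-Winkler over the full ISO string versus an edit-distance similarity over the digits only. This uses the optimal-string-alignment variant of Damerau-Levenshtein and a plain `1 - distance / length` normalization; goldenmatch's real scorer evidently normalizes differently (it reports 0.90 for a typo and 0.00 for an unrelated date), so treat the numbers here as illustrative, not as its output.

```python
import re


def osa_distance(a: str, b: str) -> int:
    """Optimal-string-alignment distance (restricted Damerau-Levenshtein):
    insertions, deletions, substitutions, and adjacent transpositions each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def date_similarity(x: str, y: str) -> float:
    """1 - normalized OSA distance over the digits only, so the shared
    YYYY-MM-DD punctuation contributes nothing to the score."""
    a, b = re.sub(r"\D", "", x), re.sub(r"\D", "", y)
    if not a or not b:
        return 0.0
    return 1.0 - osa_distance(a, b) / max(len(a), len(b))


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(max(n1, n2) // 2 - 1, 0)
    m1, m2 = [False] * n1, [False] * n2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Half the number of matched characters that appear in a different order.
    a = [c for c, hit in zip(s1, m1) if hit]
    b = [c for c, hit in zip(s2, m2) if hit]
    t = sum(x != y for x, y in zip(a, b)) / 2
    return (matches / n1 + matches / n2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    j = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:max_prefix], s2[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

A swapped day (`"1985-03-12"` vs `"1985-03-21"`) is one transposition over eight digits, so `date_similarity` gives 0.875, while Jaro-Winkler still rates the unrelated pair `"1985-03-12"` / `"1972-11-30"` above 0.7, well above its digit-level score.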

v3.4.0 — Embeddings are first-class on Fellegi-Sunter matchkeys. embedding and record_embedding field scorers now train (EM) and score end-to-end on the probabilistic path via the vectorized matrix — previously they raised Unknown scorer on both training and scoring. They are matrix-only, so a matchkey carrying one always runs vectorized, and the TUI now routes FS through the same native/vectorized selector.
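Fellegi-Sunter scores discrete comparison levels, so a continuous signal such as embedding cosine similarity has to be mapped onto levels before it can carry an m/u match weight. A minimal sketch of one common approach, banding the similarity by threshold, is below. The thresholds and m/u values are hypothetical and hard-coded; per the note above, goldenmatch trains these with EM on its vectorized matrix, which this sketch does not reproduce.

```python
import math

# Hypothetical comparison levels for an embedding field:
# (minimum cosine similarity, m = P(level | match), u = P(level | non-match)).
# In a trained Fellegi-Sunter model m and u would be estimated, e.g. by EM.
LEVELS = [
    (0.95, 0.70, 0.01),
    (0.80, 0.25, 0.09),
    (-1.0, 0.05, 0.90),  # catch-all: everything below 0.80
]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def comparison_level(sim: float) -> int:
    """Index of the first level whose threshold the similarity meets."""
    for i, (threshold, _, _) in enumerate(LEVELS):
        if sim >= threshold:
            return i
    return len(LEVELS) - 1


def embedding_weight(a: list[float], b: list[float]) -> float:
    """Partial match weight log2(m / u) for the level this pair falls into."""
    _, m, u = LEVELS[comparison_level(cosine(a, b))]
    return math.log2(m / u)
```

Identical vectors land in the top level and contribute a positive weight; orthogonal vectors fall into the catch-all and contribute a negative one. The per-field weights are then summed with the other matchkey fields' weights.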

v3.3.0 — Negative evidence on Fellegi-Sunter matchkeys. negative_evidence now works on type: probabilistic matchkeys as EM-learned __ne__ dimensions (no labels needed; penalty_bits as a fixed override), and the Splink migration upgrade pass gains a fan-out lever: a risk-gated NE suggestion plus cluster-guard tuning from your reference clusters. goldenmatch-native 0.1.15 scores NE in the Rust kernels (FS_SUPPORTS_NE); older wheels fall back to the pure-Python path automatically.
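"EM-learned, no labels needed" refers to the classic way Fellegi-Sunter parameters are fit: expectation-maximization over unlabeled comparison vectors. A minimal sketch for binary agreement vectors, assuming fields are conditionally independent given match status, is below. Treating a negative-evidence signal as one more column whose disagreement weight EM learns is our reading, not a description of goldenmatch's `__ne__` implementation; `fs_em` and `match_weight` are hypothetical names.

```python
import math


def _clamp(x: float, eps: float = 1e-6) -> float:
    """Keep probabilities strictly inside (0, 1) so logs and ratios stay finite."""
    return min(max(x, eps), 1 - eps)


def fs_em(vectors: list[tuple[int, ...]], n_iter: int = 100, p: float = 0.1):
    """Estimate the match prior p and per-field m/u from unlabeled binary
    agreement vectors (1 = field agrees), assuming conditional independence."""
    k, n = len(vectors[0]), len(vectors)
    m = [0.9] * k  # P(agree | match), initial guess
    u = [0.1] * k  # P(agree | non-match), initial guess
    for _ in range(n_iter):
        # E-step: posterior probability that each compared pair is a match.
        post = []
        for g in vectors:
            lm, lu = p, 1 - p
            for j, agree in enumerate(g):
                lm *= m[j] if agree else 1 - m[j]
                lu *= u[j] if agree else 1 - u[j]
            post.append(lm / (lm + lu))
        # M-step: re-estimate parameters from the soft assignments.
        total = sum(post)
        p = _clamp(total / n)
        m = [_clamp(sum(w for w, g in zip(post, vectors) if g[j]) / total) for j in range(k)]
        u = [_clamp(sum(1 - w for w, g in zip(post, vectors) if g[j]) / (n - total)) for j in range(k)]
    return p, m, u


def match_weight(g: tuple[int, ...], m: list[float], u: list[float]) -> float:
    """Sum of log2 likelihood ratios; positive favors a match."""
    return sum(
        math.log2(mj / uj) if agree else math.log2((1 - mj) / (1 - uj))
        for agree, mj, uj in zip(g, m, u)
    )
```

On a toy set of 10 all-agree pairs and 90 mostly-disagreeing pairs, EM recovers a prior near 0.1 with m above u on every field, so all-agree vectors get a positive total weight and all-disagree vectors a negative one, without any labeled pairs.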

Why a suite?

Each tool stands alone, but they compose into a single pipeline:

flowchart LR
    raw([raw rows])
    golden([golden records])

    subgraph orchestration ["GoldenPipe orchestrates"]
        direction LR
        infermap[InferMap]
        goldencheck[GoldenCheck]
        goldenflow[GoldenFlow]
        goldenmatch[GoldenMatch]
        infermap --> goldencheck --> goldenflow --> goldenmatch
    end

    raw --> infermap
    goldenmatch --> golden

Step	Role
InferMap	schema mapping — auto-aligns columns across heterogeneous sources
GoldenCheck	profile + validate — encoding, format, anomaly detection
GoldenFlow	standardize + transform — phone, date, address, categorical normalization
GoldenMatch	dedupe + cluster + survivorship — fuzzy / exact / probabilistic / LLM
GoldenAnalysis	analysis + reporting — one exportable report over any stage, plus cross-run regression detection
GoldenPipe	orchestrator — declarative YAML pipeline wiring the steps
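The "survivorship" step in the table is where a cluster of matched rows collapses into one golden record. A minimal sketch with one simple, hypothetical rule (most frequent non-empty value per field, ties broken by the longer value, then by first seen) is below; goldenmatch's actual survivorship rules may differ and are likely configurable.

```python
from collections import Counter
from typing import Optional

Row = dict[str, Optional[str]]


def golden_record(cluster: list[Row]) -> Row:
    """Pick one surviving value per field: the most frequent non-empty value,
    ties broken by the longer (assumed more complete) value, then first seen."""
    fields = list(dict.fromkeys(f for row in cluster for f in row))
    golden: Row = {}
    for f in fields:
        values = [row[f] for row in cluster if row.get(f)]  # skips None and ""
        if not values:
            golden[f] = None
            continue
        counts = Counter(values)
        # max() returns the first maximal element, so equal keys keep first-seen order.
        golden[f] = max(counts, key=lambda v: (counts[v], len(v)))
    return golden
```

Frequency-first voting favors values that several sources agree on; a production rule set would typically also weigh source trust and recency.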

What sets it apart:

Zero-config that beats hand-tuned. 96.4% F1 on DBLP-ACM out of the box; the opt-in Fellegi-Sunter engine beats expert-tuned Splink head-to-head on every dataset Splink scores (historical_50k pairwise F1 0.778 vs 0.757, cluster B³ 0.844 vs 0.789; one shared evaluator, reproducible bake-off). Every step self-verifies (preflight + postflight) and returns an inspectable report instead of failing silently.
A healing loop, not a one-shot. Zero-config gets you most of the way; the healer attaches ranked, self-verified config tweaks and closes the gap to expert-tuned results without requiring you to be the expert.
Durable identity. Learning Memory persists corrections across runs (re-anchored across row reorders); the Identity Graph gives stable entity_ids that survive re-runs, an append-only event log, and create / absorb / merge / split semantics on CLI, REST, MCP, and SQL.
Privacy-preserving record linkage — match across organizations without sharing raw data (PPRL, 92.4% F1 on FEBRL4).
AI-native by design — every package ships an MCP server, a REST API, and an A2A agent surface (70+ MCP tools across the suite), all exposing the same JSON telemetry shape across web, TUI, CLI, Postgres, DuckDB, and MCP.
Polyglot parity, edge-safe, optional native speed. The full suite ships on npm alongside PyPI; Python and TypeScript track the same outputs to 4-decimal precision. The TS cores are dependency-free and node:*-free (browsers, Cloudflare Workers, Vercel Edge, Deno); an opt-in WebAssembly backend (await enableWasm()) swaps in the same pyo3-free Rust kernels the Python wheels and SQL UDFs use,