by dcostenco
Persistent memory + local AI for coding agents. 1.7B–32B open-weight LLM fleet, cross-session Mind Palace, cognitive routing, L3 grounding verifier, multi-agent Hivemind. Works with Claude Code, Cursor, VS Code. Offline-first, HIPAA-ready. Free tier included.
# Add to your Claude Code skills
git clone https://github.com/dcostenco/prism-coder
Last scanned: 5/30/2026
{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-30T16:14:50.114Z",
  "npmAuditRan": true,
  "pipAuditRan": true
}
prism-coder is an open-source AI Agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by dcostenco. It has 146 GitHub stars.
Yes. prism-coder passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.
Clone the repository with "git clone https://github.com/dcostenco/prism-coder" and add it to your Claude Code skills directory (see the Installation section above).
prism-coder is primarily written in TypeScript. It is open-source under dcostenco on GitHub, so you can review or fork the full source.
Give your AI agent memory that lasts. Persistent sessions, knowledge graphs, and offline tool-routing — fully local and free.
Prism Coder is an MCP server that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions. It ships with the open-weight prism-coder model fleet (2B–32B) for fast, offline tool-routing — no cloud required.
No account needed. No API keys. Runs on your machine.
A paid subscription adds cloud sync, higher model tiers, and team features through the Synalux portal.
The free tier needs no account, no API key, and no cloud. Add the server to your MCP client:
{
  "mcpServers": {
    "prism": {
      "command": "npx",
      "args": ["-y", "prism-mcp-server"]
    }
  }
}
Open Claude Desktop or Cursor and your agent now has memory backed by a local SQLite database (~/.prism-mcp/data.db).
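If the client already has other servers configured, the entry can be merged in rather than pasted by hand. The following is a minimal sketch, assuming the client reads a JSON file with a top-level `mcpServers` object. The helper names `add_prism_server` and `update_config_file` are hypothetical, and the location of the config file differs per client, so it is left as a parameter.

```python
import json
from pathlib import Path

# Server entry copied from the configuration shown above.
PRISM_ENTRY = {"command": "npx", "args": ["-y", "prism-mcp-server"]}


def add_prism_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the "prism" server added.

    Other configured servers are preserved; an existing "prism" entry
    is overwritten.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["prism"] = dict(PRISM_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read a JSON config (or start empty), merge the entry, write it back."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_prism_server(current), indent=2) + "\n")
```

Keeping the merge as a pure function makes it easy to check before touching a real config file; back up the file before calling `update_config_file` on it.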
Optional — local model fleet for offline tool-routing. Pull whichever fits your hardware:
ollama pull dcostenco/prism-coder:2b # 2.3 GB · mobile / lightweight (99.1% routing accuracy)
ollama pull dcostenco/prism-coder:4b # 3.4 GB · balanced (100% accuracy)
ollama pull dcostenco/prism-coder:14b # 8.4 GB · Mac default (100% accuracy)
ollama pull dcostenco/prism-coder:32b # 16 GB · complex tasks (100% accuracy)
Prism detects both the namespaced (dcostenco/prism-coder:14b) and bare (prism-coder:14b) Ollama tags automatically.
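Detection of both tag forms can be pictured as a lookup against the locally installed tags. This is only an illustrative sketch, not Prism's actual detection code: the function names are hypothetical, and the `installed` list is assumed to come from somewhere like the output of `ollama list`.

```python
from typing import List, Optional

NAMESPACE = "dcostenco"  # namespace used in the pull commands above


def candidate_tags(size: str) -> List[str]:
    """Return the namespaced and bare tag for a fleet size such as "14b"."""
    bare = f"prism-coder:{size}"
    return [f"{NAMESPACE}/{bare}", bare]


def find_installed_tag(size: str, installed: List[str]) -> Optional[str]:
    """Return the first installed tag matching either form, else None."""
    available = set(installed)
    for tag in candidate_tags(size):
        if tag in available:
            return tag
    return None
```

For example, `find_installed_tag("14b", ["prism-coder:14b"])` returns the bare tag, while a size that was never pulled returns `None`.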
Your AI agent forgets everything between sessions. Prism fixes that — and adds verification, drift detection, and multi-agent coordination on top.
Every conversation feeds a persistent store. The next session loads the right context automatically — no re-explaining.
The dashboard shows your current project state, pending TODOs, intent health, and a neural knowledge graph — all built automatically from your agent sessions.
Ask "what did I decide about the auth flow last month?" and get an answer with citations, combining vector similarity, full-text search, and graph traversal.
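How the three retrieval signals are combined is not described here. One common way to merge several ranked result lists is reciprocal rank fusion, sketched below as a stand-in technique rather than as Prism's method. The memory ids and the constant `k = 60` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked id lists into one list, best first.

    Each id earns 1 / (k + rank) from every list it appears in, so ids
    that rank well in several lists rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by id so ties are stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the three retrieval paths.
vector_hits = ["a", "b", "c"]
fulltext_hits = ["b", "a", "d"]
graph_hits = ["b", "e"]
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits, graph_hits])
```

Rank-based fusion avoids having to put similarity scores, text-match scores, and graph distances on a common scale.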
Every session is logged with files changed, decisions made, and TODOs. Search, filter, and replay any past session.
Long agent sessions can wander from their original goal. session_detect_drift compares current work against the stated goal and returns on_track / minor_drift / major_drift so the agent can self-correct.
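The three return labels suggest a score compared against two thresholds. The sketch below uses plain word overlap as a stand-in for whatever comparison `session_detect_drift` really performs. The function `classify_drift` and both threshold values are hypothetical.

```python
import re
from typing import Set


def _tokens(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def classify_drift(
    goal: str,
    current_work: str,
    minor_below: float = 0.5,  # hypothetical threshold
    major_below: float = 0.2,  # hypothetical threshold
) -> str:
    """Label current work by the share of goal words it still mentions."""
    goal_tokens = _tokens(goal)
    if not goal_tokens:
        return "on_track"  # no stated goal, so nothing to drift from
    overlap = len(goal_tokens & _tokens(current_work)) / len(goal_tokens)
    if overlap < major_below:
        return "major_drift"
    if overlap < minor_below:
        return "minor_drift"
    return "on_track"
```

A real implementation would more plausibly compare embeddings than raw words; the point here is only the mapping from a similarity score to the three labels.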
AI agents apply patterns from checklists without understanding the real-world impact. The verify_behavior tool challenges the agent with a scenario it must answer before editing — forcing it to think through what the end user will experience.
Agent: "I'll revert this kitchen display change"
Prism: "⚠️ Scenario: A cook sees a 3-item ticket. One item is voided.
What should the cook see after the void?"
Agent: "The ticket stays visible with the remaining 2 items."
Prism: "Correct — your revert would hide the ticket entirely."
17 built-in domains (billing, auth, ordering, clinical, HR, and more). Custom domains per workspace on Enterprise. No hooks needed — works in any MCP client.
Roll back to any previous session state. Compare diffs between versions. Restore a known-good state with one click.
Three memory types, automatically sorted: episodic (what happened — session logs, decisions), semantic (what's true — facts, architecture), and procedural (how to do X — workflows, patterns). When you search, the router picks the right store instead of dumping everything.
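The routing idea can be illustrated with a toy classifier that maps a query to one of the three stores. The real router's logic is not documented here, so everything below is an assumption made for illustration: the cue phrases, the `route_query` name, and the choice of the semantic store as the default.

```python
from enum import Enum


class MemoryType(Enum):
    EPISODIC = "episodic"      # what happened: session logs, decisions
    SEMANTIC = "semantic"      # what's true: facts, architecture
    PROCEDURAL = "procedural"  # how to do X: workflows, patterns


# Hypothetical cue phrases, checked in order.
_CUES = [
    (MemoryType.PROCEDURAL, ("how do i", "how to", "steps to", "workflow")),
    (MemoryType.EPISODIC, ("what did", "when did", "last session", "last month")),
]


def route_query(query: str) -> MemoryType:
    """Pick a memory store for a query; fall back to the semantic store."""
    text = query.lower()
    for memory_type, cues in _CUES:
        if any(cue in text for cue in cues):
            return memory_type
    return MemoryType.SEMANTIC
```

Querying a single store keeps the returned context small, which is the benefit the paragraph above describes.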
Coordinate multiple AI agents working on the same project. Each agent has its own session, but they share memory through the knowledge graph. The Hivemind Radar shows real-time agent status, tasks, and activity.
Search across all memories with highlighted results, knowledge graph editing, and memory density metrics.
The free tier runs entirely on your machine. Paid tiers add cloud sync through the Synalux portal, which is what enables cross-device memory and team sharing.
| | Local tier (free) | Cloud tier (paid) |
|---|---|---|
| Memory storage | Local SQLite | Synalux portal (Supabase-backed) |
| Inference | Local Ollama models | Local models + cloud fallback |
| API keys required | None | Synalux subscription key |
| Web search / scrape | Not included | Via Synalux portal (provider keys server-side) |
| What leaves your machine | Nothing | Memory text + file paths + search queries, sent to the portal over TLS (PHI-redacted before transit) |
| Works offline | ✅ | Local features yes; sync/cloud no |
Handling sensitive data. All cloud writes pass through automatic redaction (SSNs, dates of birth, medical record numbers, phone numbers, emails, and clinical identifiers are stripped before transit). For regulated workloads, run the local tier for full air-gap, or use Enterprise which includes a HIPAA Business Associate Agreement.
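Pattern-based redaction of this kind can be sketched with regular expressions. The example covers only three of the identifier types listed above, uses US-style formats, and is not Prism's actual redactor. The patterns and placeholder labels are illustrative assumptions and are not sufficient on their own for regulated data.

```python
import re

# (placeholder, pattern) pairs; illustrative US-style formats only.
_PATTERNS = [
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[PHONE]", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for placeholder, pattern in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = "SSN 123-45-6789, call 555-123-4567 or mail jane.doe@example.com."
redacted = redact(sample)
```

Dates of birth, medical record numbers, and clinical identifiers need context-aware detection that simple patterns like these do not provide.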
The prism-coder fleet uses Qwen3.5 for MCP tool-routing. The 14B and 32B models are fine-tuned from Qwen3; the 2B and 4B slots use stock Qwen3.5-4B with prompt engineering at different quantization levels (99.1% and 100% routing accuracy respectively, without fine-tuning). They are not general-purpose chat models: they route reliably and run offline, while Claude and other frontier models remain better at reasoning, coding, and open-domain work. The intended pattern is local routing with an optional cloud fallback for hard cases.
| Model | Ollama tag | Size | BFCL Accuracy | Role | Tier |
|---|---|---|---|---|---|
| Qwen3.5-4B Q3_K_M | prism-coder:2b | 2.3 GB | 99.1% × 3 seeds | iPhone / mobile first gate | Free |
| Qwen3.5-4B Q4_K_M | prism-coder:4b | 3.4 GB | 100% × 3 seeds | Verifier + 8 GB+ devices | Free |
| prism-coder:14b | prism-coder:14b | 8.4 GB | 100% × 3 seeds | Default router | Standard+ |
| prism-coder:32b | prism-coder:32b | 16 GB | 100% × 3 seeds | Complex tasks | Advanced+ |
Weights: huggingface.co/dcostenco (public GGUF). Latency depends on model size and hardware — see Benchmarks to measure it on your own machine rather than trusting a printed number.
query → prism-coder:14b (local router, Mac default)
      → qwen3.5:4b (grounding verifier)
      → prism-coder:2b (iPhone / mobile, auto-selected by RAM)
      → prism-coder:32b (complex tasks, on demand)
      → cloud fallback (paid tiers, for max quality)
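Selection by available memory, as in the diagram above, might look roughly like the sketch below. The 8 GB floor for the 4b model comes from the fleet table. The 16 GB and 32 GB cutoffs and the `pick_local_model` function are hypothetical and do not reflect Prism's actual selection rules.

```python
def pick_local_model(ram_gb: float, complex_task: bool = False) -> str:
    """Choose a fleet tag from available RAM (cutoffs are assumptions)."""
    # 32b is used on demand for complex tasks, if the machine can hold it.
    if complex_task and ram_gb >= 32:
        return "prism-coder:32b"
    # 14b is the default router on larger machines.
    if ram_gb >= 16:
        return "prism-coder:14b"
    # 4b targets devices with 8 GB or more.
    if ram_gb >= 8:
        return "prism-coder:4b"
    # 2b is the mobile / lightweight fallback.
    return "prism-coder:2b"
```

On paid tiers a cloud fallback would sit behind this choice; it is left out because the sketch covers only local selection.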
Reproduce every number yourself. All evals are open-source and self-contained:
git clone https://github.com/dcostenco/prism-coder && cd prism-coder
pip install anthropic requests
python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b 14b 32b
Routing eval (115 cases, 12 categories, 3-seed mean). On this narrow tool-routing task all fleet models achieve near-perfect accuracy. Be honest with yourself about what that means: the eval is near-saturated for this taxonomy — it measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is offline routing reliability at zero cost, not that a 2.3 GB model rivals a frontier model in general.
| Model | Routing accuracy | Notes |
|---|---|---|
| prism-coder:2b (Q3_K_M) | 99.1% × 3 seeds | 1 failure: regex→knowledge_search |
| prism-coder:4b / 14b / 32b | 100% × 3 seeds | Perfect on all 115 cases |
| Claude (frontier, same eval) | ~98% | Stronger everywhere outside this narrow task |
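The percentages in the table follow directly from the case count: one failure out of 115 cases is 114/115, which rounds to 99.1%. The short check below makes that arithmetic explicit. It assumes the 2b model missed exactly one case on each of the three seeds, which is how the "99.1% × 3 seeds" entry reads, and the helper names are hypothetical.

```python
from typing import List

TOTAL_CASES = 115  # size of the routing eval described above


def accuracy_pct(failures: int) -> float:
    """Accuracy in percent for one run, rounded to one decimal place."""
    return round(100 * (TOTAL_CASES - failures) / TOTAL_CASES, 1)


def seed_mean_pct(failures_per_seed: List[int]) -> float:
    """Mean accuracy in percent across seeds, rounded to one decimal."""
    runs = [100 * (TOTAL_CASES - f) / TOTAL_CASES for f in failures_per_seed]
    return round(sum(runs) / len(runs), 1)


two_b = seed_mean_pct([1, 1, 1])   # one miss per seed (assumed)
larger = seed_mean_pct([0, 0, 0])  # no misses on any seed
```

With only 115 cases, a single miss moves the score by roughly 0.9 percentage points, so the gap between 99.1% and 100% is one case.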
Memory uplift (LoCoMo-Plus, self-published). A separate long-context dialogue benchmark (dcostenco/Locomo-Plus) measures how much structured memory helps a base model retain multi-day context. Results show large gains when a model is paired with Prism memory versus running raw. Note this benchmark is authored, run, and LLM-judged by this project — treat it as a reproducible demonstration, not an independen