Matryoshka

Name: Matryoshka
Author: yogthos

Verified

MCP server for token-efficient analysis of large documents using REPL state

141 stars

16 forks

TypeScript

Installation

# Add to your Claude Code skills
git clone https://github.com/yogthos/Matryoshka


Security Report: Verified

Last scanned: 5/30/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-30T16:16:26.707Z",
  "npmAuditRan": true,
  "pipAuditRan": true
}

README.md

Frequently Asked Questions

What is Matryoshka?

Matryoshka is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by yogthos. It is an MCP server for token-efficient analysis of large documents using REPL state. It has 141 GitHub stars.

Is Matryoshka safe to use?

Yes. Matryoshka passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install Matryoshka?

Clone the repository with "git clone https://github.com/yogthos/Matryoshka" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is Matryoshka written in?

Matryoshka is primarily written in TypeScript. It is open-source under yogthos on GitHub, so you can review or fork the full source.



Matryoshka

Process documents 100x larger than your LLM's context window—without vector databases or chunking heuristics.

The Problem

LLMs have fixed context windows. Traditional solutions (RAG, chunking) lose information or miss connections across chunks. RLM (Recursive Language Model) takes a different approach: the model reasons about your query and outputs symbolic commands that a logic engine executes against the document.

Based on the Recursive Language Models paper.

How It Works

Unlike traditional approaches where an LLM writes arbitrary code, RLM uses Nucleus—a constrained symbolic language based on S-expressions. The LLM outputs Nucleus commands, which are parsed, type-checked, and executed by Lattice, our logic engine.

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Query    │────▶│   LLM Reasons   │────▶│ Nucleus Command │
│ "total sales?"  │     │  about intent   │     │  (sum RESULTS)  │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
┌─────────────────┐     ┌─────────────────┐     ┌────────▼────────┐
│  Final Answer   │◀────│ Lattice Engine  │◀────│     Parser      │
│   13,000,000    │     │    Executes     │     │    Validates    │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Why this works better than code generation:

Reduced entropy - Nucleus has a rigid grammar with fewer valid outputs than JavaScript
Fail-fast validation - Parser rejects malformed commands before execution
Safe execution - Lattice only executes known operations, no arbitrary code
Small model friendly - 7B models handle symbolic grammars better than freeform code
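The fail-fast and safe-execution points can be illustrated with a minimal sketch: a small S-expression reader followed by a whitelist check on every operator. Everything here (`parse`, `validate`, `KNOWN_OPS`, and the particular set of operations) is hypothetical and written for illustration only; it is not the Lattice parser, and it skips details such as `;` comments and type inference.

```typescript
// Hypothetical sketch, not the Matryoshka source.
type SExpr = string | number | SExpr[];

// Assumed whitelist; the real engine supports more operations.
const KNOWN_OPS = new Set(["grep", "filter", "map", "sum", "count", "lambda", "match"]);

function tokenize(src: string): string[] {
  // Quoted strings, single parentheses, or runs of other non-space characters.
  return src.match(/"(?:[^"\\]|\\.)*"|[()]|[^\s()]+/g) ?? [];
}

function parse(src: string): SExpr {
  const tokens = tokenize(src);
  let pos = 0;

  function read(): SExpr {
    if (pos >= tokens.length) throw new Error("unexpected end of input");
    const tok = tokens[pos++];
    if (tok === ")") throw new Error("unexpected ')'");
    if (tok === "(") {
      const list: SExpr[] = [];
      while (tokens[pos] !== ")") {
        if (pos >= tokens.length) throw new Error("missing ')'");
        list.push(read());
      }
      pos++; // consume the closing ")"
      return list;
    }
    const n = Number(tok);
    return Number.isNaN(n) ? tok : n; // quoted strings keep their quotes
  }

  const expr = read();
  if (pos !== tokens.length) throw new Error("trailing tokens after expression");
  return expr;
}

// Reject any list whose head is not a known operation, before anything runs.
function validate(expr: SExpr): void {
  if (!Array.isArray(expr)) return; // atoms need no check in this sketch
  const [head, ...args] = expr;
  if (typeof head !== "string" || !KNOWN_OPS.has(head)) {
    throw new Error(`unknown operation: ${String(head)}`);
  }
  args.forEach(validate);
}
```

In this sketch, `validate(parse('(sum RESULTS)'))` passes, while an unbalanced command or an operator outside the whitelist throws before any execution step is reached.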

Architecture

The Nucleus DSL

The LLM outputs commands in the Nucleus DSL—an S-expression language designed for document analysis:

; Search for patterns
(grep "ERROR")

; Filter results
(filter RESULTS (lambda x (match x "timeout" 0)))

; Aggregate
(sum RESULTS)    ; Auto-extracts numbers from lines
(count RESULTS)  ; Count matching items

; Final answer
<<<FINAL>>>13000000<<<END>>>

Feature availability by execution path

Matryoshka has two execution paths and not every primitive works in both:

| Feature | `runRLM` (CLI / programmatic) | `lattice-mcp` (MCP server) |
| --- | --- | --- |
| `(grep …)`, `(filter …)`, `(map …)`, etc. | ✅ | ✅ |
| `(llm_query …)`, `(llm_batch …)` | ✅ | ✅ via MCP sampling protocol |
| `(rlm_query …)`, `(rlm_batch …)` | ✅ (concurrent `rlm_batch`) | ✅ Child Nucleus session spawns via the same MCP sampling bridge, with M suspensions per `rlm_query` call. `rlm_batch` runs sequentially (children one at a time) because the multi-turn suspension protocol carries only one pending request at a time, so concurrent children would lose suspensions. Round-trip count is the same; wall-clock is N× slower for non-sampling clients. |
| `(context N)` selector | ✅ (multi-doc via `runRLMFromContent(query, string[])`) | Partial: `(context 0)` works; multi-doc loading is not exposed via `lattice_load` |
| `(grep "X" haystack)` | ✅ | ✅ |
| `(show_vars)` | ✅ | ✅ (internal `_<name>` bindings filtered out) |
| `FINAL_VAR(name)` resolution | ✅ | N/A: MCP returns query results directly |
| `maxTimeoutMs` / `maxChars` / `maxErrors` | ✅ | ❌ MCP has its own session timeout |
| `compactionThresholdChars` | ✅ | ❌ MCP doesn't have a multi-turn FSM history |

The resource-limit features remain runRLM-only. The recursive primitives (rlm_query/rlm_batch) work in both paths — the MCP path spawns a child runRLMFromContent whose llmClient is the same sampling bridge as the parent, so each child turn flows through the existing MCP suspension/sampling protocol.

Recursive Primitives

rlm_query spawns a child Nucleus session with its own FSM loop. The child runs to FINAL and returns a string — useful when a sub-task needs multi-turn reasoning over a structured handle:

; Child sees the resolved handle as its working document, NOT a
; JSON-stringified prompt blob. Lets the child use grep/lines/
; chunk_by_lines over arrays without JSON-syntax noise.
(rlm_query "extract dates" (context RESULTS))

; No (context …) → child's document is the prompt itself.
(rlm_query "summarize each error type")

rlm_batch runs the same per-item recursion across a collection. Each item produces one entry in the returned array, in input order. Per-item failures surface as "Error: rlm_batch item N failed — …" strings without aborting the rest of the batch:

(rlm_batch (chunk_by_lines 100)
  (lambda c (rlm_query "extract metrics" (context c))))

runRLM: children fan out concurrently via a worker pool capped at maxConcurrentSubcalls (default 4).
lattice-mcp: children run sequentially because the multi-turn suspension protocol can carry only one pending request at a time. Round-trip count is identical to the concurrent path (N children × M turns each); only wall-clock differs.
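The difference between the two paths can be sketched as a capped worker pool. The helper below is a hypothetical illustration, not the Matryoshka implementation: `batchWithLimit` and its `worker` callback are invented names, and the zero-based item index in the error string is an assumption. Only the `maxConcurrentSubcalls` default of 4, the input-order guarantee, and the shape of the per-item error message come from the description above.

```typescript
// Hypothetical sketch of a capped worker pool; not the Matryoshka source.
async function batchWithLimit<T>(
  items: T[],
  worker: (item: T, index: number) => Promise<string>,
  maxConcurrentSubcalls = 4,
): Promise<string[]> {
  const results: string[] = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two lanes share an index
      try {
        results[i] = await worker(items[i], i);
      } catch (err) {
        const msg = err instanceof Error ? err.message : String(err);
        // \u2014 is the dash used in the documented error format.
        results[i] = `Error: rlm_batch item ${i} failed \u2014 ${msg}`;
      }
    }
  }

  const lanes = Math.max(1, Math.min(maxConcurrentSubcalls, items.length));
  await Promise.all(Array.from({ length: lanes }, () => runLane()));
  return results; // slot i always corresponds to items[i]
}
```

Passing `1` as the cap reproduces the sequential behavior described for `lattice-mcp`: the same number of child calls happen, one at a time.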

Multi-Context Loading

Pass string[] to runRLMFromContent to load multiple documents. Address them via (context N); index 0 is the default for primitives that don't specify a haystack:

(grep "DEPLOY" (context 0))   ; deploy.log
(grep "OUTAGE" (context 2))   ; comms.log

; (context N) is just a term — pipe it anywhere a string is expected
(rlm_query "scan" (context (context 1)))   ; child sees doc 1

Results carry per-document line numbers, so the LLM can cite "doc 0 line 4, doc 2 line 2" with confidence rather than inventing absolute offsets across a concatenation.
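A minimal sketch of the idea, assuming 1-based line numbers and a plain regular-expression match: each hit records the index of the document it came from and the line number within that document. `grepDocs` and the `Hit` shape are hypothetical names for illustration, not part of the Matryoshka API.

```typescript
// Hypothetical sketch; not the Matryoshka source.
interface Hit {
  doc: number;  // index into the loaded documents, as in (context N)
  line: number; // 1-based line number within that document (assumed)
  text: string;
}

function grepDocs(pattern: string, docs: string[], only?: number): Hit[] {
  const re = new RegExp(pattern); // no "g" flag, so test() is stateless
  const hits: Hit[] = [];
  docs.forEach((content, doc) => {
    if (only !== undefined && doc !== only) return; // restrict to one document
    content.split("\n").forEach((text, i) => {
      if (re.test(text)) hits.push({ doc, line: i + 1, text });
    });
  });
  return hits;
}
```

Because line numbers are counted per document rather than over a concatenation, a hit in the third document is reported relative to that document's own first line.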

Introspection

(show_vars)   ; Returns a string summary of every binding currently
              ; in scope. Useful before a (filter RESULTS …) or a
              ; FINAL_VAR(name) reference when the LLM lost track of
              ; what's bound. Same surface as the `lattice_bindings`
              ; MCP tool but reachable from inside a query.

Unknown FINAL_VAR markers surface a clear error rather than passing the literal text through:

<<<FINAL>>>FINAL_VAR(_99)<<<END>>>
→ "[FINAL_VAR error: unknown binding "_99". Available: _1, RESULTS]"
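The marker handling above can be sketched as follows. This is a hypothetical helper (`resolveFinal` and the `Map` of bindings are invented for illustration) that extracts the text between `<<<FINAL>>>` and `<<<END>>>`, resolves a `FINAL_VAR(name)` reference if present, and produces the documented error string for an unknown binding. How the real engine serializes non-string bindings is not stated, so `JSON.stringify` is an assumption.

```typescript
// Hypothetical sketch; not the Matryoshka source.
function resolveFinal(output: string, bindings: Map<string, unknown>): string | null {
  const m = output.match(/<<<FINAL>>>([\s\S]*?)<<<END>>>/);
  if (!m) return null; // the model has not produced a final answer yet

  const body = (m[1] ?? "").trim();
  const ref = body.match(/^FINAL_VAR\(([^)]+)\)$/);
  if (!ref) return body; // literal answer, e.g. "13000000"

  const name = ref[1] ?? "";
  if (!bindings.has(name)) {
    const available = Array.from(bindings.keys()).join(", ");
    return `[FINAL_VAR error: unknown binding "${name}". Available: ${available}]`;
  }
  const value = bindings.get(name);
  // Assumed serialization for non-string values.
  return typeof value === "string" ? value : JSON.stringify(value);
}
```

Returning the error as a string, rather than passing `FINAL_VAR(_99)` through untouched, gives the model the list of valid names it needs to correct itself on the next turn.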

Resource Limits

All limits are optional. With none set, behavior is unchanged:

runRLM(query, file, {
  maxTimeoutMs: 30_000,    // wall-clock cap, propagates to children
  maxChars: 100_000,       // cumulative chars sent + received
  maxErrors: 5,            // consecutive parse/execution errors
  compactionThresholdChars: 50_000,  // summarize history when prompt grows past this
})

When a limit is hit, the run terminates cleanly with a string of the form:

[aborted: timeout 32100ms of 30000ms]

Best partial answer:
<the most recent meaningful solver result>

The partial answer is always preserved when present — completed work is never silently lost on abort.
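A minimal sketch of how such a check might compose the abort string, under stated assumptions: `checkLimits`, `Limits`, and `Usage` are hypothetical names, the strict `>` comparison is a guess, and only the timeout wording is taken from the example above. The wording for the `maxChars` and `maxErrors` cases is invented for illustration.

```typescript
// Hypothetical sketch; not the Matryoshka source.
interface Limits {
  maxTimeoutMs?: number;
  maxChars?: number;
  maxErrors?: number;
}

interface Usage {
  elapsedMs: number;
  chars: number;             // cumulative chars sent + received
  consecutiveErrors: number; // consecutive parse/execution errors
}

// Returns null while within limits, otherwise the abort message.
function checkLimits(limits: Limits, usage: Usage, partial?: string): string | null {
  let reason: string | null = null;
  if (limits.maxTimeoutMs !== undefined && usage.elapsedMs > limits.maxTimeoutMs) {
    reason = `timeout ${usage.elapsedMs}ms of ${limits.maxTimeoutMs}ms`;
  } else if (limits.maxChars !== undefined && usage.chars > limits.maxChars) {
    reason = `chars ${usage.chars} of ${limits.maxChars}`; // assumed wording
  } else if (limits.maxErrors !== undefined && usage.consecutiveErrors > limits.maxErrors) {
    reason = `errors ${usage.consecutiveErrors} of ${limits.maxErrors}`; // assumed wording
  }
  if (reason === null) return null;

  const head = `[aborted: ${reason}]`;
  // Keep completed work: append the best partial answer when there is one.
  return partial ? `${head}\n\nBest partial answer:\n${partial}` : head;
}
```

With an empty `Limits` object the function always returns `null`, which matches the statement that behavior is unchanged when no limit is set.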

The Lattice Engine

The Lattice engine (src/logic/) processes Nucleus commands:

Parser (lc-parser.ts) - Parses S-expressions into an AST
Type Inference (type-inference.ts) - Validates types before execution
Constraint Resolver (constraint-resolver.ts) - Handles symbolic constraints like [Σ⚡μ]
Solver (lc-solver.ts) - Executes commands against the document

Lattice uses miniKanren (a relational programming engine) for pattern classification and filtering operations.

In-Memory Handle Storage

For large result sets, RLM uses a handle-based architecture with in-memory SQLite (src/persistence/) that achieves 97%+ token savings:

Traditional:  LLM sees full array    [15,000 tokens for 1000 results]
Handle-based: LLM sees stub          [50 tokens: "$grep_error: Array(1000) [preview...]"]

How it works:

Results are stored in SQLite with FTS5 full-text indexing
LLM receives descriptive handle references derived from the command (e.g., $grep_error, $bm25_timeout, $filter_status)
Operations execute server-side, returning new handles
Full data is only materialized when needed

Handle names are auto-generated from the Nucleus command: (grep "ERROR") produces $grep_error, (list_symbols "function") produces $list_symbols_function. Repeated commands get a numeric suffix ($grep_error_2, $grep_error_3).
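The naming rule can be sketched as below. `HandleNamer` is a hypothetical class that reproduces only the examples given above (operator plus first quoted argument, lowercased, with a numeric suffix on repeats); how the real implementation names commands with no string argument, or with several, is not specified here and is an assumption in this sketch.

```typescript
// Hypothetical sketch; not the Matryoshka source.
class HandleNamer {
  private counts = new Map<string, number>();

  nameFor(command: string): string {
    // Capture the operator and, if present, the first quoted argument.
    const m = command.match(/^\(\s*([A-Za-z_]\w*)(?:\s+"([^"]*)")?/);
    if (!m) throw new Error(`not a Nucleus command: ${command}`);

    const parts = [m[1], m[2]].filter((p): p is string => Boolean(p));
    const base = parts
      .join("_")
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "_") // collapse anything unsafe into "_"
      .replace(/^_+|_+$/g, "");    // trim leading and trailing "_"

    const n = (this.counts.get(base) ?? 0) + 1;
    this.counts.set(base, n);
    return n === 1 ? `$${base}` : `$${base}_${n}`;
  }
}
```

Deriving the name from the command keeps stubs self-describing: a model that sees `$grep_error_2` can tell which query produced it without expanding the handle.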

Memory Pad

The Lattice engine doubles as a context memory for LLM agents. Instead of roundtripping large text blobs in every message, agents stash context server-side and carry only compact handle stubs:

Agent reads file, summarizes → lattice_memo "auth architecture"
                              → $memo_auth_architecture: "auth architecture" (2.1KB, 50 lines)

20 messages later, needs it  → lattice_expand $memo_auth_architecture
                              → Full 50-line summary

Token math (30-message session, 3 source files stashed):

Traditional roundtripping: 836K tokens
Memo-based (stubs + 6 expands): 57K tokens — 93% savings

Memos persist across document loads (lattice_load clears query handles but keeps memos), support LRU eviction (100 memo cap, 10MB budget), and can be explicitly deleted when stale. No document needs to be loaded to use memos.
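A minimal sketch of a memo store with LRU eviction, assuming the two limits named above (100 memos, 10MB). `MemoPad` and its methods are hypothetical names, string length is used as a rough stand-in for byte size, and reading 10MB as 10 × 1024 × 1024 is an assumption. The sketch relies on the fact that a JavaScript `Map` iterates in insertion order, so re-inserting an entry on access moves it to the most-recently-used end.

```typescript
// Hypothetical sketch; not the Matryoshka source.
class MemoPad {
  private memos = new Map<string, string>(); // oldest entry first
  private bytes = 0;
  private readonly maxMemos: number;
  private readonly maxBytes: number;

  constructor(maxMemos = 100, maxBytes = 10 * 1024 * 1024) {
    this.maxMemos = maxMemos;
    this.maxBytes = maxBytes;
  }

  get count(): number {
    return this.memos.size;
  }

  // Store or overwrite a memo, then evict until both limits hold.
  memo(handle: string, text: string): void {
    this.delete(handle);
    this.memos.set(handle, text);
    this.bytes += text.length; // UTF-16 code units as a cheap size estimate
    this.evict();
  }

  // Return the full text and mark the memo as most recently used.
  expand(handle: string): string | undefined {
    const text = this.memos.get(handle);
    if (text !== undefined) {
      this.memos.delete(handle);
      this.memos.set(handle, text);
    }
    return text;
  }

  delete(handle: string): boolean {
    const text = this.memos.get(handle);
    if (text === undefined) return false;
    this.bytes -= text.length;
    return this.memos.delete(handle);
  }

  private evict(): void {
    while (this.memos.size > this.maxMemos || this.bytes > this.maxBytes) {
      const oldest = this.memos.keys().next();
      if (oldest.done) break;
      this.delete(oldest.value); // drop the least recently used memo
    }
  }
}
```

Nothing in this store depends on a loaded document, which mirrors the note that memos can be used on their own.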

The Role of the LLM

The LLM does reasoning, not code generation:

Understands intent - Interprets "total of north sales" as needing grep + filter + sum
Chooses operations - Decides which Nucleus commands achieve the goal
Verifies results - Checks if the current results answer the query
Iterates - Refines search if results are too broad or narrow

The LLM never writes JavaS