by yogthos
MCP server for token-efficient large document analysis via the use of REPL state
```bash
# Add to your Claude Code skills
git clone https://github.com/yogthos/Matryoshka
```

Process documents 100x larger than your LLM's context window, without vector databases or chunking heuristics.
LLMs have fixed context windows. Traditional solutions (RAG, chunking) lose information or miss connections across chunks. RLM takes a different approach: the model reasons about your query and outputs symbolic commands that a logic engine executes against the document.
Based on the Recursive Language Models paper.
Unlike traditional approaches where an LLM writes arbitrary code, RLM uses Nucleus—a constrained symbolic language based on S-expressions. The LLM outputs Nucleus commands, which are parsed, type-checked, and executed by Lattice, our logic engine.
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Query    │────▶│   LLM Reasons   │────▶│ Nucleus Command │
│ "total sales?"  │     │  about intent   │     │  (sum RESULTS)  │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
┌─────────────────┐     ┌─────────────────┐     ┌────────▼────────┐
│  Final Answer   │◀────│ Lattice Engine  │◀────│     Parser      │
│   13,000,000    │     │    Executes     │     │    Validates    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
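The parsing step in the pipeline constrains the LLM's output: anything that is not a well-formed S-expression never reaches the engine. Below is a minimal reader sketched in TypeScript, the language the Lattice sources appear to use. It is not `lc-parser.ts`. The `SExpr` shape and the `tokenize`/`parse` names are hypothetical. The reader skips `;` comments, distinguishes string literals, numbers, and symbols, and throws on unbalanced parentheses.

```typescript
// Hypothetical AST shape for illustration; the real lc-parser.ts may differ.
type Atom =
  | { kind: "symbol"; name: string }
  | { kind: "string"; value: string }
  | { kind: "number"; value: number };
type SExpr = Atom | SExpr[];

// Split source text into "(", ")", string literals, and bare words.
function tokenize(src: string): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === ";") {
      // Comment: skip to the end of the line.
      while (i < src.length && src[i] !== "\n") i++;
    } else if (/\s/.test(ch)) {
      i++;
    } else if (ch === "(" || ch === ")") {
      tokens.push(ch);
      i++;
    } else if (ch === '"') {
      // String literal (no escape sequences): keep the quotes so parse() can tell it from a symbol.
      let j = i + 1;
      while (j < src.length && src[j] !== '"') j++;
      if (j >= src.length) throw new Error("unterminated string literal");
      tokens.push(src.slice(i, j + 1));
      i = j + 1;
    } else {
      // Symbol or number: read up to whitespace, a parenthesis, a quote, or a comment.
      let j = i;
      while (j < src.length && !/[\s()";]/.test(src[j])) j++;
      tokens.push(src.slice(i, j));
      i = j;
    }
  }
  return tokens;
}

// Parse exactly one top-level expression, rejecting unbalanced or trailing input.
function parse(src: string): SExpr {
  const tokens = tokenize(src);
  let pos = 0;

  function read(): SExpr {
    const tok = tokens[pos++];
    if (tok === undefined) throw new Error("unexpected end of input");
    if (tok === "(") {
      const list: SExpr[] = [];
      while (tokens[pos] !== ")") {
        if (pos >= tokens.length) throw new Error("missing ')'");
        list.push(read());
      }
      pos++; // consume the closing ")"
      return list;
    }
    if (tok === ")") throw new Error("unexpected ')'");
    if (tok.startsWith('"')) return { kind: "string", value: tok.slice(1, -1) };
    const n = Number(tok);
    if (Number.isNaN(n)) return { kind: "symbol", name: tok };
    return { kind: "number", value: n };
  }

  const expr = read();
  if (pos !== tokens.length) throw new Error("trailing input after expression");
  return expr;
}
```

For example, `parse('(sum RESULTS)')` yields a two-element array of symbol atoms, which a later stage can type-check and evaluate.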
Why this works better than code generation:
The LLM outputs commands in the Nucleus DSL—an S-expression language designed for document analysis:
```
; Search for patterns
(grep "SALES_DATA")

; Filter results
(filter RESULTS (lambda x (match x "NORTH" 0)))

; Aggregate
(sum RESULTS)    ; Auto-extracts numbers like "$2,340,000" from lines
(count RESULTS)  ; Count matching items

; Final answer
<<<FINAL>>>13000000<<<END>>>
```
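Every command above reads or replaces a shared `RESULTS` binding, so the session behaves like a REPL. The TypeScript sketch below models that state directly with methods instead of interpreting S-expressions. It assumes that `grep` takes a regular expression, that `filter` narrows the previous result set, and that `sum` adds the first number on each line after dropping `$` and thousands separators. `NucleusSession` and `extractNumber` are hypothetical names, and the sample lines are made up.

```typescript
// Simplified model of a few Nucleus commands. NucleusSession and
// extractNumber are hypothetical names, not the project's actual API.

// Pull the first number out of a line, ignoring "$" and thousands separators,
// e.g. "NORTH: $2,340,000" -> 2340000. Returns null when no digits are present.
function extractNumber(line: string): number | null {
  const m = line.match(/-?\d[\d,]*(?:\.\d+)?/);
  if (m === null) return null;
  return Number(m[0].replace(/,/g, ""));
}

class NucleusSession {
  // The document, one entry per line.
  private readonly lines: string[];
  // RESULTS: output of the previous command, kept between calls like REPL state.
  results: string[] = [];

  constructor(lines: string[]) {
    this.lines = lines;
  }

  // (grep "pattern"): keep every document line matching the regular expression.
  grep(pattern: string): string[] {
    const re = new RegExp(pattern);
    this.results = this.lines.filter((line) => re.test(line));
    return this.results;
  }

  // (filter RESULTS pred): narrow the current results with a predicate.
  filter(pred: (line: string) => boolean): string[] {
    this.results = this.results.filter(pred);
    return this.results;
  }

  // (sum RESULTS): add the first number on each line; lines without one count as 0.
  sum(): number {
    return this.results.reduce((acc, line) => acc + (extractNumber(line) ?? 0), 0);
  }

  // (count RESULTS): number of items in the current result set.
  count(): number {
    return this.results.length;
  }
}

// Usage with made-up sample lines.
const session = new NucleusSession([
  "SALES_DATA_NORTH: $2,340,000",
  "SALES_DATA_SOUTH: $3,120,000",
  "SALES_DATA_NORTH_EAST: $1,000,000",
  "HEADER: quarterly report",
]);
session.grep("SALES_DATA");
session.filter((line) => /NORTH/.test(line));
console.log(session.count(), session.sum()); // prints: 2 3340000
```

Because `RESULTS` stays on the server side, the model only needs to see the aggregate, not every matched line. This is presumably where the token efficiency of the REPL-state design comes from.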
The Lattice engine (src/logic/) processes Nucleus commands:
- `lc-parser.ts` - Parses S-expressions into an AST
- `type-inference.ts` - Validates types before execution
- `constraint-resolver.ts` - Handles symbolic constraints like `[Σ⚡μ]`
- `lc-solver.ts` - Executes commands against the document

Lattice uses miniKanren (a relational programming engine) for pattern classification and filtering operations.
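Checking types before execution means that an ill-formed command such as `(sum "SALES_DATA")`, which sums a string rather than a result set, can be rejected before anything runs against the document. The sketch below illustrates the idea in TypeScript. The `Ty` and `Expr` types, the `infer` function, and the operator signatures are assumptions, not the contents of `type-inference.ts`. Lambdas, which `filter` needs, are omitted to keep the sketch short.

```typescript
// Hypothetical sketch of pre-execution type checking; not type-inference.ts.
type Ty = "string" | "number" | "lines"; // "lines" = a list of matched document lines

type Expr =
  | { kind: "str"; value: string }
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "call"; op: string; args: Expr[] };

interface Signature {
  params: Ty[];
  result: Ty;
}

// Assumed signatures for a few Nucleus operators.
const SIGNATURES = new Map<string, Signature>([
  ["grep", { params: ["string"], result: "lines" }],
  ["sum", { params: ["lines"], result: "number" }],
  ["count", { params: ["lines"], result: "number" }],
]);

// Infer the type of an expression, throwing TypeError on any mismatch so an
// ill-typed command is rejected before it runs against the document.
function infer(e: Expr, env: Map<string, Ty>): Ty {
  if (e.kind === "str") return "string";
  if (e.kind === "num") return "number";
  if (e.kind === "var") {
    const t = env.get(e.name);
    if (t === undefined) throw new TypeError(`unbound variable ${e.name}`);
    return t;
  }
  // Remaining case: a command application such as (sum RESULTS).
  const sig = SIGNATURES.get(e.op);
  if (sig === undefined) throw new TypeError(`unknown command ${e.op}`);
  if (sig.params.length !== e.args.length) {
    throw new TypeError(`${e.op} expects ${sig.params.length} argument(s), got ${e.args.length}`);
  }
  for (let i = 0; i < sig.params.length; i++) {
    const expected = sig.params[i];
    const actual = infer(e.args[i], env);
    if (actual !== expected) {
      throw new TypeError(`${e.op}: argument ${i + 1} should be ${expected}, got ${actual}`);
    }
  }
  return sig.result;
}
```

Signatures live in a `Map` rather than a plain object so that inherited keys such as `constructor` are never mistaken for command names.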
For large result sets, RLM uses a handle...