by yogthos
MCP server for token-efficient analysis of large documents via REPL state
```sh
# Add to your Claude Code skills
git clone https://github.com/yogthos/Matryoshka
```

Process documents 100x larger than your LLM's context window—without vector databases or chunking heuristics.
LLMs have fixed context windows. Traditional solutions (RAG, chunking) lose information or miss connections across chunks. RLM takes a different approach: the model reasons about your query and outputs symbolic commands that a logic engine executes against the document.
Based on the Recursive Language Models paper.
Unlike traditional approaches where an LLM writes arbitrary code, RLM uses Nucleus—a constrained symbolic language based on S-expressions. The LLM outputs Nucleus commands, which are parsed, type-checked, and executed by Lattice, our logic engine.
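To make the parse step concrete, here is a minimal TypeScript sketch of S-expression parsing in the spirit of lc-parser.ts. The `SExpr` type and the `tokenize`/`parse` functions are illustrative names, not the actual Lattice API, and the sketch deliberately ignores quoted strings that contain spaces.

```typescript
// Minimal S-expression reader: tokenize, then build a nested-array AST.
type SExpr = string | SExpr[];

function tokenize(src: string): string[] {
  // Pad parens with spaces so a whitespace split yields clean tokens.
  return src.replace(/\(/g, " ( ").replace(/\)/g, " ) ").trim().split(/\s+/);
}

function parse(tokens: string[]): SExpr {
  const tok = tokens.shift();
  if (tok === undefined) throw new Error("unexpected end of input");
  if (tok === "(") {
    const list: SExpr[] = [];
    while (tokens[0] !== ")") {
      if (tokens.length === 0) throw new Error("missing )");
      list.push(parse(tokens));
    }
    tokens.shift(); // consume ")"
    return list;
  }
  if (tok === ")") throw new Error("unexpected )");
  return tok; // atom
}

const ast = parse(tokenize('(grep "ERROR")')); // → ["grep", "\"ERROR\""]
```

The real parser additionally feeds the AST through type inference and constraint resolution before anything executes; this sketch only shows why a constrained S-expression grammar is easy to validate up front.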
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  User Query     │────▶│  LLM Reasons    │────▶│ Nucleus Command │
│ "total sales?"  │     │  about intent   │     │  (sum RESULTS)  │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
┌─────────────────┐     ┌─────────────────┐     ┌────────▼────────┐
│  Final Answer   │◀────│ Lattice Engine  │◀────│     Parser      │
│   13,000,000    │     │    Executes     │     │    Validates    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
Why this works better than code generation:
The LLM outputs commands in the Nucleus DSL—an S-expression language designed for document analysis:
```lisp
; Search for patterns
(grep "ERROR")

; Filter results
(filter RESULTS (lambda x (match x "timeout" 0)))

; Aggregate
(sum RESULTS)   ; Auto-extracts numbers from lines
(count RESULTS) ; Count matching items

; Final answer
<<<FINAL>>>13000000<<<END>>>
```
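As an illustration of how `(sum RESULTS)` might auto-extract numbers, here is a hedged TypeScript sketch. `sumLines` is a hypothetical helper, and the real extraction rules may differ (for example around negative numbers or thousands separators).

```typescript
// Pull every numeric token out of each matched line and add them up,
// stripping comma thousands separators first.
function sumLines(lines: string[]): number {
  let total = 0;
  for (const line of lines) {
    const nums = line.replace(/,/g, "").match(/-?\d+(\.\d+)?/g) ?? [];
    for (const n of nums) total += parseFloat(n);
  }
  return total;
}

sumLines(["sale: 5,000,000", "sale: 8,000,000"]); // → 13000000
```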
The Lattice engine (src/logic/) processes Nucleus commands:
- Parser (lc-parser.ts) - Parses S-expressions into an AST
- Type inference (type-inference.ts) - Validates types before execution
- Constraint resolver (constraint-resolver.ts) - Handles symbolic constraints like [Σ⚡μ]
- Solver (lc-solver.ts) - Executes commands against the document

Lattice uses miniKanren (a relational programming engine) for pattern classification and filtering operations.
For large result sets, RLM uses a handle-based architecture with in-memory SQLite (src/persistence/) that achieves 97%+ token savings:
```
Traditional:   LLM sees full array  [15,000 tokens for 1000 results]
Handle-based:  LLM sees stub        [50 tokens: "$grep_error: Array(1000) [preview...]"]
```
How it works:
Query results are stored server-side under named handles ($grep_error, $bm25_timeout, $filter_status). Handle names are auto-generated from the Nucleus command: (grep "ERROR") produces $grep_error, (list_symbols "function") produces $list_symbols_function. Repeated commands get a numeric suffix ($grep_error_2, $grep_error_3).
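The naming scheme can be sketched as follows. `handleName` and the `taken` set are hypothetical, but the sketch reproduces the documented behavior: command head plus first argument, lowercased, with a numeric suffix on repeats.

```typescript
// Track which handle names are already in use within the session.
const taken = new Set<string>();

function handleName(command: string): string {
  // '(grep "ERROR")' → tokens ["grep", "ERROR"] → base "$grep_error"
  const parts = command.replace(/[()"]/g, "").trim().split(/\s+/);
  const base = "$" + parts.slice(0, 2).join("_").toLowerCase();
  let name = base;
  for (let i = 2; taken.has(name); i++) name = `${base}_${i}`;
  taken.add(name);
  return name;
}

handleName('(grep "ERROR")'); // → "$grep_error"
handleName('(grep "ERROR")'); // → "$grep_error_2"
```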
The Lattice engine doubles as a context memory for LLM agents. Instead of roundtripping large text blobs in every message, agents stash context server-side and carry only compact handle stubs:
```
Agent reads file, summarizes  → lattice_memo "auth architecture"
                              → $memo_auth_architecture: "auth architecture" (2.1KB, 50 lines)
20 messages later, needs it   → lattice_expand $memo_auth_architecture
                              → Full 50-line summary
```
Token math (30-message session, 3 source files stashed):
Memos persist across document loads (lattice_load clears query handles but keeps memos), support LRU eviction (100 memo cap, 10MB budget), and can be explicitly deleted when stale. No document needs to be loaded to use memos.
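A minimal sketch of the eviction policy described above, assuming recency order is kept by insertion order in a Map. `memoSet`/`memoGet` are illustrative names, and byte usage is approximated by string length.

```typescript
const MAX_MEMOS = 100;
const MAX_BYTES = 10 * 1024 * 1024; // 10MB budget

// Map iteration order doubles as LRU order: oldest entry first.
const memos = new Map<string, string>();
let totalBytes = 0;

function memoSet(label: string, text: string): void {
  if (memos.has(label)) {
    totalBytes -= memos.get(label)!.length;
    memos.delete(label); // re-insert to refresh recency
  }
  memos.set(label, text);
  totalBytes += text.length;
  // Evict least-recently-used entries until both budgets hold.
  while (memos.size > MAX_MEMOS || totalBytes > MAX_BYTES) {
    const oldest = memos.keys().next().value as string;
    totalBytes -= memos.get(oldest)!.length;
    memos.delete(oldest);
  }
}

function memoGet(label: string): string | undefined {
  const text = memos.get(label);
  if (text !== undefined) memoSet(label, text); // touch → most recent
  return text;
}
```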
The LLM does reasoning, not code generation:
The LLM never writes JavaScript. It outputs Nucleus commands that Lattice executes safely.
Install from npm:

```sh
npm install -g matryoshka-rlm
```

Or run without installing:

```sh
npx matryoshka-rlm "How many ERROR entries are there?" ./server.log
```
The package provides several CLI tools:
| Command | Description |
|---------|-------------|
| rlm | Main CLI for document analysis with LLM reasoning |
| rlm-mcp | MCP server with full RLM + LLM orchestration (analyze_document tool) |
| lattice-mcp | MCP server exposing direct Nucleus commands (no LLM required) |
| lattice-repl | Interactive REPL for Nucleus commands |
| lattice-http | HTTP server for Nucleus queries |
| lattice-pipe | Pipe adapter for programmatic access |
| lattice-setup | Setup script for Claude Code integration |
```sh
git clone https://github.com/yogthos/Matryoshka.git
cd Matryoshka
npm install
npm run build
```
Copy config.example.json to config.json and configure your LLM provider:
```json
{
  "llm": {
    "provider": "ollama"
  },
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434",
      "model": "qwen3-coder:30b",
      "options": { "temperature": 0.2, "num_ctx": 8192 }
    },
    "deepseek": {
      "baseUrl": "https://api.deepseek.com",
      "apiKey": "${DEEPSEEK_API_KEY}",
      "model": "deepseek-chat",
      "options": { "temperature": 0.2 }
    }
  },
  "rlm": {
    "maxTurns": 10
  }
}
```
```sh
# Basic usage
rlm "How many ERROR entries are there?" ./server.log

# With options
rlm "Count all ERROR entries" ./server.log --max-turns 15 --verbose

# See all options
rlm --help
```
RLM includes lattice-mcp, an MCP (Model Context Protocol) server for direct access to the Nucleus engine. This allows coding agents to analyze documents with 80%+ token savings compared to reading files directly.
The key advantage is handle-based results: query results are stored server-side in SQLite, and the agent receives compact stubs like $grep_error: Array(1000) [preview...] instead of full data. Handle names are derived from the command for easy identification. Operations chain server-side without roundtripping data.
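The stub format can be sketched as a one-line summary of a stored result. `stub` is a hypothetical helper and the exact preview format may differ from what lattice-mcp actually emits.

```typescript
// Render a compact stub for a server-side result: name, length, and a
// short preview of the first row instead of the full array.
function stub(name: string, rows: string[], previewLen = 40): string {
  const preview = rows.length > 0 ? rows[0].slice(0, previewLen) : "";
  return `${name}: Array(${rows.length}) ["${preview}"...]`;
}

stub("$grep_error", new Array(1000).fill("2024-01-01 ERROR timeout"));
// → '$grep_error: Array(1000) ["2024-01-01 ERROR timeout"...]'
```

A 1000-row result thus costs the agent one short line per turn, regardless of how large the underlying data is.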
| Tool | Description |
|------|-------------|
| lattice_load | Load a document for analysis |
| lattice_query | Execute Nucleus commands on the loaded document |
| lattice_expand | Expand a handle to see full data (with optional limit/offset) |
| lattice_memo | Store arbitrary context as a memo handle (no document required) |
| lattice_memo_delete | Delete a stale memo to free memory |
| lattice_close | Close the session and free memory |
| lattice_status | Get session status, document info, and memo usage |
| lattice_bindings | Show current variable bindings and memo labels |
| lattice_reset | Reset all bindings and memos but keep document loaded |
| lattice_llm_respond | Respond to a pending (llm_query ...) suspension |
| lattice_llm_batch_respond | Respond to a pending (llm_batch ...) suspension with all N responses |
| lattice_help | Get Nucleus command reference |
```json
{
  "mcp": {
    "lattice": {
      "type": "stdio",
      "command": "lattice-mcp"
    }
  }
}
```
```
1. lattice_load("/path/to/large-file.txt")      # Load document (use for >500 lines)
2. lattice_query('(grep "ERROR")')              # Search → $grep_error: Array(500) [preview]
3. lattice_query('(filter RESULTS ...)')        # Narrow → $filter_timeout: Array(50) [preview]
4. lattice_query('(count RESULTS)')             # Count without seeing data → 50
5. lattice_expand("$filter_timeout", limit=10)  # Expand only what you need to see
6. lattice_close()                              # Free memory when done
```
Token efficiency tips:
- Use lattice_expand with a limit to see only what you need
- Chain grep → filter → count/sum to refine progressively
- Reference RESULTS in queries (it always points to the last result)
- Pair handles ($grep_error) with lattice_expand to inspect specific results

Two primitive families power the paper's Ω(|P|²) semantic-horizon pattern:
Chunking — pre-slice a document that's too big to map over directly:
```lisp
(chunk_by_size 2000)      ; 2000-character slices
(chunk_by_lines 100)      ; 100-line slices
(chunk_by_regex "\\n\\n") ; Split on blank lines; capture groups ignored
```
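Under the assumption that these primitives slice the loaded document into string chunks, they could be sketched as follows. The function names mirror the Nucleus commands but are otherwise illustrative, and the regex sketch assumes a pattern without capture groups (JavaScript's split would otherwise interleave captured text into the result).

```typescript
// Fixed-size character slices.
function chunkBySize(doc: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < doc.length; i += size) chunks.push(doc.slice(i, i + size));
  return chunks;
}

// Fixed-size line slices, rejoined into chunk strings.
function chunkByLines(doc: string, n: number): string[] {
  const lines = doc.split("\n");
  const chunks: string[] = [];
  for (let i = 0; i < lines.length; i += n) chunks.push(lines.slice(i, i + n).join("\n"));
  return chunks;
}

// Split on a regex delimiter, dropping empty chunks.
function chunkByRegex(doc: string, pattern: string): string[] {
  return doc.split(new RegExp(pattern)).filter((c) => c.length > 0);
}
```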
Sub-LLM calls — (llm_query ...) invokes a sub-LLM with an
interpolated prompt. Works at the top level and nested inside
map / filter / reduce lambdas:
```lisp
(llm_query "Summarize this")
```