by LycheeMem
Compact, efficient, and extensible long-term memory for LLM agents.
```shell
# Add to your Claude Code skills
git clone https://github.com/LycheeMem/LycheeMem
```

LycheeMem is a compact memory framework for LLM agents. It starts from efficient conversational memory (structured organization, lightweight consolidation, and adaptive retrieval) and gradually extends toward action-aware, usage-aware memory for more capable agentic systems.
LycheeMem organizes memory into three complementary stores:
The working memory window holds the active conversation context for a session. It operates under a dual-threshold token budget: compression fires when either threshold is crossed.

Compression produces summary anchors (past context, distilled) plus raw recent turns (last N turns, verbatim). Both are passed downstream as the conversation history.
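The shape of this budget can be sketched in a few lines. Everything concrete below is an illustrative assumption: the threshold value, the number of verbatim turns kept, and the whitespace "tokenizer" are stand-ins, not LycheeMem's actual defaults.

```python
from dataclasses import dataclass, field

SOFT_LIMIT = 2_000   # hypothetical token budget that triggers compression
KEEP_RECENT = 4      # hypothetical N: last N turns kept verbatim

def count_tokens(text: str) -> int:
    # Stand-in tokenizer: whitespace word count.
    return len(text.split())

@dataclass
class WorkingMemory:
    turns: list = field(default_factory=list)
    summary_anchor: str = ""   # past context, distilled

    def append(self, turn: str, summarize) -> None:
        self.turns.append(turn)
        total = sum(count_tokens(t) for t in self.turns)
        if total > SOFT_LIMIT and len(self.turns) > KEEP_RECENT:
            # Distill older turns into the anchor; keep the tail verbatim.
            old, self.turns = self.turns[:-KEEP_RECENT], self.turns[-KEEP_RECENT:]
            self.summary_anchor = summarize(self.summary_anchor, old)

    def history(self) -> tuple:
        # Downstream stages receive (compressed past, verbatim recent turns).
        return self.summary_anchor, list(self.turns)
```

In the real system `summarize` would be an LLM call; here it is injected so the budget mechanics can be tested in isolation.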
Semantic memory is organised around typed, action-annotated MemoryRecords. The storage layer is SQLite (FTS5 full-text search) + LanceDB (vector index).
Each memory entry is stored as a MemoryRecord. The memory_type field distinguishes seven semantic categories:
| Type | Description |
|------|-------------|
| fact | Objective facts about the user, environment, or world |
| preference | User preferences (style, habits, likes/dislikes) |
| event | Specific events that have occurred |
| constraint | Conditions that must be respected |
| procedure | Reusable step-by-step procedures / methods |
| failure_pattern | Previously failed action paths and their causes |
| tool_affordance | Capabilities and applicable scenarios of tools/APIs |
Beyond text, every MemoryRecord carries action-facing metadata (tool_tags, constraint_tags, failure_tags, affordance_tags) and usage statistics (retrieval_count, action_success_count, etc.) to seed future reinforcement-learning signals.
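A minimal sketch of such a record, using the field names documented above; the defaults, types, and `created_at` field are illustrative assumptions rather than the project's actual schema:

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    # Field names follow the docs above; defaults are illustrative.
    normalized_text: str
    semantic_text: str = ""
    memory_type: str = "fact"          # one of the seven categories
    tool_tags: list = field(default_factory=list)
    constraint_tags: list = field(default_factory=list)
    failure_tags: list = field(default_factory=list)
    affordance_tags: list = field(default_factory=list)
    retrieval_count: int = 0           # usage statistics, seeding future RL signals
    action_success_count: int = 0
    created_at: float = field(default_factory=time.time)
```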
Related MemoryRecords can be fused online by the Record Fusion Engine into a denser CompositeRecord; composite entries are ranked above source fragments during retrieval.
A single-pass pipeline that converts conversation turns into a list of MemoryRecords:
- The extractor assigns `memory_type`, `tool_tags`, `constraint_tags`, `failure_tags`, `affordance_tags`, and other structured labels.
- `record_id = SHA256(normalized_text)`: naturally idempotent; duplicate content is deduplicated automatically.
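The idempotency property follows directly from content-addressing. A minimal sketch (the in-memory `store` dict stands in for SQLite/LanceDB):

```python
import hashlib

def record_id(normalized_text: str) -> str:
    # Content-addressed key: identical normalized text always maps to the same ID.
    return hashlib.sha256(normalized_text.encode("utf-8")).hexdigest()

store: dict = {}

def upsert(normalized_text: str, payload: dict) -> str:
    # Writing the same content twice overwrites the same key: natural dedup.
    rid = record_id(normalized_text)
    store[rid] = payload
    return rid
```

Re-running the pipeline over the same conversation therefore cannot create duplicate records.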
Triggered online after each consolidation:
- Whether related records should be fused is gated by an LLM judge (`synthesis_judge`).
- The resulting `CompositeRecord` is written to both SQLite and LanceDB; original records are retained.

Before retrieval, `ActionAwareRetrievalPlanner` analyses the user query and emits a `SearchPlan`:
- `mode`: `answer` (factual Q&A) / `action` (needs execution) / `mixed`
- `semantic_queries`: content-facing search terms
- `pragmatic_queries`: action/tool/constraint-facing search terms
- `tool_hints`: tools likely needed for this request
- `required_constraints`: constraints that are missing
- `missing_slots`: parameters / slots that are absent

The plan drives five-channel recall:
- FTS full-text search over `MemoryRecord` + `CompositeRecord`
- Semantic vector search over `semantic_text` embeddings
- Normalized vector search over `normalized_text` embeddings (for pragmatic queries)
- Tag filtering on `tool_hints` / `constraint_tags`
- Temporal filtering within the `SearchPlan.temporal_filter` time window

Candidates from all channels are de-duplicated and ranked by `MemoryScorer` using a weighted linear combination:
$$\text{Score} = \alpha \cdot S_\text{sem} + \beta \cdot S_\text{action} + \gamma \cdot S_\text{temporal} + \delta \cdot S_\text{recency} + \eta \cdot S_\text{evidence} - \lambda \cdot C_\text{token}$$
| Weight | Meaning | Default |
|--------|---------|---------|
| α | SemanticRelevance (vector distance -> similarity) | 0.30 |
| β | ActionUtility (tag match score, mode-aware) | 0.25 |
| γ | TemporalFit (temporal reference match) | 0.15 |
| δ | Recency (memory freshness) | 0.10 |
| η | EvidenceDensity (evidence span density) | 0.10 |
| λ | TokenCost penalty (text length penalty) | 0.10 |
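The formula translates directly into code. A minimal sketch with the default weights from the table (the per-dimension scores are assumed to be precomputed, normalized inputs):

```python
# Default weights from the table above.
WEIGHTS = {"alpha": 0.30, "beta": 0.25, "gamma": 0.15,
           "delta": 0.10, "eta": 0.10, "lam": 0.10}

def memory_score(s_sem, s_action, s_temporal, s_recency, s_evidence,
                 c_token, w=WEIGHTS):
    # Weighted linear combination; token cost enters as a penalty.
    return (w["alpha"] * s_sem
            + w["beta"] * s_action
            + w["gamma"] * s_temporal
            + w["delta"] * s_recency
            + w["eta"] * s_evidence
            - w["lam"] * c_token)
```

A candidate that is perfect on every dimension and costs no tokens scores 0.90 (the weights sum to 0.90 before the penalty), so the penalty term can meaningfully reorder long, marginally relevant records below short, dense ones.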
The skill store preserves reusable how-to knowledge as structured skill entries, each carrying:
- `doc_markdown`: a full Markdown document describing the procedure, commands, parameters, and caveats.

Skill retrieval uses HyDE (Hypothetical Document Embeddings): the query is first expanded into a hypothetical ideal answer by the LLM, then that draft text is embedded to produce a query vector that matches well against stored procedure descriptions, even when the user's original phrasing is vague.
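The HyDE flow can be sketched as three steps: draft, embed, rank. The `llm` and `embed` callables and the cosine ranking below are illustrative stand-ins, not LycheeMem's actual interfaces:

```python
def hyde_retrieve(query, llm, embed, skill_vectors, top_k=3):
    # 1. Expand the query into a hypothetical ideal answer (an LLM call in practice).
    draft = llm(f"Write a short how-to document answering: {query}")
    # 2. Embed the draft, not the raw query.
    qvec = embed(draft)

    # 3. Rank stored skill vectors by cosine similarity to the draft vector.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    ranked = sorted(skill_vectors.items(),
                    key=lambda kv: cos(qvec, kv[1]), reverse=True)
    return [skill_id for skill_id, _ in ranked[:top_k]]
```

Because the draft is written in the same register as the stored `doc_markdown` entries (full procedural prose rather than a terse question), its embedding lands closer to the right skill even when the user's phrasing is vague.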
Every request passes through a fixed sequence of five agents. Four are synchronous stages in the LangGraph pipeline; one is a background post-processing task.
Rule-based agent (no LLM prompt). Appends the user turn to the session log, counts tokens, and fires compression if either threshold is crossed. Produces compressed_history and raw_recent_turns for downstream stages.
ActionAwareRetrievalPlanner first analyses the user query and produces a SearchPlan containing mode, semantic_queries, pragmatic_queries, tool_hints, and more. Five parallel recall channels (FTS full-text, semantic vector, normalised vector, tag filter, temporal filter) then query SQLite + LanceDB, and the resulting candidates are ranked by the six-dimensional Scorer formula before being merged into background_context. Skill sub-queries use HyDE embeddings.