by Arkya-AI
A complete operating system for Claude that prevents context loss and makes multi-session work reliable
# Add to your Claude Code skills
git clone https://github.com/Arkya-AI/claude-context-osNew CLAUDE.md that solves the compaction/context loss problem and mimics persistent memory through effective session handoff in token efficient manner
v3 got 150+ stars. Then I read 9 peer-reviewed papers on how LLMs actually process system prompts. Turns out v3 was working against itself.
The research said three things that broke my assumptions:
LLMs track 5-10 rules before performance degrades. v3 had 15+. Some rules were silently dropped every session. (arXiv:2409.10715)
Explanatory prose actively interferes with instruction compliance. The "Why It Matters" sections in v3 weren't neutral — they competed with the actual rules for Claude's attention. Topically related filler is worse than random noise. (Chroma Context Rot, 2025)
Claude 4.6 overtriggers on emphatic language. "CRITICAL", "NEVER", "YOU MUST" worked for Claude 3.x. The newer models are more responsive to system prompts and now overcorrect when they see aggressive language. (Anthropic Claude 4 Best Practices)
The result: v4 is 47 lines and 7 rules. Down from 327 lines and 15+ rules. Every design choice maps to a specific research finding.
| | v3 | v4 | |---|---|---| | Lines | 327 | 47 | | Rules | 15+ | 7 | | Explanatory prose | Heavy | Zero | | Language style | "CRITICAL / NEVER / MUST" | Plain imperatives | | Session startup | Read all summaries | Read only what handoff references | | Reference content | Embedded in CLAUDE.md | Moved to docs/context/ (loaded on demand) | | Templates | Same | Same + task template + handoff index |
No comments yet. Be the first to share your thoughts!
Claude's default summarization loses five specific things:
The fix isn't better summarization — it's structured templates with explicit fields that mechanically prevent these five failure modes.
CLAUDE.md (47 lines, ~1,200 tokens) — the operating system
templates/
claude-templates.md (on-demand only) — 6 templates, 3 subagent contracts,
3 workflow phases, task definition format
docs/
context/ (on-demand) — reference files loaded by work type
processing-protocol.md — how to handle multiple documents
archive-rules.md — summary lifecycle and file management
project-structure.md — full directory tree + scaffold commands
subagent-rules.md — when to use subagents vs. main agent
client-briefs/ — per-client context (you populate these)
.claude/commands/
handoff.md /handoff — end a session cleanly
process-doc.md /process-doc — summarize an input document
status.md /status — check project state
examples/
software-dev-project/ worked example with filled-in templates
CLAUDE-v3.md previous version (preserved for reference)
Research on transformer working memory (arXiv:2409.10715) shows LLMs can actively track 5-10 constraints before performance degrades to near-random. With 15 rules, Claude silently drops the ones it considers least relevant. With 7, compliance per rule goes up significantly.
The rules that were cut didn't disappear — they moved to docs/context/ where Claude loads them only when the task requires it. This matches Anthropic's own recommendation for "just-in-time context".
v3 had sections like "Structured Summarization: Why It Matters" and "Cost of NOT Splitting" — prose that explained why each rule existed.
The Chroma Context Rot study (July 2025, 18 LLMs tested) found that coherent filler text that's topically related to the target task interferes MORE than random noise. Explanatory prose about context management rules actively competes with the context management rules themselves.
v4 has rules only. No justification, no persuasion, no theory. The rationale lives here in the README — for humans, not for Claude.
Anthropic's Claude 4 best practices note that Claude Opus 4.5/4.6 is "significantly more proactive" and "more responsive to the system prompt." Instructions that needed emphasis for Claude 3.x now cause overtriggering in Claude 4.6.
v4 uses calm imperatives: "Do not mix client contexts" instead of "NEVER mix client contexts. This is CRITICAL."
The Lost in the Middle paper (Liu et al., TACL 2024) showed LLM performance follows a U-shaped curve — highest attention at the beginning and end of context, lowest in the middle. Anthropic's own testing shows queries placed at the end of prompts improve response quality by up to 30%.
The verification checklist is deliberately placed at the end of CLAUDE.md so it's the last instruction Claude processes before starting work.
CLAUDE.md now tells Claude which context files to load based on the type of work:
- Proposals/sales → client-briefs/[client].md + sales-frameworks.md
- Content/thought leadership → content-strategy.md
- Workshops → workshop-design.md + client-briefs/[client].md
- Agent development → use codebase directly
No more loading everything. Claude loads what the task needs.
Inspired by GSD's atomic task structure:
## Task: [name]
Context files: [which files to load]
Action: [what to produce]
Verify: [how to check it's correct]
Done: [what files exist when complete]
Inspired by claude-mem's progressive disclosure pattern. Every handoff now includes a "Files to Load Next Session" field — an explicit manifest that tells the next session exactly what to read, instead of loading all summaries.
CLAUDE.md to your project roottemplates/ and docs/context/ directories.claude/commands/ for slash commandsdocs/context/ files for your domain (sales frameworks, client briefs, etc.)No dependencies. No install.
Anyone doing multi-session work with Claude — consultants, developers, researchers, sales professionals, no-code builders. If your projects span days or weeks, not minutes.
v4 was specifically designed to work for non-developers too — consultants, sales professionals, researchers, and no-code builders using Claude Code for business tasks like proposals, competitive analysis, content strategy, and workshop design.
| Design Decision | Research | |---|---| | 7 rules max | Working Memory Limits — LLMs track 5-10 constraints | | No explanatory prose | Context Rot — coherent filler interferes | | Rules are 1-3 lines each | Hidden in the Haystack — too-short instructions harder to find | | Verification at end | Lost in the Middle — U-shaped attention curve | | Plain imperatives | Claude 4 Best Practices — overtriggering | | "State your understanding" step | Anthropic + EMNLP 2025 — recite-before-reasoning | | Just-in-time context loading | Anthropic Context Engineering | | Session splitting as architecture | NoLiMa — 32K token performance cliff |
Additional references:
If you're using v3:
templates/claude-templates.md is mostly compatible — v4 adds Template 6 (Task Definition) and a "Files to Load Next Session" field to the handoff templatedocs/context/ directory — this is where v3's displaced content now livesCLAUDE.md with v4docs/summaries/