by wanshuiyin
ARIS ⚔️ (Auto-Research-In-Sleep) — Claude Code skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation via Codex MCP
```shell
# Add to your Claude Code skills
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep
```

🌙 Let Claude Code do research while you sleep. Wake up to find your paper scored, weaknesses identified, experiments run, and narrative rewritten — autonomously.
Custom Claude Code skills for autonomous ML research workflows. These skills orchestrate cross-model collaboration — Claude Code drives the research while an external LLM (via Codex MCP) acts as a critical reviewer. 🔀 Also supports alternative model combinations (e.g., GLM + GPT, GLM + MiniMax) — no Claude API required.
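As a sketch of the cross-model wiring, registering Codex as an MCP server for Claude Code might look like this (assumes a recent Codex CLI that is installed and logged in; the exact server subcommand may differ by Codex version — check `codex --help`):

```shell
# Register the Codex CLI as an MCP server named "codex" for Claude Code
# (the "codex mcp serve" subcommand is an assumption; verify against your Codex CLI)
claude mcp add codex -- codex mcp serve

# Confirm the server is registered
claude mcp list
```

Once registered, Claude Code can call out to Codex for review passes while keeping execution on its own side.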
💭 Why not self-play with a single model? Using Claude Code subagents or agent teams for both execution and review is technically possible, but tends to fall into local minima — the same model reviewing its own patterns creates blind spots. Claude Code's strength is fast, fluid execution; Codex (GPT-5.4 xhigh) is slower but more deliberate and rigorous in critique. These complementary styles — speed × rigor — produce better outcomes than either model talking to itself.
A real overnight 4-round run on an ML research project, from borderline reject to submission-ready:
| Round | Score | What Happened |
|-------|-------|---------------|
| Initial | 5.0/10 | Borderline reject |
| Round 1 | 6.5/10 | Added standard metrics, discovered metric decoupling |
| Round 2 | 6.8/10 | Key claim failed to reproduce, pivoted narrative |
| Round 3 | 7.0/10 | Large seed study killed main improvement claim |
| Round 4 | 7.5/10 ✅ | Diagnostic evidence solidified, submission ready |
The loop autonomously ran 20+ GPU experiments, rewrote the paper's narrative framing, and killed claims that didn't hold up — all without human intervention.
Don't have a concrete idea yet? Just give a research direction — /idea-creator handles the rest:
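For instance, inside a Claude Code session the kickoff can be a single slash command (the research direction below is a made-up placeholder — substitute your own):

```
/idea-creator "sample-efficient RLHF for small language models"
```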
The output is a ranked IDEA_REPORT.md with hypotheses, pilot results, anticipated reviewer objections, and a suggested execution order. Failed ideas are documented too, so future runs don't re-explore the same dead ends.
These skills compose into a full research lifecycle. The two workflows can be used independently or chained together.
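A chained overnight session might look like the sketch below. `/idea-creator` is from this repo; the review-loop command name is purely illustrative — substitute the actual skill name from your installation:

```
/idea-creator "long-context retrieval"
/research-review-loop IDEA_REPORT.md
```

The first command discovers and ranks ideas; the second (hypothetical name) would hand the top-ranked idea to the cross-model review loop for overnight iteration.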