by ProjectDXAI
Autonomous multi-branch research lab. Branches compete for compute budget. The system converges on what works.
# Add to your Claude Code skills
git clone https://github.com/ProjectDXAI/labrat

labrat is a local-first runtime that puts Claude Code or Codex on a real research problem with a scoreboard and enough structure to run for hours. Population search, not single-thread: families of ideas compete for compute budget, and the ones that produce real signal earn more room to keep going.

Live example run: the baseline still leads on the main selection metric, while classifier_search has already won two decisive held-out challenges and earned extra funding.
labrat treats Claude Code and Codex as peer operator interfaces. Stronger reasoning models still help most on synthesis, audit, and consolidation, but the runtime contract and file layout stay the same across both.
Jump to → Run it in 5 minutes · Start from a profile · Create your own lab · Why it exists
In plain English:
This means a family can become strategically important even before it becomes the global selection champion. The dashboard now shows both the current champion and the current decisive-challenge leaders.
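The competition mechanic can be pictured with a minimal sketch. This is illustrative only: the function name, field names, and the weighting rule are assumptions for exposition, not labrat's actual selection policy.

```python
# Toy compute-budget reallocation among branch families.
# Illustrative only: the 0.5 decisive-win bonus and all names are
# assumptions, not labrat's actual policy.

def reallocate(budgets, signal, decisive_wins, total=100.0):
    """Give each family budget proportional to recent signal,
    with a bonus for decisive held-out challenge wins."""
    scores = {
        name: max(signal[name], 0.0) * (1.0 + 0.5 * decisive_wins.get(name, 0))
        for name in budgets
    }
    norm = sum(scores.values()) or 1.0
    return {name: total * s / norm for name, s in scores.items()}

budgets = {"baseline": 50.0, "classifier_search": 50.0}
signal = {"baseline": 0.8, "classifier_search": 0.6}
wins = {"classifier_search": 2}  # two decisive held-out wins, as in the live run
print(reallocate(budgets, signal, wins))
# → {'baseline': 40.0, 'classifier_search': 60.0}
```

Under this toy rule, classifier_search earns extra funding from its decisive wins even while the baseline leads on the raw selection metric, which matches the dynamic described above.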
labrat works best when the problem has:
Good examples:
If you only open three things first:
labrat is not a philosophy-of-science engine, but Lakatos is a useful mental model for the runtime.
Stay inside a family while it is still producing real signal. Escalate to audit or frame break when local repairs stop paying for themselves. In Lakatos's terms, a programme “is progressive if it is both theoretically and empirically progressive, and degenerating if it is not”. In labrat, that means a family should not only improve the known metric, but also win a decisive held-out test that rivals do not already own. More on that in docs/DEEP_RESEARCH.md.
Start with the flagship example:
python -m venv .venv
source .venv/bin/activate
pip install -e '.[nlp-sentiment]'
labrat doctor --lab-dir examples/nlp-sentiment/research_lab
labrat bootstrap --lab-dir examples/nlp-sentiment/research_lab
python -m http.server 8787 --directory examples/nlp-sentiment/research_lab
labrat status --lab-dir examples/nlp-sentiment/research_lab
labrat next-prompt --lab-dir examples/nlp-sentiment/research_lab --runner claude --phase auto
Use --runner codex for Codex.
The editable install is intentional: the labrat CLI keeps using the templates, profiles, and scripts from this checkout. If you prefer the original in-lab workflow, the copied scripts/*.py entrypoints still work unchanged inside each lab.
What you get from the example:
labrat does not depend on hidden local skills, private prompts, or machine-specific setup. The operator contract ships in the repo and in every generated lab:
- AGENTS.md for Codex, CLAUDE.md for Claude Code
- AGENTS.md, CLAUDE.md, .claude/commands/, and agent_prompts/
- labrat ... from the repo root or python scripts/... inside a lab

That means a user can clone the repo, open either Codex or Claude Code, and operate the lab from files that are already present in version control. There is no required SKILLS.md convention to make the repo work.
If you already know the shape of your research problem, a profile scaffolds a runnable lab in one command. No Phase 0 hand-editing, no LABRAT_PLACEHOLDER stubs.
labrat new ~/labs/my_search --profile=transformer-arch
cd ~/labs/my_search
python -m pip install -r requirements.txt
labrat doctor --lab-dir .
labrat check-readiness --lab-dir .
labrat bootstrap --lab-dir .
Every lab, whether profile-scaffolded or hand-built, ships both primary operator surfaces:
- AGENTS.md for Codex
- CLAUDE.md plus .claude/commands/ for Claude Code
- agent_prompts/ for the shared phase prompts consumed by either interface

The Claude Code slash commands are short markdown files that wrap common operator actions so you do not have to remember the CLI invocations:
- /next — print the prompt for the current phase and execute it.
- /why-stuck — diagnose a stalled frontier from state/frontier.json and recent evaluations.
- /synthesize — summarise the last ~10 evaluations before dispatching more work.
- /audit-candidate — walk the highest-signal suspicious candidate through the audit worker.
- /frame-break — propose a structural pivot once cheap probes and audits are exhausted.
- /consolidate — write a compact checkpoint note to logs/checkpoints/.

Open Claude Code in the lab directory and type /next, or hand-run python scripts/operator_helper.py next-prompt --runner claude --phase auto. In Codex, read AGENTS.md and run python scripts/operator_helper.py next-prompt --runner codex --phase auto.
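Since the slash commands are described as short markdown wrappers, one of them might look roughly like this. This is a hypothetical sketch of a file such as .claude/commands/next.md; the real contents ship with each lab.

```markdown
Run `python scripts/operator_helper.py next-prompt --runner claude --phase auto`
in this lab directory, then carry out the phase prompt it prints.
```

The point of the wrapper is only to save the operator from retyping the CLI invocation; the prompt text itself lives in agent_prompts/.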
- transformer-arch — tiny character-level transformer architecture search with held-out-distribution decisive challenges. Ships a synthetic runner so you can exercise the full runtime loop without a training framework; replace scripts/run_experiment.py with your own trainer when you want real training.

More profiles (world-model, multi-dataset) land in follow-up PRs. See docs/PROFILES.md for the profile contract and docs/LONG_HORIZON.md for interim-checkpoint and long-running-job conventions.
If no profile fits your problem, scaffold an empty lab and finish Phase 0 by hand. The default path is deep research first.
labrat new my_lab
cd my_lab
labrat doctor --lab-dir .
labrat next-prompt --lab-dir . --runner claude --phase design
labrat check-readiness --lab-dir .
labrat bootstrap --lab-dir .
python -m http.server 8787
labrat next-prompt --lab-dir . --runner claude --phase auto
Use --runner codex if you are operating from Codex instead of Claude Code.
Phase 0 must produce:
- branches.yaml
- dead_ends.md
- research_brief.md
- research_sources.md
- evaluation.yaml
- runtime.yaml

evaluation.yaml now includes at least one held-out prediction_tests challenge. That is how the runtime distinguishes “fit the known metric a bit better” from “this family actually predicted something hard.”
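To make the prediction_tests requirement concrete, a held-out challenge entry might look like the sketch below. Only the prediction_tests key is taken from this document; every other field name, path, and value is invented for illustration and will differ from the real evaluation.yaml contract.

```yaml
# Hypothetical evaluation.yaml fragment; field names other than
# prediction_tests are assumptions, not the real contract.
selection_metric: val_accuracy
prediction_tests:
  - name: held_out_hard_negatives
    description: >
      Decisive challenge: score on a held-out slice no family
      has trained or tuned on.
    dataset: data/heldout/hard_negatives.jsonl
    pass_threshold: 0.70
```

Whatever the exact schema, the intent is the one stated above: a family wins by predicting something hard and held out, not by shaving the known metric.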
checkpoints.jsonl contract, failure_class values, per-pool timeouts/loop cadence, stop criteria, cold-start recovery

labrat comes out of DXRG. We first used variants of this runtime internally to explore different financial world-model architectures and adjacent research workflows, then published the parts that generalized cleanly beyond that domain.
labrat is its own system, but the current shape is informed by a few clear predecessors and adjacent designs: