An autonomous AI agent that runs your deep learning experiments 24/7 while you sleep. Zero-cost monitoring, Leader-Worker architecture, constant-size memory.
# Add to your Claude Code skills
git clone https://github.com/Xiangyue-Zhang/auto-deep-researcher-24x7
workspace/progress_tracking/.

If you only want the shortest path to a working experiment loop, do this:
1. Create `PROJECT_BRIEF.md`
2. Run `/auto-experiment --project /path/to/project --gpu 0`
3. Check `/experiment-status`, or optional Obsidian/local text notes

Prefer AI-guided setup? Open AI_GUIDE.md in Claude / ChatGPT / Codex and let the assistant walk you through it.
| Requirement | Required | Notes |
|-------------|----------|-------|
| Python 3.10+ | Yes | Runtime |
| 1+ NVIDIA GPU | Yes | For training |
| API key | Yes | Anthropic or OpenAI |
| PROJECT_BRIEF.md | Yes | Main control file |
| Project config.yaml | Optional | Only if you want to override defaults |
| Obsidian vault | Optional | If absent, notes fall back to local text files |
The smallest project you can launch looks like this:
my-first-experiment/
├── PROJECT_BRIEF.md
└── workspace/          # auto-created
Minimal PROJECT_BRIEF.md:
# Goal
Train a ResNet-50 on CIFAR-100 to reach 80%+ accuracy.
# Codebase
Create the training code from scratch in PyTorch.
# What to Try
- Start with a basic ResNet-50 baseline.
- If accuracy < 75%, improve optimization and schedule.
- If accuracy is 75-80%, try augmentation.
- If accuracy > 80%, stop and report.
# Constraints
- Use GPU 0 only
- Max 100 epochs per run
That is enough to start. Everything else is optional refinement.
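To make the brief's structure concrete, here is a minimal sketch of how an agent loop might read a `PROJECT_BRIEF.md` like the one above. The section names (`Goal`, `Constraints`, ...) come from the example brief; the `parse_brief` helper itself is purely illustrative, not part of the tool's real API.

```python
def parse_brief(text: str) -> dict:
    """Split a markdown brief into {heading: body} pairs, one per `# ` line."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {k: "\n".join(v).strip() for k, v in sections.items()}

sample = """# Goal
Train a ResNet-50 on CIFAR-100 to reach 80%+ accuracy.
# Constraints
- Use GPU 0 only
- Max 100 epochs per run"""

brief = parse_brief(sample)
print(brief["Goal"])  # Train a ResNet-50 on CIFAR-100 to reach 80%+ accuracy.
```

Because each `# ` heading becomes a key, adding a new section to the brief never breaks older logic that only reads `Goal` and `Constraints`.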
This project is for people who already know what experiment they want to run, but do not want to babysit the loop.
It is not trying to replace the researcher. It is trying to take over the repetitive experiment-ops layer.
You control the research direction through three files:
- `PROJECT_BRIEF.md`: stable goal, constraints, allowed search space
- `HUMAN_DIRECTIVE.md`: temporary redirect for the next cycle
- `workspace/MEMORY_LOG.md`: rolling memory of results and decisions

Common control patterns:
# Keep the search narrow
- Only tune augmentation.
- Do not change the backbone.
- Keep training budget fixed.
# Make the agent stop exploring a weak direction
- If gain stays below 0.3 points for 3 runs, stop this branch.
- Return to the last trusted baseline and try a different idea.
# Force result verification
- If a result looks unusually strong, rerun with the same seed and one new seed.
- Do not claim improvement until both reproduce.
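The "stop a weak direction" directive above is easy to encode as a plain stopping rule. This is an illustrative sketch only: the 0.3-point and 3-run thresholds come from the example directive, and `should_stop_branch` is a hypothetical helper, not the tool's real API.

```python
def should_stop_branch(gains, min_gain=0.3, patience=3):
    """Stop if the last `patience` runs all gained less than `min_gain` points."""
    if len(gains) < patience:
        return False  # not enough evidence yet
    return all(g < min_gain for g in gains[-patience:])

# Last three gains (0.2, 0.1, 0.25) are all below 0.3 -> abandon branch
print(should_stop_branch([0.5, 0.2, 0.1, 0.25]))  # True
# Most recent run gained 0.6 points -> keep exploring
print(should_stop_branch([0.2, 0.1, 0.6]))        # False
```

Keeping the rule this explicit means the agent's "give up" decisions are auditable from the memory log rather than buried in an LLM judgment call.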
You should never have to guess what the agent is doing.
- `/experiment-status` shows current goal, best result, cycle count, running status, and recent decisions
- `/progress-report` generates a structured summary
- `/obsidian-sync` refreshes persistent notes manually
- `workspace/progress_tracking/` stores local text notes when no Obsidian vault is configured

If you want a dashboard outside the terminal:
obsidian:
enabled: true
vault_path: "~/Documents/MyObsidianVault" # Optional
auto_append_daily: true
If vault_path is empty, the same information is saved locally:
workspace/progress_tracking/Dashboard.txt
workspace/progress_tracking/Daily/YYYY-MM-DD.txt
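The local fallback above can be sketched in a few lines. The file layout (`Dashboard.txt`, `Daily/YYYY-MM-DD.txt`) mirrors the paths shown; the `write_note` helper is hypothetical and only illustrates the append-daily / overwrite-dashboard pattern.

```python
from datetime import date
from pathlib import Path

def write_note(root: Path, summary: str) -> Path:
    daily_dir = root / "progress_tracking" / "Daily"
    daily_dir.mkdir(parents=True, exist_ok=True)
    daily = daily_dir / f"{date.today():%Y-%m-%d}.txt"
    with daily.open("a") as f:          # daily notes accumulate
        f.write(summary + "\n")
    # Dashboard.txt always holds only the latest snapshot
    (root / "progress_tracking" / "Dashboard.txt").write_text(summary + "\n")
    return daily

note = write_note(Path("workspace"), "cycle 12: best acc 79.4%")
print(note.name)  # today's date, e.g. 2026-04-08.txt
```

Appending to the daily file while overwriting the dashboard keeps a full history without the dashboard growing unboundedly.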
Our hope is simple: science stays pure, and the human stays in the loop.
We built this framework for one reason: to take the repetitive, mechanical parts of running deep learning experiments off the researcher's plate (launching jobs, watching GPUs, parsing logs, sweeping hyperparameters) so that more of your time can go into the part that actually matters: thinking.
If you're here because you want to spend less time babysitting training runs and more time reading, reasoning, and chasing your own ideas, welcome. That's exactly who we built this for.
A gentle thought we'd love every user to share with us:
The agent is happy to run the experiments. But please let the ideas, the interpretation, and the scientific judgment remain yours. We don't see automation and academic integrity as being in tension; quite the opposite. The hours this tool gives back are meant to be reinvested in deeper thinking, not in skipping it.
So we'd kindly ask that this project not be used to fabricate results, to generate "research" with no human in the loop, or to shortcut the parts of science that depend on a human actually understanding what they're doing. That isn't the future we want to help build, and we don't think it's the one most of you want either.
Science should stay pure. The agent can run the experiments โ but the ideas, the interpretation, and the responsibility belong to the human.
Our sincere hope is that every user stays human-in-the-loop and keeps thinking, reinvesting the time this tool saves into the research directions that are truly your own.
We trust the people who pick up this tool to take that seriously, and we built it because we believe most of you already do. Thank you for being one of them.
You design the experiment. The agent handles the repetitive loop.
Deep Researcher Agent:
You sleep 8 hours → Agent runs 3 experiment cycles
You go on vacation → Agent explores 50+ hyperparameter configs
You write your paper → Agent already has the results table ready
Not benchmarks. Real results from months of 24/7 autonomous operation across research projects.
| Metric | Result |
|--------|--------|
| Autonomous experiment cycles completed | 500+ |
| Best single-project improvement | 52% over baseline (across 200+ auto-run experiments) |
| Concurrent projects managed | 4 projects across 4 GPU servers |
| Longest continuous autonomous operation | 30+ days without human intervention |
| Average LLM cost per 24h cycle | ~$0.08 |
The #1 concern with running LLM agents 24/7: cost.
Most agent frameworks call the LLM every few minutes to "check progress". That's $50+/day.
Experiment Agent sleeps during training: zero API calls. It only wakes the LLM when training finishes.
 LLM Active         Zero Cost            LLM Active
┌────────────┐   ┌─────────────────┐   ┌────────────┐
│ THINK      │   │ TRAIN & MONITOR │   │ REFLECT    │
│ (5-10 min) │   │ (hours/days)    │   │ (5-10 min) │
│            │   │                 │   │            │
│ • Analyze  │   │ • kill -0 $PID  │   │ • Parse    │
│ • Plan     │   │ • nvidia-smi    │   │   logs     │
│ • Code     │   │ •
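The zero-cost middle phase of the diagram above boils down to polling the training process cheaply and only handing control back to the LLM once it exits. A minimal sketch, using the same `kill -0` trick; the `wake_llm` callback is a placeholder, not the tool's real interface.

```python
import os
import time

def check_alive(pid: int) -> bool:
    """kill -0: signal 0 probes whether the process exists without touching it."""
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False

def monitor(pid: int, wake_llm, poll_seconds: int = 60):
    while check_alive(pid):       # this loop makes zero API calls
        time.sleep(poll_seconds)  # sleep; optionally snapshot nvidia-smi here
    wake_llm()                    # training finished: enter the REFLECT phase

# Demo: probe our own PID, which is certainly alive.
print(check_alive(os.getpid()))  # True
```

Since the loop costs only a syscall per poll, the interval can be short without any LLM spend; the $0.08/day figure comes entirely from the THINK and REFLECT phases.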