An autonomous AI agent that runs your deep learning experiments 24/7 while you sleep. Zero-cost monitoring, Leader-Worker architecture, constant-size memory.
# Add to your Claude Code skills
git clone https://github.com/Xiangyue-Zhang/auto-deep-researcher-24x7
workspace/progress_tracking/.

If you only want the shortest path to a working experiment loop, do this:
1. Create `PROJECT_BRIEF.md`
2. Run `/auto-experiment --project /path/to/project --gpu 0`
3. Check `/experiment-status`, or optional Obsidian/local text notes

Prefer AI-guided setup? Open AI_GUIDE.md in Claude / ChatGPT / Codex and let the assistant walk you through it.
| Requirement | Required | Notes |
|-------------|----------|-------|
| Python 3.10+ | Yes | Runtime |
| 1+ NVIDIA GPU | Yes | For training |
| API key | Yes | Anthropic or OpenAI |
| PROJECT_BRIEF.md | Yes | Main control file |
| Project config.yaml | Optional | Only if you want to override defaults |
| Obsidian vault | Optional | If absent, notes fall back to local text files |
The smallest project you can launch looks like this:
my-first-experiment/
├── PROJECT_BRIEF.md
└── workspace/          # auto-created
Minimal PROJECT_BRIEF.md:
# Goal
Train a ResNet-50 on CIFAR-100 to reach 80%+ accuracy.
# Codebase
Create the training code from scratch in PyTorch.
# What to Try
- Start with a basic ResNet-50 baseline.
- If accuracy < 75%, improve optimization and schedule.
- If accuracy is 75-80%, try augmentation.
- If accuracy > 80%, stop and report.
# Constraints
- Use GPU 0 only
- Max 100 epochs per run
That is enough to start. Everything else is optional refinement.
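To make the brief's structure concrete, here is a minimal sketch of how an agent loop might read a `PROJECT_BRIEF.md` like the one above. The section names (`Goal`, `Constraints`, ...) come from the example brief; the `parse_brief` helper itself is purely illustrative, not part of the tool's real API.

```python
def parse_brief(text: str) -> dict:
    """Split a markdown brief into {heading: body} pairs, one per `# ` line."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {k: "\n".join(v).strip() for k, v in sections.items()}

sample = """# Goal
Train a ResNet-50 on CIFAR-100 to reach 80%+ accuracy.
# Constraints
- Use GPU 0 only
- Max 100 epochs per run"""

brief = parse_brief(sample)
print(brief["Goal"])  # Train a ResNet-50 on CIFAR-100 to reach 80%+ accuracy.
```

Because each `# ` heading becomes a key, adding a new section to the brief never breaks older logic that only reads `Goal` and `Constraints`.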
This project is for people who already know what experiment they want to run, but do not want to babysit the loop.
It is not trying to replace the researcher. It is trying to take over the repetitive experiment-ops layer.
You control the research direction through three files:
- `PROJECT_BRIEF.md`: stable goal, constraints, allowed search space
- `HUMAN_DIRECTIVE.md`: temporary redirect for the next cycle
- `workspace/MEMORY_LOG.md`: rolling memory of results and decisions

Common control patterns:
# Keep the search narrow
- Only tune augmentation.
- Do not change the backbone.
- Keep training budget fixed.
# Make the agent stop exploring a weak direction
- If gain stays below 0.3 points for 3 runs, stop this branch.
- Return to the last trusted baseline and try a different idea.
# Force result verification
- If a result looks unusually strong, rerun with the same seed and one new seed.
- Do not claim improvement until both reproduce.
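The "stop a weak direction" directive above is easy to encode as a plain stopping rule. This is an illustrative sketch only: the 0.3-point and 3-run thresholds come from the example directive, and `should_stop_branch` is a hypothetical helper, not the tool's real API.

```python
def should_stop_branch(gains, min_gain=0.3, patience=3):
    """Stop if the last `patience` runs all gained less than `min_gain` points."""
    if len(gains) < patience:
        return False  # not enough evidence yet
    return all(g < min_gain for g in gains[-patience:])

# Last three gains (0.2, 0.1, 0.25) are all below 0.3 -> abandon branch
print(should_stop_branch([0.5, 0.2, 0.1, 0.25]))  # True
# Most recent run gained 0.6 points -> keep exploring
print(should_stop_branch([0.2, 0.1, 0.6]))        # False
```

Keeping the rule this explicit means the agent's "give up" decisions are auditable from the memory log rather than buried in an LLM judgment call.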
You should never have to guess what the agent is doing.
- `/experiment-status` shows current goal, best result, cycle count, running status, and recent decisions
- `/progress-report` generates a structured summary
- `/obsidian-sync` refreshes persistent notes manually
- `workspace/progress_tracking/` stores local text notes when no Obsidian vault is configured

If you want a dashboard outside the terminal:
obsidian:
enabled: true
vault_path: "~/Documents/MyObsidianVault" # Optional
auto_append_daily: true
If vault_path is empty, the same information is saved locally:
workspace/progress_tracking/Dashboard.txt
workspace/progress_tracking/Daily/YYYY-MM-DD.txt
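The local fallback above can be sketched in a few lines. The file layout (`Dashboard.txt`, `Daily/YYYY-MM-DD.txt`) mirrors the paths shown; the `write_note` helper is hypothetical and only illustrates the append-daily / overwrite-dashboard pattern.

```python
from datetime import date
from pathlib import Path

def write_note(root: Path, summary: str) -> Path:
    daily_dir = root / "progress_tracking" / "Daily"
    daily_dir.mkdir(parents=True, exist_ok=True)
    daily = daily_dir / f"{date.today():%Y-%m-%d}.txt"
    with daily.open("a") as f:          # daily notes accumulate
        f.write(summary + "\n")
    # Dashboard.txt always holds only the latest snapshot
    (root / "progress_tracking" / "Dashboard.txt").write_text(summary + "\n")
    return daily

note = write_note(Path("workspace"), "cycle 12: best acc 79.4%")
print(note.name)  # today's date, e.g. 2026-04-08.txt
```

Appending to the daily file while overwriting the dashboard keeps a full history without the dashboard growing unboundedly.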
Our hope is simple: science stays pure, and the human stays in the loop.
We built this framework for one reason: to take the repetitive, mechanical parts of running deep learning experiments off the researcher's plate (launching jobs, watching GPUs, parsing logs, sweeping hyperparameters) so that more of your time can go into the part that actually matters: thinking.
If you're here because you want to spend less time babysitting training runs and more time reading, reasoning, and chasing your own ideas, welcome. That's exactly who we built this for.
A gentle thought we'd love every user to share with us:
The agent is happy to run the experiments. But please let the ideas, the interpretation, and the scientific judgment remain yours. We don't see automation and academic integrity as being in tension; quite the opposite. The hours this tool gives back are meant to be reinvested in deeper thinking, not in skipping it.
So we'd kindly ask that this project not be used to fabricate results, to generate "research" with no human in the loop, or to shortcut the parts of science that depend on a human actually understanding what they're doing. That isn't the future we want to help build, and we don't think it's the one most of you want either.
Science should stay pure. The agent can run the experiments โ but the ideas, the interpretation, and the responsibility belong to the human.
Our sincere hope is that every user stays human-in-the-loop and keeps thinking, reinvesting the time this tool saves into the research directions that are truly your own.
We trust the people who pick up this tool to take that seriously, and we built it because we believe most of you already do. Thank you for being one of them.
You design the experiment. The agent handles the repetitive loop.
Deep Researcher Agent:
You sleep 8 hours → Agent runs 3 experiment cycles
You go on vacation → Agent explores 50+ hyperparameter configs
You write your paper → Agent already has the results table ready
Not benchmarks. Real results from months of 24/7 autonomous operation across research projects.
| Metric | Result |
|--------|--------|
| Autonomous experiment cycles completed | 500+ |
| Best single-project improvement | 52% over baseline (across 200+ auto-run experiments) |
| Concurrent projects managed | 4 projects across 4 GPU servers |
| Longest continuous autonomous operation | 30+ days without human intervention |
| Average LLM cost per 24h cycle | ~$0.08 |
The #1 concern with running LLM agents 24/7: cost.
Most agent frameworks call the LLM every few minutes to "check progress". That's $50+/day.
Experiment Agent sleeps during training: zero API calls. It only wakes the LLM when training finishes.
 LLM Active         Zero Cost            LLM Active
┌────────────┐   ┌─────────────────┐   ┌────────────┐
│ THINK      │   │ TRAIN & MONITOR │   │ REFLECT    │
│ (5-10 min) │   │ (hours/days)    │   │ (5-10 min) │
│            │   │                 │   │            │
│ • Analyze  │   │ • kill -0 $PID  │   │ • Parse    │
│ • Plan     │   │ • nvidia-smi    │   │   logs     │
│ • Code     │   │ •
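The zero-cost middle phase of the diagram above boils down to polling the training process cheaply and only handing control back to the LLM once it exits. A minimal sketch, using the same `kill -0` trick; the `wake_llm` callback is a placeholder, not the tool's real interface.

```python
import os
import time

def check_alive(pid: int) -> bool:
    """kill -0: signal 0 probes whether the process exists without touching it."""
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False

def monitor(pid: int, wake_llm, poll_seconds: int = 60):
    while check_alive(pid):       # this loop makes zero API calls
        time.sleep(poll_seconds)  # sleep; optionally snapshot nvidia-smi here
    wake_llm()                    # training finished: enter the REFLECT phase

# Demo: probe our own PID, which is certainly alive.
print(check_alive(os.getpid()))  # True
```

Since the loop costs only a syscall per poll, the interval can be short without any LLM spend; the $0.08/day figure comes entirely from the THINK and REFLECT phases.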