ResearcherSkill

Name: ResearcherSkill
Author: krzysztofdudek

Verified

One file. Your AI coding agent becomes a scientist. 30+ experiments while you sleep.

239 stars

27 forks

Python

Added 3/26/2026

Tags: AI Agents, ai-agent, ai-coding, autonomous, autoresearch, claude-code, codex, cursor, developer-tools, experimentation, gemini-cli, optimization, prompt-engineering, research-automation, skill

Installation

# Add to your Claude Code skills
git clone https://github.com/krzysztofdudek/ResearcherSkill

Security Report: Verified

Last scanned: 5/30/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-30T15:31:14.723Z",
  "npmAuditRan": true,
  "pipAuditRan": true
}

README.md

Researcher Skill

One file. Your AI coding agent becomes a scientist.

Install as a Claude Code plugin, or drop skills/researcher/SKILL.md into Codex, Cursor, or any agent that reads markdown skills. The agent designs experiments, tests hypotheses, discards what fails, keeps what works — 30+ experiments overnight while you sleep.

Install

Claude Code plugin (recommended)

Two slash commands inside Claude Code — first registers this repo as a marketplace, second installs the plugin from it:

/plugin marketplace add krzysztofdudek/ResearcherSkill
/plugin install researcher@researcher-marketplace

Run /reload-plugins to activate it (or restart Claude Code), then trigger the skill with /researcher or by asking the agent to run a research loop on something.

To upgrade later: /plugin marketplace update researcher-marketplace then /plugin install researcher@researcher-marketplace again.

GitHub Copilot CLI plugin

The same repo is also a GitHub Copilot CLI marketplace. Register it, then install the plugin:

copilot plugin marketplace add krzysztofdudek/ResearcherSkill
copilot plugin install researcher@researcher-marketplace

To upgrade later: copilot plugin update researcher. The same skill body powers both Claude Code and Copilot — trigger it the same way.

Codex CLI plugin

Codex reads the same skill. Register this repo as a marketplace, then install:

codex plugin marketplace add krzysztofdudek/ResearcherSkill
codex plugin install researcher@researcher-marketplace

To upgrade later: codex plugin marketplace upgrade researcher-marketplace. Or drop the single file into ~/.agents/skills/researcher/SKILL.md (user-level) or .agents/skills/researcher/SKILL.md (project-level).

Cursor plugin

Cursor auto-discovers the skill from the plugin manifest at the repo root. Install it locally:

git clone https://github.com/krzysztofdudek/ResearcherSkill.git
ln -s "$(pwd)/ResearcherSkill" ~/.cursor/plugins/local/researcher

Then reload Cursor (Developer: Reload Window). Or drop the single file into ~/.cursor/skills/researcher/SKILL.md (user-level) or .cursor/skills/researcher/SKILL.md (project-level).

Single-file drop-in (any agent)

The canonical skill body is skills/researcher/SKILL.md in this repo (one file, ~300 lines, frontmatter-tagged). Copy it into your agent's skill directory:

Claude Code (user-level): ~/.claude/skills/researcher/SKILL.md
Claude Code (project-level): .claude/skills/researcher/SKILL.md in your repo
Codex / other agents: wherever your tool reads skills or instructions from (consult its docs)

Trigger with /researcher (Claude Code) or by asking the agent to enter "researcher mode".

What it looks like running

Experiment b4 — READ/WRITE phase separation

Branch: research/graph-protocol-optimization · Parent: #b1 · Type: real

Hypothesis: Agents read architectural rules but treat them as optional. Separating the instruction into a READ phase ("load constraints first") and a WRITE phase ("now implement") with a guard ("if you haven't done READ, stop") should improve compliance.
Changes: restructured agent rules into explicit READ/WRITE phases, added structural guard
Result: 7.04/10 (was 1.82 baseline, 5.91 best), new best
Status: keep

Insight: Every attempt to add verification checklists regressed. What worked was changing the structure, not adding steps. Agents respond to framing, not policing.

b0: baseline (no special instructions): 1.82/10. keep.
b1: reframe rules as "constraints, not suggestions": 5.91. keep.
b2: exhaustive checklist: regression. discard.
b3: lightweight checkpoint: regression. discard.
b4: READ/WRITE separation + structural guard: 7.04. keep.
b5: contractual "implement or document exception": regression. discard.
b6: JIT re-reading: 5.23, evaluator disagreement. interesting.
b7: mandatory pattern-triggered re-reading: 1.4. regression below baseline. discard.

Real experiment from optimizing Yggdrasil agent rules. The skill works on any codebase.
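The log above is easier to reason about as structured data. The following is a minimal sketch, assuming a record shape that is not taken from the skill itself: `Entry`, `best_kept`, and `kept_lineage` are hypothetical names, and the actual format inside .lab/ may differ. Scores are left as `None` wherever the log only says "regression".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    id: str
    change: str
    score: Optional[float]  # None where the log gives no number
    status: str             # "keep", "discard", or "interesting"


# Transcribed from the experiment log above; nothing is filled in.
LOG: List[Entry] = [
    Entry("b0", "baseline (no special instructions)", 1.82, "keep"),
    Entry("b1", 'reframe rules as "constraints, not suggestions"', 5.91, "keep"),
    Entry("b2", "exhaustive checklist", None, "discard"),
    Entry("b3", "lightweight checkpoint", None, "discard"),
    Entry("b4", "READ/WRITE separation + structural guard", 7.04, "keep"),
    Entry("b5", 'contractual "implement or document exception"', None, "discard"),
    Entry("b6", "JIT re-reading", 5.23, "interesting"),
    Entry("b7", "mandatory pattern-triggered re-reading", 1.4, "discard"),
]


def best_kept(log: List[Entry]) -> Entry:
    """Return the kept experiment with the highest score."""
    kept = [e for e in log if e.status == "keep" and e.score is not None]
    return max(kept, key=lambda e: e.score)


def kept_lineage(log: List[Entry]) -> List[str]:
    """Return the ids of the experiments that were kept, in order."""
    return [e.id for e in log if e.status == "keep"]


best = best_kept(LOG)
print(best.id, best.score)  # b4 7.04
print(kept_lineage(LOG))    # ['b0', 'b1', 'b4']
```

Only three of the eight experiments survive, which matches the insight stated above: most ideas are discarded, and the kept ones form the lineage the next experiment builds on.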

Same loop, different problems:

npm run build takes 40s → agent gets it to 18s
prompt returns wrong format 30% of the time → agent gets it to 3%
API p99 is 200ms → agent finds the bottleneck and cuts it to 80ms
document parser misses edge cases → agent improves match rate from 74% to 91%

How it works

The agent interviews you about what to optimize, sets up a lab on a git branch, and works autonomously. Thinks, tests, reflects. Commits before every experiment, reverts on failure, logs everything.

It detects when it's stuck and changes strategy. Forks branches to explore different approaches. Keeps going until you stop it or it hits a target. Resume where you left off across sessions.

Generalizes autoresearch beyond ML. Works on any problem where you can measure a result — code, configs, prompts, documents.

All experiment history lives in an untracked .lab/ directory. Git manages code. .lab/ manages knowledge.
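The commit, measure, keep-or-revert cycle described above can be sketched in a few lines. This is an illustration under stated assumptions, not the skill's implementation: the skill has the agent drive git, whereas this sketch swaps in an in-memory snapshot for the commit and a dictionary restore for the revert. `research_loop`, `toy_cost`, and the configuration keys are hypothetical, and the cost model is made up so the example runs without any external tools.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def research_loop(
    state: Config,
    measure: Callable[[Config], float],
    candidates: List[Tuple[str, Config]],
) -> Tuple[Config, List[dict]]:
    """Try each candidate change; keep it only if the metric improves (lower is better)."""
    log: List[dict] = []
    best = measure(state)
    log.append({"id": "b0", "change": "baseline", "result": best, "status": "keep"})
    for i, (name, patch) in enumerate(candidates, start=1):
        snapshot = dict(state)        # stands in for "commit before the experiment"
        state = {**state, **patch}    # apply the change under test
        result = measure(state)
        if result < best:
            best, status = result, "keep"
        else:
            state, status = snapshot, "discard"  # stands in for "revert on failure"
        log.append({"id": f"b{i}", "change": name, "result": result, "status": status})
    return state, log


def toy_cost(cfg: Config) -> float:
    # Made-up, unitless cost model used only to make the sketch runnable.
    return 100.0 - 30.0 * cfg["cache"] - 5.0 * cfg["workers"] + 20.0 * cfg["sourcemaps"]


final, log = research_loop(
    {"cache": 0, "workers": 1, "sourcemaps": 0},
    toy_cost,
    [
        ("enable cache", {"cache": 1}),
        ("enable sourcemaps", {"sourcemaps": 1}),
        ("more workers", {"workers": 4}),
    ],
)
print([entry["status"] for entry in log])  # ['keep', 'keep', 'discard', 'keep']
print(final)  # {'cache': 1, 'workers': 4, 'sourcemaps': 0}
```

The discarded change never reaches the final state, yet it stays in the log. That mirrors the split described above: the working tree holds only what survived, and the history of what was tried lives separately.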

Want the full walkthrough? Read the guide. It walks through a complete example from start to finish.

FAQ

How is this different from autoresearch? Autoresearch's core loop is universal, but the repo is wired to train.py, val_bpb, and GPU training. To use it on something else you'd rewrite the setup. This gives you that loop ready to go for any codebase.

When would I use this instead of ML? It's not instead of ML. ML is one possible domain. This works on anything where the agent can try things, measure, and iterate. Code, scripts, documents, configs. Slow builds, flaky tests, API latency, prompt accuracy.

How does it measure success for non-ML code? Whatever you can measure. Test pass rate, benchmark output, type check errors, build time. You set it up in the discovery phase. The agent asks what to measure and how. If you can run a command and get a number, that's your metric. For cases where there's no command to run, the agent scores against a qualitative rubric you define together.
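"Run a command and get a number" can be as small as the sketch below. It is a hedged example, not the skill's own mechanism: `measure` is a hypothetical helper, and taking the last number printed on stdout is an assumed convention. The demo command is a stand-in Python one-liner so the example runs anywhere; in practice it would be a test runner or benchmark.

```python
import re
import subprocess
import sys
from typing import List

# Matches integers and decimals such as "41" or "7.04".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def measure(command: List[str], timeout_s: float = 60.0) -> float:
    """Run a command and return the last number it prints on stdout."""
    proc = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_s, check=True
    )
    matches = NUMBER.findall(proc.stdout)
    if not matches:
        raise ValueError("command printed no number to use as a metric")
    return float(matches[-1])


# Stand-in for a real benchmark or test command.
demo = [sys.executable, "-c", "print('tests passed: 41'); print('score: 7.04')"]
print(measure(demo))  # 7.04
```

Passing `check=True` makes a failing command raise instead of silently yielding a stale or partial number, which matters when the metric decides whether a change is kept.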

How does convergence detection work? The agent checks a table of signals after every experiment. If it sees 5+ failures in a row, a metric plateau, or the same area modified too many times, it knows to change approach. Some signals are advisory (consider pivoting), others are hard guardrails (you must pivot). Details in the guide.
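The three signals named above can be checked with simple bookkeeping. The sketch below is an assumption-heavy illustration: only the threshold of 5 failures in a row comes from the text, while the plateau window, the minimum gain, the per-area touch limit, and all function and signal names are hypothetical. It also assumes a higher score is better and does not model which signals are advisory and which are hard guardrails.

```python
from collections import Counter
from typing import List


def trailing_failures(statuses: List[str]) -> int:
    """Count consecutive "discard" results at the end of the history."""
    count = 0
    for status in reversed(statuses):
        if status != "discard":
            break
        count += 1
    return count


def plateaued(kept_scores: List[float], window: int = 3, min_gain: float = 0.05) -> bool:
    """True when the last `window` kept results gained less than `min_gain` in total."""
    if len(kept_scores) <= window:
        return False
    return kept_scores[-1] - kept_scores[-1 - window] < min_gain


def triggered_signals(
    statuses: List[str],
    kept_scores: List[float],
    touched_areas: List[str],
    max_failures: int = 5,  # "5+ failures in a row" from the text
    max_touches: int = 4,   # assumed limit for "too many times"
) -> List[str]:
    """Return the names of the signals that currently fire."""
    found = []
    if trailing_failures(statuses) >= max_failures:
        found.append("failure_streak")
    if plateaued(kept_scores):
        found.append("plateau")
    if touched_areas and max(Counter(touched_areas).values()) > max_touches:
        found.append("area_overworked")
    return found


print(triggered_signals(["keep", "keep"] + ["discard"] * 5, [1.0, 2.0], ["rules.md"] * 3))
# ['failure_streak']
print(triggered_signals(["keep"] * 5, [5.0, 5.01, 5.02, 5.03, 5.04], ["rules.md"] * 5))
# ['plateau', 'area_overworked']
```

Running the check after every experiment keeps the decision cheap: the loop either continues as is or receives a named reason to change approach.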

Can it improve itself? Sort of. The skill was optimized using the skill itself. A research document about how LLMs process instructions (attention decay, primacy/recency, instruction budgets) was used as criteria, and the agent ran the loop against its own prompt. Not fully recursive, but the loop was: research → skill → use skill to improve skill.

Can't I just ask Claude to build this from the autoresearch repo? You can try. This saves you the work and includes things autoresearch doesn't have: thought experiments, non-linear branching, convergence detection, qualitative metrics, and session resume.

License

MIT

The Yggdrasil family

Four tools, one thesis: make an AI coding agent prove correctness, stage by stage — because "done" isn't done. Each is a checkpoint at a different point in the pipeline, where the agent has to show its work before it continues.

Tool	Stage	What it makes the agent prove
Ratatoskr	request → intent	Reads your request back in plain words so you see what it understood before it builds.
Urd	intent → code	When the spec is ambiguous, it consults the source of truth and asks instead of guessing.
Yggdrasil	code → architecture	Every change satisfies the rules that govern it, checked before the agent moves on.
Researcher (this one)	code → measured result	Point it at a metric and it runs experiments, hypotheses kept and discarded.

Frequently Asked Questions

What is ResearcherSkill?

ResearcherSkill is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by krzysztofdudek. Its tagline: "One file. Your AI coding agent becomes a scientist. 30+ experiments while you sleep." It has 239 GitHub stars.

Is ResearcherSkill safe to use?

Yes. ResearcherSkill passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install ResearcherSkill?

Clone the repository with "git clone https://github.com/krzysztofdudek/ResearcherSkill" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is ResearcherSkill written in?

ResearcherSkill is primarily written in Python. It is open-source under krzysztofdudek on GitHub, so you can review or fork the full source.
