Academic Research Skills for Claude Code

A comprehensive suite of Claude Code skills for academic research, covering the full pipeline from research to publication.

AI is your copilot, not the pilot. This tool won't write your paper for you. It handles the grunt work — hunting down references, formatting citations, verifying data, checking logical consistency — so you can focus on the parts that actually require your brain: defining the question, choosing the method, interpreting what the data means, and writing the sentence after "I argue that."

Unlike AI "humanizer" tools, this tool doesn't help you hide the fact that you used AI. It helps you write better. Style Calibration learns your voice from your past work. Writing Quality Check catches the patterns that make prose feel machine-generated. The goal is quality, not cheating.

Why human-in-the-loop, not full automation?

Lu et al. (2026, Nature 651:914-919) built The AI Scientist, the first fully autonomous AI research system to publish a paper through blind peer review at a top-tier ML venue (ICLR 2025 workshop, scoring 6.33/10 against a workshop average of 4.87). Their Limitations section enumerates the failure modes that any fully autonomous AI research pipeline inherits: implementation bugs, hallucinated results, shortcut reliance, bug-as-insight reframing, methodology fabrication, frame-lock, and citation hallucinations.

ARS (Academic Research Skills) is built on the premise that a human researcher augmented by AI avoids these failure modes better than either alone. The Stage 2.5 and Stage 4.5 integrity gates run a blocking checklist covering all seven failure modes (see academic-pipeline/references/ai_research_failure_modes.md), and the Academic Paper Reviewer offers an opt-in calibration mode that measures its own false-negative and false-positive rates (FNR/FPR) against a user-supplied gold set.
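
FNR is the false-negative rate, the share of genuinely flawed items a reviewer fails to flag, FN / (FN + TP); FPR is the false-positive rate, the share of sound items it wrongly flags, FP / (FP + TN). A minimal sketch of that arithmetic, assuming the gold set maps each item id to a "really flawed" label and the reviewer's output is the set of ids it flagged; `calibrate` and `CalibrationResult` are hypothetical names, not ARS's actual interface.

```python
from dataclasses import dataclass


@dataclass
class CalibrationResult:
    fnr: float  # missed problems / all truly flawed items
    fpr: float  # false alarms / all truly sound items


def calibrate(gold: dict[str, bool], flagged: set[str]) -> CalibrationResult:
    """Score a reviewer against a gold set.

    gold maps an item id (e.g. a claim or a reference) to True if that item
    really has a problem; flagged holds the ids the reviewer flagged.
    Flagged ids that do not appear in the gold set are ignored.
    """
    positives = [item for item, bad in gold.items() if bad]
    negatives = [item for item, bad in gold.items() if not bad]
    if not positives or not negatives:
        raise ValueError("gold set needs both flawed and sound items")
    false_negatives = sum(1 for item in positives if item not in flagged)
    false_positives = sum(1 for item in negatives if item in flagged)
    return CalibrationResult(
        fnr=false_negatives / len(positives),
        fpr=false_positives / len(negatives),
    )


gold = {"a": True, "b": True, "c": True, "d": True, "e": False, "f": False}
result = calibrate(gold, flagged={"a", "b", "c", "e"})
print(result)  # CalibrationResult(fnr=0.25, fpr=0.5)
```

Raising an error when the gold set lacks either class is a choice made for this sketch: with no flawed items FNR is undefined, and with no sound items FPR is undefined.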

v3.3 was inspired by PaperOrchestra (Song, Song, Pfister & Yoon, 2026, Google): Semantic Scholar API verification, anti-leakage protocol, VLM figure verification, and score trajectory tracking.

Architecture & pipeline

👉 docs/ARCHITECTURE.md — the full pipeline view: flow diagram, stage-by-stage matrix, data-access flow, skill dependency graph, quality gates, and mode list.

The architecture doc supersedes the sprawling pipeline description that used to live here. Everything about what runs in which stage now lives in one place.

Setup & installation

👉 docs/SETUP.md — install Claude Code, set up API keys, optional Pandoc/tectonic for DOCX/PDF, cross-model verification (ARS_CROSS_MODEL), and four installation methods including claude.ai Project import.

Performance & cost

👉 docs/PERFORMANCE.md — per-mode token budgets, full-pipeline estimate (~$4–6 for a 15k-word paper), and recommended Claude Code settings (Skip Permissions; Agent Team optional).

Guides & articles

Academic Writing Shouldn't Be a Solo Act — full pipeline walkthrough (English)
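Academic Writing Shouldn't Be a One-Person Job: How an Open-Source AI Collaboration Toolkit Changes Researchers' Workflows, complete user guide (in Traditional Chinese; original title: 學術寫作不該是一個人的事：一套開源 AI 協作工具如何改變研究者的工作流)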

Features at a glance

Deep Research — 13-agent research team with Socratic guided mode, PRISMA systematic review, intent detection, dialogue health monitoring, optional cross-model Devil's Advocate (DA), and Semantic Scholar API verification.
Academic Paper — 12-agent paper writing with Style Calibration, Writing Quality Check, LaTeX hardening, visualization, revision coaching, citation conversion, anti-leakage protocol, and VLM figure verification.
Academic Paper Reviewer — 7-agent multi-perspective peer review with 0–100 quality rubrics (EIC + 3 dynamic reviewers + Devil's Advocate), concession threshold protocol, attack intensity preservation, optional cross-model DA critique / calibration, R&R traceability matrix, read-only constraint.
Academic Pipeline — 10-stage pipeline orchestrator with adaptive checkpoints, claim verification, Material Passport, optional repro_lock, optional cross-model integrity verification, mid-conversation reinforcement, and score trajectory tracking.
Data Access Level Metadata (v3.3.2+) — every skill declares data_access_level (raw / redacted / verified_only); enforced by scripts/check_data_access_level.py. Pattern adapted from Anthropic's automated-w2s-researcher (2026). See shared/ground_truth_isolation_pattern.md.
Task Type Annotation (v3.3.2+) — every skill declares task_type (open-ended or outcome-gradable). All current ARS skills are open-ended.
Benchmark Report Schema (v3.3.5+) — JSON Schema + lint for honest benchmark comparisons. See shared/benchmark_report_pattern.md.
Artifact Reproducibility Lockfile (v3.3.5+) — optional repro_lock sub-block on Material Passport. It documents the run configuration but is not a replay guarantee, since LLM outputs are not byte-reproducible. See shared/artifact_reproducibility_pattern.md.
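
A minimal sketch of the kind of check a metadata linter could run over the data_access_level and task_type declarations. The allowed values come from the list above; the assumption that each skill file opens with a `---`-delimited `key: value` frontmatter block, and the `lint_skill` function itself, are hypothetical and not taken from scripts/check_data_access_level.py.

```python
import re

# Allowed values as listed in this README; the file layout below is assumed.
ALLOWED = {
    "data_access_level": {"raw", "redacted", "verified_only"},
    "task_type": {"open-ended", "outcome-gradable"},
}

# Assumed layout: a '---' delimited frontmatter block at the top of the file.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def lint_skill(text: str) -> list[str]:
    """Return the metadata problems found in one skill file (empty if clean)."""
    match = FRONTMATTER.match(text)
    if match is None:
        return ["missing frontmatter block"]
    fields = {}
    for line in match.group(1).splitlines():
        # Naive 'key: value' parsing; quotes around the value are stripped.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("'\"")
    problems = []
    for key, allowed in ALLOWED.items():
        if key not in fields:
            problems.append(f"missing {key}")
        elif fields[key] not in allowed:
            problems.append(f"invalid {key}: {fields[key]!r}")
    return problems


good = "---\nname: deep-research\ndata_access_level: raw\ntask_type: open-ended\n---\n"
bad = "---\nname: demo\ndata_access_level: full\n---\n"
print(lint_skill(good))  # []
print(lint_skill(bad))   # ["invalid data_access_level: 'full'", 'missing task_type']
```

A production linter would more likely use a YAML parser and walk every skill directory; the hand-rolled parsing here only keeps the sketch dependency-free.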

Showcase: real pipeline output

See the complete artifacts from a real 10-stage pipeline run — peer review reports, integrity verification reports, and the final paper:

Browse all pipeline artifacts →

| Artifact | Description |
|---|---|
| Final Paper (EN) | APA 7.0 formatted, LaTeX-compiled |
| Final Paper (ZH) | Chinese version, APA 7.0 |
| Integrity Report — Pre-Review | Stage 2.5: caught 15 fabricated refs + 3 statistical errors |
| Integrity Report — Final | Stage 4.5: zero regressions confirmed |
| Peer Review Round 1 | EIC + 3 Reviewers + Devil's Advocate |
| Re-Review | Verification after revisions |
| Peer Review Round 2 | Follow-up review |
| Response to Reviewers | Point-by-point author response |
| Post-Publication Audit Report | Independent full-reference audit: found 21/68 issues missed by 3 rounds of integrity checks |

Companion: Experiment Agent

If your research involves running experiments (code or human studies) before writing, the Experiment Agent skill fills the gap between ARS Stage 1 (RESEARCH) and Stage 2 (WRITE).

ARS Stage 1 RESEARCH  →  RQ Brief + Methodology Blueprint
        ↓
  experiment-agent     →  run/manage experiments → validate results
        ↓
ARS Stage 2 WRITE     →  write paper with verified experiment results

What it does: executes code experiments (Python, R, etc.) with real-time monitoring, manages human study protocols with IRB ethics checklist, interprets statistics with 11-type fallacy detection, and verifies reproducibility.

How to use together: pause the ARS pipeline after Stage 1, run experiments in a separate experiment-agent session, then bring the results (with Material Passport) back to ARS Stage 2. ARS requires zero modification. See the experiment-agent README for setup instructions.

Usage

Quick Start

# Start a full research pipeline
You: "I want to write a research paper on AI's impact on higher education QA"

# Start with Socratic guidance
You: "Guide my research on AI in educational evaluation"

# Write a paper with guided planning
You: "Guide me through writing a paper on demographic decline"

# Review an existing paper
You: "Review this paper" (then provide the paper)

# Check pipeline status
You: "status"

Individual Skills

Deep Research (7 modes)

"Research the impact of AI on higher education"       → full mode
"Give me a quick brief on X"                          → quick mode
"Do a systematic review on X with PRISMA"             → systematic-review mode
"Guide my research on X"                              → socratic mode (guided)
"Fact-check these claims"                             → fact-check mode
"Do a literature review on X"                         → lit-review mode
"Review this paper's research quality"                → review mode

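The phrases above are examples rather than fixed commands; per this README, Deep Research picks a mode through intent detection. Purely to illustrate why a specific phrasing has to win over a generic one, here is a toy keyword router; the `RULES` table and `route` function are hypothetical and do not reflect how ARS actually classifies requests.

```python
# Hypothetical keyword rules, checked in order: specific phrases come first
# so that "literature review" is not swallowed by the generic "review" rule.
RULES = [
    ("systematic review", "systematic-review"),
    ("prisma", "systematic-review"),
    ("literature review", "lit-review"),
    ("fact-check", "fact-check"),
    ("quick brief", "quick"),
    ("guide my research", "socratic"),
    ("review", "review"),
]


def route(request: str) -> str:
    """Map a free-text request to a Deep Research mode name."""
    text = request.lower()
    for keyword, mode in RULES:
        if keyword in text:
            return mode
    return "full"  # fall back to the full research mode


for request in [
    "Research the impact of AI on higher education",
    "Do a literature review on X",
    "Review this paper's research quality",
]:
    print(f"{request!r} -> {route(request)}")
```

Ordering the rules is the whole point of the sketch: an intent classifier has to resolve the same overlap between "review" requests that a flat keyword match cannot.
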
Academic Paper (10 modes)

"Write a paper on X"                                  → full mode
"Guide me through writing a paper"                    → plan mode (guided)
"Build a paper outline"                               → outline-only mode
"I have a draft, here are reviewer comments"          → revision mode
"Parse these reviewer comments into a roadmap"        → revision-coach mode
"Write an abstract for this paper"                    → abstract-only mode
"Turn this into a literature review paper"            → lit-review mode
"Convert to LaTeX" / "Convert citations to IEEE"      → format-convert mode
"Check citations"                                     → citation-check mode
"Generate an AI disclosure statement for NeurIPS"     → disclosure mode

Academic Paper Reviewer (6 modes)

"Review this paper"                                   → full mode (EIC + R1/R2/R3 + Devil's
