paperjury

Name: paperjury
Author: u7079256

Verified

Pre-submission AI review stress-test for research papers. A Claude Code skill: review, verdict, revise, verify.

829 stars

40 forks

JavaScript

Installation

# Add to your Claude Code skills
git clone https://github.com/u7079256/paperjury


SKILL.md

Security Report (Verified)

Last scanned: 6/16/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-06-16T09:27:00.395Z",
  "npmAuditRan": true,
  "pipAuditRan": true,
  "promptInjectionRan": true
}

README.md

Frequently Asked Questions

What is paperjury?

paperjury is an open-source AI agents skill built by u7079256 for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT. It is a pre-submission AI review stress-test for research papers, packaged as a Claude Code skill that runs a review, verdict, revise, verify loop. It has 829 GitHub stars.

Is paperjury safe to use?

Yes. paperjury passed SkillsLLM's automated security scan (a dependency vulnerability audit plus prompt-injection heuristics) with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install paperjury?

Clone the repository with "git clone https://github.com/u7079256/paperjury" and add it to your Claude Code skills directory (see the Installation section above). paperjury ships a SKILL.md manifest, so compatible agents can discover and load it automatically.

What programming language is paperjury written in?

paperjury is primarily written in JavaScript. It is open-source under u7079256 on GitHub, so you can review or fork the full source.

Are there alternatives to paperjury?

Yes. SkillsLLM lists many other AI Agents skills you can browse and compare side by side. Open the AI Agents category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh paperjury against similar tools.


Related Skills

ECC

by affaan-m

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

236,354


name: paperjury
description: Three modes for CS-conference papers (CVPR/ICCV/ECCV vision, ACL/EMNLP/NAACL NLP, ICLR/NeurIPS/ICML/AAAI ML). DIRECT-EDIT mode (common): the user describes a change in Chinese or English and the manuscript (LaTeX or Markdown) is edited directly through a CS-venue writing toolkit with author sign-off (use for 改这段 [edit this passage] / 把中文想法写成 latex [write a Chinese idea as LaTeX] / polish / de-AI / translate / compress a passage). REVIEW mode (occasional, pre-submission): harden the paper through an adversarial courtroom review engine (N holistic domain reviewers / contestability routing / two-sided trial / three-way verdict / clerk-converged multi-round loop) with consensus-gated, author-signed revisions (use for review / critique / 审稿 [manuscript review] / 评审 [evaluation] / mock-review). AUTO mode (unattended, opt-in via /goal): run the review-revise loop toward a verifiable goal, applying safe fixes under a drift-bounded policy and queueing risky ones. Resolves all inputs at runtime, no hardcoded paths. Not a from-scratch drafter (use ml-paper-writing) and not an official-venue rebuttal.
version: 1.2.1
author: Yiran Wang
license: MIT
tags: [Academic Writing, Peer Review, Adversarial Review, CVPR, ICCV, ECCV, ACL, EMNLP, NAACL, ICLR, NeurIPS, ICML, AAAI, Workflow, LaTeX]

PaperJury (CS-conference paper review and editing)

PaperJury edits and hardens any CS-conference paper. It runs in three modes. In direct-edit mode (the common case) the user describes a change in Chinese or English and the LaTeX is edited directly through a CS-venue writing toolkit, with author sign-off. In review mode (occasional, pre-submission) it exposes the manuscript to a harsh, multi-perspective courtroom review engine that adjudicates each issue (N holistic domain reviewers -> contestability routing -> two-sided trial -> three-way verdict, with a polish track and a clerk-converged multi-round loop), gates every change behind consensus, and tracks issues in a durable ledger. In auto mode (unattended, opt-in via /goal) it runs that same engine toward a verifiable goal, applying safe fixes under a drift-bounded policy and queueing the risky ones for one human pass on return. All modes share the same writing toolkit, hard rules, ledger, and author sign-off (auto mode obtains sign-off up front through the policy confirmation plus the return queue; see hard rule 1).

This skill is fully generic. It ships no hardcoded paths, no project files, and no embedded paper. Everything specific to a given paper (where the manuscript is, the venue, who signs off, the house style) is resolved at runtime or supplied by a config the project owns. The skill itself is the backbone; any concrete paper is just an instantiation of it.

Scope: CS conferences only. Three venue families, each with its own style profile:

Vision: CVPR, ICCV, ECCV, WACV
NLP: ACL, EMNLP, NAACL, COLING
ML: ICLR, NeurIPS, ICML, AAAI, COLM

When to use / when not

Three modes, one skill. Pick by what the user is asking for:

Direct-edit mode (the common case). The user describes a change in Chinese (or English) and wants the LaTeX edited directly: "把这段改成..." [change this passage to...], "polish this paragraph", "把我对 intro 的想法写成 LaTeX" [write my ideas for the intro as LaTeX], "tighten this". No review panel; go straight to drafting the patch through the writing toolkit, with author sign-off.
Review mode (occasional, pre-submission). The user wants the paper critiqued or hardened: review / critique / 审稿 [manuscript review] / 评审 [evaluation] / mock-review, or iterating a draft to clear reviewer-raised issues. This runs the courtroom review engine (references/review-engine-v3.md).
Auto mode (unattended). The user opts in via /goal (or config mode: auto) to run the review-revise loop AFK toward a verifiable goal. Establish the spine up front (the one human step), then the engine applies safe fixes under the bounded-aggressive policy and queues the rest. The drafter input passes the significance floor (node scripts/ledger.js floor: valid-fixable majors only) and the ledger view is initialized collapsed (--display collapse: minors fold into a Minor digest, majors stay itemized). See references/auto-mode.md. Never self-detect auto; it is explicit only.

Do NOT use for: writing a paper from scratch (use ml-paper-writing), figure or diagram generation (use academic-plotting), or an official-venue rebuttal (this is a pre-submission self-hardening loop, no score gate).

Soft update reminder: at the start of each PaperJury invocation, before choosing the mode or editing a manuscript, run node scripts/check-update.js from the skill root unless PAPERJURY_DISABLE_UPDATE_CHECK=1 is set. If it reports an available update, show the notice once and continue. If the check is skipped, silent, or cannot reach GitHub, continue without mentioning it; update checks are never allowed to block review or editing.
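The soft-update policy above can be sketched as follows. This is a hypothetical illustration, not the contents of scripts/check-update.js: the function names, the `vX.Y.Z` tag format, and the injected `fetchLatestTag` callback are assumptions. It shows the two properties the paragraph insists on: the environment variable short-circuits the check, and every failure path returns `null` instead of throwing.

```javascript
// Hypothetical sketch of the soft update-check policy; not scripts/check-update.js.

// The check is skipped entirely when the opt-out variable is set to "1".
function shouldCheckForUpdate(env) {
  return env.PAPERJURY_DISABLE_UPDATE_CHECK !== "1";
}

// Parse "v1.2.1" or "1.2.1" into [major, minor, patch]; null for non-stable tags.
function parseStableTag(tag) {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(String(tag).trim());
  return m ? m.slice(1).map(Number) : null;
}

// Returns a one-line notice when latest > local, otherwise null.
// Malformed input also yields null: an update check must never block work.
function updateNotice(localVersion, latestTag) {
  const local = parseStableTag(localVersion);
  const latest = parseStableTag(latestTag);
  if (!local || !latest) return null;
  for (let i = 0; i < 3; i++) {
    if (latest[i] > local[i]) {
      return `PaperJury ${latestTag} is available (you have ${localVersion}).`;
    }
    if (latest[i] < local[i]) return null;
  }
  return null;
}

// Wraps an assumed async tag fetcher so network failures stay silent.
async function softUpdateCheck(env, localVersion, fetchLatestTag) {
  if (!shouldCheckForUpdate(env)) return null;
  try {
    return updateNotice(localVersion, await fetchLatestTag());
  } catch {
    return null; // GitHub unreachable: continue without mentioning it
  }
}
```

The caller shows a non-null notice once and carries on; a `null` result means "say nothing".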

The three primitives

This paradigm is expressed as Skill + Workflow + Memory. Each carries one concern; together they replace the heavy per-round file-and-flag machinery a hand-rolled version accumulates.

Skill (this folder) = entry point + methodology. The protocol, the reviewer panel, the contestability routing, the writing toolkit, the human gates. Detail in references/review-engine-v3.md, references/reviewer-personas.md, references/writing-toolkit.md.
Workflow = fan-out engine. The semantic, no-human-in-the-middle steps run as Workflows (parallelism + schema-validated output by construction). The simple panel is workflows/review-panel.workflow.js; the v3 courtroom engine is assign-reviewers -> reading-check -> coverage-auditor -> merge -> {trial (+ escalate) || polish} -> recall-audit -> drafter -> {edit-audit | meaning-audit} -> clerk. The DETERMINISTIC guards run orchestrator-side via Bash between workflow calls (the Workflow sandbox has no fs): scripts/ holds decompose, extract-docx, ledger, journal, apply-patch, anchor-diff, cross-ref, spine, rekey, compile-guard, compliance-check (plus doctor, the install/repo health check: npm run doctor). Build note: this harness delivers a workflow's args as a JSON STRING, so every workflow parses it defensively. Protocol + every orchestrator seam: references/review-engine-v3.md.
Memory = durable state + learned conventions. Two layers:
- Ledger (LEDGER.json resolved at runtime = the machine source of truth, plus a rendered LEDGER.md view; managed by scripts/ledger.js): the live, mutable issue state across rounds and sessions. Schema + status state machine: references/ledger-schema.md.
- Claude memory (the active project's memory): stable conventions worth recalling next session, e.g. this paper's house style, venue, persona tuning.
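The build note says workflow args arrive as a JSON string and must be parsed defensively. A minimal sketch of such a parser, assuming args can arrive as a string, an already-parsed object, or nothing; `parseWorkflowArgs` is a hypothetical name, not a function shipped by the skill.

```javascript
// Hypothetical defensive parser for workflow args. Accepts a JSON string
// (possibly double-encoded), a plain object, or null/undefined.
function parseWorkflowArgs(raw, defaults = {}) {
  let value = raw;
  // Unwrap up to two layers of string encoding (a JSON string of a JSON string).
  for (let i = 0; i < 2 && typeof value === "string"; i++) {
    const text = value.trim();
    if (text === "") {
      value = {};
      break;
    }
    try {
      value = JSON.parse(text);
    } catch (err) {
      throw new Error(`workflow args are not valid JSON: ${err.message}`);
    }
  }
  if (value === null || value === undefined) value = {};
  if (typeof value !== "object" || Array.isArray(value)) {
    throw new Error("workflow args must be a JSON object");
  }
  // Defaults fill missing keys; explicitly passed args win.
  return { ...defaults, ...value };
}
```

Failing loudly on malformed input, rather than silently running with defaults, keeps a misconfigured run from looking like a clean one.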

Resolving inputs at runtime (no hardcoded paths)

The skill ships ZERO hardcoded paths or project files. On trigger it resolves each input by discovery first, then asking:

manuscript: detect the main source, then route it through the INTAKE FORMAT GATE by extension. Four routes, none silent:
- .tex: the native LaTeX path. Detect the main source (the .tex with \documentclass / \begin{document}, or the file the user names). If several candidates, ask.
- .md / .markdown / .txt: the native text path. The full multi-round engine runs; compile checks are not applicable (compile-guard returns compiled:null plus a markdown sanity lint, an honest UNKNOWN, never a fake pass); LaTeX-only compliance checks are skipped and reported as skipped_checks.
- .docx: if a .paper-review/ working copy AND a ledger already exist, REUSE them, never re-extract. If the sha256 of the docx no longer matches the ledger's meta.original_sha256, STOP and ask: continue on the working copy, or extract --force knowingly discarding the applied edits (an explicit new-intake event). Otherwise run node scripts/extract-docx.js extract <file.docx> (one time) and tell the user explicitly: the original Word file is never modified; all rounds run on .paper-review/<basename>.md (print the full working-copy path); they get back the edited Markdown plus a per-edit change list; the extraction report lists everything dropped or degraded. Write ledger meta {manuscript: <working copy>, working_format: 'markdown', source_format: 'docx', original, original_sha256, extracted_at, extraction_report}. If the report shows nonzero tracked-change counts, seed a round-1 author-required ledger row ("manuscript contains unresolved tracked changes; accepted-all for review").
- any other extension (.doc, .pdf, .rtf, .odt, ...): explicitly unsupported. Say so and suggest exporting .docx / .md / .tex; never silently degrade.
After intake, the working copy IS the manuscript for every rule and gate in this file (sign-off, spine freeze, round-0 baseline, edit safety, journal); the original uploaded file is permanently read-only.
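A minimal sketch of the intake gate's extension routing and the .docx staleness check. The route labels and function names are hypothetical; only the extension list and the `meta.original_sha256` field come from the text above. Node's built-in `crypto` computes the digest.

```javascript
// Hypothetical sketch of the intake format gate: route by extension and
// never silently degrade. Route labels are illustrative.
const path = require("node:path");
const crypto = require("node:crypto");

const ROUTES = {
  ".tex": "latex",
  ".md": "text",
  ".markdown": "text",
  ".txt": "text",
  ".docx": "docx-intake",
};

function routeIntake(file) {
  const ext = path.extname(file).toLowerCase();
  const route = ROUTES[ext];
  if (!route) {
    // .doc, .pdf, .rtf, .odt, ...: refuse explicitly and suggest an export.
    return {
      route: "unsupported",
      message: `${ext || "(no extension)"} is not supported; export to .docx, .md, or .tex`,
    };
  }
  return { route };
}

// True when the original .docx no longer matches the hash recorded at intake,
// which is the signal to STOP and ask before doing anything else.
function docxChangedSinceIntake(docxBytes, ledgerMeta) {
  const digest = crypto.createHash("sha256").update(docxBytes).digest("hex");
  return digest !== ledgerMeta.original_sha256;
}
```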
venue_family: the user can name it, or an agent reads the class file to GUESS the family (e.g. a cvpr/iccv style, an acl style, a neurips/iclr style). There is no hardcoded venue list and no deterministic detector; if unclear, ask.
ledger: default to <manuscript-dir>/.paper-review/LEDGER.json (the machine source of truth; scripts/ledger.js also renders a LEDGER.md view). Create if absent, reuse if present. The user may point elsewhere.
author: ask who signs off on edits (default: the current user). Every edit needs explicit authorization.
personas: default to N domain-expert holistic reviewers assigned at runtime (assign-reviewers, from the project gatekeeper core + a generated domain overlay); the three generic lenses in references/reviewer-personas.md are the degrade fallback. If the project defines its own named reviewer subagents, use them as agentType; otherwise inline the persona prompts.
style_profile: start from the venue-family default; refine from any conventions recalled from memory or pinned in a project config.

A project MAY pin these by dropping a config in ITS OWN repo (see configs/config-template.md for the shape). That file is owned by the project, never by this skill. At round start, recall any pinned conventions from memory.

Direct-edit mode (the common case)

The user states a change in Chinese or English; you draft and apply the LaTeX edit. No panel, no ledger, no discussion. Minimal flow:

Locate. Resolve the manuscript and find the target passage the instruction refers to (a paragraph, sentence, caption, table cell). If it is ambiguous on a large file, ask which passage; do not guess. On a .docx: if a working copy already exists, it IS the manuscript, edit it; if none exists, offer an explicit choice between (a) paste-back, returning the rewritten passage as text for the user to apply in Word (no working copy), and (b) running the one-time intake extraction and editing the working copy. Never edit the .docx file itself.
Draft. Pick the writing-toolkit prompt matching the instruction (translate-to-english for a Chinese idea, polish-english / de-ai for a rewrite, compress / expand for length, caption / experiment-analysis for those units) and draft the patch to do exactly what was asked. The Common guards apply (markup-safe for the working format, plain CS prose, no log leakage into the manuscript).
Self-gate. Run logic-check on the drafted passage.
Sign-off. Show the patch and get explicit author approval (hard rule 1).
Apply. Write only the patch into the manuscript; keep any back-translation or note author-side.
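The five direct-edit steps can be sketched as one function whose only hard invariant is the sign-off gate. Every step is injected as a callback, so nothing here is the skill's real API; the object shapes (`ambiguous`, `text`) are assumptions for illustration.

```javascript
// Hypothetical sketch of direct-edit mode: locate, draft, self-gate,
// sign-off, apply. Steps are injected; only the ordering is the point.
async function directEdit(instruction, steps) {
  const target = await steps.locate(instruction);
  if (target.ambiguous) return { status: "ask-which-passage" }; // never guess
  const patch = await steps.draft(instruction, target);
  const check = await steps.logicCheck(patch); // self-gate on the draft
  const approved = await steps.askAuthor(patch, check); // hard rule 1
  if (!approved) return { status: "not-applied", patch };
  // Only the patch text reaches the manuscript; notes stay author-side.
  await steps.apply(patch.text);
  return { status: "applied", patch };
}
```

Because `apply` is reachable only after `askAuthor` returns true, a declined patch can never touch the manuscript.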

This is the writing toolkit used on its own. Escalate to review mode only when the user wants the paper critiqued or hardened, not for a single asked-for edit.

Why fan-out is a Workflow and the rest is conversation

The reviewer panel and the trial jury are pure fan-out: spawn, collect, merge. A Workflow does this deterministically (parallelism enforced by construction, structured outputs via schema, isolation by default since each agent sees only the prompt you give it). That isolation is what replaces the snapshot-and-whitelist defense: a reviewer cannot see peers, the ledger, or prior rounds because you simply do not put them in its prompt.

But the loop has genuine human gates (the author reviews the issue list, gives per-issue direction, authorizes edits, breaks ties). Workflows run to completion and return a result; they do not pause mid-run for hours of human input. So:

fan-out steps (reviewers, trial, polish, recall, merge) -> Workflow
human gates (per-issue direction, authorization, override) -> main conversation turns
cross-round truth (the ledger) + stable conventions -> Memory

Review mode: one round, end to end

The full adversarial loop (the v3 courtroom engine). Use it to harden the paper, not for a single asked-for edit (that is direct-edit mode). Full protocol + the 14 orchestrator seams: references/review-engine-v3.md. [WF] = Workflow step, [det] = deterministic Node guard run orchestrator-side between workflow calls, [HUMAN] = author gate, [LEDGER] = state write.

Resolve + recall. Resolve the inputs above; recall this paper's conventions from memory. Pick scope: full (whole paper) or passage (one section / para / claim).
[det] decompose. Split the manuscript into reading units + stable passage_ids + the canonical section list.
[WF] assign-reviewers + [HUMAN] confirm. Name N subfields (2-4, default 3); instantiate N holistic domain reviewers from the gatekeeper core + a generated overlay. An unconfirmable slot degrades per slot to a generic gatekeeper (the three generic lenses in reviewer-personas.md are the fallback). The author confirms the assignment (or pins it via config).
[WF] reading-check. Each reviewer reads the WHOLE paper → weaknesses {significance(major|minor), kind(mechanical|substantive), verbatim quote (cannot quote = did not read)} + one overall_confidence + a per-section coverage report. Anti-skim is three layers: [det] per-section quote-verify, [WF] coverage-auditor, [WF] targeted re-invoke.
[WF] merge. Semantic dedup across reviewers; derive significance (MAX) / kind (substantive-dominates) / corroboration. [LEDGER] intake as raised.
[det] route. mechanical → polish; substantive&minor → polish; substantive&major → trial (two parallel tracks).
[WF] trial. Per substantive-major charge: a whole-paper DEFENSE → 5 decorrelated local-context jurors (+ on-demand expansion) → a deterministic verdict (decide iff quorum surviving >= ceil(0.8*jurySize) AND one side > 60% of surviving votes; else escalate to 12). Verdict ∈ {invalid-drop, valid-fixable, author-required, escalate}; the judge sets a close_criterion ONLY for a valid-fixable charge, satisfiable by editing existing text (no new data). [WF] polish runs the off-gate mechanical/minor track in parallel (never silently dropped).
[WF] recall-audit. Mode A revives wrongly-dropped charges; Mode B spot-checks strong-consensus majors BEFORE the edit. Runs before the drafter.
[HUMAN] Authorize + [WF] drafter + edit-safety. On authorization, the drafter writes the minimal patch per surviving valid-fixable. The edit-safety chain gates it: [det] anchor-diff + cross-ref → [WF] meaning-audit (frozen anchor, four-state) / edit-audit (risky non-anchor); [det] apply-patch + compile-guard land a passing patch and [LEDGER] mark closed; a drift / anchor / failed edit is reverted and queued. Revision logs / back-translations stay author-side.
[WF] clerk + report. The clerk reconciles the round boundary (carried open-questions vs this round's edits, via a passage_id + similarity merge key) and emits convergence counts. Summarize new/closed counts with the minor/polish part as a one-line digest (counts), never per-item paragraphs; in review mode do not auto-start the next round (auto mode drives the outer loop via /goal). The rendered LEDGER.md obeys meta.display_mode (flip anytime: node scripts/ledger.js mode <ledger.json> <show|collapse>; review defaults to the flat table, auto initializes collapsed). At round end run node scripts/rekey.js <working file> <ledger> <journal> to re-link open rows whose passage_id no longer resolves after this round's edits (both formats).
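Three of the deterministic rules in the round above (merge derivation, routing, and the trial quorum) are small enough to sketch directly. Field names, the use of `null` for a juror that did not survive, and treating the largest vote bloc as "one side" are all assumptions, not the real ledger schema or verdict script.

```javascript
// Sketch of merge, route, and verdict rules; field names are assumed.

// Merge: significance = MAX, substantive dominates, corroboration = report count.
function mergeFlags(reports) {
  return {
    significance: reports.some(r => r.significance === "major") ? "major" : "minor",
    kind: reports.some(r => r.kind === "substantive") ? "substantive" : "mechanical",
    corroboration: reports.length,
  };
}

// Route: only substantive & major goes to trial; everything else to polish.
function route(issue) {
  return issue.kind === "substantive" && issue.significance === "major" ? "trial" : "polish";
}

// Verdict: decide iff surviving >= ceil(0.8 * jurySize) AND the largest bloc
// holds more than 60% of surviving votes; otherwise escalate.
function verdict(votes, jurySize) {
  const surviving = votes.filter(v => v != null); // null = juror did not survive
  const quorum = Math.ceil((4 * jurySize) / 5); // integer form of 0.8 * jurySize
  if (surviving.length === 0 || surviving.length < quorum) {
    return { decided: false, next: "escalate" };
  }
  const counts = {};
  for (const v of surviving) counts[v] = (counts[v] || 0) + 1;
  const [top, n] = Object.entries(counts).sort((a, b) => b[1] - a[1])[0];
  return n > 0.6 * surviving.length
    ? { decided: true, outcome: top }
    : { decided: false, next: "escalate" };
}
```

Writing the quorum as `(4 * jurySize) / 5` keeps the threshold exact for integer jury sizes instead of relying on floating-point `0.8`.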

GATE: node scripts/ledger.js gate must report 0 gate-blocking active majors (gate-blocking = {raised, in-trial, re-trial, valid-fixable}; author-required / queued / dropped / closed are gate-OK, and author-required rows accumulate into the queue). Full protocol + ledger schema + status machine: references/review-engine-v3.md, references/ledger-schema.md. The legacy single-pass 3-reviewer panel (workflows/review-panel.workflow.js, the discussion-mode flow in references/methodology.md) is kept only as a quick check.
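The gate rule reduces to a filter over ledger rows. A minimal sketch, assuming rows shaped like `{ significance, status }`; the status names come from the text above, the function names are hypothetical.

```javascript
// Minimal sketch of the GATE rule over assumed row shapes.
const GATE_BLOCKING = new Set(["raised", "in-trial", "re-trial", "valid-fixable"]);

function gateBlockingMajors(rows) {
  return rows.filter(r => r.significance === "major" && GATE_BLOCKING.has(r.status));
}

// The gate passes when no major row sits in a blocking status.
function gatePasses(rows) {
  return gateBlockingMajors(rows).length === 0;
}

// author-required is gate-OK but accumulates into the author's queue.
function authorQueue(rows) {
  return rows.filter(r => r.status === "author-required");
}
```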

Hard rules (load-bearing, venue-agnostic)

Never edit the manuscript without explicit author sign-off. Auto-mode carve-out: the rule HOLDS; auto satisfies it via UP-FRONT sign-off (the spine confirmation + the pre-authorized bounded-aggressive policy) plus the return queue, not per-edit sign-off. Nothing outside the authorized envelope is applied.
Reviewers / jurors are isolated. Fresh eyes per round: no cross-talk, no prior-round leakage, no sight of the ledger. Enforced by (a) what goes into each agent's prompt AND (b) an explicit ISOLATION instruction in every reviewer-type prompt telling the agent to judge only the quoted text and not read files (workflow agents have read tools and will otherwise sometimes roam).
A valid-fixable issue carries a close_criterion (one concrete sentence an edit must satisfy), set by the judge at trial; it is null at intake.
No leakage into the reviewed text. Revision logs, back-translations, and self-check verdicts are author-side aids; they never enter the manuscript or any frozen snapshot.
Disagreement resolves through discussion, then override (logged), never a silent dismissal.
No hardcoded paths or project files in the skill. Resolve at runtime.
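Hard rule 2 is enforced mostly by what goes into a reviewer's prompt. A hypothetical prompt builder makes the idea concrete: its signature has no parameter for the ledger, peer output, or prior rounds, so none of them can leak. The ISOLATION wording below is illustrative, not the skill's actual prompt text.

```javascript
// Hypothetical reviewer-prompt builder illustrating isolation by construction.
const ISOLATION =
  "ISOLATION: Judge only the text quoted below. Do not read files, and do not " +
  "consult other reviewers, prior rounds, or any ledger.";

function buildReviewerPrompt(persona, manuscriptText) {
  // Only the persona and the text under review are accepted as inputs.
  return [persona.trim(), ISOLATION, "<<<MANUSCRIPT", manuscriptText, "MANUSCRIPT>>>"].join("\n\n");
}
```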

Memory convention

At round start: recall the paper's conventions (house style, venue, persona tuning) from memory; read the resolved LEDGER.json for open issues.
During the round: the ledger is the only mutable truth; update it at merge, trial verdicts, recall, and close.
After the round: persist any newly learned stable convention to memory (e.g. a house-style rule a reviewer surfaced), not the transient issue state.

Maximizing it under ultracode

The fan-out engine implements the strong form directly (workflows/review-panel.workflow.js):

loop-until-dry: re-runs independent fresh panels and accumulates only issues not seen before, stopping after dryStop consecutive passes that add no surviving issue (hard cap maxRounds). Raises recall past a single pass.
adversarial verify: each new issue faces perspective-diverse skeptics (misreading / already-addressed / scope-or-severity) and is kept unless a majority refute it, filtering plausible-but-wrong issues before they reach the ledger. Bias is to keep, so real flaws are not lost.

Toggle via args: ultracode on -> defaults (maxRounds 4, dryStop 2, verify true); ultracode off -> pass {maxRounds:1, verify:false} for the basic single-panel form. The loop is budget-aware and stops early if the token budget runs low.
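Loop-until-dry with adversarial verify can be sketched as below. `runPanel` and the skeptic functions are injected stand-ins (hypothetical), issue identity is reduced to a plain `key` string, and the budget-aware early stop is omitted. The defaults mirror the ultracode-on values stated above.

```javascript
// Sketch of loop-until-dry + adversarial verify with injected stand-ins.
async function loopUntilDry(runPanel, skeptics, opts = {}) {
  const { maxRounds = 4, dryStop = 2, verify = true } = opts;
  const considered = new Set(); // every key ever proposed, kept or refuted
  const kept = new Map(); // key -> issue that survived verification
  let dryPasses = 0;
  for (let round = 1; round <= maxRounds && dryPasses < dryStop; round++) {
    const fresh = [];
    for (const issue of await runPanel(round)) {
      if (considered.has(issue.key)) continue; // accumulate only unseen issues
      considered.add(issue.key);
      fresh.push(issue);
    }
    let added = 0;
    for (const issue of fresh) {
      let refutes = 0;
      if (verify) {
        const votes = await Promise.all(skeptics.map(s => s(issue)));
        refutes = votes.filter(Boolean).length;
      }
      // Bias to keep: drop only when a strict majority of skeptics refute.
      if (refutes > skeptics.length / 2) continue;
      kept.set(issue.key, issue);
      added++;
    }
    // A pass that adds no surviving issue counts toward dryStop.
    dryPasses = added === 0 ? dryPasses + 1 : 0;
  }
  return [...kept.values()];
}
```

Passing `{ maxRounds: 1, verify: false }` collapses this to the basic single-panel form.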

Capabilities and status

Built: the review engine; the submission-readiness checker (deterministic desk-reject screening plus a real LaTeX compile, degrading to a structural lint when no toolchain is present); auto mode (the review-revise loop toward a goal under a drift-bounded policy, applying safe fixes and queueing risky ones for author review); and the significance floor (ledger.js floor gates the drafter to valid-fixable majors; the collapsed ledger view folds minors into a digest so trivia never floods the author's attention, and because the collapse is render-only, full detail stays in LEDGER.json). Roadmap: vision-based layout verification, automatic venue detection from the class file, and reviewer personas tuned to each venue community.
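The significance floor and the collapsed view can be sketched as two pure functions. Row fields (`id`, `significance`, `status`, `title`) and the digest wording are assumptions; the sketch never mutates its input, matching the render-only behavior described above.

```javascript
// Sketch of the significance floor and a render-only collapsed view.
function significanceFloor(rows) {
  // Only valid-fixable majors reach the drafter.
  return rows.filter(r => r.significance === "major" && r.status === "valid-fixable");
}

function renderCollapsed(rows) {
  const majors = rows.filter(r => r.significance === "major");
  const minors = rows.filter(r => r.significance !== "major");
  const lines = majors.map(r => `- [${r.status}] ${r.id}: ${r.title}`); // majors stay itemized
  const byStatus = {};
  for (const r of minors) byStatus[r.status] = (byStatus[r.status] || 0) + 1;
  const digest = Object.entries(byStatus).map(([s, n]) => `${n} ${s}`).join(", ");
  if (minors.length) lines.push(`Minor digest: ${minors.length} minor (${digest})`);
  return lines.join("\n");
}
```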

Related skills

ml-paper-writing: from-scratch drafting, citation verification (never hallucinate citations), conference checklists. This loop borrows its sentence-level guidance for the edit-drafting step rather than duplicating it.
academic-plotting: figure and architecture-diagram generation (out of scope here; this loop edits text and captions, not figure images).

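[!IMPORTANT] PaperJury is a pre-submission self-check tool. It does not replace the author's scientific judgment, and it does not replace peer review. It must not be used to invent experiments, fabricate results, add claims without supporting evidence, or hide a paper's limitations. Any issue that needs new experiments, lacks evidence, or depends on the author's private knowledge or research judgment is handed back to the author.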

🔥 News

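🎉 RedNote (Xiaohongshu) milestone: posts about PaperJury have reached 30k views and 1.8k saves. Thanks to everyone who shared and saved them, and to everyone recommending PaperJury to friends who are rushing or revising papers.
📄 2026-06-15: the PaperJury paper is on arXiv. arXiv page: PaperJury: Due-Process Review for Bounded LaTeX Revision (arXiv:2606.16322). The paper describes the review → verdict → revise → verify engine in detail: which tasks go to deterministic scripts and which judgments go to semantic agents, how contested issues enter deliberation, and which guardrails apply to edits at different risk levels.
🔔 2026-06-10: v1.0.0 released. This is the first stable release, aligned with v1.0 of the Codex edition. New: a soft update reminder that only notifies you when a new stable tag appears, without interrupting current work.
🚀 2026-06-05: the Codex edition of PaperJury has been published. Find it here: paperjury-codex.
🧪 Dogfood sample added. The repository includes a compact dogfood sample: before-and-after PDFs plus a manually verified run report.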

📌 Citing the paper

If PaperJury helps your research or writing workflow, you can cite this arXiv paper:

@misc{wang2026paperjurydueprocessreviewbounded,
  title={PaperJury: Due-Process Review for Bounded LaTeX Revision},
  author={Yiran Wang and Ruixuan An and Biao Wu and Wenhao Wang},
  year={2026},
  eprint={2606.16322},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2606.16322},
}

The same entry is also in CITATION.bib.

⚡ Quick start

Install in Claude Code:

/plugin marketplace add u7079256/paperjury
/plugin install paperjury@u7079256

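Then just state what you need inside your paper project:

Review the paper, focusing on whether the experiments and claims hold up.

Or something more everyday:

Tighten this introduction paragraph, but don't change the claim.

No commands to memorize. PaperJury picks direct-edit, review, or auto mode from your description, and before anything is written to the manuscript it shows you the patch for confirmation.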

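🤔 What is it?

PaperJury ships as a Claude Code skill and organizes pre-submission self-checking into a closed loop: review → verdict → revise → verify. Rather than accepting AI feedback wholesale, it first sorts each comment into one of three categories:

Outcome	Meaning
✅ Safe fix	Textual problems such as unclear wording, overclaiming, or awkward structure; no new experiments are needed and the original meaning is not distorted.
🧑‍💻 Author handles	Missing experiments, ablations, data, or evidence; the author must decide.
🛑 Invalid	The AI reviewer misread the paper or raised something that should not be changed.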

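🎯 Who it's for

Your situation	How to use it
📝 Just finished a first draft	Have it read the whole paper like a reviewer and find the issues most likely to hurt the submission.
🔍 Final pre-submission check	Have it check whether claims are overstated, whether the experiments support them, and whether there are obvious formatting risks.
✍️ Only want to edit one paragraph	Just say "tighten this paragraph, but don't change the claim"; it drafts a patch first and edits only after you confirm.
🔁 Want multi-round polishing without watching	Explicitly authorize auto mode; safe changes can be applied, while high-risk issues still go back to the author.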

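📦 What you get

Output	Contents
📋 Issue list	Each reviewer-style issue carries evidence, location, verdict, and current status; comments are never dumped straight into the manuscript.
🧩 Reviewable patches	Only safe fixes go into minimal patches; high-risk changes are held until the author decides.
🛠️ Verification report	With a LaTeX toolchain it really compiles; without one it states which checks could not run and never pretends to have verified them.
🧪 Real sample	`samples/dogfood/` holds before-and-after PDFs and a manually verified run report.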

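🧠 What it can do for you

Scenario	What PaperJury does
🔎 Finding problems before submission	Simulates several reviewers from different subfields reading the whole paper, finds the weaknesses a real reviewer is likely to catch, and separates fatal problems from minor touch-ups.
✍️ Safe LaTeX / Markdown edits	Drafts a patch only for the location you specify, self-checks it, then hands it to you for confirmation; a small edit never grows into a full rewrite.
🛡️ Checking formatting risks	When a local LaTeX toolchain is available, it really compiles and checks errors, undefined references, overfull boxes, page count, and common desk-reject risks; without a toolchain it says so.
🔁 Multi-round polishing	In auto mode, which you must explicitly authorize, it runs several review-revise-verify rounds; safe changes can be applied automatically, and high-risk issues are left to the author.

PaperJury's point is not "let the AI write more"; it is to have the AI first hunt for flaws as carefully as a reviewer, then use deterministic scripts to guard every boundary that can be verified.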

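🧭 Three modes

Mode	When to use	Behavior	Human confirmation
✍️ direct-edit (common)	You only want to change one piece of text, a caption, a LaTeX expression, or a paragraph's structure.	No review panel; drafts the patch directly with the writing toolkit.	Applied after the author confirms.
🔎 review (occasional)	You want a review, a hunt for problems, a mock-review, or a review of just one section / one claim.	Starts the adversarial review engine, which first judges whether each issue holds and then decides whether to edit.	Each change is confirmed individually.
🔁 auto (unattended)	You have explicitly given `/goal` or set `mode: auto` in config and want it to run multiple rounds toward a verifiable goal.	Confirms the `spine` and the reviewer assignment first, then iterates under the bounded-aggressive + edit-safety policy.	One up-front authorization; high-risk items still go back to the author.

In short: to change one thing, just say so; to get critiqued, say "review"; to run unattended, use /goal.

[!WARNING] Auto mode must be enabled explicitly. Merely granting tool permissions and sending an ordinary prompt runs a single round and stops; it does not enter the multi-round loop. See docs/AGENT-GUIDE.md §3 for why.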

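🧪 A real run

To see real output, the repository includes a dogfood sample: a full multi-round review run on a real draft, with before-and-after PDFs and a manually verified run report.

samples/dogfood/ (original_draft.pdf · revised_draft.pdf · run report)

If you only want to confirm that the manuscript won't be blocked by formatting problems first, you can say:

Run the submission-readiness / compliance check.

It runs a deterministic formatting screen, followed by compile-driven layout checks.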

🚀 Installation

Claude Code plugin

The marketplace route is recommended:

/plugin marketplace add u7079256/paperjury
/plugin install paperjury@u7079256

Clone as a skill

You can also clone the repository into the directory Claude Code reads skills from:

# macOS / Linux
git clone https://github.com/u7079256/paperjury ~/.claude/skills/paperjury

# Windows (PowerShell)
git clone https://github.com/u7079256/paperjury "$env:USERPROFILE\.claude\skills\paperjury"

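You can also place it under <project>/.claude/skills/ so it applies to a single project only.

After installing, it's worth checking:

Claude Code discovers it automatically through SKILL.md; the skill name is paperjury.
Node is required, because the deterministic checks run on Node.
A LaTeX toolchain is optional; real compilation and layout checks use it, and without one PaperJury states which checks could not run.
Run npm run doctor in the skill directory to check repository integrity, required tools, and manuscript detection.
At startup it performs one soft update check against GitHub stable release tags; a new version only triggers a notice and never blocks current work. Set PAPERJURY_DISABLE_UPDATE_CHECK=1 to turn the reminder off. Start a new session after updating.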

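Choosing between the Claude Code and Codex editions

Edition	Entry point	Best for
Claude Code edition	This repository; Claude Code plugin or `.claude/skills/`	You mainly write papers, edit LaTeX, and run workflows in Claude Code.
Codex edition	paperjury-codex	You mainly run the same review and revision flow in Codex / a Codex plugin environment.

For Claude / coding agents: deeper driving instructions are in docs/AGENT-GUIDE.md. It covers installation, the three modes and how to trigger them, the engine pipeline, the difference between auto and /goal, and how parallel reviews are launched.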

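Frequently asked questions

Can PaperJury review Word (.docx) files?

Yes. PaperJury converts the .docx to Markdown once and tells you explicitly what the conversion kept and what could not be carried over, such as complex tables and equations. It then runs the full multi-round review on that Markdown working copy. The original Word file is never modified. At the end you get back the edited Markdown and a per-edit change list; whether to merge it back into Word is up to you. You can also export the paper to .md or .tex first and hand that over directly.

Will it edit my paper without permission?

No. In direct-edit and review modes, a patch is applied only after you confirm it. Auto mode must also be enabled explicitly, and it first obtains your up-front authorization of the core direction, the revision scope, and the policy.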

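Going deeper

New users can skip this section. If you want to see the mechanics, the source layout, or how to drive agents, start here:

What you want to know	Where to look
Real run results	`samples/dogfood/RUN_REPORT.zh-CN.md`
How to drive Claude / coding agents	`docs/AGENT-GUIDE.md`
Engine design details	`docs/REVIEW_ENGINE_V3_DESIGN.md`
Full protocol and state machine	`references/review-engine-v3.md` · `references/ledger-schema.md`
Online visual explanation	Interactive overview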

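How the engine works

PaperJury splits reviewing into a bounded "courtroom" process: a limited number of reviewers first find issues, and contested comments are then brought to deliberation; the editing stage adds guardrails according to risk, and when the rounds end, deterministic scripts decide whether the loop has converged.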

assign-reviewers → reading-check → coverage-auditor → merge
  → { trial ‖ polish } → recall-audit → drafter
  → { edit-audit | meaning-audit } → clerk

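Everything that can be checked by script lives in scripts/ and is called by the orchestrator between workflow calls; questions that need semantic judgment go to mutually isolated model agents.

Manuscript decomposition: splits the manuscript (LaTeX or Markdown) into reading units, a canonical section list, and stable passage IDs, so issue anchors do not drift.
Word extraction: converts a .docx once into a Markdown working copy and reports what was preserved and what may not have carried over; the original Word file is not modified.
Core claims (auto mode only): extracts the core claims, which are frozen into config after the author confirms them.
Ledger: keeps active issues in a machine-readable record that carries across rounds and sessions. The tool side counts as done once no blocking major issue remains; author-required items go to the human to-do queue and do not count as unfinished on the tool side.
Journal: the edit history is append-only, which makes rollback easy.
Patch application: applies edits atomically, records them in the journal, and can restore when needed.
Anchor tracking: locates the frozen core claims and, when their context changes, flags what needs re-auditing.
Cross-reference check: before an edit, checks whether the changed keywords also appear elsewhere; if they do, the edit is flagged for semantic audit.
Passage realignment: at the end of each round, realigns passage IDs shifted by edits so issues don't lose their anchors.
Compile check: attempts a real LaTeX compile; when compilation is impossible it falls back to a structural check and states explicitly which results are unverifiable.
Submission compliance check: a scripted first pass over common submission-format risks.
Install self-check: npm run doctor checks repository integrity, required tools, and manuscript detection.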
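The journal and patch-application items above describe an atomic, append-only edit path. A minimal sketch, assuming a JSON-lines journal and an exact-text anchor; it is not the real scripts/apply-patch.js, and `applyPatchAtomic` / `revertLast` are hypothetical names.

```javascript
// Minimal sketch of atomic patch application with an append-only journal.
const fs = require("node:fs");
const path = require("node:path");

function applyPatchAtomic(file, oldText, newText, journalFile) {
  const before = fs.readFileSync(file, "utf8");
  const idx = before.indexOf(oldText);
  // Refuse unless the anchor occurs exactly once: never guess where to edit.
  if (oldText === "" || idx === -1 || before.indexOf(oldText, idx + 1) !== -1) {
    throw new Error("patch anchor missing or ambiguous");
  }
  const after = before.slice(0, idx) + newText + before.slice(idx + oldText.length);
  // Write a temp file in the same directory, then rename it over the original;
  // a same-filesystem rename replaces the file atomically on POSIX.
  const tmp = path.join(path.dirname(file), `.${path.basename(file)}.${process.pid}.tmp`);
  fs.writeFileSync(tmp, after);
  fs.renameSync(tmp, file);
  // Append-only journal entry with enough information to undo the edit.
  const entry = { file, at: idx, oldText, newText, ts: new Date().toISOString() };
  fs.appendFileSync(journalFile, JSON.stringify(entry) + "\n");
}

// Undo the most recent entry by applying the inverse patch; the journal only
// grows. Throws if the new text is no longer a unique anchor.
function revertLast(journalFile) {
  const lines = fs.readFileSync(journalFile, "utf8").trim().split("\n");
  const last = JSON.parse(lines[lines.length - 1]);
  applyPatchAtomic(last.file, last.newText, last.oldText, journalFile);
}
```

Recording a revert as a new entry, rather than truncating the log, keeps the history auditable.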

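Reviewer assignment: assigns N domain reviewers based on the paper's research area.
Full reading check: each holistic reviewer reads the whole paper once and lists weaknesses, verbatim quotes, an overall confidence, and per-section coverage; a reviewer who cannot quote the text is treated as not having read it.
Coverage audit: checks which reviewer / section combinations may have been skimmed.
Deduplication: merges duplicate comments and organizes significance, issue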