by u7079256
Pre-submission AI review stress-test for research papers. A Claude Code skill: review, verdict, revise, verify.
# Add to your Claude Code skills
git clone https://github.com/u7079256/paperjurypaperjury is an open-source ai agents skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by u7079256. Pre-submission AI review stress-test for research papers. A Claude Code skill: review, verdict, revise, verify. It has 225 GitHub stars.
paperjury's catalog security scan is still queued. You can run an instant dependency and prompt-injection check now with the "Scan for vulnerabilities" button above.
Clone the repository with "git clone https://github.com/u7079256/paperjury" and add it to your Claude Code skills directory (see the Installation section above). paperjury ships a SKILL.md manifest, so compatible agents can discover and load it automatically.
paperjury is primarily written in JavaScript. It is open-source under u7079256 on GitHub, so you can review or fork the full source.
Yes. SkillsLLM lists many other AI Agents skills you can browse and compare side by side. Open the AI Agents category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh paperjury against similar tools.
No comments yet. Be the first to share your thoughts!
Unlocks once the catalog security scan passes (runs nightly).
The deep catalog scan for this skill is still queued. Run an instant dependency check now instead.
PaperJury edits and hardens any CS-conference paper. It runs in
three modes. In direct-edit mode (the common case) the user describes a change
in Chinese or English and the LaTeX is edited directly through a CS-venue writing
toolkit, with author sign-off. In review mode (occasional, pre-submission) it
exposes the manuscript to a harsh, multi-perspective courtroom review engine that
adjudicates each issue (N holistic domain reviewers -> contestability routing ->
two-sided trial -> three-way verdict, with a polish track and a clerk-converged
multi-round loop), gates every change behind consensus, and tracks issues in a durable
ledger. In auto mode (unattended, opt-in via /goal) it runs that same engine
toward a verifiable goal, applying safe fixes under a drift-bounded policy and
queueing the risky ones for one human pass on return. All modes share the same
writing toolkit, hard rules, ledger, and author sign-off (auto via up-front policy
sign-off plus the queue, see hard rule 1).
This skill is fully generic. It ships no hardcoded paths, no project files, and no embedded paper. Everything specific to a given paper (where the manuscript is, the venue, who signs off, the house style) is resolved at runtime or supplied by a config the project owns. The skill itself is the backbone; any concrete paper is just an instantiation of it.
Scope: CS conferences only. Three venue families, each with its own style profile:
Three modes, one skill. Pick by what the user is asking for:
references/review-engine-v3.md)./goal (or config mode: auto)
to run the review-revise loop AFK toward a verifiable goal. Establish the spine
up front (the one human step), then the engine applies safe fixes under the
bounded-aggressive policy and queues the rest. The drafter input passes the
significance floor (node scripts/ledger.js floor: valid-fixable majors only) and
the ledger view is initialized collapsed (--display collapse: minors fold into a
Minor digest, majors stay itemized). See references/auto-mode.md.
Never self-detect auto; it is explicit only.Do NOT use for: writing a paper from scratch (use ml-paper-writing), figure or
diagram generation (use academic-plotting), or an official-venue rebuttal (this
is a pre-submission self-hardening loop, no score gate).
Soft update reminder: at the start of each PaperJury invocation, before choosing
the mode or editing a manuscript, run node scripts/check-update.js from the
skill root unless PAPERJURY_DISABLE_UPDATE_CHECK=1 is set. If it reports an
available update, show the notice once and continue. If the check is skipped,
silent, or cannot reach GitHub, continue without mentioning it; update checks are
never allowed to block review or editing.
This paradigm is expressed as Skill + Workflow + Memory. Each carries one concern; together they replace the heavy per-round file-and-flag machinery a hand-rolled version accumulates.
references/review-engine-v3.md, references/reviewer-personas.md,
references/writing-toolkit.md.workflows/review-panel.workflow.js; the v3 courtroom engine is
assign-reviewers -> reading-check -> coverage-auditor -> merge ->
{trial (+ escalate) || polish} -> recall-audit -> drafter ->
{edit-audit | meaning-audit} -> clerk. The DETERMINISTIC guards run
orchestrator-side via Bash between workflow calls (the Workflow sandbox has no fs):
scripts/ holds decompose, extract-docx, ledger, journal, apply-patch,
anchor-diff, cross-ref, spine, rekey, compile-guard, compliance-check
(plus doctor, the install/repo health check: npm run doctor). Build note: this harness
delivers a workflow's args as a JSON STRING, so every workflow parses it
defensively. Protocol + every orchestrator seam: references/review-engine-v3.md.LEDGER.json resolved at runtime = the machine source of truth,
plus a rendered LEDGER.md view; managed by scripts/ledger.js): the live,
mutable issue state across rounds and sessions. Schema + status state machine:
references/ledger-schema.md.The skill ships ZERO hardcoded paths or project files. On trigger it resolves each input by discovery first, then asking:
manuscript: detect the main source, then route it through the INTAKE FORMAT GATE by extension. Four routes, none silent:
.tex: the native LaTeX path. Detect the main source (the .tex with
\documentclass / \begin{document}, or the file the user names). If
several candidates, ask..md / .markdown / .txt: the native text path. The full multi-round
engine runs; compile checks are not applicable (compile-guard returns
compiled:null plus a markdown sanity lint, an honest UNKNOWN, never a
fake pass); LaTeX-only compliance checks are skipped and reported as
skipped_checks..docx: if a .paper-review/ working copy AND a ledger already exist,
REUSE them, never re-extract. If the sha256 of the docx no longer matches
the ledger's meta.original_sha256, STOP and ask: continue on the working
copy, or extract --force knowingly discarding the applied edits (an
explicit new-intake event). Otherwise run
node scripts/extract-docx.js extract <file.docx> (one time) and tell the
user explicitly: the original Word file is never modified; all rounds run
on .paper-review/<basename>.md (print the full working-copy path); they
get back the edited Markdown plus a per-edit change list; the extraction
report lists everything dropped or degraded. Write ledger meta
{manuscript: <working copy>, working_format: 'markdown', source_format: 'docx', original, original_sha256, extracted_at, extraction_report}. If the
report shows nonzero tracked-change counts, seed a round-1 author-required
ledger row ("manuscript contains unresolved tracked changes; accepted-all
for review")..doc, .pdf, .rtf, .odt, ...): explicitly
unsupported. Say so and suggest exporting .docx / .md / .tex; never
silently degrade.After intake, the working copy IS the manuscript for every rule and gate in this file (sign-off, spine freeze, round-0 baseline, edit safety, journal); the original uploaded file is permanently read-only.
venue_family: the user can name it, or an agent reads the class file to GUESS the family (e.g. a cvpr/iccv style, an acl style, a neurips/iclr style). There is no hardcoded venue list and no deterministic detector; if unclear, ask.
ledger: default to <manuscript-dir>/.paper-review/LEDGER.json (the machine
source of truth; scripts/ledger.js also renders a LEDGER.md view). Create if
absent, reuse if present. The user may point elsewhere.
author: ask who signs off on edits (default: the current user). Every edit needs explicit authorization.
personas: default to N domain-expert holistic reviewers assigned at runtime
(assign-reviewers, from the project gatekeeper core + a generated domain overlay);
the three generic lenses in references/reviewer-personas.md are the degrade
fallback. If the project defines its own named reviewer subagents, use them as
agentType; otherwise inline the persona prompts.
style_profile: start from the venue-family default; refine from any conventions recalled from memory or pinned in a project config.
A project MAY pin these by dropping a config in ITS OWN repo (see
configs/config-template.md for the shape). That file is owned by the project,
never by this skill. At round start, recall any pinned conventions from memory.
The user states a change in Chinese or English; you draft and apply the LaTeX edit. No panel, no ledger, no discussion. Minimal flow:
.docx: if a working copy
already exists, it IS the manuscript, edit it; if none exists, offer an
explicit choice between (a) paste-back, returning the rewritten passage as
text for the user to apply in Word (no working copy), and (b) running the
one-time intake extraction and editing the working copy. Never edit the
.docx file itself.translate-to-english for a Chinese idea, polish-english / de-ai for a
rewrite, compress / expand for length, caption / experiment-analysis
for those units) and draft the patch to do exactly what was asked. The Common
guards apply (markup-safe for the working format, plain CS prose, no log
leakage into the manuscript).logic-check on the drafted passage.This is the writing toolkit used on its own. Escalate to review mode only when the user wants the paper critiqued or hardened, not for a single asked-for edit.
The reviewer panel and the trial jury are pure fan-out: spawn, collect, merge. A Workflow does this deterministically (parallelism enforced by construction, structured outputs via schema, isolation by default since each agent sees only the prompt you give it). That isolation is what replaces the snapshot-and-whitelist defense: a reviewer cannot see peers, the ledger, or prior rounds because you simply do not put them in its prompt.
But the loop has genuine human gates (the author reviews the issue list, gives per-issue direction, authorizes edits, breaks ties). Workflows run to completion and return a result; they do not pause mid-run for hours of human input. So:
The full adversarial loop (the v3 courtroom engine). Use it to harden the paper, not
for a single asked-for edit (that is direct-edit mode). Full protocol + the 14
orchestrator seams: references/review-engine-v3.md. [WF] = Workflow step,
[det] = deterministic Node guard run orchestrator-side between workflow calls,
[HUMAN] = author gate, [LEDGER] = state write.
full (whole paper) or passage (one section / para / claim).[det] decompose. Split the manuscript into reading units + stable
passage_ids + the canonical section list.[WF] assign-reviewers + [HUMAN] confirm. Name N subfields (2-4,
default 3); instantiate N holistic domain reviewers from the gatekeeper core + a
generated overlay. An unconfirmable slot degrades per slot to a generic gatekeeper
(the three generic lenses in reviewer-personas.md are the fallback). The author
confirms the assignment (or pins it via config).[WF] reading-check. Each reviewer reads the WHOLE paper → weaknesses
{significance(major|minor), kind(mechanical|substantive), verbatim quote —
cannot quote = did not read} + one overall_confidence + a per-section coverage
report. Anti-skim is three layers: [det] per-section quote-verify, [WF]
coverage-auditor, [WF] targeted re-invoke.[WF] merge. Semantic dedup across reviewers; derive significance (MAX) /
kind (substantive-dominates) / corroboration. [LEDGER] intake as raised.[det] route. mechanical → polish; substantive&minor → polish;
substantive&major → trial (two parallel tracks).[WF] trial. Per substantive-major charge: a whole-paper DEFENSE → 5
decorrelated local-context jurors (+ on-demand expansion) → a deterministic verdict
(decide iff quorum surviving >= ceil(0.8*jurySize) AND one side > 60% of
surviving votes; else escalate to 12). Verdict ∈ {invalid-drop, valid-fixable,
author-required, escalate}; the judge sets a close_criterion ONLY for a
valid-fixable charge, satisfiable by editing existing text (no new data). [WF]
polish runs the off-gate mechanical/minor track in parallel (never silently dropped).[WF] recall-audit. Mode A revives wrongly-dropped charges; Mode B spot-checks
strong-consensus majors BEFORE the edit. Runs before the drafter.[HUMAN] Authorize + [WF] drafter + edit-safety. On authorization, the
drafter writes the minimal patch per surviving valid-fixable. The edit-safety chain
gates it: [det] anchor-diff + cross-ref → [WF] meaning-audit (frozen anchor,
four-state) / edit-audit (risky non-anchor); [det] apply-patch + compile-guard land
a passing patch and [LEDGER] mark closed; a drift / anchor / failed edit is
reverted and queued. Revision logs / back-translations stay author-side.[WF] clerk + report. The clerk reconciles the round boundary (carried
open-questions vs this round's edits, via a passage_id + similarity merge key) and
emits convergence counts. Summarize new/closed counts with the minor/polish part
as a one-line digest (counts), never per-item paragraphs; in review mode do not
auto-start the next round (auto mode drives the outer loop via /goal). The
rendered LEDGER.md obeys meta.display_mode (flip anytime:
node scripts/ledger.js mode <ledger.json> <show|collapse>; review defaults to
the flat table, auto initializes collapsed). At round end run
node scripts/rekey.js <working file> <ledger> <journal> to re-link open rows
whose passage_id no longer resolves after this round's edits (both formats).GATE: node scripts/ledger.js gate = 0 gate-blocking active major (gate-blocking =
{raised, in-trial, re-trial, valid-fixable}; author-required / queued / dropped /
closed are gate-OK and author-required accumulates to the queue). Full protocol +
ledger schema + status machine: references/review-engine-v3.md,
references/ledger-schema.md. The legacy single-pass 3-reviewer panel
(workflows/review-panel.workflow.js, the discussion-mode flow in
references/methodology.md) is kept only as a quick check.
close_criterion (one concrete sentence an
edit must satisfy), set by the judge at trial; it is null at intake.LEDGER.json for open issues.The fan-out engine implements the strong form directly
(workflows/review-panel.workflow.js):
dryStop consecutive passes that add no
surviving issue (hard cap maxRounds). Raises recall past a single pass.Toggle via args: ultracode on -> defaults (maxRounds 4, dryStop 2,
verify true); ultracode off -> pass {maxRounds:1, verify:false} for the basic
single-panel form. The loop is budget-aware and stops early if the token budget
runs low.
Built: the review engine; the submission-readiness checker (deterministic desk-reject screening plus a real LaTeX compile, degrading to a structural lint when no toolchain is present); auto mode (the review-revise loop toward a goal under a drift-bounded policy, applying safe fixes and queueing risky ones for author review); and the significance floor (ledger.js floor gates the drafter to valid-fixable majors; the collapsed ledger view folds minors into a digest so trivia never floods the author's attention -- render-only, full detail kept in LEDGER.json). Roadmap: vision-based layout verification, automatic venue detection from the class file, and reviewer personas tuned to each venue community.
ml-paper-writing: from-scratch drafting, citation verification (never
hallucinate citations), conference checklists. This loop borrows its
sentence-level guidance for the edit-drafting step rather than duplicating it.academic-plotting: figure and architecture-diagram generation (out of scope
here; this loop edits text and captions, not figure images).English · 中文
A pre-submission AI review stress-test for research papers.
Before a reviewer tears it apart, let a jury do it first.
PaperJury turns paper feedback into a closed loop: review → verdict → revise → verify. Instead of taking every AI suggestion at face value, it sorts each issue into one of three outcomes:
It offers three modes: direct-edit, review, and auto. PaperJury is built for pre-submission self-checking. It does not replace peer review, it does not invent missing experiments, and it keeps research-level decisions with the author.
Interactive overview: the live site (GitHub Pages), or docs/overview.html in-repo.
📄 2026-06-12: The PaperJury paper is out. Read the preprint: PaperJury: Due-Process Review for Bounded LaTeX Revision — the full review → verdict → revise → verify engine written up as a paper: the deterministic-vs-semantic split, contestability routing, the due-process trial, and risk-proportional edit guards.
🔔 2026-06-10: v1.0.0 released. First stable release, aligned with the Codex port's v1.0. Adds a non-blocking update reminder that points to the latest stable release when a newer tag exists.
🚀 2026-06-05: PaperJury's Codex-first port has shipped. Open it here: paperjury-codex.
🧪 Dogfood sample added: this repo now includes a compact dogfood sample with before/after PDFs and a human-verified run report.
PaperJury is a pre-submission self-check workflow. It does not replace the author's scientific judgment, and it does not replace peer review. It should never be used to invent experiments, fabricate results, add unsupported claims, or hide a paper's limitations.
When an issue needs a new experiment, missing evidence, private knowledge, or a research-level decision, PaperJury routes it to the author instead of patching it automatically. The Fixable / Author-required / Invalid outcomes exist precisely so that judgment calls stay with you.
The intended use is to surface avoidable problems earlier, while you can still act on them: unclear claims, weak logical connections, unsupported wording, formatting risks, and the kind of reviewer-style concerns worth checking before submission.
It is a Claude Code skill, installable two ways. For the Codex-first port, use paperjury-codex.
Option A: Claude Code plugin (marketplace route). From inside Claude Code:
/plugin marketplace add u7079256/paperjury
/plugin install paperjury@u7079256
Option B: clone as a skill. Clone the repo into the folder Claude Code reads skills from:
# macOS / Linux
git clone https://github.com/u7079256/paperjury ~/.claude/skills/paperjury
# Windows (PowerShell)
git clone https://github.com/u7079256/paperjury "$env:USERPROFILE\.claude\skills\paperjury"
(or under <project>/.claude/skills/ to scope it to one project). Claude Code auto-discovers it through SKILL.md and it shows up as the paperjury skill. node is required (the deterministic checks run on it); a LaTeX toolchain is optional (the real-compile and layout checks use it, and degrade honestly when it is absent). To verify an install, run npm run doctor from the skill folder: it checks repo integrity, required tools, and that your manuscript can be detected.
At the start of a PaperJury run, the skill performs a soft update check against stable GitHub release tags. If a newer tag exists, it prints how to update (re-run the plugin install, or git pull for clone installs); if GitHub is unreachable, it stays silent and continues. Set PAPERJURY_DISABLE_UPDATE_CHECK=1 to disable this reminder. After updating, start a new session so the updated skill content is loaded.
For Claude / coding agents: the deep "how to drive this" reference is docs/AGENT-GUIDE.md: install, the three modes and their triggers, the engine pipeline, the auto vs /goal distinction, and how the fan-out launches, written for an agent to read. Curious about the internals? Just point Claude at that file and ask.
Most writing tools only push your paper forward: they draft and they polish. None of them argues the other side of your claims the way a reviewer will. PaperJury is built around that gap, in four parts.
references/review-engine-v3.md).full (whole paper) or passage (one section / paragraph / claim)./goal (or config mode: auto) to run the review-revise loop unattended toward a verifiable goal./goal context or a project config mode: auto.spine and the reviewer assignment up front (the human steps), then the engine applies safe fixes under the bounded-aggressive + edit-safety policy, queues the rest, and runs multiple rounds until it stops: on clerk convergence, or an applied-quiescence / hard-limit backstop. See references/auto-mode.md.You don't run commands; you say what you want and the skill picks the mode.
Edit one thing (the everyday case → direct-edit):
<your idea>."Get the paper critiqued before submission (→ review):
<the claim you paste>."Harden it unattended toward a goal (→ auto, needs /goal):