by llmsresearch
Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation.
# Add to your Claude Code skills
git clone https://github.com/llmsresearch/paperbanana

Disclaimer: This is an unofficial, community-driven open-source implementation of the paper "PaperBanana: Automating Academic Illustration for AI Scientists" by Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, and Jinsung Yoon (arXiv:2601.23265). This project is not affiliated with or endorsed by the original authors or Google Research. The implementation is based on the publicly available paper and may differ from the original system.
An agentic framework for generating publication-quality academic diagrams and statistical plots from text descriptions. Supports OpenAI (GPT-5.2 + GPT-Image-1.5), Azure OpenAI / Foundry, and Google Gemini providers.
- paperbanana plot-batch runs many statistical plots from one manifest (CSV/JSON per item)
- PDF input (paperbanana[pdf] / PyMuPDF), with per-page selection
- Gradio Studio (paperbanana studio) for diagrams, plots, evaluation, batch, and a run browser
- Slash commands: /generate-diagram, /generate-plot, and /evaluate-diagram
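The plot-batch manifest schema is not spelled out in this README, so the item fields below (data, caption) are assumptions sketched purely for illustration; check the repository docs for the real format:

```json
[
  { "data": "results/ablation.csv", "caption": "Accuracy vs. model size across ablations" },
  { "data": "results/latency.json", "caption": "P95 latency per provider" }
]
```

Each item would then produce one statistical plot in the batch output folder.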
Install from PyPI:

pip install paperbanana
Or install from source for development:
git clone https://github.com/llmsresearch/paperbanana.git
cd paperbanana
pip install -e ".[dev,openai,google]"
cp .env.example .env
# Edit .env and add your API key:
# OPENAI_API_KEY=your-key-here
# GOOGLE_API_KEY=your-key-here
#
# For Azure OpenAI / Foundry:
# OPENAI_BASE_URL=https://<resource>.openai.azure.com/openai/v1
#
# Optional Gemini overrides:
# GOOGLE_BASE_URL=https://your-gemini-proxy.example.com
# GOOGLE_VLM_MODEL=gemini-2.0-flash
# GOOGLE_IMAGE_MODEL=gemini-3-pro-image-preview
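The .env file above is plain KEY=value lines with # comments. As a sketch only (PaperBanana itself presumably loads it via a dotenv library, not this code), a minimal hand-rolled loader looks like:

```python
def load_env(path):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # blank line or comment
            key, _, value = line.partition("=")  # split at the first '='
            env[key.strip()] = value.strip()
    return env
```

Calling `os.environ.update(load_env(".env"))` would then hydrate the current process.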
Or use the setup wizard for Gemini:
paperbanana setup
paperbanana generate \
--input examples/sample_inputs/transformer_method.txt \
--caption "Overview of our encoder-decoder architecture with sparse routing"
With input optimization and auto-refine:
paperbanana generate \
--input my_method.txt \
--caption "Overview of our encoder-decoder framework" \
--optimize --auto
Output is saved to outputs/run_<timestamp>/final_output.png along with all intermediate iterations and metadata.
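Because run folders are timestamped, they sort chronologically by name. Assuming only the outputs/run_&lt;timestamp&gt; layout described above, a small helper (not part of the CLI) can locate the most recent run:

```python
from pathlib import Path

def latest_run(outputs_dir="outputs"):
    """Return the newest run_* folder under outputs_dir, or None if none exist."""
    # Timestamped names like run_20260218_125448_e7b876 sort chronologically.
    runs = sorted(p for p in Path(outputs_dir).glob("run_*") if p.is_dir())
    return runs[-1] if runs else None
```

For example, `latest_run() / "final_output.png"` then points at the newest image.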
Install the optional Gradio dependency, then start the app:
pip install 'paperbanana[studio]'
paperbanana studio
Open the URL shown in the terminal (default http://127.0.0.1:7860/). The Studio exposes the same workflows as the CLI: methodology diagrams, statistical plots, comparative evaluation, continuing a prior run, batch manifests (methodology or plot batch via the Batch tab), and a simple browser for run_* / batch_* output folders. Use --host, --port, --config, and --output-dir as needed.
PaperBanana implements a multi-agent pipeline with up to 7 specialized agents:
Phase 0 -- Input Optimization (optional, --optimize):
Phase 1 -- Linear Planning:
Phase 2 -- Iterative Refinement (loops until the critic is satisfied with --auto)

PaperBanana supports multiple VLM and image generation providers:
| Component | Provider | Model | Notes |
|-----------|----------|-------|-------|
| VLM (planning, critique) | OpenAI | gpt-5.2 | Default |
| Image Generation | OpenAI | gpt-image-1.5 | Default |
| VLM | Google Gemini | gemini-2.0-flash | Free tier |
| Image Generation | Google Gemini | gemini-3-pro-image-preview | Free tier |
| VLM / Image | OpenRouter | Any supported model | Flexible routing |
Azure OpenAI / Foundry endpoints are auto-detected — set OPENAI_BASE_URL to your endpoint.
Gemini-compatible gateways are also supported — set GOOGLE_BASE_URL when needed.
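These choices can also live in the YAML file passed via --config. The real schema is in configs/config.yaml; the field names below are illustrative assumptions only, mirroring the CLI flags:

```yaml
# Illustrative sketch -- consult configs/config.yaml for the actual schema
vlm:
  provider: google          # openai | google | openrouter
  model: gemini-2.0-flash
image:
  provider: openai_imagen
  model: gpt-image-1.5
```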
paperbanana generate -- Methodology Diagrams

# Basic generation
paperbanana generate \
--input method.txt \
--caption "Overview of our framework"
# With input optimization and auto-refine
paperbanana generate \
--input method.txt \
--caption "Overview of our framework" \
--optimize --auto
# Continue the latest run with user feedback
paperbanana generate --continue \
--feedback "Make arrows thicker and colors more distinct"
# Continue a specific run
paperbanana generate --continue-run run_20260218_125448_e7b876 \
--iterations 3
# PDF as input (install PyMuPDF: pip install 'paperbanana[pdf]')
paperbanana generate \
--input paper.pdf \
--caption "Overview of our method" \
--pdf-pages "3-8"
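The --pdf-pages value is a 1-based spec such as 1-5 or 2,4,6-8. How the CLI parses it isn't shown here, but a sketch of expanding such a spec (not necessarily PaperBanana's own parser) is:

```python
def parse_pages(spec):
    """Expand a 1-based page spec like '2,4,6-8' into a sorted list of pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-", 1)           # inclusive range, e.g. "6-8"
            pages.update(range(int(lo), int(hi) + 1))
        elif part:
            pages.add(int(part))                  # single page, e.g. "4"
    return sorted(pages)
```

For example, parse_pages("2,4,6-8") yields [2, 4, 6, 7, 8].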
| Flag | Short | Description |
|------|-------|-------------|
| --input | -i | Path to methodology text file or PDF (required for new runs) |
| --caption | -c | Figure caption / communicative intent (required for new runs) |
| --output | -o | Output image path (default: auto-generated in outputs/) |
| --iterations | -n | Number of Visualizer-Critic refinement rounds (default: 3) |
| --auto | | Loop until critic is satisfied (with --max-iterations safety cap) |
| --max-iterations | | Safety cap for --auto mode (default: 30) |
| --optimize | | Preprocess inputs with parallel context enrichment and caption sharpening |
| --continue | | Continue from the latest run in outputs/ |
| --continue-run | | Continue from a specific run ID |
| --feedback | | User feedback for the critic when continuing a run |
| --pdf-pages | | PDF input only: 1-based pages (e.g. 1-5, 2,4,6-8; default: all) |
| --vlm-provider | | VLM provider name (default: openai) |
| --vlm-model | | VLM model name (default: gpt-5.2) |
| --image-provider | | Image gen provider (default: openai_imagen) |
| --image-model | | Image gen model (default: gpt-image-1.5) |
| --format | -f | Output format: png, jpeg, or webp (default: png) |
| --config | | Path to YAML config file (see configs/config.yaml) |
| --verbose | -v | Show detailed agent pro