by llmsresearch
Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation.
# Add to your Claude Code skills
git clone https://github.com/llmsresearch/paperbanana

Disclaimer: This is an unofficial, community-driven open-source implementation of the paper "PaperBanana: Automating Academic Illustration for AI Scientists" by Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, and Jinsung Yoon (arXiv:2601.23265). This project is not affiliated with or endorsed by the original authors or Google Research. The implementation is based on the publicly available paper and may differ from the original system.
An agentic framework for generating publication-quality academic diagrams and statistical plots from text descriptions. Supports OpenAI (GPT-5.2 + GPT-Image-1.5), Azure OpenAI / Foundry, and Google Gemini providers.
- paperbanana plot-batch runs many statistical plots from one manifest (CSV/JSON per item)
- PDF input (paperbanana[pdf] / PyMuPDF), with per-page selection
- Gradio Studio (paperbanana studio) for diagrams, plots, evaluation, batch, and a run browser
- Slash commands: /generate-diagram, /generate-plot, and /evaluate-diagram
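The plot-batch manifest schema is not spelled out in this README, so the item fields below (data, caption) are assumptions sketched purely for illustration; check the repository docs for the real format:

```json
[
  { "data": "results/ablation.csv", "caption": "Accuracy vs. model size across ablations" },
  { "data": "results/latency.json", "caption": "P95 latency per provider" }
]
```

Each item would then produce one statistical plot in the batch output folder.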
Install from PyPI:

pip install paperbanana
Or install from source for development:
git clone https://github.com/llmsresearch/paperbanana.git
cd paperbanana
pip install -e ".[dev,openai,google]"
cp .env.example .env
# Edit .env and add your API key:
# OPENAI_API_KEY=your-key-here
# GOOGLE_API_KEY=your-key-here
#
# For Azure OpenAI / Foundry:
# OPENAI_BASE_URL=https://<resource>.openai.azure.com/openai/v1
#
# Optional Gemini overrides:
# GOOGLE_BASE_URL=https://your-gemini-proxy.example.com
# GOOGLE_VLM_MODEL=gemini-2.0-flash
# GOOGLE_IMAGE_MODEL=gemini-3-pro-image-preview
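The .env file above is plain KEY=value lines with # comments. As a sketch only (PaperBanana itself presumably loads it via a dotenv library, not this code), a minimal hand-rolled loader looks like:

```python
def load_env(path):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # blank line or comment
            key, _, value = line.partition("=")  # split at the first '='
            env[key.strip()] = value.strip()
    return env
```

Calling `os.environ.update(load_env(".env"))` would then hydrate the current process.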
Or use the setup wizard for Gemini:
paperbanana setup
paperbanana generate \
--input examples/sample_inputs/transformer_method.txt \
--caption "Overview of our encoder-decoder architecture with sparse routing"
With input optimization and auto-refine:
paperbanana generate \
--input my_method.txt \
--caption "Overview of our encoder-decoder framework" \
--optimize --auto
Output is saved to outputs/run_<timestamp>/final_output.png along with all intermediate iterations and metadata.
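Because run folders are timestamped, they sort chronologically by name. Assuming only the outputs/run_&lt;timestamp&gt; layout described above, a small helper (not part of the CLI) can locate the most recent run:

```python
from pathlib import Path

def latest_run(outputs_dir="outputs"):
    """Return the newest run_* folder under outputs_dir, or None if none exist."""
    # Timestamped names like run_20260218_125448_e7b876 sort chronologically.
    runs = sorted(p for p in Path(outputs_dir).glob("run_*") if p.is_dir())
    return runs[-1] if runs else None
```

For example, `latest_run() / "final_output.png"` then points at the newest image.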
Install the optional Gradio dependency, then start the app:
pip install 'paperbanana[studio]'
paperbanana studio
Open the URL shown in the terminal (default http://127.0.0.1:7860/). The Studio exposes the same workflows as the CLI: methodology diagrams, statistical plots, comparative evaluation, continuing a prior run, batch manifests (methodology or plot batch via the Batch tab), and a simple browser for run_* / batch_* output folders. Use --host, --port, --config, and --output-dir as needed.
PaperBanana implements a multi-agent pipeline with up to 7 specialized agents:
Phase 0 -- Input Optimization (optional, --optimize):
Phase 1 -- Linear Planning:
Phase 2 -- Iterative Refinement (loops until the critic is satisfied with --auto)

PaperBanana supports multiple VLM and image generation providers:
| Component | Provider | Model | Notes |
|-----------|----------|-------|-------|
| VLM (planning, critique) | OpenAI | gpt-5.2 | Default |
| Image Generation | OpenAI | gpt-image-1.5 | Default |
| VLM | Google Gemini | gemini-2.0-flash | Free tier |
| Image Generation | Google Gemini | gemini-3-pro-image-preview | Free tier |
| VLM / Image | OpenRouter | Any supported model | Flexible routing |
Azure OpenAI / Foundry endpoints are auto-detected — set OPENAI_BASE_URL to your endpoint.
Gemini-compatible gateways are also supported — set GOOGLE_BASE_URL when needed.
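These choices can also live in the YAML file passed via --config. The real schema is in configs/config.yaml; the field names below are illustrative assumptions only, mirroring the CLI flags:

```yaml
# Illustrative sketch -- consult configs/config.yaml for the actual schema
vlm:
  provider: google          # openai | google | openrouter
  model: gemini-2.0-flash
image:
  provider: openai_imagen
  model: gpt-image-1.5
```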
paperbanana generate -- Methodology Diagrams

# Basic generation
paperbanana generate \
--input method.txt \
--caption "Overview of our framework"
# With input optimization and auto-refine
paperbanana generate \
--input method.txt \
--caption "Overview of our framework" \
--optimize --auto
# Continue the latest run with user feedback
paperbanana generate --continue \
--feedback "Make arrows thicker and colors more distinct"
# Continue a specific run
paperbanana generate --continue-run run_20260218_125448_e7b876 \
--iterations 3
# PDF as input (install PyMuPDF: pip install 'paperbanana[pdf]')
paperbanana generate \
--input paper.pdf \
--caption "Overview of our method" \
--pdf-pages "3-8"
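The --pdf-pages value is a 1-based spec such as 1-5 or 2,4,6-8. How the CLI parses it isn't shown here, but a sketch of expanding such a spec (not necessarily PaperBanana's own parser) is:

```python
def parse_pages(spec):
    """Expand a 1-based page spec like '2,4,6-8' into a sorted list of pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-", 1)           # inclusive range, e.g. "6-8"
            pages.update(range(int(lo), int(hi) + 1))
        elif part:
            pages.add(int(part))                  # single page, e.g. "4"
    return sorted(pages)
```

For example, parse_pages("2,4,6-8") yields [2, 4, 6, 7, 8].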
| Flag | Short | Description |
|------|-------|-------------|
| --input | -i | Path to methodology text file or PDF (required for new runs) |
| --caption | -c | Figure caption / communicative intent (required for new runs) |
| --output | -o | Output image path (default: auto-generated in outputs/) |
| --iterations | -n | Number of Visualizer-Critic refinement rounds (default: 3) |
| --auto | | Loop until critic is satisfied (with --max-iterations safety cap) |
| --max-iterations | | Safety cap for --auto mode (default: 30) |
| --optimize | | Preprocess inputs with parallel context enrichment and caption sharpening |
| --continue | | Continue from the latest run in outputs/ |
| --continue-run | | Continue from a specific run ID |
| --feedback | | User feedback for the critic when continuing a run |
| --pdf-pages | | PDF input only: 1-based pages (e.g. 1-5, 2,4,6-8; default: all) |
| --vlm-provider | | VLM provider name (default: openai) |
| --vlm-model | | VLM model name (default: gpt-5.2) |
| --image-provider | | Image gen provider (default: openai_imagen) |
| --image-model | | Image gen model (default: gpt-image-1.5) |
| --format | -f | Output format: png, jpeg, or webp (default: png) |
| --config | | Path to YAML config file (see configs/config.yaml) |
| --verbose | -v | Show detailed agent pro