mcp-image

Name: mcp-image
Author: shinpr

Verified

MCP server for AI image generation and editing with automatic prompt optimization and quality presets. Powered by Gemini (Nano Banana 2 & Pro), with optional OpenAI GPT Image support.

139 stars

23 forks

TypeScript

Installation

# Add to your Claude Code skills
git clone https://github.com/shinpr/mcp-image

Security Report: Verified

Last scanned: 5/30/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-30T16:32:17.089Z",
  "npmAuditRan": true,
  "pipAuditRan": true
}

README.md

Frequently Asked Questions

What is mcp-image?

mcp-image is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by shinpr. It is an MCP server for AI image generation and editing with automatic prompt optimization and quality presets, powered by Gemini (Nano Banana 2 & Pro) with optional OpenAI GPT Image support. It has 139 GitHub stars.

Is mcp-image safe to use?

Yes. mcp-image passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install mcp-image?

Clone the repository with "git clone https://github.com/shinpr/mcp-image" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is mcp-image written in?

mcp-image is primarily written in TypeScript. It is open-source under shinpr on GitHub, so you can review or fork the full source.

MCP Image Generator 🍌

AI image generation and editing MCP server for Cursor, Claude Code, Codex, and any MCP-compatible tool — powered by Nano Banana 2 and Nano Banana Pro (Google Gemini), with optional OpenAI GPT Image support.

An MCP server that turns simple text prompts into high-quality images. Unlike a simple API wrapper, this server automatically enhances your prompt and configures sensible defaults for generation — you don't need to learn prompt engineering or tune settings. Just describe what you want.

How It Works

You: "cat on a roof"
        ↓
  Your AI assistant infers context
  (purpose, style, mood, resolution...)
        ↓
  MCP optimizes your prompt
  (adds lighting, composition, atmosphere, artistic details)
        ↓
  Image generation with smart defaults
  (grounding, consistency, resolution — all configured automatically)
        ↓
  High-quality image, zero effort

Your AI assistant interprets your intent: the style, purpose, and context behind your request. The MCP server then focuses on output quality by refining the prompt to meet a structured visual clarity standard and selecting appropriate generation settings.

The prompt optimizer uses a Subject–Context–Style framework (powered by Gemini 2.5 Flash by default, or OpenAI Responses when IMAGE_PROVIDER=openai) to fill in missing visual details — subject characteristics, environment, lighting, camera work — while preserving your original intent. It doesn't blindly add details: prompts that already meet the quality standard are left largely intact.

Example — what the optimizer does to a short prompt:

Input: "cat on a roof"

After optimization: "A sleek, midnight black cat, perched with poised elegance on the apex of a weathered, terracotta tile roof. Its emerald eyes, narrowed slightly, reflect the warm glow of a setting sun. Each individual tile is distinct, showing subtle variations in color and texture, with patches of moss clinging to the crevices. The cat's fur is sharply defined, catching the golden hour light, highlighting its sleek contours. In the background, the silhouettes of distant, old-world city buildings with ornate spires are softly blurred, bathed in a gradient of fiery orange, soft pink, and deep violet twilight. A gentle, ethereal mist begins to rise from the alleyways below, adding a touch of mystery. The composition is a medium shot, taken from a slightly low angle, emphasizing the cat's commanding presence against the vast sky. Photorealistic style, captured with a prime lens, wide aperture to create a beautiful bokeh, enhancing the depth of field."
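The Subject–Context–Style idea can be illustrated with a small sketch. This is not the server's internal code: the `PromptParts` shape and the `missingParts` and `composePrompt` helpers are hypothetical names introduced here, and the real optimizer delegates the filling-in to a language model rather than to string joins. The sketch only shows how a prompt can be viewed as three parts and how gaps in it can be detected.

```typescript
// Hypothetical shape for a Subject–Context–Style prompt (an assumption,
// not a type exported by mcp-image).
interface PromptParts {
  subject: string;  // what the image shows
  context?: string; // environment, lighting, mood
  style?: string;   // medium, camera work, rendering style
}

// Returns the names of the optional parts that are still empty,
// i.e. the details an optimizer would need to fill in.
function missingParts(parts: PromptParts): string[] {
  const missing: string[] = [];
  if (!parts.context?.trim()) missing.push("context");
  if (!parts.style?.trim()) missing.push("style");
  return missing;
}

// Joins whichever parts are present into a single prompt string.
function composePrompt(parts: PromptParts): string {
  return [parts.subject, parts.context, parts.style]
    .map((p) => p?.trim())
    .filter((p): p is string => Boolean(p))
    .join(". ");
}

const draft: PromptParts = { subject: "A black cat on a terracotta tile roof" };
console.log(missingParts(draft));
console.log(
  composePrompt({
    ...draft,
    context: "golden hour light, distant city skyline",
    style: "photorealistic, medium shot from a low angle",
  }),
);
```

A prompt for which `missingParts` returns an empty list corresponds to the case described above, where an already complete prompt is left largely intact.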

Features

Built-in Prompt Optimization: Your simple prompt is automatically enriched with photographic and artistic details — lighting, composition, atmosphere — using Gemini 2.5 Flash by default, or OpenAI Responses when IMAGE_PROVIDER=openai. No prompt engineering skills required.
Optional OpenAI Provider: Set IMAGE_PROVIDER=openai to generate and edit images with OpenAI GPT Image models such as gpt-image-2.
Three Quality Tiers: Choose between fast iteration, balanced quality, or maximum fidelity with Nano Banana 2 (Gemini 3.1 Flash Image) and Nano Banana Pro (Gemini 3 Pro Image). See Quality Presets.
Image Editing: Transform existing images with natural language instructions (image-to-image) while preserving original style and visual consistency.
High-Resolution Output: Up to 4K image generation for professional-grade output with superior text rendering and fine details.
Flexible Aspect Ratios: From square (1:1) to ultra-wide (21:9) and ultra-tall (1:8) formats.
Character Consistency: Maintain consistent character appearance across multiple generations — ideal for storyboards, product shots, and visual series.
Advanced Capabilities:
- Google Search grounding for real-time factual accuracy
- World knowledge for photorealistic depictions of historical figures, landmarks, and factual scenarios
- Multi-image blending for composite scenes
- Purpose-aware generation (e.g., "cookbook cover" produces different results than "social media post")
Multiple Output Formats: PNG, JPEG, WebP support.

Agent Skill: Image Generation Prompt Guide

This project also provides a standalone Agent Skill (SKILL.md) that teaches AI assistants to write better image generation prompts — no MCP server or API key required.

Note: This skill does not generate images itself. It teaches your AI assistant to write better prompts for tools that already have built-in image generation (e.g., Cursor's native image generation).

Based on the Subject-Context-Style framework, covering prompt structure, visual details (lighting, textures, camera angles), advanced techniques (character consistency, composition), and image editing. Works with any image model (Gemini, GPT Image, Flux, Stable Diffusion, Midjourney, etc.).

Install

npx mcp-image skills install --path <target-directory>

The skill will be placed at <path>/image-generation/SKILL.md. Specify the skills directory for your AI tool:

# Cursor
npx mcp-image skills install --path ~/.cursor/skills

# Codex
npx mcp-image skills install --path ~/.codex/skills

# Claude Code
npx mcp-image skills install --path ~/.claude/skills
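To check where the skill file should end up after installation, the path rule above (`<path>/image-generation/SKILL.md`) can be sketched as a small helper. The `SKILLS_DIRS` map and `skillFilePath` function are hypothetical names for illustration, and the tool keys are labels chosen here; only the directory names come from the install commands above.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Skills directories taken from the install commands above, relative to
// the user's home directory. The keys are illustrative labels only.
const SKILLS_DIRS = {
  cursor: [".cursor", "skills"],
  codex: [".codex", "skills"],
  "claude-code": [".claude", "skills"],
} as const;

type Tool = keyof typeof SKILLS_DIRS;

// Builds the expected location of SKILL.md for a given tool.
function skillFilePath(tool: Tool, home: string = os.homedir()): string {
  return path.join(home, ...SKILLS_DIRS[tool], "image-generation", "SKILL.md");
}

console.log(skillFilePath("claude-code"));
```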

When to Use the Skill vs the MCP Server

	MCP Server	Agent Skill
Use when	Your AI tool does not have built-in image generation	Your AI tool already generates images natively
Requires	Gemini API key (or OpenAI API key when IMAGE_PROVIDER=openai)	Nothing
What it does	Generates images via the Gemini API (or OpenAI GPT Image) with automatic prompt optimization	Teaches the AI to write better prompts
Works with	MCP-compatible tools (Cursor, Claude Code, Codex, etc.)	Any tool supporting the Agent Skills open standard

Prerequisites

Node.js 22 or higher
Gemini API Key - Get yours at Google AI Studio for the default Gemini provider
OpenAI API Key - Get yours from OpenAI when using IMAGE_PROVIDER=openai
An MCP-compatible AI tool: Cursor, Claude Code, Codex, or others
Basic terminal/command line knowledge

Quick Start

1. Get Your Gemini API Key

Get your API key from Google AI Studio.

To use OpenAI instead, get an OpenAI API key and set:

IMAGE_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key_here

OpenAI mode requires organization verification — see Using the OpenAI provider below for setup details and feature differences.
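The relationship between `IMAGE_PROVIDER` and the two API keys can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's actual startup code: `resolveProvider` and `ProviderConfig` are hypothetical names, and rejecting unknown provider values is an assumption about reasonable behavior rather than something the README states.

```typescript
type Provider = "gemini" | "openai";
type Env = Record<string, string | undefined>;

// Hypothetical result type for this sketch.
interface ProviderConfig {
  provider: Provider;
  apiKey: string;
}

// Picks the provider from IMAGE_PROVIDER (Gemini when unset, as described
// above) and checks that the matching API key is present.
function resolveProvider(env: Env): ProviderConfig {
  const raw = env["IMAGE_PROVIDER"] ?? "gemini";
  if (raw !== "gemini" && raw !== "openai") {
    // Assumption: unknown values are treated as a configuration error.
    throw new Error(`Unsupported IMAGE_PROVIDER: ${raw}`);
  }
  const provider: Provider = raw === "openai" ? "openai" : "gemini";
  const keyName = provider === "openai" ? "OPENAI_API_KEY" : "GEMINI_API_KEY";
  const apiKey = env[keyName];
  if (!apiKey) {
    throw new Error(`${keyName} is required when IMAGE_PROVIDER=${provider}`);
  }
  return { provider, apiKey };
}

// Example with an in-memory environment instead of process.env.
console.log(resolveProvider({ GEMINI_API_KEY: "your_gemini_api_key_here" }).provider);
```

Passing the environment as an argument keeps the check easy to test; in a real process the caller would pass `process.env`.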

2. MCP Configuration

For Codex

Add to ~/.codex/config.toml:

[mcp_servers.mcp-image]
command = "npx"
args = ["-y", "mcp-image"]

[mcp_servers.mcp-image.env]
GEMINI_API_KEY = "your_gemini_api_key_here"
IMAGE_OUTPUT_DIR = "/absolute/path/to/images"

For OpenAI GPT Image from a local fork:

[mcp_servers.mcp-image]
command = "node"
args = ["/absolute/path/to/mcp-image/dist/index.js"]

[mcp_servers.mcp-image.env]
IMAGE_PROVIDER = "openai"
OPENAI_API_KEY = "your_openai_api_key_here"
IMAGE_OUTPUT_DIR = "/absolute/path/to/images"

For Cursor

Add to your Cursor settings:

Global (all projects): ~/.cursor/mcp.json
Project-specific: .cursor/mcp.json in your project root

{
  "mcpServers": {
    "mcp-image": {
      "command": "npx",
      "args": ["-y", "mcp-image"],
      "env": {
        "GEMINI_API_KEY": "your_gemini_api_key_here",
        "IMAGE_OUTPUT_DIR": "/absolute/path/to/images"
      }
    }
  }
}

For OpenAI GPT Image from a local fork:

{
  "mcpServers": {
    "mcp-image": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-image/dist/index.js"],
      "env": {
        "IMAGE_PROVIDER": "openai",
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "IMAGE_OUTPUT_DIR": "/absolute/path/to/images"
      }
    }
  }
}

For Claude Code

Run in your project directory to enable for that project:

cd /path/to/your/project
claude mcp add mcp-image --env GEMINI_API_KEY=your-api-key --env IMAGE_OUTPUT_DIR=/absolute/path/to/images -- npx -y mcp-image

Or add globally for all projects:

claude mcp add mcp-image --scope user --env GEMINI_API_KEY=your-api-key --env IMAGE_OUTPUT_DIR=/absolute/path/to/images -- npx -y mcp-image

For OpenAI GPT Image from a local fork:

# Run inside your local clone of mcp-image
npm install
npm run build
claude mcp add mcp-image --scope user \
  --env IMAGE_PROVIDER=openai \
  --env OPENAI_API_KEY=your-openai-api-key \
  --env IMAGE_OUTPUT_DIR=/absolute/path/to/images \
  -- node /absolute/path/to/mcp-image/dist/index.js

⚠️ Security Note: Never commit your API key to version control. Keep it secure and use environment-specific configuration.

📁 Path Requirements:

When set, IMAGE_OUTPUT_DIR must be an absolute path (e.g., /Users/username/images, not ./images)
If it is not set, output defaults to ./output in the current working directory
The directory is created automatically if it doesn't exist
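The path rules above can be sketched in a few lines. This is an illustration of the stated rules, not the server's implementation: `resolveOutputDir` and `ensureOutputDir` are hypothetical names, and throwing on a relative path is an assumption about how the "must be absolute" rule might be enforced.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type Env = Record<string, string | undefined>;

// Applies the rules above: use IMAGE_OUTPUT_DIR when it is an absolute
// path, fall back to <cwd>/output when it is unset.
function resolveOutputDir(env: Env, cwd: string = process.cwd()): string {
  const configured = env["IMAGE_OUTPUT_DIR"];
  if (configured === undefined || configured === "") {
    return path.join(cwd, "output");
  }
  if (!path.isAbsolute(configured)) {
    // Assumption: a relative path is rejected rather than resolved.
    throw new Error(`IMAGE_OUTPUT_DIR must be an absolute path, got: ${configured}`);
  }
  return configured;
}

// Creates the directory (and any missing parents) if it does not exist.
function ensureOutputDir(dir: string): string {
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

console.log(resolveOutputDir({ IMAGE_OUTPUT_DIR: "/Users/username/images" }));
```

Calling `ensureOutputDir(resolveOutputDir(process.env))` would combine both steps; the example above only resolves the path so that running it does not create anything on disk.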

Quality Presets

Choose the right balance of speed, quality, and cost:

Preset	Model	Best for	Speed
`fast` (default)	Nano Banana 2 (Gemini 3.1 Flash Image)	Quick iterations, drafts, high-volume generation	~30–40s
`balanced`	Nano Banana 2 + Thinking	Production images, good quality with reasonable speed	Medium
`quality`	Nano Banana Pro (Gemini 3 Pro Image)	Final deli