mcp-apache-spark-history-server

Name: mcp-apache-spark-history-server
Author: kubeflow

Verified

MCP Server and CLI for Apache Spark History Server. Debug Spark applications from AI agents, scripts, or the terminal.

182 stars

65 forks

Python

Installation

# Add to your Claude Code skills
git clone https://github.com/kubeflow/mcp-apache-spark-history-server

Security Report (Verified)

Last scanned: 5/30/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-05-30T15:50:27.581Z",
  "npmAuditRan": true,
  "pipAuditRan": true
}

Frequently Asked Questions

What is mcp-apache-spark-history-server?

mcp-apache-spark-history-server is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by kubeflow. It provides an MCP server and a CLI for Apache Spark History Server, so you can debug Spark applications from AI agents, scripts, or the terminal. It has 182 GitHub stars.

Is mcp-apache-spark-history-server safe to use?

Yes. mcp-apache-spark-history-server passed SkillsLLM's automated security scan — a dependency vulnerability audit plus prompt-injection heuristics — with no high-severity issues. You can read the full report in the Security Report section on this page.

How do I install mcp-apache-spark-history-server?

Clone the repository with "git clone https://github.com/kubeflow/mcp-apache-spark-history-server" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is mcp-apache-spark-history-server written in?

mcp-apache-spark-history-server is primarily written in Python. It is open-source under kubeflow on GitHub, so you can review or fork the full source.

Are there alternatives to mcp-apache-spark-history-server?

Yes. SkillsLLM lists many other AI Agents skills you can browse and compare side by side. Open the AI Agents category from the badge at the top of this page, or use the Related Skills and comparison links further down to weigh mcp-apache-spark-history-server against similar tools.

Kubeflow Spark AI Toolkit

Connect AI agents and engineers to Apache Spark History Server for intelligent job analysis, performance monitoring, and investigation

[!IMPORTANT]

✨ NEW — Spark History Server CLI is now available

A standalone Go binary that queries Spark History Server directly from your terminal — no MCP, no AI framework, no daemon process. Inspect jobs, compare runs, investigate failures, and script against the Spark REST API.

Get started with the SHS CLI →

This project provides two interfaces to your Spark History Server data:

| | 🛠️ SHS CLI (`shs`) | ⚡ MCP Server |
|---|---|---|
| For | Engineers, shell scripts, CI/CD, coding agents | AI agents and MCP-compatible clients |
| Mental model | "I know the command I want to run" | "Agent, investigate this Spark app" |
| Install | Single static binary — no dependencies | Python 3.12+, uv |
| Get started | CLI docs → | MCP docs → |

🛠️ SHS CLI (`shs`) — For Engineers & Scripts

A standalone Go binary. Query your Spark History Server directly from the terminal, shell scripts, or CI/CD pipelines. Also works as a skill for coding agents like Claude Code and Kiro.

Install

# Auto-detect latest version, OS, and architecture
VERSION=$(curl -s https://api.github.com/repos/kubeflow/mcp-apache-spark-history-server/releases | grep -m1 '"tag_name": "cli/' | cut -d'"' -f4 | sed 's|cli/||')
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
ARCH=$(uname -m)
[ "$ARCH" = "x86_64" ] && ARCH="amd64"
[ "$ARCH" = "aarch64" ] && ARCH="arm64"

curl -sSL "https://github.com/kubeflow/mcp-apache-spark-history-server/releases/download/cli%2F${VERSION}/shs-${VERSION}-${OS}-${ARCH}.tar.gz" | tar xz
sudo mv shs /usr/local/bin/
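
For provisioning scripts that cannot rely on a shell, the platform detection and URL construction above can be mirrored in Python. This is a minimal sketch, not part of the project: `normalize_arch` and `release_asset_url` are hypothetical helper names, the URL pattern is copied from the `curl` command above, and the version string is passed in rather than looked up from the GitHub API.

```python
import platform

REPO = "https://github.com/kubeflow/mcp-apache-spark-history-server"


def normalize_arch(machine: str) -> str:
    """Map `uname -m` names to the names used in the release assets."""
    return {"x86_64": "amd64", "aarch64": "arm64"}.get(machine, machine)


def release_asset_url(version: str, system: str, machine: str) -> str:
    """Build the tarball URL the same way the shell snippet does."""
    os_name = system.lower()
    arch = normalize_arch(machine)
    return (
        f"{REPO}/releases/download/cli%2F{version}/"
        f"shs-{version}-{os_name}-{arch}.tar.gz"
    )


# "v0.0.0" is a placeholder version, not a real release tag.
print(release_asset_url("v0.0.0", platform.system(), platform.machine()))
```

Like the shell version, this only handles the `x86_64` and `aarch64` spellings; other values are passed through unchanged.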

Quick Start

# Generate a config file
shs setup config > config.yaml   # then set your Spark History Server URL

# Explore applications
shs apps
shs jobs -a APP_ID --status failed
shs stages -a APP_ID --sort duration
shs compare apps --app-a APP1 --app-b APP2

# Use as a skill with Claude Code or Kiro
shs setup skill > ~/.claude/skills/spark-history.md

See the CLI documentation for full usage, or check out a real-world example of Claude Code comparing two TPC-DS 3TB benchmark runs.

⚡ MCP Server — For AI Agents

An MCP (Model Context Protocol) server that exposes Spark History Server data as tools for AI agents. Agents query your Spark infrastructure using natural language — the server handles tool selection, multi-server routing, and structured data retrieval.

Use the MCP server when you want an AI agent to conduct multi-step investigations, synthesize findings across tools, or answer natural-language questions about your Spark applications.

Install

# Run directly with uvx (no install needed)
uvx --from mcp-apache-spark-history-server spark-mcp

# Or install it as a persistent tool with uv
uv tool install mcp-apache-spark-history-server
spark-mcp

The package is published to PyPI.

Coding Agent Integration

Register the server with a single command. Both examples run it over stdio via uvx. With no config file present, the server defaults to a Spark History Server at http://localhost:18080; point it elsewhere with a config file or SHS_SERVERS__LOCAL__URL.

Claude Code (claude mcp add):

claude mcp add --env SHS_MCP__TRANSPORT=stdio --env SHS_SERVERS__LOCAL__URL=http://localhost:18080 \
  --transport stdio spark-history \
  -- uvx --from mcp-apache-spark-history-server spark-mcp

Kiro CLI (kiro-cli mcp add):

kiro-cli mcp add --name spark-history --command uvx \
  --args --from --args mcp-apache-spark-history-server --args spark-mcp \
  --env SHS_MCP__TRANSPORT=stdio --env SHS_SERVERS__LOCAL__URL=http://localhost:18080

Verify in either client with claude mcp list / kiro-cli mcp list, then ask the agent to "list the available Spark applications."

The server also ships prompts — guided, multi-step workflows you run as a command. In Claude Code: /mcp__spark-history__investigate_failure <app_id>. In Kiro CLI: /prompts investigate_failure (or @investigate_failure). See Prompts for the full list and arguments.

Passing server flags and environment

The commands above have two layers: the client's own options and the arguments/environment forwarded to spark-mcp. spark-mcp itself takes a single flag, --config / -c; everything else is set through SHS_* environment variables.

| To pass to `spark-mcp`… | Claude Code | Kiro CLI |
|---|---|---|
| A flag (e.g. `--config`) | append after `--`: `… spark-mcp --config /path/config.yaml` | add `--args` pairs: `--args --config --args /path/config.yaml` |
| An environment variable | `--env KEY=value` (before `--transport`) | `--env KEY=value` |

For example, to point at a remote Spark History Server with an explicit config file:

# Claude Code
claude mcp add --env SHS_MCP__TRANSPORT=stdio --transport stdio spark-history \
  -- uvx --from mcp-apache-spark-history-server spark-mcp --config ~/.config/spark-mcp/config.yaml

# Kiro CLI
kiro-cli mcp add --name spark-history --command uvx \
  --args --from --args mcp-apache-spark-history-server --args spark-mcp \
  --args --config --args ~/.config/spark-mcp/config.yaml \
  --env SHS_MCP__TRANSPORT=stdio
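
If you register the server from a setup script, the two-layer structure described above can be made explicit by building each argument list programmatically. The sketch below is illustrative only: `claude_mcp_add` and `kiro_mcp_add` are hypothetical helpers that reproduce the command shapes shown in this section and use no other client options.

```python
import shlex
from typing import Mapping, Sequence

# Launch command for the server, copied from the examples above.
SERVER_CMD = ["uvx", "--from", "mcp-apache-spark-history-server", "spark-mcp"]


def claude_mcp_add(name: str, env: Mapping[str, str],
                   server_flags: Sequence[str] = ()) -> list[str]:
    """Client options first; everything after `--` is forwarded to spark-mcp."""
    argv = ["claude", "mcp", "add"]
    for key, value in env.items():
        argv += ["--env", f"{key}={value}"]  # env goes before --transport
    argv += ["--transport", "stdio", name, "--", *SERVER_CMD, *server_flags]
    return argv


def kiro_mcp_add(name: str, env: Mapping[str, str],
                 server_flags: Sequence[str] = ()) -> list[str]:
    """Each forwarded token gets its own `--args` pair."""
    command, *args = SERVER_CMD
    argv = ["kiro-cli", "mcp", "add", "--name", name, "--command", command]
    for token in [*args, *server_flags]:
        argv += ["--args", token]
    for key, value in env.items():
        argv += ["--env", f"{key}={value}"]
    return argv


env = {"SHS_MCP__TRANSPORT": "stdio"}
flags = ["--config", "/path/config.yaml"]
print(shlex.join(claude_mcp_add("spark-history", env, flags)))
print(shlex.join(kiro_mcp_add("spark-history", env, flags)))
```

Returning a list rather than a string keeps the result safe to hand to `subprocess.run` without shell quoting concerns.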

Configure

Create a file named config.yaml with a basic configuration like the following:

servers:
  local:
    default: true
    url: "http://your-spark-history-server:18080"
    auth:            # optional
      username: "user"
      password: "pass"
    include_plan_description: false   # include SQL plans by default (default: false)
mcp:
  transport: "streamable-http"   # or: stdio
  port: "18888"
  debug: false

Config file location

The server looks for its config file in the following order and uses the first one it finds:

1. The --config / -c flag (e.g. spark-mcp --config /path/to/config.yaml)
2. The SHS_MCP_CONFIG environment variable
3. ./config.yaml in the current working directory
4. ~/.config/spark-mcp/config.yaml (honors $XDG_CONFIG_HOME when set)

If none exist, the server starts with built-in defaults that can be overridden by SHS_* environment variables. When a path is given explicitly via the flag or SHS_MCP_CONFIG but the file is missing, the server fails fast instead of falling back.
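
The lookup order and the fail-fast rule can be sketched as a small function. This is not the server's actual code: `resolve_config_path` is a hypothetical name, and the sketch assumes that `XDG_CONFIG_HOME`, when set, replaces `~/.config` as the base directory.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(flag: Optional[str], env: Mapping[str, str],
                        cwd: Path, home: Path) -> Optional[Path]:
    # Steps 1-2: an explicit path (flag first, then SHS_MCP_CONFIG) must exist.
    explicit = flag or env.get("SHS_MCP_CONFIG")
    if explicit:
        path = Path(explicit).expanduser()
        if not path.is_file():
            raise FileNotFoundError(f"config file not found: {path}")
        return path
    # Steps 3-4: implicit locations are optional.
    # Assumption: XDG_CONFIG_HOME replaces ~/.config when set.
    xdg = env.get("XDG_CONFIG_HOME")
    config_home = Path(xdg) if xdg else home / ".config"
    for candidate in (cwd / "config.yaml",
                      config_home / "spark-mcp" / "config.yaml"):
        if candidate.is_file():
            return candidate
    return None  # caller falls back to built-in defaults


# Demo in a throwaway directory used as both cwd and home.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    nothing_found = resolve_config_path(None, {}, root, root)
    (root / "config.yaml").write_text("servers: {}\n")
    from_cwd = resolve_config_path(None, {}, root, root)
    try:
        resolve_config_path(str(root / "missing.yaml"), {}, root, root)
        failed_fast = False
    except FileNotFoundError:
        failed_fast = True

print(nothing_found, from_cwd.name, failed_fast)
```

Passing `env`, `cwd`, and `home` as arguments instead of reading them from the process makes the order easy to test in isolation.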

Tip for MCP clients: when the server is launched by an MCP client (Claude Desktop, Kiro, etc.), the working directory is not guaranteed, so a ./config.yaml may not be found. Prefer --config / SHS_MCP_CONFIG, or place the file at ~/.config/spark-mcp/config.yaml.

Configuration values can be overridden with environment variables. Nesting levels are separated by a double underscore (__), so field names and server names may themselves contain single underscores (e.g. SHS_SERVERS__MY_SERVER__URL maps to servers.my_server.url).

SHS_MCP__PORT          Port for MCP server (default: 18888)
SHS_MCP__TRANSPORT     Transport mode: streamable-http or stdio
SHS_MCP__DEBUG         Enable debug mode (default: false)
SHS_MCP__ADDRESS       Bind address (default: localhost)
SHS_SERVERS__*__URL    URL for a specific server
SHS_SERVERS__*__AUTH__USERNAME
SHS_SERVERS__*__AUTH__PASSWORD
SHS_SERVERS__*__AUTH__TOKEN
SHS_SERVERS__*__VERIFY_SSL
SHS_SERVERS__*__TIMEOUT
SHS_SERVERS__*__EMR_CLUSTER_ARN
SHS_SERVERS__*__INCLUDE_PLAN_DESCRIPTION
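
To illustrate the naming rule, the sketch below maps SHS_* variables onto a nested configuration dictionary by splitting on the double underscore. It is an approximation under stated assumptions, not the server's loader: `env_to_path` and `apply_env_overrides` are hypothetical names, values are kept as strings (no type coercion), and variables without a `__` separator, such as SHS_MCP_CONFIG, are skipped.

```python
import copy
from typing import Mapping

PREFIX = "SHS_"


def env_to_path(name: str) -> list[str]:
    """SHS_SERVERS__MY_SERVER__URL -> ['servers', 'my_server', 'url']."""
    return name[len(PREFIX):].lower().split("__")


def apply_env_overrides(config: dict, env: Mapping[str, str]) -> dict:
    """Return a copy of `config` with SHS_* overrides applied."""
    result = copy.deepcopy(config)
    for name, value in env.items():
        # Only nested SHS_* settings are overrides in this sketch.
        if not name.startswith(PREFIX) or "__" not in name:
            continue
        *parents, leaf = env_to_path(name)
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value  # kept as a string; coercion is out of scope
    return result


base = {"mcp": {"port": "18888"}}
overrides = {
    "SHS_MCP__PORT": "9000",
    "SHS_SERVERS__MY_SERVER__URL": "http://localhost:18080",
}
print(apply_env_overrides(base, overrides))
```

Because single underscores are never treated as separators, a server named `my_server` stays intact as one key.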

Multi-Server Setup

Configure multiple Spark History Servers and route queries to specific ones:

servers:
  production:
    default: true
    url: "http://prod-spark-history:18080"
    auth:
      username: "user"
      password: "pass"
  staging:
    url: "http://staging-spark-history:18080"
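
Conceptually, routing comes down to a lookup by name with a fallback to the entry marked `default: true`. The sketch below models that idea with plain dictionaries mirroring the YAML above. `pick_server` is a hypothetical function, and raising an error when no default exists is an assumption rather than documented server behavior.

```python
from typing import Mapping, Optional

# Mirrors the multi-server YAML above (auth omitted for brevity).
SERVERS = {
    "production": {"default": True, "url": "http://prod-spark-history:18080"},
    "staging": {"url": "http://staging-spark-history:18080"},
}


def pick_server(servers: Mapping[str, dict],
                name: Optional[str] = None) -> dict:
    """Return the named server, or the default one when no name is given."""
    if name is not None:
        if name not in servers:
            raise KeyError(f"unknown server: {name}")
        return servers[name]
    for cfg in servers.values():
        if cfg.get("default"):
            return cfg
    # Assumption: with no default configured, the request cannot be routed.
    raise LookupError("no default server configured")


print(pick_server(SERVERS)["url"])
print(pick_server(SERVERS, "staging")["url"])
```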

Agents can target a specific server per query:

"Get application <app_id> from the production server"

🏗️ Architecture

graph TB
    subgraph Clients
        A[🤖 AI Agent / LLM]
        B[👩‍💻 Engineer / Script / CI]
        C[🔧 Coding Agent - Claude Code / Kiro]
    end

    subgraph "Kubeflow Spark AI Toolkit"
        D[⚡ MCP Server]
        E[🛠️ CLI - shs]
    end

    subgraph "Spark History Servers"
        F[🔥 Production]
        G[🔥 Staging / Dev]
    end

    A -->|MCP Protocol| D
    B -->|Terminal commands| E
    C -->|shs skill file| E

    D -->|REST API| F
    D -->|REST API| G
    E -->|REST API| F
    E -->|REST API| G

Connect an AI Agent

Agent	Transport	Guide
**Claude Desk
