by kubeflow
MCP Server for Apache Spark History Server. The bridge between Agentic AI and Apache Spark.
```bash
# Add to your Claude Code skills
git clone https://github.com/kubeflow/mcp-apache-spark-history-server
```

🤖 Connect AI agents to Apache Spark History Server for intelligent job analysis and performance monitoring
Transform your Spark infrastructure monitoring with AI! This Model Context Protocol (MCP) server enables AI agents to analyze job performance, identify bottlenecks, and provide intelligent insights from your Spark History Server data.
Spark History Server MCP bridges AI agents with your existing Apache Spark infrastructure, enabling:
📺 See it in action:
```mermaid
graph TB
    A[🤖 AI Agent/LLM] --> F[📡 MCP Client]
    B[🦙 LlamaIndex Agent] --> F
    C[🌐 LangGraph] --> F
    D[🖥️ Claude Desktop] --> F
    E[🛠️ Amazon Q CLI] --> F
    F --> G[⚡ Spark History MCP Server]
    G --> H[🔥 Prod Spark History Server]
    G --> I[🔥 Staging Spark History Server]
    G --> J[🔥 Dev Spark History Server]
    H --> K[📄 Prod Event Logs]
    I --> L[📄 Staging Event Logs]
    J --> M[📄 Dev Event Logs]
```
🔗 Components:
The package is published to PyPI: https://pypi.org/project/mcp-apache-spark-history-server/
```bash
git clone https://github.com/kubeflow/mcp-apache-spark-history-server.git
cd mcp-apache-spark-history-server

# Install Task (if not already installed)
brew install go-task  # macOS; see https://taskfile.dev/installation/ for other platforms

# Set up and start testing
task start-spark-bg      # Start Spark History Server with sample data (default Spark 3.5.5)
# Or specify a different Spark version:
# task start-spark-bg spark_version=3.5.2
task start-mcp-bg        # Start MCP Server

# Optional: opens MCP Inspector on http://localhost:6274 for interactive testing
# Requires Node.js 22.7.5+ (check https://github.com/modelcontextprotocol/inspector for latest requirements)
task start-inspector-bg  # Start MCP Inspector

# When done, run `task stop-all`
```
If you just want to run the MCP server without cloning the repository:
```bash
# Run with uv without installing the module
uvx --from mcp-apache-spark-history-server spark-mcp

# OR run with pip and python (use of a venv is highly encouraged)
python3 -m venv spark-mcp && source spark-mcp/bin/activate
pip install mcp-apache-spark-history-server
python3 -m spark_history_mcp.core.main

# Deactivate the venv when finished
deactivate
```
Edit config.yaml for your Spark History Server:
Config file options (checked in this order):

- `--config /path/to/config.yaml` or `-c /path/to/config.yaml` (command-line flag)
- `SHS_MCP_CONFIG=/path/to/config.yaml` (environment variable)
- `./config.yaml` (default)

```yaml
servers:
  local:
    default: true
    url: "http://your-spark-history-server:18080"
    auth:  # optional
      username: "user"
      password: "pass"
    include_plan_description: false  # optional; whether to include SQL execution plans by default (default: false)
mcp:
  transports:
    - streamable-http  # streamable-http or stdio
  port: "18888"
  debug: true
```
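The lookup order above can be sketched as a small helper. This is illustrative only, assuming the precedence listed (flag, then environment variable, then default); `resolve_config_path` is a hypothetical name, not the server's actual code:

```python
import os

DEFAULT_CONFIG = "./config.yaml"

def resolve_config_path(cli_arg=None, env=None):
    """Return the config path using the documented precedence:
    --config/-c flag, then SHS_MCP_CONFIG, then ./config.yaml."""
    env = os.environ if env is None else env
    if cli_arg:
        return cli_arg
    if env.get("SHS_MCP_CONFIG"):
        return env["SHS_MCP_CONFIG"]
    return DEFAULT_CONFIG

print(resolve_config_path(cli_arg="/etc/shs/config.yaml", env={}))  # CLI flag wins
print(resolve_config_path(env={"SHS_MCP_CONFIG": "/tmp/c.yaml"}))   # env var next
print(resolve_config_path(env={}))                                  # falls back to default
```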
The repository includes real Spark event logs for testing:
- `spark-bcec39f6201b42b9925124595baad260` - ✅ Successful ETL job
- `spark-110be3a8424d4a2789cb88134418217b` - 🔄 Data processing job
- `spark-cc4d115f011443d787f03a71a476a745` - 📈 Multi-stage analytics job

See TESTING.md for instructions on using them.


Note: These tools are subject to change as we scale and improve the performance of the MCP server.
The MCP server provides 18 specialized tools organized by analysis patterns. LLMs can intelligently select and combine these tools based on user queries:
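As a sketch of that selection step, an agent might keep a small playbook that maps an analysis intent to a tool combination. This is an illustration only, not part of the server; the intent names are made up, while the tool names come from the tables below:

```python
# Hypothetical playbook mapping analysis intents to MCP tool combinations.
TOOL_PLAYBOOK = {
    "overview": ["list_applications", "get_application"],
    "slow_job": ["get_job_bottlenecks", "list_slowest_stages", "get_executor_summary"],
    "regression": ["compare_job_performance", "compare_job_environments"],
    "stage_deep_dive": ["get_stage", "get_stage_task_summary"],
    "resource_usage": ["get_resource_usage_timeline", "get_executor_summary"],
    "sql_tuning": ["list_slowest_sql_queries", "compare_sql_execution_plans"],
}

def plan_tools(intent):
    """Return the tool sequence for an intent, falling back to an overview."""
    return TOOL_PLAYBOOK.get(intent, TOOL_PLAYBOOK["overview"])

print(plan_tools("slow_job"))
```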
Basic application metadata and overview
| 🔧 Tool | 📝 Description |
|---------|----------------|
| list_applications | 📋 Get a list of all applications available on the Spark History Server with optional filtering by status, date ranges, and limits |
| get_application | 📊 Get detailed information about a specific Spark application including status, resource usage, duration, and attempt details |
Job-level performance analysis and identification
| 🔧 Tool | 📝 Description |
|---------|----------------|
| list_jobs | 🔗 Get a list of all jobs for a Spark application with optional status filtering |
| list_slowest_jobs | ⏱️ Get the N slowest jobs for a Spark application (excludes running jobs by default) |
Stage-level performance deep dive and task metrics
| 🔧 Tool | 📝 Description |
|---------|----------------|
| list_stages | ⚡ Get a list of all stages for a Spark application with optional status filtering and summaries |
| list_slowest_stages | 🐌 Get the N slowest stages for a Spark application (excludes running stages by default) |
| get_stage | 🎯 Get information about a specific stage with optional attempt ID and summary metrics |
| get_stage_task_summary | 📊 Get statistical distributions of task metrics for a specific stage (execution times, memory usage, I/O metrics) |
Resource utilization, executor performance, and allocation tracking
| 🔧 Tool | 📝 Description |
|---------|----------------|
| list_executors | 🖥️ Get executor information with optional inactive executor inclusion |
| get_executor | 🔍 Get information about a specific executor including resource allocation, task statistics, and performance metrics |
| get_executor_summary | 📈 Aggregates metrics across all executors (memory usage, disk usage, task counts, performance metrics) |
| get_resource_usage_timeline | 📅 Get chronological view of resource allocation and usage patterns including executor additions/removals |
Spark configuration, environment variables, and runtime settings
| 🔧 Tool | 📝 Description |
|---------|----------------|
| get_environment | ⚙️ Get comprehensive Spark runtime configuration including JVM info, Spark properties, system properties, and classpath |
SQL performance analysis and execution plan comparison
| 🔧 Tool | 📝 Description |
|---------|----------------|
| list_slowest_sql_queries | 🐌 Get the top N slowest SQL queries for an application with detailed execution metrics and optional plan descriptions |
| compare_sql_execution_plans | 🔍 Compare SQL execution plans between two Spark jobs, analyzing logical/physical plans and execution metrics |
Intelligent bottleneck identification and performance recommendations
| 🔧 Tool | 📝 Description |
|---------|----------------|
| get_job_bottlenecks | 🚨 Identify performance bottlenecks by analyzing stages, tasks, and executors with actionable recommendations |
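For intuition, one common class of bottleneck check is task-skew detection: a stage whose slowest task takes far longer than its typical task often points at a data-skew problem. The heuristic below is an illustrative assumption, not `get_job_bottlenecks`' actual logic:

```python
# Illustrative skew heuristic (an assumption, not the server's implementation):
# flag stages whose slowest task takes much longer than the median task.
def find_skewed_stages(stages, ratio_threshold=5.0):
    """stages: list of dicts with 'stage_id' and 'task_durations_ms'."""
    flagged = []
    for s in stages:
        durations = sorted(s["task_durations_ms"])
        if not durations:
            continue
        median = durations[len(durations) // 2]
        if median > 0 and durations[-1] / median >= ratio_threshold:
            flagged.append(s["stage_id"])
    return flagged

stages = [
    {"stage_id": 1, "task_durations_ms": [100, 110, 120, 105]},  # balanced
    {"stage_id": 2, "task_durations_ms": [100, 95, 105, 900]},   # one straggler task
]
print(find_skewed_stages(stages))  # → [2]
```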
Cross-application comparison for regression detection and optimization
| 🔧 Tool | 📝 Description |
|---------|----------------|
| compare_job_environments | ⚙️ Compare Spark environment configurations between two jobs to identify differences in properties and settings |
| compare_job_performance | 📈 Compare performance metrics between two Spark jobs including execution times, resource usage, and task distribution |
Query Pattern Examples:
- `list_applications`
- `get_job_bottlenecks` + `list_slowest_stages` + `get_executor_summary`
- `compare_job_performance` + `compare_job_environments`
- `get_stage` + `get_stage_task_summary`
- `get_resource_usage_timeline` + `get_executor_summary`
- `list_slowest_sql_queries` + `compare_sql_execution_plans`

If you are an existing AWS user looking to analyze your Spark applications, we provide detailed setup guides for: