web-eval-agent

Name: web-eval-agent
Author: refreshdotdev

Status: Verified
Stars: 1,241
Forks: 109
Language: Python

Installation

# Add to your Claude Code skills
git clone https://github.com/refreshdotdev/web-eval-agent


Security Report (Verified)

Last scanned: 4/29/2026

{
  "issues": [],
  "status": "PASSED",
  "scannedAt": "2026-04-29T06:26:17.149Z",
  "semgrepRan": false,
  "npmAuditRan": true,
  "pipAuditRan": false
}


Frequently Asked Questions

What is web-eval-agent?

web-eval-agent is an open-source MCP server skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by refreshdotdev. It autonomously evaluates web applications and has 1,241 GitHub stars.

Is web-eval-agent safe to use?

Yes. web-eval-agent passed SkillsLLM's automated security scan (a dependency vulnerability audit plus prompt-injection heuristics) with no high-severity issues. Note that the report shows only the npm audit ran; Semgrep and pip-audit did not. You can read the full report in the Security Report section on this page.

How do I install web-eval-agent?

Clone the repository with "git clone https://github.com/refreshdotdev/web-eval-agent" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is web-eval-agent written in?

web-eval-agent is primarily written in Python. It is open-source under refreshdotdev on GitHub, so you can review or fork the full source.

Are there alternatives to web-eval-agent?

Yes. SkillsLLM lists many other MCP server skills that you can browse and compare side by side with web-eval-agent.



⚠️ PROJECT HAS BEEN SUNSET ⚠️

This project has been discontinued. We're building something new at withrefresh.com

🚀 operative.sh web-eval-agent MCP Server

Let the coding agent debug itself; you've got better things to do.


🔥 Supercharge Your Debugging

operative.sh's MCP Server launches a browser-use-powered agent that autonomously executes and debugs web apps directly from your code editor.

⚡ Features

🌐 Navigate your web app using browser-use (2x faster with the Operative backend)
📊 Capture network traffic - requests are intelligently filtered and returned into the context window
🚨 Collect console errors - captures logs & errors
🤖 Autonomous debugging - the Cursor agent calls the web QA agent MCP server to test whether the code it wrote works as expected end-to-end.

🧰 MCP Tool Reference

Tool	Purpose
`web_eval_agent`	🤖 Automated UX evaluator that drives the browser, captures screenshots, console & network logs, and returns a rich UX report.
`setup_browser_state`	🔒 Opens an interactive (non-headless) browser so you can sign in once; the saved cookies/local-storage are reused by subsequent `web_eval_agent` runs.

Key arguments

web_eval_agent
- url (required) – address of the running app (e.g. http://localhost:3000)
- task (required) – natural-language description of what to test ("run through the signup flow and note any UX issues")
- headless_browser (optional, default false) – set to true to hide the browser window
setup_browser_state
- url (optional) – page to open first (handy to land directly on a login screen)
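Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and its arguments. The sketch below builds that request from the argument names listed above and frames it for the stdio transport, where each message is one line of JSON. It is an illustration only, not part of web-eval-agent: the helper names (`build_tools_call`, `encode_stdio_message`) and the rule that unknown arguments are rejected are our own assumptions, and a real client must complete the MCP `initialize` handshake before sending any tool call.

```python
import json
from typing import Any

# Argument names come from the "Key arguments" list. Rejecting unknown keys is
# an assumption made for this sketch, not documented server behaviour.
TOOL_ARGS: dict[str, tuple[set[str], set[str]]] = {
    # tool name: (required arguments, optional arguments)
    "web_eval_agent": ({"url", "task"}, {"headless_browser"}),
    "setup_browser_state": (set(), {"url"}),
}


def build_tools_call(tool: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Return a JSON-RPC 2.0 `tools/call` request for one of the two tools."""
    if tool not in TOOL_ARGS:
        raise ValueError(f"unknown tool: {tool}")
    required, optional = TOOL_ARGS[tool]
    given = set(arguments)
    if missing := required - given:
        raise ValueError(f"missing required arguments: {sorted(missing)}")
    if unknown := given - required - optional:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def encode_stdio_message(message: dict[str, Any]) -> bytes:
    """Serialize one message for the stdio transport: UTF-8 JSON plus a newline."""
    # json.dumps escapes newlines inside strings, so the only raw newline is the delimiter.
    return json.dumps(message, ensure_ascii=False).encode("utf-8") + b"\n"
```

For example, `build_tools_call("web_eval_agent", {"url": "http://localhost:3000", "task": "Try the full signup flow and report UX issues"})` yields the request an IDE would send for the chat prompt shown next.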

You can trigger these tools straight from your IDE chat, for example:

Evaluate my app at http://localhost:3000 – run web_eval_agent with the task "Try the full signup flow and report UX issues".

🏁 Quick Start

Easy Setup with One-Click Integration

Get your free API key. When you create it, you'll see:
- "Add to Cursor" button with a deeplink for instant Cursor installation
- Prefilled Claude Code command with your API key automatically included

Manual Setup (macOS/Linux)

Prerequisites (typically already installed):

brew: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
npm: brew install npm
jq: brew install jq

After getting your free API key, run the installer. It:
- Installs Playwright
- Installs uv
- Inserts the MCP server JSON into your code editor's config (Cursor/Cline/Windsurf) for you!

curl -LSf https://operative.sh/install.sh -o install.sh && bash install.sh && rm install.sh

Restart your IDE to apply the changes.
Send a prompt in chat mode to call the web eval agent tool, e.g.:

Test my app on http://localhost:3000. Use web-eval-agent.

🛠️ Manual Installation

Get your API key at operative.sh/mcp
Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

After installing uv, reload your shell environment:

Mac

source ~/.zshrc

Linux

source ~/.bashrc

Install playwright:

npm install -g chromium playwright && uvx --with playwright playwright install --with-deps

Add the web-eval-agent JSON configuration from this README to your code editor's MCP settings, replacing <YOUR_KEY> with your API key
Restart your code editor

🔃 Updating

Run uv cache clean
Refresh the MCP server in your code editor

{
  "mcpServers": {
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "--refresh-package",
        "webEvalAgent",
        "--from",
        "git+https://github.com/Operative-Sh/web-eval-agent.git",
        "webEvalAgent"
      ],
      "env": {
        "OPERATIVE_API_KEY": "<YOUR_KEY>"
      }
    }
  }
}
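If you prefer to script the manual step instead of pasting JSON by hand, the sketch below merges the web-eval-agent entry from the configuration block above into an existing MCP config file while leaving other servers untouched. It assumes your editor reads a JSON object with a top-level `mcpServers` key (Cursor, Cline, and Windsurf use this shape, but check your editor's docs for the exact file location); the helper names `server_entry` and `add_server` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

SERVER_NAME = "web-eval-agent"


def server_entry(api_key: str) -> dict[str, Any]:
    """Build the web-eval-agent entry; values mirror the README's JSON block."""
    return {
        "command": "uvx",
        "args": [
            "--refresh-package",
            "webEvalAgent",
            "--from",
            "git+https://github.com/Operative-Sh/web-eval-agent.git",
            "webEvalAgent",
        ],
        "env": {"OPERATIVE_API_KEY": api_key},
    }


def add_server(config_path: Path, api_key: str) -> dict[str, Any]:
    """Insert or replace the entry in config_path, keeping other servers.

    Assumption: the file holds a JSON object with a top-level "mcpServers" key.
    A missing or empty file is treated as an empty config.
    """
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config: dict[str, Any] = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})[SERVER_NAME] = server_entry(api_key)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For Cursor, for instance, you might call `add_server(Path.home() / ".cursor" / "mcp.json", "<YOUR_KEY>")` and then restart the editor.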

Operative Discord Server

🛠️ Manual Installation (Mac + Cursor/Cline/Windsurf)

Get your API key at operative.sh/mcp
Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

Install playwright:

npm install -g chromium playwright && uvx --with playwright playwright install --with-deps

Add the web-eval-agent JSON configuration from this README to your code editor's MCP settings, replacing <YOUR_KEY> with your API key
Restart your code editor

Manual Installation (Windows + Cursor/Cline/Windsurf)

We're still refining this; please open an issue if you run into any problems!

Run all of the following in your code editor's terminal:
curl -LSf https://operative.sh/install.sh -o install.sh && bash install.sh && rm install.sh
Get your API key at operative.sh/mcp
Install uv (curl -LsSf https://astral.sh/uv/install.sh | sh)
uvx --from git+https://github.com/Operative-Sh/web-eval-agent.git playwright install
Restart code editor

🚨 Issues

If updates aren't reaching your code editor, update or reinstall to get the latest version: run uv cache clean.
If you hit any issues, feel free to open an issue on this repo or ask in the Discord!
5/5 - static apps without changes weren't screencasting; fixed! Run uv cache clean and restart to get the fix.

Changelog

4/29 - Agent overlay update - pause/play/stop agent run in the browser

📋 Example MCP Server Output Report

📊 Web Evaluation Report for http://localhost:5173 complete!
📝 Task: Test the API-key deletion flow by navigating to the API Keys section, deleting a key, and judging the UX.

🔍 Agent Steps
  📍 1. Navigate → http://localhost:5173
  📍 2. Click     "Login"        (button index 2)
  📍 3. Click     "API Keys"     (button index 4)
  📍 4. Click     "Create Key"   (button index 9)
  📍 5. Type      "Test API Key" (input index 2)
  📍 6. Click     "Done"         (button index 3)
  📍 7. Click     "Delete"       (button index 10)
  📍 8. Click     "Delete"       (confirm index 3)
🏁 Flow tested successfully – UX felt smooth and intuitive.

🖥️ Console Logs (10)
  1. [debug] [vite] connecting…
  2. [debug] [vite] connected.
  3. [info]  Download the React DevTools …
     …

🌐 Network Requests (10)
  1. GET /src/pages/SleepingMasks.tsx                   304
  2. GET /src/pages/MCPRegistryRegistry.tsx             304
     …

⏱️ Chronological Timeline
  01:16:23.293 🖥️ Console [debug] [vite] connecting…
  01:16:23.303 🖥️ Console [debug] [vite] connected.
  01:16:23.312 ➡️ GET /src/pages/SleepingMasks.tsx
  01:16:23.318 ⬅️ 304 /src/pages/SleepingMasks.tsx
     …
  01:17:45.038 🤖 🏁 Flow finished – deletion verified
  01:17:47.038 🤖 📋 Conclusion repeated above
👁️  See the "Operative Control Center" dashboard for live logs.
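The Chronological Timeline interleaves console and network events by timestamp. One way to produce such a view is a stable sort over all captured events, sketched below using the sample entries from the report. The `Event` type, its `source` labels, and the icon mapping are hypothetical and only mirror the sample output; they are not web-eval-agent's internal data model.

```python
from dataclasses import dataclass

# Icons copied from the sample timeline; the source labels are hypothetical.
ICONS = {"console": "🖥️ Console", "request": "➡️", "response": "⬅️", "agent": "🤖"}


@dataclass(frozen=True)
class Event:
    timestamp: str  # wall-clock time as "HH:MM:SS.mmm"
    source: str     # one of the keys in ICONS
    text: str


def to_millis(timestamp: str) -> int:
    """Convert "HH:MM:SS.mmm" to milliseconds since midnight (no day rollover)."""
    clock, millis = timestamp.split(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def build_timeline(*streams: list[Event]) -> list[str]:
    """Merge event streams into one list of report lines ordered by time.

    list.sort is stable, so events with equal timestamps keep their input order.
    """
    events = [event for stream in streams for event in stream]
    events.sort(key=lambda event: to_millis(event.timestamp))
    return [f"{e.timestamp} {ICONS[e.source]} {e.text}" for e in events]
```

Passing the network stream before the console stream still yields the console lines first, because ordering depends only on the timestamps.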


Built with <3 @ operative.sh