EurekAgent

Name: EurekAgent
Author: THU-Team-Eureka

EurekAgent: an autonomous research system for metric-driven tasks, built with Claude Code. Define the problem and metric. Get breakthrough results.

55 stars

4 forks

Python

Installation

# Add to your Claude Code skills
git clone https://github.com/THU-Team-Eureka/EurekAgent

README.md

Frequently Asked Questions

What is EurekAgent?

EurekAgent is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by THU-Team-Eureka. It is an autonomous research system for metric-driven tasks, built with Claude Code: you define the problem and the metric, and it iterates toward better results. It has 55 GitHub stars.

How do I install EurekAgent?

Clone the repository with "git clone https://github.com/THU-Team-Eureka/EurekAgent" and add it to your Claude Code skills directory (see the Installation section above).

What programming language is EurekAgent written in?

EurekAgent is primarily written in Python. It is open-source under THU-Team-Eureka on GitHub, so you can review or fork the full source.

📰 News

2026/06/13 — EurekAgent has been accepted to the BAAI Agent4S workshop! Join us for our presentation at the BAAI conference on June 13th, 2026 in Beijing. Slides will be available soon.
2026/06/12 — v0.1.0 released!

🔍 Overview

We present EurekAgent, an agent system for metric-driven autonomous scientific discovery. Define your problem and evaluation criteria — EurekAgent coordinates off-the-shelf CLI agents to propose diverse approaches, implement them, run experiments, and iterate. Human intervention is optional but supported at every step.

https://github.com/user-attachments/assets/c5b45b20-7eec-454e-98c3-6880bcec878b

Highlights

Environment engineering first — provides strong CLI agents with the resources, constraints, artifacts, budgets, and human interfaces needed for reliable autonomous discovery.
End-to-end research loop — proposes approaches, implements code, evaluates submissions, and iterates toward better results.
Problem-defined evaluation — uses your INSTRUCTION.md, SUBMISSION_FORMAT.md, and private evaluate.py as the source of truth.
Isolated execution — runs agent work and grading in separate Docker containers for secure, sandboxed experiments.
Resumable long runs — flexibly interrupt and resume a run from persisted state.
User-friendly interfaces — optionally chat with agents through the TUI, and track live cost stats, score evolution, and full session logs in the web monitor.

🚀 Quick Start

1. Install Docker and Node.js 22+

Docker — follow the official guide for your platform. Then add your user to the docker group:

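sudo usermod -aG docker $USER
# Log out and back in (or run `newgrp docker`) so the new group applies to your session
# Check that the user has been added to the docker group
groups $USER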

Node.js 22+ — the agent container is built on the node:22-bookworm image, so install a matching Node.js 22+ runtime on the host as well (from nodejs.org or via nvm) and confirm:

nvm install 22
node --version   # must be v22 or newer

2. Install Claude Code

EurekAgent drives the experiment loop through Claude Code. It runs both on your host (for the /generate-inputs skill and problem authoring) and inside the agent container (preinstalled by the Docker image below).

a) Install Claude Code on the host (requires Node.js 22+ from Step 1):

npm install -g @anthropic-ai/claude-code
claude --version   # sanity check

b) Authenticate and point Claude Code at your model endpoint. EurekAgent forwards these into the agent container, so configure them once in ~/.claude/settings.json under the "env" block:

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "YOUR_KEY_HERE",
    "ANTHROPIC_BASE_URL": "YOUR_BASE_URL_HERE",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.1",
    "API_TIMEOUT_MS": "3000000"
  },
  "model": "sonnet"
}
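
Before launching a run, it can help to confirm that the settings file parses and that its "env" block carries the credentials shown above. The following is a minimal sketch under stated assumptions: the helper name `missing_env_keys` is hypothetical, and treating these two keys as required is an assumption drawn from the example, not something EurekAgent is known to enforce.

```python
import json

# Keys taken from the example above; treating them as required is an assumption.
EXPECTED_ENV_KEYS = ("ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_BASE_URL")


def missing_env_keys(settings_text: str, expected=EXPECTED_ENV_KEYS) -> list[str]:
    """Return the expected keys that are absent or empty in the "env" block."""
    settings = json.loads(settings_text)  # raises json.JSONDecodeError on malformed JSON
    env = settings.get("env", {})
    return [key for key in expected if not env.get(key)]


# Inline sample standing in for the contents of ~/.claude/settings.json.
sample = '{"env": {"ANTHROPIC_AUTH_TOKEN": "YOUR_KEY_HERE"}, "model": "sonnet"}'
print(missing_env_keys(sample))
```

To check the real file, pass the text of `~/.claude/settings.json` (for example via `pathlib.Path.home() / ".claude" / "settings.json"`) instead of the inline sample.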

3. Install Python dependencies

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc

# Clone and enter the project
git clone https://github.com/THU-Team-Eureka/EurekAgent.git && cd EurekAgent

# Install uv-managed Python 3.12
uv python install 3.12.12

4. Pull the base image and build the container

docker pull node:22-bookworm
bash docker/build.sh

Verify the image is available:

docker images | grep eureka-agent

If you are behind a proxy or docker pull fails, see the Docker troubleshooting guide.

5. (Recommended) Configure MCP servers for web access

During a run the agent can search the web for problem context and read live pages. These MCP servers are optional — when absent, the agent falls back to Claude Code's built-in WebSearch. web-search-prime is intended for GLM users; users of other model providers can skip it or configure their preferred search MCP.

a) web-search-prime — structured web search for GLM users only

claude mcp add -s user -t http web-search-prime https://api.z.ai/api/mcp/web_search_prime/mcp --header "Authorization: Bearer YOUR_KEY_HERE"

b) playwright, for fetching and reading actual webpage content:

claude mcp add playwright npx @playwright/mcp@latest
npx playwright install chromium        # pre-install the headless browser

EurekAgent ships a Playwright config at .claude/playwright-mcp.json (headless Chromium, sandbox flags, timeouts). It is mounted read-only into the agent container automatically — create or edit that file to match your network (e.g. add a proxy) if needed.

6. Run an example

bash examples/circle_packing/run.sh

🧠 Setting Up a New Problem

You can use the /generate-inputs skill in Claude Code to interactively generate all required files (INSTRUCTION.md, SUBMISSION_FORMAT.md, evaluate.py, run.sh) from a natural language description of your problem. Just type /generate-inputs and follow the prompts.

Each problem lives in its own directory under examples/. You need the following files:

Required Files

| File | Purpose | Required? |
| --- | --- | --- |
| `INSTRUCTION.md` | Problem description for the LLM agent | Yes |
| `SUBMISSION_FORMAT.md` | JSON schema for candidates + score semantics | Yes |
| `hidden_eval_dir/evaluate.py` | Private evaluator with `grade_submission` and `is_better` | Yes |
| `initial.py` | Starting code for the agent | Recommended |
| `run.sh` | Convenience script to launch a run | Recommended |
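
A quick pre-flight check of a problem directory against the file list above might look like the sketch below. The function name `check_problem_dir` and the shape of its report are hypothetical; EurekAgent may perform its own checks differently.

```python
from pathlib import Path

# Paths relative to the problem directory, taken from the file list above.
REQUIRED = ("INSTRUCTION.md", "SUBMISSION_FORMAT.md", "hidden_eval_dir/evaluate.py")
RECOMMENDED = ("initial.py", "run.sh")


def check_problem_dir(problem_dir: str) -> dict:
    """Report which required and recommended files are missing."""
    root = Path(problem_dir)
    return {
        "missing_required": [name for name in REQUIRED if not (root / name).is_file()],
        "missing_recommended": [name for name in RECOMMENDED if not (root / name).is_file()],
    }
```

For example, `check_problem_dir("examples/circle_packing")` returns a report for the bundled example; an empty `missing_required` list means the three mandatory files are in place.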

evaluate.py Specification

The evaluator is the single source of truth for scoring and comparison. It must define two functions:

`grade_submission(submission_path: str, context: dict) -> dict`

Called by the secure grader server to score a candidate submission.

Parameters:
- submission_path: path to the JSON file the agent submitted
- context: dict with workspace_root, approach_id, metadata
Returns a dict with:
- score (float): the raw objective value. Do NOT negate. Return the value as-is (e.g., the C5 value for a minimization problem, or sum of radii for a maximization problem).
- valid (bool): whether the submission is valid
- message (str): human-readable feedback
- opt_target_met (bool, optional): whether an optimization target was met
- public_metrics (dict, optional): additional metrics for display
Invalid submissions: return a score that can never be "best". Use float("inf") for minimization problems, float("-inf") for maximization, or float("inf") for approach-target problems.
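
As an illustration of the contract above, here is a minimal sketch of `grade_submission` for a hypothetical maximization problem whose score is a sum of radii. The JSON key `"radii"`, the validation rules, and the helper `_is_positive_number` are assumptions made for this example; only the function signature and the returned keys come from the specification above.

```python
import json
import math


def _is_positive_number(x) -> bool:
    """True for finite, strictly positive ints/floats (bools excluded)."""
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        return False
    try:
        return math.isfinite(x) and x > 0
    except OverflowError:  # integer too large to convert to float
        return False


def grade_submission(submission_path: str, context: dict) -> dict:
    """Score a candidate for a hypothetical sum-of-radii maximization problem."""
    # -inf can never be "best" when maximizing. `context` is unused in this sketch.
    invalid = {"score": float("-inf"), "valid": False}
    try:
        with open(submission_path, encoding="utf-8") as f:
            radii = json.load(f)["radii"]  # "radii" is an assumed key
    except (OSError, ValueError, KeyError, TypeError) as exc:
        return {**invalid, "message": f"could not read submission: {exc}"}

    if not isinstance(radii, list) or not radii:
        return {**invalid, "message": "'radii' must be a non-empty list"}
    if not all(_is_positive_number(r) for r in radii):
        return {**invalid, "message": "every radius must be a positive finite number"}

    total = sum(float(r) for r in radii)
    return {
        "score": total,  # raw objective value, not negated
        "valid": True,
        "message": f"sum of radii = {total:.6f}",
        "public_metrics": {"num_circles": len(radii)},
    }
```

A real evaluator would also verify the problem's geometric constraints before accepting a submission; this sketch only checks the file shape and value ranges.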

`is_better(new_score: float, old_score: float) -> bool`

Defines which score is better. Called by the system to compare scores for ranking, best-result tracking, and display.

Returns: True if new_score represents a better result than old_score
Examples:
- Minimization: return new_score < old_score
- Maximization: return new_score > old_score
- Approach target (e.g., π): return abs(new_score - 3.14159) < abs(old_score - 3.14159)

Both functions are required. The system will fail at startup if either is missing.
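
The sketch below puts the three comparison styles side by side and checks that the recommended invalid-submission sentinels can never win under the matching comparison. The suffixed names are for illustration only: a real `evaluate.py` defines a single function named `is_better`, and `TARGET` is a hypothetical constant.

```python
import math

TARGET = math.pi  # hypothetical target for the approach-target case


def is_better_min(new_score: float, old_score: float) -> bool:
    return new_score < old_score


def is_better_max(new_score: float, old_score: float) -> bool:
    return new_score > old_score


def is_better_target(new_score: float, old_score: float) -> bool:
    return abs(new_score - TARGET) < abs(old_score - TARGET)


# Each comparison paired with its invalid-submission sentinel and a sample valid score.
CASES = [
    (is_better_min, float("inf"), 2.5),
    (is_better_max, float("-inf"), 2.5),
    (is_better_target, float("inf"), 3.0),
]

for compare, sentinel, valid_score in CASES:
    assert not compare(sentinel, valid_score)  # an invalid result never displaces a valid one
    assert compare(valid_score, sentinel)      # any valid result displaces an invalid one
```

Because every comparison is strict, two equal scores are never "better" than each other, so an existing best result is kept on ties.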

INSTRUCTION.md

Must clearly state:

The optimization objective and its direction (minimize, maximize, approach target, etc.)
Constraints and validation rules
Known best results (if any) or target score
The contract for the run() function

SUBMISSION_FORMAT.md

Must describe:

Required JSON keys and their types
Score semantics (e.g., "Score is the raw C5 value. Lower is better.")
Invalid submission behavior
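
One way to keep the documented keys and types in step with the evaluator is to express them as data and check candidates against it. This is a sketch only: the keys in `SCHEMA` and the helper `schema_errors` are hypothetical and are not part of EurekAgent.

```python
# Hypothetical schema: key name -> accepted Python type(s) after json.loads.
SCHEMA = {
    "centers": list,
    "radii": list,
    "n": int,
}


def schema_errors(submission: dict, schema: dict = SCHEMA) -> list[str]:
    """Return one message per missing key or wrongly typed value."""
    errors = []
    for key, expected in schema.items():
        if key not in submission:
            errors.append(f"missing key: {key}")
            continue
        value = submission[key]
        # bool is a subclass of int in Python; accept it only when the schema says exactly bool.
        bool_mismatch = isinstance(value, bool) and expected is not bool
        if bool_mismatch or not isinstance(value, expected):
            errors.append(f"wrong type for {key}: {type(value).__name__}")
    return errors
```

An empty list means the candidate matches the documented shape; the messages can be passed back to the agent through the `message` field of the grading result.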

run.sh

A convenience script. Must pass at minimum:

--problem: path to INSTRUCTION.md
--hidden-eval-dir: path to the directory co