AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation
# Add to your Claude Code skills
git clone https://github.com/CosmosYi/AutoControl-Arena

AutoControl Arena is an automated framework for frontier AI safety evaluation, designed to uncover latent risks in advanced models by stress-testing them in executable, pressure-rich environments.
The core idea is logic-narrative decoupling: the design addresses a central trade-off in agent safety evaluation by combining executable environments with LLM-driven narrative flexibility.
AutoControl Arena can automatically reconstruct and reproduce expert-designed risk-evaluation scenarios and target risky behaviors from organizations such as Anthropic, OpenAI, and Apollo Research.
Curious about the paper, architecture, and key findings? Please see our Project Page for a full tour, or read our paper.
Interactive UI (aca)

Launch the interactive UI with:

aca
The interactive UI lets you browse scenarios, pick profiles, choose stress levels, enable parallel repeats, and inspect previous run history from a menu.
aca is not limited to manual selection. It also includes a Run from JSON flow that lets you pick example batch configs from configs/batch_configs/examples/ interactively. Use this when you want the convenience of the interactive UI but already have a batch JSON prepared.
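For example, you can list the shipped example configs before launching aca (the exact filenames depend on the repository version):

# Example batch JSON configs consumed by the Run from JSON flow
ls configs/batch_configs/examples/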
Command-mode CLI (autocontrol)

autocontrol --help
Use autocontrol when you want explicit subcommands such as run, batch, profiles, or prepare-envs. Unlike aca, it does not open the interactive menu by default.
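As a rough sketch (the subcommand names come from the list above; the exact flags differ between versions, so consult each subcommand's --help):

# Explore what each subcommand accepts before scripting it
autocontrol run --help
autocontrol batch --help
autocontrol profiles --help
autocontrol prepare-envs --help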
The results viewer is a separate local web UI for browsing completed runs, artifacts, summaries, and live execution details from results/.
Start it with:
python tools/results_viewer/serve.py --port 8000
Then open:
http://127.0.0.1:8000/viewer/
Optional flags:
# Bind to a different host
python tools/results_viewer/serve.py --host 0.0.0.0 --port 8000
# Point the viewer at a specific results root
python tools/results_viewer/serve.py \
--results-dir results/autocontrol_arena \
--port 8000
Use the results viewer when you want a browser-based inspection surface for run trajectories and artifacts.
.
├── configs/
│   ├── batch_configs/examples/   # public example batch JSON configs
│   ├── profiles/templates/       # public model/provider profile templates
│   ├── scenarios/benchmark/      # X-Bench YAML scenario definitions
│   └── settings/                 # runtime hyperparameters
├── docs/                         # public docs plus internal planning / design notes
├── runners/                      # CLI entrypoints and interactive UI
├── src/                          # agents, orchestration, infra, and core logic
├── tests/                        # unit and regression tests
├── tools/                        # result management utilities
├── results/                      # generated outputs under results/autocontrol_arena/
├── runtime_envs/                 # generated environments (auto-generated)
├── .env.example
└── pyproject.toml
Prerequisites: Python 3.10+, API credentials for the model providers you want to test.
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install all dependencies (includes LiteLLM, the default model backend)
pip install -e .
# Optional: dev tools (pytest, ruff, mypy)
pip install -e ".[dev]"
This installation exposes two shell commands: aca, the interactive operator entrypoint, and autocontrol, the command-mode CLI for scripting and automation.

Next, copy the environment template:

cp .env.example .env
Edit .env and fill in only the providers you need. Here are some examples:
# OpenAI
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Anthropic
ANTHROPIC_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Qwen / DashScope
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx
Public profile templates live under configs/profiles/templates/ and are loaded by profile name. The repository currently ships template profiles for gpt-4o and claude-3-7-sonnet.
Here is what a profile looks like:
# configs/profiles/templates/gpt-4o.yaml
api_base_url: https://api.openai.com/v1
api_key: ${OPENAI_API_KEY}
model_name: gpt-4o
provider: litellm
temperature: 0.7
max_tokens: 10240
The profile filename (without .yaml) is the name used in all CLI commands. The api_key field references an environment variable; set that variable in .env and nothing else needs to change. To add a new model, copy any existing profile and edit the three fields (api_base_url, api_key, model_name).
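For instance, a hypothetical template for a Qwen model served through DashScope's OpenAI-compatible endpoint could look like the sketch below (the filename, endpoint URL, and model identifier are illustrative and not shipped with the repository):

# configs/profiles/templates/qwen-max.yaml (hypothetical example)
api_base_url: https://dashscope.aliyuncs.com/compatible-mode/v1
api_key: ${DASHSCOPE_API_KEY}
model_name: qwen-max
provider: litellm
temperature: 0.7
max_tokens: 10240

Saving that file would make the profile addressable as qwen-max in both aca and autocontrol commands.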
Global hyperparameters are in configs/settings/hyperparameters.yaml. Defaults work for most evaluations.
First time here: start with aca, browse the benchmark scenarios, and run a single scenario end-to-end.
Running scripted experiments: use autocontrol run ... or autocontrol batch ....
Inspecting completed runs in a browser: start the local results viewer with python tools/results_viewer/serve.py --port 8000.
Reading the system design: open docs/ARCHITECTURE.md.
Typical workflow:
aca
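That opens the interactive menu described above. After a run completes, inspect it in the local results viewer (a sketch using only commands introduced earlier):

# Browse the finished run's trajectory and artifacts
python tools/results_viewer/serve.py --port 8000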