AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation
# Add to your Claude Code skills
git clone https://github.com/CosmosYi/AutoControl-Arena

AutoControl Arena is an automated framework for frontier AI safety evaluation, designed to uncover latent risks in advanced models by stress-testing them in executable, pressure-rich environments.
The core idea is logic-narrative decoupling: the design addresses a central trade-off in agent safety evaluation by combining executable environments with LLM-driven narrative flexibility.
AutoControl Arena can automatically reconstruct and reproduce expert-designed risk-evaluation scenarios and target risky behaviors from organizations such as Anthropic, OpenAI, and Apollo Research.
Curious about the paper, architecture, and key findings? Please see our Project Page for a full tour, or read our paper.
Interactive UI (aca)

Launch the interactive UI with:

aca
The interactive UI lets you browse scenarios, pick profiles, choose stress levels, enable parallel repeats, and inspect previous run history from a menu.
aca is not limited to manual selection. It also includes a Run from JSON flow that lets you pick example batch configs from configs/batch_configs/examples/ interactively. Use this when you want the convenience of the interactive UI but already have a batch JSON prepared.
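For example, you can list the shipped example configs before launching aca (the exact filenames depend on the repository version):

# Example batch JSON configs consumed by the Run from JSON flow
ls configs/batch_configs/examples/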
Command-mode CLI (autocontrol)

autocontrol --help
Use autocontrol when you want explicit subcommands such as run, batch, profiles, or prepare-envs. Unlike aca, it does not open the interactive menu by default.
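As a rough sketch (the subcommand names come from the list above; the exact flags differ between versions, so consult each subcommand's --help):

# Explore what each subcommand accepts before scripting it
autocontrol run --help
autocontrol batch --help
autocontrol profiles --help
autocontrol prepare-envs --help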
The results viewer is a separate local web UI for browsing completed runs, artifacts, summaries, and live execution details from results/.
Start it with:
python tools/results_viewer/serve.py --port 8000
Then open:
http://127.0.0.1:8000/viewer/
Optional flags:
# Bind to a different host
python tools/results_viewer/serve.py --host 0.0.0.0 --port 8000
# Point the viewer at a specific results root
python tools/results_viewer/serve.py \
--results-dir results/autocontrol_arena \
--port 8000
Use the results viewer when you want a browser-based inspection surface for run trajectories and artifacts.
.
├── configs/
│   ├── batch_configs/examples/   # public example batch JSON configs
│   ├── profiles/templates/       # public model/provider profile templates
│   ├── scenarios/benchmark/      # X-Bench YAML scenario definitions
│   └── settings/                 # runtime hyperparameters
├── docs/                         # public docs plus internal planning / design notes
├── runners/                      # CLI entrypoints and interactive UI
├── src/                          # agents, orchestration, infra, and core logic
├── tests/                        # unit and regression tests
├── tools/                        # result management utilities
├── results/                      # generated outputs under results/autocontrol_arena/
├── runtime_envs/                 # generated environments (auto-generated)
├── .env.example
└── pyproject.toml
Prerequisites: Python 3.10+, API credentials for the model providers you want to test.
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install all dependencies (includes LiteLLM, the default model backend)
pip install -e .
# Optional: dev tools (pytest, ruff, mypy)
pip install -e ".[dev]"
This installation exposes two shell commands: aca, the interactive operator entrypoint, and autocontrol, the command-mode CLI for scripting and automation.

Next, copy the environment template:

cp .env.example .env
Edit .env and fill in only the providers you need. Here are some examples:
# OpenAI
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Anthropic
ANTHROPIC_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Qwen / DashScope
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx
Public profile templates live under configs/profiles/templates/ and are loaded by profile name. The repository currently ships template profiles for gpt-4o and claude-3-7-sonnet.
Here is what a profile looks like:
# configs/profiles/templates/gpt-4o.yaml
api_base_url: https://api.openai.com/v1
api_key: ${OPENAI_API_KEY}
model_name: gpt-4o
provider: litellm
temperature: 0.7
max_tokens: 10240
The profile filename (without .yaml) is the name used in all CLI commands. The api_key field references an environment variable; set that variable in .env and nothing else needs to change. To add a new model, copy any existing profile and edit the three fields (api_base_url, api_key, model_name).
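For instance, a hypothetical template for a Qwen model served through DashScope's OpenAI-compatible endpoint could look like the sketch below (the filename, endpoint URL, and model identifier are illustrative and not shipped with the repository):

# configs/profiles/templates/qwen-max.yaml (hypothetical example)
api_base_url: https://dashscope.aliyuncs.com/compatible-mode/v1
api_key: ${DASHSCOPE_API_KEY}
model_name: qwen-max
provider: litellm
temperature: 0.7
max_tokens: 10240

Saving that file would make the profile addressable as qwen-max in both aca and autocontrol commands.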
Global hyperparameters are in configs/settings/hyperparameters.yaml. Defaults work for most evaluations.
First time here: start with aca, browse the benchmark scenarios, and run a single scenario end-to-end.
Running scripted experiments: use autocontrol run ... or autocontrol batch ....
Inspecting completed runs in a browser: start the local results viewer with python tools/results_viewer/serve.py --port 8000.
Reading the system design: open docs/ARCHITECTURE.md.
Typical workflow:
aca
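That opens the interactive menu described above. After a run completes, inspect it in the local results viewer (a sketch using only commands introduced earlier):

# Browse the finished run's trajectory and artifacts
python tools/results_viewer/serve.py --port 8000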