by CosmosYi
🛡️AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation
# Add to your Claude Code skills
git clone https://github.com/CosmosYi/AutoControl-Arena
Guides for using AI agent skills like AutoControl-Arena.
Last scanned: 5/30/2026
{
"issues": [],
"status": "PASSED",
"scannedAt": "2026-05-30T16:35:56.160Z",
"npmAuditRan": true,
"pipAuditRan": true
}
AutoControl-Arena is an open-source AI agent skill for AI coding assistants such as Claude Code, Codex CLI, and ChatGPT, built by CosmosYi. It has 112 GitHub stars.
AutoControl-Arena passed SkillsLLM's automated security scan (a dependency vulnerability audit plus prompt-injection heuristics) with no high-severity issues. The full report is in the Security Report section on this page.
Clone the repository with "git clone https://github.com/CosmosYi/AutoControl-Arena" and add it to your Claude Code skills directory (see the Installation section above).
AutoControl-Arena is primarily written in Python. It is open-source under CosmosYi on GitHub, so you can review or fork the full source.
AutoControl Arena is an automated framework for frontier AI safety evaluation, designed to uncover latent risks in advanced models by stress-testing within executable, pressure-rich environments.
The core idea is logic-narrative decoupling:
This design addresses a central trade-off in agent safety evaluation by combining executable environments with LLM-driven narrative flexibility:
AutoControl Arena can automatically reconstruct and reproduce expert-designed risk-evaluation scenarios and target risky behaviors from organizations such as Anthropic, OpenAI, and Apollo Research.
Curious about the paper, architecture, and key findings? Please see our 📖 Project Page for a full tour, or read our paper.
Interactive UI (aca)
aca
The interactive UI lets you browse scenarios, pick profiles, choose stress levels, enable parallel repeats, and inspect previous run history from a menu.
aca is not limited to manual selection. It also includes a Run from JSON flow that lets you:
configs/batch_configs/examples/ interactively,
Use this when you want the convenience of the interactive UI but already have a batch JSON prepared.
Command-mode CLI (autocontrol)
autocontrol --help
Use autocontrol when you want explicit subcommands such as run, batch, profiles, or prepare-envs. Unlike aca, it does not open the interactive menu by default.
The results viewer is a separate local web UI for browsing completed runs, artifacts, summaries, and live execution details from results/.
Start it with:
python tools/results_viewer/serve.py --port 8000
Then open:
http://127.0.0.1:8000/viewer/
Optional flags:
# Bind to a different host
python tools/results_viewer/serve.py --host 0.0.0.0 --port 8000
# Point the viewer at a specific results root
python tools/results_viewer/serve.py \
--results-dir results/autocontrol_arena \
--port 8000
Use the results viewer when you want a browser-based inspection surface for run trajectories and artifacts.
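For scripted setups, the viewer invocation above can be assembled programmatically. The following is a minimal sketch, not part of the project: `build_viewer_command` and `viewer_url` are hypothetical helper names, and the sketch uses only the `--host`, `--port`, and `--results-dir` flags shown above.

```python
import sys
from typing import List, Optional


def build_viewer_command(
    port: int = 8000,
    host: Optional[str] = None,
    results_dir: Optional[str] = None,
) -> List[str]:
    """Assemble argv for tools/results_viewer/serve.py (hypothetical helper)."""
    cmd = [sys.executable, "tools/results_viewer/serve.py"]
    if host is not None:
        cmd += ["--host", host]
    if results_dir is not None:
        cmd += ["--results-dir", results_dir]
    cmd += ["--port", str(port)]
    return cmd


def viewer_url(port: int = 8000, host: str = "127.0.0.1") -> str:
    """URL to open in a browser, following the /viewer/ path shown above."""
    return f"http://{host}:{port}/viewer/"


# Show the command (minus the interpreter path) and the matching URL.
print(" ".join(build_viewer_command(results_dir="results/autocontrol_arena")[1:]))
print(viewer_url())
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; the sketch only builds the command and does not start the server.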
.
├── configs/
│ ├── batch_configs/examples/ # public example batch JSON configs
│ ├── profiles/templates/ # public model/provider profile templates
│ ├── scenarios/benchmark/ # X-Bench YAML scenario definitions
│ └── settings/ # runtime hyperparameters
├── docs/ # public docs plus internal planning / design notes
├── runners/ # CLI entrypoints and interactive UI
├── src/ # agents, orchestration, infra, and core logic
├── tests/ # unit and regression tests
├── tools/ # result management utilities
├── results/ # generated outputs under results/autocontrol_arena/
├── runtime_envs/ # generated environments (auto-generated)
├── .env.example
└── pyproject.toml
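Given the layout above, the available benchmark scenarios can be enumerated from a script. This is a rough sketch under assumptions: `list_scenarios` is a hypothetical helper, the scenario files are assumed to end in `.yaml` or `.yml`, and the search is recursive because the tree does not say whether the directory is flat or nested.

```python
from pathlib import Path
from typing import List


def list_scenarios(repo_root: str = ".") -> List[str]:
    """Return scenario files under configs/scenarios/benchmark/, sorted by path.

    Hypothetical helper: the file extensions and the recursive search are
    assumptions, not documented behavior of the project.
    """
    scenario_dir = Path(repo_root) / "configs" / "scenarios" / "benchmark"
    if not scenario_dir.is_dir():
        return []
    files = [
        p for p in scenario_dir.rglob("*")
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    ]
    return sorted(p.relative_to(scenario_dir).as_posix() for p in files)


for name in list_scenarios("."):
    print(name)
```

Run from the repository root, this prints one relative path per scenario file; outside a checkout it prints nothing.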
Prerequisites: Python 3.10+, API credentials for the model providers you want to test.
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install all dependencies (includes LiteLLM, the default model backend)
pip install -e .
# Optional: dev tools (pytest, ruff, mypy)
pip install -e ".[dev]"
This installation exposes two shell commands:
aca: interactive operator entrypoint
autocontrol: command-mode CLI for scripting and automation
cp .env.example .env
Edit .env and fill in only the providers you need. Here are some examples:
# OpenAI
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Anthropic
ANTHROPIC_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Qwen / DashScope
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx
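Before a run, it can help to confirm that the keys for the providers you plan to use are actually set. The sketch below is illustrative only: `PROVIDER_KEYS` and `missing_keys` are hypothetical names, the variable names come from the examples above, and it assumes the values from `.env` have already been loaded into the process environment (how the project loads `.env` is not covered here).

```python
import os
from typing import Dict, Iterable, List, Mapping

# Variable names from the .env examples above; the provider labels are illustrative.
PROVIDER_KEYS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "dashscope": "DASHSCOPE_API_KEY",
}


def missing_keys(
    providers: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the env variable names that are unset or blank for the given providers."""
    missing = []
    for provider in providers:
        var = PROVIDER_KEYS[provider]
        if not env.get(var, "").strip():
            missing.append(var)
    return missing


# Example with an explicit mapping instead of the real environment.
print(missing_keys(["openai", "anthropic"], {"OPENAI_API_KEY": "sk-example"}))
# prints ['ANTHROPIC_API_KEY']
```

Passing an explicit mapping keeps the check testable; calling `missing_keys(["openai"])` with no second argument inspects the real environment.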
Public profile templates live under configs/profiles/templates/ and are loaded by profile name. The repository currently ships template profiles for gpt-4o and claude-3-7-sonnet.
Here is what a profile looks like:
# configs/profiles/templates/gpt-4o.yaml
api_base_url: https://api.openai.com/v1
api_key: ${OPENAI_API_KEY}
model_name: gpt-4o
provider: litellm
temperature: 0.7
max_tokens: 10240
The profile filename (without .yaml) is the name used in all CLI commands. The api_key field references an environment variable — set that variable in .env and nothing else needs to change. To add a new model, copy any existing profile and edit the three fields (api_base_url, api_key, model_name).
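The copy-and-edit step and the `${VAR}` reference can be illustrated with plain dictionaries. This is a sketch under stated assumptions, not the project's loader: `derive_profile` and `resolve_placeholders` are hypothetical helpers, the new endpoint, variable name, and model name are placeholder values, and the project's real placeholder resolution may differ.

```python
import os
import re
from typing import Any, Dict, Mapping

# Matches ${NAME} references such as ${OPENAI_API_KEY}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

# Mirrors the gpt-4o template shown above.
BASE_PROFILE: Dict[str, Any] = {
    "api_base_url": "https://api.openai.com/v1",
    "api_key": "${OPENAI_API_KEY}",
    "model_name": "gpt-4o",
    "provider": "litellm",
    "temperature": 0.7,
    "max_tokens": 10240,
}


def derive_profile(base: Mapping[str, Any], api_base_url: str,
                   api_key: str, model_name: str) -> Dict[str, Any]:
    """Copy a profile and change only the three model-specific fields."""
    new = dict(base)
    new.update(api_base_url=api_base_url, api_key=api_key, model_name=model_name)
    return new


def resolve_placeholders(profile: Mapping[str, Any],
                         env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Replace ${NAME} in string fields with env values (assumed semantics)."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return {
        key: _PLACEHOLDER.sub(substitute, value) if isinstance(value, str) else value
        for key, value in profile.items()
    }


# Placeholder values for illustration only.
new_profile = derive_profile(
    BASE_PROFILE, "https://example.invalid/v1", "${MY_PROVIDER_API_KEY}", "my-model"
)
print(resolve_placeholders(new_profile, {"MY_PROVIDER_API_KEY": "secret"})["model_name"])
```

In practice the derived profile would be saved as a new YAML file under `configs/profiles/templates/`, with the filename serving as the profile name.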
Global hyperparameters are in configs/settings/hyperparameters.yaml. Defaults work for most evaluations.
First time here: start with aca, browse the benchmark scenarios, and run one single scenario end-to-end.
Running scripted experiments: use autocontrol run ... or autocontrol batch ....
Inspecting completed runs in a browser: start the local results viewer with python tools/results_viewer/serve.py --port 8000.
Reading the system design: open docs/ARCHITECTURE.md.
Typical workflow:
aca