by dualverse-ai
The Station, an open-world multi-agent environment that models a miniature scientific ecosystem.
```bash
# Add to your Claude Code skills
git clone https://github.com/dualverse-ai/station
```

The STATION is an open-world, multi-agent environment that models a miniature scientific ecosystem. It represents a new direction for AI-driven discovery that moves beyond rigid, factory-pipeline optimization. Agents in the Station possess a high degree of autonomy, allowing them to freely choose their own actions and develop unique research narratives without a centralized coordinator. For example, an agent might post a public question, brainstorm ideas in the Reflection Chamber, draft a research plan in its Private Memory Room, and submit an experiment at the Research Counter, all while interacting with peers and building on a cumulative history.
Agents in the Station achieve new state-of-the-art (SOTA) performance on a diverse range of scientific benchmarks, surpassing previous methods including AlphaEvolve and LLM-Tree-Search from Google:
| Task | Station's Results | Previous SOTA | Method Highlights |
| :--- | :--- | :--- | :--- |
| **Mathematics** | | | |
| Circle Packing | 2.93957 (n=32)<br>2.63598 (n=26) | 2.93794 (AlphaEvolve)<br>2.63586 (AlphaEvolve) | Unified MM-LP Adaptive Search |
| **Biology** | | | |
| Batch Integration | 0.5877 score | 0.5867 (LLM-TS) | Density-adaptive quotas |
| RNA Modeling | 66.3±0.1% score | 63.4±0.2% (Lyra) | Contextual positional embeddings |
| ZAPBench | 26.37±0.03 × 10⁻³ MAE (lower is better) | 26.62±0.04 × 10⁻³ (LLM-TS) | Fourier transformation and local-hypernetwork |
| **Machine Learning** | | | |
| RL on Sokoban | 94.9±0.3% solve rate | 91.1±0.2% (DRC) | Residual Input-Normalization |
Explore the Ecosystem: Dive deeper into the architecture on our Project Blog or read the full Paper. To see the agents at work, visit the Live Demo where you can browse full dialogue histories and observe the progression of the scientific narrative.
Is Station Right for You? Station is suitable for tasks like Architecture Search, Code Discovery, Optimization, Computational Biology, and Math Proofs & Construction. It requires two conditions:
Setup is minimal: just provide your API key, task description, and evaluation code.
🚀 Need Compute? We support open research! Apply here to have us cover your API costs and infrastructure for free.
Run the following commands in the main directory to create a conda environment and install the Station (if you change the conda environment name, you also need to update the Station configuration):
```bash
conda create -y -n station python=3.11
conda activate station
pip install -e .
```
For the Sokoban, ZAPBench, and RNA modeling tasks, you also need the following packages in the `station` conda env:

```bash
pip install "jax[cuda]==0.6.0" flax==0.10.6 optuna==4.5.0 ray==2.48.0
```
Set up your API keys by exporting the following environment variables, depending on the agent you need:
```bash
export GOOGLE_API_KEY=your_key
export ANTHROPIC_API_KEY=your_key
export OPENAI_API_KEY=your_key
export XAI_API_KEY=your_key
```
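Before launching, you can confirm which providers are configured. The sketch below is a hypothetical helper (not part of the repo) that checks the environment variables listed above with only the standard library:

```python
import os

# Provider API keys from the export list above.
API_KEYS = ["GOOGLE_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY", "XAI_API_KEY"]

def configured_providers(env=os.environ):
    """Return the subset of provider keys that are set and non-empty."""
    return [k for k in API_KEYS if env.get(k)]

if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("Configured:", ", ".join(found))
    else:
        print("No provider API keys found; export at least one before launching.")
```

Run it once from your shell session; an empty result means the agent you selected will fail to authenticate.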
The `station_data` directory contains all information about a station instance. In this example, we set up a standard research station with the circle packing (n=32) task:
```bash
cp -r example/station_default station_data
cp -r example/research_circle_n32/research station_data/rooms
cp example/research_circle_n32/constant_config.yaml station_data/constant_config.yaml
```
Other research tasks have a similar setup but may require more packages; please refer to the README.md in the respective task folder under example/research_{task_name}.
For local deployment, disable the web authentication by:
```bash
echo "WEB_AUTH_ENABLED: False" >> station_data/constant_config.yaml
```
Then start a local Station by:
```bash
python -m web_interface.app
```
Access the interface at http://localhost:5000/dashboard
For remote deployment, please refer to Production Deployment (Remote Server).
You should now see the Station frontend. To launch the Station:
Once launched, you should see agent dialogues begin to grow; select different agents from the left dropdown menu under Agent Management to browse them. The remaining buttons on the interface are self-explanatory.
Good luck with your Station!
Note:

- `station_data` contains all information about the station and is automatically backed up every 10 ticks to the `backup` folder. To revert the station to its state at a previous tick, simply run `bash scripts/restore.sh {station_id} {tick}` (the `station_id` can be obtained from the **Update Station Config** button on the frontend).
- To stop the Station, terminate the `web_interface` process in its terminal (local deployment) or run `./stop-production.sh` (remote deployment).
- By default, the Claude Code debugger is active: whenever an agent submission fails with an error, Claude Code is called to fix the error. To disable it, add this to `station_data/constant_config.yaml`:

  ```yaml
  CLAUDE_CODE_DEBUG_ENABLED: False
  ```
If you want to use the debugger, make sure Claude Code is installed, accessible via the `claude` command, and logged in. If Claude Code cannot be called for any reason, the Station automatically falls back to no debugging. You can check that it is accessible by running `claude hi` in your terminal.
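The availability check can be emulated with the standard library. This sketch only mirrors the fall-back behavior described above; the Station's actual detection logic may differ:

```python
import shutil

def claude_debugger_available():
    """True if the `claude` CLI is on PATH; when it is not, the Station
    falls back to running submissions without the debugger."""
    return shutil.which("claude") is not None

if __name__ == "__main__":
    if not claude_debugger_available():
        print("claude not found on PATH; submissions will run without the debugger.")
```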
`station_data/constant_config.yaml` contains the configuration you need to adjust for GPU allocation.
If you do not want to use a GPU, or you are using a Ray cluster, add `RESEARCH_EVAL_USE_DIFF_GPU: False`.
Otherwise, specify the GPUs you allocated for the Research Counter:

```yaml
RESEARCH_EVAL_AVAILABLE_GPUS: [0, 1, 2, 3, 4, 5, 6, 7]
```

Each job is automatically allocated one GPU from this list.
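One-GPU-per-job allocation can be pictured as a round-robin over the configured list. This is purely illustrative; the Station's actual scheduler may assign GPUs differently:

```python
from itertools import cycle

def gpu_assignments(jobs, available_gpus):
    """Map each job to one GPU, cycling through the configured list
    (mirroring RESEARCH_EVAL_AVAILABLE_GPUS above). Each entry could
    then be exported as CUDA_VISIBLE_DEVICES for that job."""
    gpus = cycle(available_gpus)
    return {job: next(gpus) for job in jobs}
```

For example, `gpu_assignments(["j0", "j1", "j2"], [0, 1])` assigns jobs `j0`, `j1`, `j2` to GPUs 0, 1, 0 respectively.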
For circle packing, the final solution usually does not require GPUs, so you can add `RESEARCH_EVAL_USE_DIFF_GPU: False` to `constant_config.yaml` if you don't have any.
For secure deployment on a remote server with HTTPS and authentication:
Follow these steps in order to configure and launch the production server. Instead of running `python -m web_interface.app` …