by ZhuMorris
Causal decision and audit tool for AI agents. A/B testing and Difference-in-Differences analysis.
# Add to your Claude Code skills
git clone https://github.com/ZhuMorris/agent-causal-decision-tool
Agent Causal Decision Tool helps you and your AI agents answer one question from experiment data: "should we ship this change, keep running the test, or roll it back?" It takes in simple A/B or rollout summaries and returns a structured JSON decision, key statistics, and an audit record you can store or review later.
Rather than being a full experimentation platform, it is a decision engine. You bring the data (from your logs, BI tool, or CSV); it handles the stats, decision logic, and audit trail.
In many teams, experiment decisions happen in ad hoc spreadsheets or dashboards. People glance at lift, argue about whether the sample size is enough, and sometimes ship features based on noisy or biased results. Agents make this worse if they are wired to react to any small uplift they see.
This tool wraps a few standard methods (frequentist A/B tests, Bayesian A/B with a Jeffreys prior, difference-in-differences, and sample-size planning) into one consistent, agent-friendly interface.
The goal is not to replace your analytics stack, but to give agents a small, reliable decision block they can call inside workflows.
Use this tool whenever you or your agents have experiment or rollout results and need a decision you can defend.
# Via pip
pip install agent-causal-decision-tool
# Via GitHub
pip install git+https://github.com/ZhuMorris/agent-causal-decision-tool.git
# Via OpenClaw (clawhub)
clawhub install agent-causal
plan) Estimate required sample size, duration, and feasibility before running an experiment:
cd ~/clawd/agent-causal-decision-tool
PYTHONPATH=. python3 -m src.cli plan --baseline 0.02 --mde 10 --traffic 5000
PYTHONPATH=. python3 -m src.cli plan --baseline 0.02 --mde 5 --traffic 500 --format text
# Custom traffic allocation
PYTHONPATH=. python3 -m src.cli plan --baseline 0.02 --mde 10 --traffic 5000 --allocation custom --allocation-ratio 0.3/0.7
Parameters: --baseline, --mde, --traffic, --confidence (default 0.95), --power (default 0.8), --allocation, --allocation-ratio
Feasibility: feasible ≤14 days | slow 15–60 days | not_recommended >60 days
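The CLI's internals aren't reproduced here, but its inputs line up with the textbook two-proportion power calculation. A stdlib-only sketch of that calculation, with the feasibility thresholds above (the function name `plan_sketch` and the 50/50 traffic split are assumptions, not the tool's actual code):

```python
from math import ceil
from statistics import NormalDist

def plan_sketch(baseline, mde_pct, daily_traffic, confidence=0.95, power=0.8):
    """Rough per-group sample size for a two-proportion test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + mde_pct / 100)                       # MDE given as a relative lift
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    pooled_var = p1 * (1 - p1) + p2 * (1 - p2)
    n_per_group = ceil((z_alpha + z_beta) ** 2 * pooled_var / (p2 - p1) ** 2)
    days = ceil(2 * n_per_group / daily_traffic)              # assumes a 50/50 traffic split
    if days <= 14:
        feasibility = "feasible"
    elif days <= 60:
        feasibility = "slow"
    else:
        feasibility = "not_recommended"
    return {"n_per_group": n_per_group, "days": days, "feasibility": feasibility}

print(plan_sketch(0.02, 10, 5000))
```

Detecting a 10% relative lift on a 2% baseline needs tens of thousands of users per arm, which is why the planner flags low-traffic setups as slow or not recommended.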
ab) Frequentist A/B test on conversion counts:
PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 130/5000
PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 70/5000 --format text
# Auto-save to history
PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 130/5000 --save
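The actual logic in `src.ab_test` isn't shown here; as a point of reference, a pooled two-proportion z-test with the ship/reject/keep_running rules from the decision table later in this document can be sketched as follows (`ab_sketch` is a hypothetical name, stdlib only):

```python
from math import sqrt
from statistics import NormalDist

def ab_sketch(control, variant, alpha=0.05):
    """Pooled two-proportion z-test; (conversions, total) tuples in, decision out."""
    (xc, nc), (xv, nv) = control, variant
    pc, pv = xc / nc, xv / nv
    p_pool = (xc + xv) / (nc + nv)
    se = sqrt(p_pool * (1 - p_pool) * (1 / nc + 1 / nv))
    z = (pv - pc) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))              # two-sided p-value
    lift_pct = (pv - pc) / pc * 100
    if p_value < alpha:
        decision = "ship" if lift_pct > 0 else "reject"
    else:
        decision = "keep_running"
    return {"p_value": round(p_value, 4), "lift_pct": round(lift_pct, 2), "decision": decision}

print(ab_sketch((100, 5000), (130, 5000)))
```

On the example data (2.0% vs 2.6%) this lands just under p = 0.05 with a +30% relative lift, hence a ship call.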
bayes) Beta-Binomial conjugate model with a Jeffreys prior. Instead of a p-value it returns P(variant wins):
PYTHONPATH=. python3 -m src.cli bayes --control 100/5000 --variant 130/5000
PYTHONPATH=. python3 -m src.cli bayes --control 80/5000 --variant 85/5000 --format text
# Adjust Monte Carlo samples
PYTHONPATH=. python3 -m src.cli bayes --control 100/5000 --variant 130/5000 --samples 50000 --save
Decision thresholds: P(variant wins) ≥ 0.95 → ship | ≤ 0.05 → reject
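The Bayesian mode's thresholds can be reproduced with a small Monte Carlo sketch: a Jeffreys prior Beta(0.5, 0.5) updated to the posterior Beta(0.5 + conversions, 0.5 + non-conversions) for each arm, then sampled to estimate P(variant wins). The function name and seed handling are illustrative, not the tool's actual code:

```python
import random

def bayes_sketch(control, variant, samples=50000, seed=0):
    """Beta-Binomial with Jeffreys prior; Monte Carlo estimate of P(variant wins)."""
    rng = random.Random(seed)
    (xc, nc), (xv, nv) = control, variant
    wins = 0
    for _ in range(samples):
        pc = rng.betavariate(0.5 + xc, 0.5 + nc - xc)   # posterior draw, control
        pv = rng.betavariate(0.5 + xv, 0.5 + nv - xv)   # posterior draw, variant
        wins += pv > pc
    p_win = wins / samples
    if p_win >= 0.95:
        decision = "ship"
    elif p_win <= 0.05:
        decision = "reject"
    else:
        decision = "keep_running"
    return {"p_variant_wins": round(p_win, 3), "decision": decision}

print(bayes_sketch((100, 5000), (130, 5000)))
```

With 100/5000 vs 130/5000, P(variant wins) comes out around 0.98, comfortably over the 0.95 ship threshold.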
did) Difference-in-differences for non-randomized experiments where a comparable control group exists (the parallel-trends assumption):
PYTHONPATH=. python3 -m src.cli did --pre-control 1000 --post-control 1100 --pre-treated 900 --post-treated 1150
PYTHONPATH=. python3 -m src.cli did --pre-control 1000 --post-control 1100 --pre-treated 900 --post-treated 1150 --save
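The 2x2 DiD point estimate is just two subtractions: the treated group's pre/post change minus the control group's, which nets out the shared trend. A sketch of that arithmetic on the example above (helper name hypothetical; the real command also adds statistics and warnings):

```python
def did_sketch(pre_control, post_control, pre_treated, post_treated):
    """Classic 2x2 difference-in-differences: treated change net of the control trend."""
    control_change = post_control - pre_control   # shared trend, absorbed by both groups
    treated_change = post_treated - pre_treated
    did = treated_change - control_change
    return {
        "control_change": control_change,
        "treated_change": treated_change,
        "did_estimate": did,
    }

print(did_sketch(1000, 1100, 900, 1150))
# control moved +100, treated moved +250 → estimated effect +150
```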
# List recent experiments
PYTHONPATH=. python3 -m src.cli history
PYTHONPATH=. python3 -m src.cli history --mode ab_test --limit 10
# Compare multiple experiments by ID
PYTHONPATH=. python3 -m src.cli compare 1 2 3
# Save a prior JSON result to history
PYTHONPATH=. python3 -m src.cli save /tmp/result.json --name "checkout-v3-test"
# Save a result first
PYTHONPATH=. python3 -m src.cli ab --control 100/5000 --variant 130/5000 > /tmp/result.json
# Human-readable audit
PYTHONPATH=. python3 -m src.cli audit /tmp/result.json --format text
# Audit with experiment maturity score (0–100)
PYTHONPATH=. python3 -m src.cli audit /tmp/result.json --maturity
# JSON audit with maturity
PYTHONPATH=. python3 -m src.cli audit /tmp/result.json --maturity --format json
Maturity labels: mature ≥90 | adequate ≥70 | immature ≥50 | inadequate <50
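The score-to-label mapping above is a plain threshold ladder; spelled out (function name is hypothetical, not from the tool's source):

```python
def maturity_label(score):
    """Map the 0-100 experiment maturity score to its audit label."""
    if score >= 90:
        return "mature"
    if score >= 70:
        return "adequate"
    if score >= 50:
        return "immature"
    return "inadequate"

print(maturity_label(72))
```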
All commands return structured JSON:
{
"version": "1.0",
"mode": "ab_test",
"recommendation": {
"decision": "ship|keep_running|reject|escalate",
"confidence": "high|medium|low",
"summary": "..."
},
"statistics": {...},
"traffic_stats": {...},
"warnings": [...],
"next_steps": [...],
"audit": {
"decision_path": [
{"step": "...", "passed": true, "details": {...}}
]
}
}
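An agent consuming this schema only needs a couple of fields to act. A minimal dispatch sketch; the JSON literal below is an illustrative hand-written example that follows the schema, not real tool output:

```python
import json

result = json.loads("""
{
  "version": "1.0",
  "mode": "ab_test",
  "recommendation": {"decision": "keep_running", "confidence": "medium",
                     "summary": "Trending positive but not yet significant."},
  "warnings": [],
  "next_steps": ["collect more traffic before re-checking"]
}
""")

decision = result["recommendation"]["decision"]
if decision == "ship":
    action = "deploy variant"
elif decision == "reject":
    action = "roll back"
elif decision == "keep_running":
    action = "wait and re-check"
else:  # "escalate"
    action = "hand off to a human reviewer"

print(action, "|", result["recommendation"]["summary"])
```

Branching on `recommendation.decision` (and surfacing `warnings` and `next_steps` to the operator) is the whole integration surface for most agent workflows.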
| Decision | Meaning | Trigger |
|----------|---------|---------|
| ship | Deploy variant | p < 0.05 + positive lift (frequentist), P(better) ≥ 0.95 (Bayesian) |
| keep_running | Continue experiment | Trending positive but inconclusive |
| reject | Do not deploy | p < 0.05 + negative lift, or P(better) ≤ 0.05 (Bayesian) |
| escalate | Human review needed | Inconclusive or critical warnings |
import os
import sys

# expanduser is needed: Python does not expand '~' in path strings by itself
sys.path.insert(0, os.path.expanduser('~/clawd/agent-causal-decision-tool'))
from src.ab_test import calculate_ab
from src.bayes import calculate_bayes_ab
from src.did import calculate_did
from src.planning import calculate_plan
# Frequentist A/B
result = calculate_ab({
"control_conversions": 100, "control_total": 5000,
"variant_conversions": 130, "variant_total": 5000
})
if result.recommendation.decision == "ship":
pass # Deploy
# Bayesian A/B
result = calculate_bayes_ab({
"control_conversions": 100, "control_total": 5000,
"variant_conversions": 130, "variant_total": 5000
})
if result["recommendation"]["decision"] == "ship":
pass # Deploy
# Planning
result = calculate_plan({
"baseline_conversion_rate": 0.02, "mde_pct": 10,
"daily_traffic": 5000, "confidence_level": 0.95, "power": 0.8,
"allocation": "equal", "allocation_ratio": None
})
git clone https://github.com/ZhuMorris/agent-causal-decision-tool.git
cd agent-causal-decision-tool
pip install -e .
pip install click scipy numpy pydantic pytest
# Run tests
pytest tests/ -v
# Run CLI
PYTHONPATH=. python3 -m src.cli --help
PYTHONPATH=. python3 -m src.cli plan --baseline 0.02 --mde 5 --traffic 5000
Copyright 2026 ZHU YUMING. Apache License 2.0.