# OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards
Website | Try Online | Documentation | Contributing | 中文
OpenJudge is an open-source evaluation framework for AI applications (e.g., AI agents or chatbots), designed to evaluate quality and drive continuous application optimization.
In practice, application excellence depends on a trustworthy evaluation workflow: Collect test data → Define graders → Run evaluation at scale → Analyze weaknesses → Iterate quickly.
OpenJudge provides ready-to-use graders and supports generating scenario-specific rubrics (as graders), making this workflow easy to set up, run, and integrate into your own development loop. It can also convert grading results into quality reward signals to help you pinpoint weaknesses and optimize your application.
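
As a rough illustration of that loop (collect test data, define graders, run evaluation, analyze results), the sketch below shows the idea in plain Python. Every name in it (`TestCase`, `relevance_grader`, `run_evaluation`) is a hypothetical placeholder, not the actual OpenJudge API; consult the documentation for real grader usage.

```python
# Minimal sketch of the evaluate-and-iterate workflow described above.
# NOTE: all names here are illustrative assumptions, NOT the OpenJudge API.
from dataclasses import dataclass


@dataclass
class TestCase:
    prompt: str    # collected test input
    response: str  # the application's output to be graded


def relevance_grader(case: TestCase) -> float:
    """Toy rubric: score 1.0 if the response mentions the prompt's topic word."""
    topic = case.prompt.split()[-1].strip("?.!").lower()
    return 1.0 if topic in case.response.lower() else 0.0


def run_evaluation(cases: list[TestCase], graders: dict) -> dict[str, float]:
    """Run every grader over the test set and report the average score per grader."""
    return {
        name: sum(grader(c) for c in cases) / len(cases)
        for name, grader in graders.items()
    }


cases = [
    TestCase("What's the weather in Paris?", "It is sunny in Paris today."),
    TestCase("What's the weather in Paris?", "I like trains."),
]
scores = run_evaluation(cases, {"relevance": relevance_grader})
print(scores)  # e.g. {'relevance': 0.5} -> low scores flag weaknesses to iterate on
```

In OpenJudge itself, the built-in and rubric-generated graders play the role of `relevance_grader` above, and the aggregated scores can additionally be used as reward signals for optimization.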
Try it now! Visit openjudge.me/app to use graders online: no installation required. Test built-in graders, build custom rubrics, and explore evaluation results directly in your browser.
- 2026-02-12 - Reference Hallucination Arena - A benchmark for evaluating academic reference hallucination in LLMs. Documentation | Leaderboard
- 2026-01-27 - Paper Review - Automatically review academic papers using LLM-powered evaluation. Documentation
- 2026-01-27 - OpenJudge UI - A Streamlit-based visual interface for grader testing and Auto Arena. Try Online...