Evaluates how well AI agents explain their reasoning, cite sources accurately, acknowledge uncertainty, justify decisions, and diagnose errors. 50 tests across 5 categories with multi-dimensional scoring per test.
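As a rough illustration of what multi-dimensional scoring per test could look like, here is a minimal sketch. The description above does not specify the scoring dimensions, the score range, or how scores are aggregated, so the `ScoredResult` type, the `category_averages` helper, the dimension names, and the equal weighting are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScoredResult:
    """One test outcome (hypothetical structure, not the benchmark's actual schema)."""
    test_id: str
    category: str             # e.g. one of the 5 categories
    scores: dict[str, float]  # assumed: dimension name -> score in [0, 1]

    def overall(self) -> float:
        # Assumption: every dimension is weighted equally.
        # Raises statistics.StatisticsError if no dimensions were scored.
        return mean(list(self.scores.values()))


def category_averages(results: list[ScoredResult]) -> dict[str, float]:
    """Average the per-test overall scores within each category."""
    by_category: dict[str, list[float]] = defaultdict(list)
    for result in results:
        by_category[result.category].append(result.overall())
    return {category: mean(values) for category, values in by_category.items()}
```

Keeping the per-dimension scores on each result, instead of storing only a single number, makes it possible to report both a category-level summary and a per-dimension breakdown from the same data.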