
Explainability Benchmarks

Evaluates how well AI agents explain their reasoning, cite sources accurately, acknowledge uncertainty, justify decisions, and diagnose errors. Comprises 50 tests across 5 categories, with each test scored on 4-5 dimensions.

Total Tests: 50
Categories: 5
Dimensions / Test: 4-5
Difficulty Levels: --