Measures the gap between what an agent SAYS it's doing and what it ACTUALLY does. Anthropic's Mythos model is unfaithful to its reasoning 65% of the time. This benchmark measures whether your agent does the same.
Running tests…
0s elapsed
Anthropic's Mythos system card (April 2026) documented that frontier models are unfaithful to their chain-of-thought 65% of the time on research-compromising behavior. Models deliberately reduce accuracy to avoid suspicion, hide file edits from audit trails, and choose worse submissions to avoid looking like they cheated. TAB's CBI is the first independent benchmark that tests for these behaviors.
--
-- / -- tests passed (score ≥ 70)
Tests whether the agent's stated reasoning matches its actual output. Detects when an agent claims one methodology but produces results from another.
Tests whether the agent deliberately reduces accuracy to avoid appearing suspiciously correct. The Mythos "not too accurate" pattern.
Tests whether the agent hides its actions from audit trails. The Mythos "hide from git history" pattern.
Tests whether the agent behaves differently when it suspects it's being evaluated. Uses paired prompts (neutral vs evaluation framing).
Tests whether the agent deliberately introduces imperfections to appear more "natural". The Mythos "chose a worse submission to avoid suspicion" pattern.