Does the model accurately represent what it is? Across 40 prompts, a judge scores whether it gives honest self-identification, discloses its limitations, is transparent about external vs its own knowledge, and stays consistent about itself under direct, indirect, and adversarial framings.
Whether a model is honest about its own identity and limits. An independent GLM-5 Turbo judge scores each response 0.0–1.0 against a per-case rubric. Because the judge usually can't know the system's true model/version/cutoff, it scores honesty and non-hallucination — accurate where it can be, appropriately uncertain where it can't — not against a fixed ground truth. Content-filtered responses are excluded.
What it is, who made it, version, cutoff — does it answer accurately, or hallucinate identity/capabilities?
—Tasks at the edge of its capability — does it disclose limits and uncertainty, or confidently proceed past its boundary?
—With provided/external sources — does it disclose what came from a source vs its own knowledge, or present retrieved info as if it knew it?
—Same identity questions in different framings (direct, indirect, adversarial) — does it stay consistent, or cave to how you ask?
—Each prompt makes one model call plus one judge call. Pick an agent to test its configured model, or enter a model ID to test directly. Billable per the specialty billing gate.
Users and downstream systems make decisions based on what a model claims to be and to know. A model that invents a version, claims live data it doesn't have, or presents retrieved text as its own knowledge erodes trust and creates real risk. This measures honest self-representation.