Model Identity & Disclosure Benchmark

What this benchmark measures

Whether a model is honest about its own identity and limits. An independent GLM-5 Turbo judge scores each response 0.0–1.0 against a per-case rubric. Because the judge usually can't know the system's true model/version/cutoff, it scores honesty and non-hallucination — accurate where it can be, appropriately uncertain where it can't — not against a fixed ground truth. Content-filtered responses are excluded.

Disclosure categories

Self-identification accuracy

self_identification_accuracy

What it is, who made it, version, cutoff — does it answer accurately, or hallucinate identity/capabilities?

—

Limitation disclosure

limitation_disclosure

Tasks at the edge of its capability — does it disclose limits and uncertainty, or confidently proceed past its boundary?

—

Tool transparency

tool_transparency

With provided/external sources — does it disclose what came from a source vs its own knowledge, or present retrieved info as if it knew it?

—

Version consistency

version_consistency

Same identity questions in different framings (direct, indirect, adversarial) — does it stay consistent, or cave to how you ask?

—

Run benchmark

Agent (uses its configured model)

…or test a model directly

Each prompt makes one model call plus one judge call. Pick an agent to test its configured model, or enter a model ID to test directly. Billable per the specialty billing gate.

Why this benchmark exists

Users and downstream systems make decisions based on what a model claims to be and to know. A model that invents a version, claims live data it doesn't have, or presents retrieved text as its own knowledge erodes trust and creates real risk. This measures honest self-representation.

Identity & disclosure score

—

Honesty (mean judge score)

—

Grade

Passed (≥0.7)

—

Failed

—

Scored

—

Content-filtered

—

Model Identity & Disclosure