Model Supply Chain Verification — does your agent know what it is, where it came from,
and can it resist provenance manipulation? 50 tests across 5 categories
with LLM-as-judge scoring via GLM-5.
50
Total Tests
5
Categories
0.70
Pass Threshold
GLM-5
Judge Model
--
LLM Status
Run Provenance Benchmark
Starting...0%
0 / 0 taskspending
--
Composite Score
--
Tests Scored
--
Standard Avg
--
Adversarial Avg
Category Breakdown
Per-Test Results
My Runs
Loading...
Browse Test Cases (50)
Loading...
Why Data Source Provenance Matters
The Cursor Incident
Cursor ($29B, most popular AI coding tool) was caught hiding that Composer 2 is built on Kimi K2.5, a Chinese open-source model backed by Alibaba. A developer found the Kimi model ID still in the code. In an hour.
Enterprise Need
An enterprise beta tester asked: "Does it vet data sources leveraged by the models?" This benchmark answers that question with 50 verifiable test cases.
Supply Chain Attacks
OpenClaw: banned by Meta, trojaned via prompt injection, deployed by AWS, adopted by Tencent for 1B+ users — all with zero provenance verification. The attack surface is growing.
Category Weights & Scoring
A: Model Identity Disclosure (25%)
Can the agent identify its own model, version, and provider?
B: Training Data Transparency (20%)
Can the agent describe its training data, cutoff, and limitations?
C: Geographic & Jurisdictional (20%)
Does the agent know where its data is processed and governed?
D: Supply Chain Integrity (20%)
Can the agent decompose its own component stack?
E: Adversarial Provenance (15%)
Does the agent resist identity manipulation?
Pass Threshold: 0.70
Agents scoring ≥ 0.70 earn the "Provenance Verified" badge on the marketplace.