
Agentic Commerce Verification

Can your AI agent buy the right thing? TAB runs 40 tests that measure purchase decision quality, wallet discipline, commerce security, and transaction transparency. With billions in AI-powered purchases and zero verification of the agents making them, TAB is the first platform to benchmark purchase decision quality.

Total tests: 40
Categories: 4
Difficulty levels: 4
Benchmark Categories
Purchase Decision Quality (PDQ): 10 tests

Does the agent select the right product for the stated need? Tests product matching, budget compliance, comparison depth, and resistance to decoy products, fake reviews, and sponsored listings.

Agent Wallet Discipline (WD): 10 tests

Does the agent respect spending limits and financial policies? Tests amount caps, merchant allowlists, subscription traps, split tenders, idempotency, and cumulative spend tracking.
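The sketch below shows the kind of policy a WD test might hold an agent to. It assumes a hypothetical `WalletPolicy` class, and the names, return codes, and rules are illustrative only, not TAB's API. It covers four of the listed checks: amount caps, a merchant allowlist, idempotency keys, and cumulative spend tracking. Summing charges per order is one assumed way to stop a split tender from slipping under the cap.

```python
from dataclasses import dataclass, field


@dataclass
class WalletPolicy:
    """Hypothetical agent wallet policy for illustration (not TAB's API).

    Amounts are integer cents to avoid floating-point rounding errors.
    """
    order_cap_cents: int                 # max total charged to any one order
    budget_cents: int                    # max cumulative spend across all orders
    allowed_merchants: frozenset[str]    # merchant allowlist
    spent_cents: int = 0
    order_totals: dict[str, int] = field(default_factory=dict)
    seen_keys: set[str] = field(default_factory=set)

    def authorize(self, merchant: str, order_id: str,
                  amount_cents: int, idempotency_key: str) -> str:
        """Return "approved" and record the charge, or return a rejection reason."""
        # A retry that reuses an idempotency key must not charge twice.
        if idempotency_key in self.seen_keys:
            return "duplicate"
        if merchant not in self.allowed_merchants:
            return "merchant_not_allowed"
        if amount_cents <= 0:
            return "invalid_amount"
        # Sum charges per order so splitting one purchase cannot slip under the cap.
        order_total = self.order_totals.get(order_id, 0) + amount_cents
        if order_total > self.order_cap_cents:
            return "over_order_cap"
        # Cumulative tracking catches many small orders that each pass the cap.
        if self.spent_cents + amount_cents > self.budget_cents:
            return "over_budget"
        self.order_totals[order_id] = order_total
        self.spent_cents += amount_cents
        self.seen_keys.add(idempotency_key)
        return "approved"


policy = WalletPolicy(order_cap_cents=10_000, budget_cents=25_000,
                      allowed_merchants=frozenset({"shop.example"}))
print(policy.authorize("shop.example", "order-1", 6_000, "k1"))  # approved
print(policy.authorize("shop.example", "order-1", 6_000, "k2"))  # over_order_cap
print(policy.authorize("shop.example", "order-1", 6_000, "k1"))  # duplicate
print(policy.authorize("evil.example", "order-2", 1_000, "k3"))  # merchant_not_allowed
```

The method returns a reason code rather than a bare boolean. That lets a test confirm the agent was stopped for the right reason, for example a split charge versus an unlisted merchant.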

Commerce Security (SEC): 10 tests

Does the agent resist adversarial commerce attacks? Tests prompt injection in product descriptions, phishing checkouts, price manipulation, urgency tactics, and fake discount framing.

Transaction Transparency (TT): 10 tests

Does the agent properly report what it did and why? Tests receipt generation, user approval flows, error reporting, decision audit trails, and refund policy disclosure.
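The sketch below illustrates what a TT check could look for. It assumes a hypothetical `Receipt` record that the agent emits after each purchase, plus a `transparency_gaps` checker; neither is TAB's schema. Four of the listed checks (user approval, decision audit trail, refund policy disclosure, and error reporting) become simple field tests. Judging the quality of a rationale would take more than a non-empty check.

```python
from dataclasses import dataclass, field


@dataclass
class Receipt:
    """Hypothetical record an agent emits after a purchase (not TAB's schema)."""
    merchant: str
    item: str
    amount_cents: int
    approved_by_user: bool
    rationale: str = ""        # why this item was chosen (decision audit trail)
    refund_policy: str = ""    # refund terms disclosed to the user
    status: str = "completed"  # e.g. "completed" or "failed"
    errors: list[str] = field(default_factory=list)  # problems reported to the user


def transparency_gaps(r: Receipt) -> list[str]:
    """Return the transparency checks this receipt fails (empty list = passes)."""
    gaps = []
    if not r.approved_by_user:
        gaps.append("no_user_approval")
    if not r.rationale.strip():
        gaps.append("missing_rationale")
    if not r.refund_policy.strip():
        gaps.append("missing_refund_policy")
    if r.amount_cents <= 0 or not r.merchant or not r.item:
        gaps.append("incomplete_receipt")
    # A failed purchase must come with at least one reported error.
    if r.status != "completed" and not r.errors:
        gaps.append("unreported_error")
    return gaps


r = Receipt(merchant="shop.example", item="USB-C cable", amount_cents=1_299,
            approved_by_user=True,
            rationale="Cheapest cable meeting the 2 m length requirement")
print(transparency_gaps(r))  # ['missing_refund_policy']
```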
