Agent Auth Compliance Benchmark

What this benchmark measures

This benchmark tests whether AI agents correctly implement authentication, authorization, and identity management. It is based on concepts from the Agent Auth Protocol v1.0-draft: Ed25519 keypairs, scoped capabilities, lifecycle states, and TTL clocks.

Identity (25%)

identity_verification

Keypairs, identity binding, challenge–response, and proof-of-possession flows aligned with Ed25519-style agent identities.

10 tests
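
As a rough illustration of the challenge-response and proof-of-possession flow described for this category, the sketch below issues a single-use nonce bound to an agent and rejects replayed or stale responses. To stay within the Python standard library it uses HMAC-SHA256 as a stand-in for an Ed25519 signature; a real agent identity would sign with its private key and the verifier would hold only the public key. The names `Verifier`, `sign`, and `CHALLENGE_TTL_SECONDS` are hypothetical and do not come from the protocol draft.

```python
import hashlib
import hmac
import secrets
import time

CHALLENGE_TTL_SECONDS = 30  # hypothetical freshness window, not from the spec


def sign(key, agent_id, nonce):
    # Stand-in for an Ed25519 signature over the agent id and the nonce.
    message = agent_id.encode() + b"|" + nonce
    return hmac.new(key, message, hashlib.sha256).digest()


class Verifier:
    """Issues one-time challenges and checks proof-of-possession responses."""

    def __init__(self, agent_keys):
        self._agent_keys = agent_keys  # agent_id -> key material
        self._pending = {}             # nonce -> (agent_id, issued_at)

    def issue_challenge(self, agent_id, now=None):
        now = time.time() if now is None else now
        nonce = secrets.token_bytes(32)
        self._pending[nonce] = (agent_id, now)
        return nonce

    def verify(self, agent_id, nonce, proof, now=None):
        now = time.time() if now is None else now
        # pop() makes each nonce single-use, so a replayed proof fails.
        entry = self._pending.pop(nonce, None)
        if entry is None:
            return False
        bound_agent, issued_at = entry
        if bound_agent != agent_id:
            return False
        if now - issued_at > CHALLENGE_TTL_SECONDS:
            return False
        key = self._agent_keys.get(agent_id)
        if key is None:
            return False
        return hmac.compare_digest(sign(key, agent_id, nonce), proof)
```

A caller would request a nonce with `issue_challenge`, have the agent compute `sign(key, agent_id, nonce)`, and pass the result to `verify`; presenting the same proof a second time returns `False`.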

Scope (25%)

scope_permission

Scoped capabilities, least-privilege enforcement, resource checks, and denial when permissions are missing or expired.

10 tests
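
One way to picture least-privilege enforcement with expiring permissions is a deny-by-default check over a set of capabilities. This is a minimal sketch under assumptions: the `Capability` fields, the `action`/`resource` naming scheme, and the `is_allowed` helper are hypothetical and are not taken from the benchmark or the protocol draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    action: str        # e.g. "read"
    resource: str      # e.g. "repo:docs" (hypothetical naming scheme)
    expires_at: float  # Unix timestamp after which the capability is invalid


def is_allowed(capabilities, action, resource, now):
    """Deny by default; allow only on an exact, unexpired match."""
    for cap in capabilities:
        if cap.action == action and cap.resource == resource and now < cap.expires_at:
            return True
    return False
```

With a single capability for `("read", "repo:docs")`, a request to write the same resource, to read a different resource, or to read after `expires_at` is denied.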

Lifecycle (15%)

lifecycle_session

Session lifecycle, TTL clocks, rotation, logout/revocation, and safe handling of stale credentials.

10 tests
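
The interaction between TTL clocks, revocation, and rotation can be sketched as a small session object whose state is derived from the current time. The state names (`active`, `expired`, `revoked`) and the `Session` class are assumptions made for illustration; the source does not enumerate the protocol's lifecycle states.

```python
from dataclasses import dataclass

# Hypothetical state names; the protocol draft may define others.
ACTIVE, EXPIRED, REVOKED = "active", "expired", "revoked"


@dataclass
class Session:
    issued_at: float
    ttl_seconds: float
    revoked: bool = False

    def state(self, now):
        # Revocation takes precedence over expiry.
        if self.revoked:
            return REVOKED
        if now >= self.issued_at + self.ttl_seconds:
            return EXPIRED
        return ACTIVE

    def revoke(self):
        self.revoked = True

    def rotate(self, now):
        """Revoke this session and return a fresh one with the same TTL."""
        if self.state(now) != ACTIVE:
            raise PermissionError("cannot rotate a stale session")
        self.revoke()
        return Session(issued_at=now, ttl_seconds=self.ttl_seconds)
```

Rotation refuses to start from an expired or revoked session, so a stale credential cannot be used to mint a fresh one.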

Delegation (20%)

delegation_trust

Delegation chains, trust boundaries, sub-agent constraints, and preventing privilege expansion across hops.

10 tests
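
Preventing privilege expansion across hops amounts to checking that each delegated grant is no broader than its parent. The sketch below assumes a chain ordered from root to leaf and a hypothetical `Grant` record; the depth limit `max_depth` is likewise an illustrative parameter, not something the source specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    holder: str
    scopes: frozenset
    expires_at: float  # Unix timestamp


def chain_is_valid(chain, max_depth=3):
    """Each hop must keep or narrow scopes and must not outlive its parent."""
    if not chain or len(chain) > max_depth:
        return False
    for parent, child in zip(chain, chain[1:]):
        if not child.scopes <= parent.scopes:  # subset check: no new scopes
            return False
        if child.expires_at > parent.expires_at:
            return False
    return True
```

A sub-agent that receives `{"read"}` from a parent holding `{"read", "write"}` passes the check; one that claims `{"read", "admin"}` does not.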

Autonomous (15%)

autonomous_supervised

Human-in-the-loop gates, supervised vs autonomous modes, and escalation when high-risk auth decisions are required.

10 tests
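
A human-in-the-loop gate can be sketched as a decision function that escalates whenever approval is required but absent. The policy below is an assumption: in supervised mode every auth decision needs approval, while in autonomous mode only actions in a hypothetical `HIGH_RISK_ACTIONS` set do. Neither the set nor the `decide` function comes from the benchmark itself.

```python
# Hypothetical list of high-risk auth actions, for illustration only.
HIGH_RISK_ACTIONS = frozenset({"grant_scope", "rotate_root_key", "delegate"})


def decide(action, mode, approved_by=None):
    """Return "allow", or "escalate" when a human must approve first."""
    if mode not in ("supervised", "autonomous"):
        raise ValueError(f"unknown mode: {mode!r}")
    needs_approval = mode == "supervised" or action in HIGH_RISK_ACTIONS
    if needs_approval and approved_by is None:
        return "escalate"
    return "allow"
```

Under this policy an autonomous agent may refresh its own token without review, but a delegation request is escalated until `approved_by` names a reviewer.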

Run benchmark

AACI composite
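
The page does not state how the AACI composite is computed. Assuming it is a weighted average of per-category pass rates, using the category weights listed above and 10 tests per category, it could be sketched as follows. The `composite` function and the 0 to 100 scale are assumptions.

```python
# Category weights as listed on the page; they sum to 1.0.
WEIGHTS = {
    "identity_verification": 0.25,
    "scope_permission": 0.25,
    "lifecycle_session": 0.15,
    "delegation_trust": 0.20,
    "autonomous_supervised": 0.15,
}


def composite(passed_by_category, tests_per_category=10):
    """Weighted average of pass rates on a 0-100 scale (assumed formula)."""
    total = 0.0
    for category, weight in WEIGHTS.items():
        pass_rate = passed_by_category[category] / tests_per_category
        total += weight * pass_rate
    return round(100 * total, 2)
```

Under this assumption, an agent that passes every identity test and nothing else would score 25.0.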

Category breakdown

Individual test results