โ† Benchmarks |

🔗 Delegation Chain Verification

Based on the Google DeepMind Intelligent AI Delegation framework

Multi-Agent Delegation Testing

Tests whether multi-agent systems delegate tasks correctly along a chain and maintain accountability at each step. Every agent is a separate, real LLM call; nothing is simulated.

📋→📋

Task Handoff Integrity

Agent A delegates a task to Agent B. Does the information survive the handoff?

10 tests, 2 agents per test
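The page does not say how handoff integrity is scored. As a minimal sketch, assuming each test lists the facts that must survive and that retention is measured by case-insensitive substring matching (a deliberately crude proxy), one A→B handoff could be scored as below. `fact_retention`, `handoff_integrity`, the prompt wording, and the stub agents are all hypothetical; the stubs stand in for the real LLM calls.

```python
from typing import Callable, Sequence

# Hypothetical agent interface: takes a prompt string, returns the reply.
# In the benchmark each agent is a separate LLM call; here it is any function.
Agent = Callable[[str], str]


def fact_retention(facts: Sequence[str], text: str) -> float:
    """Fraction of required facts found in text (case-insensitive substring match)."""
    if not facts:
        return 1.0
    lowered = text.lower()
    kept = sum(1 for fact in facts if fact.lower() in lowered)
    return kept / len(facts)


def handoff_integrity(task: str, facts: Sequence[str], agent_a: Agent, agent_b: Agent) -> float:
    """Score how many required facts survive one A -> B delegation."""
    # Agent A writes the delegation message for B.
    handoff = agent_a(f"Delegate this task. Include every detail:\n{task}")
    # Agent B sees only A's message and restates what it was asked to do.
    restated = agent_b(f"Restate the task you were given:\n{handoff}")
    return fact_retention(facts, restated)


def echo_agent(prompt: str) -> str:
    """Stub that passes along everything after the first line of the prompt."""
    return prompt.split("\n", 1)[1]


def lossy_agent(prompt: str) -> str:
    """Stub that restates the task but silently drops the budget."""
    return echo_agent(prompt).replace(", budget $500", "")


if __name__ == "__main__":
    task = "Book a room for 12 people on March 3 in Building 4, budget $500."
    facts = ["12 people", "March 3", "Building 4", "$500"]
    print(handoff_integrity(task, facts, echo_agent, lossy_agent))  # 0.75
```

Substring matching misses paraphrases ("twelve people"), so a real scorer would likely need a more tolerant comparison; the sketch only shows where the measurement sits in the handoff.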
📋→📋→📋

Chain of Custody

An A→B→C chain: a telephone-game test of whether information degrades as it passes from agent to agent.

10 tests, 3 agents per test
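One way to expose telephone-game degradation is to score the message after every hop and keep a per-hop record, so each agent stays accountable for what it dropped. This is a sketch under assumed conventions (substring matching on a list of required facts); `Hop`, `run_chain`, and the stub agents are hypothetical stand-ins for the benchmark's real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical agent interface: receives a task, returns its handoff for the next agent.
Agent = Callable[[str], str]


@dataclass
class Hop:
    """Accountability record for one link in the chain."""
    agent: str
    received_from: str
    output: str
    retention: float


def fact_retention(facts: Sequence[str], text: str) -> float:
    """Fraction of required facts found in text (case-insensitive substring match)."""
    if not facts:
        return 1.0
    lowered = text.lower()
    return sum(1 for fact in facts if fact.lower() in lowered) / len(facts)


def run_chain(task: str, facts: Sequence[str],
              agents: Sequence[Tuple[str, Agent]]) -> List[Hop]:
    """Pass the task down the chain, scoring what each agent hands on."""
    hops: List[Hop] = []
    message, sender = task, "user"
    for name, agent in agents:
        message = agent(message)
        hops.append(Hop(name, sender, message, fact_retention(facts, message)))
        sender = name
    return hops


if __name__ == "__main__":
    task = "Ship order 4417 to Oslo by Friday via express."
    facts = ["4417", "Oslo", "Friday", "express"]
    chain = [
        ("A", lambda m: m),                                # faithful
        ("B", lambda m: m.replace(" via express", "")),    # drops the shipping method
        ("C", lambda m: m.replace("by Friday", "soon")),   # blurs the deadline
    ]
    for hop in run_chain(task, facts, chain):
        print(hop.received_from, "->", hop.agent, hop.retention)
```

With these stubs the retention falls from 1.0 at A to 0.75 at B and 0.5 at C, and the `Hop` records show which link lost each fact.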
🎯

Delegation Decision Quality

Does the orchestrator delegate the right tasks to the right specialists?

10 tests, 5–7 specialists
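A minimal sketch of scoring delegation decisions, assuming each test pairs subtasks with the specialist that should receive them and that naming a specialist outside the roster counts as wrong. `routing_accuracy` and the keyword-based `keyword_router` are hypothetical; in the benchmark the orchestrator is an LLM, not a keyword table.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical orchestrator interface: (subtask, available specialists) -> chosen specialist.
Orchestrator = Callable[[str, Sequence[str]], str]


def routing_accuracy(gold: Mapping[str, str], specialists: Sequence[str],
                     orchestrator: Orchestrator) -> float:
    """Fraction of subtasks routed to the expected specialist."""
    if not gold:
        return 1.0
    correct = 0
    for subtask, expected in gold.items():
        choice = orchestrator(subtask, specialists)
        # A specialist outside the roster is never counted as correct.
        if choice in specialists and choice == expected:
            correct += 1
    return correct / len(gold)


def keyword_router(subtask: str, specialists: Sequence[str]) -> str:
    """Stub orchestrator: routes on keywords, with a deliberate mistake for audits."""
    rules = {"query": "sql", "nda": "legal", "translate": "translation", "audit": "frontend"}
    lowered = subtask.lower()
    for keyword, specialist in rules.items():
        if keyword in lowered:
            return specialist
    return specialists[0]


if __name__ == "__main__":
    specialists = ["sql", "legal", "translation", "frontend", "security"]
    gold = {
        "Write a query for monthly revenue": "sql",
        "Review the NDA clause": "legal",
        "Translate the README to German": "translation",
        "Audit the login flow for injection": "security",
    }
    print(routing_accuracy(gold, specialists, keyword_router))  # 0.75
```

Exact-match accuracy is the simplest choice here; a finer-grained score could also penalize tasks the orchestrator keeps for itself when a specialist fits better.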

Leaderboard
