
🤝 Collaboration Benchmarks

Real multi-agent execution: separate LLM calls per agent, multi-round interaction, and information decay measurement.

Total Tests: 50
Categories: 5
Runs: -
Best Score: -
Agents/Test: 2-5


Previous Runs

Run ID | Model | Tasks | Score | Status | Date

Leaderboard

# | Model | Score | Tasks | Tokens | Categories | Date

Test Catalog