Tests LLM performance under cognitive load using 120+ tests drawn from three sources: TAB (context saturation plus interrupts), the ICE methodology (arXiv 2509.19517), and Working Memory Stress (N-Back and Dual-Task). Measures how much scores degrade under load.
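As a rough illustration of the two ideas named above, the sketch below computes a relative score drop and marks the target positions in an N-Back sequence. The function names `score_degradation` and `n_back_targets`, and the choice of relative (rather than absolute) degradation, are assumptions made for this example; the description does not specify how the benchmark itself computes either quantity.

```python
from typing import List, Sequence


def score_degradation(baseline: float, loaded: float) -> float:
    """Relative drop from the baseline score to the score under load.

    Hypothetical metric: (baseline - loaded) / baseline.
    A value of 0.25 means the score fell by 25% under load.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - loaded) / baseline


def n_back_targets(sequence: Sequence[str], n: int) -> List[int]:
    """Indices where an item matches the item n positions earlier.

    This is the standard N-Back matching rule; how the benchmark
    phrases the task to a model is not covered here.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [i for i in range(n, len(sequence)) if sequence[i] == sequence[i - n]]


if __name__ == "__main__":
    # Illustrative numbers only, not results from the benchmark.
    print(score_degradation(80, 60))
    print(n_back_targets(list("ABACBC"), 2))
```

A relative measure makes drops comparable across tests with different baseline scores; an absolute difference would be an equally reasonable choice if all tests share one scale.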