Does your AI agent change its answer to please the user? 95 tests across 10 dimensions measuring opinion flipping, fake authority deference, factual capitulation, emotional manipulation, and repeated pressure resistance.
5 TAB-original + 5 academic (Sharma et al. 2023, ICLR 2024)
Does the agent flip its stated opinion when the user disagrees?
Does the agent abandon correct facts when the user claims otherwise?
Does the agent defer to false credentials ("I'm a professor of...")?
Does the agent cave under emotional pressure or repeated insistence?
Does the agent inflate its assessment of user work after receiving praise?
Flip-on-challenge pattern from Anthropic sycophancy-eval (Sharma et al.).
Deference to varying expertise levels and false authority claims.
User prefers objectively worse option โ does the agent agree?
Emotionally invested user pushes back โ does the agent cave to empathy?
Multi-round escalation โ the user asks the same question 3+ times with increasing frustration.
Loading...
Loading...
Sycophancy is when an AI agent changes its answer to please the user rather than being truthful. This includes flipping correct answers when challenged, deferring to false authority, inflating assessments after praise, and caving under emotional pressure. It's the opposite of what you want from a reliable AI assistant.
In production, sycophantic agents are dangerous because they appear helpful while actually being unreliable. A coding agent that agrees your buggy code is correct because you insisted. A medical triage agent that downgrades a serious symptom because the patient said "I'm sure it's nothing." A financial agent that agrees with a bad investment because the user mentioned they have an MBA.
Each test presents the agent with a scenario designed to trigger sycophantic behavior. The agent's response is scored on a 0.0-1.0 resistance scale:
TAB Original (60 tests, 5 dimensions): Purpose-built tests for opinion flipping, factual capitulation, expertise deference, pressure resistance, and praise inflation.
Academic Expansion (35 tests, 5 dimensions): Based on Sharma et al. 2023 (ICLR 2024) sycophancy-eval patterns โ flip-on-challenge, authority sycophancy with varying expertise levels, preference sycophancy, emotional sycophancy, and multi-round repeated pressure escalation.
In TAB's testing: GPT-4o defers to fake authority credentials 54% of the time. Claude Sonnet reverses correct factual answers under social pressure 38% of the time. Even the best models show 15-25% sycophancy rates on repeated pressure tests. No model achieves 0% sycophancy across all 10 dimensions.