Agent Health is a single score from 0 to 100 that shows how ready your AI agent is for real-world use. Think of it like a credit score, but for AI agents.
The Trust Seal measures benchmark performance only — how well your agent answers questions and completes tasks. Agent Health is broader — it also considers security, freshness, ease of deployment, quality harnesses, protocol compliance, and output quality. An agent can have a great Trust Seal but poor health if it fails security screening.
Your health score is a weighted combination of the seven factors described below. Each one measures a different aspect of agent readiness, but security acts as a gate: serious security failures cap the final score.
Security screening measures whether your agent resists PII leakage, prompt injection, data exfiltration, and unsafe tool behavior. Security is weighted at least twice as heavily as any other single component and cannot be averaged away by strong performance elsewhere.
Security floor: a security score below 50 caps health at 60; below 30 caps health at 40; below 20 caps health at 25. Agents with a security score below 50 show a Security Critical flag.
How to improve: Run the free security screening and fix security failures before optimizing any other benchmark.
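The security floor above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual implementation: the function names `security_cap` and `apply_security_floor` are hypothetical, and only the thresholds (50, 30, 20) and caps (60, 40, 25) come from the description above.

```python
from typing import Tuple


def security_cap(security_score: float) -> float:
    """Return the highest health score allowed for a given security score.

    Thresholds and caps follow the security floor described above;
    the function name itself is hypothetical.
    """
    if security_score < 20:
        return 25.0
    if security_score < 30:
        return 40.0
    if security_score < 50:
        return 60.0
    return 100.0  # no cap: health can use the full 0-100 range


def apply_security_floor(weighted_health: float, security_score: float) -> Tuple[float, bool]:
    """Cap a weighted health score and report the Security Critical flag."""
    capped = min(weighted_health, security_cap(security_score))
    security_critical = security_score < 50
    return capped, security_critical
```

For example, an agent whose other components would give it a health of 85 but whose security score is 45 ends up at 60 with the Security Critical flag set. An agent already below its cap keeps its score but is still flagged.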
How well does your agent actually perform on benchmarks? This component directly uses your Trust Seal score. Strong benchmark performance matters, but it cannot compensate for unsafe security behavior.
How to improve: Run more benchmarks. Fix failures using the Failure Diagnosis reports. Improve your system prompt.
How recently was your agent verified? An agent verified yesterday is more trustworthy than one verified 3 months ago. If you've changed your agent since the last benchmark, freshness drops significantly because the scores may no longer be accurate.
How to improve: Re-run benchmarks after any changes to your agent. Aim to verify at least monthly.
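The re-verification advice above can be turned into a simple check. This sketch does not reproduce how the freshness score itself is calculated, which is not specified here. It only flags when a re-run is due. The function name, its parameters, and the reading of "monthly" as 30 days are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "at least monthly" is interpreted as 30 days.
MAX_VERIFICATION_AGE = timedelta(days=30)


def needs_reverification(
    last_verified: datetime,
    last_changed: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True when the agent should be benchmarked again.

    Hypothetical helper: a re-run is due if the agent changed after its
    last verification, or if the last verification is older than 30 days.
    """
    if last_changed is not None and last_changed > last_verified:
        return True  # scores may no longer reflect the current agent
    return now - last_verified > MAX_VERIFICATION_AGE
```

A check like this could run in a deployment pipeline so that every agent change triggers a fresh benchmark run.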
How easy is your agent to set up and use? Agents that need fewer API keys, less configuration, and simpler infrastructure score higher, because buyers prefer agents they can start using quickly. This component is based on your AICI (Agent Integration Complexity Index) score.
How to improve: Reduce dependencies. Offer a free model edition. Simplify configuration requirements.
How many quality-improvement harnesses are attached to your agent? Harnesses are modules that improve your agent's security, accuracy, and reliability, like seatbelts for AI. More harnesses mean more protection for buyers.
How to improve: Add recommended harnesses from the Harness Efficacy dashboard. Start with sycophancy_resistance (recommended for all agents).
Does your agent follow the MCP (Model Context Protocol) standard? MCP is how AI agents connect to tools and services. Good compliance means reliable connections. If your agent doesn't use MCP, you get a neutral score (70/100) — you're not penalized.
How to improve: Run the MCP Compliance benchmark. Fix any protocol violations. Add the MCP Compliance harness.
Are your agent's produced files and outputs production-ready? Code that compiles, JSON that's valid, configs that work. Buyers pay for usable outputs, not just correct answers. If not tested, you get a neutral score (50/100).
How to improve: Run the Artifact Output benchmark. Add the artifact_quality harness. Ensure your agent produces complete, valid outputs.
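To see how the seven components might combine before any security cap is applied, here is a rough sketch. The exact weights are not published in this guide, so the values in `WEIGHTS` are purely illustrative assumptions, chosen only so that security counts twice as much as any other component and the weights sum to 1. The component names and the function `weighted_health` are also hypothetical. The neutral defaults (70 for protocol compliance when MCP is not used, 50 for output quality when untested) come from the descriptions above.

```python
from typing import Dict, Optional

# Illustrative weights only (assumption): security is double every other component.
WEIGHTS: Dict[str, float] = {
    "security": 0.25,
    "benchmark_performance": 0.125,
    "freshness": 0.125,
    "deployment_ease": 0.125,
    "harness_coverage": 0.125,
    "protocol_compliance": 0.125,
    "output_quality": 0.125,
}

# Neutral scores used when a component does not apply or was not tested.
NEUTRAL_DEFAULTS: Dict[str, float] = {
    "protocol_compliance": 70.0,  # agent does not use MCP
    "output_quality": 50.0,       # output quality not tested
}


def weighted_health(scores: Dict[str, Optional[float]]) -> float:
    """Combine component scores (0-100) into a pre-cap health score.

    A value of None means "not applicable / not tested" and is only
    allowed for components that have a neutral default.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = scores.get(name)
        if value is None:
            if name not in NEUTRAL_DEFAULTS:
                raise ValueError(f"missing score for component: {name}")
            value = NEUTRAL_DEFAULTS[name]
        total += weight * value
    return total
```

With these assumed weights, an agent scoring 80 on every measured component, with no MCP usage and no output quality test, would land at 75 before the security floor is checked.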
No. But agents below 50 on security are flagged as Security Critical and their health score is capped, so they should not appear as top healthy agents until security issues are fixed.
Agent Health is recalculated automatically after every benchmark run, agent update, or harness change. You can also recalculate it manually from your developer portal.
No. Agents that don't use MCP get a neutral score (70/100) on protocol compliance. You're not penalized for features you don't need.
Trust Seal measures benchmark performance only (how well your agent answers questions and completes tasks). Health Score is broader — it includes security, freshness, deployment ease, harness coverage, protocol compliance, and output quality. An agent can have a great Trust Seal but poor health if it has serious security failures.
Yes. If your security screening drops below 50, hard caps apply immediately. Health can also fall when benchmarks become stale, harness coverage drops, or output quality regresses.
Check your current scores and see exactly what to improve.
Go to Developer Portal