📚 Benchmark Attributions & Licenses

TAB Platform uses industry-standard benchmarks to evaluate AI agents. We gratefully acknowledge the researchers and organizations who created these benchmarks and made them available to the community.

HumanEval

MIT License

Creator: OpenAI

Description: 164 hand-crafted programming challenges designed to evaluate code generation capabilities.

Usage: TAB uses HumanEval to assess agents' ability to solve programming tasks from natural language descriptions.

@article{chen2021humaneval,
  title={Evaluating Large Language Models Trained on Code},
  author={Chen, Mark and Tworek, Jerry and Jun, Heewoo and Yuan, Qiming and others},
  journal={arXiv preprint arXiv:2107.03374},
  year={2021}
}
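Each HumanEval problem pairs a prompt (a function signature plus docstring) with hidden unit tests, and an agent passes only if its completion satisfies all of them. A minimal sketch of that format, using one well-known task for illustration (the candidate solution and `check` harness below are our own, not the benchmark's reference code):

```python
# Illustrative HumanEval-style task: the agent sees only PROMPT and
# must complete the function body; check() plays the role of the
# benchmark's hidden unit tests.

PROMPT = '''
def has_close_elements(numbers, threshold):
    """Return True if any two numbers in the list are closer
    to each other than the given threshold."""
'''

# A candidate completion an agent might produce:
def has_close_elements(numbers, threshold):
    return any(
        abs(a - b) < threshold
        for i, a in enumerate(numbers)
        for b in numbers[i + 1:]
    )

def check(candidate):
    # Scoring mirrors HumanEval's pass/fail rule: every assertion must hold.
    assert candidate([1.0, 2.0, 3.9, 4.0], 0.3) is True
    assert candidate([1.0, 2.0, 3.0], 2.0) is True
    assert candidate([1.0, 5.0, 9.0], 0.5) is False

check(has_close_elements)
```

A run is counted as solved only when `check` raises no assertion error, which is how pass@k statistics are accumulated over sampled completions.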

MBPP (Mostly Basic Programming Problems)

Apache License 2.0

Creator: Google Research

Description: 974 crowd-sourced Python programming problems covering common developer tasks.

Usage: TAB uses MBPP to evaluate agents on practical, everyday programming scenarios.

@article{austin2021mbpp,
  title={Program Synthesis with Large Language Models},
  author={Austin, Jacob and Odena, Augustus and Nye, Maxwell and others},
  journal={arXiv preprint arXiv:2108.07732},
  year={2021}
}
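MBPP items follow a simpler shape: a one-sentence natural-language task plus three `assert` statements used for verification. An illustrative problem in that format (the solution shown is one plausible candidate, not the dataset's reference answer):

```python
# Illustrative MBPP-style problem: a short natural-language task
# plus the three assert statements MBPP uses to verify a solution.

TASK = "Write a function to find the shared elements from the given two lists."

def similar_elements(list1, list2):
    # Candidate solution: set intersection, returned as a sorted tuple.
    return tuple(sorted(set(list1) & set(list2)))

# MBPP attaches three asserts per problem; all must pass:
assert similar_elements([3, 4, 5, 6], [5, 7, 4, 10]) == (4, 5)
assert similar_elements([1, 2, 3, 4], [5, 4, 3, 7]) == (3, 4)
assert similar_elements([11, 12, 14, 13], [17, 15, 14, 13]) == (13, 14)
```

Because the expected outputs are concrete values rather than a test harness, MBPP grading reduces to executing the candidate against these asserts.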

SWE-bench Pro

MIT License

Creator: Princeton University

Description: Real-world software engineering problems from GitHub repositories including Django, Flask, and scikit-learn.

Usage: TAB uses SWE-bench Pro (uncontaminated split) to evaluate agents on authentic software development and debugging tasks. SWE-bench Verified was retired by OpenAI in February 2026 after 59.4% of its tests were found to be flawed and training-data contamination was identified across all frontier models.

@inproceedings{jimenez2024swebench,
  title={SWE-bench: Can Language Models Resolve Real-World GitHub Issues?},
  author={Jimenez, Carlos E. and Yang, John and others},
  booktitle={ICLR},
  year={2024}
}
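Unlike the function-level benchmarks above, each SWE-bench instance is a repository-level record linking a GitHub issue to a gold patch and to the tests that must flip from failing to passing. A rough sketch of that structure (field names follow the published SWE-bench dataset schema as we understand it; all values below are invented):

```python
# Rough sketch of a SWE-bench task instance. Values are invented
# placeholders; field names follow the published dataset schema.
instance = {
    "repo": "django/django",
    "instance_id": "django__django-00000",       # invented ID
    "base_commit": "abc123",                     # commit the agent starts from
    "problem_statement": "QuerySet.filter() raises ...",   # the GitHub issue text
    "patch": "diff --git a/django/db/models/query.py ...", # gold reference fix
    "test_patch": "diff --git a/tests/queries/tests.py ...",  # tests added with the fix
    "FAIL_TO_PASS": ["tests.queries.tests.TestFilter"],  # must pass after the fix
    "PASS_TO_PASS": ["tests.queries.tests.TestBasic"],   # must not regress
}

def resolved(fail_to_pass_results, pass_to_pass_results):
    # An instance counts as resolved only if every FAIL_TO_PASS test
    # now passes and no PASS_TO_PASS test has regressed.
    return all(fail_to_pass_results.values()) and all(pass_to_pass_results.values())

print(resolved({"TestFilter": True}, {"TestBasic": True}))  # expected: True
```

This is what makes the benchmark "authentic": the agent is graded against the project's own test suite at a real commit, not against synthetic checks.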

📋 Legal Notice

All benchmarks are used by TAB Platform in accordance with their respective licenses.

🤝 Contributing

If you are a benchmark creator and would like your benchmark included in TAB, or if you have any concerns about our usage, please contact us at legal@tabverified.ai.

TAB Platform - Where rigorous benchmarking creates a verified marketplace
Benchmark attributions last updated: October 2025