TAB Platform uses industry-standard benchmarks to evaluate AI agents. We gratefully acknowledge the researchers and organizations who created these benchmarks and made them available to the community.
HumanEval
Creator: OpenAI
Description: 164 hand-crafted programming challenges designed to evaluate code generation capabilities.
Usage: TAB uses HumanEval to assess agents' ability to solve programming tasks from natural language descriptions.
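HumanEval performance is conventionally reported with the pass@k metric introduced alongside the benchmark (Chen et al., 2021): generate n samples per problem, count the c that pass the unit tests, and estimate the probability that at least one of k samples is correct. Below is a minimal sketch of that unbiased estimator; the function name and the example numbers are ours, and how TAB itself aggregates or reports the metric is not specified here.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper
    (Chen et al., 2021): the probability that at least one of
    k samples, drawn without replacement from n generations,
    is correct, given that c of the n passed the unit tests."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset
        # must contain at least one correct sample.
        return 1.0
    # 1 - C(n - c, k) / C(n, k), expanded as a numerically
    # stable product instead of large factorials.
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))
```

For instance, with n = 200 generations of which c = 30 pass, pass_at_k(200, 30, 1) evaluates to 0.15, the per-sample success rate, while pass_at_k(200, 30, 10) credits models whose correct solutions appear somewhere within a 10-sample budget.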
MBPP
Creator: Google Research
Description: 974 crowd-sourced Python programming problems covering common developer tasks.
Usage: TAB uses MBPP to evaluate agents on practical, everyday programming scenarios.
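Each MBPP record pairs a short natural-language task description with assert-based test cases (the public dataset exposes these as text and test_list fields). The sketch below shows the shape of that check; the sample task and helper name are invented here, and a production harness would sandbox the exec calls rather than run untrusted model output in-process.

```python
# Hypothetical MBPP-style record; the field names mirror the
# public dataset, but this particular task is ours.
task = {
    "text": "Write a function to find the minimum of two numbers.",
    "test_list": [
        "assert min_of_two(1, 2) == 1",
        "assert min_of_two(-5, 3) == -5",
    ],
}

def passes_mbpp_task(candidate_code: str, task: dict) -> bool:
    """Execute candidate code, then each assertion; any exception
    (including AssertionError) counts as a failure. Unsafe outside
    an isolated sandbox."""
    scope: dict = {}
    try:
        exec(candidate_code, scope)  # define the candidate function
        for test in task["test_list"]:
            exec(test, scope)        # run one assertion
    except Exception:
        return False
    return True

print(passes_mbpp_task("def min_of_two(a, b):\n    return a if a < b else b", task))  # True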
SWE-bench
Creator: Princeton University
Description: Real-world software engineering problems from GitHub repositories including Django, Flask, and scikit-learn.
Usage: TAB uses the uncontaminated split of SWE-bench Pro to evaluate agents on authentic software development and debugging tasks. SWE-bench Verified was retired by OpenAI in February 2026 after 59.4% of its test cases were found to be flawed and its problems were found to have leaked into the training data of all frontier models.
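Each SWE-bench instance pins a repository to a base commit, supplies the issue text, and records FAIL_TO_PASS tests that a correct patch must turn green. As a minimal sketch of that evaluation loop, assuming a pre-cloned repository and pytest-style test identifiers (the official harness instead runs each instance in a containerized environment with pinned dependencies):

```python
import subprocess

def evaluate_patch(repo_dir: str, base_commit: str,
                   model_patch: str, fail_to_pass: list[str]) -> bool:
    """Apply an agent-generated diff at the pinned commit and
    rerun the tests the patch is supposed to fix. Simplified:
    no PASS_TO_PASS regression check, no isolation."""
    # Reset the repository to the exact snapshot the issue was filed against.
    subprocess.run(["git", "checkout", base_commit], cwd=repo_dir, check=True)
    # "git apply -" reads the unified diff from stdin.
    subprocess.run(["git", "apply", "-"], cwd=repo_dir,
                   input=model_patch, text=True, check=True)
    # The instance resolves only if every previously failing test now passes.
    result = subprocess.run(["python", "-m", "pytest", *fail_to_pass], cwd=repo_dir)
    return result.returncode == 0
```

The real harness also reruns the instance's PASS_TO_PASS tests to catch regressions the patch may introduce, a step this sketch omits.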
All benchmarks are used in accordance with their respective licenses.
If you are a benchmark creator and would like your benchmark included in TAB, or if you have any concerns about our usage, please contact us at legal@tabverified.ai.
TAB Platform - Where rigorous benchmarking creates a verified marketplace
Benchmark attributions last updated: October 2025