⚙️ CI/CD Integration Guide

TAB integrates into CI/CD pipelines so every agent change is independently verified before it reaches production. With Python and TypeScript SDKs (tab-sdk on PyPI, @tab-platform/sdk on npm), GitHub Actions templates, GitLab CI templates, and a $0.01/lookup Verification API, teams run TAB benchmarks on every commit, every PR, and every deployment. The catalog covers 340+ benchmarks across 26 categories and 80 models from 20+ providers. Running automated agent tests on every commit is the foundation of a reliable agent-evaluation pipeline. Updated May 2026.

GitHub Actions Integration

Add this workflow to .github/workflows/tab-testing.yml

Requires two repository secrets: TAB_API_KEY and TAB_AGENT_ID. The tab verify command blocks until TAB returns pass/fail. Download the template at github-action-template.yml.

name: TAB Agent Verification

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install TAB SDK
        run: pip install tab-sdk

      - name: Run TAB Verification
        run: tab verify --agent-id "$TAB_AGENT_ID" --threshold 70
        env:
          TAB_API_KEY: ${{ secrets.TAB_API_KEY }}
          TAB_AGENT_ID: ${{ secrets.TAB_AGENT_ID }}
          TAB_API_URL: https://tabverified.ai

Jenkins Integration

Add this Jenkinsfile to your repository

Requires a Jenkins credential tab-api-key (Secret text), a pipeline parameter TAB_AGENT_ID, and the Pipeline Utility Steps plugin (which provides readJSON). The endpoint blocks until verification finishes, so no polling stage is needed.

pipeline {
  agent any

  parameters {
    string(name: 'TAB_AGENT_ID', description: 'TAB Platform Agent ID')
    string(name: 'THRESHOLD', defaultValue: '70', description: 'Minimum passing score')
  }

  environment {
    TAB_API_KEY = credentials('tab-api-key')
    TAB_API_URL = 'https://tabverified.ai'
  }

  stages {
    stage('Verify with TAB') {
      steps {
        script {
          def response = sh(
            // \$ leaves expansion to the shell, so Groovy never interpolates the secret
            script: """curl -sf -X POST "\$TAB_API_URL/api/v1/ci/verify" \\
              -H "Authorization: Bearer \$TAB_API_KEY" \\
              -H "Content-Type: application/json" \\
              -d '{"agent_id": "${params.TAB_AGENT_ID}", "benchmarks": ["security_screening"], "threshold": ${params.THRESHOLD}, "timeout_seconds": 300}'""",
            returnStdout: true
          ).trim()
          def json = readJSON text: response
          echo "TAB status: ${json.status}; score: ${json.overall_score}; run: ${json.run_id}"
          if (json.status != 'pass') {
            error "TAB verification failed: ${json.overall_score} below ${params.THRESHOLD}"
          }
          echo 'TAB verification passed.'
        }
      }
    }
  }
}

GitLab CI Integration

Add to .gitlab-ci.yml

Set CI/CD variables in GitLab → Settings → CI/CD → Variables: TAB_API_KEY (masked) and TAB_AGENT_ID. This uses the blocking CI endpoint and fails the job directly on a TAB fail result.

variables:
  TAB_API_URL: "https://tabverified.ai"
  TAB_THRESHOLD: "70"

stages:
  - test

tab-benchmark:
  stage: test
  image: python:3.12-slim
  before_script:
    - pip install tab-sdk
  script:
    - tab verify --agent-id "$TAB_AGENT_ID" --threshold "$TAB_THRESHOLD"

API Integration Reference

Base URL: https://tabverified.ai. CI should use the blocking verify endpoint so the pipeline gets one pass/fail response.

1. Blocking CI Verification

POST /api/v1/ci/verify — Runs selected benchmarks and returns status: "pass" or status: "fail".

curl -X POST https://tabverified.ai/api/v1/ci/verify \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "your-agent-id",
    "benchmarks": ["security_screening", "sycophancy_detection"],
    "threshold": 70,
    "timeout_seconds": 300
  }'
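For pipelines that prefer a short script over curl, the same call can be made with Python's standard library. This is a minimal sketch that assumes only the request and response shapes documented in this guide; `build_verify_request` and `run_verify` are hypothetical helper names (not part of tab-sdk), and the 60-second client margin is an arbitrary choice.

```python
import json
import urllib.request
from typing import List

# Base URL and path as documented in this guide.
TAB_API_URL = "https://tabverified.ai"
VERIFY_PATH = "/api/v1/ci/verify"


def build_verify_request(api_key: str, agent_id: str, benchmarks: List[str],
                         threshold: int = 70,
                         timeout_seconds: int = 300) -> urllib.request.Request:
    """Build (but do not send) a POST request for the blocking CI endpoint."""
    body = {
        "agent_id": agent_id,
        "benchmarks": benchmarks,
        "threshold": threshold,
        "timeout_seconds": timeout_seconds,
    }
    return urllib.request.Request(
        TAB_API_URL + VERIFY_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_verify(req: urllib.request.Request, server_timeout: int = 300) -> dict:
    """Send the request and return the decoded JSON body.

    The endpoint blocks until the run finishes, so the client socket timeout is
    set 60 seconds above the server-side timeout (an arbitrary safety margin).
    Non-2xx responses such as HTTP 402 raise urllib.error.HTTPError.
    """
    with urllib.request.urlopen(req, timeout=server_timeout + 60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In a pipeline step, `run_verify(build_verify_request(api_key, agent_id, ["security_screening"]))` returns the parsed body, and the job can branch on its `status` field.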

2. CLI Equivalent

The Python SDK wraps the same endpoint and exits 0 for pass, 1 for fail, and 2 for errors.

pip install tab-sdk
TAB_API_KEY=YOUR_API_KEY tab verify \
  --agent-id your-agent-id \
  --benchmarks security_screening,sycophancy_detection \
  --threshold 70
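When a pipeline calls the endpoint directly instead of through the CLI, a small wrapper can mirror the same exit-code contract so the job behaves identically either way. This is a minimal sketch; `exit_code_for` is a hypothetical helper, and treating every non-2xx response (such as HTTP 402 for insufficient credits) as an error with exit code 2 is an assumption about how the CLI classifies such cases.

```python
from typing import Optional

# Exit codes documented for `tab verify`: 0 = pass, 1 = fail, 2 = error.
EXIT_PASS, EXIT_FAIL, EXIT_ERROR = 0, 1, 2


def exit_code_for(http_status: int, body: Optional[dict]) -> int:
    """Map an HTTP status and decoded /api/v1/ci/verify body to a CLI-style exit code.

    Assumption: any non-2xx response (for example HTTP 402 for insufficient
    credits) or a body without a recognised status is an error, not a failed run.
    """
    if not 200 <= http_status < 300 or body is None:
        return EXIT_ERROR
    status = body.get("status")
    if status == "pass":
        return EXIT_PASS
    if status == "fail":
        return EXIT_FAIL
    return EXIT_ERROR
```

A wrapper script would end with `sys.exit(exit_code_for(status, body))`, so CI treats a TAB fail and a TAB error as distinct outcomes.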

3. Rate Limits

CI verification is rate limited to 10 requests per hour per API key. Insufficient credits return HTTP 402.

A passing response from POST /api/v1/ci/verify looks like this:

{
  "status": "pass",
  "overall_score": 82.5,
  "threshold": 70,
  "benchmarks": [
    {"name": "security_screening", "score": 85.0, "passed": true}
  ],
  "duration_seconds": 45,
  "run_id": "uuid"
}
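The server enforces the 10-requests-per-hour limit, but a scheduled job that retries or fans out across several agents can reach it quickly. A client-side sliding-window guard is one way to stay under it: keep the timestamps of recent calls in a deque and evict those older than one hour. This is a minimal sketch, not part of tab-sdk; `SlidingWindowLimiter` is a hypothetical name, and the injectable clock exists only to make it testable.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side guard for the documented limit of 10 CI verifications per hour per API key."""

    def __init__(self, max_calls: int = 10, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self._calls: Deque[float] = deque()  # timestamps of calls still inside the window

    def _evict_expired(self, now: float) -> None:
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window; otherwise return False."""
        now = self.clock()
        self._evict_expired(now)
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False

    def seconds_until_available(self) -> float:
        """Seconds to wait before the next call would be allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict_expired(now)
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window_seconds - (now - self._calls[0])
```

A job would call `try_acquire()` before each verify request and sleep for `seconds_until_available()` when it returns False.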

Continuous Verification

Keep your Trust Seal fresh by automatically re-verifying after every deployment. TAB enforces a 30-day freshness policy on all Trust Seals.

✅ Fresh (0–14 days): Trust Seal displayed normally

⏳ Aging (15–25 days): yellow badge; re-verify soon

⚠️ Stale (26–30 days): orange warning; expiry imminent

🚫 Expired (31+ days): grade shown with an EXPIRED overlay
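A scheduled job can use these tiers to decide when to trigger re-verification. This minimal sketch assumes age is counted in whole calendar days since the last verification and that day 30 is still Stale, with anything older Expired; `freshness_tier` is a hypothetical helper, not a TAB API.

```python
from datetime import date

# Tier boundaries from the freshness policy: 0–14 Fresh, 15–25 Aging,
# 26–30 Stale, older than 30 days Expired (whole days assumed).


def freshness_tier(verified_on: date, today: date) -> str:
    """Classify a Trust Seal by whole days elapsed since its last verification."""
    age_days = (today - verified_on).days
    if age_days < 0:
        raise ValueError("verification date is in the future")
    if age_days <= 14:
        return "Fresh"
    if age_days <= 25:
        return "Aging"
    if age_days <= 30:
        return "Stale"
    return "Expired"
```

Triggering re-verification as soon as a seal reaches Aging leaves roughly two weeks of slack before it turns Stale.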

Webhook: Auto-Verify on Deploy

Use this optional webhook for background freshness updates after deploy. For a pipeline gate, use tab verify or POST /api/v1/ci/verify above.

# Add to your deploy script (post-deploy step)
curl -X POST https://tabverified.ai/api/v1/webhooks/agent-updated \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "YOUR_AGENT_UUID",
    "trigger_type": "deployment",
    "callback_url": "https://your-app.com/webhooks/tab-result"
  }'

# Response:
# {
#   "run_id": "uuid-of-benchmark-run",
#   "agent_id": "YOUR_AGENT_UUID",
#   "trigger_type": "deployment",
#   "status": "queued",
#   "message": "Benchmark run queued for background re-verification."
# }

GitHub Actions Example

Add this step to your existing GitHub Actions workflow to re-verify on every push to main:

    # Add after your deploy step
    - name: Re-verify TAB Trust Seal
      id: tab_reverify
      env:
        TAB_API_KEY: ${{ secrets.TAB_API_KEY }}
        TAB_AGENT_ID: ${{ secrets.TAB_AGENT_ID }}
      run: |
        RESULT=$(curl -s -X POST https://tabverified.ai/api/v1/webhooks/agent-updated \
          -H "Authorization: Bearer $TAB_API_KEY" \
          -H "Content-Type: application/json" \
          -d "$(jq -n --arg id "$TAB_AGENT_ID" '{agent_id: $id, trigger_type: "deployment"}')")
        echo "TAB verification queued: $RESULT"
        RUN_ID=$(echo "$RESULT" | jq -r '.run_id')
        echo "run_id=$RUN_ID" >> "$GITHUB_OUTPUT"

Trigger Types

deployment — Post-deploy re-verification (most common)
model_update — After changing the underlying LLM model
config_change — After updating system prompt, tools, or agent config
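When a deploy script sends one of these values programmatically, validating trigger_type before the call catches typos early. This is a minimal sketch using only the standard library; `build_agent_updated_request` is a hypothetical helper, and rejecting unknown values on the client is a design choice here, not documented server behavior. callback_url is treated as optional because the GitHub Actions example omits it.

```python
import json
import urllib.request
from typing import Optional

WEBHOOK_URL = "https://tabverified.ai/api/v1/webhooks/agent-updated"
# The three trigger types listed in this guide.
TRIGGER_TYPES = {"deployment", "model_update", "config_change"}


def build_agent_updated_request(api_key: str, agent_id: str, trigger_type: str,
                                callback_url: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the agent-updated webhook call, rejecting unknown trigger types."""
    if trigger_type not in TRIGGER_TYPES:
        raise ValueError(f"unknown trigger_type: {trigger_type!r}")
    payload = {"agent_id": agent_id, "trigger_type": trigger_type}
    if callback_url is not None:
        payload["callback_url"] = callback_url  # optional, as in the deploy-script example
    return urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` returns the queued-run JSON, whose `run_id` identifies the background re-verification.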

GitHub Actions Integration: tab-verified/action

On every pull request, this action runs the specified benchmarks against the agent endpoint and fails the PR if any score falls below the threshold. This creates a verification gate before deployment that blocks regressions from reaching production.

- name: Run TAB Agent Verification
  uses: tab-verified/action@v1
  with:
    agent-url: ${{ secrets.AGENT_URL }}
    benchmarks: sycophancy,security_screening,token_waste
    fail-threshold: 0.70
    api-key: ${{ secrets.TAB_API_KEY }}

Continuous Regression Testing

Regression testing for AI agents works differently from traditional software testing because agent behavior can drift without any code change. TAB stores per-version benchmark results, so teams can compare the current deployment's scores against the previous baseline. A 5% drop in sycophancy resistance or security screening triggers an automated alert, which makes TAB a foundation for continuous regression testing across every version boundary.

• Per-version score history retained for 90 days
• Configurable regression thresholds per benchmark category
• Diff report shows exactly which dimensions regressed between versions
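The version-to-version comparison can be sketched as a per-benchmark diff of two score maps. This sketch assumes "a 5% drop" means a relative drop of at least 5% of the baseline score (the guide does not say whether it is relative or in score points); `find_regressions` and its override mapping are hypothetical, illustrating configurable per-category thresholds rather than TAB's actual alerting logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Regression:
    benchmark: str
    baseline: float
    current: float
    drop_pct: float  # relative drop as a percentage of the baseline score


def find_regressions(baseline: Dict[str, float], current: Dict[str, float],
                     default_max_drop_pct: float = 5.0,
                     max_drop_pct: Optional[Dict[str, float]] = None) -> List[Regression]:
    """Report benchmarks whose relative drop meets or exceeds their threshold.

    Benchmarks missing from either version are skipped rather than treated as regressions.
    """
    overrides = max_drop_pct or {}
    regressions = []
    for name in sorted(baseline.keys() & current.keys()):
        old, new = baseline[name], current[name]
        if old <= 0:
            continue  # a relative drop is undefined for a zero baseline
        drop_pct = (old - new) / old * 100.0
        if drop_pct >= overrides.get(name, default_max_drop_pct):
            regressions.append(Regression(name, old, new, drop_pct))
    return regressions
```

Each returned `Regression` carries both scores, which is enough to render a diff report of the dimensions that regressed.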

Production Trace Monitoring Feedback Loop

Continuous production trace monitoring closes the loop between live failures and test coverage. When TAB's Continuous Verification API detects an unexpected response pattern in production, it flags the trace for review and can automatically generate a new benchmark test case from the failure. Live agent failures thus become new test cases, so a production incident does not stay invisible to the test suite.

The feedback loop runs continuously: a production trace is flagged, a new benchmark is generated, the benchmark is added to the regression suite, and future deploys are tested against it automatically.
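The loop can be modeled as a regression suite that grows from flagged traces. This is an illustrative sketch, not TAB's implementation: `FlaggedTrace`, `GeneratedCase`, and `RegressionSuite` are hypothetical names, and the "flagged output must not recur" check stands in for whatever scoring a generated benchmark actually uses. Deduplicating by prompt keeps repeated incidents from inflating the suite.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class FlaggedTrace:
    trace_id: str
    prompt: str        # input that produced the unexpected response
    bad_response: str  # the response flagged in production


@dataclass(frozen=True)
class GeneratedCase:
    case_id: str
    prompt: str
    forbidden_output: str  # hypothetical check: the flagged output must not recur


@dataclass
class RegressionSuite:
    cases: List[GeneratedCase] = field(default_factory=list)
    seen_prompts: Set[str] = field(default_factory=set)

    def add_from_trace(self, trace: FlaggedTrace) -> bool:
        """Turn a flagged trace into a test case; return False if the prompt is already covered."""
        if trace.prompt in self.seen_prompts:
            return False
        self.seen_prompts.add(trace.prompt)
        self.cases.append(GeneratedCase(f"trace-{trace.trace_id}", trace.prompt, trace.bad_response))
        return True

    def run(self, agent: Callable[[str], str]) -> List[str]:
        """Replay every case against `agent`; return IDs of cases where the flagged output recurs."""
        return [c.case_id for c in self.cases if c.forbidden_output in agent(c.prompt)]
```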

Automated Agent Verification on Every Deploy

The full pipeline for automated agent testing on every commit runs in five steps. A failed benchmark blocks deployment and surfaces a scorecard showing which dimensions regressed, making this verification gate the single control point that enforces agent quality standards across the entire team.

1. Code change submitted to repository
2. TAB benchmark suite runs on the PR branch (340+ benchmarks available)
3. Scores compared against configured minimum thresholds
4. Deploy proceeds only if all benchmarks pass
5. Failed benchmarks block deployment and surface a detailed scorecard
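The comparison and gate steps above can be sketched as one function that checks each benchmark against its minimum and builds the scorecard. This is a minimal sketch; `deploy_gate` is hypothetical, and three choices are assumptions: a score equal to its minimum passes, benchmarks without a configured minimum fall back to 70 (the threshold used throughout this guide), and an empty result set blocks the deploy rather than passing by default.

```python
from typing import Dict, List, Tuple


def deploy_gate(scores: Dict[str, float], thresholds: Dict[str, float],
                default_threshold: float = 70.0) -> Tuple[bool, List[str]]:
    """Return (deploy_allowed, scorecard_lines); every benchmark must meet its minimum."""
    if not scores:
        return False, ["FAIL no benchmark results"]  # no evidence means no deploy
    allowed = True
    scorecard = []
    for name in sorted(scores):
        minimum = thresholds.get(name, default_threshold)
        passed = scores[name] >= minimum
        allowed = allowed and passed
        mark = "PASS" if passed else "FAIL"
        scorecard.append(f"{mark} {name}: {scores[name]:.1f} (min {minimum:.1f})")
    return allowed, scorecard
```

Printing the scorecard lines in the CI log gives reviewers the per-dimension view before the job exits non-zero.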