📋 TAB Production Deployment Playbook
Version: 2.0 | Last Updated: February 2026 | Classification: Internal
Phase 1: Pre-Deployment Checklist
1.1 Infrastructure Readiness
- PostgreSQL database provisioned and migrations applied (alembic upgrade head)
- Redis instance available for caching and rate limiting
- Environment variables configured (see .env.example)
- SSL/TLS certificates installed and verified
- CDN configured for static assets (/static/*)
- File storage (S3/GCS) configured for agent uploads
1.2 Security Configuration
- API keys rotated and stored in secrets manager
- CORS origins restricted to production domains
- Rate limiting configured per tier (Free: 10/min, Pro: 100/min, Enterprise: 1000/min)
- Spider-Sense screening enabled and patterns reviewed (config/spider_sense_patterns.json)
- Tool Gateway validation active with JSONL audit logging
- JWT secret key generated (min 256-bit)
- Admin panel access restricted to allowlisted IPs
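The per-tier limits listed above could be enforced with a fixed-window counter. The following is a minimal sketch only: `FixedWindowLimiter` and `TIER_LIMITS` are hypothetical names, and the in-memory dictionary stands in for the Redis instance the playbook calls for.

```python
import time
from collections import defaultdict

# Requests per minute for each tier, taken from the checklist above.
TIER_LIMITS = {"free": 10, "pro": 100, "enterprise": 1000}


class FixedWindowLimiter:
    """Hypothetical in-memory limiter; production would keep counters in Redis."""

    def __init__(self, window_seconds=60, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be controlled in tests
        # Maps (user_id, window_index) -> requests seen in that window.
        # Old windows are never evicted here; Redis key expiry would handle that.
        self.counts = defaultdict(int)

    def allow(self, user_id, tier):
        """Return True if the request fits within the tier's per-minute limit."""
        limit = TIER_LIMITS[tier]
        window_index = int(self.clock() // self.window_seconds)
        key = (user_id, window_index)
        if self.counts[key] >= limit:
            return False
        self.counts[key] += 1
        return True
```

A fixed window is the simplest option and allows short bursts at window boundaries; a sliding window or token bucket would smooth those out if that matters for the deployment.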
1.3 AI Provider Keys
- OpenAI API key configured (GPT-4.1, GPT-4.1-Mini, GPT-4.1-Nano, O3, O3-Mini, O4-Mini)
- Anthropic API key configured (Claude Sonnet 4, Claude Opus 4)
- Provider fallback chain tested
- Cost budgets and alerts configured per-user and per-agent
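One way to exercise the provider fallback chain is to try providers in order and return the first successful response. This is a sketch under assumptions: `ProviderError` and `call_with_fallback` are hypothetical names, and each provider is represented by a plain callable rather than a real SDK client.

```python
class ProviderError(Exception):
    """Raised by a provider callable when a request fails."""


def call_with_fallback(providers, prompt):
    """Try each (name, callable) pair in order and return the first success.

    Returns a (provider_name, response) tuple. Raises ProviderError if every
    provider in the chain fails.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Record the failure and move on to the next provider in the chain.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))
```

In a pre-deployment test, the first callable can be a stub that always raises, which confirms that traffic actually reaches the second provider.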
Phase 2: Database & Migrations
- Run all pending migrations: python -m alembic upgrade head
- Verify spider_sense_events table exists with 11 columns
- Verify all 66 active models (400+ available via OpenRouter) registered in ORM
- Seed benchmark data: python seed_benchmarks.py
- Verify marketplace categories populated
- Create admin user account
Phase 3: Application Deployment
3.1 Build & Deploy
- Install dependencies: pip install -r requirements.txt
- Run syntax validation: python -m py_compile main.py
- Start application: uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
- Verify health endpoint: GET /api/health returns 200
- Verify static files served at /static/developer-portal.html
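The health endpoint check above can be scripted so it runs as part of the deploy. This is a minimal sketch using only the standard library; `check_health` is a hypothetical helper, and the `opener` parameter exists only so the function can be tested without a live server.

```python
import urllib.request


def check_health(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/api/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/api/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError both derive from OSError, so this
        # covers connection failures, timeouts, and non-2xx responses.
        return False
```

For example, `check_health("http://localhost:8000")` would match the uvicorn command in this section, assuming the application is bound to port 8000 on the same host.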
3.2 Benchmark Engine Verification
- Verify unified benchmark executor loads all categories
- Run smoke test: 3 test cases on ToolMaster Pro agent
- Verify citation-grounding benchmarks (ALCE, AttrScore, HAGRID) available
- Verify function-calling benchmarks (BFCL, K-Interop, K-Interop-Enterprise) available
- Confirm Trust Seal computation completes within 30s
3.3 Security Systems Verification
- Spider-Sense L1 pattern matching: test safe + block patterns
- Spider-Sense L2 anomaly scoring: verify threshold at score ≥ 50
- Spider-Sense L3 LLM deep scan: verify fail-open on LLM errors
- Tool Gateway: verify schema validation and receipt generation
- Permission kernel: verify confirmation flow for destructive actions
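The three Spider-Sense layers described above can be pictured as a short escalation pipeline. The sketch below is an illustration, not the actual implementation: the block patterns, `screen`, `score_fn`, and `deep_scan_fn` are all hypothetical, and it assumes that an L2 score of 50 or more is what sends an action on to the L3 scan.

```python
import re

L2_THRESHOLD = 50  # value from the playbook's configuration reference

# Hypothetical block patterns; the real ones live in
# config/spider_sense_patterns.json.
BLOCK_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),
    re.compile(r"DROP\s+TABLE", re.IGNORECASE),
]


def screen(action, score_fn, deep_scan_fn):
    """Return "block" or "allow" for an action string."""
    # L1: cheap pattern matching blocks known-bad actions immediately.
    if any(pattern.search(action) for pattern in BLOCK_PATTERNS):
        return "block"
    # L2: anomaly score; anything below the threshold is allowed
    # without calling the LLM.
    if score_fn(action) < L2_THRESHOLD:
        return "allow"
    # L3: LLM deep scan. Per the checklist, errors fail open (allow).
    try:
        return "block" if deep_scan_fn(action) else "allow"
    except Exception:
        return "allow"
```

A verification run would cover one case per branch: a blocked pattern, a low score, a high score with each L3 verdict, and a high score where the L3 call raises.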
Phase 4: External Agent Support
- Sandbox execution environment configured (Docker/Firecracker)
- File upload limits set (max 50 MB for .py/.zip)
- Remote agent HTTP endpoint timeout configured (default: 30s)
- BYOA webhook signature verification enabled
- Agent isolation: network policies restrict sandbox → internal APIs
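The playbook does not say which signature scheme the BYOA webhooks use. Assuming a common choice, HMAC-SHA256 over the raw request body with a hex-encoded digest, verification might look like the sketch below; `sign_payload` and `verify_signature` are hypothetical names.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of the payload (assumed scheme)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, signature: str) -> bool:
    """Compare the received signature with the expected one in constant time."""
    expected = sign_payload(secret, payload)
    # Encode both sides to bytes so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

The comparison uses `hmac.compare_digest` rather than `==` so that response timing does not leak how many leading characters of a forged signature were correct.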
Phase 5: Monitoring & Observability
- Structured logging to stdout (JSON format)
- JSONL audit logs rotated daily, retained 90 days
- Error alerting configured (PagerDuty/Slack)
- Dashboard endpoints verified:
  - Security Dashboard: /static/security-dashboard.html
  - Spider-Sense: /static/spider-sense-dashboard.html
  - Tool Gateway: /static/tool-gateway-dashboard.html
  - Analytics: /static/analytics-dashboard.html
- Uptime monitoring on /api/health (interval: 60s)
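Structured JSON logging to stdout, the first item in this phase, can be set up with the standard `logging` module alone. This is a minimal sketch: `JsonFormatter` and `configure_logging` are hypothetical names, and the field names (`ts`, `level`, `logger`, `message`) are an assumption rather than a documented schema.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback as a string when an exception is attached.
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level=logging.INFO):
    """Send all root-logger output to stdout in JSON format."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]  # replace any previously installed handlers
    root.setLevel(level)
```

Calling `configure_logging()` once at startup, before the application creates its own loggers, keeps every line on stdout parseable by a log shipper.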
Phase 6: Go-Live Checklist
⚠️ Final checks before enabling public traffic:
- All Phase 1-5 items verified ✓
- Load test completed (target: 100 concurrent benchmark runs)
- Backup strategy tested (DB snapshots every 6h)
- Rollback procedure documented and tested
- DNS cutover planned (TTL lowered to 60s 24h before cutover)
- On-call rotation established
- Incident response runbook reviewed by team
Appendix: Key Configuration Reference
| Setting | Value | Notes |
| --- | --- | --- |
| Spider-Sense L2 threshold | 50 | Suspicion score triggering L2 escalation |
| Spider-Sense sliding window | 50 actions | Per-agent behavioral window |
| Rate limit (Free tier) | 10 req/min | Per-user API rate limit |
| Rate limit (Pro tier) | 100 req/min | Per-user API rate limit |
| JWT expiry | 24h | Access token lifetime |
| Max upload size | 50 MB | Agent code upload limit |
| LLM timeout | 30s | Spider-Sense L3 / semantic validation |
| Reproducibility threshold | ≥ 85 | Score needed for Reproducible badge |
📌 Note: This playbook should be reviewed and updated before each major release. Keep a copy in your team's runbook repository.