Model Evaluation & Release Gates

Every AI-powered product feature MUST pass explicit release gates before production rollout.

Evaluation Pack Requirements

Each release candidate MUST include:

  • Intended use, non-goals, and risk tier
  • Evaluation dataset definition and sampling method
  • Offline metrics by critical segment
  • Known failure modes and mitigations
  • Human fallback or override plan
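
As a sketch, such a pack could be captured as structured data so completeness can be checked mechanically. The `EvalPack` schema and all field names below are illustrative assumptions, not a prescribed format:

```python
from dataclasses import dataclass


@dataclass
class EvalPack:
    """Illustrative evaluation-pack manifest; field names are assumptions."""
    intended_use: str
    non_goals: list[str]
    risk_tier: int                    # 1 (low) .. 3 (high)
    eval_dataset: str                 # dataset definition / reference
    sampling_method: str
    segment_metrics: dict[str, dict]  # offline metrics keyed by critical segment
    failure_modes: list[str]          # known failure modes and mitigations
    fallback_plan: str                # human fallback or override plan


def validate_pack(pack: EvalPack) -> list[str]:
    """Return names of missing required fields (empty list means complete)."""
    missing = []
    if not pack.intended_use:
        missing.append("intended_use")
    if not pack.segment_metrics:
        missing.append("segment_metrics")
    if not pack.fallback_plan:
        missing.append("fallback_plan")
    return missing
```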

Required Gate Checks

| Gate | Minimum Requirement | Blocker |
| --- | --- | --- |
| Quality gate | Meets defined quality thresholds for target use cases | Yes |
| Safety gate | No unresolved severe harmful-output scenarios | Yes |
| Reliability gate | Latency and error budgets within SLO targets | Yes |
| Governance gate | Documentation, approvals, and evidence complete | Yes |
| Rollback readiness | Version rollback plan tested | Yes |
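
Because every gate in this table is blocking, the gating logic reduces to "any failing blocker halts the release." A minimal sketch of that check (the `GateResult` type and gate names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    blocker: bool = True  # every gate in the table above is blocking


def release_blocked(results: list[GateResult]) -> bool:
    """A release is blocked if any blocking gate fails."""
    return any(r.blocker and not r.passed for r in results)


# Example: a single failing blocking gate halts the release.
gates = [
    GateResult("quality", passed=True),
    GateResult("safety", passed=True),
    GateResult("reliability", passed=False),  # SLO breach
    GateResult("governance", passed=True),
    GateResult("rollback_readiness", passed=True),
]
assert release_blocked(gates)
```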

Risk-Tiered Standards

| Tier | Example | Required Rigor |
| --- | --- | --- |
| Tier 1 (Low) | Non-critical summarization | Standard offline eval + canary |
| Tier 2 (Medium) | User-facing recommendations | Segment-level eval + shadow testing |
| Tier 3 (High) | Decision support in regulated workflows | Extended eval set, formal sign-off, human fallback mandatory |
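
One way to encode the tier-to-rigor mapping in tooling, assuming (as a reading of the table, not a stated rule) that rigor accumulates from lower tiers; all check names are illustrative:

```python
# Hypothetical mapping from risk tier to minimum required checks;
# names mirror the table above and are illustrative only.
REQUIRED_RIGOR = {
    1: {"offline_eval", "canary"},
    2: {"segment_eval", "shadow_testing"},
    3: {"extended_eval", "formal_signoff", "human_fallback"},
}


def checks_for(tier: int) -> set[str]:
    """Assumes tiers are cumulative: a higher tier inherits lower-tier checks."""
    checks: set[str] = set()
    for t in range(1, tier + 1):
        checks |= REQUIRED_RIGOR[t]
    return checks
```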

Release Decision Template

Use the following decision format:

```
AI Release Decision: APPROVE | CONDITIONAL | REJECT
Feature: <name>
Version: <model/prompt/runtime version>
Risk Tier: <1|2|3>
Quality Result: <pass/fail + metrics>
Safety Result: <pass/fail + critical issues>
Reliability Result: <pass/fail + SLO evidence>
Approvers: <product owner, tech lead, security/governance>
Date: <ISO 8601>
```
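
A small sketch of generating this record programmatically so the format stays consistent across releases; the `ReleaseDecision` class is a hypothetical helper, not part of any mandated tooling:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ReleaseDecision:
    """Hypothetical helper that renders the decision template above."""
    decision: str          # APPROVE | CONDITIONAL | REJECT
    feature: str
    version: str           # model/prompt/runtime version
    risk_tier: int         # 1 | 2 | 3
    quality: str           # pass/fail + metrics
    safety: str            # pass/fail + critical issues
    reliability: str       # pass/fail + SLO evidence
    approvers: list[str]

    def render(self) -> str:
        return "\n".join([
            f"AI Release Decision: {self.decision}",
            f"Feature: {self.feature}",
            f"Version: {self.version}",
            f"Risk Tier: {self.risk_tier}",
            f"Quality Result: {self.quality}",
            f"Safety Result: {self.safety}",
            f"Reliability Result: {self.reliability}",
            f"Approvers: {', '.join(self.approvers)}",
            f"Date: {date.today().isoformat()}",  # ISO 8601
        ])
```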

Deployment Pattern

  • Start with canary or limited audience rollout
  • Compare against control baseline where possible
  • Auto-halt rollout on threshold breach
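
The auto-halt rule could look like the following sketch, where the 2% regression budget against the control baseline is a made-up threshold for illustration:

```python
def should_halt(canary_metric: float, control_metric: float,
                max_regression: float = 0.02) -> bool:
    """Halt the rollout if the canary regresses past the allowed budget
    relative to the control baseline. The 2% budget is illustrative."""
    return (control_metric - canary_metric) > max_regression


# Example: canary quality drops from 0.91 to 0.86 -> rollout halts.
assert should_halt(canary_metric=0.86, control_metric=0.91)
```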