High-frequency incident pattern
Most teams hit one of these:
gateway token missing- unstable auth pass rate
- works locally but fails in production
References:
5-minute triage order
Follow this order only.
Layer 1: config existence
- verify token variable exists
- verify key name matches runtime lookup
- verify placeholder value was not deployed
Layer 2: runtime loading
- verify startup logs show loaded auth config
- verify env override precedence
- verify restart or rollout propagation
Layer 3: gateway forwarding
- verify auth header forwarding
- verify middleware does not strip fields
- verify CORS preflight is not blocked
Layer 4: permission scope
- verify token scope covers current endpoint
- verify token is not expired or revoked
- verify tenant token mapping in multi-tenant setups
Fast recovery actions
- validate full chain with a known-good test token
- unify token source across local and production
- version-control gateway header forward list
- add token health-check before release
Pass criteria
gateway token missingno longer appears- 20 consecutive requests pass auth
- expired token is rejected with stable error behavior
- logs can trace request ID to token status and reject reason
Next step: FAQ for gateway token missing.