Context overflow runbook: stop the incident, then recover

A 30-minute practical flow for prompt-too-long and compaction failures.

Confirm it is this incident type

Start this runbook when you see:

  • prompt too long
  • failed to compact context
  • instability only in long-running sessions
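
The first two symptoms are error strings, so they can be matched in logs mechanically. The following is a minimal sketch, assuming your logs contain those messages roughly verbatim; the pattern table and the function name `classify_log_line` are hypothetical, and the third symptom (instability only in long-running sessions) still needs a human to judge.

```python
import re
from typing import Optional

# Hypothetical patterns based on the symptom strings listed above.
# Adjust them to the exact wording your own logs use.
OVERFLOW_PATTERNS = {
    "prompt_too_long": re.compile(r"prompt too long", re.IGNORECASE),
    "compaction_failed": re.compile(r"failed to compact context", re.IGNORECASE),
}


def classify_log_line(line: str) -> Optional[str]:
    """Return the name of the matching symptom, or None if the line is unrelated."""
    for name, pattern in OVERFLOW_PATTERNS.items():
        if pattern.search(line):
            return name
    return None
```

If neither pattern matches any line from the failing session, this is probably a different incident type and this runbook may not apply.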

Step 0: collect minimum evidence (3 minutes)

Capture:

  1. sessionId
  2. channel
  3. message turn count
  4. whether pruning/compaction was triggered
  5. failure stage (pre-generation / generation / post-tool)
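
One way to keep the five evidence fields consistent across incidents is to record them in a small structure. This is a sketch only: the class and field names are assumptions made for illustration, not part of any tool's API.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class FailureStage(Enum):
    """The three failure stages named in the capture list."""
    PRE_GENERATION = "pre-generation"
    GENERATION = "generation"
    POST_TOOL = "post-tool"


@dataclass
class OverflowEvidence:
    # Field names are hypothetical; they mirror the capture list above.
    session_id: str
    channel: str
    turn_count: int
    compaction_triggered: bool
    failure_stage: FailureStage

    def to_json(self) -> str:
        """Serialize the record so it can be pasted into an incident ticket."""
        record = asdict(self)
        record["failure_stage"] = self.failure_stage.value
        return json.dumps(record, sort_keys=True)
```

Example use: `OverflowEvidence("s-123", "chat", 42, True, FailureStage.POST_TOOL).to_json()` produces one JSON line per incident.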

Step 1: 30-minute stop-loss flow

  1. narrow the history window first (for example, to the last 8-12 turns)
  2. split the task into 3 rounds (collect, draft, verify)
  3. re-run with pruning enabled or tightened

Do not start by switching models.
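
Narrowing the history window can be sketched as below. The sketch assumes the history is a list of dicts with a "role" key, that system messages should always be kept, and that each user message starts a new turn; the function name `narrow_history` is hypothetical. Cutting at a user message keeps any assistant or tool messages of a turn together.

```python
def narrow_history(messages, max_turns=10):
    """Keep system messages plus the last `max_turns` turns.

    Assumption: a turn starts at each message whose role is "user".
    """
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Indexes in `rest` where a new turn begins.
    turn_starts = [i for i, m in enumerate(rest) if m["role"] == "user"]
    if len(turn_starts) <= max_turns:
        return system + rest  # already within the window
    cut = turn_starts[-max_turns]
    return system + rest[cut:]
```

A value inside the suggested 8-12 range is a reasonable default; lower it further only if the overflow persists.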

Copyable task-splitting templates

Round 1 only: extract key facts and missing inputs.
Output:
1) key facts (max 8)
2) missing inputs (max 5)
3) what round 2 needs

Round 2 only: produce draft from round 1 facts, no new topic expansion.
Output:
1) draft
2) items to verify

Round 3 only: verify and fix the draft.
Output:
1) detected issues
2) corrected final output
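
If you drive the three rounds from a script, the templates can be stored once and combined with the previous round's output. This is a sketch under assumptions: the helper `build_round_prompt` and the "Input:" separator are hypothetical conventions, and the actual model call is left out.

```python
# Template text copied from the three rounds above.
ROUND_TEMPLATES = {
    1: (
        "Round 1 only: extract key facts and missing inputs.\n"
        "Output:\n"
        "1) key facts (max 8)\n"
        "2) missing inputs (max 5)\n"
        "3) what round 2 needs"
    ),
    2: (
        "Round 2 only: produce draft from round 1 facts, no new topic expansion.\n"
        "Output:\n"
        "1) draft\n"
        "2) items to verify"
    ),
    3: (
        "Round 3 only: verify and fix the draft.\n"
        "Output:\n"
        "1) detected issues\n"
        "2) corrected final output"
    ),
}


def build_round_prompt(round_no: int, carry_over: str) -> str:
    """Combine a round template with the material carried into that round.

    `carry_over` is the source material for round 1 and the previous
    round's output for rounds 2 and 3.
    """
    if round_no not in ROUND_TEMPLATES:
        raise ValueError(f"unknown round: {round_no}")
    return f"{ROUND_TEMPLATES[round_no]}\n\nInput:\n{carry_over}"
```

Passing only the previous round's output forward, instead of the whole conversation, is what keeps each round's context small.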

Regression test (10 runs)

Run the same task type 10 times and record:

  • overflow error count
  • average latency
  • context contamination signals (for example, output that pulls in material from unrelated earlier turns)

Pass criteria:

  • 0 overflow errors in 10 runs
  • stable output structure
  • any remaining failures are reproducible from logs
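
The 10-run check can be scripted roughly as follows. Everything here is a hedged sketch: `task`, `has_expected_structure`, and `looks_contaminated` are hypothetical callables you supply, and overflow detection assumes the error message contains one of the two symptom strings. The last pass criterion cannot be checked automatically, so the sketch only keeps an error log for manual review.

```python
import time
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List

# Assumption: overflow failures surface as exceptions carrying these strings.
OVERFLOW_MARKERS = ("prompt too long", "failed to compact context")


@dataclass
class RegressionReport:
    runs: int
    overflow_errors: int = 0
    other_errors: int = 0
    structure_failures: int = 0
    contamination_signals: int = 0
    latencies: List[float] = field(default_factory=list)
    error_log: List[str] = field(default_factory=list)

    @property
    def average_latency(self) -> float:
        return mean(self.latencies) if self.latencies else 0.0

    @property
    def passed(self) -> bool:
        # Automatable criteria only: no overflow errors, stable structure.
        return self.overflow_errors == 0 and self.structure_failures == 0


def run_regression(
    task: Callable[[], str],
    has_expected_structure: Callable[[str], bool],
    looks_contaminated: Callable[[str], bool],
    runs: int = 10,
) -> RegressionReport:
    """Run `task` repeatedly and record the metrics listed above."""
    report = RegressionReport(runs=runs)
    for run_no in range(1, runs + 1):
        started = time.perf_counter()
        failed = False
        output = ""
        try:
            output = task()
        except Exception as exc:
            failed = True
            message = str(exc)
            report.error_log.append(f"run {run_no}: {message}")
            if any(marker in message.lower() for marker in OVERFLOW_MARKERS):
                report.overflow_errors += 1
            else:
                report.other_errors += 1
        report.latencies.append(time.perf_counter() - started)
        if failed:
            continue
        if not has_expected_structure(output):
            report.structure_failures += 1
        if looks_contaminated(output):
            report.contamination_signals += 1
    return report
```

Review `error_log` by hand after each batch: a failure that cannot be reproduced from its log entry should block sign-off even when `passed` is true.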
