loongxia.net

Context overflow runbook: stop the incident, then recover

A practical 30-minute flow for "prompt too long" and context-compaction failures.

Confirm the incident type

Start this runbook when you see:

"prompt too long" errors
"failed to compact context" errors
instability that appears only in long-running sessions
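If you triage from logs, the check above can be sketched as a small Python filter. This is a minimal sketch under assumptions: the function names are hypothetical, and it assumes the error text contains the two phrases verbatim, which may not hold for every client or gateway.

```python
# Hypothetical triage helper: decide whether errors belong to this runbook.
# Signature phrases are taken from the runbook; matching is case-insensitive.
OVERFLOW_SIGNATURES = ("prompt too long", "failed to compact context")


def is_context_overflow(error_text: str) -> bool:
    """Return True if the error text contains a known overflow phrase."""
    lowered = error_text.lower()
    return any(sig in lowered for sig in OVERFLOW_SIGNATURES)


def should_start_runbook(errors: list[str], long_session_only: bool) -> bool:
    """Start on any overflow phrase, or when instability is seen only in
    long-running sessions (an observation you supply as a flag)."""
    return long_session_only or any(is_context_overflow(e) for e in errors)


if __name__ == "__main__":
    sample = ["Error: Prompt too long (tokens exceed limit)", "timeout"]
    print(should_start_runbook(sample, long_session_only=False))  # True
```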

Step 0: collect minimum evidence (3 minutes)

Capture:

sessionId
channel
number of message turns in the session
whether pruning or compaction was triggered
failure stage (pre-generation, generation, or post-tool)
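One way to keep Step 0 consistent across incidents is to capture the evidence as a structured record. The sketch below is hypothetical: the class and field names are illustrative, and it splits "pruning/compaction" into two flags on the assumption that you want to know which of the two ran.

```python
import json
from dataclasses import asdict, dataclass

# Failure stages as named in the runbook.
STAGES = {"pre-generation", "generation", "post-tool"}


@dataclass
class OverflowEvidence:
    """Hypothetical record of the Step 0 minimum evidence."""
    session_id: str
    channel: str
    turn_count: int
    pruning_triggered: bool
    compaction_triggered: bool
    failure_stage: str

    def __post_init__(self) -> None:
        # Reject values the runbook does not define.
        if self.failure_stage not in STAGES:
            raise ValueError(f"unknown failure stage: {self.failure_stage!r}")
        if self.turn_count < 0:
            raise ValueError("turn_count must be non-negative")

    def to_json(self) -> str:
        """Serialize for pasting into an incident ticket or log."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    ev = OverflowEvidence("sess-123", "web", 47, True, False, "pre-generation")
    print(ev.to_json())
```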

Step 1: 30-minute stop-loss flow

narrow the history window first (for example, to the last 8-12 turns)
split the task into 3 rounds (collect, draft, verify)
re-run with pruning enabled or tightened
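Narrowing the history window can be sketched as a function that keeps the system messages plus the last N turns. Assumptions, all hypothetical: messages are dicts with a "role" key, a turn starts at each user message and includes the assistant or tool messages after it, and trimming is by turn count rather than by tokens. The default of 10 sits inside the 8-12 range suggested above.

```python
from typing import Any

Message = dict[str, Any]


def narrow_history(messages: list[Message], max_turns: int = 10) -> list[Message]:
    """Keep system messages plus the last `max_turns` turns.

    A turn is assumed to begin at each user message; counting is by turns,
    not tokens, so very long turns can still overflow.
    """
    if max_turns <= 0:
        raise ValueError("max_turns must be positive")
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    # Positions in `rest` where a new turn begins.
    turn_starts = [i for i, m in enumerate(rest) if m.get("role") == "user"]
    if len(turn_starts) <= max_turns:
        return system + rest
    cut = turn_starts[-max_turns]
    return system + rest[cut:]
```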

Do not start by switching models.

Copyable task-splitting templates

Round 1 only: extract key facts and missing inputs.
Output:
1) key facts (max 8)
2) missing inputs (max 5)
3) what round 2 needs

Round 2 only: produce draft from round 1 facts, no new topic expansion.
Output:
1) draft
2) items to verify

Round 3 only: verify and fix the draft.
Output:
1) detected issues
2) corrected final output
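The three rounds can be chained so that each round sees only the outputs of earlier rounds instead of the full conversation, which keeps each prompt small. A minimal sketch, assuming a hypothetical call_model function that you supply (any client that takes a prompt string and returns text); the prompt text reuses the templates above, and passing the round 1 facts into round 3 is a design choice of this sketch, not something the runbook specifies.

```python
from typing import Callable

# Prompt templates copied from the runbook, with a slot for each round's input.
ROUND1 = (
    "Round 1 only: extract key facts and missing inputs.\n"
    "Output:\n1) key facts (max 8)\n2) missing inputs (max 5)\n3) what round 2 needs\n"
    "\nTask:\n{task}"
)
ROUND2 = (
    "Round 2 only: produce draft from round 1 facts, no new topic expansion.\n"
    "Output:\n1) draft\n2) items to verify\n"
    "\nRound 1 output:\n{facts}"
)
ROUND3 = (
    "Round 3 only: verify and fix the draft.\n"
    "Output:\n1) detected issues\n2) corrected final output\n"
    "\nRound 1 facts:\n{facts}\n\nDraft:\n{draft}"
)


def run_three_rounds(task: str, call_model: Callable[[str], str]) -> dict[str, str]:
    """Run collect, draft, verify; the raw task is sent only in round 1."""
    facts = call_model(ROUND1.format(task=task))
    draft = call_model(ROUND2.format(facts=facts))
    final = call_model(ROUND3.format(facts=facts, draft=draft))
    return {"collect": facts, "draft": draft, "verify": final}
```

Passing call_model in as a parameter keeps the sketch independent of any particular SDK and makes it easy to test with a stub.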

Regression test (10 runs)

Run the same task type 10 times and record:

overflow error count
average latency
signs of context contamination (for example, details from earlier or unrelated turns showing up in the output)

Pass criteria:

0 overflow errors in 10 runs
stable output structure
every failure reproducible from its logs
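Recording the 10 runs and checking the pass criteria can be sketched as follows. The field names are hypothetical, "stable output structure" is approximated as every run producing the same output sections, and "reproducible with logs" is approximated as every failed run having a log path; actually reproducing a failure still means re-running it from those logs. Contamination signals are counted in the summary but, as in the criteria above, do not gate the pass.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class RunResult:
    """One regression run (hypothetical field names)."""
    overflow_error: bool
    latency_s: float
    contamination: bool      # set by your own contamination check
    output_sections: tuple   # detected structure, e.g. ("draft", "items to verify")
    failed: bool = False
    log_path: Optional[str] = None


def summarize(runs: list[RunResult]) -> dict:
    """Aggregate the three recorded metrics."""
    return {
        "runs": len(runs),
        "overflow_errors": sum(r.overflow_error for r in runs),
        "avg_latency_s": mean(r.latency_s for r in runs) if runs else 0.0,
        "contamination_signals": sum(r.contamination for r in runs),
    }


def passes(runs: list[RunResult], required_runs: int = 10) -> bool:
    """Apply the pass criteria: no overflows, stable structure, logged failures."""
    if len(runs) < required_runs:
        return False
    no_overflow = all(not r.overflow_error for r in runs)
    stable_structure = len({r.output_sections for r in runs}) == 1
    failures_logged = all(r.log_path for r in runs if r.failed)
    return no_overflow and stable_structure and failures_logged
```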
