What this page does
This page covers day-one operations hardening for OpenClaw. Use it to move from "it runs" to "it is operable."
Today’s minimum deliverable
Ship these three jobs first:
- health check every 5 minutes
- auth expiry check every hour
- daily execution report at 09:00
Starter schedule (copy and adapt)
*/5 * * * * /opt/openclaw/ops/health_check.sh
0 * * * * /opt/openclaw/ops/auth_expiry_check.sh
0 9 * * * /opt/openclaw/ops/daily_job_report.sh
If you do not use cron, keep the same cadence in whatever scheduler you use.
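For readers without cron, the sketch below shows one way to express the same three cadences in Python. It is an illustration only: the `JOBS` table and the `due_jobs` helper are hypothetical names, and the sketch assumes some outer loop or scheduler calls it once per minute and launches whatever it returns.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical job table mirroring the starter schedule.
# Each entry pairs a script path with a predicate that says
# whether the job fires in a given minute.
JOBS = [
    # */5 * * * *  -> every 5 minutes
    ("/opt/openclaw/ops/health_check.sh", lambda t: t.minute % 5 == 0),
    # 0 * * * *    -> at the top of every hour
    ("/opt/openclaw/ops/auth_expiry_check.sh", lambda t: t.minute == 0),
    # 0 9 * * *    -> daily at 09:00
    ("/opt/openclaw/ops/daily_job_report.sh", lambda t: t.hour == 9 and t.minute == 0),
]


def due_jobs(now: datetime) -> list[str]:
    """Return the scripts that should start in the minute containing `now`."""
    return [path for path, fires in JOBS if fires(now)]
```

At 09:00 all three jobs are due, at 10:00 only the health check and the auth expiry check are due, and at 09:05 only the health check is due.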
Required output fields
For each job, log:
- taskId
- channel
- status
- durationMs
- retryCount
- errorCode (when failed)
Without these fields, incident replay becomes slow.
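As a rough sketch, each job could emit one JSON line per run containing these fields. The helper name `job_log_line` and the status values `"ok"` and `"failed"` are assumptions made for illustration; the page does not specify a log format or a status vocabulary.

```python
import json


def job_log_line(task_id, channel, status, duration_ms, retry_count, error_code=None):
    """Build one JSON log line holding the required output fields.

    Assumes status is either "ok" or "failed" (hypothetical values).
    """
    record = {
        "taskId": task_id,
        "channel": channel,
        "status": status,
        "durationMs": duration_ms,
        "retryCount": retry_count,
    }
    if status == "failed":
        # errorCode is only required for failed runs.
        if error_code is None:
            raise ValueError("errorCode is required when status is 'failed'")
        record["errorCode"] = error_code
    return json.dumps(record)
```

One line per run keeps the log easy to filter by `taskId` or `channel` when replaying an incident.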
Alert thresholds
- P1: 3 consecutive health-check failures in 10 minutes
- P1: auth invalid on any critical channel
- P2: hourly failure rate above 5 percent
- P2: p95 runtime above 2 times baseline
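The thresholds above can be sketched as a single evaluation function. This is a minimal illustration under stated assumptions, not OpenClaw's alerting logic: `classify_alerts` and its parameters are hypothetical, the caller is assumed to pass only health-check results from the last 10 minutes (oldest first, `True` meaning pass), and the failure rate is a fraction between 0 and 1.

```python
def classify_alerts(health_results, auth_valid_by_channel,
                    hourly_failure_rate, p95_runtime_ms, baseline_p95_ms):
    """Return a list of (severity, reason) tuples for the thresholds in this page."""
    alerts = []

    # P1: the three most recent health checks in the window all failed.
    last_three = health_results[-3:]
    if len(last_three) == 3 and not any(last_three):
        alerts.append(("P1", "3 consecutive health-check failures in 10 minutes"))

    # P1: any critical channel reports invalid auth.
    for channel, valid in sorted(auth_valid_by_channel.items()):
        if not valid:
            alerts.append(("P1", f"auth invalid on critical channel {channel}"))

    # P2: hourly failure rate above 5 percent.
    if hourly_failure_rate > 0.05:
        alerts.append(("P2", "hourly failure rate above 5 percent"))

    # P2: p95 runtime above 2 times baseline.
    if p95_runtime_ms > 2 * baseline_p95_ms:
        alerts.append(("P2", "p95 runtime above 2 times baseline"))

    return alerts
```

Returning every matching alert, rather than only the most severe one, lets the operator see a P2 trend that accompanies a P1 incident.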
Daily 10-minute operator checklist
- check whether yesterday's success rate dropped below 98%
- inspect top 5 failed jobs
- check retry trend
- verify tokens expiring within 24h
- replay at least one incident chain
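Several of these checks reduce to small computations over the job log. The helpers below are a sketch under assumptions: the function names are hypothetical, job records are taken to be dictionaries with `taskId` and `status` keys, `"failed"` is an assumed status value, and token expiry times are assumed to be available as a mapping from channel to `datetime`.

```python
from collections import Counter
from datetime import datetime, timedelta


def success_rate(records):
    """Fraction of records whose status is not "failed" (1.0 if there were no runs)."""
    if not records:
        return 1.0
    failed = sum(1 for r in records if r["status"] == "failed")
    return (len(records) - failed) / len(records)


def top_failed_tasks(records, n=5):
    """Return up to n (taskId, failure count) pairs, most failures first."""
    failures = Counter(r["taskId"] for r in records if r["status"] == "failed")
    return failures.most_common(n)


def tokens_expiring_soon(expiry_by_channel, now, window=timedelta(hours=24)):
    """Return channels whose token expires within the window.

    Tokens that have already expired are included, since they need attention too.
    """
    deadline = now + window
    return sorted(ch for ch, expiry in expiry_by_channel.items() if expiry <= deadline)
```

An operator script could flag the day when `success_rate(records) < 0.98`, then print the output of the other two helpers for review.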
Bottom line
Cron answers when jobs trigger, Heartbeat answers whether the system is alive, and Auth Monitoring answers whether channels can still be called.
You need all three to run OpenClaw as an operated system.
Next step: Channel reliability runbook.