Ops implementation: Cron + Heartbeat + Auth Monitoring

What this page does

This page covers day-one operations hardening. Use it to move OpenClaw from "it runs" to "it is operable."

Today’s minimum deliverable

Ship these three jobs first:

health check every 5 minutes
auth expiry check every hour
daily execution report at 09:00

Starter schedule (copy and adapt)

*/5 * * * * /opt/openclaw/ops/health_check.sh
0 * * * * /opt/openclaw/ops/auth_expiry_check.sh
0 9 * * * /opt/openclaw/ops/daily_job_report.sh

If you do not use cron, keep the same cadence in whatever scheduler you use.
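
A minimal sketch of the same cadence outside cron, assuming some loop or external scheduler calls a function once per minute. The function name due_jobs and the job labels are hypothetical; only the three schedules come from the starter schedule above.

```python
from datetime import datetime


def due_jobs(now: datetime) -> list[str]:
    """Return the (hypothetical) job labels whose schedule matches this minute."""
    jobs = []
    if now.minute % 5 == 0:                 # */5 * * * *
        jobs.append("health_check")
    if now.minute == 0:                     # 0 * * * *
        jobs.append("auth_expiry_check")
    if now.hour == 9 and now.minute == 0:   # 0 9 * * *
        jobs.append("daily_job_report")
    return jobs
```

Calling due_jobs once per minute with the current local time reproduces the three cron lines. Like cron, it evaluates the schedule in the host's local time zone unless you pass a time in another zone.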

Required output fields

For each job, log:

taskId
channel
status
durationMs
retryCount
errorCode (when failed)

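Without these fields, incident replay becomes slow: you have to reconstruct by hand which task ran on which channel, how long it took, and how often it retried.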
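
One way to emit these fields is a single JSON object per job run. The sketch below is an assumption about format, not an OpenClaw API: the function name job_log_line and the status values "ok" and "failed" are hypothetical, while the field names match the list above.

```python
import json


def job_log_line(task_id, channel, status, duration_ms, retry_count, error_code=None):
    """Build one JSON log line containing the required output fields."""
    record = {
        "taskId": task_id,
        "channel": channel,
        "status": status,          # assumed values: "ok" or "failed"
        "durationMs": duration_ms,
        "retryCount": retry_count,
    }
    if status == "failed":
        # errorCode is only logged for failures; fall back to a placeholder
        # so a failed record never lacks the field.
        record["errorCode"] = error_code or "UNKNOWN"
    return json.dumps(record)
```

For example, job_log_line("job-2", "email", "failed", 900, 3, "AUTH_EXPIRED") yields a line that can be filtered by taskId or channel during incident replay.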

Alert thresholds

P1: 3 consecutive health-check failures within 10 minutes (three runs in a row at the 5-minute cadence)
P1: auth invalid on any critical channel
P2: hourly failure rate above 5%
P2: p95 runtime above 2x baseline
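
The four thresholds above can be sketched as small predicate functions. Everything here is illustrative: the function names, the input shapes (a list of (timestamp, ok) tuples, a channel-to-validity dict), and the nearest-rank p95 method are assumptions, not part of OpenClaw.

```python
from datetime import datetime, timedelta


def health_check_p1(results, now, window=timedelta(minutes=10)):
    """P1 when the 3 most recent results inside the window are all failures.

    `results` is a list of (timestamp, ok) tuples, oldest first (assumed shape).
    The window is inclusive; widen it slightly if run timestamps jitter.
    """
    recent = [ok for ts, ok in results if now - ts <= window]
    last_three = recent[-3:]
    return len(last_three) == 3 and not any(last_three)


def auth_p1(auth_valid, critical_channels):
    """Return critical channels whose auth is invalid (any entry means P1).

    A critical channel missing from `auth_valid` is treated as invalid.
    """
    return sorted(ch for ch in critical_channels if not auth_valid.get(ch, False))


def failure_rate_p2(total, failed):
    """P2 when the hourly failure rate is strictly above 5%."""
    return total > 0 and failed * 100 > total * 5


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def runtime_p2(durations_ms, baseline_p95_ms):
    """P2 when the observed p95 runtime is above 2x the baseline p95."""
    return p95(durations_ms) > 2 * baseline_p95_ms
```

The comparisons are strict, so exactly 5% failures or exactly 2x baseline does not alert. The failure-rate check uses integer math to avoid floating-point edge cases at the boundary.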

Daily 10-minute operator checklist

check whether yesterday's success rate dropped below 98%
inspect top 5 failed jobs
check retry trend
verify tokens expiring within 24h
replay at least one incident chain
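
Three of these checks can be scripted against a day of job records. The sketch below assumes each record is a dict carrying the taskId and status fields listed under required output fields, and that token expiry times are available as a channel-to-datetime mapping. The function names, the "ok" and "failed" status values, and the input shapes are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def success_rate(records):
    """Fraction of records with status "ok"; 1.0 when there are no records."""
    if not records:
        return 1.0
    ok = sum(1 for r in records if r["status"] == "ok")
    return ok / len(records)


def top_failed_jobs(records, n=5):
    """Return up to n (taskId, failure_count) pairs, most failures first."""
    counts = Counter(r["taskId"] for r in records if r["status"] == "failed")
    return counts.most_common(n)


def tokens_expiring_soon(expiries, now, window=timedelta(hours=24)):
    """Return channels whose token expires within the window.

    Tokens that have already expired are included as well.
    """
    return sorted(ch for ch, exp in expiries.items() if exp - now <= window)
```

A daily script could flag the run when success_rate(records) is below 0.98 and print the other two results for the operator to inspect.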

Bottom line

Cron answers when jobs fire, Heartbeat answers whether the system is alive, and Auth Monitoring answers whether channels can still be called.
You need all three to run OpenClaw as an operated system.

Next step: Channel reliability runbook.