Prometheus metrics

Expose OpenClaw diagnostics as Prometheus text metrics through the diagnostics-prometheus plugin

OpenClaw can expose diagnostics metrics through the official diagnostics-prometheus plugin. It listens to trusted internal diagnostics and renders a Prometheus text endpoint at:

GET /api/diagnostics/prometheus

Content type is text/plain; version=0.0.4; charset=utf-8, the standard Prometheus exposition format.

The route uses Gateway authentication (operator scope). Do not expose it as a public unauthenticated `/metrics` endpoint. Scrape it through the same auth path you use for other operator APIs.

For traces, logs, OTLP push, and OpenTelemetry GenAI semantic attributes, see [OpenTelemetry export](/docs/openclaw-docs/gateway/opentelemetry.

Quick start

```bash openclaw plugins install clawhub:@openclaw/diagnostics-prometheus ``` ```json5 { plugins: { allow: ["diagnostics-prometheus"], entries: { "diagnostics-prometheus": { enabled: true }, }, }, diagnostics: { enabled: true, }, } ``` ```bash openclaw plugins enable diagnostics-prometheus ``` The HTTP route is registered at plugin startup, so reload after enabling. Send the same gateway auth your operator clients use:
```bash
curl -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
  http://127.0.0.1:18789/api/diagnostics/prometheus
```
```yaml # prometheus.yml scrape_configs: - job_name: openclaw scrape_interval: 30s metrics_path: /api/diagnostics/prometheus authorization: credentials_file: /etc/prometheus/openclaw-gateway-token static_configs: - targets: ["openclaw-gateway:18789"] ```
`diagnostics.enabled: true` is required. Without it, the plugin still registers the HTTP route but no diagnostic events flow into the exporter, so the response is empty.

Metrics exported

MetricTypeLabels
openclaw_run_completed_totalcounterchannel, model, outcome, provider, trigger
openclaw_run_duration_secondshistogramchannel, model, outcome, provider, trigger
openclaw_model_call_totalcounterapi, error_category, model, outcome, provider, transport
openclaw_model_call_duration_secondshistogramapi, error_category, model, outcome, provider, transport
openclaw_model_tokens_totalcounteragent, channel, model, provider, token_type
openclaw_gen_ai_client_token_usagehistogrammodel, provider, token_type
openclaw_model_cost_usd_totalcounteragent, channel, model, provider
openclaw_tool_execution_totalcountererror_category, outcome, params_kind, tool
openclaw_tool_execution_duration_secondshistogramerror_category, outcome, params_kind, tool
openclaw_harness_run_totalcounterchannel, error_category, harness, model, outcome, phase, plugin, provider
openclaw_harness_run_duration_secondshistogramchannel, error_category, harness, model, outcome, phase, plugin, provider
openclaw_message_processed_totalcounterchannel, outcome, reason
openclaw_message_processed_duration_secondshistogramchannel, outcome, reason
openclaw_message_delivery_started_totalcounterchannel, delivery_kind
openclaw_message_delivery_totalcounterchannel, delivery_kind, error_category, outcome
openclaw_message_delivery_duration_secondshistogramchannel, delivery_kind, error_category, outcome
openclaw_talk_event_totalcounterbrain, event_type, mode, provider, transport
openclaw_talk_event_duration_secondshistogrambrain, event_type, mode, provider, transport
openclaw_talk_audio_byteshistogrambrain, event_type, mode, provider, transport
openclaw_queue_lane_sizegaugelane
openclaw_queue_lane_wait_secondshistogramlane
openclaw_session_state_totalcounterreason, state
openclaw_session_queue_depthgaugestate
openclaw_session_recovery_totalcounteraction, active_work_kind, state, status
openclaw_session_recovery_age_secondshistogramaction, active_work_kind, state, status
openclaw_memory_bytesgaugekind
openclaw_memory_rss_byteshistogramnone
openclaw_memory_pressure_totalcounterlevel, reason
openclaw_telemetry_exporter_totalcounterexporter, reason, signal, status
openclaw_prometheus_series_dropped_totalcounternone

Label policy

Prometheus labels stay bounded and low-cardinality. The exporter does not emit raw diagnostic identifiers such as `runId`, `sessionKey`, `sessionId`, `callId`, `toolCallId`, message IDs, chat IDs, or provider request IDs.
Label values are redacted and must match OpenClaw's low-cardinality character policy. Values that fail the policy are replaced with `unknown`, `other`, or `none`, depending on the metric.
The exporter caps retained time series in memory at **2048** series across counters, gauges, and histograms combined. New series beyond that cap are dropped, and `openclaw_prometheus_series_dropped_total` increments by one each time.
Watch this counter as a hard signal that an attribute upstream is leaking high-cardinality values. The exporter never lifts the cap automatically; if it climbs, fix the source rather than disabling the cap.
- prompt text, response text, tool inputs, tool outputs, system prompts - Talk transcripts, audio payloads, call ids, room ids, handoff tokens, turn ids, and raw session ids - raw provider request IDs (only bounded hashes, where applicable, on spans — never on metrics) - session keys and session IDs - hostnames, file paths, secret values

PromQL recipes

# Tokens per minute, split by provider
sum by (provider) (rate(openclaw_model_tokens_total[1m]))

# Spend (USD) over the last hour, by model
sum by (model) (increase(openclaw_model_cost_usd_total[1h]))

# 95th percentile model run duration
histogram_quantile(
  0.95,
  sum by (le, provider, model)
    (rate(openclaw_run_duration_seconds_bucket[5m]))
)

# Queue wait time SLO (95p under 2s)
histogram_quantile(
  0.95,
  sum by (le, lane) (rate(openclaw_queue_lane_wait_seconds_bucket[5m]))
) < 2

# Dropped Prometheus series (cardinality alarm)
increase(openclaw_prometheus_series_dropped_total[15m]) > 0
Prefer `gen_ai_client_token_usage` for cross-provider dashboards: it follows the OpenTelemetry GenAI semantic conventions and is consistent with metrics from non-OpenClaw GenAI services.

Choosing between Prometheus and OpenTelemetry export

OpenClaw supports both surfaces independently. You can run either, both, or neither.

- **Pull** model: Prometheus scrapes `/api/diagnostics/prometheus`. - No external collector required. - Authenticated through normal Gateway auth. - Surface is metrics only (no traces or logs). - Best for stacks already standardized on Prometheus + Grafana. - **Push** model: OpenClaw sends OTLP/HTTP to a collector or OTLP-compatible backend. - Surface includes metrics, traces, and logs. - Bridges to Prometheus through an OpenTelemetry Collector (`prometheus` or `prometheusremotewrite` exporter) when you need both. - See [OpenTelemetry export](/docs/openclaw-docs/gateway/opentelemetry for the full catalog.

Troubleshooting

- Check `diagnostics.enabled: true` in config. - Confirm the plugin is enabled and loaded with `openclaw plugins list --enabled`. - Generate some traffic; counters and histograms only emit lines after at least one event. The endpoint requires the Gateway operator scope (`auth: "gateway"` with `gatewayRuntimeScopeSurface: "trusted-operator"`). Use the same token or password Prometheus uses for any other Gateway operator route. There is no public unauthenticated mode. A new attribute is exceeding the **2048**-series cap. Inspect recent metrics for an unexpectedly high-cardinality label and fix it at the source. The exporter intentionally drops new series instead of silently rewriting labels. The plugin keeps state in memory only. After a Gateway restart, counters reset to zero and gauges restart at their next reported value. Use PromQL `rate()` and `increase()` to handle resets cleanly.
  • [Diagnostics export](/docs/openclaw-docs/gateway/diagnostics — local diagnostics zip for support bundles
  • [Health and readiness](/docs/openclaw-docs/gateway/health — /healthz and /readyz probes
  • [Logging](/docs/openclaw-docs/logging — file-based logging
  • [OpenTelemetry export](/docs/openclaw-docs/gateway/opentelemetry — OTLP push for traces, metrics, and logs