Alerts close the loop between observability and action. When a metric breaches the threshold you set, Observatory raises an event, optionally notifies an external system, and tracks SLA against acknowledgement and resolution times.

Anatomy of an alert

Two records back every alert:
  • AlertRule — the user-defined rule. Metric, operator, threshold, cooldown, notification channel.
  • AlertEvent — one occurrence of a rule firing. Carries acknowledged-at and resolved-at timestamps.
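To make the two records concrete, here is a minimal sketch of how they might be modelled. The field names (cooldown_minutes, channel, fired_at and so on) are illustrative assumptions; only the concepts come from the list above, and the actual Observatory schema may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AlertRule:
    """User-defined rule. Field names are assumptions, not the real schema."""
    metric: str                 # e.g. "p95_latency"
    operator: str               # one of ">", "<", ">=", "<="
    threshold: float
    cooldown_minutes: int = 15  # 15 minutes is the documented default
    channel: str = "email"      # "email", "webhook", or "both" (assumed encoding)


@dataclass
class AlertEvent:
    """One occurrence of a rule firing."""
    rule: AlertRule
    fired_at: datetime
    acknowledged_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None

    @property
    def is_open(self) -> bool:
        # Assumption: an event stays open until it has a resolved-at timestamp.
        return self.resolved_at is None
```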

Supported metrics

Metric             Description
error_rate         Share of errored runs in the window.
p50_latency        Median latency in seconds.
p95_latency        95th-percentile latency in seconds.
cost_per_hour      Aggregated cost across runs in the last hour.
token_volume       Total tokens in the window.
drift_composite    Composite drift score from Drift Monitor.
policy_violations  Count of policy evaluations marked as violated.
feedback_negative  Count of negative feedback events.
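As a rough illustration of what two of these metrics measure, the sketch below computes an error rate and a latency percentile over a window of runs. The function names are hypothetical, and the nearest-rank percentile method is an assumption; this page does not say which method Observatory uses.

```python
import math
from typing import Sequence


def error_rate(errored: Sequence[bool]) -> float:
    """Share of errored runs in the window (0.0 for an empty window)."""
    if not errored:
        return 0.0
    return sum(errored) / len(errored)


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile; an assumed method, not Observatory's documented one."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # Rank is 1-based: the smallest value has rank 1.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

With ten runs whose latencies are 1 through 10 seconds, the median under this method is 5 seconds and the 95th percentile is 10 seconds.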

Operators

Pick the comparison that matches the metric:
Operator  Use for
>         “Above threshold”: most common, used with latency, error rate, cost.
<         “Below threshold”: used with feedback scores, success rate.
>=, <=    Inclusive variants of the above.
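The four operators map directly onto ordinary comparisons. A minimal sketch of a threshold check, using a hypothetical is_breached helper built on Python's standard operator module:

```python
import operator
from typing import Callable, Dict

# Maps the rule's operator string to the matching comparison function.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def is_breached(value: float, op: str, threshold: float) -> bool:
    """Return True when the metric value breaches the threshold."""
    try:
        compare = COMPARATORS[op]
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None
    return compare(value, threshold)
```

The difference between the strict and inclusive variants only shows at the boundary: a value of exactly 8 does not breach "> 8" but does breach ">= 8".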

Creating a rule

1. Open Alerts → Rules
   Click Add rule.

2. Pick a metric and threshold
   For example, p95_latency > 8 seconds.

3. Set the cooldown
   Default 15 minutes. The same rule won’t fire again inside the cooldown window even if the metric stays breached. This is what prevents flapping.

4. Choose the destination
   Email, webhook, or both. The webhook payload mirrors the AlertEvent shape.

5. Save and test
   Use the Evaluate now button on the rule row to fire a one-shot evaluation against current data, without touching the cooldown.
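The cooldown behaviour from step 3 can be sketched as a small gate in front of the firing logic. This is an illustration under stated assumptions, not Observatory's implementation: should_fire is a hypothetical helper, and whether the rule may fire again at exactly the cooldown boundary is assumed here to be yes.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_COOLDOWN = timedelta(minutes=15)  # the documented default


def should_fire(
    breached: bool,
    last_fired_at: Optional[datetime],
    now: datetime,
    cooldown: timedelta = DEFAULT_COOLDOWN,
) -> bool:
    """Decide whether a scheduled evaluation should raise a new event."""
    if not breached:
        return False
    if last_fired_at is None:
        # The rule has never fired, so there is no cooldown to respect.
        return True
    # Suppress repeat firings while the cooldown window is still open.
    return now - last_fired_at >= cooldown
```

A metric that stays breached for an hour would, under this sketch, raise at most one event every 15 minutes instead of one per evaluation.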

API

Endpoint                               Use
GET /api/alerts/rules                  List rules.
POST /api/alerts/rules                 Create a rule.
PUT /api/alerts/rules/{id}             Update a rule.
DELETE /api/alerts/rules/{id}          Delete a rule.
POST /api/alerts/rules/{id}/evaluate   One-shot evaluation.
GET /api/alerts/events                 List historical events.
POST /api/alerts/events/{id}/ack       Acknowledge an event (starts SLA clock).
POST /api/alerts/events/{id}/resolve   Resolve an event (stops SLA clock).
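A minimal sketch of building requests for two of these endpoints with Python's standard library. The paths come from the table above; everything else is an assumption: the host observatory.example.com is a placeholder, the JSON field names are guesses at the rule shape, and authentication is omitted because this page does not describe it.

```python
import json
from urllib.request import Request

BASE_URL = "https://observatory.example.com"  # hypothetical host


def build_create_rule_request(
    metric: str, operator: str, threshold: float, cooldown_minutes: int = 15
) -> Request:
    """Build (but do not send) a POST /api/alerts/rules request."""
    body = {
        # Field names below are assumptions, not a documented schema.
        "metric": metric,
        "operator": operator,
        "threshold": threshold,
        "cooldown_minutes": cooldown_minutes,
    }
    return Request(
        url=f"{BASE_URL}/api/alerts/rules",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_ack_request(event_id: str) -> Request:
    """Build (but do not send) a POST /api/alerts/events/{id}/ack request."""
    return Request(
        url=f"{BASE_URL}/api/alerts/events/{event_id}/ack",
        data=b"",
        method="POST",
    )
```

Once the real host and credentials are known, either request can be sent with urllib.request.urlopen.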

SLA tracking

When an event fires, two timers start: time-to-acknowledge and time-to-resolve. The Alerts page shows current values and historical compliance against the SLA targets you set per rule. Use this to:
  • Prove operational readiness to auditors
  • Spot rules that fire too often (noise) or never get acknowledged (ignored)
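The two timers can be sketched as simple timestamp arithmetic. This assumes, as described above, that both timers are measured from the moment the event fires; the helper names and the example targets are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def timer_value(
    fired_at: datetime, stamped_at: Optional[datetime], now: datetime
) -> timedelta:
    """Elapsed time for one timer: frozen once stamped, still running otherwise."""
    end = stamped_at if stamped_at is not None else now
    return end - fired_at


def within_target(
    fired_at: datetime,
    stamped_at: Optional[datetime],
    now: datetime,
    target: timedelta,
) -> bool:
    """True while the timer has not exceeded its SLA target."""
    return timer_value(fired_at, stamped_at, now) <= target
```

For an event that fired at 12:00, was acknowledged at 12:04, and is still unresolved at 12:30, a 5-minute acknowledgement target is met while a 20-minute resolution target is already breached.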

Drift Monitor

The source of the drift_composite metric.

Audit Trail

Every ack and resolve is captured in the audit log.
Last modified on June 2, 2026