Alerts close the loop between observability and action. When a metric breaches the threshold you set, Observatory raises an event, optionally notifies an external system, and tracks SLA against acknowledgement and resolution times.
Anatomy of an alert
Two records back every alert:
- AlertRule: the user-defined rule. It holds the metric, operator, threshold, cooldown, and notification channel.
- AlertEvent: one occurrence of a rule firing. It carries acknowledged-at and resolved-at timestamps.
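To make the two records concrete, here is a minimal Python sketch. The field names (`cooldown_minutes`, `channel`, `fired_at`, `observed_value`, and so on) are assumptions inferred from the description above, not Observatory's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AlertRule:
    """User-defined rule. All field names are hypothetical."""
    id: str
    metric: str                    # e.g. "error_rate"
    operator: str                  # one of ">", "<", ">=", "<="
    threshold: float
    cooldown_minutes: int = 15     # documented default cooldown
    channel: Optional[str] = None  # None means no external notification


@dataclass
class AlertEvent:
    """One occurrence of a rule firing. All field names are hypothetical."""
    rule_id: str
    fired_at: datetime
    observed_value: float
    acknowledged_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None

    @property
    def is_open(self) -> bool:
        # An event stays open until it has a resolved-at timestamp.
        return self.resolved_at is None


rule = AlertRule(id="r1", metric="p95_latency", operator=">", threshold=2.0)
event = AlertEvent(rule_id=rule.id, fired_at=datetime(2024, 1, 1, 12, 0), observed_value=3.4)
```

One rule can produce many events over time, so the event refers back to its rule by id rather than embedding a copy of it.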
Supported metrics
| Metric | Description |
|---|---|
| error_rate | Share of errored runs in the window. |
| p50_latency | Median latency in seconds. |
| p95_latency | 95th-percentile latency in seconds. |
| cost_per_hour | Aggregated cost across runs in the last hour. |
| token_volume | Total tokens in the window. |
| drift_composite | Composite drift score from Drift Monitor. |
| policy_violations | Count of policy evaluations marked as violated. |
| feedback_negative | Count of negative feedback events. |
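As an illustration of how the first three metrics could be derived from raw runs, here is a small Python sketch. The run shape (`latency_s`, `errored`) and the choice of the inclusive interpolation method for percentiles are assumptions. Observatory may compute these values differently.

```python
import statistics

# Hypothetical runs observed in one evaluation window.
runs = [
    {"latency_s": 0.20, "errored": False},
    {"latency_s": 0.30, "errored": False},
    {"latency_s": 0.25, "errored": False},
    {"latency_s": 0.40, "errored": False},
    {"latency_s": 0.35, "errored": False},
    {"latency_s": 0.50, "errored": False},
    {"latency_s": 0.45, "errored": False},
    {"latency_s": 0.60, "errored": False},
    {"latency_s": 0.55, "errored": False},
    {"latency_s": 3.00, "errored": True},
]

latencies = [run["latency_s"] for run in runs]

# error_rate: share of errored runs in the window.
error_rate = sum(run["errored"] for run in runs) / len(runs)

# p50_latency: median latency in seconds.
p50_latency = statistics.median(latencies)

# p95_latency: the 95th of the 99 percentile cut points.
p95_latency = statistics.quantiles(latencies, n=100, method="inclusive")[94]
```

The single slow run barely moves the median but dominates the 95th percentile, which is why the two latency metrics are offered separately.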
Operators
Pick the comparison that matches the metric:
| Operator | Use for |
|---|---|
| > | "Above threshold": most common, used with latency, error rate, cost. |
| < | "Below threshold": used with feedback scores, success rate. |
| >= | Inclusive variant of >. |
| <= | Inclusive variant of <. |
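The four operators map directly onto ordinary comparisons. A minimal Python sketch of a breach check follows. The function name `is_breached` and the lookup table are illustrative, not part of Observatory.

```python
import operator

# Map each supported operator symbol to its comparison function.
COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def is_breached(value: float, op: str, threshold: float) -> bool:
    """Return True when `value <op> threshold` holds."""
    try:
        compare = COMPARATORS[op]
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None
    return compare(value, threshold)
```

For example, `is_breached(0.05, ">", 0.05)` is false while `is_breached(0.05, ">=", 0.05)` is true, which is the only difference between the strict and inclusive variants.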
Creating a rule
Set the cooldown
The cooldown defaults to 15 minutes. The same rule won’t fire again inside the cooldown window even if the metric stays breached, which prevents flapping.
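The cooldown behaviour can be sketched in a few lines of Python. This is an assumption about the logic, not Observatory's implementation. In particular, treating the end of the window as inclusive (a rule may fire again exactly 15 minutes later) is a guess.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_COOLDOWN = timedelta(minutes=15)


def should_fire(
    breached: bool,
    now: datetime,
    last_fired_at: Optional[datetime],
    cooldown: timedelta = DEFAULT_COOLDOWN,
) -> bool:
    """Fire only if the metric is breached and the cooldown has elapsed."""
    if not breached:
        return False
    if last_fired_at is None:
        return True  # the rule has never fired before
    return now - last_fired_at >= cooldown


# Simulate a metric that stays breached, evaluated every 5 minutes for half an hour.
start = datetime(2024, 1, 1, 12, 0)
last_fired_at = None
fired_minutes = []
for minute in range(0, 31, 5):
    now = start + timedelta(minutes=minute)
    if should_fire(True, now, last_fired_at):
        fired_minutes.append(minute)
        last_fired_at = now
```

Under these assumptions the rule fires at minutes 0, 15, and 30 rather than on all seven evaluations.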
API
| Endpoint | Use |
|---|---|
| GET /api/alerts/rules | List rules. |
| POST /api/alerts/rules | Create a rule. |
| PUT /api/alerts/rules/{id} | Update a rule. |
| DELETE /api/alerts/rules/{id} | Delete a rule. |
| POST /api/alerts/rules/{id}/evaluate | One-shot evaluation. |
| GET /api/alerts/events | List historical events. |
| POST /api/alerts/events/{id}/ack | Acknowledge an event (starts SLA clock). |
| POST /api/alerts/events/{id}/resolve | Resolve an event (stops SLA clock). |
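As a rough illustration of calling these endpoints, the Python sketch below builds (but does not send) two requests with the standard library. Only the paths come from the table above. The host, the JSON field names in the payload, and the event id are hypothetical, and authentication is omitted because this page does not describe it.

```python
import json
import urllib.request

BASE_URL = "https://observatory.example.com"  # hypothetical host


def build_create_rule_request(rule: dict) -> urllib.request.Request:
    """Build a POST /api/alerts/rules request with a JSON body."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/alerts/rules",
        data=json.dumps(rule).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_ack_request(event_id: str) -> urllib.request.Request:
    """Build a POST /api/alerts/events/{id}/ack request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/alerts/events/{event_id}/ack",
        data=b"",
        method="POST",
    )


# Hypothetical payload; the real field names may differ.
payload = {
    "metric": "error_rate",
    "operator": ">",
    "threshold": 0.05,
    "cooldown_minutes": 15,
}
create_request = build_create_rule_request(payload)
ack_request = build_ack_request("evt_42")

# To send either request: urllib.request.urlopen(create_request)
```

Building the request separately from sending it makes the URL and body easy to inspect before anything reaches a live system.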
SLA tracking
When an event fires, two timers start: time-to-acknowledge and time-to-resolve. The Alerts page shows current values and historical compliance against the SLA targets you set per rule. Use this to:
- Prove operational readiness to auditors
- Spot rules that fire too often (noise) or never get acknowledged (ignored)
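The two timers and the compliance figure can be sketched as follows. This Python example follows the description above, in which both timers start when the event fires. The function names, and the choice to count unacknowledged events as SLA misses, are assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def sla_durations(
    fired_at: datetime,
    acknowledged_at: Optional[datetime],
    resolved_at: Optional[datetime],
) -> Tuple[Optional[timedelta], Optional[timedelta]]:
    """Return (time-to-acknowledge, time-to-resolve); None while still pending."""
    time_to_ack = acknowledged_at - fired_at if acknowledged_at else None
    time_to_resolve = resolved_at - fired_at if resolved_at else None
    return time_to_ack, time_to_resolve


def compliance(durations: List[Optional[timedelta]], target: timedelta) -> float:
    """Share of events that met the target; pending events count as misses."""
    if not durations:
        return 1.0
    met = sum(1 for d in durations if d is not None and d <= target)
    return met / len(durations)


fired = datetime(2024, 1, 1, 12, 0)
tta, ttr = sla_durations(
    fired,
    acknowledged_at=fired + timedelta(minutes=4),
    resolved_at=fired + timedelta(minutes=40),
)

# Three events for one rule: acknowledged in 4 minutes, in 20 minutes, and never.
ack_times = [tta, timedelta(minutes=20), None]
ack_compliance = compliance(ack_times, target=timedelta(minutes=5))
```

With a 5-minute acknowledgement target, only one of the three events complies. A low figure like this is the signal that a rule is noisy or being ignored.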
Related resources
- Drift Monitor: the source of the drift_composite metric.
- Audit Trail: every ack and resolve is captured in the audit log.

