Aggregate dashboards catch the obvious. Drift Monitor catches the slow, distributional changes that aggregate charts miss — a small but steady increase in tool-call retries, a quiet shift in response length, a gradually rising refusal rate.

What “drift” means here

Drift Monitor compares the distribution of a metric in a recent window against its distribution in a baseline window. When the recent distribution has moved far enough from the baseline, the metric is flagged. The composite drift score combines the movement across all five tracked metrics.
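
To make "comparing two distributions" concrete, here is a minimal sketch that scores the shift between a baseline sample and a recent sample. It uses the Population Stability Index (PSI) purely as an illustration; this page does not state which statistic Drift Monitor uses, and the function name `psi`, the bin count, and the zero-share floor are all assumptions.

```python
import math
from typing import List, Sequence


def psi(baseline: Sequence[float], recent: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples.

    Illustrative only: not necessarily the statistic Drift Monitor computes.
    """
    lo = min(min(baseline), min(recent))
    hi = max(max(baseline), max(recent))
    if hi == lo:
        return 0.0  # every value is identical, so nothing can have shifted
    width = (hi - lo) / bins

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    b, r = shares(baseline), shares(recent)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))
```

Identical samples score exactly 0, and the score grows as the recent histogram moves away from the baseline histogram. Both samples share the same bin edges so that their shares are directly comparable.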

The five metrics

| Metric | What it captures |
| --- | --- |
| Latency distribution | Wall-clock duration of each call. |
| Token-count distribution | Total tokens (input + output) per call. |
| Cost distribution | Cost per call. |
| Error rate | Share of errored calls in the window. |
| Output-length distribution | Character or token count of completions. |

Each contributes to the composite score with a configurable weight.
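
One plausible way to combine weighted per-metric scores is a normalized weighted average, sketched below. The exact formula is not given on this page, so treat the normalization, the metric keys in `METRICS`, and the function name `composite_score` as assumptions made for illustration.

```python
from typing import Mapping

# Hypothetical keys for the five tracked metrics.
METRICS = ("latency", "token_count", "cost", "error_rate", "output_length")


def composite_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of per-metric drift scores (assumed formula)."""
    total_weight = sum(weights[m] for m in METRICS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total weight keeps the result on the same scale
    # as the per-metric scores, whatever the weights sum to.
    return sum(scores[m] * weights[m] for m in METRICS) / total_weight
```

With this form, doubling the weight of one metric makes its movement count twice as much as any other single metric without changing the scale of the composite.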

Running a drift check

Drift snapshots can run on a schedule or be triggered ad hoc through the API.
POST /api/drift/compute
{
  "app_id": "...",
  "baseline_window": { "days": 30 },
  "recent_window": { "days": 7 }
}
Each computation produces a DriftSnapshot and one DriftResult per metric. The snapshot powers the timeline chart; the per-metric results power the distribution histograms.
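
As a sketch of an ad hoc trigger, the helper below builds the request shown above using only the Python standard library. The base URL is a placeholder, the helper name `build_drift_request` is hypothetical, and authentication is omitted because this page does not describe it.

```python
import json
import urllib.request


def build_drift_request(
    base_url: str,
    app_id: str,
    baseline_days: int = 30,
    recent_days: int = 7,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/drift/compute."""
    payload = {
        "app_id": app_id,
        "baseline_window": {"days": baseline_days},
        "recent_window": {"days": recent_days},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/drift/compute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_drift_request(base_url, app_id)) as resp:
#       body = json.load(resp)
# The shape of the response body is not documented on this page.
```

Separating request construction from sending makes the payload easy to inspect or log before anything reaches the API.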

Reading the results

1. Open the timeline

The timeline plots the composite score over time. Spikes are the moments worth investigating.

2. Drill into one snapshot

Click a snapshot. The detail view splits the score into the five per-metric contributions.

3. Compare distributions

For each metric, the histogram overlays the baseline and recent distributions so the shift is visible.

4. Jump to the runs

From the snapshot detail, link out to the affected runs in LLM Calls for ground-truth inspection.

When to wire drift in

After a model swap

Compare 7 days before vs 7 days after to confirm the new model behaves consistently.

After a prompt change

Use the deployment timestamp as the boundary between baseline and recent.
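
The idea of a deployment boundary can be illustrated client-side: every call before the deployment goes to the baseline sample, and every call at or after it goes to the recent sample. This is a sketch of the concept only. The request format on this page expresses windows in days, and whether the API accepts an explicit timestamp is not stated here. The name `split_at_deployment` and the `(timestamp, value)` tuple shape are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def split_at_deployment(
    calls: Iterable[Tuple[datetime, float]],
    deployed_at: datetime,
) -> Tuple[List[float], List[float]]:
    """Split metric values into (baseline, recent) around a deployment time."""
    baseline: List[float] = []
    recent: List[float] = []
    for ts, value in calls:
        # Calls strictly before the deployment form the baseline.
        (baseline if ts < deployed_at else recent).append(value)
    return baseline, recent
```

Use timezone-aware datetimes throughout; Python raises a `TypeError` when an aware datetime is compared with a naive one.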

As a regression backstop

Schedule daily drift checks against a 30-day baseline. Spikes flag silent regressions.

Before regulatory reviews

Auditors care about stability. Drift evidence is one of the artefacts they ask for.

Alerts

Trigger alerts when the composite score breaches a threshold.
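
A threshold rule of this kind might look like the sketch below, which fires once each time the composite score crosses above the threshold rather than on every snapshot that stays above it. The crossing behaviour, the function name `breaches`, and the `(label, score)` timeline shape are assumptions; the actual alert configuration is not described on this page.

```python
from typing import List, Sequence, Tuple


def breaches(timeline: Sequence[Tuple[str, float]], threshold: float) -> List[str]:
    """Return labels of snapshots where the score crosses above the threshold."""
    fired: List[str] = []
    above = False  # whether the previous snapshot was already above the threshold
    for label, score in timeline:
        if score > threshold and not above:
            fired.append(label)
        above = score > threshold
    return fired
```

Firing only on the upward crossing keeps a sustained drift from producing one alert per scheduled check.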

Compliance heatmap

Drift evidence feeds into the operational-control views.
Last modified on June 2, 2026