Guardian provides real-time monitoring for AI models in production. This guide covers setting up comprehensive monitoring for drift detection, performance tracking, and alerting.
Guardian needs a baseline distribution to detect drift:
```python
# Option 1: Use historical data
monitor = client.guardian.create_monitor(
    model_id="my-model",
    name="Production Monitor",
    metrics=["prediction_drift", "data_drift"],
    baseline={
        "data_url": "s3://my-bucket/baseline-data.parquet"
    }
)

# Option 2: Use rolling window (learns from recent production data)
monitor = client.guardian.create_monitor(
    model_id="my-model",
    name="Production Monitor",
    metrics=["prediction_drift", "data_drift"],
    baseline={
        "window": "30d"  # Use last 30 days as baseline
    }
)
```
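A fixed historical baseline gives you a stable reference that never moves, but it can go stale as legitimate behavior shifts. A rolling window adapts automatically, though it can gradually absorb slow drift and under-alert on it; if you use one, consider periodically comparing against a pinned historical snapshot as well.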
```python
# Get current metrics
metrics = client.guardian.get_metrics(
    monitor_id="mon_abc123",
    start_time="2026-02-01T00:00:00Z",
    end_time="2026-02-01T12:00:00Z",
    granularity="hour"
)

for point in metrics.data:
    print(f"{point.timestamp}: PSI={point.prediction_drift:.3f}, P99={point.latency_p99}ms")
```
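The `prediction_drift` value printed above is a Population Stability Index (PSI). Guardian computes it for you, but a minimal sketch of the standard PSI calculation helps when interpreting the scores (the binning choices here are illustrative, not Guardian's exact implementation):

```python
import numpy as np

def psi(baseline: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index: sum of (p_i - q_i) * ln(p_i / q_i)
    over bins, where p_i / q_i are baseline / production proportions."""
    # Derive bin edges from the baseline, then widen the outer bins
    # so production outliers are still counted
    edges = np.histogram_bin_edges(baseline, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf

    p = np.histogram(baseline, bins=edges)[0] / len(baseline)
    q = np.histogram(production, bins=edges)[0] / len(production)

    # Small epsilon keeps empty bins from producing log(0)
    eps = 1e-6
    p, q = p + eps, q + eps
    return float(np.sum((p - q) * np.log(p / q)))
```

A common rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as a major shift worth investigating.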
```python
# Acknowledge alert (stops repeat notifications)
client.guardian.acknowledge_alert(
    alert_id="alert_xyz789",
    acknowledged_by="jane@company.com",
    note="Investigating - may be related to data pipeline issue"
)
```
```python
# Resolve alert with root cause
client.guardian.resolve_alert(
    alert_id="alert_xyz789",
    resolved_by="jane@company.com",
    resolution="Rolled back model to v2 due to training data issue",
    root_cause="data_quality"  # For analytics
)
```
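Acknowledging pauses repeat notifications while you investigate; resolving closes the alert and records the `root_cause` for later analysis, so the two calls are typically used in sequence.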
Add more granular metrics as you learn your model’s failure modes.
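For example, you might begin with coarse prediction and data drift, then add per-feature checks once you know which inputs matter. The sketch below assumes a hypothetical `update_monitor` call and a `feature_drift` metric name; check the Guardian API reference for the actual shape:

```python
# Hypothetical sketch -- `update_monitor` and `feature_drift` are
# illustrative names, not confirmed Guardian API; see the API reference.
client.guardian.update_monitor(
    monitor_id="mon_abc123",
    metrics=[
        "prediction_drift",
        "data_drift",
        "feature_drift",  # per-feature distribution checks
    ],
)
```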
### Set appropriate thresholds

- Start with sensitive thresholds that err toward more alerts
- Tune them down based on your false positive rate
- Different models may need different thresholds (see the sketch below)
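As a sketch of what per-metric thresholds could look like at monitor creation, assuming a hypothetical `thresholds` parameter (the real parameter name and shape may differ; see the API reference):

```python
# Hypothetical sketch -- the `thresholds` parameter is an assumption
# for illustration; check the Guardian API reference for the real shape.
monitor = client.guardian.create_monitor(
    model_id="my-model",
    name="Production Monitor",
    metrics=["prediction_drift", "data_drift"],
    baseline={"window": "30d"},
    thresholds={
        "prediction_drift": 0.1,  # sensitive starting point (PSI rule of thumb)
        "data_drift": 0.1,
    },
)
```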
### Use async logging

Never block your serving path with synchronous logging. Use the `AsyncLogger` or batch endpoints.
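A minimal sketch of the async pattern, assuming an `AsyncLogger` that buffers records and flushes them in the background (the import path and constructor arguments shown are illustrative assumptions):

```python
# Sketch only -- the import path and constructor arguments are
# illustrative assumptions; `logger.log(...)` matches the usage
# shown later in this guide.
from guardian import AsyncLogger  # hypothetical import path

logger = AsyncLogger(
    model_id="my-model",
    batch_size=100,      # flush after this many buffered records...
    flush_interval=5.0,  # ...or after this many seconds, whichever is first
)

def predict(request):
    pred = model.predict(request.features)  # your model's serving call
    # Enqueue and return immediately; the serving path never waits
    # on a network round-trip to Guardian
    logger.log(inference_id=request.id, prediction=pred)
    return pred
```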
### Log ground truth when available
If you can obtain ground truth labels later:
```python
# Log prediction
logger.log(inference_id="inf_123", prediction=pred)

# Later, when ground truth is available
client.guardian.update_inference(
    inference_id="inf_123",
    actual=actual_outcome
)
```
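Ground-truth joins are keyed on `inference_id`: the ID you pass to `update_inference` must match the one logged at prediction time, so persist it alongside whatever downstream record eventually produces the label.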