
Overview

Guardian provides real-time monitoring for AI models in production. This guide covers setting up comprehensive monitoring for drift detection, performance tracking, and alerting.

Quick Start

from rotavision import Rotavision

client = Rotavision()

# Create a monitor
monitor = client.guardian.create_monitor(
    model_id="recommendation-v3",
    name="Prod Recommendations",
    metrics=["prediction_drift", "data_drift", "latency_p99", "error_rate"],
    alerts=[
        {"metric": "prediction_drift", "threshold": 0.1, "severity": "warning"},
        {"metric": "prediction_drift", "threshold": 0.2, "severity": "critical"},
        {"metric": "error_rate", "threshold": 0.01, "severity": "critical"},
    ]
)

print(f"Monitor created: {monitor.id}")

Integrating with Your Serving Code

Basic Integration

# In your model serving code
import time
from rotavision import Rotavision

client = Rotavision()
MONITOR_ID = "mon_abc123"

def predict(features):
    start = time.time()

    try:
        prediction = model.predict(features)
        latency_ms = (time.time() - start) * 1000

        # Log to Guardian
        client.guardian.log_inference(
            monitor_id=MONITOR_ID,
            input_data=features,
            prediction=prediction,
            latency_ms=latency_ms
        )

        return prediction

    except Exception as e:
        # Log error
        client.guardian.log_inference(
            monitor_id=MONITOR_ID,
            input_data=features,
            error={"code": type(e).__name__, "message": str(e)}
        )
        raise

Async Logging

For production workloads, use async logging to minimize latency impact:
from rotavision.logging import AsyncLogger

# Create async logger
logger = AsyncLogger(
    api_key="rv_live_...",
    monitor_id="mon_abc123",
    batch_size=100,       # Send in batches of 100
    flush_interval_ms=1000  # Or flush every second
)

def predict(features):
    start = time.time()
    prediction = model.predict(features)
    latency_ms = (time.time() - start) * 1000

    # Non-blocking log
    logger.log(
        input_data=features,
        prediction=prediction,
        latency_ms=latency_ms
    )

    return prediction

# Flush on shutdown
import atexit
atexit.register(logger.flush)

Setting Up Drift Detection

Establishing Baseline

Guardian needs a baseline distribution to detect drift:
# Option 1: Use historical data
monitor = client.guardian.create_monitor(
    model_id="my-model",
    name="Production Monitor",
    metrics=["prediction_drift", "data_drift"],
    baseline={
        "data_url": "s3://my-bucket/baseline-data.parquet"
    }
)

# Option 2: Use rolling window (learns from recent production data)
monitor = client.guardian.create_monitor(
    model_id="my-model",
    name="Production Monitor",
    metrics=["prediction_drift", "data_drift"],
    baseline={
        "window": "30d"  # Use last 30 days as baseline
    }
)
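
For Option 1, the parquet file just needs to hold a representative sample of historical inputs (and, optionally, predictions). A minimal sketch of producing such a file, assuming pandas with a parquet engine (pyarrow) installed; the column names are purely illustrative:

import pandas as pd

# Illustrative feature and prediction columns - use whatever your model actually consumes
baseline_df = pd.DataFrame({
    "user_tenure_days": [12, 340, 87, 5],
    "items_viewed_7d": [3, 41, 9, 0],
    "prediction": [0.12, 0.87, 0.33, 0.05],
})

# Write locally, then upload to the bucket referenced in the monitor's baseline config
baseline_df.to_parquet("baseline-data.parquet", index=False)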

Drift Metrics

Metric        | Description                  | Typical Threshold
PSI           | Population Stability Index   | Warning: 0.1, Critical: 0.2
KL Divergence | Kullback-Leibler divergence  | Warning: 0.1, Critical: 0.2
JS Distance   | Jensen-Shannon distance      | Warning: 0.1, Critical: 0.15
KS Statistic  | Kolmogorov-Smirnov test      | Warning: 0.05, Critical: 0.1
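
As a reference for what a PSI value represents, here is a minimal standalone sketch (not part of the Rotavision SDK) that computes PSI between a baseline sample and a production sample with numpy; the bin count and epsilon are arbitrary illustration choices:

import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline (expected) sample and a production (actual) sample."""
    # Bin edges are derived from the baseline distribution
    edges = np.histogram_bin_edges(expected, bins=bins)

    expected_counts, _ = np.histogram(expected, bins=edges)
    actual_counts, _ = np.histogram(actual, bins=edges)

    # Convert counts to proportions; epsilon avoids division by zero and log(0)
    eps = 1e-6
    expected_pct = expected_counts / expected_counts.sum() + eps
    actual_pct = actual_counts / actual_counts.sum() + eps

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# PSI is ~0 when the distributions match and grows as they diverge
baseline = np.random.normal(0.0, 1.0, 10_000)
production = np.random.normal(0.3, 1.0, 10_000)
print(population_stability_index(baseline, production))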

Configuring Alerts

Alert Channels

monitor = client.guardian.create_monitor(
    model_id="my-model",
    name="Production Monitor",
    metrics=["prediction_drift", "error_rate"],
    alerts=[
        {
            "metric": "prediction_drift",
            "threshold": 0.2,
            "severity": "critical",
            "window": "1h"  # Evaluate over 1 hour
        }
    ],
    notifications={
        "email": ["[email protected]", "[email protected]"],
        "slack_webhook": "https://hooks.slack.com/services/...",
        "pagerduty_key": "your-pagerduty-key"
    }
)

Alert Severity Levels

Severity | Use Case            | Response
info     | FYI notifications   | Review when convenient
warning  | Potential issues    | Investigate within 24h
critical | Immediate attention | Page on-call engineer
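
A common pattern is to attach alerts at more than one severity to the same metric, so a soft breach creates a warning and a hard breach pages on-call. A sketch using the create_monitor API shown above; the latency thresholds (in milliseconds) are illustrative:

monitor = client.guardian.create_monitor(
    model_id="my-model",
    name="Production Monitor",
    metrics=["latency_p99"],
    alerts=[
        # Warning: investigate within 24h
        {"metric": "latency_p99", "threshold": 250, "severity": "warning"},
        # Critical: page the on-call engineer
        {"metric": "latency_p99", "threshold": 500, "severity": "critical"},
    ]
)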

Viewing Metrics

Dashboard

Access the monitoring dashboard at:
https://dashboard.rotavision.com/monitors/{monitor_id}

API

# Get current metrics
metrics = client.guardian.get_metrics(
    monitor_id="mon_abc123",
    start_time="2026-02-01T00:00:00Z",
    end_time="2026-02-01T12:00:00Z",
    granularity="hour"
)

for point in metrics.data:
    print(f"{point.timestamp}: PSI={point.prediction_drift:.3f}, P99={point.latency_p99}ms")

Handling Alerts

Acknowledging

# Acknowledge alert (stops repeat notifications)
client.guardian.acknowledge_alert(
    alert_id="alert_xyz789",
    acknowledged_by="[email protected]",
    note="Investigating - may be related to data pipeline issue"
)

Resolving

# Resolve alert with root cause
client.guardian.resolve_alert(
    alert_id="alert_xyz789",
    resolved_by="[email protected]",
    resolution="Rolled back model to v2 due to training data issue",
    root_cause="data_quality"  # For analytics
)

Best Practices

Start with essential metrics

Begin with a small set of core metrics:
  • prediction_drift - Catches distribution shifts
  • latency_p99 - Performance degradation
  • error_rate - System health
Add more granular metrics as you learn your model's failure modes.

Tune thresholds iteratively

  • Start with conservative thresholds (more alerts)
  • Tune based on the observed false positive rate
  • Different models may need different thresholds

Log asynchronously

Never block your serving path with synchronous logging. Use the AsyncLogger or batch endpoints.

Log ground truth when available

If you can obtain ground truth labels later, link them back to the original inference:
# Log prediction
logger.log(inference_id="inf_123", prediction=pred)

# Later, when ground truth is available
client.guardian.update_inference(
    inference_id="inf_123",
    actual=actual_outcome
)

Next Steps