Model Monitoring

Model Monitoring Categories

Data Quality & Drift

  • Data Drift: Input feature distribution changes
  • Concept Drift: Relationship between features and target changes
  • Schema Validation: Data type and format compliance
  • Missing Data: Null value rates and patterns
  • Outlier Detection: Anomalous input values
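
Two of these checks need nothing beyond the standard library; a minimal sketch (the function names and the 2.5σ cutoff are illustrative, not standard):

```python
import math

def missing_rate(values):
    """Fraction of null (None) entries in a feature column."""
    return sum(v is None for v in values) / len(values)

def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` standard deviations from the mean."""
    clean = [v for v in values if v is not None]
    mean = sum(clean) / len(clean)
    std = math.sqrt(sum((v - mean) ** 2 for v in clean) / len(clean))
    if std == 0:
        return []
    return [v for v in clean if abs(v - mean) / std > threshold]

col = [1.0, 1.2, None, 0.9, 1.1, 42.0, None, 1.0, 1.05, 0.95]
missing_rate(col)                      # → 0.2
zscore_outliers(col, threshold=2.5)    # looser cutoff suits this tiny sample → [42.0]
```

Note that on very small samples a single extreme value inflates the standard deviation enough to mask itself, which is why the example loosens the threshold; robust alternatives (median/MAD) avoid this.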

Model Performance

  • Prediction Accuracy: Precision, recall, F1, RMSE
  • Prediction Drift: Output distribution changes
  • Confidence Scores: Model certainty tracking
  • Business Metrics: Revenue, conversion, engagement
  • Comparative Analysis: vs baseline/champion model
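
The accuracy metrics above can be computed directly from logged predictions; a minimal sketch for the binary-classification case (the helper name is illustrative):

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 for a binary classifier (positive class = 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
classification_metrics(y_true, y_pred)  # tp=3, fp=1, fn=1 → (0.75, 0.75, 0.75)
```

In production the same function would run over a sliding window of labeled predictions, with the result compared against the baseline/champion model's window.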

Infrastructure & System

  • Latency: P50, P95, P99 response times
  • Throughput: Requests per second
  • Error Rates: HTTP errors, exceptions
  • Resource Usage: CPU, memory, GPU utilization
  • Cost Metrics: Inference cost per request
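
P50/P95/P99 are just percentiles of the latency samples; a nearest-rank sketch (production systems usually estimate these from histograms rather than raw samples):

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: smallest sample covering q percent of the data."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 11, 500]
p50, p95, p99 = (percentile(latencies_ms, q) for q in (50, 95, 99))
# p50 → 14; the two slow requests dominate the tail, so p95 and p99 are both 500 here
```

This is why tail percentiles, not averages, are the standard latency SLIs: the mean of this sample (~85 ms) hides the 500 ms outlier entirely.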

Model Health

  • Model Version: Deployed version tracking
  • Dependency Health: Feature store, DBs connectivity
  • Data Freshness: Feature recency
  • Retraining Status: Last training date
  • A/B Test Progress: Traffic splits and metrics
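
A data-freshness check usually reduces to comparing a feature's last-update timestamp against an allowed age; a sketch with an illustrative 24-hour budget:

```python
from datetime import datetime, timedelta, timezone

def is_stale(last_updated, max_age=timedelta(hours=24)):
    """True if a feature (or model input) was last refreshed too long ago."""
    return datetime.now(timezone.utc) - last_updated > max_age

one_hour_old = datetime.now(timezone.utc) - timedelta(hours=1)
three_days_old = datetime.now(timezone.utc) - timedelta(days=3)
# is_stale(one_hour_old) → False; is_stale(three_days_old) → True
```

Using timezone-aware UTC timestamps throughout avoids false staleness alerts when the feature store and the monitor run in different time zones.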

Drift Detection Methods

  • KL Divergence (statistical): Measures the difference between two probability distributions. Best for continuous features and distribution comparison.
  • Kolmogorov-Smirnov Test (statistical): Two-sample test comparing distributions. Best for continuous variables and univariate drift.
  • Population Stability Index (PSI) (statistical): Measures distribution shift in binned data. Best for credit scoring and finance applications.
  • Chi-Square Test (statistical): Tests independence of categorical variables. Best for categorical features.
  • Adversarial Validation (ML-based): Trains a classifier to distinguish training data from production data. Best for multivariate drift and complex patterns.
  • Domain Classifier (ML-based): A model that predicts whether a record comes from training or production. Best for high-dimensional data and deep learning.
  • CUSUM (sequential): Cumulative sum control chart for change detection. Best for time-series data and gradual drift.
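
Of the statistical methods above, PSI is simple enough to sketch directly; this version assumes NumPy is available, takes bin edges from the reference sample, and uses ε-smoothing to avoid log-of-zero (the `psi` helper is illustrative, and the 0.1/0.25 cutoffs are industry rules of thumb, not part of the formula):

```python
import numpy as np

def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a current sample.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
    """
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference) + eps
    cur_pct = np.histogram(current, bins=edges)[0] / len(current) + eps
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)
stable = rng.normal(0.0, 1.0, 10_000)
shifted = rng.normal(1.0, 1.0, 10_000)   # mean moved by one standard deviation
# psi(reference, stable) stays well under 0.1; psi(reference, shifted) exceeds 0.25
```

PSI is essentially a symmetrized KL divergence over the bins, which is why the two methods tend to agree on continuous features.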

ML Monitoring Tools

Evidently AI

Open-source ML monitoring tool for data drift, model performance, and target drift detection with interactive reports.

Key Features
  • Data drift detection
  • Model performance monitoring
  • Test suites and reports
  • Integration with ML platforms
  • Custom metrics support
Use Cases
  • Production ML monitoring
  • Data quality validation
  • A/B test analysis
  • Model debugging
Alternatives
WhyLabs, Arize, Fiddler, NannyML
WhyLabs

AI observability platform with data and ML monitoring, providing privacy-preserving model and data health insights.

Key Features
  • Data profiling and drift
  • Model performance tracking
  • Privacy-preserving monitoring
  • Anomaly detection
  • Integration with MLflow, SageMaker
Use Cases
  • Enterprise ML monitoring
  • Regulated industries
  • Privacy-sensitive applications
  • Multi-model observability
Alternatives
Evidently, Arize, Fiddler, Arthur
Arize AI

ML observability platform for monitoring, explaining, and troubleshooting production ML models with drift detection.

Key Features
  • Model performance monitoring
  • Drift and data quality
  • Explainability (SHAP)
  • Embedding analysis
  • Automated troubleshooting
Use Cases
  • Production model monitoring
  • NLP and CV models
  • Embedding drift tracking
  • Model debugging
Alternatives
WhyLabs, Fiddler, Arthur, Evidently
Fiddler AI

Enterprise MLOps platform for monitoring, explaining, and analyzing ML models with focus on responsible AI.

Key Features
  • Model monitoring and alerts
  • Explainable AI
  • Fairness and bias detection
  • Performance tracking
  • Root cause analysis
Use Cases
  • Enterprise ML governance
  • Regulated industries
  • Responsible AI programs
  • High-stakes predictions
Alternatives
Arize, WhyLabs, Arthur, DataRobot
Weights & Biases (Production)

Extends W&B experiment tracking into production, with real-time performance monitoring for deployed models.

Key Features
  • Production model tracking
  • Performance dashboards
  • Alerting on degradation
  • Integration with W&B ecosystem
  • Custom metrics
Use Cases
  • End-to-end ML lifecycle
  • Experiment to production
  • Team collaboration
  • Research to deployment
Alternatives
MLflow, Neptune, Comet, Evidently
Grafana + Prometheus

Open-source monitoring stack for metrics collection, visualization, and alerting, adaptable for ML monitoring.

Key Features
  • Time-series metrics storage
  • Custom dashboards
  • Alerting rules
  • Large ecosystem
  • Self-hosted option
Use Cases
  • Custom ML metrics
  • Infrastructure monitoring
  • Cost-conscious teams
  • Full control deployments
Alternatives
DataDog, New Relic, ELK Stack, Splunk
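
As a sketch of how this stack is adapted for ML monitoring, a Prometheus alerting rule can watch a latency histogram exported by the serving layer; the metric name `model_request_latency_seconds_bucket` and the thresholds here are illustrative placeholders for your own exporter:

```yaml
groups:
  - name: ml-model-alerts
    rules:
      # Fire when P99 inference latency stays above 500 ms for 10 minutes.
      - alert: HighPredictionLatency
        expr: histogram_quantile(0.99, sum(rate(model_request_latency_seconds_bucket[5m])) by (le)) > 0.5
        for: 10m
        labels:
          severity: P1
        annotations:
          summary: "P99 inference latency above 500 ms for 10 minutes"
```

The `for: 10m` clause suppresses transient spikes, and the `severity` label lets the alert router (e.g. Alertmanager) map P0/P1 to paging and lower severities to chat channels.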

Alerting Best Practices

Alert Severity Levels

P0 - Critical: Model serving failures, major accuracy drops (>20%)

P1 - High: Moderate drift (>10%), latency spikes (2x)

P2 - Medium: Minor drift (5-10%), warning thresholds

P3 - Low: Informational, trends, scheduled reports

Alert Routing

PagerDuty/Opsgenie: P0/P1 on-call escalation

Slack/Teams: P2 team channels

Email: P3 daily/weekly digests

Dashboards: All severities visible

Alert Fatigue Prevention

Aggregation: Group similar alerts

Thresholds: Tune to reduce false positives

Rate Limiting: Max alerts per time window

Auto-resolution: Clear when conditions normalize

Retraining Triggers & Strategies

  • Time-Based: Fixed schedule (daily, weekly, monthly), run at regular intervals. ✓ Predictable, simple ✗ May retrain unnecessarily
  • Performance-Based: Accuracy drops below a threshold, run on demand. ✓ Reactive to issues ✗ May be too late
  • Data Drift: Distribution shift detected, run on demand. ✓ Proactive ✗ Requires drift detection
  • Data Volume: N new samples accumulated, frequency varies. ✓ Data-driven ✗ May miss temporal patterns
  • Hybrid: Combination of the above, adaptive frequency. ✓ Flexible, comprehensive ✗ More complex
  • Continuous Learning: Online learning with constant updates, real-time. ✓ Always current ✗ Resource intensive, stability risk
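
The hybrid strategy is often just a disjunction of the simpler triggers; a sketch in which every name and threshold is a placeholder, not a recommendation:

```python
from datetime import datetime, timedelta

def should_retrain(last_trained, accuracy, drift_score, new_samples,
                   max_age=timedelta(days=30), min_accuracy=0.90,
                   max_drift=0.25, min_samples=50_000):
    """Hybrid retraining trigger: returns the list of conditions that fired."""
    reasons = []
    if datetime.now() - last_trained > max_age:
        reasons.append("schedule")
    if accuracy < min_accuracy:
        reasons.append("performance")
    if drift_score > max_drift:
        reasons.append("drift")
    if new_samples >= min_samples:
        reasons.append("data-volume")
    return reasons

# Model is recent and undrifted, but accuracy has slipped below the floor:
should_retrain(datetime.now() - timedelta(days=5),
               accuracy=0.87, drift_score=0.10, new_samples=10_000)
# → ["performance"]
```

Returning the list of reasons rather than a bare boolean makes the decision auditable, which matters when retraining runs are expensive or require approval.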

Key Dashboard Metrics


Performance SLIs

  • Latency: P50, P95, P99
  • Throughput (RPS)
  • Error rate
  • Availability %

Model Metrics

  • Accuracy/F1/RMSE
  • Prediction distribution
  • Confidence scores
  • Drift scores

Business KPIs

  • Conversion rate
  • Revenue impact
  • User engagement
  • Cost per prediction