Time Series Forecasting
Classical Time Series Methods
ARIMA: Classic statistical model for univariate time series. AR (autoregression on lags), I (differencing for stationarity), MA (moving average of past errors). Order selection via ACF/PACF plots and AIC/BIC. Assumes linear relationships and stationarity. Good baseline, interpretable, works well for short-term forecasts with clear patterns.
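The AR part is just a linear regression of the series on its own lags. A minimal numpy sketch of that idea (function names are illustrative, not statsmodels' API; real ARIMA estimation adds differencing, MA terms, and maximum likelihood):

```python
import numpy as np

def fit_ar(y, p):
    """Fit an AR(p) model by ordinary least squares (illustrative sketch,
    not a library estimator). Returns (intercept, lag coefficients)."""
    y = np.asarray(y, dtype=float)
    # Row t of the design matrix holds [y[t-1], ..., y[t-p]]
    X = np.column_stack([y[p - k - 1 : len(y) - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return beta[0], beta[1:]

def forecast_ar(y, intercept, coefs, steps):
    """Iterated one-step-ahead forecasts: each prediction is fed back
    in as a lag for the next step."""
    history = list(y)
    out = []
    for _ in range(steps):
        lags = history[-len(coefs):][::-1]  # most recent value first
        yhat = intercept + float(np.dot(coefs, lags))
        out.append(yhat)
        history.append(yhat)
    return out
```

On noise-free data generated by a known AR(2) recursion, the least-squares fit recovers the true intercept and coefficients exactly (up to floating point).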
Exponential Smoothing (ETS): Weighted average of past observations with exponentially decreasing weights. Variants: simple, Holt (trend), Holt-Winters (trend + seasonality). State space (ETS) formulation enables automatic model selection. Less sensitive to hyperparameters than ARIMA. Fast and robust for many business forecasting tasks with trend and seasonality.
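Holt's linear-trend method can be written in a few lines; this is an illustrative from-scratch sketch (libraries like statsmodels add optimized initialization, damping, and parameter estimation):

```python
def holt(y, alpha, beta, steps):
    """Holt's linear-trend exponential smoothing (illustrative sketch).
    alpha smooths the level, beta smooths the trend."""
    level, trend = y[0], y[1] - y[0]          # simple initialization
    for obs in y[1:]:
        prev_level = level
        # New level blends the observation with the previous projection
        level = alpha * obs + (1 - alpha) * (level + trend)
        # New trend blends the level change with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast extrapolates the final level and trend
    return [level + (h + 1) * trend for h in range(steps)]
```

On a perfectly linear series the method locks onto the slope, so forecasts continue the line exactly.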
Prophet: Additive model with trend, yearly/weekly/daily seasonality, and holiday effects. Robust to missing data and outliers. Intuitive hyperparameters for non-experts. Handles multiple seasonalities and changepoints. Excellent for business forecasting with strong seasonal patterns. Limited for complex non-linear relationships.
TBATS: Handles complex seasonality, including multiple seasonal periods and non-integer cycles. Combines Box-Cox transformation, ARMA errors, and trigonometric (Fourier) seasonal terms. Automatic model selection. Good for data with multiple seasonal patterns (e.g., hourly data with daily and weekly cycles). More complex and slower than ETS.
VAR (Vector Autoregression): Multivariate time series model capturing dependencies between multiple series. Each variable is regressed on lags of itself and the other variables. Granger causality tests for relationships. Used in econometrics and finance. Requires stationarity and many observations. Alternative: VECM (Vector Error Correction Model) for cointegrated series.
Decomposition: Splits a time series into trend, seasonal, and residual components. STL (Seasonal and Trend decomposition using Loess) for additive or multiplicative decomposition. Useful for understanding patterns and for anomaly detection. Components can be forecast separately and recombined. Foundation for hybrid approaches combining decomposition + ML.
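The core idea is easy to see with classical additive decomposition, a simpler cousin of STL: estimate the trend with a centered moving average, average the detrended values by seasonal phase, and keep the leftover as the residual. A hand-rolled sketch (assumes an odd period; real STL uses Loess and handles edges far more carefully):

```python
import numpy as np

def additive_decompose(y, period):
    """Classical additive decomposition (illustrative, not STL).
    Returns (trend, seasonal, residual); trend is NaN at the edges."""
    y = np.asarray(y, dtype=float)
    half = period // 2
    # Centered moving average as the trend estimate
    trend = np.full(len(y), np.nan)
    for t in range(half, len(y) - half):
        trend[t] = y[t - half : t - half + period].mean()
    detrended = y - trend
    # Seasonal effect = mean detrended value at each phase of the cycle
    seasonal = np.array([np.nanmean(detrended[phase::period])
                         for phase in range(period)])
    seasonal -= seasonal.mean()          # force seasonal effects to sum to zero
    seasonal_full = np.resize(seasonal, len(y))
    residual = y - trend - seasonal_full
    return trend, seasonal_full, residual
```

On a series built as linear trend plus a period-3 pattern, the components are recovered exactly away from the edges.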
Deep Learning for Time Series
RNNs (LSTM/GRU): Sequential models capturing long-term dependencies with gating mechanisms. LSTM (Long Short-Term Memory) with forget, input, and output gates. GRU (Gated Recurrent Unit) is simpler with fewer parameters. Encoder-decoder setups for multi-step forecasting. Attention mechanisms improve interpretability. Requires large datasets and careful tuning.
Transformers: Self-attention mechanisms for capturing long-range dependencies. Temporal Fusion Transformer (TFT) offers multi-horizon forecasting and interpretability. Informer and Autoformer improve efficiency on long sequences. Pretrained models like TimeGPT enable zero-shot forecasting. Requires substantial data but achieves SOTA on many benchmarks.
N-BEATS: Pure deep learning architecture without recurrence or attention. Comes in interpretable (trend and seasonality blocks) and generic variants. Stacks of fully connected layers with residual connections. Produces backcast (reconstruct the past) and forecast outputs. Strong empirical performance in the M4 competition. Simple yet effective for univariate forecasting.
DeepAR: Probabilistic forecasting with autoregressive RNNs. Outputs parameters of a distribution (mean, std), not just a point forecast. Trained on multiple related time series (global model). Handles cold-start with item features. Quantile forecasts for uncertainty. Used in production at Amazon for demand forecasting.
TCN (Temporal Convolutional Network): 1D convolutions with causal padding and dilations for large receptive fields. Parallelizable unlike RNNs, so training is faster. Residual connections and normalization layers. Good balance between performance and efficiency. Fewer parameters than LSTMs with competitive accuracy on many tasks.
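The two key TCN ingredients are easy to demonstrate in numpy: left zero-padding makes the convolution causal (output at time t sees only the past), and stacking dilated layers grows the receptive field. An illustrative sketch (function names are mine, not any library's API):

```python
import numpy as np

def causal_dilated_conv(x, weights, dilation):
    """One causal dilated 1D convolution. Output at time t depends only on
    x[t], x[t-d], x[t-2d], ... — left zero-padding keeps out the future.
    `weights` are ordered oldest tap first, current tap last."""
    k = len(weights)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    out = np.zeros(len(x))
    for t in range(len(x)):
        taps = xp[t : t + pad + 1 : dilation]   # k taps, spaced d apart
        out[t] = float(np.dot(weights, taps))
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Doubling the dilation each layer (1, 2, 4, 8, ...) makes the receptive field grow exponentially with depth, which is why TCNs can cover long histories cheaply.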
Foundation Models: Large pretrained models for zero-shot or few-shot forecasting. TimeGPT is trained on diverse time series datasets. Lag-Llama is an open-source foundation model. Transfer learning from pretraining to specific tasks. Reduces the need for large per-task datasets. An emerging paradigm similar to LLMs in NLP.
Time Series Forecasting Libraries
statsmodels: Python library for statistical modeling, including comprehensive time series analysis. ARIMA, SARIMAX, VAR, exponential smoothing, seasonal decomposition. Statistical tests (ADF, KPSS for stationarity). ACF/PACF plots for diagnostics. Gold standard for classical econometric methods. Essential for understanding statistical foundations.
sktime: Unified interface for time series ML in Python (scikit-learn compatible). Forecasting, classification, regression, and clustering for time series. Composable pipelines with transformers and forecasters. Integrates classical (ARIMA, ETS) and ML (sklearn) methods. Reduction strategies for converting forecasting to regression. Excellent for research and prototyping.
Darts: User-friendly forecasting library from Unit8. Supports classical (ARIMA, ETS), ML (regression, random forest), and deep learning (RNN, Transformer) models. Backtesting utilities and multiple-series handling. PyTorch-based deep models with GPU support. Anomaly detection and probabilistic forecasting. Great for production applications.
GluonTS: Toolkit for probabilistic time series modeling with PyTorch and MXNet. DeepAR, Transformer, N-BEATS, and more out of the box. Designed for large-scale forecasting across many series. Built-in evaluation and backtesting. Flexible for custom models. Production-ready with strong AWS integration. Research and industry standard.
PyTorch Forecasting: State-of-the-art deep learning time series models in PyTorch. Temporal Fusion Transformer, N-BEATS, and DeepAR implementations. Supports covariates, multiple series, and categorical embeddings. PyTorch Lightning integration for training. Interpretability tools and attention visualization. Excellent for complex multivariate forecasting.
NeuralProphet: Neural network reimplementation of Prophet with a PyTorch backend. Retains Prophet's interpretability with added flexibility. Autoregression (AR-Net), lagged regressors, future regressors. Supports local modeling per series and global modeling. Easier to extend than the original Prophet. Good middle ground between statistical and deep learning approaches.
Time Series Feature Engineering
Temporal Features
- Hour, day of week, month, quarter, year
- Is weekend, is holiday, is business day
- Time since/until event (season start, etc.)
- Cyclical encoding (sin/cos for hour, month)
- Relative time features (days in month, week of year)
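Cyclical encoding deserves a concrete illustration: mapping a cyclic feature onto the unit circle makes the end of the cycle adjacent to its start, which raw integers get wrong. A minimal sketch (function name is illustrative):

```python
import math

def cyclical_encode(value, period):
    """Encode a cyclic feature (hour, month, day of week) as a
    (sin, cos) pair on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)
```

With raw integers, hour 23 looks maximally far from hour 0; in (sin, cos) space they are neighbors, while hour 12 is on the opposite side of the circle.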
Lag & Window Features
- Lagged values (t-1, t-7, t-30)
- Rolling statistics (mean, std, min, max)
- Expanding windows for cumulative metrics
- Exponential weighted moving average (EWMA)
- Differences and percentage changes
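The leakage-critical detail with lag and window features is that row t may only use values up to t-1. A from-scratch numpy sketch of that rule (in practice pandas `.shift()` and `.rolling()` do this more conveniently; names here are illustrative):

```python
import numpy as np

def make_lag_features(y, lags, window):
    """Build lag and rolling-mean features without future leakage:
    the rolling mean at row t covers y[t-window:t], strictly the past.
    Rows with insufficient history get NaN."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    feats = {}
    for lag in lags:
        col = np.full(n, np.nan)
        col[lag:] = y[:-lag]              # value observed `lag` steps ago
        feats[f"lag_{lag}"] = col
    roll = np.full(n, np.nan)
    for t in range(window, n):
        roll[t] = y[t - window : t].mean()  # past values only, excludes y[t]
    feats[f"rollmean_{window}"] = roll
    return feats
```

Computing the rolling window over `y[t-window:t+1]` instead would leak the target into its own feature, which inflates validation scores.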
Transformations
- Log, Box-Cox for variance stabilization
- Differencing for stationarity
- Seasonal decomposition (trend, seasonal, residual)
- Fourier features for seasonality
- Normalization, standardization per series
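Fourier features turn a seasonal cycle into smooth regression inputs — the same device Prophet and TBATS use internally. A minimal numpy sketch (function name is illustrative):

```python
import numpy as np

def fourier_features(t, period, order):
    """`order` sine/cosine pairs at harmonics of the seasonal period.
    Returns an array of shape (len(t), 2 * order) to use as regressors."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, order + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)
```

Higher `order` captures sharper seasonal shapes at the cost of more parameters; the features repeat exactly every `period` steps, so any linear model built on them is automatically periodic.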
External Regressors
- Weather data (temperature, precipitation)
- Economic indicators (CPI, unemployment)
- Event flags (promotions, campaigns, releases)
- Related time series (correlated products/metrics)
- Embedding categorical metadata (store, region)
Time Series Validation Strategies
| Strategy | Description | Pros | Cons | When to Use |
|---|---|---|---|---|
| Train-Test Split | Single split with training on past, testing on future period | Simple, fast, mimics production deployment | Single test period may not be representative | Initial model development, sufficient data |
| Time Series Cross-Validation (Rolling Origin) | Multiple train-test splits with expanding or sliding window | Multiple evaluation points, robust estimate | Computationally expensive, correlated folds | Model selection, hyperparameter tuning |
| Blocked Cross-Validation | Leave gaps between train and test to reduce leakage | Prevents leakage from temporal correlation | Loses some data, still an approximation | Short-range dependencies, cautious evaluation |
| Walk-Forward Validation | Incrementally add test data to training set and forecast next | Realistic online learning simulation | Very slow, many retraining iterations | Evaluating online adaptation, limited data scenarios |
| Backtesting | Historical simulation of forecast generation and evaluation | Real-world scenario testing, business-relevant metrics | Requires careful setup, potential overfitting to test | Production readiness, business case validation |
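Rolling-origin evaluation (row two above) is simple to generate by hand; this sketch covers both the expanding-window and sliding-window variants (names are illustrative; sktime and Darts ship equivalent splitters):

```python
def rolling_origin_splits(n, initial_train, horizon, step, expanding=True):
    """Yield (train_indices, test_indices) pairs for rolling-origin
    evaluation. expanding=True grows the training window each fold;
    False slides a fixed-size window. Test always lies strictly after
    train, so temporal order is never violated."""
    start = initial_train
    while start + horizon <= n:
        train_begin = 0 if expanding else start - initial_train
        yield list(range(train_begin, start)), list(range(start, start + horizon))
        start += step
```

Each fold trains on everything before the forecast origin and evaluates on the next `horizon` points, then the origin rolls forward by `step`.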
Forecasting Evaluation Metrics
| Metric | Formula Concept | Scale | Advantages | Use Case |
|---|---|---|---|---|
| MAE (Mean Absolute Error) | Average absolute difference | Same as target | Interpretable, robust to outliers | General purpose, when outliers present |
| RMSE (Root Mean Squared Error) | Square root of mean squared error | Same as target | Penalizes large errors more than MAE | When large errors are particularly costly |
| MAPE (Mean Absolute Percentage Error) | Average absolute percentage difference | Percentage | Scale-independent, intuitive | Comparing across different series, business reporting |
| SMAPE (Symmetric MAPE) | Symmetric percentage error | Percentage (0-200%) | Symmetric, bounded unlike MAPE | Avoid MAPE issues with small actuals |
| MASE (Mean Absolute Scaled Error) | MAE scaled by naive forecast error | Unitless | Scale-independent, interpretable baseline | Comparing models across datasets |
| Quantile Loss (Pinball Loss) | Asymmetric loss for probabilistic forecasts | Depends on quantile | Optimizes specific quantiles for risk management | Probabilistic forecasting, inventory optimization |
| CRPS (Continuous Ranked Probability Score) | Proper scoring rule for probabilistic forecasts | Same as target | Evaluates entire forecast distribution | Probabilistic models, uncertainty calibration |
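Most of the metrics above fit in a line or two of numpy; this sketch follows the common definitions (SMAPE here uses the bounded 0–200% form, and MASE scales by the in-sample lag-m naive error):

```python
import numpy as np

def mae(y, f):
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(f, float))))

def rmse(y, f):
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(f, float)) ** 2)))

def mape(y, f):
    y, f = np.asarray(y, float), np.asarray(f, float)
    return float(np.mean(np.abs(y - f) / np.abs(y)) * 100)   # undefined when y == 0

def smape(y, f):
    y, f = np.asarray(y, float), np.asarray(f, float)
    return float(np.mean(2 * np.abs(y - f) / (np.abs(y) + np.abs(f))) * 100)

def mase(y, f, y_train, m=1):
    """MAE scaled by the in-sample MAE of the seasonal-naive (lag-m)
    forecast; values below 1 beat the naive baseline."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    y_train = np.asarray(y_train, float)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y - f)) / scale)

def pinball(y, f, q):
    """Quantile (pinball) loss: under-forecasts cost q per unit,
    over-forecasts cost 1 - q per unit."""
    diff = np.asarray(y, float) - np.asarray(f, float)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))
```

At q = 0.5 the pinball loss reduces to half the MAE, which is why the median is the optimal point forecast under absolute error.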
Production Forecasting Architecture
Data Pipeline
- Ingestion: Stream or batch from sources (databases, APIs, logs)
- Storage: Time-series databases (InfluxDB, TimescaleDB, Prometheus)
- Preprocessing: Resampling, interpolation, outlier removal
- Feature Store: Precomputed lag and window features
- Versioning: Track data versions for reproducibility
Model Training
- Orchestration: Airflow, Prefect for scheduled retraining
- Backtesting: Historical simulation on multiple periods
- Hyperparameter Tuning: Optuna, Ray Tune with time-aware CV
- Ensemble: Combine multiple models for robustness
- Model Registry: MLflow, W&B for versioning
Inference & Serving
- Batch Forecasting: Daily/hourly forecast generation for all series
- Real-Time: Low-latency API for on-demand forecasts
- Caching: Store recent forecasts for repeated queries
- Scaling: Parallel processing for many time series
- Fallback: Simple models if main model fails
Monitoring & Alerting
- Accuracy Tracking: Monitor MAE, RMSE on incoming actuals
- Drift Detection: Distribution shift in input features
- Anomaly Alerts: Unexpected forecast deviations
- Dashboards: Grafana, Tableau for forecast visualization
- Feedback Loop: Retrain when performance degrades
Time Series Forecasting Best Practices
Data Quality
- Handle missing values: forward fill, interpolation, or imputation models
- Outlier detection and treatment (IQR, Z-score, domain knowledge)
- Ensure consistent granularity (hourly, daily, weekly)
- Check for data leakage from future to past
- Validate timestamp alignment and timezone consistency
- Document data collection changes that affect distribution
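Forward fill, the most common gap treatment above, carries the last observation forward; note that it cannot fill leading gaps and silently freezes the series through long outages, so cap its reach in production. A stdlib-only sketch (gaps represented as None; names are illustrative):

```python
def forward_fill(values, limit=None):
    """Forward-fill None gaps with the last observed value, optionally
    capping the number of consecutive fills. Leading gaps stay None."""
    out, last, run = [], None, 0
    for v in values:
        if v is None:
            run += 1
            out.append(last if limit is None or run <= limit else None)
        else:
            last, run = v, 0
            out.append(v)
    return out
```

With a `limit`, long outages surface as remaining gaps instead of a flat line, which keeps downstream lag features honest.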
Model Selection
- Start simple (ARIMA, ETS) before complex deep learning
- Consider the forecast horizon: the best-performing model family often differs between short- and long-horizon tasks, so benchmark both
- Multiple series: global model vs per-series models
- Probabilistic forecasting for uncertainty quantification
- Ensemble diverse model types for robustness
- Interpretability requirements may favor classical methods
Operational Considerations
- Automate retraining schedule based on data volume and drift
- A/B test new models against production baseline
- Provide prediction intervals, not just point forecasts
- Document assumptions and limitations for stakeholders
- Build human-in-the-loop workflows for overrides
- Plan for concept drift and model degradation over time
