Recommendation Systems
Core Recommendation Algorithms
Collaborative Filtering
Recommend based on similar users or items. User-based: find similar users, recommend what they liked. Item-based: find items similar to those the user liked. Works well with implicit feedback (views, clicks). Challenges include sparsity and cold start. A simple baseline with strong performance for many use cases.
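The item-based variant can be sketched in a few lines: compute cosine similarity between item columns of the interaction matrix, then score unseen items by their similarity to the user's history. Toy data and function names are illustrative, not from any specific library.

```python
import numpy as np

# Toy implicit-feedback matrix: rows = users, cols = items (1 = interacted).
interactions = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

def item_similarity(m):
    """Cosine similarity between item columns."""
    norms = np.linalg.norm(m, axis=0, keepdims=True)
    norms[norms == 0] = 1.0
    unit = m / norms
    return unit.T @ unit

def recommend(user_idx, m, k=2):
    """Score unseen items by aggregate similarity to the user's history."""
    sim = item_similarity(m)
    scores = sim @ m[user_idx]
    scores[m[user_idx] > 0] = -np.inf   # mask already-seen items
    return np.argsort(scores)[::-1][:k]

print(recommend(0, interactions))
```

Real systems replace the dense matrix with sparse structures and precomputed neighbor lists, but the scoring logic is the same.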
Matrix Factorization
Decompose the user-item interaction matrix into low-rank factors. Learn latent representations for users and items; the dot product of embeddings predicts ratings. ALS (Alternating Least Squares) for implicit feedback at scale. Foundation for many modern approaches. Handles sparsity better than neighborhood-based collaborative filtering.
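A minimal sketch of the idea, fitting two factor matrices by plain gradient descent on a toy ratings matrix (data, learning rate, and regularization constant are illustrative; production systems use ALS or SGD over sparse data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Observed ratings; 0 marks unobserved entries (toy data).
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [0, 1, 5]], dtype=float)
mask = R > 0
k = 2                                    # latent dimension
U = 0.1 * rng.standard_normal((3, k))    # user factors
V = 0.1 * rng.standard_normal((3, k))    # item factors

for _ in range(3000):                    # gradient descent on masked squared error
    err = mask * (R - U @ V.T)
    U += 0.01 * (err @ V - 0.02 * U)     # 0.02 = L2 regularization strength
    V += 0.01 * (err.T @ U - 0.02 * V)

pred = U @ V.T                           # dot product of embeddings predicts ratings
```

The `mask` is what lets the model generalize: only observed entries contribute to the loss, while the low-rank structure fills in the rest.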
Content-Based Filtering
Recommend items similar to those the user liked, based on item features. Use item metadata (genre, tags, description) to compute similarity: TF-IDF for text, embeddings for images. No cold start problem for new items, since metadata is available at launch. Struggles with overspecialization without diversity mechanisms. Good for explainability.
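A self-contained sketch of TF-IDF similarity over item text (the catalog, tokenization, and weighting are deliberately minimal; libraries like scikit-learn provide production-grade versions):

```python
import math
from collections import Counter

# Tiny item catalog with text metadata (illustrative).
items = {
    "m1": "space sci-fi adventure",
    "m2": "space opera epic adventure",
    "m3": "romantic comedy drama",
}

def tfidf_vectors(docs):
    """Minimal TF-IDF: term frequency scaled by inverse document frequency."""
    toks = {k: v.split() for k, v in docs.items()}
    n = len(docs)
    df = Counter(t for ts in toks.values() for t in set(ts))
    return {k: {t: ts.count(t) / len(ts) * math.log(n / df[t]) for t in ts}
            for k, ts in toks.items()}

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(item_id, docs):
    vecs = tfidf_vectors(docs)
    return max((cosine(vecs[item_id], vecs[k]), k)
               for k in docs if k != item_id)[1]

print(most_similar("m1", items))  # items sharing rare terms rank highest
```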
Two-Tower Models
Separate encoders for users and items produce embeddings. Efficient retrieval via ANN search on item embeddings. Train with a contrastive loss (in-batch negatives, hard negatives). Scalable to billions of items; used at YouTube, Pinterest, and Facebook. Balances expressiveness against retrieval efficiency.
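The in-batch-negatives objective can be written down directly: each row's matching item is the positive, and every other item in the batch serves as a negative. The random arrays below stand in for the outputs of the two encoder towers (dimensions and batch size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 8                                    # embedding dimension (illustrative)
user_emb = rng.standard_normal((4, d))   # stand-in for the user tower output
item_emb = rng.standard_normal((4, d))   # item_emb[i] is the positive for user i

def in_batch_softmax_loss(u, v):
    """Contrastive loss: each user's positive item vs. the rest of the batch."""
    logits = u @ v.T                               # all user-item pairs in batch
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # diagonal = matching pairs

loss = in_batch_softmax_loss(user_emb, item_emb)
```

At serving time only the item tower's outputs are indexed; a user embedding is computed per request and matched via ANN search.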
Deep Learning Ranking Models
Neural networks for learning complex user-item interactions. Wide & Deep (memorization + generalization), DeepFM (factorization machines + deep network), DCN (Deep & Cross Network for feature crosses). Multi-task learning for CTR, conversion, and engagement. Feature engineering with embeddings for categorical features. State-of-the-art accuracy, but requires large amounts of data.
Sequential & Session-Based Models
Model temporal dynamics and user intent within a session. RNNs (GRU, LSTM) for sequence modeling; Transformers (BERT4Rec, SASRec) for attention over history. Trained with a next-item prediction objective. Captures short-term interests vs. the long-term profile. Important for e-commerce and streaming platforms with session context.
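The next-item objective is easiest to see in the simplest sequential model of all, a first-order Markov baseline over session transitions (sessions here are illustrative; BERT4Rec and SASRec learn far richer conditioning, but optimize the same target):

```python
from collections import Counter, defaultdict

# Toy session logs; each list is one session in chronological order.
sessions = [
    ["home", "phone", "case"],
    ["home", "phone", "charger"],
    ["home", "laptop", "mouse"],
    ["phone", "case"],
]

# Count item -> next-item transitions across all sessions.
transitions = defaultdict(Counter)
for s in sessions:
    for prev, nxt in zip(s, s[1:]):
        transitions[prev][nxt] += 1

def predict_next(item):
    """First-order Markov baseline for next-item prediction."""
    counts = transitions[item]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("phone"))
```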
Recommendation Libraries & Frameworks
Surprise
Python library for traditional recommender systems. Implements SVD, SVD++, NMF, and KNN collaborative filtering. Train-test split, cross-validation, and hyperparameter tuning utilities. Easy to use for prototyping and benchmarking. Limited to classical methods, not deep learning. A good starting point for learning recommendation algorithms.
LightFM
Hybrid recommendation algorithm combining collaborative and content-based approaches. Handles both implicit and explicit feedback. Metadata for users and items as side information. Logistic loss for implicit feedback, WARP loss for ranking. Fast Cython implementation. Ideal for cold start scenarios with rich metadata.
implicit
Fast Python library for implicit feedback datasets. ALS (Alternating Least Squares) optimized for large-scale data; BPR (Bayesian Personalized Ranking) for ranking. GPU acceleration via CuPy. Does not require explicit ratings; works with views, clicks, and purchases. An industry standard for implicit collaborative filtering.
TensorFlow Recommenders
TensorFlow library for building deep learning recommendation models. Two-tower retrieval models with efficient serving. Ranking models with feature crosses and embeddings. Integrated with TFX for production pipelines. Scalable training on TPUs and GPUs. Supports multi-task learning and advanced architectures. Good integration with the TensorFlow ecosystem.
RecBole
Comprehensive PyTorch library with 70+ recommendation algorithms. General, sequential, context-aware, and knowledge-based models. Standardized evaluation protocols and datasets. Modular architecture for research and experimentation. Supports both research (flexibility) and production (efficiency). Excellent for comparing algorithms on the same data.
NVIDIA Merlin
End-to-end GPU-accelerated recommender system framework. NVTabular for ETL, Transformers4Rec for models, Triton for serving. Handles billion-scale datasets with GPU dataframes. Multi-GPU and multi-node training. Production-grade performance for real-time inference. An integrated pipeline from data to deployment; best for large-scale industrial applications.
Retrieval & Ranking Architecture
Candidate Generation
Retrieve a broad set of candidates from the catalog using fast approximate methods. ANN search on embeddings (FAISS, ScaNN). Collaborative filtering for similar items/users. Content-based filtering by attributes. Multiple candidate sources are merged before ranking.
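The retrieval step reduces to a top-k similarity search over the item index. The sketch below does it exactly with NumPy; FAISS or ScaNN approximate the same operation at billion scale (catalog size, dimensions, and the query construction here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
catalog = rng.standard_normal((1000, 64))               # item embedding index
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)

def retrieve(query, index, k=50):
    """Exact top-k by inner product; ANN libraries approximate this at scale."""
    scores = index @ query
    top = np.argpartition(-scores, k)[:k]               # unordered top-k
    return top[np.argsort(-scores[top])]                # sorted by score

# A query close to item 42 should retrieve item 42 first.
query = catalog[42] + 0.01 * rng.standard_normal(64)
candidates = retrieve(query, catalog)
```

`argpartition` avoids a full sort of the catalog, which is the same trick ANN indexes push much further with clustering and quantization.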
Ranking
Score and rank candidates with ML model. Features: user history, item attributes, context, cross-features. Neural ranking models (Wide & Deep, DeepFM, DCN). Pointwise, pairwise, or listwise loss. Optimize for CTR, conversion, engagement metrics.
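The pointwise/pairwise distinction comes down to what the loss compares. A minimal sketch over one request's scores and click labels (toy numbers; listwise losses such as softmax cross-entropy over the whole slate follow the same pattern):

```python
import numpy as np

# Toy ranker scores and binary click labels for one request (illustrative).
scores = np.array([2.0, 0.5, -1.0, 1.2])
labels = np.array([1, 0, 0, 1], dtype=float)

def pointwise_logloss(s, y):
    """Pointwise: each item is scored independently against its own label."""
    p = 1 / (1 + np.exp(-s))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def pairwise_bpr(s, y):
    """Pairwise (BPR-style): every clicked item should outscore every unclicked one."""
    pos, neg = s[y == 1], s[y == 0]
    diffs = pos[:, None] - neg[None, :]
    return -np.mean(np.log(1 / (1 + np.exp(-diffs))))
```

Pointwise losses calibrate probabilities (useful when CTR estimates feed downstream systems); pairwise and listwise losses optimize ordering directly.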
Re-Ranking
Apply business rules and diversity constraints. Remove already consumed items. Boost new or promoted content. Diversify by category, creator, or attributes. Fairness and bias mitigation. Position-aware scoring adjustments.
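A re-ranking pass is usually plain rule logic on top of model scores. The sketch below applies three of the rules named above: drop consumed items, boost promoted ones, and cap items per category (field names, the 0.5 boost weight, and the data are all illustrative):

```python
def rerank(candidates, consumed, promoted, max_per_category=2):
    """Business-rule pass: drop seen items, boost promoted, cap categories."""
    out, per_cat = [], {}
    ranked = sorted(candidates,
                    key=lambda c: c["score"] + (0.5 if c["id"] in promoted else 0),
                    reverse=True)
    for c in ranked:
        if c["id"] in consumed:
            continue                     # remove already-consumed items
        if per_cat.get(c["category"], 0) >= max_per_category:
            continue                     # enforce category diversity cap
        per_cat[c["category"]] = per_cat.get(c["category"], 0) + 1
        out.append(c["id"])
    return out

candidates = [
    {"id": "a", "score": 0.9, "category": "news"},
    {"id": "b", "score": 0.8, "category": "news"},
    {"id": "c", "score": 0.7, "category": "news"},
    {"id": "d", "score": 0.4, "category": "sports"},
]
result = rerank(candidates, consumed={"a"}, promoted={"d"})
```

Keeping these rules outside the model makes them auditable and instantly changeable, which is why re-ranking is a separate stage.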
A/B Testing
Experiment framework for evaluating algorithms. Online metrics: CTR, conversion, time spent, revenue. Statistical significance testing. Multi-armed bandits for exploration. Interleaving for comparing rankers. Holdout validation groups.
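Significance testing for a CTR experiment is typically a two-proportion z-test. A minimal sketch with illustrative traffic numbers (real platforms also account for multiple comparisons and sequential peeking):

```python
import math

def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """z-statistic for a difference in CTR between control (a) and treatment (b)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p = (clicks_a + clicks_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Illustrative: 5.2% vs 6.0% CTR on 10k impressions each.
z = two_proportion_ztest(520, 10000, 600, 10000)
# |z| > 1.96 means significant at the 5% level, two-sided.
```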
Evaluation Metrics
| Metric | Type | Description | When to Use | Limitations |
|---|---|---|---|---|
| Precision@K | Offline | Fraction of top-K recommendations that are relevant | Top-K recommendation lists, focus on accuracy | Ignores ranking order within top-K and the total number of relevant items |
| Recall@K | Offline | Fraction of relevant items found in top-K recommendations | Ensuring coverage of user interests | Doesn't penalize irrelevant items in top-K |
| NDCG (Normalized Discounted Cumulative Gain) | Offline | Ranking quality metric considering position and relevance grading | Graded relevance, position-aware evaluation | Requires relevance labels, sensitive to label quality |
| MAP (Mean Average Precision) | Offline | Average precision across all recall levels | Binary relevance, ranking quality evaluation | Binary relevance only, hard to interpret |
| CTR (Click-Through Rate) | Online | Percentage of recommendations clicked by users | A/B testing, measuring user engagement | Doesn't capture downstream conversion or satisfaction |
| Conversion Rate | Online | Percentage of recommendations leading to desired action | E-commerce, subscription, revenue-driven objectives | Delayed signal, affected by many non-model factors |
| Diversity | Both | Variety of categories or attributes in recommendations | Avoiding filter bubbles, improving user discovery | May conflict with accuracy metrics |
| Coverage | Both | Percentage of catalog items ever recommended | Ensuring long-tail items get exposure | High coverage may reduce personalization quality |
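The offline metrics in the table are short formulas. A sketch of the three most common, with a toy recommendation list (graded relevance values are illustrative):

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-K that is relevant."""
    return sum(1 for i in recommended[:k] if i in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items found in the top-K."""
    return sum(1 for i in recommended[:k] if i in relevant) / len(relevant)

def ndcg_at_k(recommended, gains, k):
    """NDCG: graded relevance, with positions discounted by log2(rank + 1)."""
    dcg = sum(gains.get(item, 0) / math.log2(pos + 2)
              for pos, item in enumerate(recommended[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

recs = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}          # "e" was never recommended
```

Note how the missing item "e" caps recall but leaves precision untouched, which is why the two are reported together.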
Production Recommendation Architecture
Offline Training
- Batch Pipeline: Airflow, Spark for large-scale feature engineering
- Model Training: TensorFlow, PyTorch on GPU clusters
- Embedding Generation: Item and user embeddings computed offline
- Index Building: ANN indexes (FAISS, ScaNN) for retrieval
- Schedule: Daily or weekly retraining cadence
Real-Time Serving
- Feature Store: Feast, Tecton for low-latency features
- Model Serving: TensorFlow Serving, TorchServe, Triton
- Caching: Redis for user profiles, item metadata
- API Gateway: Fast ranking API with SLA guarantees
- Latency Target: p99 under 100ms for responsiveness
Data Collection
- Event Tracking: Impressions, clicks, conversions, dwell time
- Stream Processing: Kafka, Flink for real-time aggregation
- User Profiles: Incrementally update with new interactions
- Negative Sampling: Capture what was shown but not clicked
- Privacy: GDPR compliance, user consent management
Monitoring
- Online Metrics: CTR, conversion rate, revenue per user
- Model Performance: Prediction latency, error rates
- Data Quality: Feature distribution drift detection
- Business KPIs: Engagement, retention, satisfaction
- Alerts: Anomaly detection on key metrics
Recommendation System Best Practices
Cold Start Strategies
- Content-based recommendations for new items using metadata
- Onboarding flow to collect initial user preferences
- Popular/trending items as fallback for new users
- Transfer learning from similar users or items
- Multi-armed bandits for exploration vs exploitation
- Gradual transition from content to collaborative filtering
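The bandit strategy above can be sketched with epsilon-greedy, the simplest exploration/exploitation scheme: show a random item with small probability, otherwise the best-performing one so far (true CTRs, epsilon, and the simulation are illustrative; production systems often prefer Thompson sampling or UCB):

```python
import random

random.seed(0)

def epsilon_greedy(means, epsilon=0.1):
    """Explore a random arm with prob epsilon, else exploit the best mean so far."""
    if random.random() < epsilon:
        return random.randrange(len(means))
    return max(range(len(means)), key=lambda i: means[i])

# Simulate: hidden true CTR per item; the bandit learns which to show.
true_ctr = [0.02, 0.05, 0.10]
means, counts = [0.0] * 3, [0] * 3
for _ in range(5000):
    arm = epsilon_greedy(means)
    reward = 1.0 if random.random() < true_ctr[arm] else 0.0
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]   # incremental mean update
```

Exploration guarantees every item keeps collecting feedback, which is exactly what a cold-start item needs before collaborative signals exist for it.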
Diversity & Serendipity
- Avoid filter bubbles with diversity constraints
- Inject exploratory recommendations for discovery
- Diversify by category, creator, time period
- Maximal Marginal Relevance (MMR) for result diversity
- Serendipity metrics to measure unexpected relevance
- Balance between accuracy and diversity based on context
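MMR, mentioned above, greedily picks the item that best trades off relevance to the query against similarity to items already selected. A sketch over unit-vector embeddings (items, vectors, and the lambda value are illustrative):

```python
import numpy as np

def mmr(query, items, embeddings, lam=0.5, k=3):
    """Maximal Marginal Relevance: relevance minus redundancy with prior picks."""
    rel = embeddings @ query                 # relevance of each item to the query
    sim = embeddings @ embeddings.T          # pairwise item similarity
    selected, remaining = [], list(range(len(items)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [items[i] for i in selected]

# Two near-duplicate items ("a1", "a2") plus one distinct item ("b1").
items = ["a1", "a2", "b1"]
emb = np.array([[1.0, 0.0],
                [0.98, 0.199],
                [0.6, 0.8]])
picks = mmr(np.array([1.0, 0.0]), items, emb, lam=0.4, k=2)
```

With a diversity-leaning lambda the second pick skips the near-duplicate "a2" in favor of the less relevant but distinct "b1"; with lambda near 1, MMR degenerates into plain relevance ranking.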
Scalability & Performance
- ANN algorithms (HNSW, IVF, PQ) for billion-scale retrieval
- Distributed training with data and model parallelism
- Feature precomputation and caching strategies
- Async model updates without service interruption
- Multi-level caching (CDN, application, database)
- Load testing and capacity planning for peak traffic
