Dimensional Modeling
Fact Tables
Records individual business events at the most atomic grain. One row per transaction (sale, order, click). Captures the 'who, what, when, where, how much' of each event. Most common fact table type. Supports detailed drill-down analysis. Can grow very large but provides maximum flexibility for analysis.
- One row per atomic event
- Highest level of detail
- Foreign keys to all dimensions
- Additive measures (quantity, amount)
- Timestamp for event occurrence
- Point-of-sale transactions
- Order line items
- Clickstream events
- Financial transactions
Captures state at regular intervals (daily, weekly, monthly). One row per period per entity being measured. Shows cumulative or point-in-time metrics. Ideal for inventory, account balances, pipeline status. Predictable row count based on periods and entities. Complements transaction facts for trend analysis.
- Fixed time intervals
- Point-in-time measurements
- Predictable table growth
- Semi-additive measures common
- Period dimension required
- Daily inventory levels
- Monthly account balances
- Weekly pipeline snapshots
- End-of-day positions
Tracks lifecycle of a process with defined milestones. Single row updated as entity progresses through stages. Multiple date foreign keys for each milestone. Lag calculations between milestones. Row is updated (not inserted) as process advances. Best for processes with clear start/end and measurable stages.
- Multiple milestone dates per row
- Row updates as process progresses
- Lag metrics between stages
- Workflow/process tracking
- Defined lifecycle stages
- Order fulfillment pipeline
- Insurance claim processing
- Student enrollment lifecycle
- Loan application workflow
Records events or relationships without numeric measures. Captures occurrences or coverage. Two types: event tracking (student attendance) and coverage (product promotions). Foreign keys only, no measures. COUNT(*) is the implicit measure. Essential for tracking what didn't happen or eligibility.
- No numeric measures
- Foreign keys only
- Event occurrence tracking
- Coverage/eligibility recording
- COUNT as implicit measure
- Student class attendance
- Promotion coverage by product
- Employee training completion
- What products were NOT sold
Grain & Granularity
The most critical design decision - defines what one row represents. Must be declared before choosing dimensions or facts. Expressed as 'one row per X'. Determines what questions can be answered. Too coarse loses detail; too fine wastes storage. Cannot mix grains in same fact table.
- Defines row meaning precisely
- Set before dimension/fact selection
- Constrains all design decisions
- Documented in data dictionary
- Immutable once data loaded
- One row per order line item
- One row per day per product per store
- One row per customer per month
- One row per click per session
Lowest level of detail captured by source system. Most granular data possible. Provides maximum analytical flexibility. Supports any aggregation or drill-down. Often required for compliance and audit. Higher storage costs but future-proof for unknown requirements.
- Finest level of detail
- Maximum query flexibility
- Supports all aggregations
- Required for drill-through
- Source system fidelity
- Transaction-level analysis
- Audit and compliance
- Ad-hoc exploration
- Machine learning features
Pre-summarized data at higher level than atomic. Improves query performance for common analyses. Must be derived from atomic facts. Aggregate navigation in BI tools. Separate aggregate fact tables or materialized views. Trade-off: faster queries vs storage and ETL complexity.
- Pre-computed summaries
- Faster query response
- Derived from atomic grain
- Maintained via ETL or materialized views
- Aggregate awareness in BI tools
- Daily sales summaries
- Monthly rollups
- Category-level aggregates
- Dashboard performance
Combining different grains in same fact table - a design flaw. Causes incorrect aggregations and confusing results. Example: mixing order header and line items. Creates NULL foreign keys and broken joins. Always separate into distinct fact tables. Consolidate in views or reporting layer if needed.
- Multiple row meanings
- NULL dimension keys
- Incorrect SUM/COUNT results
- Confusing for analysts
- ETL complexity
- Recognize and refactor
- Split into separate facts
- Create header/line fact pairs
- Document grain violations
Dimension Types
Dimension shared across multiple fact tables and subject areas. Single definition of business entity (customer, product, date). Enables cross-process analysis and enterprise reporting. Bus architecture foundation. Same keys, attributes, and hierarchies everywhere. Master data management integration.
- Shared across fact tables
- Single source of truth
- Enables cross-functional analysis
- Consistent hierarchies
- Enterprise bus architecture
- Customer dimension across sales/service
- Product dimension across inventory/sales
- Date dimension for all facts
- Geography dimension enterprise-wide
Same physical dimension used multiple times in one fact with different meanings. Common for date dimension (order date, ship date, due date). Implemented as views or aliases. Each role has different foreign key in fact. Avoids dimension duplication while supporting multiple perspectives.
- Single physical dimension
- Multiple logical roles
- Views/aliases for each role
- Separate FK per role in fact
- Common with Date dimension
- Order date vs ship date vs due date
- Ship-from vs ship-to location
- Buyer vs seller customer
- Primary vs secondary contact
Combines low-cardinality flags and indicators into single dimension. Avoids cluttering fact table with multiple flag columns. Cartesian product of all flag combinations. Typically small (dozens to hundreds of rows). Improves fact table compression. Named 'junk' but actually good design.
- Low-cardinality attributes
- Flags and indicators combined
- Small dimension table
- Improves fact table design
- Pre-computed combinations
- Yes/No flags (is_returned, is_promoted)
- Status codes (new, approved, rejected)
- Transaction type indicators
- Order flags combination
Dimension key stored in fact table with no corresponding dimension table. Typically transaction identifiers (order number, invoice number). No attributes to store in dimension. Provides drill-through to source. Occupies dimension position but has no join. Common in transaction fact tables.
- Key only, no dimension table
- Transaction identifiers
- Stored directly in fact
- Enables source drill-through
- No attributes to describe
- Order number
- Invoice number
- Transaction ID
- Control numbers
Secondary dimension joined to primary dimension (not fact). Normalizes dimension hierarchy. Snowflake extension of star schema. Used when dimension attribute has its own attributes. Date dimension as outrigger to other dimensions. Trade-off: normalization vs query complexity.
- Dimension joins to dimension
- Hierarchy normalization
- Reduces dimension redundancy
- Adds join complexity
- Snowflake characteristic
- Date attributes on dimension changes
- Geography hierarchy normalized
- Reference data linked to dimension
- Shared sub-dimension
Resolves many-to-many relationships between fact and dimension. Sits between fact and multi-valued dimension. Contains weighting factors for allocation. Examples: multiple diagnoses per patient, multiple categories per product. Enables M:M without violating star schema.
- Many-to-many resolution
- Weighting factors
- Between fact and dimension
- Group keys for filtering
- Allocation percentages
- Multiple diagnoses per visit
- Multiple sales reps per deal
- Product in multiple categories
- Multi-valued attributes
Slowly Changing Dimensions (SCD)
Never update dimension attributes. Original value preserved forever. Simplest approach - no change handling needed. Used for attributes that should never change or where original value matters. Often used for original customer segment or first order date.
- No updates ever
- Original value preserved
- Simplest implementation
- No history tracking
- Insert-only pattern
- Original customer acquisition channel
- First order date
- Initial credit score
- Registration source
Replace old value with new value. No history preserved. Current state only. Simplest change handling. Used when history is irrelevant or corrections to bad data. Historical reports will show new values retroactively. Fast and space-efficient.
- Simple UPDATE statement
- No history preserved
- Current value only
- Historical reports affected
- Corrections and fixes
- Error corrections
- Name spelling fixes
- Attributes where history irrelevant
- Reference data updates
Insert new row for each change, preserving full history. Surrogate key changes, natural key stays same. Effective date range (start/end dates) or current flag. Most common for historical analysis. Enables point-in-time reporting. Dimension table grows with changes.
- Full history preserved
- New surrogate key per version
- Effective date ranges
- Current record flag
- Point-in-time analysis
- Customer address history
- Product price changes
- Employee department transfers
- Status changes over time
Add column to store previous value. Limited history (typically current and previous only). Same row, new columns. Quick access to both values. Simple queries comparing before/after. Limited to small number of changes tracked.
- Previous value column
- Limited history depth
- Same row updated
- Easy before/after comparison
- Simple implementation
- Previous and current address
- Before/after reorganization
- Last price vs current price
- Simple A/B comparisons
Rapidly changing attributes split into separate mini-dimension. Keeps main dimension stable. Mini-dimension has its own surrogate key in fact. Used for attributes that change frequently (age bands, income ranges). Reduces Type 2 row explosion.
- Separate rapidly changing attributes
- Stable main dimension
- Mini-dim FK in fact table
- Reduces row proliferation
- Band/range attributes
- Customer age bands
- Credit score ranges
- Income brackets
- Loyalty tier levels
Combines Types 1, 2, and 3. New row for history (Type 2), current value column on all rows (Type 1 behavior), previous value column (Type 3). Maximum flexibility for analysis. Complex to implement and maintain. Also called 'Type 2+' or 'Hybrid SCD'.
- Full history (Type 2)
- Current value on all rows
- Previous value column
- Maximum query flexibility
- Complex ETL logic
- Current vs historical analysis
- Trend analysis with current state
- Flexible reporting requirements
- Analytical data warehouses
Measures & Metrics
Can be summed across all dimensions. Most common and useful measure type. Examples: sales amount, quantity, cost. Freely aggregated by any dimension. Foundation of most business metrics. Sum, count operations always valid.
- Sum across all dimensions
- Most flexible for analysis
- Standard aggregations work
- Revenue, quantity, cost
- Foundation metrics
- Total sales amount
- Units sold
- Total cost
- Transaction counts
Can be summed across some dimensions but not all (typically not time). Balance and inventory measures. Snapshot values that shouldn't be summed over time. Use AVG, MIN, MAX, or LAST for time dimension. Common in periodic snapshot facts.
- Additive across most dimensions
- Not additive across time
- Balance/inventory measures
- Snapshot point-in-time values
- Special time aggregation needed
- Account balances
- Inventory levels
- Headcount snapshots
- Pipeline values
Cannot be summed across any dimension. Ratios, percentages, unit prices. Must be calculated from additive components. Store components, calculate ratio in reporting. Examples: profit margin, conversion rate, average price.
- Cannot be summed
- Ratios and percentages
- Calculate from components
- Store numerator/denominator
- Derived measures
- Profit margin percentage
- Conversion rates
- Unit prices
- Averages and ratios
Computed from other measures at query time. Not stored in fact table. Business logic in BI tool or view. Examples: profit (revenue - cost), margin (profit/revenue). Keeps fact table clean. Trade-off between storage and compute.
- Calculated at query time
- Not physically stored
- Business logic in reports
- Derived from stored measures
- Flexible definitions
- Profit = Revenue - Cost
- Year-over-year growth
- Moving averages
- Variance calculations
