Will Percey — Portfolio

Dimensional Modeling

> > Updated Dec 2025

table_chart

Fact Tables

Transaction Fact Table

Records individual business events at the most atomic grain. One row per transaction (sale, order, click). Captures the 'who, what, when, where, how much' of each event. Most common fact table type. Supports detailed drill-down analysis. Can grow very large but provides maximum flexibility for analysis.

Key Features

One row per atomic event
Highest level of detail
Foreign keys to all dimensions
Additive measures (quantity, amount)
Timestamp for event occurrence

Use Cases

Point-of-sale transactions
Order line items
Clickstream events
Financial transactions

Related Concepts

Periodic SnapshotAccumulating SnapshotFactless Fact

Periodic Snapshot Fact Table

Captures state at regular intervals (daily, weekly, monthly). One row per period per entity being measured. Shows cumulative or point-in-time metrics. Ideal for inventory, account balances, pipeline status. Predictable row count based on periods and entities. Complements transaction facts for trend analysis.

Key Features

Fixed time intervals
Point-in-time measurements
Predictable table growth
Semi-additive measures common
Period dimension required

Use Cases

Daily inventory levels
Monthly account balances
Weekly pipeline snapshots
End-of-day positions

Related Concepts

Transaction FactAccumulating SnapshotAggregate Table

Accumulating Snapshot Fact Table

Tracks lifecycle of a process with defined milestones. Single row updated as entity progresses through stages. Multiple date foreign keys for each milestone. Lag calculations between milestones. Row is updated (not inserted) as process advances. Best for processes with clear start/end and measurable stages.

Key Features

Multiple milestone dates per row
Row updates as process progresses
Lag metrics between stages
Workflow/process tracking
Defined lifecycle stages

Use Cases

Order fulfillment pipeline
Insurance claim processing
Student enrollment lifecycle
Loan application workflow

Related Concepts

Transaction FactPeriodic SnapshotProcess Mining

Factless Fact Table

Records events or relationships without numeric measures. Captures occurrences or coverage. Two types: event tracking (student attendance) and coverage (product promotions). Foreign keys only, no measures. COUNT(*) is the implicit measure. Essential for tracking what didn't happen or eligibility.

Key Features

No numeric measures
Foreign keys only
Event occurrence tracking
Coverage/eligibility recording
COUNT as implicit measure

Use Cases

Student class attendance
Promotion coverage by product
Employee training completion
What products were NOT sold

Related Concepts

Bridge TableTransaction FactCoverage Table

grain

Grain & Granularity

Grain Declaration

The most critical design decision - defines what one row represents. Must be declared before choosing dimensions or facts. Expressed as 'one row per X'. Determines what questions can be answered. Too coarse loses detail; too fine wastes storage. Cannot mix grains in same fact table.

Key Features

Defines row meaning precisely
Set before dimension/fact selection
Constrains all design decisions
Documented in data dictionary
Immutable once data loaded

Use Cases

One row per order line item
One row per day per product per store
One row per customer per month
One row per click per session

Related Concepts

Aggregate GrainAtomic GrainDerived Grain

Atomic Grain

Lowest level of detail captured by source system. Most granular data possible. Provides maximum analytical flexibility. Supports any aggregation or drill-down. Often required for compliance and audit. Higher storage costs but future-proof for unknown requirements.

Key Features

Finest level of detail
Maximum query flexibility
Supports all aggregations
Required for drill-through
Source system fidelity

Use Cases

Transaction-level analysis
Audit and compliance
Ad-hoc exploration
Machine learning features

Related Concepts

Summary GrainAggregated GrainRolled-up Grain

Aggregate Grain

Pre-summarized data at higher level than atomic. Improves query performance for common analyses. Must be derived from atomic facts. Aggregate navigation in BI tools. Separate aggregate fact tables or materialized views. Trade-off: faster queries vs storage and ETL complexity.

Key Features

Pre-computed summaries
Faster query response
Derived from atomic grain
Maintained via ETL or materialized views
Aggregate awareness in BI tools

Use Cases

Daily sales summaries
Monthly rollups
Category-level aggregates
Dashboard performance

Related Concepts

Atomic GrainOLAP CubeMaterialized View

Mixed Grain (Anti-pattern)

Combining different grains in same fact table - a design flaw. Causes incorrect aggregations and confusing results. Example: mixing order header and line items. Creates NULL foreign keys and broken joins. Always separate into distinct fact tables. Consolidate in views or reporting layer if needed.

Key Features

Multiple row meanings
NULL dimension keys
Incorrect SUM/COUNT results
Confusing for analysts
ETL complexity

Use Cases

Recognize and refactor
Split into separate facts
Create header/line fact pairs
Document grain violations

Related Concepts

Separate Fact TablesHeader/Detail FactsProper Grain Design

Dimension Types

Conformed Dimension

Dimension shared across multiple fact tables and subject areas. Single definition of business entity (customer, product, date). Enables cross-process analysis and enterprise reporting. Bus architecture foundation. Same keys, attributes, and hierarchies everywhere. Master data management integration.

Key Features

Shared across fact tables
Single source of truth
Enables cross-functional analysis
Consistent hierarchies
Enterprise bus architecture

Use Cases

Customer dimension across sales/service
Product dimension across inventory/sales
Date dimension for all facts
Geography dimension enterprise-wide

Related Concepts

Local DimensionRole-playing DimensionOutrigger

Role-Playing Dimension

Same physical dimension used multiple times in one fact with different meanings. Common for date dimension (order date, ship date, due date). Implemented as views or aliases. Each role has different foreign key in fact. Avoids dimension duplication while supporting multiple perspectives.

Key Features

Single physical dimension
Multiple logical roles
Views/aliases for each role
Separate FK per role in fact
Common with Date dimension

Use Cases

Order date vs ship date vs due date
Ship-from vs ship-to location
Buyer vs seller customer
Primary vs secondary contact

Related Concepts

Separate DimensionsJunk DimensionBridge Table

Junk Dimension

Combines low-cardinality flags and indicators into single dimension. Avoids cluttering fact table with multiple flag columns. Cartesian product of all flag combinations. Typically small (dozens to hundreds of rows). Improves fact table compression. Named 'junk' but actually good design.

Key Features

Low-cardinality attributes
Flags and indicators combined
Small dimension table
Improves fact table design
Pre-computed combinations

Use Cases

Yes/No flags (is_returned, is_promoted)
Status codes (new, approved, rejected)
Transaction type indicators
Order flags combination

Related Concepts

Flags in FactSeparate Mini-dimensionsDegenerate Dimension

Degenerate Dimension

Dimension key stored in fact table with no corresponding dimension table. Typically transaction identifiers (order number, invoice number). No attributes to store in dimension. Provides drill-through to source. Occupies dimension position but has no join. Common in transaction fact tables.

Key Features

Key only, no dimension table
Transaction identifiers
Stored directly in fact
Enables source drill-through
No attributes to describe

Use Cases

Order number
Invoice number
Transaction ID
Control numbers

Related Concepts

Mini-dimensionOutriggerSeparate Dimension

Outrigger Dimension

Secondary dimension joined to primary dimension (not fact). Normalizes dimension hierarchy. Snowflake extension of star schema. Used when dimension attribute has its own attributes. Date dimension as outrigger to other dimensions. Trade-off: normalization vs query complexity.

Key Features

Dimension joins to dimension
Hierarchy normalization
Reduces dimension redundancy
Adds join complexity
Snowflake characteristic

Use Cases

Date attributes on dimension changes
Geography hierarchy normalized
Reference data linked to dimension
Shared sub-dimension

Related Concepts

Denormalized in DimensionBridge TableStar Schema

Bridge Table

Resolves many-to-many relationships between fact and dimension. Sits between fact and multi-valued dimension. Contains weighting factors for allocation. Examples: multiple diagnoses per patient, multiple categories per product. Enables M:M without violating star schema.

Key Features

Many-to-many resolution
Weighting factors
Between fact and dimension
Group keys for filtering
Allocation percentages

Use Cases

Multiple diagnoses per visit
Multiple sales reps per deal
Product in multiple categories
Multi-valued attributes

Related Concepts

Junk DimensionFactless FactArray in Dimension

history

Slowly Changing Dimensions (SCD)

SCD Type 0 - Retain Original

Never update dimension attributes. Original value preserved forever. Simplest approach - no change handling needed. Used for attributes that should never change or where original value matters. Often used for original customer segment or first order date.

Key Features

No updates ever
Original value preserved
Simplest implementation
No history tracking
Insert-only pattern

Use Cases

Original customer acquisition channel
First order date
Initial credit score
Registration source

Related Types

Type 1Type 2Type 3

SCD Type 1 - Overwrite

Replace old value with new value. No history preserved. Current state only. Simplest change handling. Used when history is irrelevant or corrections to bad data. Historical reports will show new values retroactively. Fast and space-efficient.

Key Features

Simple UPDATE statement
No history preserved
Current value only
Historical reports affected
Corrections and fixes

Use Cases

Error corrections
Name spelling fixes
Attributes where history irrelevant
Reference data updates

Related Types

Type 0Type 2Type 6

SCD Type 2 - Add New Row

Insert new row for each change, preserving full history. Surrogate key changes, natural key stays same. Effective date range (start/end dates) or current flag. Most common for historical analysis. Enables point-in-time reporting. Dimension table grows with changes.

Key Features

Full history preserved
New surrogate key per version
Effective date ranges
Current record flag
Point-in-time analysis

Use Cases

Customer address history
Product price changes
Employee department transfers
Status changes over time

Related Types

Type 1Type 3Type 6

SCD Type 3 - Add New Column

Add column to store previous value. Limited history (typically current and previous only). Same row, new columns. Quick access to both values. Simple queries comparing before/after. Limited to small number of changes tracked.

Key Features

Previous value column
Limited history depth
Same row updated
Easy before/after comparison
Simple implementation

Use Cases

Previous and current address
Before/after reorganization
Last price vs current price
Simple A/B comparisons

Related Types

Type 2Type 1Type 6

SCD Type 4 - Mini-Dimension

Rapidly changing attributes split into separate mini-dimension. Keeps main dimension stable. Mini-dimension has its own surrogate key in fact. Used for attributes that change frequently (age bands, income ranges). Reduces Type 2 row explosion.

Key Features

Separate rapidly changing attributes
Stable main dimension
Mini-dim FK in fact table
Reduces row proliferation
Band/range attributes

Use Cases

Customer age bands
Credit score ranges
Income brackets
Loyalty tier levels

Related Types

Type 2Type 6Junk Dimension

SCD Type 6 - Hybrid (1+2+3)

Combines Types 1, 2, and 3. New row for history (Type 2), current value column on all rows (Type 1 behavior), previous value column (Type 3). Maximum flexibility for analysis. Complex to implement and maintain. Also called 'Type 2+' or 'Hybrid SCD'.

Key Features

Full history (Type 2)
Current value on all rows
Previous value column
Maximum query flexibility
Complex ETL logic

Use Cases

Current vs historical analysis
Trend analysis with current state
Flexible reporting requirements
Analytical data warehouses

Related Types

Type 2Type 3Type 4

functions

Measures & Metrics

Additive Measures

Can be summed across all dimensions. Most common and useful measure type. Examples: sales amount, quantity, cost. Freely aggregated by any dimension. Foundation of most business metrics. Sum, count operations always valid.

Key Features

Sum across all dimensions
Most flexible for analysis
Standard aggregations work
Revenue, quantity, cost
Foundation metrics

Use Cases

Total sales amount
Units sold
Total cost
Transaction counts

Related Concepts

Semi-additiveNon-additiveCalculated Measures

Semi-Additive Measures

Can be summed across some dimensions but not all (typically not time). Balance and inventory measures. Snapshot values that shouldn't be summed over time. Use AVG, MIN, MAX, or LAST for time dimension. Common in periodic snapshot facts.

Key Features

Additive across most dimensions
Not additive across time
Balance/inventory measures
Snapshot point-in-time values
Special time aggregation needed

Use Cases

Account balances
Inventory levels
Headcount snapshots
Pipeline values

Related Concepts

AdditiveNon-additivePeriod-end Values

Non-Additive Measures

Cannot be summed across any dimension. Ratios, percentages, unit prices. Must be calculated from additive components. Store components, calculate ratio in reporting. Examples: profit margin, conversion rate, average price.

Key Features

Cannot be summed
Ratios and percentages
Calculate from components
Store numerator/denominator
Derived measures

Use Cases

Profit margin percentage
Conversion rates
Unit prices
Averages and ratios

Related Concepts

AdditiveSemi-additiveCalculated Metrics

Derived/Calculated Measures

Computed from other measures at query time. Not stored in fact table. Business logic in BI tool or view. Examples: profit (revenue - cost), margin (profit/revenue). Keeps fact table clean. Trade-off between storage and compute.

Key Features

Calculated at query time
Not physically stored
Business logic in reports
Derived from stored measures
Flexible definitions

Use Cases

Profit = Revenue - Cost
Year-over-year growth
Moving averages
Variance calculations

Related Concepts

Stored MeasuresPre-calculatedETL-computed