Friday, March 6, 2015

Dimensional Modelling Best Practices

Fact Tables Design

  1. Fact tables should be in third normal form. (Skinny & Long).
  2. They should have only surrogate keys (Dimension IDs) and measures.
  3. Fact tables should have a clearly defined grain (What each row of fact table represents).
  4. The facts should be true to the grain of fact table. Don't mix facts at different grains.
  5. The facts should be additive across all the dimensions. If they are not, then they are probably at different grain and belong to a different fact tables.
  6. To combine information from multiple fact tables (drill across) group the facts to same level of detail and then combine them, CTE or temp tables can be a good aid.
  7. Transaction fact tables can contain transaction identifiers to provide traceability to the source. These are also called degenerate dimensions.
  8. There should be one fact table per business process.

Use the Right Type of Fact Table.


Transaction Fact  
Capture individual transactions/encounters. e.g. Orders, Invoices, Shipments. These fact tables are normally not updated.

Accumulating Snapshot
One fact record per entity that is updated as the entity moves through its lifecycle.
e.g. One Row per Mortgage application. The record is updated as the application moves through different stages. Allows easy determination for time spent in various stages.
One Row per policy, the record is updated to reflect the latest policy status.
Accumulating Snapshot Fact can also be derived from an existing Transaction Fact.

Periodic Snapshot
Calculated once per period. Reflects activity for the period.
e.g. Month end balances for bank customers. Month end inventory levels at a Warehouse.
Periodic snapshot have one semi-additive dimension reflecting snapshot time. The facts cannot be aggregated across the semi-additive dimension.
Periodic snapshot  can be derived from Transaction Fact table. e.g. Month End balance snapshot can be derived by adding all the debit and credit transactions from Transaction Fact.

Factless Fact
Contain no measures.
Provide counts. e.g. number of students enrolled in a class.
Provide coverage. e.g. all the products on sale in a given week.


Dimension Tables Design

  1. Dimension tables should be in second normal form.
  2. Avoid the temptation to create snowflakes (Third normal form). They complicate the model and analyst queries.
  3. Add rich set of attributes to dimension tables. Store the calculated/derived attributes in dimension tables rather can calculating on the fly in reports.
  4. Use conformed dimensions to allow drill across fact tables.
  5. When relation between dimension attributes changes with context, place them in different dim tables. e.g. MD & Hospital relation can occur in multiple context, Location Manager  & Hospital  relation occurs in only one context.   So MD belongs to a different table. Location Manager & Hospital belong to same table.

Use Right Type of Dimension Table


Type 1 Slowly Changing Dimension Table
Type 2 Slowly Changing Dimension Table
Junk Dimension
Behavioral Dimension
Degenerate Dimension


1 comment:

Bhupesh said...

Hi Shakeel,

Your post is wonderful!!. This is the first time luckily got into your post. Feels happy!
seeing more informative details on dwh and best practices.
Could you please publish on how to start developing a DWH project from Scratch for a beginner like me!!.

Thanks a lot!

-Bhupesh