Tuesday, January 20, 2015

Indexing Tips for SQL Server Data Warehouse.


We would be using the following simplified schema to illustrate the indexing tips.

























Fact Tables

Create clustered index on primary date column. Most of your queries and reports will be filtering on primary date based on date range, so clustering it will give maximum performance. For the FactInvoice table above cluster on InvoiceDateSK.

Create non-clustered primary key or unique index on columns that define the grain. For example in FactInvoice table, if each row represents invoice line item (Invoice, Product), then create non-clustered primary key on InvoiceNumber and ProductSK.

Alternatively, if date is part of primary key, then make date as the first column in the primary key and cluster it. This will allow queries/reports which use date filter to do clustered index seek.  In the example above InvoiceDateSK, ProductSk and InvoiceNumber could be your clustered primary key.

Keep the fact table index light. Other indexes are not necessary.

Data warehouse queries typically involve large scans of data and aggregation. Very selective seeks are not common for reports from a DW. Therefore, nonclustered indexes on fact tables generally don’t help much.

Dimension Tables.

Create clustered index on Dimension Key, also known as Surrogate Key.
For the Product Dimension (DimProduct), create clustered primary key on ProductSK which is surrogate key.

Create unique index on business key for Type 1 Dimensions. For DimProduct Create a unique index on SourceProductID which is the business key.

Create unique index on business key, Effective Start date and Effective End Date for Type 2 Dimensions. For DimCustomer create a unique index on SourceCustomerID, EffectiveStartDate and EffectiveEndDate.

Create non-unique indexes on other columns if their selectivity is high and if they would be used in reports.



No comments: