Wednesday, January 28, 2015

Simulate Multiple Clustered Indexes on a Fact Table

You can use the following trick  if you have two widely used date values in fact table.
We will use the example of Revenue Fact from Hospital system to illustrate the concept.


The  RevenueFact table has service date (ServiceDateID) and Payment post date (PostedDateID )fields.  It is clustered on PostedDateId. Some reports are driven by service date and other by posted date. Since the fact table is clustered on posted date, the reports driven off of service date would be slow.

To fix the problem
  • Create an indexed view on the RevenueFact with ServiceDateID being the first column of the clustered index.
  • Limit the data returned by the view (e.g. last 12 months). This way most of the operational reports/queries which hit recent data would end up using this indexed view.
  • Limit the columns in the view to the ones that your reports/queries would be using.

By following the last two guidelines, you wouldn't end up creating a copy of the fact table, and you wouldn't be slowing your data loads too much.
You don't have to make any changes to the reports, since the optimizer would use the indexed view whenever applicable.


Tuesday, January 20, 2015

Indexing Tips for SQL Server Data Warehouse.


We would be using the following simplified schema to illustrate the indexing tips.

























Fact Tables

Create clustered index on primary date column. Most of your queries and reports will be filtering on primary date based on date range, so clustering it will give maximum performance. For the FactInvoice table above cluster on InvoiceDateSK.

Create non-clustered primary key or unique index on columns that define the grain. For example in FactInvoice table, if each row represents invoice line item (Invoice, Product), then create non-clustered primary key on InvoiceNumber and ProductSK.

Alternatively, if date is part of primary key, then make date as the first column in the primary key and cluster it. This will allow queries/reports which use date filter to do clustered index seek.  In the example above InvoiceDateSK, ProductSk and InvoiceNumber could be your clustered primary key.

Keep the fact table index light. Other indexes are not necessary.

Data warehouse queries typically involve large scans of data and aggregation. Very selective seeks are not common for reports from a DW. Therefore, nonclustered indexes on fact tables generally don’t help much.

Dimension Tables.

Create clustered index on Dimension Key, also known as Surrogate Key.
For the Product Dimension (DimProduct), create clustered primary key on ProductSK which is surrogate key.

Create unique index on business key for Type 1 Dimensions. For DimProduct Create a unique index on SourceProductID which is the business key.

Create unique index on business key, Effective Start date and Effective End Date for Type 2 Dimensions. For DimCustomer create a unique index on SourceCustomerID, EffectiveStartDate and EffectiveEndDate.

Create non-unique indexes on other columns if their selectivity is high and if they would be used in reports.