Friday, March 27, 2015

SSAS Database/Cube Processing Best Practices.

Small Databases
If the SSAS database is relatively small & you have a big enough load window do a full process on the database.

Large Databases
If the SSAS database is big & needs to be updated regularly (daily) & load window is not enough to do full process.
  1. Do process update on all the dimensions. 
  2. Do Process Data on the small measure groups.
  3. For big measure groups - Do a process Data on latest partition. ( Current Year or Month).
  4. Finally do process index on the cube.  Process Index rebuilds the aggregations and bitmap indexes on partitions.  Note:   If dimension members are changed/deleted, Process Update (step 1) can drop aggregations & bitmap indexes of effected partitions , the Process Index will rebuild them.
  5. Once a week or whenever you have a big enough load window do a process full on the database. This will make cube & tables in synch.

All the above can be easily accomplished by SSIS Process Cube task or XMLA scripts executed by SSIS.




Friday, March 20, 2015

SSAS (Multi Dimensional) Best Practices.

Dimension Design
  • Do not include attributes that are not needed. They slow down cube processing & also generate unnecessary aggregates which slow down cube performance.
  • Set AttributeHierarchyEnabled = False for attributes that will not be used in queries and will be used as member properties. e.g. SSN, Phone No, Address. Also set it to false for members that have high cardinality or 1-1 relationship with key attribute.
  • Set AttributeHierarchyVisible = False for attributes used in user defined hierarchy - avoids confusion to users.
  • Define appropriate attribute relationship. This enhances query as well as processing performance. 
  • Set attribute attribute RelationshipType = Rigid for attributes that will not change. The aggregates for these attributes will not be reprocessed when cube is updated.
  • Create appropriate user defined hierarchies. This will improve the usability of cube.


Measure Group/ Aggregate Design

  • Run the Aggregation Design Wizard to create the preliminary aggregation design. Specify correct counts for facts and dimension attributes. Target 20-30% performance gain.
  • When the cube has been deployed, you can turn on query logging and then use the Usage-Based Optimization Wizard. Target higher performance gain (70-80%), since the aggregates are based on usage.
  • Set AggregationUsage Property of Attributes as follows -
    • None for infrequently used attributes or for attributes not used in aggregate queries. This will prevent aggregates from being created for those attributes.
    • Unrestricted for attributes which will be used in queries. This will create aggregations for the attribute. By default key attributes and attributes used in natural hierarchy are unrestricted.
  • Do not include too many aggregations - query will be slower due to large search space, processing takes longer.
  • Partition the measure group. Queries & processing will be faster because multiple threads will be used.
  • Frequently used partitions can have higher aggregations.        
  • Older partitions that are fairly-static can be processed infrequently and newer partitions processed regularly (daily).
  • Use Smallest datatypes for measures .

Friday, March 6, 2015

Dimensional Modelling Best Practices

Fact Tables Design

  1. Fact tables should be in third normal form. (Skinny & Long).
  2. They should have only surrogate keys (Dimension IDs) and measures.
  3. Fact tables should have a clearly defined grain (What each row of fact table represents).
  4. The facts should be true to the grain of fact table. Don't mix facts at different grains.
  5. The facts should be additive across all the dimensions. If they are not, then they are probably at different grain and belong to a different fact tables.
  6. To combine information from multiple fact tables (drill across) group the facts to same level of detail and then combine them, CTE or temp tables can be a good aid.
  7. Transaction fact tables can contain transaction identifiers to provide traceability to the source. These are also called degenerate dimensions.
  8. There should be one fact table per business process.

Use the Right Type of Fact Table.


Transaction Fact  
Capture individual transactions/encounters. e.g. Orders, Invoices, Shipments. These fact tables are normally not updated.

Accumulating Snapshot
One fact record per entity that is updated as the entity moves through its lifecycle.
e.g. One Row per Mortgage application. The record is updated as the application moves through different stages. Allows easy determination for time spent in various stages.
One Row per policy, the record is updated to reflect the latest policy status.
Accumulating Snapshot Fact can also be derived from an existing Transaction Fact.

Periodic Snapshot
Calculated once per period. Reflects activity for the period.
e.g. Month end balances for bank customers. Month end inventory levels at a Warehouse.
Periodic snapshot have one semi-additive dimension reflecting snapshot time. The facts cannot be aggregated across the semi-additive dimension.
Periodic snapshot  can be derived from Transaction Fact table. e.g. Month End balance snapshot can be derived by adding all the debit and credit transactions from Transaction Fact.

Factless Fact
Contain no measures.
Provide counts. e.g. number of students enrolled in a class.
Provide coverage. e.g. all the products on sale in a given week.


Dimension Tables Design

  1. Dimension tables should be in second normal form.
  2. Avoid the temptation to create snowflakes (Third normal form). They complicate the model and analyst queries.
  3. Add rich set of attributes to dimension tables. Store the calculated/derived attributes in dimension tables rather can calculating on the fly in reports.
  4. Use conformed dimensions to allow drill across fact tables.
  5. When relation between dimension attributes changes with context, place them in different dim tables. e.g. MD & Hospital relation can occur in multiple context, Location Manager  & Hospital  relation occurs in only one context.   So MD belongs to a different table. Location Manager & Hospital belong to same table.

Use Right Type of Dimension Table


Type 1 Slowly Changing Dimension Table
Type 2 Slowly Changing Dimension Table
Junk Dimension
Behavioral Dimension
Degenerate Dimension