Wednesday, August 11, 2010

SQL Server 2008 BI Enhancements.

Merge Statement
Allows you to perform INSERT/UPDATE/DELETE in one statement. 
Simplifies Insert/Update operations on fact / dimension tables.

To see how to apply Merge Statement for Type 1/Type 2 Updates on Dimension table refer to this article - http://www.kimballgroup.com/2008/11/design-tip-107-using-the-sql-merge-statement-for-slowly-changing-dimension-processing/

Star Join Optimizations

Bitmap filters from Dimension table are pushed down into the scan of fact table. Only rows from fact table that qualify for join from Dimension table are retrieved and others are eliminated. This decreases the numbers of rows that would be processed by each join step.

This improves performance of queries on star schema by 15-20 %.

For Start Join Optimization to work, the join between dimensions and fact table should be inner join and based on single column.

Start Join Optimization is applied to medium selectivity queries, for highly selective queries or queries which would need to scan most of the fact table, the Star Join Optimization is not applied.
  

Data Compression

Data warehousing queries typically scan large amount of data. Compression reduces I/O and hence good for DW queries. Compression also increases page life expectancy since data will be kept as compressed in buffer pool. Compression comes at increased CPU cost, analyze your CPU usage before implementing compression.

Two types of compression are supported.
  1. Row Compression - Fixed-Width data types are stored in variable format. E.g. INT may take less than 4 bytes if it’s small enough.
  2. Page Compression - It includes Row level compression. In addition the common values in the page are stored at one place and referenced from rows/columns.
Row Compression has less CPU overhead but page compression offers better data compression.
Compression can be applied to tables, indexed views, indexes, & partitions (Individual/All).

Alter Table TableName REBUILD with (data_compression=PAGE)
Create Table TableName ( Col1 INT, Col2 varchar(50) ) With (data_compression = PAGE)


Minimal Logging

Minimal logging reduces number of log records that need to be written to disk thereby greatly improving the performance of large insert operations.
SQL Server 2005 Allowed minimal logging for SELECT INTO and BULK INSERT Statements.
SQL Server 2008 Allows Minimal Logging for INSERT INTO SELECT FROM Statement.

Conditions for minimal logging -
Database in Simple Recover or Bulk Logged Recovery Mode.
Target Table is Heap (No Clustered Index), and TABLOCK Hint is used.
OR Table has Clustered Index AND Empty AND TABLOCK Hint is used.

 Insert INTO TargetTable ( Col1, Col2, Col3) WITH (TABLOCK)
 SELECT Col1, Col2, Col3 FROM SourceTable;

Grouping Sets.

Grouping Sets allows you to specify multiple groupings of data in same query. The results are equivalent to UNION ALL of the specified groups.

SELECT  T.SalesTerritoryCountry
      , SUM(F.SalesAmount) AS SalesAmount
FROM  dbo.FactResellerSales F      
      INNER JOIN dbo.DimSalesTerritory T ON
         F.SalesTerritoryKey = T.SalesTerritoryKey
WHERE F.OrderDateKey BETWEEN 20040101 AND 20041231
GROUP BY GROUPING SETS (
        (SalesTerritoryCountry)  /* Group by Sales Territory */
      , ()  /* Total Sales across all Territories */
    )

Provides Sales by Territory (Territory Grouping), and Sales across all Territories (Overall Total)

Partition Aligned Indexed Views

The indexed views on partitioned table are automatically updated when you switch in/out partitions on underlying partitioned table. In SQL 2005, you had to drop the indexed views to perform partition switch.