Sql-server – Performance Tuning for Huge Table (SQL Server 2008 R2)

sql-server-2008-r2

Background:
I have a fact table in the UAT phase. The objective is to load 5 years of data in Prod (expected size 400 million records). Currently it has only 2 years of data in Test.

Table Features:

No of Dimensions ~ 45
Measures ~ 30
Non-additive measures and other columns ~ 25
Current data size ~ 200 million (2 years of data)
Time view: 3 different month views: Fiscal/Calendar/Adjusted (i.e. the same row can fall in different months depending on which view is used)
Only one view is required at a time by a user (i.e. only one month column is used in a query, which is what stops us from partitioning on the time view).
Indexes: 1 clustered index on the natural keys (8 columns). 3 covering nonclustered indexes, one on each month column, each including a few dimension SKs (FKs) and all the measures.
Indexes are huge (190 GB in total) because of this.
Space is not a constraint (1 TB allocated).
64 GB of RAM is available on the server.
Table compression is also applied.

Requirement:
Queries on this fact table should return results within 30 seconds (typical queries select sum(measure), join a few dims, and group by dim values). Reports run directly on top of this fact table.

Issue:
Any query that uses only columns available in the indexes works fine, but if we include any other column that is not in the INCLUDE list, it sucks: it takes 5-10 minutes or more. Can anyone suggest a solution that works well for any dimension/column we select? Can an indexed view help in this situation?

Best Answer

Upgrade to SQL Server 2012 and use columnstores. They thrive under these requirements. Seriously, download the evaluation edition and give it a try. Drop all indexes, drop the clustered index, simply add a nonclustered columnstore index on all columns and give it a whirl. I've seen cases just like yours where the execution time dropped to 2-3 seconds, mostly because of segment elimination kicking in.
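As a rough illustration of that suggestion, the sketch below creates a nonclustered columnstore index and runs a query of the shape described in the question. The table names (dbo.FactSales, dbo.DimProduct), the column names and the index name are all hypothetical, not taken from the question; a real fact table would list all of its columns in the index.

```sql
-- Hypothetical dimension and fact tables, reduced to a handful of columns.
CREATE TABLE dbo.DimProduct
(
    ProductKey      int         NOT NULL PRIMARY KEY,
    ProductCategory varchar(50) NOT NULL
);

CREATE TABLE dbo.FactSales
(
    FiscalMonthKey   int           NOT NULL,
    CalendarMonthKey int           NOT NULL,
    AdjustedMonthKey int           NOT NULL,
    ProductKey       int           NOT NULL,
    SalesAmount      decimal(18,2) NOT NULL,
    Quantity         int           NOT NULL
);

-- One nonclustered columnstore index covering every column of the table
-- (SQL Server 2012 syntax).
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales
(
    FiscalMonthKey,
    CalendarMonthKey,
    AdjustedMonthKey,
    ProductKey,
    SalesAmount,
    Quantity
);

-- Query shape from the question: aggregate a measure, join a dimension,
-- group by a dimension attribute, filter on one of the month columns.
SELECT d.ProductCategory, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
    INNER JOIN dbo.DimProduct AS d ON d.ProductKey = f.ProductKey
WHERE f.FiscalMonthKey = 201206
GROUP BY d.ProductCategory;
```

One thing to keep in mind when trying this: in SQL Server 2012 a table cannot be modified while it has a nonclustered columnstore index, so loads typically disable or drop the index and rebuild it afterwards, or bring new data in through partition switching.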

Related Solutions

SQL Server Partitioning – Partitioning and Clustered Indexes in SQL Server 2008

A partitioned table is really more like a collection of individual tables stitched together. So in your example of clustering by IncidentKey and partitioning by IncidentDate, say that the partitioning function splits the table into two partitions so that 1/1/2010 is in partition 1 and 7/1/2010 is in partition 2. The data will be laid out on disk as:

Partition 1:
IncidentKey    Date
ABC123        1/1/2010
ABC123        2/1/2010
XYZ999        1/1/2010

Partition 2:
IncidentKey    Date
ABC123        7/1/2010
XYZ999        7/1/2010

At a low level there really are two distinct rowsets. It is the query processor that gives the illusion of a single table by creating plans that seek, scan and update all rowsets together, as one.
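The layout above could be reproduced with something like the following sketch. The object names (pfIncidentDate, psIncidentDate, dbo.Incident) and the column types are assumptions, the dates are read as month/day/year, and a single boundary at July 1, 2010 is assumed so that the table ends up with exactly two partitions.

```sql
-- RANGE RIGHT with one boundary value gives two partitions:
-- partition 1 holds dates before 2010-07-01, partition 2 holds the rest.
CREATE PARTITION FUNCTION pfIncidentDate (date)
AS RANGE RIGHT FOR VALUES ('20100701');

-- Map every partition to the PRIMARY filegroup to keep the sketch simple.
CREATE PARTITION SCHEME psIncidentDate
AS PARTITION pfIncidentDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.Incident
(
    IncidentKey  char(6) NOT NULL,
    IncidentDate date    NOT NULL
) ON psIncidentDate (IncidentDate);

-- Clustered by IncidentKey, partitioned by IncidentDate.
CREATE CLUSTERED INDEX CIX_Incident
ON dbo.Incident (IncidentKey, IncidentDate)
ON psIncidentDate (IncidentDate);

INSERT INTO dbo.Incident (IncidentKey, IncidentDate)
VALUES ('ABC123', '20100101'),
       ('XYZ999', '20100101'),
       ('ABC123', '20100701'),
       ('XYZ999', '20100701');

-- Show which partition (rowset) each row landed in; within a partition
-- the rows are ordered by the clustered key.
SELECT $PARTITION.pfIncidentDate(IncidentDate) AS PartitionNumber,
       IncidentKey,
       IncidentDate
FROM dbo.Incident
ORDER BY PartitionNumber, IncidentKey, IncidentDate;

-- One row per partition of the clustered index (index_id = 1).
SELECT partition_number, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.Incident')
  AND index_id = 1;
```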

Any row in any non-clustered index will have the clustered index key to which it corresponds, say ABC123,7/1/2010. Since the clustered index key always contains the partitioning key column, the engine will always know in what partition (rowset) of the clustered index to search for this value (in this case, in partition 2).

Now whenever you're dealing with partitioning you must consider if your NC indexes will be aligned (NC index is partitioned exactly the same as the clustered index) or non-aligned (NC index is non-partitioned, or partitioned differently from clustered index). Non-aligned indexes are more flexible, but they have some drawbacks:

non-aligned indexes require large amounts of memory for certain query plans
non-aligned indexes prevent efficient partition switch operations

Using aligned indexes solves these issues, but brings its own set of problems, because this physical storage design option ripples into the data model:

aligned indexes mean unique constraints can no longer be created/enforced unless they include the partitioning column
all foreign keys referencing the partitioned table must include the partitioning key in the relation (since the partitioning key is, due to alignment, in every index), and this in turn requires that all tables referencing the partitioned table contain the partitioning key column value. Think Orders->OrderDetails: if Orders has OrderID but is partitioned by OrderDate, then OrderDetails must contain not only OrderID but also OrderDate in order to properly declare the foreign key constraint.
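The Orders->OrderDetails case can be sketched as below. The table and column names follow the example in the text; the partition function, scheme and constraint names, the data types and the boundary values are assumptions made for illustration.

```sql
CREATE PARTITION FUNCTION pfOrderDate (date)
AS RANGE RIGHT FOR VALUES ('20100101', '20110101');

CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.Orders
(
    OrderID   int  NOT NULL,
    OrderDate date NOT NULL,
    -- An aligned unique index must contain the partitioning column in its key,
    -- so the primary key is (OrderID, OrderDate) rather than OrderID alone.
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID, OrderDate)
) ON psOrderDate (OrderDate);

CREATE TABLE dbo.OrderDetails
(
    OrderID    int  NOT NULL,
    OrderDate  date NOT NULL,   -- carried only so the foreign key can be declared
    LineNumber int  NOT NULL,
    Quantity   int  NOT NULL,
    CONSTRAINT PK_OrderDetails
        PRIMARY KEY CLUSTERED (OrderID, OrderDate, LineNumber),
    -- The foreign key has to reference the full unique key of Orders,
    -- which now includes the partitioning column.
    CONSTRAINT FK_OrderDetails_Orders
        FOREIGN KEY (OrderID, OrderDate)
        REFERENCES dbo.Orders (OrderID, OrderDate)
);
```

Note that with this design OrderID alone is no longer guaranteed unique in Orders; uniqueness is only enforced on the (OrderID, OrderDate) pair.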

I have found that these effects are seldom called out at the beginning of a project that deploys partitioning, but they exist and have serious consequences.

If you think aligned indexes are a rare or extreme case, then consider this: in many cases the cornerstone of ETL and partitioning solutions is the fast switch in of staging tables. Switch in operations require aligned indexes.
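A minimal sketch of such a switch-in follows, assuming a hypothetical dbo.Sales table partitioned by SaleDate and a staging table dbo.Sales_Staging; all names, types and boundary values are illustrative. The staging table mirrors the target's columns and indexes, sits on the same filegroup as the target partition, and has a CHECK constraint limiting its rows to that partition's range.

```sql
CREATE PARTITION FUNCTION pfSaleDate (date)
AS RANGE RIGHT FOR VALUES ('20100101', '20110101');
-- Partition 1: before 2010-01-01, partition 2: calendar year 2010,
-- partition 3: 2011-01-01 and later.

CREATE PARTITION SCHEME psSaleDate
AS PARTITION pfSaleDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.Sales
(
    SaleID   int           NOT NULL,
    SaleDate date          NOT NULL,
    Amount   decimal(18,2) NOT NULL,
    CONSTRAINT PK_Sales PRIMARY KEY CLUSTERED (SaleID, SaleDate)
) ON psSaleDate (SaleDate);

-- Aligned nonclustered index: built on the same partition scheme as the table.
CREATE NONCLUSTERED INDEX IX_Sales_Amount
ON dbo.Sales (Amount)
ON psSaleDate (SaleDate);

-- Staging table: same columns and indexes, same filegroup as partition 3,
-- and a CHECK constraint that keeps every row inside partition 3's range.
CREATE TABLE dbo.Sales_Staging
(
    SaleID   int           NOT NULL,
    SaleDate date          NOT NULL,
    Amount   decimal(18,2) NOT NULL,
    CONSTRAINT PK_Sales_Staging PRIMARY KEY CLUSTERED (SaleID, SaleDate),
    CONSTRAINT CK_Sales_Staging_Range CHECK (SaleDate >= '20110101')
) ON [PRIMARY];

CREATE NONCLUSTERED INDEX IX_Sales_Staging_Amount
ON dbo.Sales_Staging (Amount);

-- Load the staging table with the usual ETL process.
INSERT INTO dbo.Sales_Staging (SaleID, SaleDate, Amount)
VALUES (1, '20110115', 100.00),
       (2, '20110220', 250.00);

-- Metadata-only operation: the staged rows become partition 3 of dbo.Sales.
-- The target partition must be empty for the switch to succeed.
ALTER TABLE dbo.Sales_Staging
SWITCH TO dbo.Sales PARTITION 3;
```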

Oh, one more thing: all my argument about foreign keys and the ripple effect of adding the partitioning column value to other tables applies equally to joins.

Sql-server – Partition Key questions in SQL Server 2008

Assuming that you have the primary key on a clustered index then the partitioning key needs to be part of the primary key.

You will not lose the benefit of partitioning by joining to non-partitioned tables, provided that the queries are designed to make use of the partitioned table. For example, the following query WILL benefit from partitioning

SELECT F.Col1, F.Col2, D.Col3
FROM Fact_Partitioned F
    INNER JOIN Dim_MyDim D ON F.Col1 = D.Col1
WHERE F.Col1 = 5

But the following query WILL NOT benefit from partition elimination

SELECT F.Col1, F.Col2, D.Col3
FROM Fact_Partitioned F
    INNER JOIN Dim_MyDim D ON F.Col1 = D.Col1
WHERE D.Col1 = 5

It is a subtle difference, but in the first query the join key is filtered in the partitioned table, taking advantage of elimination, and then joined to the dimension. In the second query, the key is filtered in the dimension and then joined against the whole of the fact table, rather than just the required partitions.

It goes without saying that the partitioning key needs to be in the WHERE clause for elimination to work, otherwise SQL Server does not know which partition(s) the data is in.

Adding a filter criterion to the JOIN clause will not help you. It needs to be in the WHERE clause to benefit from elimination.
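Whether elimination actually happens for a given query is worth verifying rather than assuming. The sketch below is one way to check, assuming Col1 is the partitioning column of Fact_Partitioned; the partition function and scheme names, the column types and the boundary values are hypothetical.

```sql
CREATE PARTITION FUNCTION pfFactCol1 (int)
AS RANGE LEFT FOR VALUES (10, 20);
-- Partition 1: Col1 <= 10, partition 2: 11 to 20, partition 3: above 20.

CREATE PARTITION SCHEME psFactCol1
AS PARTITION pfFactCol1 ALL TO ([PRIMARY]);

CREATE TABLE dbo.Dim_MyDim
(
    Col1 int         NOT NULL PRIMARY KEY,
    Col3 varchar(50) NOT NULL
);

CREATE TABLE dbo.Fact_Partitioned
(
    Col1 int           NOT NULL,
    Col2 decimal(18,2) NOT NULL
) ON psFactCol1 (Col1);

-- Which partition would hold Col1 = 5? (partition 1 with these boundaries)
SELECT $PARTITION.pfFactCol1(5) AS PartitionNumber;

-- How the rows are spread across partitions.
SELECT $PARTITION.pfFactCol1(F.Col1) AS PartitionNumber,
       COUNT(*) AS RowsInPartition
FROM dbo.Fact_Partitioned AS F
GROUP BY $PARTITION.pfFactCol1(F.Col1);

-- Run both forms of the query and compare the logical reads reported
-- for Fact_Partitioned.
SET STATISTICS IO ON;

SELECT F.Col1, F.Col2, D.Col3
FROM dbo.Fact_Partitioned AS F
    INNER JOIN dbo.Dim_MyDim AS D ON F.Col1 = D.Col1
WHERE F.Col1 = 5;

SELECT F.Col1, F.Col2, D.Col3
FROM dbo.Fact_Partitioned AS F
    INNER JOIN dbo.Dim_MyDim AS D ON F.Col1 = D.Col1
WHERE D.Col1 = 5;

SET STATISTICS IO OFF;
```

The actual execution plan is another place to look: the properties of the scan or seek on the partitioned table report how many partitions were accessed.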

The partition key does not need to be part of a non-clustered index (NCI), but if the NCI is unique, then it needs to contain the partitioning key in order to align the index. An aligned NCI is one built on the same partition scheme as the table. NCIs should also be partition aligned unless there is an exceedingly good reason not to. I have never come across a good enough reason!
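The three cases can be sketched as follows, using a hypothetical dbo.Shipment table partitioned by ShipDate; every object name, type and boundary value here is an assumption made for illustration.

```sql
CREATE PARTITION FUNCTION pfShipDate (date)
AS RANGE RIGHT FOR VALUES ('20100101', '20110101');

CREATE PARTITION SCHEME psShipDate
AS PARTITION pfShipDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.Shipment
(
    ShipmentID int      NOT NULL,
    ShipDate   date     NOT NULL,
    Reference  char(10) NOT NULL,
    CarrierKey int      NOT NULL
) ON psShipDate (ShipDate);

CREATE CLUSTERED INDEX CIX_Shipment
ON dbo.Shipment (ShipmentID, ShipDate)
ON psShipDate (ShipDate);

-- Non-unique NCI: the partitioning column does not have to be in the key.
-- Naming the partition scheme explicitly makes the alignment visible.
CREATE NONCLUSTERED INDEX IX_Shipment_CarrierKey
ON dbo.Shipment (CarrierKey)
ON psShipDate (ShipDate);

-- Unique NCI, aligned: the key must contain the partitioning column,
-- so uniqueness is enforced on (Reference, ShipDate), not on Reference alone.
CREATE UNIQUE NONCLUSTERED INDEX UX_Shipment_Reference
ON dbo.Shipment (Reference, ShipDate)
ON psShipDate (ShipDate);

-- Unique NCI on a single column that is not the partitioning key:
-- possible only as a non-aligned index placed on a filegroup,
-- which gives up partition switching on this table.
CREATE UNIQUE NONCLUSTERED INDEX UX_Shipment_ShipmentID
ON dbo.Shipment (ShipmentID)
ON [PRIMARY];
```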
