SQL Server – Star join query optimization – Change partitioning, use columnstore

Tags: partitioning, sql-server, ssis

I am trying to find the best way to improve the performance of a query given to me by my client. It joins a few tables, one of which is called dwh.fac_sale_detail and contains 1.5 billion rows.

The dwh.fac_sale_detail table is partitioned on one of its columns, called TradingDateKey1. It stores dates in yyyymmdd format but is of the INT data type.

The table has TradingDateKey values from 2005 to 2015, but partitions have only been created up to the year 2014.
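
For reference, the existing boundary values can be listed from the partition metadata with a query along these lines (it lists every partition function on the server, so no names need to be guessed):

    -- List the boundary values of each partition function,
    -- e.g. 20140101 for an INT yyyymmdd key partitioned by year.
    SELECT
        pf.name   AS partition_function,
        prv.boundary_id,
        prv.value AS boundary_value
    FROM sys.partition_functions AS pf
    JOIN sys.partition_range_values AS prv
        ON prv.function_id = pf.function_id
    ORDER BY pf.name, prv.boundary_id;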

Someone on another team gave me the following advice, and I am trying to follow it, but I am new to creating and altering partitions and don't know whether this will actually make any difference to query performance:

What he said in his own words is "The FactSalesDetail table is currently about 1.5 Billion Rows and is currently partitioned on TradingDate into 10 partitions by year with about 150 Million Rows per partition. It would be a good idea to further partition the latest year into monthly partitions and apply a columnstore index on all the partitions. Applying the index on each partition will be a once-off and you should only need to maintain the index for the current partition going forward."
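
As I understand it, splitting the latest year into monthly partitions would look roughly like the sketch below. The function and scheme names are guesses on my part, this assumes a RANGE RIGHT function, and I am not sure about the cost of splitting partitions that already contain data.

    -- Sketch only: split the 2014 yearly partition into monthly partitions.
    ALTER PARTITION SCHEME ps_TradingDateKey NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pf_TradingDateKey() SPLIT RANGE (20140201);

    ALTER PARTITION SCHEME ps_TradingDateKey NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pf_TradingDateKey() SPLIT RANGE (20140301);
    -- ...repeat for each remaining monthly boundary through 20141201...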

Here is the query plan for the query I am trying to optimize.

Please also see the attached screenshots for a better understanding:

[Screenshots: query execution plan]

Best Answer

Thanks for adding the query plan; it is very informative. I have a number of recommendations based on it, but first a caveat: don't just take what I say and assume it's correct; try it out first (ideally in your test environment) and make sure you understand why the changes do or don't improve your query!

The query plan: an overview

From this query plan (as well as the corresponding XML), we can immediately see a few useful pieces of information:

  • You are on SQL 2012
  • This is a classic star join query and you are getting the benefit of the in-row bitmap filter optimization that was added in SQL 2008 for such plans
  • The fact table contains about 1.5 billion rows, and just over 500 million of those rows match the dimension filters
  • The query requests 72GB of memory but is granted only 12GB (presumably, 12GB is the max that will be granted to any single query, meaning your machine likely has ~64GB of memory); see the sketch below this list for a way to observe grants at runtime
  • SQL Server is performing a sort-stream aggregate that takes 500 million rows down to just 600,000 rows. The sort is exceeding its memory grant and spilling to tempdb
  • There are warnings for plan-affecting converts due to explicit or implicit conversions in your query
  • The query uses 32 threads, but the initial seek into your fact table has an enormous thread skew; just 2 of the 32 threads do all of the work. (At subsequent steps in the query plan, however, the work is more balanced.)

[Screenshot: query plan overview]
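
As an aside, if you want to watch requested versus granted memory while the query runs, you can query the memory grant DMV from another session; a small sketch:

    -- Requested vs. granted memory for currently executing queries.
    SELECT
        session_id,
        requested_memory_kb / 1024 AS requested_mb,
        granted_memory_kb   / 1024 AS granted_mb,
        used_memory_kb      / 1024 AS used_mb
    FROM sys.dm_exec_query_memory_grants
    ORDER BY requested_memory_kb DESC;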

Optimization: columnstore or not

This is a tough question, but on balance I would not recommend columnstore in this case, primarily because you are on SQL 2012. If you are able to upgrade to SQL 2014, though, I think columnstore might be worth trying out.

In general, your query is the type that columnstore was designed for and could benefit greatly from the reduced I/O of columnstore and the greater CPU efficiency of batch mode.

However, the limitations of columnstore in SQL 2012 are just too great, and the tempdb spill behavior, where any spill will cause SQL Server to abandon batch mode entirely, can be a devastating penalty that might come into play with the large volumes of rows you are working with. If you do go with columnstore on SQL 2012, be prepared to baby-sit all of your queries very closely and ensure that batch mode can always be used.
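
For reference, the SQL 2012 flavor is a nonclustered columnstore index, and creating one makes the table read-only (new data then has to come in via partition switching) until the index is dropped or disabled. A rough sketch, where the column list is only a guess at what your query touches:

    -- SQL 2012: nonclustered columnstore index on the fact table.
    -- Include the columns your star join queries actually reference.
    CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_fac_sale_detail
    ON dwh.fac_sale_detail (TradingDateKey1, StoreKey, ProductKey, SaleQty, SaleAmount);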

Optimization: more partitions?

I don't think that more partitions will help this particular query. You are welcome to try it, of course, but keep in mind that partitioning is primarily a data management feature (the ability to swap in new data during your ETL processes via SWITCH PARTITION), not a performance feature. It can obviously help performance in some cases, but it can just as easily hurt performance in others (e.g., lots of singleton seeks that now have to be performed once per partition).
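
To illustrate the data-management angle, the usual ETL pattern is to load new rows into a staging table with the same structure and then switch that table into the matching partition; the object names and partition number below are made up:

    -- Load dwh.fac_sale_detail_staging (same schema, same filegroup, with a
    -- check constraint on the date key), then switch it into the partition.
    ALTER TABLE dwh.fac_sale_detail_staging
        SWITCH TO dwh.fac_sale_detail PARTITION 11;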

If you do go with columnstore, I think that loading your data for better segment elimination will be more important than partitioning; ideally you will probably want as many rows in each partition as possible in order to have full columnstore segments and great compression rates.
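
If you do eventually build a columnstore index (ideally a clustered one on SQL 2014), one common way to get good segment elimination is to make sure the rows are physically sorted on the date key when the index is built, so that each segment covers a narrow date range and can be skipped. A sketch, assuming the table already has a rowstore clustered index on the date key and guessing at the index and partition scheme names:

    -- Rebuild an existing rowstore clustered index (assumed to be named
    -- cix_fac_sale_detail and keyed on TradingDateKey1) as a clustered columnstore.
    -- MAXDOP 1 preserves the sort order, so segments line up with date ranges
    -- and can be eliminated by date predicates.
    CREATE CLUSTERED COLUMNSTORE INDEX cix_fac_sale_detail
        ON dwh.fac_sale_detail
        WITH (DROP_EXISTING = ON, MAXDOP = 1)
        ON ps_TradingDateKey (TradingDateKey1);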

Optimization: improving cardinality estimates

Because you have a huge fact table and only a handful of very small (hundreds or thousands of rows) sets of rows from each dimension table, I would recommend an approach where you explicitly create a temporary table containing only the dimension rows that you plan to use. For example, rather than joining to Dim_Date with complicated logic like cast(right(ALHDWH.dwh.Dim_Date.Financial_Year,4) as int) IN ( 2015, 2014, 2013, 2012, 2011 ), you should write a pre-processing query to extract only the rows from Dim_Date that you care about and add the appropriate primary key to that table.

This will allow SQL Server to create statistics on just the rows you are actually using, which may yield better cardinality estimates throughout the plan. Because this pre-processing would be such a trivial amount of work compared to the overall query complexity, I would highly recommend this option.
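
A sketch of that idea, keeping the Financial_Year predicate from your query and guessing at the key column name:

    -- Materialize only the Dim_Date rows the query actually needs.
    SELECT d.Date_Key
    INTO   #dim_date_filtered
    FROM   ALHDWH.dwh.Dim_Date AS d
    WHERE  CAST(RIGHT(d.Financial_Year, 4) AS int) IN (2015, 2014, 2013, 2012, 2011);

    -- A primary key gives the optimizer uniqueness information and fresh
    -- statistics on exactly the rows that will be joined to the fact table.
    ALTER TABLE #dim_date_filtered
        ADD CONSTRAINT pk_dim_date_filtered PRIMARY KEY (Date_Key);

    -- The main query then joins to #dim_date_filtered instead of applying the
    -- CAST/RIGHT predicate against Dim_Date inside the star join.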

Optimization: reducing thread skew

It's likely that extracting the data from Dim_Date into its own table and adding a primary key to that table would also help to reduce thread skew (an imbalance of work across threads). Here's a picture that helps show why:

[Screenshot: estimated vs. actual rows from Dim_Date]

In this case, the Dim_Date table has 22,000 rows, SQL Server estimated that you are going to use 7,700 of those rows, and you actually only used 1,827 of those rows.

Because SQL Server uses statistics in order to allocate ranges of rows to threads, the poor cardinality estimates in this case are likely the root cause of the very poor distribution of rows.

Thread skew on 1,827 rows may not matter much, but the painful part is that this then cascades down to the seek into your 1.5 billion row fact table, where we have 30 threads sitting idle while 600 million rows are processed by just 2 threads.

[Screenshot: row distribution across threads for the fact table seek]

Optimization: getting rid of the sort spill

Another area I would focus on is the sort spill. I think that the primary problem in this case is poor cardinality estimates. As we can see below, SQL Server thinks that the grouping operation being performed by the combination of a Sort and Stream Aggregate will yield 324 million rows. However, it actually yields just 643,000 rows.

[Screenshot: estimated vs. actual rows for the Sort and Stream Aggregate]

If SQL Server knew that so few rows would come out of this grouping, it would almost certainly use a Hash Aggregate (HASH GROUP) rather than the Sort plus Stream Aggregate combination (ORDER GROUP) to implement your GROUP BY clause.

It's possible that this may fix itself if you make some of the other changes above to improve cardinality estimates. However, if it doesn't, you could try the OPTION (HASH GROUP) query hint to force SQL Server to use a Hash Aggregate. This would let you evaluate the magnitude of the improvement and decide whether or not to use the query hint in production. I would generally be cautious about query hints, but specifying just HASH GROUP is a much lighter touch than something like using a join hint, using FORCE ORDER, or otherwise taking too much control out of the query optimizer's hands.
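
Applying the hint is just a matter of appending it to the statement; the shape would be something like the following, where the column names are only stand-ins for your real select list and grouping columns:

    -- Illustrative only: force a Hash Aggregate for the GROUP BY.
    SELECT   f.TradingDateKey1,
             SUM(f.SaleAmount) AS total_sales
    FROM     dwh.fac_sale_detail AS f
    JOIN     #dim_date_filtered  AS d
             ON d.Date_Key = f.TradingDateKey1
    GROUP BY f.TradingDateKey1
    OPTION (HASH GROUP);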

Optimization: memory grants

One last potential problem is that SQL Server estimated that the query would want to use 72GB of memory, but your server was not able to provision that much memory for the query. While it's technically true that adding more memory to the server would help, I think there are at least a couple of other ways to attack this problem:

  • Get rid of the Sort operator (as described above); it's really the only operator consuming any substantial memory grant in your query
  • Split your query up into multiple batches; it may be the case that you can run the query once per partition, for example. This could reduce the size of the sort, keep it in memory, and potentially improve performance significantly. A side benefit could be that you might get better utilization of threads if you access just one partition since this does impact the way that SQL Server allocates threads to partitions in some cases.
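
A sketch of the batching idea, restricting each run to a single year so that only one partition is touched (the column names other than the date key are guesses):

    -- Run the aggregation one year (one partition) at a time, e.g. for 2014.
    -- The range predicate on the partitioning column allows partition elimination.
    SELECT   f.TradingDateKey1,
             SUM(f.SaleAmount) AS total_sales
    FROM     dwh.fac_sale_detail AS f
    WHERE    f.TradingDateKey1 >= 20140101
      AND    f.TradingDateKey1 <  20150101
    GROUP BY f.TradingDateKey1;
    -- Repeat for the other years and combine the results.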