SQL Server – Identical Tables and Queries with Different Execution Times

Tags: performance, query-performance, sql-server, sql-server-2014

We have two tables with identical columns and indexing (no indexing at all, basically). We run the same query against both; against the original table it takes 5 seconds, while against the new table we let it run for 30 minutes and then killed it.

We updated statistics, but that had no effect. We also tried a rebuild of the new table to see if defragmenting it would help, but that made no difference either.
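
For reference, the statistics update was nothing exotic; roughly the following (FULLSCAN is shown for illustration, and the original table's name is an assumption, since only the _Test table is scripted below):

-- Illustrative sketch; the original table's name is assumed
UPDATE STATISTICS [dbo].[Fact_SubscriptionDetail] WITH FULLSCAN;       -- original (fast) table
UPDATE STATISTICS [dbo].[Fact_SubscriptionDetail_Test] WITH FULLSCAN;  -- new (slow) table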

On a hunch, we then exported both tables to the same database to see whether anything would change there, but we got exactly the same results.

I'm kind of baffled how this could be. To make things even more interesting, the original table contains more data than the new table, which in theory should mean the new table finishes the query faster.

Does anyone have a possible explanation? I've worked for years as a DBA (though not for a few years now), and frankly, I have no idea why this could possibly happen.

In response to some of the comments, here is the table definition in question:

CREATE TABLE [dbo].[Fact_SubscriptionDetail_Test](
    [CustomerSellTo_Key] [int] NULL,
    [CustomerContactSellTo_Key] [int] NULL,
    [CustomerRefTo_Key] [int] NULL,
    [WebAuthUser_Key] [int] NULL,
    [Country_Key] [int] NULL,
    [Date_Key] [int] NULL,
    [SnapshotDate_Key] [int] NULL,
    [Product_Key] [int] NULL,
    [License_Key] [int] NULL,
    [Subscription_Key] [int] NULL,
    [SubscriptionTypeLostSeatsType_Key] [int] NULL,
    [M_SubscriptionDetail_NewSeats] [int] NULL,
    [M_SubscriptionDetail_WinbackSeats] [int] NULL,
    [M_SubscriptionDetail_RenewedFlexSeats] [int] NULL,
    [M_SubscriptionDetail_RenewedCommitSeats] [int] NULL,
    [M_SubscriptionDetail_RenewedSeats] [int] NULL,
    [M_SubscriptionDetail_ActiveSeatsEndMonth] [int] NULL,
    [M_SubscriptionDetail_LostNonPaymentSeats] [int] NULL,
    [M_SubscriptionDetail_LostGraceInactiveSeats] [int] NULL,
    [M_SubscriptionDetail_LostOtherSeats] [int] NULL,
    [M_SubscriptionDetail_LostSeats] [int] NULL,
    [M_SubscriptionDetail_GrossBookings] [numeric](38, 20) NULL,
    [M_SubscriptionDetail_ActiveFlexSeats] [int] NULL,
    [M_SubscriptionDetail_ActiveCommitSeats] [int] NULL,
    [M_SubscriptionDetail_ActiveSeats] [int] NULL,
    [DateCreated] [datetime] NULL
        CONSTRAINT [DF__Fact_Subs__DateC__gtfhjCC] DEFAULT (getdate())
) ON [DATA]

The data in the two tables is not entirely identical, but very similar in nature. The query we run against it joins to some other tables (primarily data warehouse dimension tables).

I defragged (which, I agree, doesn't make much sense for a heap, but I figured it couldn't hurt) by simply running:

ALTER TABLE Fact_SubscriptionDetail_Test REBUILD;

When we initially checked the query plan, it suggested adding some indexes on the Test table (but not on the original, fast table). We also attempted adding a clustered index (PK) on the Test table, but neither that nor the index suggested by the execution plan had any effect.
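
The clustered index attempt looked roughly like this (the key columns here are an assumption for illustration; the point was mainly to get the data off the heap):

-- Illustrative sketch; the actual key columns we tried may have differed
CREATE CLUSTERED INDEX [CIX_Fact_SubscriptionDetail_Test]
    ON [dbo].[Fact_SubscriptionDetail_Test] ([Date_Key], [Subscription_Key]);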

Here are the execution plans:

http://pastebin.com/XagvSxjj (Original table; fast)
http://pastebin.com/LSgCsvUe (New table; slow)

There is a difference between the two execution plans, which makes me think the cardinality of the data in the original table is somewhat better. The original table contains some 1.4 million rows (230 MB), of which about 400k are processed by the query. The new table contains 400k rows (52 MB).

This is the full query: http://pastebin.com/bYiaGW1d (slightly edited to remove some sensitive stuff).

The value of "cost threshold for parallelism" on the server is 5.
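
For completeness, this is how the setting was checked (standard sp_configure usage; the commented lines show how it could be raised if needed):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism';   -- run_value shows the current setting (5 here)
-- To raise it, for example:
-- EXEC sp_configure 'cost threshold for parallelism', 50;
-- RECONFIGURE;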

Best Answer

...the original table contains more data than the new table, which should theoretically mean that the new table should finish the query faster.

If the execution plans were the same, this would most likely be true, but they're not. The number of rows expected (and the distribution of the data according to the statistics) affects the strategy chosen by the query optimizer.

Small table

When the heap table contains 398,399 rows, the optimizer chooses a serial plan using nested loops for all join operations except those directly affecting the heap table; those joins employ a hash join.

Small table plan

The complexity of the query is such that cardinality (row count) estimates are likely to be inaccurate, so the nested loop strategy ends up being a disaster. The optimizer considered a parallel plan, but rejected it as more expensive than the serial nested loops option.

Larger table

When the heap table contains 1,750,640 rows, changes in the cost estimates mean the optimizer assesses that a parallel plan making extensive use of hash joins and optimized bitmap filters will be the better strategy.

Larger plan

This plan shape is much more resilient to cardinality estimation errors. The hash join build inputs may spill to disk, but the worst case is much better than executing an entire subtree a huge number of times (using nested loops).

Solution 1

If you know that a parallel plan with bitmap filters is the best choice in general, you could use a Plan Guide to enforce this, or encourage such a plan with an OPTION (HASH JOIN, MERGE JOIN) query hint. You may not always get parallelism if the tables are small, but performance should still improve (and be more predictable in general).
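
A minimal sketch of where the hint goes (the dimension table, its columns, and the aggregate below are placeholders, not taken from your actual query):

-- Illustrative only; Dim_Date and the selected columns are assumptions
SELECT d.Date_Key,
       SUM(f.M_SubscriptionDetail_ActiveSeats) AS ActiveSeats
FROM dbo.Fact_SubscriptionDetail_Test AS f
JOIN dbo.Dim_Date AS d
    ON d.Date_Key = f.Date_Key
GROUP BY d.Date_Key
OPTION (HASH JOIN, MERGE JOIN);  -- removes nested loops joins from consideration for this statement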

Solution 2

You could also explore breaking the query into simpler sections, initially along the lines of the Common Table Expression boundaries (a minimal sketch follows the list below). Materializing sensible-sized intermediate results into temporary tables means:

  • Each part of the query can be optimized separately
  • The optimizer has statistics and accurate cardinality information
  • Additional indexes can be added to the temporary tables if proven to be beneficial
  • Redundant CTE evaluations can be skipped
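
A minimal sketch of the idea, with made-up intermediate names and columns (the real boundaries would follow the CTEs in your pasted query):

-- Materialize one CTE's result into a temp table (names and columns are illustrative)
SELECT f.Subscription_Key,
       SUM(f.M_SubscriptionDetail_ActiveSeats) AS ActiveSeats
INTO   #SubscriptionSeats
FROM   dbo.Fact_SubscriptionDetail_Test AS f
GROUP BY f.Subscription_Key;

-- Index the intermediate result if later joins prove to benefit from it
CREATE CLUSTERED INDEX cx ON #SubscriptionSeats (Subscription_Key);

-- The rest of the query then joins to #SubscriptionSeats instead of re-evaluating the CTE,
-- and the optimizer has accurate row counts and fresh statistics to work with.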