I have a query which contains a subquery. When executed on a local copy of my database it runs as normal and executes within < 1 second.
I've noticed however that if I run this same query on our server the inner query behaves like a correlated subquery, taking around 90 seconds, and the predicate applied to a clustered index scan for data which relates to the subquery appears to be incorrect – it contains the alias used in the outer query, even though all aliases used in the inner query are different, and instead of 1 execution for the index scan it shows hundreds.
I've noticed that this behaviour only applies when the query executes serially – if I lower the cost threshold for parallelism
on the server and it goes parallel then this behaviour does not appear and the query runs correctly. Notably, the query runs correctly in my local environment whether serial or parallel, and the predicate applied there looks fine.
Both the local environment and the server are running SQL Server 2016 SP1 CU7 (Enterprise on the server, Developer locally), and I've looked at the different SET
options and verified that they are using the same compatibility level (SQL 2012).
Is there anything that would explain the difference of behaviour here?
The problem is not that the plan is different, it's that it's not a correlated subquery and is being executed as one – the inner query is incorrectly referencing the outer query when the query executes serially, but not when it goes parallel on the same server.
The server is set to 2012 compatibility mode because when we upgraded, the new cardinality estimator caused some problems with a few queries and we haven't gotten around to fixing those yet so have remained on 2012 compatibility mode to use the legacy cardinality estimator.
All 4 plans can be viewed here.
Looking at the query and then at the clustered index scan on the deposit payment table, this shouldn't be using the predicate it is using as it references the outer query, which unless I'm misunderstanding, is incorrect. It also shouldn't be executing that operator the 200+ times that it is.
On the same server, if I lower the cost threshold for parallelism
so that it goes parallel, it applies a different predicate to that operator where it does not reference the outer table and does not behave like a correlated subquery. This is the same behaviour I see if I run the query locally (either running serially or parallel).
Best Answer
The big difference in the serial plan that I can see is this expensive
TOP
operator that has been introduced by the optimizer:The
TOP
is probably there due to row goals, there is a 4-part series on this at SQLPerformance.com:(Part 1 and maybe part 2 are relevant for this query.)
I can't confirm for sure in your case because you're using 2016 and the showplan properties that expose row goal information weren't added until 2017 CU3 (and were added in 2016 SP2, just released this week). You can see them if you show the output from running the original, serial query on the server adding the following (taken from part 1 of the series above):
Estimates are different all over the place, so that tells me that data and/or stats are different, but probably mostly stats since local and server both give the same actual rows.
I suggest a rewrite, changing
IN
toEXISTS
, simplifying the subquery (a lot of redundant work there), and removing theGROUP BY
(which is unnecessary busy work, and may be the culprit here):Updating stats can't hurt, but let us know if the rewrite helps. You could also try hinting the newer cardinality estimator using
OPTION (QUERYTRACEON 2312);
and in case you are hit by anything that has since been fixed in the optimizer you can also try 4199.Leaving the local variance out, the two plans on the server have the same predicate on the scan of
deposit_payment
, except that the parallel plan uses a probe forid_prov
. So I really don't think the predicate has anything to do with any performance differences:The only difference between server and local is that the local serial plan handles the
id_prov
filter later, in a hash match. Exercise to the reader to determine if that's better or worse. :-)I would focus on writing simpler queries without unnecessary and/or redundant logic (and perhaps introducing a supporting index on
deposit_payment
to help this type of query without a scan/predicate) rather than trying to discover bugs. :-)