Sql-server – IF EXISTS taking longer than embedded select statement

existsperformancesql server

When I run the following code it takes 22.5 minutes and does 106million reads. However, if I run just the inner select statement by itself it only takes 15 seconds and does 264k reads. As a side note, the select query returns no records.

Any idea why the IF EXISTS would make it run so much longer and do so many more reads? I also changed the select statement to do SELECT TOP 1 [dlc].[id] and I killed it after 2 minutes.

As a temporary fix I have changed it to do a count( * ) and assign that value to a variable @cnt. Then it does an IF 0 <> @cnt statement. But I thought EXISTS would be better, because if there were records returned in the select statement it would stop performing the scan/seeks once it found at least one record, whereas the count(*) will complete the full query. What am I missing?

IF EXISTS
   (SELECT [dlc].[ID]
   FROM TableDLC [dlc]
   JOIN TableD [d]
   ON [d].[ID] = [dlc].[ID]
   JOIN TableC [c]
   ON [c].[ID] = [d].[ID2]
   WHERE [c].[Name] <> [dlc].[Name])
BEGIN
   <do something>
END

Best Answer

Any idea why the IF EXISTS would make it run so much longer and do so many more reads? I also changed the select statement to do SELECT TOP 1 [dlc].[id] and I killed it after 2 minutes.

As I explained in my answer to this related question:

How (and why) does TOP impact an execution plan?

Using EXISTS introduces a row goal, where the optimizer produces an execution plan aimed at locating the first row quickly. In doing this, it assumes that the data is uniformly distributed. For example, if statistics show there are 100 expected matches in 100,000 rows, it will assume it will have to read only 1,000 rows to find the first match.

This will result in longer than expected execution times if this assumption turns out to be faulty. For example, if SQL Server chooses an access method (e.g. unordered scan) that happens to locate the first matching value very late on in the search, it could result in an almost complete scan. On the other hand, if a matching row happens to be found amongst the first few rows, performance will be very good. This is the fundamental risk with row goals - inconsistent performance.

As a temporary fix I have changed it to do a count( * ) and assign that value to a variable

It is usually possible to reformulate the query such that a row goal is not assigned. Without the row goal, the query can still terminate when the first matching row is encountered (if written correctly), but the execution plan strategy is likely to be different (and hopefully, more effective). Obviously, count(*) will require reading all rows, so it is not a perfect alternative.

If you are running SQL Server 2008 R2 or later, you can also generally use documented and supported trace flag 4138 to get an execution plan without a row goal. This flag can also be specified using the supported hint OPTION (QUERYTRACEON 4138), though be aware it requires runtime sysadmin permission, unless used with a plan guide.

Unfortunately

None of the above is functional with an IF EXISTS conditional statement. It only applies to regular DML. It will work with the alternate SELECT TOP (1) formulation you tried. That may well be better than using COUNT(*), which has to count all qualified rows, as previously mentioned.

That said, there are any number of ways to express this requirement that will allow you to avoid or control the row goal, while terminating the search early. One last example:

DECLARE @Exists bit;

SELECT @Exists =
    CASE
        WHEN EXISTS
        (
            SELECT [dlc].[ID]
            FROM TableDLC [dlc]
            JOIN TableD [d]
            ON [d].[ID] = [dlc].[ID]
            JOIN TableC [c]
            ON [c].[ID] = [d].[ID2]
            WHERE [c].[Name] <> [dlc].[Name]
        )
        THEN CONVERT(bit, 1)
        ELSE CONVERT(bit, 0)
    END
OPTION (QUERYTRACEON 4138);

IF @Exists = 1
BEGIN
    ...
END;