Sql-server – IF EXISTS taking longer than embedded select statement

existsperformancesql server

When I run the following code it takes 22.5 minutes and does 106million reads. However, if I run just the inner select statement by itself it only takes 15 seconds and does 264k reads. As a side note, the select query returns no records.

Any idea why the IF EXISTS would make it run so much longer and do so many more reads? I also changed the select statement to do SELECT TOP 1 [dlc].[id] and I killed it after 2 minutes.

As a temporary fix I have changed it to do a count( * ) and assign that value to a variable @cnt. Then it does an IF 0 <> @cnt statement. But I thought EXISTS would be better, because if there were records returned in the select statement it would stop performing the scan/seeks once it found at least one record, whereas the count(*) will complete the full query. What am I missing?

IF EXISTS
   (SELECT [dlc].[ID]
   FROM TableDLC [dlc]
   JOIN TableD [d]
   ON [d].[ID] = [dlc].[ID]
   JOIN TableC [c]
   ON [c].[ID] = [d].[ID2]
   WHERE [c].[Name] <> [dlc].[Name])
BEGIN
   <do something>
END

Best Answer

Any idea why the IF EXISTS would make it run so much longer and do so many more reads? I also changed the select statement to do SELECT TOP 1 [dlc].[id] and I killed it after 2 minutes.

As I explained in my answer to this related question:

How (and why) does TOP impact an execution plan?

Using EXISTS introduces a row goal, where the optimizer produces an execution plan aimed at locating the first row quickly. In doing this, it assumes that the data is uniformly distributed. For example, if statistics show there are 100 expected matches in 100,000 rows, it will assume it will have to read only 1,000 rows to find the first match.

This will result in longer than expected execution times if this assumption turns out to be faulty. For example, if SQL Server chooses an access method (e.g. unordered scan) that happens to locate the first matching value very late on in the search, it could result in an almost complete scan. On the other hand, if a matching row happens to be found amongst the first few rows, performance will be very good. This is the fundamental risk with row goals - inconsistent performance.

As a temporary fix I have changed it to do a count( * ) and assign that value to a variable

It is usually possible to reformulate the query such that a row goal is not assigned. Without the row goal, the query can still terminate when the first matching row is encountered (if written correctly), but the execution plan strategy is likely to be different (and hopefully, more effective). Obviously, count(*) will require reading all rows, so it is not a perfect alternative.

If you are running SQL Server 2008 R2 or later, you can also generally use documented and supported trace flag 4138 to get an execution plan without a row goal. This flag can also be specified using the supported hint OPTION (QUERYTRACEON 4138), though be aware it requires runtime sysadmin permission, unless used with a plan guide.

Unfortunately

None of the above is functional with an IF EXISTS conditional statement. It only applies to regular DML. It will work with the alternate SELECT TOP (1) formulation you tried. That may well be better than using COUNT(*), which has to count all qualified rows, as previously mentioned.

That said, there are any number of ways to express this requirement that will allow you to avoid or control the row goal, while terminating the search early. One last example:

DECLARE @Exists bit;

SELECT @Exists =
    CASE
        WHEN EXISTS
        (
            SELECT [dlc].[ID]
            FROM TableDLC [dlc]
            JOIN TableD [d]
            ON [d].[ID] = [dlc].[ID]
            JOIN TableC [c]
            ON [c].[ID] = [d].[ID2]
            WHERE [c].[Name] <> [dlc].[Name]
        )
        THEN CONVERT(bit, 1)
        ELSE CONVERT(bit, 0)
    END
OPTION (QUERYTRACEON 4138);

IF @Exists = 1
BEGIN
    ...
END;

Related Solutions

Postgresql – Performance issue using view vs. direct statement

You might want to try using a window function for that:

create view vw_current_value as 
select id1, id2, value from (
  select id1, id2, value, 
      row_number() over (partition by id1, id2 order by value_at desc) rn
  from your_values_table
) t where t.rn = 1;

sqlfiddle

EDIT:

The above has an advantage of being standard SQL. However (thanks to @a_horse_with_no_name) Postgres also offers a proprietary alternative: DISTINCT ON(something), which reportedly performs better:

create view vw_current_value as 
select distinct on (id1, id2) id1, id2, value 
from your_values_table
order by id1, id2, value_at desc;

sqlfiddle 2

Comparison of the execution plans of the alternatives is left as an excercise for the reader.

Sql-server – Why does Veritas backup take longer through CMDEXEC than cmd

I can't tell you why with the information provided, but I can give you some guidance to figuring it out.

Monitor your wait statistics while the longer job is running. You should be able to figure out which session is running the backup, see what it is waiting on. sys.dm_exec_requests will tell you this (check the wait_type, and also look at blocking_session_id in case it is getting blocked), you'll just need to refresh often.

Or you could roll up your sleeves and run a targeted extended events session, or turn on trace flags that give more insight into slow backups (see here).

Best Answer

Unfortunately

Related Solutions

Postgresql – Performance issue using view vs. direct statement

Sql-server – Why does Veritas backup take longer through CMDEXEC than cmd

Related Question