Sql-server – Why does the query end up with two seeks instead of one and how to fix that

azure-sql-databaseexecution-planperformancesql server

I have these two tables:

CREATE TABLE [TaskItems] (
    [ItemId]             UNIQUEIDENTIFIER NOT NULL UNIQUE PRIMARY KEY,
    [LastOperationTime]  DATETIME NULL DEFAULT GETUTCDATE(),
    [OwnerId]            UNIQUEIDENTIFIER NOT NULL
    -- more columns which are irrelevant
)

CREATE INDEX [TaskItemsByOwnerIdAndLastOpTimeIndex]
     ON [TaskItems](OwnerId, LastOperationTime)

CREATE TABLE [TaskOwners] (
    [OwnerId]         UNIQUEIDENTIFIER UNIQUE NOT NULL DEFAULT NEWID() UNIQUE PRIMARY KEY
    -- more columns which are irrelevant
)

and this query:

SELECT
  COUNT( OwnerId ),
  SUM(CASE WHEN LastOperationTime > DATEADD(day, -3, GETUTCDATE()) THEN 1 ELSE 0 END), 
  SUM(CASE WHEN LastOperationTime > DATEADD(day, -5, GETUTCDATE()) THEN 1 ELSE 0 END)
FROM 
(SELECT
  O.OwnerId,
  MAX( LastOperationTime) as LastOperationTime
    FROM
      TaskOwners AS O 
      LEFT OUTER JOIN TaskItems AS TI ON O.OwnerId = TI.OwnerId
  GROUP BY O.OwnerId) AS Whatever

Note that LastOperationTime is declared NULL which is a design error but fixing it would require recreating the index which is kind of problematic in production database. Anyway at any given moment there're no items which have LastOperationTime set to null.

TaskOwner table contains about 25 thousand rows. TaskItems contains about ten million rows and some rare "owners" have a million or so "items" owned while most of the "owners" own no "items" or some of them own hundreds or thousand "items".

So I run the query and request the actual execution plan and it has two index seeks (the two tree items in the lower right).

Plan from Azure Management Portal annotated

One of those has "object" TaskItemsByOwnerIdAndLastOpTimeIndex, "predicate" TaskObjects.LastOperationTime IS NULL and takes about 49% time. The other one has "object" TaskItemsByOwnerIdAndLastOpTimeIndex, "predicate" TaskObjects.LastOperationTime IS NOT NULL and takes 47% time. Both have "seek predicates" TaskObjects.OwnerId==TaskOwners.OwnerId and "backwards" direction in the plan tree. Both have "execution count" equal to the number of rows in the TaskOwners table.

The results of those seeks are then filtered (TOP(1)), concatenated, aggregated and fed into the JOIN which is implemented using "nested loops".

The question is why those two seeks are there (instead of one scan). Why does the query engine suddenly care so much about NULL vs NOT NULL that it initiates two separate queries? How do I get rid of those?

Best Answer

Try:

LEFT OUTER JOIN TaskItems AS TI ON O.OwnerId = TI.OwnerId
AND LastOperationTime IS NOT NULL

The engine could be grabbing the Null row separately, as it's a slightly different scenario to not finding one. But as you essentially want the same behaviour, just out the explicit filter in there, within the ON clause of your LEFT JOIN. Then the only NULLs will be as a result of the outer join.