Sql-server – Performance issue with TOP (1) on a nullable column

performancesql serversql-server-2005t-sql

I need help to identify the reason for a performance issue.

Actually following query sits in a scalar-valued-function but that is not the reason for the issue since it needs the same time as query:

SELECT TOP (1) dd.gsp_part_price 
FROM  tabdata 
       INNER JOIN locsl 
               ON tabdata.fisl = locsl.idsl 
       INNER JOIN locgsp 
               ON locsl.figsp = locgsp.idgsp 
       INNER JOIN tabdatadetail dd
               ON dd.fidata = tabdata.iddata 
       INNER JOIN tabsparepart sp 
               ON dd.fisparepart = sp.idsparepart 
WHERE  sp.sparepartname = '1270-3132' 
       AND dd.gsp_part_price IS NOT NULL 
ORDER BY Claim_Submitted_Date DESC

This query returns the price immediately for most spareparts but some parts need several minutes to return NULL. So this query returns 940 rows immediately for the same part as above where every gsp_part_price is NULL which is the reason for the NULL above:

SELECT dd.*, sp.* 
FROM tabdatadetail dd 
INNER JOIN tabsparepart sp 
    on dd.fisparepart=sp.idsparepart
WHERE sp.sparepartname='1270-3132'

Why does the first query needs 3 minutes to return NULL whereas the query below returns immediately the 940 rows? I must admit that i've still problems to interpret execution plans.

Note that the first query also returns immediately if i remove the IS NOT NULL CHECK:

dd.gsp_part_price IS NOT NULL

All involved tables are linked via foreign-keys, i have tried to add an index on gsp_part_price which had no effect.

Here's the query execution plan:

query execution plan

Edit: Here are the stats on the PK and FK (to parent table tabData) indexes:

DBCC SHOW_STATISTICS('dbo.tabDataDetail','PK_tabDataDetail') WITH STAT_HEADER;
PK_tabDataDetail    Nov  6 2013 10:04AM 64327191    387089  183 1   8   NO 

DBCC SHOW_STATISTICS('dbo.tabDataDetail','IX_fiData') WITH STAT_HEADER;
IX_fiData   Nov  6 2013 10:04AM 64327191    378150  198 0,2441889   4   NO 

DBCC SHOW_STATISTICS('dbo.tabDataDetail','IX_GspPartPrice') WITH STAT_HEADER;
IX_GspPartPrice Dez  3 2013 12:17PM 64788799    64788799    200 0,0002433512    7,065503    NO

The last index was added just now as mentioned above. So the actual rows are 64788799, why the great difference between the estimated and actual rows, also , why is there a difference between the query-plan's actual rows and the "real" actual rows?

Here's the complete plan: http://s000.tinyupload.com/index.php?file_id=86912774678831839436

Edit: from the comment of Mikael Eriksson:

I think your top(1) and order by is to blame here. The index
scan is using the index on Claim_Submitted_Date to find the first
occurrence of a value as fast as possible. SQL Server thinks that you
will find such value after scanning 3000 rows. Perhaps that estimate
makes sense if you have 5000 rows in tabSparePart and you only want
one. The thing is that if there are no rows found or you have a spare
part that is just recently submitted, SQL Server will have to scan the
entire index before it figures out that there is no rows to be found.

Although i still don't understand why, it seems that this is true. If i remove the TOP 1 or change it to say TOP 1000 the result comes immediately. Does anybody have an idea what i can do? Because, if multiple rows are returned i just need the last price submitted.

Best Answer

The query takes a lot of time because it finds no rows.

The plan first is setup top pick one row from tabSparePart where sparepartname = '1270-3132'. It then moves along to the index scan of tabData. For each row fetched from tabData there is a clustered index seek in locSL followed by a clustered index seek in tabDataDetail which also checks for not null on gsp_part_price and finally returns the values of fiSparePart and GSP_Part_Price. fiSparePart is then compared against the idSparePart returned from the Index Seek in tabSparePart. If they are the same the query immediately terminates and return the gsp_part_price. If they are not the same the index scan in tabData moves to the next record and all the checks are executed again. That keeps happening until there is a row returned or all rows in tabData have been processed.

As it turns out you get a totally different query plan when you do top(1000).

You can use the optimize for hint to make SQL Server think you actually want 1000 rows when you only need one.

Your scalar valued function would then look something like this:

CREATE FUNCTION [dbo].[FunctionName]()
RETURNS INT
AS
BEGIN

  DECLARE @TOP INT
  DECLARE @RET INT

  SET @TOP = 1

  SELECT TOP (@TOP) @RET = dd.gsp_part_price 
  FROM  tabdata 
          INNER JOIN locsl 
                  ON tabdata.fisl = locsl.idsl 
          INNER JOIN locgsp 
                  ON locsl.figsp = locgsp.idgsp 
          INNER JOIN tabdatadetail dd
                  ON dd.fidata = tabdata.iddata 
          INNER JOIN tabsparepart sp 
                  ON dd.fisparepart = sp.idsparepart 
  WHERE  sp.sparepartname = '1270-3132' 
          AND dd.gsp_part_price IS NOT NULL 
  ORDER BY Claim_Submitted_Date DESC 
  OPTION (OPTIMIZE FOR(@TOP = 1000))

  RETURN @RET

END

Best Answer

Related Solutions

Sql-server – Estimated vs. Actual rows and multi-column statistics

Sql-server – How (and why) does TOP impact an execution plan

Related Question