I need help to identify the reason for a performance issue.
Actually following query sits in a scalar-valued-function but that is not the reason for the issue since it needs the same time as query:
SELECT TOP (1) dd.gsp_part_price
FROM tabdata
INNER JOIN locsl
ON tabdata.fisl = locsl.idsl
INNER JOIN locgsp
ON locsl.figsp = locgsp.idgsp
INNER JOIN tabdatadetail dd
ON dd.fidata = tabdata.iddata
INNER JOIN tabsparepart sp
ON dd.fisparepart = sp.idsparepart
WHERE sp.sparepartname = '1270-3132'
AND dd.gsp_part_price IS NOT NULL
ORDER BY Claim_Submitted_Date DESC
This query returns the price immediately for most spareparts but some parts need several minutes to return NULL
. So this query returns 940 rows immediately for the same part as above where every gsp_part_price
is NULL
which is the reason for the NULL
above:
SELECT dd.*, sp.*
FROM tabdatadetail dd
INNER JOIN tabsparepart sp
on dd.fisparepart=sp.idsparepart
WHERE sp.sparepartname='1270-3132'
Why does the first query needs 3 minutes to return NULL
whereas the query below returns immediately the 940 rows? I must admit that i've still problems to interpret execution plans.
Note that the first query also returns immediately if i remove the IS NOT NULL CHECK
:
dd.gsp_part_price IS NOT NULL
All involved tables are linked via foreign-keys, i have tried to add an index on gsp_part_price
which had no effect.
Here's the query execution plan:
Edit: Here are the stats on the PK and FK (to parent table tabData
) indexes:
DBCC SHOW_STATISTICS('dbo.tabDataDetail','PK_tabDataDetail') WITH STAT_HEADER;
PK_tabDataDetail Nov 6 2013 10:04AM 64327191 387089 183 1 8 NO
DBCC SHOW_STATISTICS('dbo.tabDataDetail','IX_fiData') WITH STAT_HEADER;
IX_fiData Nov 6 2013 10:04AM 64327191 378150 198 0,2441889 4 NO
DBCC SHOW_STATISTICS('dbo.tabDataDetail','IX_GspPartPrice') WITH STAT_HEADER;
IX_GspPartPrice Dez 3 2013 12:17PM 64788799 64788799 200 0,0002433512 7,065503 NO
The last index was added just now as mentioned above. So the actual rows are 64788799
, why the great difference between the estimated and actual rows, also , why is there a difference between the query-plan's actual rows and the "real" actual rows?
Here's the complete plan: http://s000.tinyupload.com/index.php?file_id=86912774678831839436
Edit: from the comment of Mikael Eriksson:
I think your
top(1)
andorder by
is to blame here. The index
scan is using the index onClaim_Submitted_Date
to find the first
occurrence of a value as fast as possible. SQL Server thinks that you
will find such value after scanning 3000 rows. Perhaps that estimate
makes sense if you have 5000 rows intabSparePart
and you only want
one. The thing is that if there are no rows found or you have a spare
part that is just recently submitted, SQL Server will have to scan the
entire index before it figures out that there is no rows to be found.
Although i still don't understand why, it seems that this is true. If i remove the TOP 1
or change it to say TOP 1000
the result comes immediately. Does anybody have an idea what i can do? Because, if multiple rows are returned i just need the last price submitted.
Best Answer
The query takes a lot of time because it finds no rows.
The plan first is setup top pick one row from
tabSparePart
wheresparepartname = '1270-3132'
. It then moves along to the index scan of tabData. For each row fetched fromtabData
there is a clustered index seek inlocSL
followed by a clustered index seek intabDataDetail
which also checks for not null ongsp_part_price
and finally returns the values offiSparePart
andGSP_Part_Price
.fiSparePart
is then compared against theidSparePart
returned from the Index Seek intabSparePart
. If they are the same the query immediately terminates and return thegsp_part_price
. If they are not the same the index scan intabData
moves to the next record and all the checks are executed again. That keeps happening until there is a row returned or all rows intabData
have been processed.As it turns out you get a totally different query plan when you do top(1000).
You can use the
optimize for
hint to make SQL Server think you actually want 1000 rows when you only need one.Your scalar valued function would then look something like this: