Sql-server – Index seek much slower with OR condition compared with separate SELECTs

execution-plan, performance, query-performance, sql server, sql-server-2008

Based on these questions and the answers given:

SQL 2008 Server – performance loss possibly connected with a very large table

Large table with historical data allocates too much of SQL Server 2008 Std. memory – performance loss for other databases

I have a table in a database SupervisionP defined like this:

CREATE TABLE [dbo].[PenData](
    [IDUkazatel] [smallint] NOT NULL,
    [Cas] [datetime2](0) NOT NULL,
    [Hodnota] [real] NULL,
    [HodnotaMax] [real] NULL,
    [HodnotaMin] [real] NULL,
 CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED 
(
    [IDUkazatel] ASC,
    [Cas] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [dbo].[PenData]  WITH NOCHECK ADD  CONSTRAINT [FK_Data_Ukazatel] FOREIGN KEY([IDUkazatel])
REFERENCES [dbo].[Ukazatel] ([IDUkazatel])

ALTER TABLE [dbo].[PenData] CHECK CONSTRAINT [FK_Data_Ukazatel]

It contains approximately 211 million rows.

I run the following statements:

DECLARE @t1 DATETIME;
DECLARE @t2 DATETIME;

SET @t1 = GETDATE();
SELECT min(cas) from PenData p WHERE IDUkazatel=24
SELECT min(cas) from PenData p WHERE IDUkazatel=25
SET @t2 = GETDATE();
SELECT DATEDIFF(millisecond,@t1,@t2) AS elapsed_ms;


SET @t1 = GETDATE();
SELECT min(cas) from PenData p WHERE IDUkazatel=24 OR IDUkazatel=25 
SET @t2 = GETDATE();
SELECT DATEDIFF(millisecond,@t1,@t2) AS elapsed_ms;

The result is shown here:

Execution plan

The third SELECT also loads much more data into SQL Server memory cache.

Why is the third SELECT so much slower (8.5 s) than the first two SELECTs (16 ms)?
How can I improve the performance of the third SELECT with OR? I want to run the following SQL command, but it seems that creating a cursor and running separate queries is much faster than a single SELECT in this case.

 SELECT MIN(cas) from PenData p WHERE IDUkazatel IN (SELECT IDUkazatel FROM  ...)

EDIT

As David suggested I have hovered over the fat arrow:

FatArrow

Best Answer

For the first two queries, all it has to do is seek in the clustered index to the first entry for that value of IDUkazatel. Because of the order of the index, that row will have the lowest value of cas for that value of IDUkazatel.
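As a rough illustration of why the single-value case is cheap, the MIN can be thought of as a "first row in index order" lookup. The sketch below is not taken from the plan in the question. It only assumes the PK_Data clustered index on (IDUkazatel, Cas) shown in the table definition:

```sql
-- Sketch: for a single IDUkazatel value, the smallest Cas is simply the first
-- row in clustered index order, so one seek plus one row read is enough.
-- Assumption: at least one row exists for the value. With no matching rows
-- this form returns an empty result, whereas MIN() returns a single NULL.
SELECT TOP (1) PD.Cas
FROM dbo.PenData AS PD
WHERE PD.IDUkazatel = 24
ORDER BY PD.Cas ASC;
```

Once the predicate covers more than one IDUkazatel value, the rows are no longer ordered by Cas across the whole range, so this shortcut does not apply directly.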

In the query with OR this optimisation is not available. It is probably seeking to the first row for IDUkazatel=24 and then scanning down the index until the last row with IDUkazatel=25 to find the minimum value of cas over all those rows.

If you hover over that fat arrow you'll see it is reading many rows (certainly all those for 24, probably all those for 25 too), whereas the thin arrows in the plans for the other two queries show the Top operator causing it to consider only one row.
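Besides hovering over the arrows, one way to check how much data each form touches is to compare logical reads. This is a minimal sketch that assumes it is run against the PenData table from the question. The actual read counts depend on your data, so none are shown here:

```sql
-- Sketch: report per-statement I/O so the three queries can be compared.
-- In SSMS the "logical reads" figures appear on the Messages tab.
SET STATISTICS IO ON;

SELECT MIN(Cas) FROM dbo.PenData WHERE IDUkazatel = 24;
SELECT MIN(Cas) FROM dbo.PenData WHERE IDUkazatel = 25;

-- The OR form is expected to report far more logical reads if it really
-- reads every row for both values, as described above.
SELECT MIN(Cas) FROM dbo.PenData WHERE IDUkazatel = 24 OR IDUkazatel = 25;

SET STATISTICS IO OFF;
```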

You could try running each query separately and then taking the minimum of the minimums found:

SELECT MIN(cas)
FROM   (
        SELECT cas=MIN(cas) FROM PenData p WHERE p.IDUkazatel = 24
        UNION ALL
        SELECT cas=MIN(cas) FROM PenData p WHERE p.IDUkazatel = 25
    ) AS minimums

That said, it seems you have a table with IDUkazatel values rather than an explicit OR clause. The code below will work with that arrangement. Simply replace the table name @T with the name of the table containing the IDUkazatel values:

SELECT 
    MinCas = MIN(CA.PartialMinimum)
FROM @T AS T
CROSS APPLY 
(
    SELECT 
        PartialMinimum = MIN(PD.Cas)
    FROM dbo.PenData AS PD
    WHERE 
        PD.IDUkazatel = T.IDUkazatel
) AS CA;

In an ideal world, the SQL Server query optimizer would perform this rewrite for you, but it does not always consider this option today.
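To try the CROSS APPLY form without a real lookup table, a table variable can stand in for @T. The sketch below is only an illustration: the two values 24 and 25 come from the question, and the table variable definition is an assumption rather than part of the original schema:

```sql
-- Hypothetical stand-in for the table holding the IDUkazatel values of interest.
DECLARE @T TABLE (IDUkazatel smallint NOT NULL PRIMARY KEY);

INSERT INTO @T (IDUkazatel)
VALUES (24), (25);

-- One cheap per-value MIN for each row of @T, then the minimum of those results.
SELECT
    MinCas = MIN(CA.PartialMinimum)
FROM @T AS T
CROSS APPLY
(
    SELECT
        PartialMinimum = MIN(PD.Cas)
    FROM dbo.PenData AS PD
    WHERE
        PD.IDUkazatel = T.IDUkazatel
) AS CA;
```

If a value in @T has no rows in PenData, its partial minimum is NULL, which the outer MIN ignores.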
