Sql-server – Optimize a query reducing logical reads

Tags: performance, query-performance, sql-server, sql-server-2016

I have a SQL Server 2016 query that returns 127K rows. You can find the query and query plan here. Let me know if you also need the table structures.

I need to join with a table that has only 20 rows, which acts as a replacement for one of the products. In other words, I query products from a main table but, under certain conditions, some of them can be replaced by others.

The problem is that this small table alone accounts for 254K logical reads. I've tried both LEFT JOIN and OUTER APPLY.

Any suggestions on how to rewrite this to avoid that many logical reads? Note that only one product has a replacement.

Best Answer

Taking a closer look at the execution plan XML, notice these problematic statistics:

<Wait WaitType="ASYNC_NETWORK_IO" WaitTimeMs="1328" WaitCount="403"/> 

<QueryTimeStats CpuTime="353" ElapsedTime="1853"/>

The query spent 1.3 seconds waiting on the results to be consumed by the application. The query only ran for 1.8 seconds total. So the main problem here is that the application is consuming these 127k results row-by-row. The query itself runs fairly quickly.

Forrest McDaniel has a good blog post that demonstrates this problem: Two Easy ASYNC_NETWORK_IO Demos
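If you want to confirm the network wait yourself without digging through the plan XML, one option on SQL Server 2016 and later is the session-level wait DMV. This is only a sketch: it assumes you run it in the same session right after the query, and the numbers are cumulative for that session, so compare before and after values.

```sql
-- Run in the same session, immediately after the query under test.
-- Values are cumulative since the session started (or was reset by connection pooling).
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = @@SPID
  AND wait_type = N'ASYNC_NETWORK_IO';
```

A large wait_time_ms relative to the query's elapsed time points at the client consuming rows slowly rather than at the query itself.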

The remainder of the answer addresses the "logical reads" portion of your question.

The reason for all of those logical reads on the OUTER APPLY'd table (DBVAREKT) is the Index Seek on the inner (lower) input of the NESTED LOOPS join in the plan.

The "Index Seek" there is executed once for each row on the upper input to the NESTED LOOPS join. So there are 127,329 seeks into that index (ID1), even though in the end only 302 matching rows are returned.

The optimizer wouldn't normally choose to do that many seeks into the index, but it only thought there would be 82 rows on the upper input. Doing 82 seeks is definitely more reasonable.

The general approach to solving this problem would be to avoid doing a NESTED LOOPS join on that particular table, since that is the source of the problem. To that end, you could use join hints, but it's a little hard for me to tell where the hints should be applied.
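For reference, this is roughly what a join hint looks like. It is a sketch only: #upper_input is a hypothetical temp table standing in for the rows produced by the rest of the query, and its column types are assumed. Only DBVAREKT and its MADTYPE, VARENR and TO_VARE_PREC columns come from the actual query.

```sql
-- #upper_input is a hypothetical stand-in for the upper input of the join;
-- the int column types are an assumption, not taken from the real schema.
CREATE TABLE #upper_input (MADTYPE int NULL, VNR1 int NULL);

SELECT U.MADTYPE,
       U.VNR1,
       VKT.TO_VARE_PREC
FROM #upper_input AS U
LEFT HASH JOIN dbo.DBVAREKT AS VKT   -- HASH is the join hint
    ON  VKT.MADTYPE = U.MADTYPE
    AND VKT.VARENR  = U.VNR1;

DROP TABLE #upper_input;
```

Keep in mind that specifying a join hint also makes the optimizer keep the join order as written, so test it against the unhinted plan rather than assuming it helps.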

Randi mentioned a possible rewrite of the query that would place some of the data into a temp table, essentially breaking the query up into smaller chunks that the SQL Server optimizer can do a better job with. You could break this up at the OUTER APPLY as follows:

SELECT 
       1756 AS CONST_COL,  -- placeholder alias: SELECT ... INTO requires a name for every column
       L.MADTYPE,
       DBM.VNR1
INTO #results
FROM dbo.STDORDRE S
INNER JOIN dbo.STDORD STO ON sto.DATO = s.DATO
                           AND sto.KUNDE = s.KUNDE
LEFT JOIN dbo.STDORDML L ON L.ONR = s.ONR
CROSS APPLY (SELECT dbo.MCS_ClarionDateToSQL(D.DATO) SQL_DATO,
                    DATEPART(dw, dbo.MCS_ClarionDateToSQL(D.DATO)) DP,  D.VNR1, D.DATO, D.LINE, D.KATALOGNR, D.VFAKTOR, D.MADTYPE FROM dbo.DBMENU D WHERE D.SNR = s.VARENR AND D.LINE = L.MENULINE) DBM
INNER JOIN dbo.DBKUNDE kun ON kun.NR = s.KUNDE
INNER JOIN dbo.DBKUNGRP dbk ON dbk.NR = kun.GRP
INNER JOIN dbo.DBVARE varm ON varm.NR = s.VARENR
INNER JOIN dbo.DBVARE var ON var.NR = DBM.VNR1
LEFT OUTER JOIN dbo.MENORDRE MEN ON MEN.KUNDE = s.KUNDE
                                AND MEN.DATO  = DBM.DATO
                                AND MEN.LINIE = DBM.LINE
                                AND MEN.NR    = s.MNR
WHERE 1 = 1
  AND ( kun.UDSKREVET = 0
                     OR ( kun.UDSKREVET = 1
                          AND kun.UDSDATO >= 79627 ))
                   AND varm.TYPE = 9
                   AND varm.KPFIX = 0
                   AND s.ML = 1
                   AND DBM.DATO BETWEEN 79627 AND 79777
                   AND sto.TYPE = 1;

SELECT 
       1756,
       RES.MADTYPE,   -- the L and DBM aliases are not in scope here; read from #results
       RES.VNR1
FROM #results RES  
OUTER APPLY (SELECT MTY.PREC MTY_PREC, 
                    MTY.NR   MTY_NR ,
                    VAR1.PREC VAR1_PREC, 
                    var1.VAR_PKG_ID VAR1_VAR_PKG_ID,
                    VAR1.NR VAR1_NR ,
                    VAR1.TYPE VAR1_TYPE,
                    VAR1.GRP VAR1_GRP
             FROM dbo.DBVAREKT VKT
              LEFT JOIN dbo.DBVARE VAR1 ON VAR1.PREC = VKT.TO_VARE_PREC
                                               AND (ISNULL(VKT.MADTYPE,0) <> 0  )
              LEFT JOIN dbo.DBMTYPE MTY ON MTY.PREC = VKT.TO_MADTYPE_PREC
            WHERE 1 = 1
              AND VKT.MADTYPE = RES.MADTYPE
              AND VKT.VARENR  = RES.VNR1
                      ) OA;

Related Solutions

Sql-server – Logical reads amount difference with almost identical indexes

It all depends on the key (and non-key) columns defined in the nonclustered index. The clustered index is the actual table data, so its leaf pages contain every column of every row. The nonclustered index contains only the columns defined in the index creation DDL (plus the clustering key, which it stores as the row locator).

Let's set up a test scenario:

use testdb;
go

if exists (select 1 from sys.tables where name = 'TestTable')
begin
    drop table TestTable;
end
create table dbo.TestTable
(
    id int identity(1, 1) not null
        constraint PK_TestTable_Id primary key clustered,
    some_int int not null,
    some_string char(128) not null,
    some_bigint bigint not null
);
go

create unique index IX_TestTable_SomeInt
on dbo.TestTable(some_int);
go

declare @i int;
set @i = 0;

while @i < 1000
begin
    insert into dbo.TestTable(some_int, some_string, some_bigint)
    values(@i, 'hello', @i * 1000);

    set @i = @i + 1;
end

So we've got a table loaded with 1000 rows, and a clustered index (PK_TestTable_Id) and a nonclustered index (IX_TestTable_SomeInt). As you've seen in your testing, but just for thoroughness:

set statistics io on;
set statistics time on;

select some_int
from dbo.TestTable; -- no index hint: the optimizer chooses the index

set statistics io off;
set statistics time off;
-- nonclustered index scan (IX_TestTable_SomeInt)
-- logical reads: 4

Here we have a nonclustered index scan on the IX_TestTable_SomeInt index. We have 4 logical reads for this operation. Now let's force the clustered index to be used.

set statistics io on;
set statistics time on;

select some_int
from dbo.TestTable with(index(PK_TestTable_Id));

set statistics io off;
set statistics time off;
-- clustered index scan (PK_TestTable_Id)
-- logical reads: 22

With the clustered index scan we have 22 logical reads. Why? It comes down to how many pages SQL Server has to read in order to return the entire result set. Get the average row count per page:

select
    object_name(i.object_id) as object_name,
    i.name as index_name,
    i.type_desc,
    ips.page_count,
    ips.record_count,
    ips.record_count / ips.page_count as avg_rows_per_page
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.TestTable'), null, null, 'detailed') ips
inner join sys.indexes i
on ips.object_id = i.object_id
and ips.index_id = i.index_id
where ips.index_level = 0;

In my result set for the above query, the clustered index averages 50 rows per leaf page, while the nonclustered index averages 500 rows per leaf page. Therefore, more pages need to be read from the clustered index to satisfy the query.
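As a lighter-weight cross-check, the leaf page counts can also be read from sys.dm_db_partition_stats, which avoids the detailed scan that sys.dm_db_index_physical_stats performs. This is a sketch against the same dbo.TestTable; the exact numbers will depend on your environment.

```sql
-- Leaf-level page counts and row counts per index on the test table.
select
    i.name as index_name,
    i.type_desc,
    ps.in_row_data_page_count as leaf_pages,
    ps.row_count
from sys.dm_db_partition_stats ps
inner join sys.indexes i
on ps.object_id = i.object_id
and ps.index_id = i.index_id
where ps.object_id = object_id('dbo.TestTable');
```

The index with fewer leaf pages is the cheaper one to scan when it covers the query, which is why the optimizer picked IX_TestTable_SomeInt for a select on some_int alone.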

SQL Server Optimization – New Query Slower Despite Fewer Logical Reads

The percentage costs on an execution plan come from the optimizer's estimates, even when an actual execution plan is produced. The actual execution plan shows the plan that was really executed and includes both the estimated and the actual row counts. Discrepancies between the two can be useful for judging how accurate the estimate was.

Somewhere between the subquery and comparing it to the column in derived, the optimizer wasn't able to correctly estimate how many rows would match. It guessed that there would be 18 rows from derived when there were actually over 220,000. An additional clue is the warning message Cardinality Estimate: CONVERT(nvarchar(35),[mssqlsystemresource].[sys].[spt_values].[name],0) on the SELECT node.

If you were to measure the run times of the two queries with something else, such as STATISTICS TIME, I would expect them to be much closer, with the second query likely running faster.

Here's another plan analysis with a somewhat similar situation. (SQL Server Plan Explorer) (with hat tip to Kendra Little on how to fool the optimizer)

The estimate shows a 93%/7% cost split, but by looking at the actual CPU, time, or IO, the difference is not that extreme. IO is about 75%/25% and CPU is roughly 60%/40%. (I tried to come up with something more even, but wasn't able to.)
