SQL Server – Improving Queries Waiting on PAGEIOLATCH_SH

index-tuningsql-server-2000

As an 'accidental DBA' of my company, one of my tasks is to identify possible performance problems of our ERP system. For complex reasons, the database back-end has to be SQL Server 2000.

From Windows' perflog, I can see that the storage is being hammered. So, I tried 'drilling down' the cause. I've installed Solarwinds DPA (formerly Confio Ignite), and after letting it run for several days, identified two queries with 'extreme' wait times.

Here's the first query; its problem is PAGEIOLATCH_SH taking nearly 100% of its wait:

(@P1 varchar(1000),@P2 varchar(1000),@P3 datetime,@P4 datetime,@P5 varchar(1000)) 
SELECT
  -- A HUGE list of columns here 
FROM INVENTTRANS A(INDEX(I_177DIMIDIDX))  
WHERE ((DATAAREAID=@P1)  
AND ((((ITEMID=@P2)  
AND (DATEPHYSICAL>=@P3))  
AND (DATEPHYSICAL<=@P4))  
AND (INVENTDIMID=@P5)))  
ORDER BY A.DATAAREAID, 
   A.ITEMID, 
   A.DATEPHYSICAL, 
   A.INVENTDIMID, 
   A.VOUCHERPHYSICAL OPTION(FAST 20)

Here's the second query; its problem is Memory/CPU wait:

(@P1 varchar(1000),@P2 varchar(1000),@P3 datetime) 
SELECT SUM(A.NOMINAL)  
FROM SMI_PL_HEADER A(INDEX(I_30301CUSTGROUPIDX))  
WHERE ((DATAAREAID=@P1)  
AND ((CUSTACCOUNT=@P2)  
AND (DEADLINEDELIVERY=@P3))) OPTION(FAST 2)

Here are the columns indexed by I_177DIMIDIDX (according to SQL Server Management Studio):

  • DATAAREAID
  • INVENTDIMID
  • ITEMID

And here are the columns indexed by I_30301CUSTGROUPIDX:

  • DATAAREAID
  • ITEMGROUP
  • CUSTACCOUNT

Now, my questions:

  1. I have a feeling that the wait time can be significantly reduced by recreating the indexes to INCLUDE the columns left out. Is my hunch right?
  2. How best to create such index? I don't really understand the differences between 'clustered' and 'nonclustered' indexes.

Sorry for not updating sooner; weekend got into the plan 🙂

Anyways, the execution plan for the first query:

Sort(ORDER BY:([A].[DATEPHYSICAL] ASC, [A].[VOUCHERPHYSICAL] ASC))
  EstimateRows=1.0, EstimateIO=0.011261261, EstimateCPU=1.0048E-4, AverageRowSize=480, TotalSubtreeCost=0.014695659, Parallel=false, EstimateExecutions=1.0

  |--Filter(WHERE:([A].[DATEPHYSICAL]<=Convert('DUMMY_VAL') AND [A].[DATEPHYSICAL]>=Convert('DUMMY_VAL')))
       EstimateRows=1.0, EstimateCPU=8.8E-7, AverageRowSize=487, TotalSubtreeCost=0.0033338175, Parallel=false, EstimateExecutions=1.0

       |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([INVENTTRANS] AS [A]))
            EstimateRows=1.0, EstimateIO=4.882658E-5, EstimateCPU=1.1E-6, AverageRowSize=487, TotalSubtreeCost=0.0033329376, Parallel=false, EstimateExecutions=1.0

            |--Index Seek(OBJECT:([INVENTTRANS].[I_177DIMIDIDX] AS [A]), SEEK:([A].[DATAAREAID]='DUMMY_VAL' AND [A].[INVENTDIMID]='DUMMY_VAL' AND [A].[ITEMID]='DUMMY_VAL') ORDERED FORWARD)
                 EstimateRows=1.0, EstimateIO=0.003203401, EstimateCPU=7.961E-5, AverageRowSize=49, TotalSubtreeCost=0.003283011, Parallel=false, EstimateExecutions=1.0

and the execution plan for the second query:

Compute Scalar(DEFINE:([Expr1001]=If ([Expr1007]=0) then NULL else [Expr1008]))
  EstimateRows=1.0, EstimateCPU=2.5E-7, AverageRowSize=24, TotalSubtreeCost=0.0033404902, Parallel=false, EstimateExecutions=1.0

  |--Stream Aggregate(DEFINE:([Expr1007]=Count(*), [Expr1008]=SUM([A].[NOMINAL])))
       EstimateRows=1.0, EstimateCPU=2.5E-7, AverageRowSize=24, TotalSubtreeCost=0.0033404902, Parallel=false, EstimateExecutions=1.0

       |--Filter(WHERE:([A].[DEADLINEDELIVERY]=Convert('DUMMY_VAL')))
            EstimateRows=1.0, EstimateCPU=4.8E-7, AverageRowSize=214, TotalSubtreeCost=0.00334024, Parallel=false, EstimateExecutions=1.0

            |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([SMI_PL_HEADER] AS [A]))
                 EstimateRows=1.0, EstimateIO=4.882658E-5, EstimateCPU=1.1E-6, AverageRowSize=214, TotalSubtreeCost=0.00333976, Parallel=false, EstimateExecutions=1.0

                 |--Index Seek(OBJECT:([SMI_PL_HEADER].[I_30301CUSTGROUPIDX] AS [A]), SEEK:([A].[DATAAREAID]='DUMMY_VAL'),  WHERE:([A].[CUSTACCOUNT]='DUMMY_VAL') ORDERED FORWARD)
                      EstimateRows=1.0, EstimateIO=0.003203401, EstimateCPU=8.40272E-5, AverageRowSize=38, TotalSubtreeCost=0.0032874283, Parallel=false, EstimateExecutions=1.0

Best Answer

Ideas to get you started:

  1. This is an excellent idea, and the one I'd go for first. Except, SQL Server 2000 does not support INCLUDEd columns in indexes.

  2. Clustered/non-clustered indexes is a different discussion entirely - to put it simply, a clustered index is actually the physical storage "order" of the data (the index is the data), while a non-clustered index stands alone and references the data by pointers. You can think of it as a library, where the clustered index is the actual bookshelves (ordered by type, author, title) and a non-clustered index is just a "rolodex". If you design a non-clustered index so it already contains all the information your query needs (a so-called "covering index"), you won't have to make the lookup-trip to the data, which radically improves query performance.

I would start by looking at why there are explicit index hints.

FROM INVENTTRANS A(INDEX(I_177DIMIDIDX))  

.. tells SQL Server to always choose the index I_177DIMIDIDX, which can absolutely kill your query if the index doesn't properly cover all the data you want to retrieve, including the columns INVENTDIMID, DATEPHYSICAL, as well as the huge column list in the SELECT clause and the columns in ORDER BY.

Right off the bat, I would try creating indexes like these to try to resolve the problem.

CREATE INDEX ... ON INVENTTRANS
    (DATAAREAID, ITEMID, INVENTDIMID, DATEPHYSICAL)

A unique index may be even better, if that's possible by adding for instance VOUCHERPHYSICAL. You may still have a problem with the fact that the huge list of selected columns means that the database will have to "join" the index with the (clustered) table to look up those columns, but give it a try.

The second table is simpler, mostly due to the smaller number of search/output columns:

CREATE INDEX ... ON SMI_PL_HEADER
    (DATAAREAID, CUSTACCOUNT, DEADLINEDELIVERY)
    --- INCLUDE (NOMINAL)

Also adding NOMINAL to the indexed columns (you can't INCLUDE it) is certainly not pretty, but if you're desperate, it'll probably help you.