SQL Server – Improving Queries Waiting on PAGEIOLATCH_SH

index-tuningsql-server-2000

As an 'accidental DBA' of my company, one of my tasks is to identify possible performance problems of our ERP system. For complex reasons, the database back-end has to be SQL Server 2000.

From Windows' perflog, I can see that the storage is being hammered. So, I tried 'drilling down' the cause. I've installed Solarwinds DPA (formerly Confio Ignite), and after letting it run for several days, identified two queries with 'extreme' wait times.

Here's the first query; its problem is PAGEIOLATCH_SH taking nearly 100% of its wait:

(@P1 varchar(1000),@P2 varchar(1000),@P3 datetime,@P4 datetime,@P5 varchar(1000)) 
SELECT
  -- A HUGE list of columns here 
FROM INVENTTRANS A(INDEX(I_177DIMIDIDX))  
WHERE ((DATAAREAID=@P1)  
AND ((((ITEMID=@P2)  
AND (DATEPHYSICAL>=@P3))  
AND (DATEPHYSICAL<=@P4))  
AND (INVENTDIMID=@P5)))  
ORDER BY A.DATAAREAID, 
   A.ITEMID, 
   A.DATEPHYSICAL, 
   A.INVENTDIMID, 
   A.VOUCHERPHYSICAL OPTION(FAST 20)

Here's the second query; its problem is Memory/CPU wait:

(@P1 varchar(1000),@P2 varchar(1000),@P3 datetime) 
SELECT SUM(A.NOMINAL)  
FROM SMI_PL_HEADER A(INDEX(I_30301CUSTGROUPIDX))  
WHERE ((DATAAREAID=@P1)  
AND ((CUSTACCOUNT=@P2)  
AND (DEADLINEDELIVERY=@P3))) OPTION(FAST 2)

Here are the columns indexed by I_177DIMIDIDX (according to SQL Server Management Studio):

DATAAREAID
INVENTDIMID
ITEMID

And here are the columns indexed by I_30301CUSTGROUPIDX:

DATAAREAID
ITEMGROUP
CUSTACCOUNT

Now, my questions:

I have a feeling that the wait time can be significantly reduced by recreating the indexes to INCLUDE the columns left out. Is my hunch right?
How best to create such index? I don't really understand the differences between 'clustered' and 'nonclustered' indexes.

Sorry for not updating sooner; weekend got into the plan 🙂

Anyways, the execution plan for the first query:

Sort(ORDER BY:([A].[DATEPHYSICAL] ASC, [A].[VOUCHERPHYSICAL] ASC))
  EstimateRows=1.0, EstimateIO=0.011261261, EstimateCPU=1.0048E-4, AverageRowSize=480, TotalSubtreeCost=0.014695659, Parallel=false, EstimateExecutions=1.0

  |--Filter(WHERE:([A].[DATEPHYSICAL]<=Convert('DUMMY_VAL') AND [A].[DATEPHYSICAL]>=Convert('DUMMY_VAL')))
       EstimateRows=1.0, EstimateCPU=8.8E-7, AverageRowSize=487, TotalSubtreeCost=0.0033338175, Parallel=false, EstimateExecutions=1.0

       |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([INVENTTRANS] AS [A]))
            EstimateRows=1.0, EstimateIO=4.882658E-5, EstimateCPU=1.1E-6, AverageRowSize=487, TotalSubtreeCost=0.0033329376, Parallel=false, EstimateExecutions=1.0

            |--Index Seek(OBJECT:([INVENTTRANS].[I_177DIMIDIDX] AS [A]), SEEK:([A].[DATAAREAID]='DUMMY_VAL' AND [A].[INVENTDIMID]='DUMMY_VAL' AND [A].[ITEMID]='DUMMY_VAL') ORDERED FORWARD)
                 EstimateRows=1.0, EstimateIO=0.003203401, EstimateCPU=7.961E-5, AverageRowSize=49, TotalSubtreeCost=0.003283011, Parallel=false, EstimateExecutions=1.0

and the execution plan for the second query:

Compute Scalar(DEFINE:([Expr1001]=If ([Expr1007]=0) then NULL else [Expr1008]))
  EstimateRows=1.0, EstimateCPU=2.5E-7, AverageRowSize=24, TotalSubtreeCost=0.0033404902, Parallel=false, EstimateExecutions=1.0

  |--Stream Aggregate(DEFINE:([Expr1007]=Count(*), [Expr1008]=SUM([A].[NOMINAL])))
       EstimateRows=1.0, EstimateCPU=2.5E-7, AverageRowSize=24, TotalSubtreeCost=0.0033404902, Parallel=false, EstimateExecutions=1.0

       |--Filter(WHERE:([A].[DEADLINEDELIVERY]=Convert('DUMMY_VAL')))
            EstimateRows=1.0, EstimateCPU=4.8E-7, AverageRowSize=214, TotalSubtreeCost=0.00334024, Parallel=false, EstimateExecutions=1.0

            |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([SMI_PL_HEADER] AS [A]))
                 EstimateRows=1.0, EstimateIO=4.882658E-5, EstimateCPU=1.1E-6, AverageRowSize=214, TotalSubtreeCost=0.00333976, Parallel=false, EstimateExecutions=1.0

                 |--Index Seek(OBJECT:([SMI_PL_HEADER].[I_30301CUSTGROUPIDX] AS [A]), SEEK:([A].[DATAAREAID]='DUMMY_VAL'),  WHERE:([A].[CUSTACCOUNT]='DUMMY_VAL') ORDERED FORWARD)
                      EstimateRows=1.0, EstimateIO=0.003203401, EstimateCPU=8.40272E-5, AverageRowSize=38, TotalSubtreeCost=0.0032874283, Parallel=false, EstimateExecutions=1.0

Best Answer

Ideas to get you started:

This is an excellent idea, and the one I'd go for first. Except, SQL Server 2000 does not support INCLUDEd columns in indexes.
Clustered/non-clustered indexes is a different discussion entirely - to put it simply, a clustered index is actually the physical storage "order" of the data (the index is the data), while a non-clustered index stands alone and references the data by pointers. You can think of it as a library, where the clustered index is the actual bookshelves (ordered by type, author, title) and a non-clustered index is just a "rolodex". If you design a non-clustered index so it already contains all the information your query needs (a so-called "covering index"), you won't have to make the lookup-trip to the data, which radically improves query performance.

I would start by looking at why there are explicit index hints.

FROM INVENTTRANS A(INDEX(I_177DIMIDIDX))

.. tells SQL Server to always choose the index I_177DIMIDIDX, which can absolutely kill your query if the index doesn't properly cover all the data you want to retrieve, including the columns INVENTDIMID, DATEPHYSICAL, as well as the huge column list in the SELECT clause and the columns in ORDER BY.

Right off the bat, I would try creating indexes like these to try to resolve the problem.

CREATE INDEX ... ON INVENTTRANS
    (DATAAREAID, ITEMID, INVENTDIMID, DATEPHYSICAL)

A unique index may be even better, if that's possible by adding for instance VOUCHERPHYSICAL. You may still have a problem with the fact that the huge list of selected columns means that the database will have to "join" the index with the (clustered) table to look up those columns, but give it a try.

The second table is simpler, mostly due to the smaller number of search/output columns:

CREATE INDEX ... ON SMI_PL_HEADER
    (DATAAREAID, CUSTACCOUNT, DEADLINEDELIVERY)
    --- INCLUDE (NOMINAL)

Also adding NOMINAL to the indexed columns (you can't INCLUDE it) is certainly not pretty, but if you're desperate, it'll probably help you.

Related Solutions

Sql-server – Optimizing IP Range Search

I'm assuming there is a typo in the question as per Martin's comment.

A problem with a single range seek on either ipFrom or ipTo is that on average it will need to seek half the table. An alternative approach is

1) Add a computed column, on something like:

 ALTER TABLE MyTable
  ADD granule 
AS CASE WHEN (ipTo-ipFrom)<16 THEN 1 WHEN (ipTo-ipFrom)<256 THEN 2 ELSE 3 END;

2) Index the computed column and ipFrom

CREATE ClUSTERED INDEX IX ON MyTable(granule, ipFrom );

3) Query like this:

SELECT url
FROM   MyTable
WHERE  granule = 1
       AND ipFrom BETWEEN @somevalue - 16 AND @somevalue
       AND ipTo >= @somevalue
UNION ALL
SELECT url
FROM   MyTable
WHERE  granule = 2
       AND ipFrom BETWEEN @somevalue - 256 AND @somevalue
       AND ipTo >= @somevalue
UNION ALL
SELECT url
FROM   MyTable
WHERE  granule = 3
       AND ipFrom <= @somevalue
       AND ipTo >= @somevalue;

Precisely how you define the 'granule' function will depend on the spead of ipto-ipfrom in your data.

SQL Server – How to Read an Execution Plan

Why don't your input parameters match the type for the table? Why would you want to keep the wrong types there and perform any casts or conversions at all (whether implicit or explicit)? Why are you converting anything to FLOAT, of all things? To address specific questions:

My query says [low] is being converted from int to numeric but that doesn't seem to be what the cluster index seek details is showing, is it?

The convert of low is happening in the output, not in the seek predicate (the predicate is what is used to find matching rows and/or eliminate non-matching rows).

Is there a way to tell in this specific example how much "Better" it would be if I did the conversions Explicitly? I would think I could just do a "cast" in the query and insert a few numbers where the variables are, is that correct?

There's no way to make the execution plan show you how much better a different plan would be, except to generate that different plan and compare. You can use this comparison to document how much better it would be if the interface were correct (and two other ways would be to keep the interface but (a) perform explicit converts in the query - not of the column, but of the variables or (b) use local variables of the right type and assign them the values of the parameters). So you could show them 3 different ways to solve the problem, and show evidence that all 3 are better than the current version.

My recommendation is to fix the procedure the right way. First let's look at the actual types you care about:

USE master;
GO
SELECT t.name, c.max_length/CASE 
  WHEN t.name LIKE N'n[cvt]%' THEN 2 ELSE 1 END
FROM sys.all_columns AS c 
INNER JOIN sys.types AS t
ON c.system_type_id = t.system_type_id
AND c.system_type_id = t.user_type_id
WHERE EXISTS
(
  SELECT 1 FROM sys.all_objects AS o
    INNER JOIN sys.schemas AS s
    ON o.[schema_id] = s.[schema_id]
    WHERE o.[object_id] = c.[object_id]
    AND o.name = N'spt_values'
    AND s.name = N'dbo'
)
AND c.name IN (N'number',N'type');

Results:

number    int     4
type      nchar   3

So the interface to your stored procedure should be:

USE yourdb;
GO
ALTER PROCEDURE dbo.some_name
  @1 INT,
  @2 NCHAR(3),
  @3 NUMERIC(4, 0)
AS
BEGIN
  SET NOCOUNT ON;

  SELECT CONVERT([float], [low] / @3, 0) -- don't think you want float here
    FROM [master].[dbo].[spt_values]
    WHERE [number] = @1
    AND [type] = @2;
END
GO

Implicit conversions between varchar and nvarchar can be particularly bad (especially in the opposite scenario as yours - parameter is nvarchar and column is varchar), but there really is no reason to allow for a 8000-character parameter of any type when the longest string possible in the table is 3 characters...

Best Answer

Related Solutions

Sql-server – Optimizing IP Range Search

SQL Server – How to Read an Execution Plan

Related Question