Sql-server – SQL multiple select with where conditions performs really poorly

performance, query-performance, sql-server, sql-server-2005, subquery, t-sql

Data structure is something like this:

Number, ID, DateTime, Status

50000001, 101, 1/1/09 0:10, PO
50000001, 110, 1/1/09 0:11, PO
50000001, 102, 1/1/09 0:15, PO
50000001, 101, 1/1/09 0:10, PCK 
50000001, 102, 1/1/09 0:12, PCK 
50000001, 110, 1/1/09 0:12, PCK 
50000001, 101, 1/1/09 0:15, C 
50000001, 101, 1/1/09 0:15, C+
50000001, 110, 1/1/09 0:15, C 
50000001, 110, 1/1/09 0:15, C+
50000001, 102, 1/1/09 0:15, C 
50000001, 102, 1/1/09 0:15, C+
50000002, 126, 1/1/09 0:13, WO
50000002, 126, 1/1/09 0:14, PCK 
50000002, 126, 1/1/09 0:14, C 
50000002, 126, 1/1/09 0:14, S

I am trying to select data based on several values in the Status column and return it formatted like:

| Number | OrderOpen | OrderCleared |

My query looks like this:

Select 
SetOpen.ID, SetOpen.Num, SetOpen.OrderOpen, SetCleared.OrderCleared
FROM 
(
SELECT
    order_status_hist.ID AS 'ID',
    order_status_hist.Number AS 'Num',
    order_status_hist.datetime AS 'OrderOpen' 
FROM 
    dbo.order_status_hist
WHERE 
    order_status_hist.Status='WO' OR 
    order_status_hist.Status='PO'
) as SetOpen
inner JOIN
(
SELECT
    order_status_hist.ID AS 'ID',
    order_status_hist.Number AS 'Num',
    order_status_hist.datetime AS 'OrderCleared' 
FROM 
    order_status_hist
WHERE 
(
    order_status_hist.Status='C+' OR 
    order_status_hist.Status='S' 
)
) AS SetCleared

ON SetOpen.ID = SetCleared.ID

In my production database there are about 50 distinct values in the Status column, and my WHERE clauses list 10 values in the first subquery and 5 in the second. The production version of this query takes a very long time to complete. A further complication is that this is only part of a larger query: I will be joining these results on Number to pull data from more tables.

Best Answer

Given this heap:

CREATE TABLE #foo(Number int, ID tinyint, [DateTime] datetime, [Status] varchar(10));

INSERT #foo(Number,ID,[DateTime],[Status]) VALUES
(50000001, 101, '1/1/2009 0:10', 'PO'),
(50000001, 110, '1/1/2009 0:11', 'PO'),
(50000001, 102, '1/1/2009 0:15', 'PO'),
(50000001, 101, '1/1/2009 0:10', 'PCK'), 
(50000001, 102, '1/1/2009 0:12', 'PCK'), 
(50000001, 110, '1/1/2009 0:12', 'PCK'), 
(50000001, 101, '1/1/2009 0:15', 'C'),
(50000001, 101, '1/1/2009 0:15', 'C+'),
(50000001, 110, '1/1/2009 0:15', 'C'),
(50000001, 110, '1/1/2009 0:15', 'C+'),
(50000001, 102, '1/1/2009 0:15', 'C'),
(50000001, 102, '1/1/2009 0:15', 'C+'),
(50000002, 126, '1/1/2009 0:13', 'WO'),
(50000002, 126, '1/1/2009 0:14', 'PCK'), 
(50000002, 126, '1/1/2009 0:14', 'C'),
(50000002, 126, '1/1/2009 0:14', 'S');

This query reads the table once instead of twice, gets rid of the hash match, and shifts most of the cost to a sort:

SELECT
  ID, Num = Number,
  OrderOpen    = MIN(CASE WHEN [Status] IN ('WO','PO') THEN [DateTime] END),
  OrderCleared = MAX(CASE WHEN [Status] IN ('S','C+')  THEN [DateTime] END)
FROM #foo
GROUP BY ID, Number;
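
If you want to confirm the difference in reads yourself, one rough way is to run both shapes with STATISTICS IO turned on and compare the scan count and logical reads reported for #foo. This is only a sketch against the sample table above; the first query is a condensed version of the asker's join (with IN instead of OR), not the exact production query.

```sql
-- Sketch: compare I/O for the original join shape and the single-pass aggregate.
-- Assumes the #foo table and sample rows created above.
SET STATISTICS IO ON;

-- Original shape: two derived tables over the same table, joined on ID.
SELECT o.ID, Num = o.Number, OrderOpen = o.[DateTime], OrderCleared = c.[DateTime]
FROM
(
  SELECT ID, Number, [DateTime] FROM #foo WHERE [Status] IN ('WO','PO')
) AS o
INNER JOIN
(
  SELECT ID, [DateTime] FROM #foo WHERE [Status] IN ('S','C+')
) AS c
  ON o.ID = c.ID;

-- Rewritten shape: one pass with conditional aggregation.
SELECT
  ID, Num = Number,
  OrderOpen    = MIN(CASE WHEN [Status] IN ('WO','PO') THEN [DateTime] END),
  OrderCleared = MAX(CASE WHEN [Status] IN ('S','C+')  THEN [DateTime] END)
FROM #foo
GROUP BY ID, Number;

SET STATISTICS IO OFF;
```

The counts appear in the Messages tab, one line per statement. On a 16-row table the numbers are tiny, so treat them as a way to compare plan shapes rather than as a benchmark.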

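This PIVOT is also cheaper than the original, but still has the expensive sort. Note that it applies MAX to both columns, whereas the query above uses MIN for OrderOpen; the two agree on the sample data because each ID/Number pair has only one opening row: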

SELECT * FROM 
(
  SELECT ID, Number, [DateTime], BetterStatus = CASE 
    WHEN [Status] IN ('WO','PO') THEN 'OrderOpen'
    WHEN [Status] IN ('S', 'C+') THEN 'OrderCleared' END
  FROM #foo
) AS f
PIVOT
(
  MAX([DateTime]) FOR BetterStatus IN ([OrderOpen],[OrderCleared])
) AS p;
GO

However, if we add a computed column and an index:

ALTER TABLE #foo ADD BetterStatus AS CONVERT(varchar(12), 
  ISNULL(CASE WHEN [Status] IN ('WO','PO') THEN 'OrderOpen'
       WHEN [Status] IN ('S','C+')  THEN 'OrderCleared' END, '?'));
GO
CREATE INDEX x ON #foo(Number, ID, [DateTime], BetterStatus);
GO

Then these two queries fare much better (both producing ordered scans without any additional sorting required):

SELECT
  ID, Num = Number,
  OrderOpen    = MIN(CASE [BetterStatus] WHEN 'OrderOpen' THEN [DateTime] END),
  OrderCleared = MAX(CASE [BetterStatus] WHEN 'OrderCleared' THEN [DateTime] END)
FROM #foo
GROUP BY ID, Number;

SELECT ID, Num, OrderOpen, OrderCleared
FROM 
(
  SELECT ID, Num = Number, [DateTime], BetterStatus FROM #foo
) AS f
PIVOT
(
  MAX([DateTime]) FOR BetterStatus IN ([OrderOpen],[OrderCleared])
) AS p;

So, if you don't have supporting indexes and can't add computed columns or new indexes, use one of the first two queries. If you can add them (or modify existing indexes to achieve the same effect), use one of the latter two.
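
Since the question mentions joining these results on Number to other tables, here is a minimal sketch of how the aggregate could be wrapped in a CTE and joined. The table dbo.order_header and its CustomerName column are hypothetical, invented only for illustration; dbo.order_status_hist and its column names are taken from the question.

```sql
-- Sketch only: dbo.order_header(Number, CustomerName) is a made-up table.
-- A CTE must follow a terminated statement, so end the previous one with ";".
WITH OrderTimes AS
(
  SELECT
    ID, Number,
    OrderOpen    = MIN(CASE WHEN [Status] IN ('WO','PO') THEN [datetime] END),
    OrderCleared = MAX(CASE WHEN [Status] IN ('S','C+')  THEN [datetime] END)
  FROM dbo.order_status_hist
  GROUP BY ID, Number
)
SELECT t.Number, t.ID, t.OrderOpen, t.OrderCleared, h.CustomerName
FROM OrderTimes AS t
INNER JOIN dbo.order_header AS h
  ON h.Number = t.Number;
```

Aggregating first keeps the status table down to one row per ID/Number pair before the join, which is usually what you want here. Whether the optimizer handles it well at production size is something to check in the actual plan.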

This isn't exhaustive tuning, of course. Just a kickstart.

Related Solutions

Sql-server – How to avoid index scans in SQL Server 2005

Any solution :)

No, not really.

As you said yourself - CellRow is just not very selective - 5 possible values, 100'000 rows = roughly 20'000 rows for each possible value.

SQL Server's query optimizer recognizes this and probably figures it's easier and more efficient to do an index scan rather than a seek for 20'000 rows.

The only way to avoid this would be to use a more selective index, i.e. some other column that selects 2%, 3% or max. 5% of the rows for each query.
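
To see how selective a column really is before building an index on it, a quick check along these lines can help. It is a sketch that assumes the dbo.Cell table and CellRow column from the question; the output column names are my own.

```sql
-- Sketch: rows and share of the table for each CellRow value.
SELECT
  CellRow,
  RowsPerValue = COUNT(*),
  PctOfTable   = 100.0 * COUNT(*) / (SELECT COUNT(*) FROM dbo.Cell)
FROM dbo.Cell
GROUP BY CellRow
ORDER BY RowsPerValue DESC;
```

If every value comes back at around 20% of the table, as described above, a seek on that column alone is unlikely to beat a scan.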

PS: Check your execution plan - does it get the values straight from the index, or does it need a "Key Lookup" step to go get the data?

You didn't mention what data types your columns are - if the CellValue isn't too big, you could add it to the index (or at least include it in the index) to avoid costly key lookups:

CREATE INDEX IX_CellRow_CellValues
ON dbo.Cell(CellRow) INCLUDE(CellValue)

You'd still have the index scan, though.

SQL Server – Using PIVOT to Combine Single Year Column and Multiple Weeks Columns

This sounds more like an UNPIVOT to me.

SELECT SalesTable.Pk, CrossApplied.WeekYear, CrossApplied.Value
FROM SalesTable
CROSS APPLY (VALUES ('01'+CAST([Year] AS CHAR(40)), Week1),
                    ('02'+CAST([Year] AS CHAR(40)), Week2),
                    ('03'+CAST([Year] AS CHAR(40)), Week3),
                    .......)
        CrossApplied (WeekYear, Value)

There are other ways to do an UNPIVOT but using CROSS APPLY is my favorite. I give a more general example here.
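
For comparison, one of those other ways is the built-in UNPIVOT operator. The sketch below reuses the SalesTable, Pk, [Year], and Week1 to Week3 names from the query above and lists only the three week columns shown there; WeekName is a column name I made up, and the sketch assumes all the week columns share one data type, which UNPIVOT requires.

```sql
-- Sketch: the same reshaping with UNPIVOT.
-- Assumes Week1..Week3 have identical data types.
SELECT u.Pk, u.[Year], u.WeekName, u.[Value]
FROM SalesTable
UNPIVOT
(
  [Value] FOR WeekName IN (Week1, Week2, Week3)
) AS u;
```

Two differences are worth knowing. UNPIVOT leaves out rows where the week value is NULL, while the CROSS APPLY version keeps them. And the VALUES constructor inside CROSS APPLY needs SQL Server 2008 or later, whereas UNPIVOT is available from SQL Server 2005.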
