Sql-server – SQL multiple select with where conditions performs really poorly

performancequery-performancesql serversql-server-2005subqueryt-sql

Data structure is something like this:

Number| ID | DateTime | Status

50000001, 101, 1/1/09 0:10, PO
50000001, 110, 1/1/09 0:11, PO
50000001, 102, 1/1/09 0:15, PO
50000001, 101, 1/1/09 0:10, PCK 
50000001, 102, 1/1/09 0:12, PCK 
50000001, 110, 1/1/09 0:12, PCK 
50000001, 101, 1/1/09 0:15, C 
50000001, 101, 1/1/09 0:15, C+
50000001, 110, 1/1/09 0:15, C 
50000001, 110, 1/1/09 0:15, C+
50000001, 102, 1/1/09 0:15, C 
50000001, 102, 1/1/09 0:15, C+
50000002, 126, 1/1/09 0:13, WO
50000002, 126, 1/1/09 0:14, PCK 
50000002, 126, 1/1/09 0:14, C 
50000002, 126, 1/1/09 0:14, S

I am trying to select data based on several values in the Status column and return it formatted like:

| Number | OrderOpen | OrderCleared |

My query looks like this:

Select 
SetOpen.ID, SetOpen.Num, SetOpen.OrderOpen, SetCleared.OrderCleared
FROM 
(
SELECT
    order_status_hist.ID AS 'ID',
    order_status_hist.Number AS 'Num',
    order_status_hist.datetime AS 'OrderOpen' 
FROM 
    dbo.order_status_hist
WHERE 
    order_status_hist.Status='WO' OR 
    order_status_hist.Status='PO'
) as SetOpen
inner JOIN
(
SELECT
    order_status_hist.ID AS 'ID',
    order_status_hist.Number AS 'Num',
    order_status_hist.datetime AS 'OrderCleared' 
FROM 
    order_status_hist
WHERE 
(
    order_status_hist.Status='C+' OR 
    order_status_hist.Status='S' 
)
) AS SetCleared

ON SetOpen.ID = SetCleared.ID

In my production database there are about 50 values in the status column, and my where statements have 10 'cases' in the first one and 5 in the second one. When I run the production version of this, it takes a VERY long time to complete. The further trouble is, this is really just part of a larger query, I would be joining these results on 'Number' to pull data in more tables.

Best Answer

Given this heap:

CREATE TABLE #foo(Number int, ID tinyint, [DateTime] datetime, [Status] varchar(10));

INSERT #foo(Number,ID,[DateTime],[Status]) VALUES
(50000001, 101, '1/1/2009 0:10', 'PO'),
(50000001, 110, '1/1/2009 0:11', 'PO'),
(50000001, 102, '1/1/2009 0:15', 'PO'),
(50000001, 101, '1/1/2009 0:10', 'PCK'), 
(50000001, 102, '1/1/2009 0:12', 'PCK'), 
(50000001, 110, '1/1/2009 0:12', 'PCK'), 
(50000001, 101, '1/1/2009 0:15', 'C'),
(50000001, 101, '1/1/2009 0:15', 'C+'),
(50000001, 110, '1/1/2009 0:15', 'C'),
(50000001, 110, '1/1/2009 0:15', 'C+'),
(50000001, 102, '1/1/2009 0:15', 'C'),
(50000001, 102, '1/1/2009 0:15', 'C+'),
(50000002, 126, '1/1/2009 0:13', 'WO'),
(50000002, 126, '1/1/2009 0:14', 'PCK'), 
(50000002, 126, '1/1/2009 0:14', 'C'),
(50000002, 126, '1/1/2009 0:14', 'S');

This query drops it from two scans to a single scan, gets rid of a hash match, and shifts most of the cost to a sort:

SELECT
  ID, Num = Number,
  OrderOpen    = MIN(CASE WHEN [Status] IN ('WO','PO') THEN [DateTime] END),
  OrderCleared = MAX(CASE WHEN [Status] IN ('S','C+')  THEN [DateTime] END)
FROM #foo
GROUP BY ID, Number;

This PIVOT is also cheaper, but still has the expensive sort:

SELECT * FROM 
(
  SELECT ID, Number, [DateTime], BetterStatus = CASE 
    WHEN [Status] IN ('WO','PO') THEN 'OrderOpen'
    WHEN [Status] IN ('S', 'C+') THEN 'OrderCleared' END
  FROM #foo
) AS f
PIVOT
(
  MAX([Datetime]) FOR BetterStatus IN ([OrderOpen],[OrderCleared])
) AS p;
GO

However, if we add a computed column and an index:

ALTER TABLE #foo ADD BetterStatus AS CONVERT(varchar(12), 
  ISNULL(CASE WHEN [Status] IN ('WO','PO') THEN 'OrderOpen'
       WHEN [Status] IN ('S','C+')  THEN 'OrderCleared' END, '?'));
GO
CREATE INDEX x ON #foo(Number, ID, [Datetime], BetterStatus);
GO

Then these two queries fare much better (both producing ordered scans without any additional sorting required):

SELECT
  ID, Num = Number,
  OrderOpen    = MIN(CASE [BetterStatus] WHEN 'OrderOpen' THEN [DateTime] END),
  OrderCleared = MAX(CASE [BetterStatus] WHEN 'OrderCleared' THEN [DateTime] END)
FROM #foo
GROUP BY ID, Number;

SELECT ID, Num, OrderOpen, OrderCleared
FROM 
(
  SELECT ID, Num = Number, [DateTime], BetterStatus FROM #foo
) AS f
PIVOT
(
  MAX([Datetime]) FOR BetterStatus IN ([OrderOpen],[OrderCleared])
) AS p;

So, if you don't have supporting indexes and/or can't add computed columns or new indexes, use one of the first two queries, and if you can add these items (or modify existing indexes to achieve the same effect), use one of the latter two.

This isn't exhaustive tuning, of course. Just a kickstart.