Sql-server – Performance tuning for gap analysis of table

gaps-and-islandssql serversql-server-2005

I have a table which stores the sequence (counter) of data received from devices out in the field. At any rate, these sequences need to be in order within a configurable time span, but can come into the system out of order. If a device is reset, then it's sequence number is set back to 0.

CREATE TABLE [dbo].[TapGapDetail]
(
[Id] [bigint] NOT NULL, --FK to another table
[DeviceESN] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[TapDateUTC] [datetime] NOT NULL, --date event occurred on device
[CreatedDateUTC] [datetime] NOT NULL,
[Counter] [int] NOT NULL,
)

CREATE CLUSTERED INDEX [CX_TapGapDetail] ON [dbo].[TapGapDetail] ([DeviceESN], [CreatedDateUTC], [Counter]) ON [PRIMARY]
GO

I've tried to switch the order the CX by DeviceESN and CreatedDateUTC, but on my system it doesn't seem to make much of a difference in the IO. There are millions of rows in this table.

Example insert would be:

INSERT INTO TapGapDetail (1, 'A', '1/1/2012 01:00AM', '1/1/2012 01:10AM', 5)
INSERT INTO TapGapDetail (2, 'A', '1/1/2012 12:05AM', '1/1/2012 01:15AM', 4) --out of order by insert date
INSERT INTO TapGapDetail (3, 'A', '1/2/2012 12:00AM', '1/2/2012 12:05AM', 6) --back in order
INSERT INTO TapGapDetail (4, 'A', '1/3/2012 01:00AM', '1/3/2012 01:05AM', 8) --missing 7
INSERT INTO TapGapDetail (5, 'A', '1/3/2012 06:00AM', '1/3/2012 06:05AM', 9) --in order outside 'tolerance'
INSERT INTO TapGapDetail (6, 'A', '1/10/2012 06:00AM', '1/10/2012 06:05AM', 0) --device reset

The report needs to report that a gap has occurred once for this data. There is also a detail, but one step at a time. I want to get it to run in less than 10 seconds, but seems to take at least 1.25 minutes on my local. I have stat IO if needed. So, the proc I created to get the "gaps" is like so:

CREATE PROC [dbo].[GetValidatorTapGapSummary]
    @Validator varchar(100) = ''
    ,@FilterDateUtc datetime
    ,@ToleranceHours int
AS

--temp table
SELECT 
RowID = ROW_NUMBER() OVER (ORDER BY DeviceESN, CreatedDateUTC, [Counter]),  
DeviceESN, 
TapDateUTC,
CreatedDateUTC,
[Counter]
INTO #taps
FROM TapGapDetail
WHERE CreatedDateUTC >= @FilterDateUtc
Order By 1

--cx
CREATE CLUSTERED INDEX CX1 ON #taps(RowID)

--results
select  
    t.DeviceESN as Validator
    ,sum(
        case 
            --They are in sequence
            when t2.[Counter] = t.[Counter]+ 1 then 0 
            --A reset has occured
            when t2.[Counter] < t.[Counter] then 0 
            --A gap exists. Find the difference
            else (t2.[Counter] - t.[Counter] - 1) 
        end) 
        as TapGaps
    ,case 
        --gets the last tap date per validator
        when MAX(t.TapDateUTC) > MAX(t2.TapDateUTC) THEN MAX(T.TapDateUTC) 
        ELSE MAX(t2.TapDateUTC) 
    END 
    AS MaxTapDate
    ,case 
        --gets the last tap date per validator
        when MAX(t.CreatedDateUTC) > MAX(t2.CreatedDateUTC) THEN MAX(T.CreatedDateUTC) 
        ELSE MAX(t2.CreatedDateUTC) 
    END 
    AS MaxCreatedDate
from #Taps t 
    inner join #Taps t2 on t.DeviceESN = t2.DeviceESN and t2.RowID = t.RowID + 1
where t2.[Counter] != t.[Counter] + 1
 and t2.[Counter] > t.[Counter]
 And t.CreatedDateUTC >= @FilterDateUtc
 And t2.CreatedDateUTC >= @FilterDateUtc
 And (t.DeviceESN = @Validator Or @Validator = '')
 And Not Exists --edge case for when there is a gap at the end, tried with left join and stats are the same, so this is easier to read I think.
        (Select Top 1 Null From TapGapDetail tgd Where tgd.DeviceEsn = t.DeviceESN And (t.Counter + 1) = tgd.Counter 
            And tgd.CreatedDateUTC >= DateAdd(day, -1 * @ToleranceHours, t.CreatedDateUTC)
            And tgd.CreatedDateUTC <= DateAdd(day, @ToleranceHours, t.CreatedDateUTC))
group by t.DeviceESN
Order By MaxCreatedDate Desc, t.DeviceEsn

GO

An example STATIO would be:

Table 'TapGapDetail'. Scan count 5, logical reads 24060, physical reads 400, read-ahead reads 4590, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(2763981 row(s) affected)

(1 row(s) affected)
Table '#taps_______________________________________________________________________________________________________________000000000051'. Scan count 1, logical reads 18246, physical reads 0, read-ahead reads 251, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(154 row(s) affected)
Table '#taps_______________________________________________________________________________________________________________000000000051'. Scan count 10, logical reads 37834, physical reads 0, read-ahead reads 1240, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TapGapDetail'. Scan count 5, logical reads 13563, physical reads 3, read-ahead reads 13445, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

I've tried various indexes and what not on the temp table, but I suspect that even just filling is taking too long. Any suggestions are appreciated.

Best Answer

Use a different method.

For a start, don't populate a temporary table with 2.7M rows - that's not going to want to return in under ten seconds. You could use a CTE instead, and that might work much better.

WITH taps as (
SELECT 
RowID = ROW_NUMBER() OVER (ORDER BY DeviceESN, CreatedDateUTC, [Counter]),  
DeviceESN, 
TapDateUTC,
CreatedDateUTC,
[Counter]
FROM TapGapDetail
WHERE CreatedDateUTC >= @FilterDateUtc
)

--results
select  
    t.DeviceESN as Validator
    ,sum(
        case 
            --They are in sequence
            when t2.[Counter] = t.[Counter]+ 1 then 0 
            --A reset has occured
            when t2.[Counter] < t.[Counter] then 0 
            --A gap exists. Find the difference
            else (t2.[Counter] - t.[Counter] - 1) 
        end) 
        as TapGaps
    ,case 
        --gets the last tap date per validator
        when MAX(t.TapDateUTC) > MAX(t2.TapDateUTC) THEN MAX(T.TapDateUTC) 
        ELSE MAX(t2.TapDateUTC) 
    END 
    AS MaxTapDate
    ,case 
        --gets the last tap date per validator
        when MAX(t.CreatedDateUTC) > MAX(t2.CreatedDateUTC) THEN MAX(T.CreatedDateUTC) 
        ELSE MAX(t2.CreatedDateUTC) 
    END 
    AS MaxCreatedDate
from taps t 
    inner join taps t2 on t.DeviceESN = t2.DeviceESN and t2.RowID = t.RowID + 1
where t2.[Counter] != t.[Counter] + 1
 and t2.[Counter] > t.[Counter]
 And t.CreatedDateUTC >= @FilterDateUtc
 And t2.CreatedDateUTC >= @FilterDateUtc
 And (t.DeviceESN = @Validator Or @Validator = '')
 And Not Exists --edge case for when there is a gap at the end, tried with left join and stats are the same, so this is easier to read I think.
        (Select Top 1 Null From TapGapDetail tgd Where tgd.DeviceEsn = t.DeviceESN And (t.Counter + 1) = tgd.Counter 
            And tgd.CreatedDateUTC >= DateAdd(day, -1 * @ToleranceHours, t.CreatedDateUTC)
            And tgd.CreatedDateUTC <= DateAdd(day, @ToleranceHours, t.CreatedDateUTC))
group by t.DeviceESN
Order By MaxCreatedDate Desc, t.DeviceEsn

I'm not saying this is necessarily terrific (the "NOT EXISTS" bit could be a pain), but it's almost certainly an improvement.