Sql-server – Improve a query using cursor to SET Based approach

cursorsoptimizationperformancequery-performancesql serversql-server-2012

I am still new to Query optimisation, and I have a Stored Procedure which uses a Cursor to go through each row within the Table, and performs the following operations:

  1. Calculate the Time Difference between each Row
  2. Calculate Distance Between each Row
  3. If Distance < 5 AND TimeDifference > 3 minutes THEN ADD to TEMP TABLE

I tried converting this Cursor to a WHILE Loop but performance decreased.
So I need help converting this into a SET BASED approach instead of a Procedural Based Approach

So the Cursor performs this Logic:

-- READ Current Row into Cursor Variables
FETCH NEXT FROM crAssetIgnitionOnOff INTO
              @current_iVehicleMonitoringID
            , @current_iAssetID
            , @current_dtUTCDateTime
            , @current_sptGeoLocationPoint
            , @current_fLatitude
            , @current_fLongitude
            , @current_fAngle
            , @current_fSpeedKPH 
            , @current_sIgnitionStatus
            , @current_eEventCode
            , @current_sEventCode

IF(@current_iAssetID = @prev_iAssetID)
    BEGIN

        ---- Calculate Time Difference from previous Point
    DECLARE @diffInSeconds INT
        SET @diffInSeconds = DATEDIFF(SECOND, @prev_dtUTCDateTime, @current_dtUTCDateTime)

    DECLARE @diffInMinutes INT
        SET @diffInMinutes = @diffInSeconds / 60

        -- Calcualte the Distance from previous position
        DECLARE @tempDistance FLOAT;
            SELECT @tempDistance = @current_sptGeoLocaitonPoint.STDistance(@prev_sptGeoLocaitonPoint);

        -- Check if distance travelled less than 5, AND Time difference between points greater than user selected Idle Minutes (@iIdleMinutes) AND prev ignition status = On
        IF(@diffInSeconds > @iIdleMinutes AND @tempDistance < 5 AND @prev_sIgnitionStatus = 'On')   
            BEGIN

                DECLARE @sTime VARCHAR(30)
                    SELECT @sTime = dbo.xPT_ConvertTimeToDDHHMMSS(@diffInSeconds,'s')

                INSERT INTO @tblExcessiveIdleTime(
                                    AssetID,
                                    PreviousDate,
                                    CurrentDate,
                                    TimeString,
                                    TimeInSeconds
                            )
                            VALUES
                            (
                                @current_iAssetId,
                                @prev_dtUTCDateTime,
                                @current_dtUTCDateTime,
                                @sTime,
                                @diffInSeconds
                            )

            END 
    END


    -- Set Previous Values End of Loop
    SET  @prev_iVehicleMonitoringID = @current_iVehicleMonitoringID
    SET  @prev_iAssetID = @current_iAssetID
    SET  @prev_dtUTCDateTime = @current_dtUTCDateTime
    SET  @prev_sptGeoLocationPoint = @current_sptGeoLocationPoint
    SET  @prev_fLatitude = @current_fLatitude
    SET  @prev_fLongitude = @current_fLongitude
    SET  @prev_fAngle = @current_fAngle
    SET  @prev_fSpeedKPH = @current_fSpeedKPH
    SET  @prev_sIgnitionStatus = @current_sIgnitionStatus
    SET  @prev_eEventCode = @current_eEventCode
    SET  @prev_sEventCode = @current_sEventCode
END

Now this takes 17 minutes to Execute in some cases, so I tried converting it to a WHILE Loop – (http://www.sqlbook.com/SQL/Avoiding-using-SQL-Cursors-20.aspx)

Which was not a good idea – as it's performance with Number of logical reads was 4 times that of the Cursor. And it took longer to process:

WHILE @RowCount <= @NumberRecords
    BEGIN

        -- Check for First Row
        IF @RowCount = 1
            BEGIN
                -- Set First Row as Previous
                SELECT @previous_iAssetID = iAssetID, @previous_sptGeoLocaitonPoint = sptGeoLocaitonPoint, @previous_dtUTCDateTime = dtUTCDateTime, @previous_sIgnitionStatus = sIgnitionStatus
                FROM #tblVehicleMonitoringLog WHERE RowID = @RowCount
            END
        ELSE
            BEGIN

                /* Select current Row */
                SELECT @current_iAssetID = iAssetID, @current_sptGeoLocaitonPoint = sptGeoLocaitonPoint, @current_dtUTCDateTime = dtUTCDateTime, @current_sIgnitionStatus = sIgnitionStatus
                FROM #tblVehicleMonitoringLog WHERE RowID = @RowCount


                /******** IMPLEMENT REPORT LOGIC **********/

                IF(@current_iAssetID = @previous_iAssetID)
                    BEGIN

                        ---- Calculate Time Difference from previous Point
                    DECLARE @diffInSeconds INT
                        SET @diffInSeconds = DATEDIFF(SECOND, @previous_dtUTCDateTime, @current_dtUTCDateTime)

                    DECLARE @diffInMinutes INT
                        SET @diffInMinutes = @diffInSeconds / 60

                        -- Calcualte the Distance from previous position
                        DECLARE @tempDistance FLOAT;
                            SELECT @tempDistance = @current_sptGeoLocaitonPoint.STDistance(@previous_sptGeoLocaitonPoint);

                        -- Check if distance travelled less than 5, AND Time difference between points greater than user selected Idle Minutes (@iIdleMinutes) AND prev ignition status = On
                        IF(@diffInSeconds > @iIdleMinutes AND @tempDistance < 5 AND @previous_sIgnitionStatus = 'On')   
                            BEGIN

                                DECLARE @sTime VARCHAR(30)
                                    SELECT @sTime = dbo.xPT_ConvertTimeToDDHHMMSS(@diffInSeconds,'s')

                                INSERT INTO @tblExcessiveIdleTime(
                                                    iAssetID,
                                                    dtIgnitionOn,
                                                    dtNextPeriodic,
                                                    sTime,
                                                    iTimeDurationInSeconds
                                            )
                                            VALUES
                                            (
                                                @current_iAssetId,
                                                DATEADD(hour, @fGmtOffSet, @previous_dtUTCDateTime),
                                                DATEADD(hour, @fGmtOffSet, @current_dtUTCDateTime),
                                                @sTime,
                                                @diffInSeconds
                                            )

                            END 
                    END


                -- Set Previous Values End of Loop
                SET @previous_iAssetID = @current_iAssetID;
                SET @previous_sptGeoLocaitonPoint = @current_sptGeoLocaitonPoint;
                SET @previous_dtUTCDateTime = @current_dtUTCDateTime;
                SET @previous_sIgnitionStatus = @current_sIgnitionStatus;

            END



        -- increment Row Number
        SET @RowCount = @RowCount + 1


    END -- END OF WHILE LOOP

So after looking online AGAIN – I found we can calculate time difference between two Rows. (https://stackoverflow.com/questions/2357515/calculate-time-difference-between-two-rows)

This is what the RAW Data looks like.

This is what the Raw data looks like

I need to calculate the Time Difference and the Distance Between Each Row WHERE

  1. Current iAssetID = Previous iAssetID
  2. Previous sDigitalInputValue = '10000000'

This is the Query I came up with:

WITH    rows AS
        (
            SELECT  *, ROW_NUMBER() OVER (ORDER BY dtUTCDateTime) AS rn
                FROM    VehicleMonitoringLog
                Where dtUTCDateTime > GetDate() - 1
                --Order by iAssetId, dtUTCDateTime
        )


SELECT mc.iVehicleMonitoringId as CurrentID, mp.iVehicleMonitoringId as PreviousID, 
        mc.iAssetId as CurrentAsset, mp.iAssetId As PreviousAsset, mc.dtUTCDateTime as CurrentTime, mp.dtUTCDateTime as PreviousTime,
 DATEDIFF(second, mc.dtUTCDateTime, mp.dtUTCDateTime) AS DateDiffSeconds
FROM    rows mc
JOIN    rows mp
ON      mc.rn = mp.rn - 1

EDIT

My Query which is working now – Please let me know if you see any performance issues with this:

SELECT dt.CurrentAsset,
        dt.Distance,
        dt.DateDiffSeconds,
        dt.CurrentIgnition,
        dt.PreviousIgnition,
        ta.sReference,
        ta.sCategoryName,
        ta.sSiteName,
    dbo.xPT_ConvertTimeToDDHHMMSS(DateDiffSeconds,'s')
FROM (
    SELECT iVehicleMonitoringId AS CurrentID,
       LEAD(iVehicleMonitoringId, 1) OVER (PARTITION BY iAssetID ORDER BY dtUTCDateTime) AS PreviousID, 
       iAssetId AS CurrentAsset,
       LEAD(iAssetId, 1) OVER (PARTITION BY iAssetID ORDER BY dtUTCDateTime) AS PreviousAsset,
       sDigitalInputValue AS CurrentIgnition,
       LEAD(sDigitalInputValue, 1) OVER (PARTITION BY iAssetID ORDER BY dtUTCDateTime) AS PreviousIgnition,
       dtUTCDateTime AS CurrentTime,
       LEAD(dtUTCDateTime, 1) OVER (PARTITION BY iAssetID ORDER BY dtUTCDateTime) AS PreviousTime,
       DATEDIFF(second, dtUTCDateTime, LEAD(dtUTCDateTime, 1) OVER (PARTITION BY iAssetID ORDER BY dtUTCDateTime)) AS DateDiffSeconds,
       sptGeoLocaitonPoint.STDistance(LEAD(sptGeoLocaitonPoint, 1) OVER (PARTITION BY iAssetID ORDER BY dtUTCDateTime)) AS Distance

        FROM VehicleMonitoringLog
        WHERE dtUTCDateTime > @utcStartDate AND dtUTCDateTime < @utcEndDate

) AS dt
Inner join #tblAssets ta on ta.iAssetID = dt.CurrentAsset
WHERE  CurrentIgnition = '10000000' AND Distance < 5 AND DateDiffSeconds > @iIdleMinutes

Best Answer

Your CTE-based approach with window functions is a very good start. There's another, even more suitable window function that you could use: LAG().

Here's how:

SELECT iVehicleMonitoringId AS CurrentID,
       LAG(iVehicleMonitoringId, 1) OVER (ORDER BY dtUTCDateTime) AS PreviousID, 
       iAssetId AS CurrentAsset,
       LAG(iAssetId, 1) OVER (ORDER BY dtUTCDateTime) AS PreviousAsset,
       dtUTCDateTime AS CurrentTime,
       LAG(dtUTCDateTime, 1) OVER (ORDER BY dtUTCDateTime) AS PreviousTime,
       DATEDIFF(second,
           dtUTCDateTime,
           LAG(dtUTCDateTime, 1) OVER (ORDER BY dtUTCDateTime)
           ) AS DateDiffSeconds
FROM VehicleMonitoringLog
WHERE dtUTCDateTime > DATEADD(day, -1, SYSDATETIME());

Basically, LAG(column, n) OVER (ORDER BY x) returns the value of column, n row(s) back (so n=1 returns the previous row), ordered by x.

Your CTE solution will scan VehicleMonitoringLog twice, then join the two streams. This query will perform only a single scan, which is much, much more efficient. The LAG() function (and its cousin LEAD()) is available as of SQL Server 2012.

Partitioning

It looks like you've forgotten some type of partitioning term (iAssetID, pehaps?). The partitioning term serves to separate data points between different vehicles, in case two vehicles are out-and-about at the same time. Add this partition term to the query by changing the OVER() clause from OVER (ORDER BY dtUTCDateTime) to OVER (PARTITION BY iAssetID ORDER BY dtUTCDateTime).

Indexing

To make this solution really fly, I would create the following index on VehicleMonitoringLog:

CREATE INDEX... (iAssetID, dtUTCDateTime);   --- if you're using PARTITION BY

.. or

CREATE INDEX... (dtUTCDateTime);             --- without PARTITION BY