Sql-server – Slow While Loop, Query Improvment Assistance

sql serversql-server-2016t-sql

I am working on creating a Datawarehouse.
I have created a Time Dimension (Dim_Time), at 5-minute intervals. Hour Aggregations will have [Minutes] = NULL.
For the purpose of this example:

CREATE TABLE [dbo].[Dim_Time](
    [TimeID] [int] IDENTITY(1,1) NOT NULL,
    [StartDateTime] [datetime] NULL,
    [Hour] [int] NULL,
    [Minute] [int] NULL,
  CONSTRAINT [PK_Dim_Time] PRIMARY KEY CLUSTERED 
  ([TimeID] ASC)
  ) ON [PRIMARY]
GO

Then I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.

CREATE TABLE [dbo].[Stg_IncomingQueue](
    [IncomingID] [int] IDENTITY(1,1) NOT NULL,
    [CustomerID] [int] NOT NULL,
    [TimeID] [int] NULL,
    [InsertTime] [datetime] NULL,
 CONSTRAINT [PK_IncomingQueueMonitor] PRIMARY KEY CLUSTERED 
([IncomingID] ASC)
) ON [PRIMARY]
GO

I then have the following While loop. It's purpose is to get the correct 5-Minute time slot (TimeID) that relates to a particular incoming row:

WHILE 0 < (SELECT COUNT(*) FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL)
BEGIN

    SELECT TOP 1 @IncomingID = IncomingID, @RowInserTime = InsertTime 
    FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL


    ;WITH DimTime
    AS (
        SELECT MAX(TimeID) AS MaxTimeID FROM [dba_local].[dbo].[Dim_Time]
        WHERE StartDateTime < @RowInserTime AND [Minute] IS NOT NULL
    )
    UPDATE [dba_local].[dbo].[Stg_IncomingQueue]
    SET TimeID = (SELECT MaxTimeID FROM DimTime)
    WHERE IncomingID = @IncomingID

END

It's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less then the rows InsertTime.
Because time is the only relationship, I am struggling with all options to do this in 1 query without the loop, but I feel it is possible

Please can someone help me out here either with a better option or confirming that this is the simplest way.

Thank you very much for your time and assistance.
Wade

Best Answer

I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.

IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;

IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;

CREATE TABLE dbo.Dim_Time(
    TimeID int IDENTITY(1,1) NOT NULL,
    StartDateTime time(0) NULL,
  CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED 
  (TimeID ASC)
  ) ON [PRIMARY]
GO

;WITH src AS
(
    SELECT TOP (10) sv.number
    FROM master.dbo.spt_values sv
    WHERE sv.type = N'P'
    ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
    CROSS JOIN src s2
    CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number

CREATE TABLE dbo.Stg_IncomingQueue(
    IncomingID int IDENTITY(1,1) NOT NULL,
    CustomerID int NOT NULL,
    TimeID int NULL,
    InsertTime datetime NULL,
 CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED 
(IncomingID ASC)
) ON [PRIMARY]
GO

INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
    , (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
    , (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
    , (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));

This piece replaces your entire WHILE loop, with a single UPDATE statement, which is both more efficient, and easier to understand.

UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
    INNER JOIN (
        SELECT dt.TimeID
            , dt.StartDateTime
            , EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
        FROM dbo.Dim_Time dt 
        ) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;

The results, compared side-by-side with the Dim_Time table:

SELECT *
FROM dbo.Stg_IncomingQueue iq
    INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;

The output looks like:

╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║       InsertTime        ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║          1 ║          1 ║    271 ║ 1875-06-30 22:31:49.000 ║    271 ║ 22:30:00      ║
║          2 ║          2 ║    116 ║ 1857-07-01 09:38:59.000 ║    116 ║ 09:35:00      ║
║          3 ║          3 ║    218 ║ 1854-09-18 18:08:39.000 ║    218 ║ 18:05:00      ║
║          4 ║          4 ║    221 ║ 1860-05-31 18:22:25.000 ║    221 ║ 18:20:00      ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝

Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT() to convert the incoming datetime column into a time(0) value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:

Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.

If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue to include a persisted computed column, as in:

CREATE TABLE dbo.Stg_IncomingQueue(
    IncomingID int IDENTITY(1,1) NOT NULL,
    CustomerID int NOT NULL,
    TimeID int NULL,
    InsertTime datetime NULL,
    InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
 CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED 
(IncomingID ASC)
) ON [PRIMARY]
GO

The update statement then becomes:

UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
    INNER JOIN (
        SELECT dt.TimeID
            , dt.StartDateTime
            , EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
        FROM dbo.Dim_Time dt 
        ) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;