T-SQL Efficient Looping – Best Practices for Conditional Loops

Tags: sql-server, t-sql

I got a programming task in the area of T-SQL.

Task:

  1. People want to get inside an elevator; every person has a certain
     weight.
  2. The order of the people waiting in line is determined by
     the column turn.
  3. The elevator has a maximum capacity of 1000 lbs.
  4. Return the name of the last person who is able to enter the elevator before it gets too heavy.
  5. The return type should be a table.


Question:
What is the most efficient way to solve this problem? If looping is the right approach, is there any room for improvement?

I used a loop and #temp tables; here is my solution:

use Northwind
go

-- THE SOURCE TABLE "LINE" HAS THE SAME SCHEMA AS #RESULT AND #TEMP
set rowcount 0

declare @sum int = 0
declare @curr int
declare @id int

IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL
    DROP TABLE #temp

IF OBJECT_ID('tempdb..#result', 'U') IS NOT NULL
    DROP TABLE #result

create table #result (
    id int not null,
    [name] varchar(255) not null,
    weight int not null,
    turn int not null
)

create table #temp (
    id int not null,
    [name] varchar(255) not null,
    weight int not null,
    turn int not null
)

INSERT INTO #temp SELECT * FROM line

WHILE EXISTS (SELECT 1 FROM #temp)
BEGIN
    -- Get the next person in line
    SELECT TOP 1 @id = r.id, @curr = r.weight
    FROM #temp r
    ORDER BY turn

    IF (@sum + @curr <= 1000)
    BEGIN
        -- They fit: add them to the result and remove them from the queue
        set @sum = @sum + @curr
        INSERT INTO #result SELECT * FROM #temp WHERE id = @id
        DELETE FROM #temp WHERE id = @id
    END
    ELSE
        BREAK
END

SELECT TOP 1 [name] FROM #result r ORDER BY r.turn DESC

Here is the CREATE script for the table; I used Northwind for testing:

USE [Northwind]
GO

/****** Object:  Table [dbo].[line]    Script Date: 28.05.2018 21:56:18 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[line](
    [id] [int] NOT NULL,
    [name] [varchar](255) NOT NULL,
    [weight] [int] NOT NULL,
    [turn] [int] NOT NULL,
PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
UNIQUE NONCLUSTERED 
(
    [turn] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[line]  WITH CHECK ADD CHECK  (([weight]>(0)))
GO

INSERT INTO [dbo].[line]
    ([id], [name], [weight], [turn])
VALUES
    (5, 'gary', 800, 1),
    (3, 'jo', 350, 2),
    (6, 'thomas', 400, 3),
    (2, 'will', 200, 4),
    (4, 'mark', 175, 5),
    (1, 'james', 100, 6)
;

Best Answer

You should try to avoid loops generally. They are normally less efficient than set based solutions as well as less readable.

The set-based query below should be pretty efficient.

Even more so if the name and weight columns could be INCLUDE-d in the unique index on turn to avoid the key lookups.
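A covering index along those lines might look like the following sketch (the index name is my own invention, and you would likely drop the existing UNIQUE constraint on turn, since this index makes it redundant):

```sql
-- Hypothetical covering index: keyed on turn, carrying name and weight
-- in the leaf pages so the plan needs no key lookups into the clustered index.
CREATE UNIQUE NONCLUSTERED INDEX IX_line_turn_covering
    ON dbo.line (turn)
    INCLUDE ([name], [weight]);
```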

It can scan the unique index in order of turn and calculate the running total of the weight column, then use LEAD with the same ordering criteria to see what the running total in the next row will be.

As soon as it finds the first row where this next running total exceeds 1000 or is NULL (indicating there is no next row), it can stop the scan.

WITH T1
     AS (SELECT *,
                SUM(Weight) OVER (ORDER BY turn ROWS UNBOUNDED PRECEDING) AS cume_weight
         FROM   [dbo].[line]),
     T2
     AS (SELECT LEAD(cume_weight) OVER (ORDER BY turn) AS next_cume_weight,
                *
         FROM   T1)
SELECT TOP 1 name
FROM   T2
WHERE  next_cume_weight > 1000
        OR next_cume_weight IS NULL
ORDER  BY turn 
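To sanity-check the query, here are the intermediate values it computes for the sample data in the question (worked out by hand from the INSERT script):

```
turn  name    weight  cume_weight  next_cume_weight
1     gary    800     800          1150
2     jo      350     1150         1550
3     thomas  400     1550         1750
4     will    200     1750         1925
5     mark    175     1925         2025
6     james   100     2025         NULL
```

Every row here passes the WHERE clause, and TOP 1 ... ORDER BY turn picks the earliest one: gary enters at 800 lbs, and jo would push the total to 1150, so the query returns gary.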

Execution Plan

(image: execution plan)

In practice it seems to read a few rows ahead of what is strictly necessary; it looks like each window spool/stream aggregate pair causes two additional rows to be read.

For the sample data in the question it would ideally only need to read two rows from the index scan, but in reality it reads 6. This is not a significant efficiency issue, and it does not degrade as more rows are added to the table (as in this demo).

For those interested in this issue, an image with the rows output by each operator (as shown by the query_trace_column_values extended event) is below. The rows are output in row_id order, starting at 47 for the first row read by the index scan and finishing at 113 for the TOP.

Click the image below to make it larger, or alternatively see the animated version to make the flow easier to follow.

Pausing the animation at the point where the right-hand stream aggregate has emitted its first row (for gary, turn = 1), it seems apparent that it was waiting to receive its first row with a different WindowCount (for jo, turn = 2), and that the window spool doesn't release the first "jo" row until it has read the next row with a different turn (for thomas, turn = 3).

So the window spool and the stream aggregate each cause an additional row to be read, and there are four of these operators in the plan, hence the 4 additional rows.

(image: rows output by each operator, in row_id order)

An explanation of the columns shown above follows (based on the info here):

  • NodeName: Index Scan, NodeId: 15, ColumnName: id - base table column covered by the index
  • NodeName: Index Scan, NodeId: 15, ColumnName: turn - base table column covered by the index
  • NodeName: Clustered Index Seek, NodeId: 17, ColumnName: weight - base table column retrieved from the lookup
  • NodeName: Clustered Index Seek, NodeId: 17, ColumnName: name - base table column retrieved from the lookup
  • NodeName: Segment, NodeId: 13, ColumnName: Segment1010 - returns 1 at the start of a new group, or NULL otherwise. As there is no PARTITION BY in the SUM, only the first row gets 1.
  • NodeName: Sequence Project, NodeId: 12, ColumnName: RowNumber1009 - row_number() within the group indicated by the Segment1010 flag. As all rows are in the same group, this is ascending integers from 1 to 6. It would be used for filtering frame rows in cases like ROWS BETWEEN 5 PRECEDING AND 2 FOLLOWING (or, as for LEAD, later in the plan).
  • NodeName: Segment, NodeId: 11, ColumnName: Segment1011 - returns 1 at the start of a new group, or NULL otherwise. As there is no PARTITION BY in the SUM, only the first row gets 1 (same as Segment1010).
  • NodeName: Window Spool, NodeId: 10, ColumnName: WindowCount1012 - attribute that groups together rows belonging to a window frame. This window spool is using the "fast track" case for UNBOUNDED PRECEDING, where it emits two rows per source row: one with the cumulative values and one with the detail values. Though there is no visible difference in the rows exposed by query_trace_column_values, I assume the cumulative columns are there in reality.
  • NodeName: Stream Aggregate, NodeId: 9, ColumnName: Expr1004 - COUNT(*) grouped by WindowCount1012 according to the plan, but actually a running count
  • NodeName: Stream Aggregate, NodeId: 9, ColumnName: Expr1005 - SUM(weight) grouped by WindowCount1012 according to the plan, but actually the running sum of weight (i.e. cume_weight)
  • NodeName: Segment, NodeId: 7, ColumnName: Expr1002 - CASE WHEN [Expr1004]=(0) THEN NULL ELSE [Expr1005] END. I don't see how COUNT(*) can be 0, so this will always be the running sum (cume_weight).
  • NodeName: Segment, NodeId: 7, ColumnName: Segment1013 - no PARTITION BY on the LEAD, so the first row gets 1 and all remaining rows get NULL
  • NodeName: Sequence Project, NodeId: 6, ColumnName: RowNumber1006 - row_number() within the group indicated by the Segment1013 flag. As all rows are in the same group, this is ascending integers from 1 to 4.
  • NodeName: Segment, NodeId: 4, ColumnName: BottomRowNumber1008 - RowNumber1006 + 1, as the LEAD requires the single next row
  • NodeName: Segment, NodeId: 4, ColumnName: TopRowNumber1007 - RowNumber1006 + 1, as the LEAD requires the single next row
  • NodeName: Segment, NodeId: 4, ColumnName: Segment1014 - no PARTITION BY on the LEAD, so the first row gets 1 and all remaining rows get NULL
  • NodeName: Window Spool, NodeId: 3, ColumnName: WindowCount1015 - attribute that groups together rows belonging to a window frame using the previous row numbers. The window frame for LEAD has at most 2 rows (the current one and the next one).
  • NodeName: Stream Aggregate, NodeId: 2, ColumnName: Expr1003 - LAST_VALUE([Expr1002]) for the LEAD(cume_weight)