Sql-server – Iterating through a record set and selecting certain records

sql serversql-server-2016

I am tasked with automating our item return process at work. Our policy for returns is to take the average price of the most recently purchased items desired to be returned.

We are effectively calculating a weighted average unit price of the most recently purchased items, excluding items from the average price once we have hit the desired return quantity.

I currently have a table in Microsoft SQL Server that contains the purchase history of all items ($, units, and purchase date) that have been asked to be returned, and the desired quantity to be returned.

enter image description here

I'll add a specific example; 18 units of Item number 138799528 are to be returned (NOTE that the table is already ordered by ItemNumber, CreatedDateTime descending). In order to determine the return price, we need the query to start with the most recently purchased item (line 137) and select records until the sum of the POQuantity is equal to or greater than the desired return quantity. In this case, the proper item selection would be rows 137,138 and 139, while excluding the rest of the purchases of Item 138799528. The query continues until it has iterated through all items in the record set.

I am attempting to iterate over this table and select the appropriate records into a another table with which to calculate the average. I have attempted using a cursor but my DBA has insisted to avoid cursors at all costs. Below is the cursor logic:

    DECLARE ReturnCursor CURSOR 


        FOR SELECT 
                T.ItemNumber
                ,-(T.ReturnQuantity)
                ,T.POQuantity
                ,T.PurchasePrice

            FROM #tableforcalc T
            ORDER BY T.ItemNumber, T.CreatedDateTime DESC           --Added order by so following logic works when looping through records

    OPEN ReturnCursor

        SET @Counter=0
        SET @Item2=''           --Needed to set Item2 = ''  



        FETCH NEXT FROM ReturnCursor
            INTO @Item, @ReturnQty, @POQty, @PurchPrice

    WHILE @@FETCH_STATUS=0
    IF @Item2!=@Item 
            BEGIN               

                SET @Counter+=@POQty ;
                SET @Item2=@Item;

                SELECT

                 @Item AS ID
                ,@POQty AS qty 
                ,@PurchPrice AS price
                ,@ReturnQty AS RQ

                --INTO #Temp1               --  Put selected variables into a temp table

            END


        ELSE                --this else condition is essentially  IF @Item2=@Item

            BEGIN

                IF (@Counter<@ReturnQty AND @Counter!=0)

                    BEGIN

                        SET @Counter+= @POQty;
                        SELECT 
                            @Item AS ID
                           ,@POQty AS qty
                           ,@PurchPrice AS price
                           ,@ReturnQty AS RQ

                        --INTO #Temp2        
                    END

                ELSE

                    BEGIN

                        SET @Counter=0;

                    END     

I was wondering if there is a way to solve this problem using set operations in SQL? I am open to all suggestions.

Best Answer

I'm going to assume that the screenshot in the question is the result of a join and that you have separate tables for returns to be processed and your order history. Even if your data model doesn't match this exactly I imagine that it's not difficult to get it into a similar structure.

CREATE TABLE #ActiveReturns (
        ItemNumber BIGINT,
        ReturnQuantity NUMERIC(19,2)
    );

INSERT INTO #ActiveReturns
VALUES
(138799528, -18);

CREATE TABLE #OrderHistory (
    ItemNumber BIGINT,
    POQuantity NUMERIC(19,2),
    PurchasePrice NUMERIC(19,2),
    CreatedDateTime DATETIME
);

INSERT INTO #OrderHistory
VALUES
(138799528, 4, 518.4, '20171102'),
(138799528, 8, 518.4, '20170606'),
(138799528, 12, 480, '20170126'),
(138799528, 8, 480, '20161115'),
(138799528, 8, 480, '20161114');

CREATE INDEX IXOH ON #OrderHistory (ItemNumber, CreatedDateTime)
INCLUDE (POQuantity, PurchasePrice);

I loaded in some sample data but didn't want to type everything. For future questions you should consider defining your tables and providing code to load in sample data. You'll get better and faster answers if you don't require the people answering to look at images and type a bunch of stuff.

The two set-based solutions that come to mind are using APPLY or using window functions. I imagine that your returns table is much, much smaller than your order history table and your question hints that you have supporting indexing in place so I like the APPLY approach better as a first attempt.

The algorithm goes something like this:

  • Start with a row from your returns table.
  • Get the oldest relevant row from the order history table.
  • Split that row into POQuantity rows. If you have at least ReturnQuantity rows then stop. Otherwise continue to get and split rows into you have enough.
  • Only keep the first ReturnQuantity rows.
  • Calculate the average of PurchasePrice.
  • Get the next row from returns and process all rows in this way.

You can accomplish that in SQL with a join to a numbers table and the TOP operator. Here's a quick numbers table:

CREATE TABLE #numbers (num BIGINT NOT NULL, PRIMARY KEY (num));

INSERT INTO #numbers
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
OPTION (MAXDOP 1);

Here's the query I wrote to implement the algorithm:

SELECT ItemNumber, wa.weighted_average
FROM #ActiveReturns ar
CROSS APPLY (
    SELECT SUM(t.PurchasePrice) / ABS(ar.ReturnQuantity)
    FROM (
        SELECT TOP (CAST(ABS(ReturnQuantity) AS BIGINT)) oh.PurchasePrice
        FROM #OrderHistory oh
        INNER JOIN #numbers n ON oh.POQuantity >= n.num
        WHERE oh.ItemNumber = ar.ItemNumber
        ORDER BY oh.CreatedDateTime DESC
    ) t
) wa (weighted_average);

I get a weighted average of 505.60 which matches your expected results.

The query plan avoids processing unnecessary rows. As you can see in the query plan below, only three rows are read form the order history table and only 18 rows are read from the numbers table.

enter image description here