Mysql – Problems splitting and aggregating data in SQL

aggregateMySQLstring-splitting

Hello I have the following data in csv file:

OrderID, OrderType, Amount, Price, Time
      1,       Buy,     10,     1, 10/01/2018 10:30 AM
      2,       Buy,      2,     1, 10/01/2018 11:30 AM
      3,      Sell,      5,     3, 10/01/2018 01:30 PM
      4,      Sell,      3,     6, 11/01/2018 10:30 AM
      5,       Buy,     13,     2, 11/01/2018 11:30 AM
      6,      Sell,      3,     3, 12/01/2018 10:30 AM
      7,      Sell,     11,     2, 13/01/2018 10:30 AM

What I need to do is I need to aggregate the records, so that I have matching amounts for buy and sell.

So I need to split the buy or the sell in order to have matching amounts.
like this:

ver1: (do nto modify buy records, just sell)

for OrderID=1 I have 10 buy but OrderID=3 (first sell) has just 5.
So what I have to do is search next sell (ID 4) 3, now the sum is still under 10, so I go to the next. If the next is taken at full I will have sum over 10, so I need to split OrderID=6 into two lines, one with the remaining to the 10 (2) and the second with the remaining (3-2=1). Then I look into OrderID=2. Amount is 2, from OrderID=6 I have 1 remaining that I need to use + 1 from (ID 7).

ver2: (keep sell orders and split buy)

In this case OrderID=1 has to be split into 5,3,2 (to be used in OrderID= {3,4,6})
Then OrderID=2 has to be split into two for (OrderID={6,7})
And OrderID=5 is ok to be used in full,s o no modification needed…

Is there a way to do all this in normal SQL or do I need some external program to manipulate the data accordingly?

My data is coming in csv, but I can import it in MySQL.
The program that I use to read the csv with sql is QlikView, but I am looking for general solution for general SQL or MySQL.

Best Answer

As I mentioned in comments on your original question, one option is to leave the original records alone, and have a separate table that you use to tie "buy"s to matching "sell"s.

CREATE TABLE reconcile
     ( id INT AUTO_INCREMENT
      ,buy_order_id INT
      ,sell_order_id INT
      ,amount INT);

Then, create a stored procedure something like this:

CREATE PROCEDURE reconcile_buy_sell()
BEGIN
  DECLARE c_buy CURSOR FOR
  SELECT t.OrderID, (t.`Amount` - rec.rec_amount) as `Amount`
    FROM MyTable t
           INNER JOIN (SELECT buy_order_id, SUM(`amount`) as rec_amount
                         FROM reconcile
                        GROUP BY buy_order_id
                      ) rec ON (t.OrderId = rec.buy_order_id)
   WHERE t.OrderType = 'Buy'
     AND t.`Amount` > rec.rec_amount
   ORDER BY t.`Time`
  ;

  DECLARE c_sell CURSOR FOR
  SELECT t.OrderID, (t.`Amount` - rec.rec_amount) as `Amount`
    FROM MyTable t
           INNER JOIN (SELECT sell_order_id, SUM(`amount`) as rec_amount
                         FROM reconcile
                        GROUP BY sell_order_id
                      ) rec ON (t.OrderId = rec.sell_order_id)
   WHERE t.OrderType = 'Sell'
     AND t.`Amount` > rec.rec_amount
   ORDER BY t.`Time`
  ;

  DECLARE buy_id, sell_id, buy_amount, sell_amount INT;
  DECLARE done INT DEFAULT FALSE;

  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN c_buy;
  OPEN c_sell;

  FETCH c_buy INTO buy_id, buy_amount;
  FETCH c_sell INTO sell_id, sell_amount;

  WHILE NOT done
  DO
    IF buy_amount < sell_amount THEN
      INSERT INTO reconcile (buy_order_id, sell_order_id, `amount`)
      VALUES (buy_id, sell_id, buy_amount);
      SET sell_amount = sell_amount - buy_amount;
      SET buy_amount = 0;
    ELSE
      INSERT INTO reconcile (buy_order_id, sell_order_id, `amount`)
      VALUES (buy_id, sell_id, sell_amount);
      SET buy_amount = buy_amount - sell_amount;
      SET sell_amount = 0;
    END IF;

    IF buy_amount = 0 THEN
      FETCH c_buy INTO buy_id, buy_amount;
    END IF;

    IF sell_amount = 0 THEN
      FETCH c_sell INTO sell_id, sell_amount;
    END IF;
  END WHILE;

  CLOSE c_buy;
  CLOSE c_sell;
END;

[Note: code is untested, and I'm not a MySQL expert - treat this as pseudocode, and feel free to correct any errors found.]

The two cursors will iterate through the "buy"s and "sell"s in the order in which they were entered. They only return orders where the full amount of the "buy" or "sell" has not already been reconciled in the reconcile table, and only return the unreconciled portion of the amount for those rows.

In the WHILE loop, we add rows to reconcile. If the amount of the current "buy" is less that the amount of the current "sell", then the full (remaining) amount for the buy goes into thereconcile` row; otherwise (whether the "sell" amount is less that or equal to the "buy" amount), we can use the amount for the sell.

We reduce the remaining amount for the current "sell" (or "buy", whichever is more) by the remaining amount for the "buy" (or "sell") we just used up, and set the "buy" (or "sell") amount to 0. If the amounts are the same, they'll both be 0 now.

Then, for whichever amounts are zero, we pull in the next record from the cursor.

When either cursor is out of records (including the degenerate case, where one or the other cursor returned no records), we stop reconciling.

NOTE: For performance purposes, it may be worth adding a reconciled bit column to MyTable (or whatever you actually call the table your CSV data gets imported to), and marking rows where the full amount has been reconciled as reconciled = TRUE. This could speed up locating the rows that aren't reconciled.