Hello I have the following data in csv file:
OrderID, OrderType, Amount, Price, Time
1, Buy, 10, 1, 10/01/2018 10:30 AM
2, Buy, 2, 1, 10/01/2018 11:30 AM
3, Sell, 5, 3, 10/01/2018 01:30 PM
4, Sell, 3, 6, 11/01/2018 10:30 AM
5, Buy, 13, 2, 11/01/2018 11:30 AM
6, Sell, 3, 3, 12/01/2018 10:30 AM
7, Sell, 11, 2, 13/01/2018 10:30 AM
What I need to do is I need to aggregate the records, so that I have matching amounts for buy and sell.
So I need to split the buy or the sell in order to have matching amounts.
like this:
ver1: (do nto modify buy records, just sell)
for OrderID=1 I have 10 buy but OrderID=3 (first sell) has just 5.
So what I have to do is search next sell (ID 4) 3, now the sum is still under 10, so I go to the next. If the next is taken at full I will have sum over 10, so I need to split OrderID=6 into two lines, one with the remaining to the 10 (2) and the second with the remaining (3-2=1). Then I look into OrderID=2. Amount is 2, from OrderID=6 I have 1 remaining that I need to use + 1 from (ID 7).
ver2: (keep sell orders and split buy)
In this case OrderID=1 has to be split into 5,3,2 (to be used in OrderID= {3,4,6})
Then OrderID=2 has to be split into two for (OrderID={6,7})
And OrderID=5 is ok to be used in full,s o no modification needed…
Is there a way to do all this in normal SQL or do I need some external program to manipulate the data accordingly?
My data is coming in csv, but I can import it in MySQL.
The program that I use to read the csv with sql is QlikView, but I am looking for general solution for general SQL or MySQL.
Best Answer
As I mentioned in comments on your original question, one option is to leave the original records alone, and have a separate table that you use to tie "buy"s to matching "sell"s.
Then, create a stored procedure something like this:
[Note: code is untested, and I'm not a MySQL expert - treat this as pseudocode, and feel free to correct any errors found.]
The two cursors will iterate through the "buy"s and "sell"s in the order in which they were entered. They only return orders where the full amount of the "buy" or "sell" has not already been reconciled in the
reconcile
table, and only return the unreconciled portion of the amount for those rows.In the
WHILE
loop, we add rows toreconcile
. If the amount of the current "buy" is less that the amount of the current "sell", then the full (remaining) amount for the buy goes into the
reconcile` row; otherwise (whether the "sell" amount is less that or equal to the "buy" amount), we can use the amount for the sell.We reduce the remaining amount for the current "sell" (or "buy", whichever is more) by the remaining amount for the "buy" (or "sell") we just used up, and set the "buy" (or "sell") amount to 0. If the amounts are the same, they'll both be 0 now.
Then, for whichever amounts are zero, we pull in the next record from the cursor.
When either cursor is out of records (including the degenerate case, where one or the other cursor returned no records), we stop reconciling.
NOTE: For performance purposes, it may be worth adding a
reconciled
bit column toMyTable
(or whatever you actually call the table your CSV data gets imported to), and marking rows where the full amount has been reconciled asreconciled = TRUE
. This could speed up locating the rows that aren't reconciled.