Mysql – Update row if it has a certain set of values, add new row if it returns to initial value

MySQL

I'm having some trouble importing data into a table that I'm not sure how to describe succintly enough that can I can properly search for it (not that I haven't tried).

I have a table with three columns (actually more like 150, but these are the ones that matter):

order_number, date, status

that I update with some CSV data I don't control.

Status can have 5 states: 'pending', 'started', 'cancelled', 'not done' and 'complete'.

Each order will start 'pending', then progress to 'started', and finally end at 'cancelled', 'not done' or 'complete'.

If they end at 'cancelled' or 'complete', there's no problem, I just keep updating the status for the day and that's it.

The problem is when the status ends at 'not done'.
Usually when that happens, there's a new attempt only the next day, so I can just use the date to differentiate between attempts, but frequently enough to be a problem, a new attempt is made on the same day, and I need to keep both the first attempt, and the new one (that can still fail).

There's no field meant specifically for differentiating the attempts, nor changes to the order number, the time fields I could possibly try to use are populated with estimated times until the attempt is finished, and the only reason I know there was more than one attempt is because I get two rows for the same order on the same day.

Currently I'm loading the data into a temporary table, then using "INSERT…ON DUPLICATE KEY UPDATE" with a unique key on order_number and date to load it into its proper place, but that's causing me some loss of data.

I've tried adding a calculated field with value based on the final status to the index, but then I got at least two rows for each order: one for the final status, and one for 'started'.

I'm looking for some ideas for how I can solve it without processing the data outside the database before insert, since it would take awhile on the limited hardware available at the moment.

Best Answer

MySQL is rusty so bear with me

You are on the right path using a staging table to import the CSV file, but you going to need to break the Inserts and the Update into separate steps

Not stated but i'm going to assume the import may have specific Order show up for each status changed so the order may show up 5 or more times from the Import and that the CSV file is created serially meaning PENDING, STARTED, NOT DONE, (CANCELED, COMPLETE) would appear in that order never in a mixed order. Cancel Showing before Pending or Start. If that is the case order the Update may need to fiddled with

Create ImportTable (Serial_ID INT NOT NULL AUTO_INCREMENT,
"columns from the CSV" )

Created an ID field so the order of the CSV is maintained by the import.

Import the CSV. Not sure this is the correct MySQL code to Import and set a column to a different value

LOAD DATA INFILE 'import.csv'
INTO TABLE ImportTable
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, "lis of columns ")
SET id =nextval('Serial_ID)  ;

Second step get all the new orders we are going to ignore the status for the first import

Insert into OrderTable 
(Select Distinct "columns to insert", 'importing' as OrderStatus 
 from ImportTable where Order_id not in 
   (Select Order_id from OrderTable ) 
)

Now have all the orders in the system lets just process the status changes serially Don't need the CASE statement for Order_Status unless the CSV order is all random..

DELIMITER $$
CREATE FUNCTION process_import_status() returns bool
BEGIN
DECLARE @(list of vars for COLUMNS to Update)
DECLARE updates_cursor CURSOR FOR
 SELECT "columns to update" 
  CASE Order_status
    WHEN 'pending' THEN 1
    WHEN 'started' THEN 2
    WHEN 'not done' THEN 3
    WHEN 'cancelled' THEN 4
    WHEN 'complete' THEN 5 
 END as 'sort_order
 FROM ImportTable order by sort_order, Serial_ID, date;

updates_loop  Loop:
  FETCH  updates_cursor INTO @(list of vars for COLUMNS to Update);
  update OrderTable SET "list of columns"= @(LIST of COLUMNS) 
     where Order_id = @order_id ;
END LOOP  updates_loop;
END $$$
DELIMITER ;

This should get you close to what you need to do..