Mysql – Replace value based on previous row / next row depending on conditions

MySQL

I have a price table, contain 2.9 million rows of price figures belong to a total of 941 of company_id for three months. 0.00 represents missing data. As an example:

   +------------+---------------------+------------+
   | company_id | date_time           | price_open |
   +------------+---------------------+------------+
   |      4     | 13/12/2014 16:23:00 |    56.30   |
   |      4     | 13/12/2014 16:24:00 |    56.25   |
   |      4     | 13/12/2014 16:25:00 |    56.25   | <--- last price of `company_id` = `4`
   |      5     |  22/9/2014 08:00:00 |     0.00   | <--- first price of `company_id` = `5`
   |      5     |  22/9/2014 08:01:00 |     0.00   |
   |      5     |  22/9/2014 08:02:00 |     5.45   |
   |      5     |  22/9/2014 08:03:00 |     5.25   |
   |      5     |  22/9/2014 08:04:00 |     0.00   |
   |      5     |  22/9/2014 08:05:00 |     0.00   |
   +------------+---------------------+------------+

May I know if I can achieve the below by entirely using MySQL command? If yes, any guidance/help would be much appreciated.

  1. Loop through each company_id and sort price_open by its
    date_time, then
  2. if found any price_open = 0.00,
    replace this/these 0.00 value(s) with previous known price_open
    figure, but
  3. if the price_open = 0.00 is in the very
    first row or first few rows of a company_id, do not replace with
    previous known price_open figure. Instead, use the next known
    price_open figure – because it shouldn't mix up with other
    company_id.

Expected result will be as follow:

   +------------+---------------------+------------+
   | company_id | date_time           | price_open |
   +------------+---------------------+------------+
   |      4     | 13/12/2014 16:23:00 |    56.30   |
   |      4     | 13/12/2014 16:24:00 |    56.25   |
   |      4     | 13/12/2014 16:25:00 |    56.25   | <--- last price of `company_id` = `4`
   |      5     |  22/9/2014 08:00:00 |     5.45   | <--- first price of `company_id` = `5`
   |      5     |  22/9/2014 08:01:00 |     5.45   |
   |      5     |  22/9/2014 08:02:00 |     5.45   |
   |      5     |  22/9/2014 08:03:00 |     5.25   |
   |      5     |  22/9/2014 08:04:00 |     5.25   |
   |      5     |  22/9/2014 08:05:00 |     5.25   |
   +------------+---------------------+------------+

I have tried searching about this but those weren't really close to what I am trying to achieve.

Note: It doesn't matter if it has to be done in a new table rather than original price table.

Thank you very much in advanced! 🙂

Best Answer

Add this composite index if you don't already have it -- it should probably be the PRIMARY KEY.

INDEX(company_id, date_time)

Run this a few times (until it stops modifying any rows)

UPDATE    tbl a
    JOIN  tbl b
       ON  b.company_id = a.company_id
      AND  b.date_time  = a.date_time + INTERVAL 1 MINUTE
    SET a.price_open = b.price_open
    WHERE  a.price_open  = 0.00
      AND  b.price_open != 0.00; 

In case I am wrong, suggest you copy the table over for testing:

CREATE TABLE tbl LIKE real_tbl;
ALTER TABLE tbl DROP PRIMARY KEY,
                ADD  PRIMARY KEY(company_id, date_time);  -- if needed
INSERT INTO tbl SELECT * FROM real_tbl;
run the update a few times
check the values
if satisfied, ...
RENAME TABLE real_tbl TO old, tbl TO real_tbl;
DROP TABLE old;

On the presumption that your SELECTs will be company-centric, it is better to have PRIMARY KEY(co, dt) than (dt, co).