MySQL – Updating Large Data Set with Time Aggregation

mysql-5.7queryquery-performanceupdate

I am having a large MySQL (5.7) table with millions of rows (contains data for each second). Based on these values calculations should be performed, in some cases for large data ranges. Therefore, I want to perform a preprocessing where I perform the calculations beforehand and store the result in a separate table. Due to some internal reasons, aggregations are performed so, for example, I need the avaerage value of the formula for one hour. Here is an example:

CREATE TABLE `datavalues` (
    `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
    `ddate` DATE NOT NULL,
    `ttime` TIME NOT NULL,
    `unixtime` BIGINT(20) NOT NULL,
    `value1` DOUBLE,
    `value2` DOUBLE,
    PRIMARY KEY (`id`),
    INDEX `unixtime_idx` (`unixtime`) 
);
CREATE TABLE `calculation_result` (
    `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
    `ddate` DATE NOT NULL,
    `ttime` TIME NOT NULL,
    `unixtime` BIGINT(20) NOT NULL,
    `result1` DOUBLE,
    PRIMARY KEY (`id`),
    INDEX `unixtime_idx` (`unixtime`) 
);

As an exemplary calculation I calculate value1-value2. The initial data query for filling the calculation table with one value for each hour could look like this:

INSERT INTO calculation_result (ddate, ttime, unixtime, result1)
select max(ddate), STR_TO_DATE(TIME_FORMAT(max(ttime),'%H:00:00'),'%H:%i:%s'), 
UNIX_TIMESTAMP(adddate(max(ddate), INTERVAL HOUR(max(ttime)) HOUR)), AVG(value1 - value2)
FROM datavalues GROUP BY ddate, hour(ttime);

So far so good. However, I am struggling to implement a query for updating the result1 column in the calculation_result table in case the calculation description changes, for example, from value1 – value2 to value1 + value2. How can I efficiently update the result1 column? Deleting and recreating the calculation table is not an option as it contains a lot of further calculations in different columns.
Additionally, I expect the updating process to take some time. As data is newly imported on a regular basis, how can I prevent the calculation_result table to be locked? Maybe using bulk updates of data chunks?
Thanks for your help.

Best Answer

An UPDATE can be done by Joining the SELECT that you have, only you need some aliases for the columns.

The logic isd simple you can join both tables with the ddate and the time which is for example always 01:00:00 as You group by data and hour

UPDATE calculation_result cr
        INNER JOIN
    (SELECT 
        MAX(ddate) ddate,
            STR_TO_DATE(TIME_FORMAT(MAX(ttime), '%H:00:00'), '%H:%i:%s') AS ttime,
            UNIX_TIMESTAMP(ADDDATE(MAX(ddate), INTERVAL HOUR(MAX(ttime)) HOUR)) unixtime,
            AVG(value1 - value2) avgresult
    FROM
        datavalues
    GROUP BY ddate , HOUR(ttime)) t1 ON cr.ddate = t1.ddate
        AND cr.ttime = t1.ttime 
SET 
    cr.result1 = t1.result1 ,cr.unixtime = t1.unixtime