MySQL – sum interval dates

MySQL

I would like to sum the hours for each name, giving the total time covered between the START and END of their activities. It would be simple if I could just subtract the start from the end of each record, but activities can overlap. For example, Mary worked from 13:00 to 15:00 and did another activity from 14:00 to 16:00. I would like the result to be 3 (she used 3 hours of her time to perform both activities).

Example data

Name    |    START               |    END                 |
-----------------------------------------------------------
KATE    | 2014-01-01 13:00:00    | 2014-01-01 14:00:00    |
MARY    | 2014-01-01 13:00:00    | 2014-01-01 15:00:00    |
TOM     | 2014-01-01 13:00:00    | 2014-01-01 16:00:00    |
KATE    | 2014-01-01 12:00:00    | 2014-01-02 04:00:00    |
MARY    | 2014-01-01 14:00:00    | 2014-01-01 16:00:00    |
TOM     | 2014-01-01 12:00:00    | 2014-01-01 18:00:00    |
TOM     | 2014-01-01 22:00:00    | 2014-01-02 02:00:00    |

Result

KATE    15 hours
MARY     3 hours
TOM      9 hours

Best Answer

You could merge overlapping intervals and count the hours from there (untested):

select name, min(start), end 
from (
    select x.name, x.start, min(y.end) as end 
    from t as x 
    join t as y 
        on x.name = y.name 
       and x.start <= y.end 
       and not exists (
           select 1 
           from t as z 
           where y.name = z.name 
             and y.end >= z.start 
             and y.end < z.end
       ) 
    where not exists (
        select 1 
        from t as u 
        where x.name = u.name 
          and x.start > u.start 
          and x.start <= u.end
    ) 
    group by x.name, x.start
) as v group by name, end;
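To sanity-check what the merged intervals should add up to, here is a minimal Python sketch. It uses a different technique from the self-join above (sort each person's intervals, then sweep through them once), and the function name `merged_hours` is made up for this illustration. Intervals that merely touch are merged, mirroring the `<=` comparisons in the query.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows copied from the question: (name, start, end).
ROWS = [
    ("KATE", "2014-01-01 13:00:00", "2014-01-01 14:00:00"),
    ("MARY", "2014-01-01 13:00:00", "2014-01-01 15:00:00"),
    ("TOM", "2014-01-01 13:00:00", "2014-01-01 16:00:00"),
    ("KATE", "2014-01-01 12:00:00", "2014-01-02 04:00:00"),
    ("MARY", "2014-01-01 14:00:00", "2014-01-01 16:00:00"),
    ("TOM", "2014-01-01 12:00:00", "2014-01-01 18:00:00"),
    ("TOM", "2014-01-01 22:00:00", "2014-01-02 02:00:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def merged_hours(rows):
    """Return {name: hours covered by the union of that name's intervals}."""
    by_name = defaultdict(list)
    for name, start, end in rows:
        by_name[name].append(
            (datetime.strptime(start, FMT), datetime.strptime(end, FMT))
        )

    totals = {}
    for name, intervals in by_name.items():
        intervals.sort()  # order by start time
        seconds = 0.0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps (or touches) the current block: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current block and start a new one.
                seconds += (cur_end - cur_start).total_seconds()
                cur_start, cur_end = start, end
        seconds += (cur_end - cur_start).total_seconds()
        totals[name] = seconds / 3600
    return totals


totals = merged_hours(ROWS)
for name in sorted(totals):
    print(name, totals[name])
```

On the rows exactly as listed, this gives 16 hours for KATE, 3 for MARY and 10 for TOM, so the figures for KATE and TOM in the question's expected result do not follow from the sample data as shown.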

Related Solutions

MySQL: LEFT OUTER JOIN within reason

First consider a query that computes which rows of tablethree are actually relevant. Assuming that by "most recently entered result" you mean "most recent enddate", the following query gathers the most recent enddate for each sid:

SELECT sid, MAX(enddate) FROM `tablethree` GROUP BY sid

Now you can build a join to retrieve not only sid, but all of the data of tablethree:

SELECT a.*
FROM tablethree a
INNER JOIN (
  SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) b
ON a.sid = b.sid AND a.enddate = b.enddate
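As a quick illustration of this "latest row per sid" join, here is a small Python sketch that runs the same pattern against an in-memory SQLite database. SQLite stands in for MySQL here, and the three sample rows are invented for the example. The aggregate is given the alias `enddate` so that the outer join condition can refer to it as `b.enddate`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data: sid 1 has two periods, sid 2 has one.
conn.executescript("""
    CREATE TABLE tablethree (sid INTEGER, startdate TEXT, enddate TEXT);
    INSERT INTO tablethree VALUES
        (1, '2014-01-01', '2014-01-31'),
        (1, '2014-02-01', '2014-02-28'),
        (2, '2014-03-01', '2014-03-15');
""")

# Join each row to the per-sid maximum enddate; only the latest rows match.
rows = conn.execute("""
    SELECT a.sid, a.startdate, a.enddate
    FROM tablethree a
    INNER JOIN (
        SELECT sid, MAX(enddate) AS enddate
        FROM tablethree
        GROUP BY sid
    ) b
      ON a.sid = b.sid AND a.enddate = b.enddate
    ORDER BY a.sid
""").fetchall()

for row in rows:
    print(row)
```

Only the row with the latest enddate survives for each sid. If two rows of the same sid share that latest enddate, both are returned.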

This is the result set you actually want to "left join in". You have to insert this into your original query:

SELECT t1.*
FROM tableone AS t1
INNER JOIN tabletwo AS t2
  ON t1.cid = t2.id
LEFT OUTER JOIN (
  SELECT a.*
  FROM tablethree a
  INNER JOIN (
    SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
  ) b
  ON a.sid = b.sid AND a.enddate = b.enddate
) AS t3
  ON t3.sid = t2.sid
WHERE t1.fieldone = 1 
  AND t1.odate NOT BETWEEN t3.startdate AND t3.enddate

What should also work is the following:

SELECT t1.*
FROM tableone AS t1
INNER JOIN tabletwo AS t2
  ON t1.cid = t2.id
LEFT OUTER JOIN tablethree AS t3
  ON t3.sid = t2.sid
LEFT OUTER JOIN (
  SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) mostrecent
  ON t3.sid = mostrecent.sid AND t3.enddate = mostrecent.enddate

WHERE t1.fieldone = 1 
  AND t1.odate NOT BETWEEN t3.startdate AND t3.enddate
  AND mostrecent.enddate IS NOT NULL

This joins both tablethree and the new subquery as left joins, and filters out the rows where mostrecent.enddate IS NULL (that is, the rows which are not actually the most recent). It should lead to the same result, but MySQL may be able to compute it a little faster. Running EXPLAIN on both queries should reveal any differences in how they are executed.

Sql-server – Design of an application log database

I did the following

CREATE TABLE L(
Time_Series_TS TIMESTAMP, 
Channel VARCHAR(10), 
Operation VARCHAR(10), 
Function VARCHAR(10), 
Duration INT);

Then

INSERT INTO L VALUES('2014-06-10 09:00:03.457', 'Channel1', 'Operation3', 'Function15', 15);
INSERT INTO L VALUES('2014-06-10 09:00:08.245', 'Channel2', 'Operation5', 'Function10', 22);
INSERT INTO L VALUES('2014-06-10 09:00:22.005', 'Channel1', 'Operation3', 'Function15', 48);
INSERT INTO L VALUES('2014-06-10 09:01:03.457', 'Channel2', 'Operation3', 'Function15', 296);
INSERT INTO L VALUES('2014-06-10 09:01:08.245', 'Channel2', 'Operation5', 'Function10', 225);
INSERT INTO L VALUES('2014-06-10 09:01:22.005', 'Channel1', 'Operation3', 'Function15', 7);
INSERT INTO L VALUES('2014-06-10 09:01:16.245', 'Channel2', 'Operation5', 'Function10', 10);
INSERT INTO L VALUES('2014-06-10 09:01:47.005', 'Channel1', 'Operation3', 'Function15', 20);

I added a few records to your sample for checking, then ran this query

SELECT MINUTE(Time_Series_TS) AS Minute, Channel, Operation, Function, 
COUNT(*) AS "Count/min", SUM(Duration) AS Duration 
FROM L
GROUP BY Minute, Channel, Operation, Function
ORDER By Minute, Channel, Operation, Function;

Which gave

+--------+----------+------------+------------+-----------+----------+
| Minute | Channel  | Operation  | Function   | Count/min | Duration |
+--------+----------+------------+------------+-----------+----------+
|      0 | Channel1 | Operation3 | Function15 |         2 |       63 |
|      0 | Channel2 | Operation5 | Function10 |         1 |       22 |
|      1 | Channel1 | Operation3 | Function15 |         2 |       27 |
|      1 | Channel2 | Operation3 | Function15 |         1 |      296 |
|      1 | Channel2 | Operation5 | Function10 |         2 |      235 |
+--------+----------+------------+------------+-----------+----------+

This appears to be the result you want (note 63 as the first duration, as per my earlier comment). Keep in mind that MINUTE() returns only the minute within the hour (0 to 59), so rows from different hours or days end up in the same group; you can add HOUR(), DAYOFMONTH() and even YEAR() to the query to aggregate over those as well.

For performance, I did create an index

CREATE INDEX L_Index ON L(Channel, Operation, Function) using BTREE;

and ran EXPLAIN on the query before and after creating it, but there was no difference. This is hardly a surprise: for such a small table the optimizer probably decided that there was no point in using an index. Obviously, I can't test with your data, but a couple of points are worth noting. If you perform this operation over a large number of records with a large number of fields, you may run into performance issues, and if you create many indexes, your insert performance will decrease. Is it possible for you to categorise your data in some way to reduce the number of fields, i.e. split your big table into several tables with fewer fields each? Check out different scenarios, test, and see what happens with your data, your queries, your application and your hardware.

[EDIT]

For something more human readable, you might like to try something like

SELECT TIME(FROM_UNIXTIME(UNIX_TIMESTAMP(Time_Series_TS) - MOD(UNIX_TIMESTAMP(Time_Series_TS), 60))) AS Minute,
..
..

for your first field.
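The arithmetic inside that expression (subtract the timestamp's remainder modulo 60 to round it down to the start of its minute) can be sketched in Python as follows. The helper name `floor_to_minute` is made up for this illustration, and UTC is assumed, whereas MySQL's UNIX_TIMESTAMP() and FROM_UNIXTIME() work in the session time zone.

```python
from datetime import datetime, timezone


def floor_to_minute(epoch_seconds: int) -> int:
    """Round an epoch timestamp (in seconds) down to the start of its minute."""
    return epoch_seconds - epoch_seconds % 60


# One of the sample log times, 09:00:22, interpreted as UTC (an assumption).
ts = int(datetime(2014, 6, 10, 9, 0, 22, tzinfo=timezone.utc).timestamp())
floored = floor_to_minute(ts)

label = datetime.fromtimestamp(floored, tz=timezone.utc).strftime("%H:%M:%S")
print(label)  # 09:00:00
```

Every timestamp inside the same minute maps to the same value, which is what makes it usable as a grouping key.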

[EDIT - Response to UPDATE-1]

OK - so in my schema, you are indexing by (Minute, Channel, Operation, Function)? See here for the documentation on composite indexes in MySQL. If your queries have a predominantly left-to-right orientation, i.e. you [always | usually] query Channel first, then Operation, then Function, you could try a single index on Minute + (the usual three). If the order is fairly arbitrary, you could try using 6 indexes, but this will hit insert performance. How much, I can't say, but if this is a DW-type app which performs the analysis, you can batch the inserts and only occasionally take that hit. You'll have to run a few tests and EXPLAIN your queries with realistic sample data; as I said earlier, with just a few records the optimiser ignores indexes because the table is too small. Interestingly, the MySQL manual page given above describes a hashing strategy which looks interesting: take MD5 hashes of CONCAT(Your_Column_List_Here). One other thing that I can suggest concerns the expression

SELECT TIME(FROM_UNIXTIME(UNIX_TIMESTAMP(Time_Series_TS) - MOD(UNIX_TIMESTAMP(Time_Series_TS), 60))) AS Minute,...

Drop the TIME(FROM_UNIXTIME(...)) wrapper so that only the inner UNIX_TIMESTAMP(...) - MOD(...) arithmetic remains, and you'll be storing INTs, which appear to index better than DATETIMEs - see here for a benchmark.

Also, as previously mentioned, you should move your data off Production and perform the OLAP/DW work on another machine. You could also test the InfiniDB solution that I suggested; it's drop-in compatible with MySQL (no learning curve). Then there are all the NoSQL solutions - we could be here all day :-). Take a look at a few scenarios, evaluate and test, and then choose what best fits your budget and requirements.

Forgot: make your OLAP/DW system read-only for performing queries - no transactional overhead! Make the OLAP/DW tables MyISAM? This last one is controversial - again, test and see.
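The hashing strategy mentioned earlier (an MD5 hash over the concatenated column values) can be sketched in Python like this. The helper name `row_hash` is hypothetical, and the values are taken from the sample log rows. The result is a 32-character hex string, the same shape as the output of MySQL's MD5().

```python
import hashlib


def row_hash(*columns: str) -> str:
    """MD5 hex digest of the column values joined with no separator."""
    return hashlib.md5("".join(columns).encode("utf-8")).hexdigest()


digest = row_hash("Channel1", "Operation3", "Function15")
print(len(digest))  # 32

# Plain concatenation cannot tell ("ab", "c") from ("a", "bc"):
print(row_hash("ab", "c") == row_hash("a", "bc"))  # True
```

Because different column values can concatenate to the same string, a lookup on the hash column would still need to compare the original columns as well, or the values could be joined with a separator that cannot occur in the data.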
