Mysql – Identifying number of rows (COUNT) that fall between two timestamps for each minute in the hour

mariadbMySQLquery

Good day all!

I'm attempting to write some SQL that can identify the number of rows that fall between two timestamps but iteratively count them up.

An example use case, a telephone call CDR for one user:

|Call Number|Start Time         |EndTime            |
|===========|===================|===================|
|1          |2019-01-01 10:01:23|2019-01-01 10:05:44|
|2          |2019-01-01 10:02:33|2019-01-01 10:05:22|
|3          |2019-01-01 10:03:50|2019-01-01 10:06:11|

I'd like a query that could produce this sort of result, per each minute for the hour (using the example data above, Call ID is an arbitrary number, so can be ignored in this context, it's the count of the calls in progress that's important):

|Time Chunk         |Calls In Progress|
|===================|=================|
|2019-01-01 10:00:00|                0|
|2019-01-01 10:01:00|                1|
|2019-01-01 10:02:00|                2|
|2019-01-01 10:03:00|                3|
|2019-01-01 10:04:00|                3|
|2019-01-01 10:05:00|                3|
|2019-01-01 10:06:00|                1|


A script would iterate through customers to produce reports per customer so the WHERE clause would consist of something like WHERE user = 123 for example.

I've tried a couple of queries, but they will only produce instantaneous results based on the start time of the call and then show me how many were in progress for that particular minute of the day, with a load of gaps in between, and not counting for any calls that are "still in progress", as follows:

SELECT 
    FROM_UNIXTIME(
        CEILING(
            `timestamp`/60
        )*60
    ) AS timeslice,
    COUNT(*) AS calls_in_progress
FROM
    calls
WHERE 
    account = {ACCOUNT_ID}
    AND `timestamp` >= UNIX_TIMESTAMP('2019-01-01 00:00:00')
    AND `timestamp` <= UNIX_TIMESTAMP('2019-01-01 23:59:59')
GROUP BY
    timeslice
WITH ROLLUP;

…which in turn produced the following undesirable result (the rollup is useful but not required)…

|timeslice          |calls_in_progress|
|===================|=================|
|2019-01-01 09:35:00|               18|
|2019-01-01 12:10:00|                2|
|2019-01-01 12:11:00|                2|
|2019-01-01 12:19:00|                1|
|2019-01-01 12:49:00|               51|
|NULL               |               74|

Performance impact of said query is quite important given this is going through a table with millions of rows, so the use of the indexes associated to timestamps would be critical.

Any input would be super useful.

Best Answer

I'd like a query that could produce this sort of result

Look at

WITH RECURSIVE 
cte1 AS ( SELECT CAST(DATE_FORMAT(MIN(StartTime), '%Y-%m-%d %H:%i:00') AS DATETIME) mintime,
                 CAST(DATE_FORMAT(MAX(EndTime), '%Y-%m-%d %H:%i:00') AS DATETIME) maxtime
                 FROM srctable ),
cte2 AS ( SELECT mintime timechunk, 
                 mintime + INTERVAL 1 MINUTE nextchunk
          FROM cte1
        UNION ALL
          SELECT cte2.nextchunk, 
                 cte2.nextchunk + INTERVAL 1 MINUTE
          FROM cte1, cte2
          WHERE cte2.timechunk < cte1.maxtime )
SELECT cte2.timechunk time_chunk, 
       COUNT(*) calls_in_progress
FROM cte2
LEFT JOIN srctable ON cte2.timechunk < srctable.endtime
                  AND cte2.nextchunk >= srctable.starttime
GROUP BY cte2.timechunk 
/* WITH ROLLUP 
   ORDER BY cte2.timechunk IS NULL, cte2.timechunk */

fiddle