Mysql – thesql – create a summary of event data in a event list by date even with 0 events in a given time frame

MySQL

Using MySQL 5.7, I am trying to create a summary of events given a list of events.

The event table looks like this:

mysql> select camtimestamp,camname from events where camtimestamp >= "2019-12-31" and camtimestamp < "2020-01-02" limit 10;
+---------------------+---------+
| camtimestamp        | camname |
+---------------------+---------+
| 2019-12-31 00:47:04 | wls1    |
| 2019-12-31 04:57:41 | wls1    |
| 2019-12-31 05:15:22 | wls1    |
| 2019-12-31 05:35:00 | wls1    |
| 2019-12-31 05:46:56 | wls1    |
| 2019-12-31 05:51:30 | wls1    |
| 2019-12-31 05:54:19 | wls1    |
| 2019-12-31 05:56:14 | wls1    |
| 2019-12-31 05:58:45 | wls1    |
| 2019-12-31 05:58:46 | wls1    |
+---------------------+---------+
10 rows in set (0.00 sec)

I want the summary to look like this except I want the camtimestamp in the first column to increment by the hour (or what ever increments I choose) and the count to be a summary of all events between the first row and the second.

mysql> select camtimestamp,camname,count(camname) from events where camtimestamp >= "2019-12-31" and camtimestamp < "2020-01-02" group by camtimestamp,camname limit 10;
+---------------------+---------+----------------+
| camtimestamp        | camname | count(camname) |
+---------------------+---------+----------------+
| 2019-12-31 00:47:04 | wls1    |              1 |
| 2019-12-31 04:57:41 | wls1    |              1 |
| 2019-12-31 05:15:22 | wls1    |              1 |
| 2019-12-31 05:35:00 | wls1    |              1 |
| 2019-12-31 05:46:56 | wls1    |              1 |
| 2019-12-31 05:51:30 | wls1    |              1 |
| 2019-12-31 05:54:19 | wls1    |              1 |
| 2019-12-31 05:56:14 | wls1    |              1 |
| 2019-12-31 05:58:45 | wls1    |              1 |
| 2019-12-31 05:58:46 | wls1    |              1 |
+---------------------+---------+----------------+
10 rows in set (0.00 sec)

What I am really going for is this (as a sample) result:

Date                  wls1  wls2
2020-01-05 12:00:00   5     7
2020-01-05 13:00:00   10    3
2020-01-05 14:00:00   2     0
2020-01-05 15:00:00   0     0
2020-01-05 16:00:00   0     2
2020-01-05 17:00:00   0     4
2020-01-05 18:00:00   1     23
2020-01-05 19:00:00   2     3
2020-01-05 20:00:00   3     4
2020-01-05 21:00:00   4     5

I have dabbled with creating a table that contains only timestamps from now until 2040 in hourly increments and using a join in my SELECT statement, but I can't seem to get that join between the two tables quite right.

EDIT

The answer below gets most of the way to answering my question. It certainly shows me that I have a ways to go to learn about mySQL and general db work.

After slightly modifying the SELECT statement it is as follows:

SELECT 
  startpit, 
  SUM(IF(camname='wls1', 1,0)) as wls1,
  SUM(IF(camname='wls2', 1,0)) as wls2
FROM 
(
  SELECT
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(camtimestamp)/@interval) * @interval
) AS startpit,
  camname,
  count(*) as num
FROM events,
(
  SELECT @interval := 3600) AS init

  GROUP BY startpit, camname
) AS baseview
GROUP BY startpit;

It produces this output:

+---------------------+------+------+
| startpit            | wls1 | wls2 |
+---------------------+------+------+
| 2019-05-01 21:00:00 |    0 |    1 |
| 2019-05-01 22:00:00 |    0 |    1 |
| 2019-05-02 06:00:00 |    0 |    1 |
| 2019-05-02 07:00:00 |    1 |    1 |
| 2019-05-02 08:00:00 |    1 |    1 |
| 2019-05-02 09:00:00 |    1 |    1 |
| 2019-05-02 10:00:00 |    1 |    1 |
| 2019-05-02 11:00:00 |    1 |    1 |
| 2019-05-02 12:00:00 |    1 |    1 |
| 2019-05-02 13:00:00 |    1 |    1 |
+---------------------+------+------+

I can see how to get these results and am warned about the only_full_group_by setting in MySQL 5.7.

There are still time gaps in the output where I would prefer to have zeros in both camname columns. I can see that my PHP will have to dynamically create the SQL select statements and dynamically process the results/output.

Thank you for taking the time to answer my question!

Best Answer

2 Tricks needed:

  • UNIX_TIMESTAMP() allows application of integer division to datetimes
  • Pivoting isn't part of SQL proper, but group_concatcan do something similar.

Putting it together:

select 
  startpit, 
  FROM_UNIXTIME(UNIX_TIMESTAMP(startpit)+@interval) as endpit,
  group_concat(concat(camname,': ',num)) as events
from (
select
  from_unixtime(
    floor(
      unix_timestamp(camtimestamp)/@interval
    )*@interval
  ) as startpit,
  camname,
  count(*) as num
from events,
(select @interval:=3600) as init
-- where 
group by floor(unix_timestamp(camtimestamp)/@interval)*@interval, camname
) as baseview
group by startpit;

See this SQLfiddle (I replaced changed every second rows from 'wsl1' to 'wsl2' since your test data has only one cam.

EDIT

MySQL 8.0 defaults to a very buggy implementation of strict grouping. It will e.g. think, that FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(camtimestamp)/@interval) * @interval is (quote) "not functionally dependent" on FLOOR(UNIX_TIMESTAMP(camtimestamp)/@interval) * @interval. This can easily be resolved by using the name of the calcualted column (startpit). So

SELECT 
  startpit, 
  FROM_UNIXTIME(UNIX_TIMESTAMP(startpit)+@interval) AS endpit,
  GROUP_CONCAT(CONCAT(camname,': ',num)) AS the_events
FROM 
(
  SELECT
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(camtimestamp)/@interval) * @interval
) AS startpit,
  camname,
  count(*) as num
FROM event,
(
  SELECT @interval := 3600) AS init
-- where 
  GROUP BY startpit, camname
) AS baseview
GROUP BY startpit;

works with MySQL 8.0, while the functionally identical original does not. This is not a flaw in the SQL, but in how MySQL 8.0 understands it. Unexperienced SQL programmers might consider this a good thing.

EDIT 2

There is no pivoting in SQL, i.e. the number of columns can not depend on the data so the format with a variable number of columns and an integer in each is not reachable via pure SQL.

Assuming, that the SQL is generated by another layer higher in the stack you can reach it by creating SQL along the lines of

SELECT 
  startpit, 
  FROM_UNIXTIME(UNIX_TIMESTAMP(startpit)+@interval) AS endpit,
  SUM(IF(camname='wls1', 1,0)) as wls1,
  SUM(IF(camname='wls2', 1,0)) as wls2,
  SUM(IF(camname='wls3', 1,0)) as wls3
  -- and so on
FROM 
(
  SELECT
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(camtimestamp)/@interval) * @interval
) AS startpit,
  camname,
  count(*) as num
FROM event,
(
  SELECT @interval := 3600) AS init
-- where 
  GROUP BY startpit, camname
) AS baseview
GROUP BY startpit;

You can even put creating this SQL as a string into a stored procedure and then use PREPARE STATEMENT and EXECUTE to run it. Be advised, that tis creates a stored procedure where the number of output columns depends on the data, which is unusual at least.