MySQL – How to efficiently store impressions in a database

Tags: MySQL, mysql-5.6, sqlalchemy

For my Flask app I need to track impressions and clicks on images. My current implementation is an impressions table where I store each impression's IP, user agent, geolocation, and all the relationships I would like to have, so every impression has its own row. Mainly I'm interested in the count of impressions by hour, day, date range, and month.

I tested this implementation by inserting 2.5M rows into my impressions table. The problem is that count queries are slow when I try to get the impression count for a whole month (5 seconds per query). An index doesn't help much here. I tried different queries and MySQL optimizations, and the results are not satisfactory.

In my opinion I will have to change the way I handle impression tracking. I was thinking of creating a table that holds the impression count for every hour. Whenever a new impression happens, I would just increment the counter for that hour.

But before I go and rewrite a bunch of code and make a mistake again, I'm asking here: how do other web apps handle impression tracking and get results so fast? Or what is the correct way to store impressions?

Hopefully someone can guide me to fix this problem.

Best regards

Best Answer

Summary tables. Separate table for counts.

Are you keeping information about each impression? Or just a counter? It makes a big difference. But we can solve it either way.

If you go with a table with just a daily counter:

CREATE TABLE FooImpressions (
    foo_id  INT UNSIGNED NOT NULL,  -- same as main table, but not AUTO_INCREMENT
    date DATE NOT NULL,
    counter INT UNSIGNED NOT NULL,
    PRIMARY KEY(foo_id, date)
) ENGINE=InnoDB;

And

INSERT INTO FooImpressions (foo_id, date, counter)
    VALUES (?, ?, 1)
    ON DUPLICATE KEY UPDATE
        counter = counter + 1;

This automatically starts a new row for each Foo each day (as needed).
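As a short trace of that behaviour (the foo_id 42 and the date are made-up values for illustration): in the update clause a bare `counter` is the value already stored in the row, while `VALUES(counter)` would be the value the INSERT tried to write, which is always 1 here.

```sql
-- First impression of the day: no row exists for (42, '2016-03-01'),
-- so a new row is inserted with counter = 1.
INSERT INTO FooImpressions (foo_id, date, counter)
    VALUES (42, '2016-03-01', 1)
    ON DUPLICATE KEY UPDATE
        counter = counter + 1;

-- Second impression the same day: the primary key already exists,
-- so the existing row is updated instead and counter becomes 2.
INSERT INTO FooImpressions (foo_id, date, counter)
    VALUES (42, '2016-03-01', 1)
    ON DUPLICATE KEY UPDATE
        counter = counter + 1;

SELECT counter
    FROM FooImpressions
    WHERE foo_id = 42 AND date = '2016-03-01';
-- counter is now 2
```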

Then, to get last week's counts (the 7 days before today):

SELECT foo_id, SUM(counter) AS impressions
    FROM FooImpressions
    WHERE date >= CURDATE() - INTERVAL 7 DAY
      AND date  < CURDATE()
    GROUP BY foo_id;

Of course, the date range can be changed to handle any consecutive date range.
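For instance, the whole-month count from the question might look like the sketch below. The month shown is an arbitrary example; in application code the start date would normally be passed as a parameter.

```sql
-- Impressions per foo_id for one calendar month.
-- The half-open range (>= start, < start of next month) avoids
-- having to know how many days the month has.
SELECT foo_id, SUM(counter) AS impressions
    FROM FooImpressions
    WHERE date >= '2016-03-01'
      AND date  < '2016-03-01' + INTERVAL 1 MONTH
    GROUP BY foo_id;
```

This reads at most one row per foo_id per day, rather than one row per impression.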

There are variants on this for more complex situations and for situations that need more performance. But this is a simple "summary table" together with its maintenance and its usage.

Yes, one variation would use a DATETIME column and truncate it to the hour. Don't have both an hourly summary table and a daily one until you have demonstrated that the hourly one is not fast enough.
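A minimal sketch of that hourly variation follows. The table name `FooImpressionsHourly` and the column name `hr` are assumptions made for illustration, not part of the original schema.

```sql
CREATE TABLE FooImpressionsHourly (
    foo_id  INT UNSIGNED NOT NULL,
    hr      DATETIME NOT NULL,      -- always the start of the hour
    counter INT UNSIGNED NOT NULL,
    PRIMARY KEY(foo_id, hr)
) ENGINE=InnoDB;

-- Record one impression, truncating the current time to the hour.
INSERT INTO FooImpressionsHourly (foo_id, hr, counter)
    VALUES (?, DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00'), 1)
    ON DUPLICATE KEY UPDATE
        counter = counter + 1;

-- Daily totals can still be derived from the hourly rows.
SELECT foo_id, DATE(hr) AS impression_date, SUM(counter) AS impressions
    FROM FooImpressionsHourly
    WHERE hr >= CURDATE() - INTERVAL 7 DAY
      AND hr  < CURDATE()
    GROUP BY foo_id, DATE(hr);
```

Even at 24 rows per foo_id per day, the summary is likely to be far smaller than a table with one row per impression.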

If you need up-to-the-minute counts, some summary techniques do not provide them (because they do nightly or hourly batch updates for speed); with IODKU (INSERT ... ON DUPLICATE KEY UPDATE) the counts are always up-to-the-minute.

Even if you have another table with more info on the impressions, have this Summary table for speed of reporting.
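If the detailed table already holds data, the summary table could be backfilled from it once. The sketch below assumes a detail table named `impressions` with columns `foo_id` and `created_at`; those names are hypothetical, since the question does not show the actual schema.

```sql
-- One-time backfill: recompute each day's count from the detail rows.
-- Here VALUES(counter) is the freshly computed COUNT(*), so an existing
-- summary row is overwritten rather than incremented.
INSERT INTO FooImpressions (foo_id, date, counter)
    SELECT foo_id, DATE(created_at), COUNT(*)
        FROM impressions
        GROUP BY foo_id, DATE(created_at)
    ON DUPLICATE KEY UPDATE
        counter = VALUES(counter);
```

After the backfill, new impressions would be recorded in both tables, with reports reading only from the summary.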

More discussion: http://localhost/rjweb/mysql/doc.php/summarytables