MySQL – How to rewrite this query or restructure the table for SPEED

Tags: datetime, group-by, mysql, performance

I have an InnoDB MySQL table of this structure:

column    type
service   varchar(16)
url       varchar(255)
datetime  datetime

It contains logs for requests made to one of several services. The table currently holds ~3 million rows and grows by roughly 3 million rows a month (but I will delete old data if need be).

I am trying to generate a report, and get the number of requests made to each service within a date range.

Here is my current query:

  SELECT service, COUNT(*) as lastMonthCount 
    FROM request_logs 
   WHERE datetime > '2021-02-16 10:51:05' 
GROUP BY service

This works, but is painfully slow (~28 seconds).

It outputs this:

service  lastMonthCount
API A    3056752
API B    38451

I have indexes on datetime and service. I can see they are of type BTREE.

How can I radically speed up this query, or restructure my table/indexes so I can achieve the same use case another way?

Best Answer

For a relatively simple query like this there isn't a lot you can do to optimise the query text itself; there is little room for change, as there isn't a simpler way of asking for that data.

You can probably reduce the number of pages the query touches significantly by using a composite index on (datetime, service); a sketch follows below. That way, the column it needs to group by is already available in the index entries it reads to apply the date filter, so the base table never has to be visited. The composite index will take more space on disk and slow writes down a touch for the same reason, so you would probably want to replace the existing index on datetime with the new composite index rather than adding it alongside.
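A minimal sketch of that change (the name idx_datetime is an assumption; run SHOW INDEX FROM request_logs to find the actual index name):

  -- Replace the single-column datetime index with a composite index
  -- that also carries the service column, making the query covering.
  ALTER TABLE request_logs
    DROP INDEX idx_datetime,
    ADD INDEX idx_datetime_service (datetime, service);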

A little more detail on why this will be faster:

With the index on datetime alone, the engine must read the base table's data pages to fetch the service column for each matching row. Beyond the extra reads incurred simply by referencing a second structure, this will likely mean many more page reads, because the base table's pages hold fewer rows each: every row also carries the url column and any other columns you have, or might later add, for the services.
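You can check that the composite index is doing its job with EXPLAIN: when the query is satisfied entirely from the index, the Extra column of the plan reports "Using index", meaning the base table is never touched:

  EXPLAIN
  SELECT service, COUNT(*) AS lastMonthCount
    FROM request_logs
   WHERE datetime > '2021-02-16 10:51:05'
GROUP BY service;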

If that still isn't fast enough...

You may need to look at some form of caching for the counts. There are several options here:

  • a materialised view may work, though MySQL has no native materialised views, so one would have to be emulated
  • a little denormalising: a counts table updated by trigger whenever the main table is written (essentially a more "manual" version of a materialised view; see the sketch after this list)
  • caching in the application layer, if you can live with the result not always being 100% up to date (by far the fastest option if done right, but obviously with that key disadvantage)
  • if the data is big enough that the query reads from storage rather than RAM each time and you don't want to use one or more of the three options above: throw hardware at the problem and buy oodles of RAM! (this is usually not a good solution, though it sometimes can be)
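A minimal sketch of the trigger-maintained counts table (all names here are hypothetical, and the daily granularity is a design choice: the report's cutoff becomes day-aligned rather than to-the-second):

  -- Hypothetical summary table: one row per service per day.
  CREATE TABLE request_counts (
    service  varchar(16)  NOT NULL,
    log_date date         NOT NULL,
    requests int unsigned NOT NULL DEFAULT 0,
    PRIMARY KEY (service, log_date)
  );

  -- Keep the summary in step with the log table on every insert.
  DELIMITER //
  CREATE TRIGGER request_logs_ai
  AFTER INSERT ON request_logs
  FOR EACH ROW
  BEGIN
    INSERT INTO request_counts (service, log_date, requests)
    VALUES (NEW.service, DATE(NEW.datetime), 1)
    ON DUPLICATE KEY UPDATE requests = requests + 1;
  END//
  DELIMITER ;

  -- The report then scans a handful of rows instead of millions:
  SELECT service, SUM(requests) AS lastMonthCount
    FROM request_counts
   WHERE log_date > '2021-02-16'
GROUP BY service;

If rows are ever deleted from request_logs (for example when purging old data), decide whether the summary should reflect that: a matching AFTER DELETE trigger can decrement the counts, or old request_counts rows can simply be purged on the same schedule.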