Mariadb – How to speed up this query

indexmariadbmariadb-10.1performancequery-performance

Here is the definition of the relevant table:

CREATE TABLE IF NOT EXISTS pricehistory (
        id INTEGER(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        price INTEGER NOT NULL,
        region VARCHAR(20) NOT NULL,
        timestamp TIMESTAMP NOT NULL,
        is_from_api INTEGER DEFAULT 0)

Here is the query at issue:

SELECT MAX(price) AS price 
FROM pricehistory 
WHERE UNIX_TIMESTAMP(CONVERT_TZ(timestamp, '+00:00', 'SYSTEM')) 
      > 
      UNIX_TIMESTAMP(CONVERT_TZ(DATE_SUB(NOW(), INTERVAL 1000 DAY), '+00:00', 'SYSTEM')) 
  AND region='us';

Here is the Explain

id: 1
select_type: SIMPLE
table: pricehistory
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 368040
Extra: Using where

Ok so this query runs really slow. I think it takes about 500ms to run for only 300k-400k rows.

Roughly 1/5 of all the rows have a region='us'. If I remove the UNIX_TIMESTAMP() function it drops the run time by about 100ms so I'll probably do that.

I've tried indexing, but I'm new to indexing and have never done it before. All the indexing I've done has either had no effect or slowed the query down even more.

I need the max price in the last x days, of only 1/5 of the rows (region='us') in the whole table. I can't think of a way to build an index out of that.

Note: I also have a query that gets the minimum price.

Server version: 10.1.35-MariaDB – mariadb.org binary distribution.

Best Answer

Because your WHERE clasue referencing timestamp has the column wrapped in a function, it is not going to be able to use an index efficiently to may currently be scanning the whole table.

An index on region may help, but not a lot unless that column is quite selective (or at all if the majority of the rows are from the region you are looking at).

As you are applying the same functions to the column and the value you are searching through it with, you can just drop that: the inequality

UNIX_TIMESTAMP(CONVERT_TZ(timestamp, '+00:00', 'SYSTEM')) > UNIX_TIMESTAMP(CONVERT_TZ(DATE_SUB(NOW(), INTERVAL 1000 DAY), '+00:00', 'SYSTEM'))

is exactly equivelant to just

timestamp > DATE_SUB(NOW(), INTERVAL 1000 DAY)

and that will allow in index on timestamp to be useful.

With that change in place, a combined index on both timestamp and region is likely to be most helpful here. Perhaps also adding price to the end then the query running will not need separate lookups to find that column after filtering by the others (though this will increase the size of the index so you are trading space for perhaps only a small amount of speed).