The implication is that you need ~1144 MB of extra storage to index the time column for ~100M rows...
Because the InnoDB engine stores the PRIMARY KEY (or UNIQUE) columns inside every secondary index, your secondary index will become larger than the indexed column alone.
How much larger? You can calculate it with this formula:
INT = 4 bytes
DATETIME = 8 bytes
100000000 records * (4 + 8) bytes =
100000000 * 12 bytes ~ 1200000000 bytes ( 1144.4 MB ) extra storage (note: index record/page overhead is not included in this calculation)
A larger index will slow down INSERTs and DELETEs, and UPDATEs only when you update an indexed value.
A larger index can in theory also slow down SELECTs, because of InnoDB's 16K index pages (read http://www.ovaistariq.net/733/).
But it still depends on the InnoDB configuration and on how much data is cached in the InnoDB buffer pool.
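If you want to check such an estimate against reality, you can read the actual on-disk sizes from information_schema (the table name below is illustrative):

```sql
-- Actual data vs. index size for one table, in MB.
-- DATA_LENGTH / INDEX_LENGTH are approximate for InnoDB, but good
-- enough to sanity-check the back-of-the-envelope calculation above.
SELECT TABLE_NAME,
       ROUND(DATA_LENGTH  / 1024 / 1024, 1) AS data_mb,
       ROUND(INDEX_LENGTH / 1024 / 1024, 1) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME   = 'mytable';
```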
Or maybe you can take your approach of using a lookup table:
CREATE TABLE tracker_snapshot_lookup (
tracker_date DATE NOT NULL
, tracker_snapshot_start_id INT UNSIGNED NOT NULL
, tracker_snapshot_end_id INT UNSIGNED NOT NULL
, PRIMARY KEY(tracker_date)
-- Covering index below is overkill...
-- , PRIMARY KEY(tracker_date, tracker_snapshot_start_id, tracker_snapshot_end_id)
) ENGINE = InnoDB;
insert into tracker_snapshot_lookup values('2013-11-13', 1, 10000);
insert into tracker_snapshot_lookup values('2013-11-14', 10001, 20000);
If you use a JOIN or a derived table, in the worst case the MySQL optimizer needs:
1 index key lookup (random disk I/O) on tracker_snapshot_lookup.tracker_date (assuming WHERE tracker_date = '2013-11-13')
1 table record lookup (random disk I/O) for tracker_snapshot_start_id and tracker_snapshot_end_id (not necessary when you make it a covering index)
Based on tracker_snapshot_start_id and tracker_snapshot_end_id, MySQL will most likely choose a range scan (sequential disk I/O, which has low I/O wait time) on the tracker_snapshot table.
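A sketch of such a query, assuming the big table is named tracker_snapshot with an auto-increment id PRIMARY KEY (these names are illustrative):

```sql
-- One PK lookup on the lookup table resolves the date to an id range,
-- then InnoDB can range-scan the big table on its PRIMARY KEY.
SELECT s.*
FROM tracker_snapshot_lookup l
INNER JOIN tracker_snapshot s
        ON s.id BETWEEN l.tracker_snapshot_start_id
                    AND l.tracker_snapshot_end_id
WHERE l.tracker_date = '2013-11-13';
```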
Your savings
DATE 3 bytes
INT NOT NULL 4 bytes
So in one year the lookup table costs you in storage...
Table data
365 days * (3 + 4 + 4 bytes)
365 * 11 = 4015 bytes ( 0.004 MB )
Index data
365 days * 3 bytes = 1095 bytes ( 0.001 MB )
It's magic, because you can use that ~1144 MB of storage space for more important data.
Since the seconds are counted from 0 (1970-01-01 00:00:00 UTC), you should look for every multiple of 60:
SELECT * FROM mytable WHERE MOD(TimeStamp,60)=0;
or if TimeStamp is indexed, you can do
SELECT T.* FROM
(SELECT TimeStamp FROM mytable WHERE MOD(TimeStamp,60)=0) M
INNER JOIN mytable T USING (TimeStamp);
Give it a Try !!!
SUGGESTION #1
You should store the timestamp of the minute and index it
ALTER TABLE mytable ADD COLUMN MinuteTimeStamp INT UNSIGNED NOT NULL AFTER TimeStamp;
UPDATE mytable SET MinuteTimeStamp = TimeStamp - MOD(TimeStamp,60);
ALTER TABLE mytable ADD INDEX MinuteTimeStamp_UniqueKey_ndx (MinuteTimeStamp,UniqueKey);
Then, you can do MIN aggregation on MinuteTimeStamp.
SELECT MinuteTimeStamp,MIN(UniqueKey) UniqueKey
FROM mytable GROUP BY MinuteTimeStamp;
and use it to get those records:
SELECT B.* FROM
(SELECT MinuteTimeStamp,MIN(UniqueKey) UniqueKey
FROM mytable GROUP BY MinuteTimeStamp) A
INNER JOIN mytable B USING (UniqueKey);
It was tactfully pointed out that triggers would degrade performance.
Perhaps doing INSERTs like this may help:
INSERT INTO mytable (UniqueKey,TimeStamp,MinuteTimeStamp) VALUES
(
uniquevalue,
UNIX_TIMESTAMP(NOW()),
UNIX_TIMESTAMP(NOW() - INTERVAL SECOND(NOW()) SECOND)
);
SUGGESTION #2
Since you have over 1000 columns (Ugh), perhaps a table of those minute timestamps would be better.
CREATE TABLE MinuteKeys
(
MinuteTimeStamp INT UNSIGNED NOT NULL,
UniqueKey INT UNSIGNED NOT NULL,
PRIMARY KEY (UniqueKey),
KEY MinuteTimeStamp_UniqueKey_ndx (MinuteTimeStamp,UniqueKey)
) ENGINE=MyISAM;
ALTER TABLE MinuteKeys DISABLE KEYS;
INSERT INTO MinuteKeys SELECT TimeStamp - MOD(TimeStamp,60),UniqueKey FROM mytable;
ALTER TABLE MinuteKeys ENABLE KEYS;
Then, you could use that table for the aggregation
SELECT B.* FROM
(SELECT MinuteTimeStamp,MIN(UniqueKey) UniqueKey
FROM MinuteKeys GROUP BY MinuteTimeStamp) A
INNER JOIN mytable B USING (UniqueKey);
EPILOGUE
Other suggestions are possible, but you should really consider normalizing the table.
See my post Too many columns in MySQL as to why.
Best Answer
TIMESTAMP, DATETIME, INT, FLOAT, DECIMAL, etc. are effectively "numbers". Numbers can be ordered/searched/etc. with ordinary BTREE indexes. There is no need for SPATIAL for such one-dimensional quantities.
For date-like quantities, I like to do:
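A sketch of that pattern: a half-open range using inequality operators on the raw column (assuming a DATETIME/TIMESTAMP column ts; the date is illustrative):

```sql
-- Half-open interval: >= start of the day, < start of the next day.
SELECT *
FROM mytable
WHERE ts >= '2013-11-13'
  AND ts <  '2013-11-13' + INTERVAL 1 DAY;
```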
This avoids dealing with leap years, kludges like '23:59:59', inclusive/exclusive confusion (note the careful choice of inequality operators), etc.