I have a MySQL table of a few gigabytes with roughly ~100M rows. I store data sequentially, such that the timestamps increase as the ID also increases. Because ordering and filtering by the timestamp is so tremendously slow, I often simply use the ID to filter date ranges. If I know that Wednesday's data started at ID=87000000 and that Thursday's data started at ID=90000000, I can find all of Wednesday's data by filtering between those two IDs.
Everything works fast enough, except for the part where I need to find the ID range. This part requires filtering by the timestamp, and that is the slow part.
My question is this: If I add a second index on the timestamp field, what would be the implications? Would my insert rate be severely impacted? Would the size of my table increase dramatically? Would this improve performance enough to justify the action? If I were to add this index, would I even need to use the ID range trick anymore?
I've also been toying with the idea of writing a function that performs a binary search on the table to find an ID near a target time. My assumption is that MySQL scans most of the table and compares each row against the target time, which means roughly 50,000,000 rows scanned on average. A binary search would take, at worst, log2(100,000,000) ≈ 27 queries to find a single ID.
Any comments would be appreciated. I'll get to work on my binary search algorithm and post the results.
EDIT: My schema is:
CREATE TABLE `tracker_snapshot` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tracker_id` int(11) NOT NULL,
`time` datetime NOT NULL,
`value1` decimal(13,4) NOT NULL,
`value2` decimal(13,4) NOT NULL,
`value3` decimal(13,4) NOT NULL,
`value4` decimal(13,4) NOT NULL,
`value5` decimal(13,4) NOT NULL,
`value6` decimal(13,4) NOT NULL,
PRIMARY KEY (`id`),
KEY `tracker_snapshot_c9837659` (`tracker_id`),
CONSTRAINT `tracker_id_refs_id_68c52750` FOREIGN KEY (`tracker_id`) REFERENCES `tracker_tracker` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=77107026 DEFAULT CHARSET=utf8
Best Answer
The implication is that you need roughly 1144 MiB of extra storage to index the time column for ~100M rows...
Because the InnoDB engine stores the primary key value inside every entry of a secondary index, your secondary index will be larger than the indexed column alone. How much larger? You can estimate it as rows × (size of the indexed column + size of the primary key).
A larger index will slow down inserts and deletes, and updates only when the updated value is itself indexed. In theory a larger index can also slow down selects, because of InnoDB's 16 KB index pages (read http://www.ovaistariq.net/733/), but in practice it depends on the InnoDB configuration and on how much data is cached in the InnoDB buffer pool.
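The ~1144 MiB figure above is consistent with a back-of-the-envelope estimate; this sketch assumes the 8-byte DATETIME format used before MySQL 5.6.4 and ignores page fill-factor overhead:

```python
rows = 100_000_000
datetime_bytes = 8   # DATETIME storage size before MySQL 5.6.4
pk_bytes = 4         # int(11) primary key, copied into each secondary index entry
index_bytes = rows * (datetime_bytes + pk_bytes)
print(index_bytes / 1024**2)  # ≈ 1144.4 MiB before page overhead
```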
Alternatively, you can keep your ID-range approach by maintaining a lookup table that maps each day to its first and last ID. If you query it with a JOIN or a derived table, the MySQL optimizer can use it to restrict the scan to the relevant ID range.
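A minimal sketch of the lookup-table idea, using Python's sqlite3 as a stand-in for MySQL; the `day_bounds` table and its columns are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracker_snapshot (id INTEGER PRIMARY KEY, time TEXT, value1 REAL);
CREATE TABLE day_bounds (day TEXT PRIMARY KEY, first_id INTEGER, last_id INTEGER);
""")
# a handful of sample rows spanning two days
rows = [(i, "2012-06-%02d 00:00:00" % (13 if i < 5 else 14), 0.0) for i in range(10)]
conn.executemany("INSERT INTO tracker_snapshot VALUES (?, ?, ?)", rows)
# the lookup table is maintained out of band, e.g. once per day by a cron job
conn.execute("INSERT INTO day_bounds VALUES ('2012-06-13', 0, 4)")
conn.execute("INSERT INTO day_bounds VALUES ('2012-06-14', 5, 9)")

# fetch one day's data via the cheap ID range instead of filtering on time
day_rows = conn.execute("""
    SELECT s.* FROM tracker_snapshot s
    JOIN day_bounds d ON s.id BETWEEN d.first_id AND d.last_id
    WHERE d.day = '2012-06-13'
""").fetchall()
print(len(day_rows))  # 5
```

The join touches one row of `day_bounds` and then only the ID range it names, which is exactly the manual trick from the question, automated.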
Your savings
It's magic because you get to use that ~1144 MiB of storage space for more important data instead.