I have a MySQL table of a few gigabytes with roughly ~100M rows. I store data sequentially, such that the timestamps increase as the ID also increases. Because ordering and filtering by the timestamp is so tremendously slow, I often simply use the ID to filter date ranges. If I know that Wednesday's data started at ID=87000000 and that Thursday's data started at ID=90000000, I can find all of Wednesday's data by filtering between those two IDs.
Everything works fast enough, except for the part where I need to find the ID range. This part requires filtering by the timestamp, and that is the slow part.
My question is this: If I add a second index on the timestamp field, what would be the implications? Would my insert rate be severely impacted? Would the size of my table increase dramatically? Would this improve performance enough to justify the action? If I were to add this index, would I even need to use the ID range trick anymore?
I've also been toying with the idea of writing a function that performs a binary search on the table to find an ID near a target time. My assumption is that MySQL scans most of the table and compares each row against the target time, which means roughly 50,000,000 rows scanned on average. A binary search would take, at worst, log2(100,000,000) ≈ 27 queries to find a single ID.
Any comments would be appreciated. I'll get to work on my binary search algorithm and post the results.
EDIT: My schema is:
CREATE TABLE `tracker_snapshot` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tracker_id` int(11) NOT NULL,
`time` datetime NOT NULL,
`value1` decimal(13,4) NOT NULL,
`value2` decimal(13,4) NOT NULL,
`value3` decimal(13,4) NOT NULL,
`value4` decimal(13,4) NOT NULL,
`value5` decimal(13,4) NOT NULL,
`value6` decimal(13,4) NOT NULL,
PRIMARY KEY (`id`),
KEY `tracker_snapshot_c9837659` (`tracker_id`),
CONSTRAINT `tracker_id_refs_id_68c52750` FOREIGN KEY (`tracker_id`) REFERENCES `tracker_tracker` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=77107026 DEFAULT CHARSET=utf8
Best Answer
The implication is that you need roughly 1144 MiB of extra storage to index the time column for ~100M rows...
Because the InnoDB engine stores the primary key value inside every entry of a secondary index, your secondary index will be larger than the indexed column alone. How much larger? You can estimate it as rows × (size of the indexed column + size of the primary key).
A larger index will slow down inserts and deletes, and updates only when the updated value is itself indexed. In theory a larger index can also slow down selects, because of InnoDB's 16 KB index pages (read http://www.ovaistariq.net/733/), but in practice it depends on the InnoDB configuration and on how much data is cached in the InnoDB buffer pool.
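The ~1144 MiB figure above is consistent with a back-of-the-envelope estimate; this sketch assumes the 8-byte DATETIME format used before MySQL 5.6.4 and ignores page fill-factor overhead:

```python
rows = 100_000_000
datetime_bytes = 8   # DATETIME storage size before MySQL 5.6.4
pk_bytes = 4         # int(11) primary key, copied into each secondary index entry
index_bytes = rows * (datetime_bytes + pk_bytes)
print(index_bytes / 1024**2)  # ≈ 1144.4 MiB before page overhead
```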
Alternatively, you can keep your ID-range approach by maintaining a lookup table that maps each day to its first and last ID. If you query it with a JOIN or a derived table, the MySQL optimizer can use it to restrict the scan to the relevant ID range.
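A minimal sketch of the lookup-table idea, using Python's sqlite3 as a stand-in for MySQL; the `day_bounds` table and its columns are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracker_snapshot (id INTEGER PRIMARY KEY, time TEXT, value1 REAL);
CREATE TABLE day_bounds (day TEXT PRIMARY KEY, first_id INTEGER, last_id INTEGER);
""")
# a handful of sample rows spanning two days
rows = [(i, "2012-06-%02d 00:00:00" % (13 if i < 5 else 14), 0.0) for i in range(10)]
conn.executemany("INSERT INTO tracker_snapshot VALUES (?, ?, ?)", rows)
# the lookup table is maintained out of band, e.g. once per day by a cron job
conn.execute("INSERT INTO day_bounds VALUES ('2012-06-13', 0, 4)")
conn.execute("INSERT INTO day_bounds VALUES ('2012-06-14', 5, 9)")

# fetch one day's data via the cheap ID range instead of filtering on time
day_rows = conn.execute("""
    SELECT s.* FROM tracker_snapshot s
    JOIN day_bounds d ON s.id BETWEEN d.first_id AND d.last_id
    WHERE d.day = '2012-06-13'
""").fetchall()
print(len(day_rows))  # 5
```

The join touches one row of `day_bounds` and then only the ID range it names, which is exactly the manual trick from the question, automated.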
Your savings
It's magic because you get to use that ~1144 MiB of storage space for more important data instead.