I have a table like this
CREATE TABLE `agentpositions` (
`idPosition` INT(11) NOT NULL AUTO_INCREMENT,
`idAgent` INT(11) NULL DEFAULT NULL,
`hardwareSerial` VARCHAR(128) NOT NULL,
`LatitudePoint` DECIMAL(18,12) NOT NULL,
`LongitudePoint` DECIMAL(18,12) NOT NULL,
`PointDate` DATE NULL DEFAULT NULL,
`TimeStamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
...
PRIMARY KEY (`idPosition`),
INDEX `idAgent` (`idAgent`),
INDEX `idagent_pointdate` (`idAgent`, `PointDate`) -- next one is getting ignored and this one is getting used
INDEX `pointdate_idagent` (`PointDate`, `idAgent`) -- this is getting ignored
)
ENGINE=InnoDB
we query this table with both (idAgent,TimeStamp) and because this table is going to get very big, we put index on both of them
but for sake of architecture (we think date-int is faster and more optimized than timestamp-int, is that event true?? ) we avoided putting index on timestamp, instead we added another field which stores only date of the timestamp, and added index on that, but I don't know why it ignores my index
my query looked like this:
SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime
FROM `agentpositions` `t` IGNORE INDEX(`idAgent`, PRIMARY)
WHERE (
t.idAgent=880 AND t.PointDate >= "2015-04-04"
AND (t.TimeStamp >= "2015-04-04 00:00:00" and t.TimeStamp < "2015-04-04 23:59:59"))
AND t.LatitudePoint != 0 and t.LongitudePoint != 0
ORDER BY t.TimeStamp
so after not identifying the index, I managed to change it to this which uses that index!
SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime
FROM agentpositions t IGNORE INDEX(idAgent, PRIMARY)
WHERE
t.idAgent=880 AND t.PointDate in ("2015-04-04","2015-04-05","2015-04-03" )
AND t.LatitudePoint != 0 and t.LongitudePoint != 0
order by t.TimeStamp
I want to be able to give time range like >='2014-01-01'
what is wrong with this index? how can I achive this?
UPDATE : I totally forgot to include my first index that did not get used, I updated the table
Best Answer
This is a nice a example to explain how the query optimizer works, because in my test server, with a couple or rows, your first query uses an index (look at the usage of
EXPLAIN
):While the second doesn't:
The main reason why an index gets or not used is, first- if it can be used (this happens for both cases, as you can see on the
possible_keys
row, and second, if it considers using the key the best strategy to retrieving the rows (based on the cardinality of the columns).I have shown this graph many times:
If the filter that can be used for an index is not very selective (for example, the range is too large), the optimizer will prefer to use a full table scan, as it is a faster strategy for reading a large number of rows. If you want to know the details for your particular query, and youa re using MySQL 5.6, you can use the
OPTIMIZER_TRACE
feature to understand why one plan was preferred over another.If you believe a discarded plan is better, you can force the usage of an index by using
FORCE INDEX(idagent_pointdate)
. Be careful because manually forcing the usage of an index is a dangerous practice, as if the data cardinality changes and the usage of the index is no longer the best plan, you will continue blindly using it.Aside from your main question, there are many other issues regarding your query and indexes, that I will try to address here:
Your
idAgent
index is redundant- you can useidagent_pointdate
for everything that you can do with that index, so it is useless, and will take away resources (memory, disk) but also slow down certain tasks like the query optimizer having to take it into account and writes (as an additional index has to be updated as well). Use tools like pt-duplicate-key-checker to search for redundant indexes on your database.I do not get well what you mean with date-int and timestamp-int, but if you mean the practicality of having what basically is having redundant information, one storing second precision and another storing days: that may a useful physical optimization of the table IF you need to index days instad of timestamps. If you just need to index the timestamp using ranges, like
t.TimeStamp >= "2015-04-04" and t.TimeStamp < "2015-04-05"
(check the syntax, you are missing 1 seccond) this is again a waste of resources. There are cases like having an extra column is justified in order to index, for example,DAYOFWEEK(t.TimeStamp)
orHOUR(t.TimeStamp)
which cannot otherwise be indexed directly, as mysql lacks that functionality. I do not see its value here.You will be unable to use 2 range accesses to two separate columns for the same index. Using
IN + a list of values
is a legit workaround for this limitation (which is intrinsic of B(+)TREE indexes). This may be your main issue here.The use of the range cancels the possibility (in this particular case) of using the index for sorting- this cannot be workaround -either you use the index for filtering or for sorting.
Update: Because of B(+)TREE indexes, you can only use the first range for a multi-column index. I have another drawing of why here:
A composite index is just a BTREE ordered by the concatenated values:
The index with the reverse column order cannot be used in such an effective way because of the range at the start:
Technically, both indexes would be equally good as the should be of similar size, but for this particular query, the
idagent_pointdate
(idAgent
,PointDate
) is preferred because it has the range at the end, while forpointdate_idagent
(PointDate
,idAgent
), only the first row can be used.