Mysql – search misses indexed column

MySQL

I have a table like this

CREATE TABLE `agentpositions` (
    `idPosition` INT(11) NOT NULL AUTO_INCREMENT,
    `idAgent` INT(11) NULL DEFAULT NULL,
    `hardwareSerial` VARCHAR(128) NOT NULL,
    `LatitudePoint` DECIMAL(18,12) NOT NULL,
    `LongitudePoint` DECIMAL(18,12) NOT NULL,
    `PointDate` DATE NULL DEFAULT NULL,
    `TimeStamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    ...
    PRIMARY KEY (`idPosition`),
    INDEX `idAgent` (`idAgent`),
    INDEX `idagent_pointdate` (`idAgent`, `PointDate`) -- next one is getting ignored and this one is getting used
    INDEX `pointdate_idagent` (`PointDate`, `idAgent`) -- this is getting ignored
)
ENGINE=InnoDB

we query this table with both (idAgent,TimeStamp) and because this table is going to get very big, we put index on both of them

but for sake of architecture (we think date-int is faster and more optimized than timestamp-int, is that event true?? ) we avoided putting index on timestamp, instead we added another field which stores only date of the timestamp, and added index on that, but I don't know why it ignores my index

my query looked like this:

SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime 
FROM `agentpositions` `t` IGNORE INDEX(`idAgent`, PRIMARY)
WHERE (
   t.idAgent=880 AND t.PointDate >= "2015-04-04" 
   AND (t.TimeStamp >= "2015-04-04 00:00:00" and t.TimeStamp < "2015-04-04 23:59:59")) 
   AND t.LatitudePoint != 0 and t.LongitudePoint != 0  
ORDER BY t.TimeStamp

so after not identifying the index, I managed to change it to this which uses that index!

    SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime
FROM agentpositions t IGNORE INDEX(idAgent, PRIMARY)
WHERE
t.idAgent=880 AND t.PointDate in ("2015-04-04","2015-04-05","2015-04-03" )
AND t.LatitudePoint != 0 and t.LongitudePoint != 0
order by t.TimeStamp

I want to be able to give time range like >='2014-01-01'

what is wrong with this index? how can I achive this?

UPDATE : I totally forgot to include my first index that did not get used, I updated the table

Best Answer

This is a nice a example to explain how the query optimizer works, because in my test server, with a couple or rows, your first query uses an index (look at the usage of EXPLAIN):

root@127.0.0.1(test)> EXPLAIN SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime 
    -> FROM `agentpositions` `t` IGNORE INDEX(`idAgent`, PRIMARY)
    -> WHERE (
    ->    t.idAgent=880 AND t.PointDate >= "2015-04-04" 
    ->    AND (t.TimeStamp >= "2015-04-04 00:00:00" and t.TimeStamp < "2015-04-04 23:59:59")) 
    ->    AND t.LatitudePoint != 0 and t.LongitudePoint != 0  
    -> ORDER BY t.TimeStamp\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: range
possible_keys: idagent_pointdate
          key: idagent_pointdate
      key_len: 9
          ref: NULL
         rows: 1
     filtered: 7.11
        Extra: Using index condition; Using where; Using filesort
1 row in set, 1 warning (0.00 sec)

While the second doesn't:

root@127.0.0.1(test)> EXPLAIN SELECT idPosition , idAgent , LatitudePoint , LongitudePoint , TimeStamp AS StartTime
    -> FROM agentpositions t IGNORE INDEX(idAgent, PRIMARY)
    -> WHERE
    -> t.idAgent=880 AND t.PointDate in ("2015-04-04","2015-04-05","2015-04-03" )
    -> AND t.LatitudePoint != 0 and t.LongitudePoint != 0
    -> order by t.TimeStamp\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: ALL
possible_keys: idagent_pointdate
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
     filtered: 38.40
        Extra: Using where; Using filesort
1 row in set, 1 warning (0.01 sec)

The main reason why an index gets or not used is, first- if it can be used (this happens for both cases, as you can see on the possible_keys row, and second, if it considers using the key the best strategy to retrieving the rows (based on the cardinality of the columns).

I have shown this graph many times: Index usage vs. full table scan

If the filter that can be used for an index is not very selective (for example, the range is too large), the optimizer will prefer to use a full table scan, as it is a faster strategy for reading a large number of rows. If you want to know the details for your particular query, and youa re using MySQL 5.6, you can use the OPTIMIZER_TRACE feature to understand why one plan was preferred over another.

If you believe a discarded plan is better, you can force the usage of an index by using FORCE INDEX(idagent_pointdate). Be careful because manually forcing the usage of an index is a dangerous practice, as if the data cardinality changes and the usage of the index is no longer the best plan, you will continue blindly using it.

Aside from your main question, there are many other issues regarding your query and indexes, that I will try to address here:

  • Your idAgent index is redundant- you can use idagent_pointdate for everything that you can do with that index, so it is useless, and will take away resources (memory, disk) but also slow down certain tasks like the query optimizer having to take it into account and writes (as an additional index has to be updated as well). Use tools like pt-duplicate-key-checker to search for redundant indexes on your database.

  • I do not get well what you mean with date-int and timestamp-int, but if you mean the practicality of having what basically is having redundant information, one storing second precision and another storing days: that may a useful physical optimization of the table IF you need to index days instad of timestamps. If you just need to index the timestamp using ranges, like t.TimeStamp >= "2015-04-04" and t.TimeStamp < "2015-04-05" (check the syntax, you are missing 1 seccond) this is again a waste of resources. There are cases like having an extra column is justified in order to index, for example, DAYOFWEEK(t.TimeStamp) or HOUR(t.TimeStamp) which cannot otherwise be indexed directly, as mysql lacks that functionality. I do not see its value here.

  • You will be unable to use 2 range accesses to two separate columns for the same index. Using IN + a list of values is a legit workaround for this limitation (which is intrinsic of B(+)TREE indexes). This may be your main issue here.

  • The use of the range cancels the possibility (in this particular case) of using the index for sorting- this cannot be workaround -either you use the index for filtering or for sorting.

Update: Because of B(+)TREE indexes, you can only use the first range for a multi-column index. I have another drawing of why here:

  1. A composite index is just a BTREE ordered by the concatenated values: A composite index is just an index ordered by the concatenated values

  2. The index with the reverse column order cannot be used in such an effective way because of the range at the start: The index with the opposite column order cannot be used in such an effective way because of the range at the start

Technically, both indexes would be equally good as the should be of similar size, but for this particular query, the idagent_pointdate (idAgent, PointDate) is preferred because it has the range at the end, while for pointdate_idagent (PointDate, idAgent), only the first row can be used.