MySQL InnoDB – Optimize Simple SELECT Query for 2 Million Rows

Tags: innodb, myisam, mysql, optimization, select

I'd like to make this query as fast as possible, since it's called really often.

SELECT login_events.id 
FROM login_events 
WHERE (
  DATE(created_at) >= DATE(CURRENT_DATE) 
  AND person_id = 1
) 
LIMIT 1

It's running on a 157.7 MB InnoDB table (according to Navicat) with ~2 million rows, indexed on [created_at, person_id].

Using EXPLAIN, I can see it's using the index, but it reports "Using where; Using index". What can I do to make this as fast as possible? Would switching to MyISAM gain me anything?

Best Answer

First, focus on the three columns in the query:

id
created_at
person_id

The index you have (created_at, person_id) leads with created_at, so the query has to scan the index across all the days of created_at from CURRENT_DATE onward, looking for the person_id.

SUGGESTION #1 : You will definitely need a different index

MyISAM

If login_events is MyISAM, this is the index you need

ALTER TABLE login_events ADD INDEX person_date_ndx (person_id,created_at,id);

This changes how the query runs: it looks up the specific person_id and scans the days for person_id 1 only. Why is id included in the index? So that the query can retrieve id from the index file alone rather than from the table. That way, all three columns are read from the index file instead of two from the index and one from the table.

InnoDB

If login_events is InnoDB, this is the index you need

ALTER TABLE login_events ADD INDEX person_date_ndx (person_id,created_at);

The reasoning is the same, but you do not need to include id. Why? Assuming id is the PRIMARY KEY, every entry in an InnoDB secondary index already carries the primary key value as its pointer back to the clustered index, so id is available from the index itself. Adding id to the index would simply be redundant.
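To check whether the new index is actually chosen, one option is to run EXPLAIN on the original query once the index exists. This is only a sketch: it assumes person_date_ndx has been created as shown above, and the plan you see depends on your data and MySQL version.

```sql
-- Sketch: see which index the optimizer picks for the original query.
-- Assumes person_date_ndx (person_id, created_at) has already been added.
EXPLAIN
SELECT login_events.id
FROM login_events
WHERE DATE(created_at) >= DATE(CURRENT_DATE)
  AND person_id = 1
LIMIT 1;

-- List the indexes on the table along with their estimated cardinality.
SHOW INDEX FROM login_events;
```

In the EXPLAIN output, the key column names the index in use, and "Using index" in the Extra column means the query is answered from the index without reading the table rows.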

SUGGESTION #2 : Change the Date Comparison

From the expression

DATE(created_at) >= DATE(CURRENT_DATE)

I can tell that created_at is either DATETIME or TIMESTAMP.

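The expression forces the query to convert every row's created_at value into a DATE before comparing it. Wrapping the column in a function also prevents MySQL from doing an index range scan on created_at.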

Therefore, instead of

SELECT login_events.id 
FROM login_events 
WHERE (
  DATE(created_at) >= DATE(CURRENT_DATE) 
  AND person_id = 1
) 
LIMIT 1

express the date comparison as a time comparison against the bare column, starting from midnight of today:

SELECT login_events.id 
FROM login_events 
WHERE (
  created_at >= (DATE(NOW()) + INTERVAL 0 SECOND)
  AND person_id = 1
) 
LIMIT 1
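To see what the right-hand side of that comparison evaluates to, you can select it on its own. The sketch below also shows a shorter spelling of the same filter. It relies on MySQL converting a DATE constant to midnight of that day when it is compared with a DATETIME or TIMESTAMP column, so treat it as an equivalent under that assumption and confirm it with EXPLAIN on your own server.

```sql
-- Midnight of today, produced as a DATETIME value.
SELECT DATE(NOW()) + INTERVAL 0 SECOND AS midnight_today;

-- Shorter form of the same filter: the column is left bare,
-- and CURRENT_DATE is compared as today at 00:00:00.
SELECT login_events.id
FROM login_events
WHERE created_at >= CURRENT_DATE
  AND person_id = 1
LIMIT 1;
```

Either way, the point is the same: the function is applied to the constant, not to created_at, so the column can be matched against the index directly.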

CAVEAT

Since the table is so small, either storage engine would be fine. I would give the edge to MyISAM.

Give it a Try !!!

Related Solutions

Mysql – why/how does the number of matched columns influence the way of executing a query

I can provide a general explanation, but it may not apply to your particular case:

Decision making works by evaluating the cost of each candidate execution plan, then picking what is hopefully the cheapest plan. This you already know.

When it comes to indexing, though, things get interesting. The way to evaluate the usefulness or viability of an index is to estimate its selectivity for a given value.

For the moment, forget about your FULLTEXT index, and let's assume a simple index on some column col1, and another index on some column col2. Given the following two queries:

SELECT * FROM t WHERE col1 < 10 and col2 = 4;
SELECT * FROM t WHERE col1 BETWEEN 100 AND 110 and col2 = 4;

It may happen that the two queries are evaluated differently. Why? Because col2 = 4 may return more rows than col1 < 10, in which case we prefer to use the index on col1. But it may return fewer rows than col1 BETWEEN 100 AND 110, in which case we prefer the index on col2.
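One way to see this for yourself is to count how many rows each condition matches and then ask MySQL for its plan. The sketch below uses the table t from the two queries above and assumes separate single-column indexes on col1 and col2.

```sql
-- Count how many rows each condition matches on its own.
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matches.
SELECT COUNT(*)                      AS total_rows,
       SUM(col1 < 10)                AS col1_lt_10,
       SUM(col1 BETWEEN 100 AND 110) AS col1_100_to_110,
       SUM(col2 = 4)                 AS col2_eq_4
FROM t;

-- Compare the plan chosen for each query (see the key and rows columns).
EXPLAIN SELECT * FROM t WHERE col1 < 10 AND col2 = 4;
EXPLAIN SELECT * FROM t WHERE col1 BETWEEN 100 AND 110 AND col2 = 4;
```

Keep in mind that the optimizer works from estimated row counts, not the exact counts returned here, so its choice can differ from what the true numbers suggest.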

Your case is not very different. MySQL estimates the number of rows returned by an index lookup. When you match more columns, MySQL gets the impression that your index is likely to return few rows. So it chooses to start with TableA, then joins what should be very few rows to TableB.

But if MySQL believes the index will return many rows, it may prefer starting with TableB. Why is that? Because you are sorting on indexed columns of TableB. Sorting is a lot of work, too. So MySQL may choose to first sort the rows, then join to TableA and filter by the fulltext index. It may not be a bad idea if the fulltext search yields many rows anyhow.
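If you want to compare the two join orders directly, STRAIGHT_JOIN makes MySQL join the tables in the order they are listed. The sketch below is illustrative only: the column names (id, body, a_id, sort_col) and the search term are hypothetical stand-ins, since the original schema is not shown here.

```sql
-- Hypothetical schema for illustration:
--   TableA(id, body)       with a FULLTEXT index on body
--   TableB(a_id, sort_col) with an index on sort_col

-- Plan 1: let the optimizer choose the join order.
EXPLAIN
SELECT a.id
FROM TableA a
JOIN TableB b ON b.a_id = a.id
WHERE MATCH(a.body) AGAINST('term')
ORDER BY b.sort_col;

-- Plan 2: force TableB to be read first, then join to TableA.
EXPLAIN
SELECT STRAIGHT_JOIN a.id
FROM TableB b
JOIN TableA a ON a.id = b.a_id
WHERE MATCH(a.body) AGAINST('term')
ORDER BY b.sort_col;
```

Comparing the rows and Extra columns of the two plans (for example, whether "Using filesort" appears) shows what the optimizer is trading off.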

Mysql – Why does MySQL (InnoDB) table get faster after OPTIMIZE TABLE, but then don’t work

OPTIMIZE rebuilds the table. This (for InnoDB) squeezes out some of the fragmentation and wasted space. This is unlikely to make a noticeable difference in any query.

Also, OPTIMIZE does an ANALYZE. This has a chance of changing the statistics, thereby leading to a different (better or worse) EXPLAIN plan.

Since ANALYZE is much faster (on InnoDB) than OPTIMIZE, just do the ANALYZE.
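As a minimal sketch, with my_table standing in for your own table name:

```sql
-- Refresh the index statistics without rebuilding the table.
-- my_table is a placeholder name.
ANALYZE TABLE my_table;

-- Inspect the refreshed estimates in the Cardinality column.
SHOW INDEX FROM my_table;
```

Re-running EXPLAIN on the slow query afterwards shows whether the new statistics changed the plan.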

Various actions other than an explicit ANALYZE also cause the statistics to be recalculated. For example, with innodb_stats_auto_recalc enabled, InnoDB recalculates persistent statistics after roughly 10% of a table's rows have changed.

ANALYZE randomly probes the BTrees, gathering stats. Sometimes the resulting stats are poor. There is effectively no way to prevent this from happening. Several partial fixes have been created over the years, and 5.6.7 gets close to eliminating this problem with ANALYZE. Here's one of those fixes: http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_stats_persistent_sample_pages
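For illustration, the setting named in that link can be raised so that ANALYZE samples more index pages. This sketch assumes MySQL 5.6.6 or later with persistent statistics in use; my_table is a placeholder, and the value 64 is an arbitrary example, not a recommendation.

```sql
-- Server-wide: sample more pages per index when gathering persistent stats.
SET GLOBAL innodb_stats_persistent_sample_pages = 64;

-- Recompute statistics using the new sample size.
ANALYZE TABLE my_table;

-- Alternatively, set the sample size for one table only.
ALTER TABLE my_table STATS_SAMPLE_PAGES = 64;
```

A larger sample tends to give steadier estimates at the cost of a slower ANALYZE, so it is worth raising only for tables whose plans are unstable.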