Mysql – Why is determining existence (or not) of rows matching certain criteria using a ‘LIMIT 1’ query on indexed columns examining more than 1 row

indexinnodblimitsMySQLperformancequery-performance

Consider a table created like this:

CREATE TABLE `someTable` (
  `id` bigint(11) unsigned NOT NULL AUTO_INCREMENT,
  `owner_id` bigint(11) unsigned NOT NULL,
  `device_id` bigint(11) unsigned NOT NULL,
  `reviewed` tinyint(1) NOT NULL DEFAULT '1',
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `owner_id reviewed` (`owner_id`,`reviewed`),
KEY `device_id reviewed` (`device_id`,`reviewed`)
);

This table records events that happen to a device which has an owner. There are ~1000 owners, each of which can have 1-100 devices. Each device can have millions of events in the table.

We need two queries to tell us:

  1. Does a given owner_id have unreviewed events on any device?
  2. Does a given device_id have unreviewed events?

We have been using:

  1. SELECT 1 FROM someTable s WHERE s.reviewed = 0 AND s.owner_id = 0 LIMIT 1;
  2. SELECT 1 FROM someTable s WHERE s.reviewed = 0 AND s.device_id = 0 LIMIT 1;

But we have found that as the number of events grows these queries are slowing down unexpectedly. Running EXPLAIN on them says that we will examine many rows. I admit that I was expecting that it would only need to examine 1 row. My thinking was that MySQL could look up the corresponding index ('owner_id reviewed' index or 'device_id reviewed' index) and tell immediately whether or not there were any corresponding rows.

I understand that EXPLAIN's rows column is just an estimate and that it may not be entirely accurate for queries with LIMIT 1, but I have found that the runtimes of these queries is increasing into many seconds (sometimes minutes) as the data set increases, which leads me to believe that MySQL is indeed examining more than one row.

What am I doing wrong here? How can I restructure my queries (or indexes) to do this efficiently?

We are running MySQL 5.6.21 on Linux. The table is innodb. I have created a simple SQL fiddle here: http://sqlfiddle.com/#!9/f0f4f1/1

Best Answer

LIMIT provides a window into the result set. LIMIT 1 is equivalent to LIMIT 0, 1; return 1 row starting at offset 0. Explain plan reports the estimated number of rows the LIMIT is running against. However, only one match should be examined.

Testing on your SQL Fiddle shows the query runs against the INDEX (KEY) which should be the fastest access method. I have found queries of these form are sometimes faster than queries returning a constant.

SELECT owner_id
FROM someTable s 
WHERE reviewed = 0
AND owner_id = 0
LIMIT 1;

I would check the I/O on the device(s) storing your data. I/O contention can cause issues on larger databases.

I would also consider a query in the form:

SELECT MIN(event_id)
FROM someTable s 
WHERE reviewed = 0
AND owner_id = 0; 

With an (owner_id, review, event_id) indexed. This should run against the index and examine only one row of the index.