MySQL Performance – Why SELECT Gets Slower Further Down the Table

index-tuningMySQLperformancequery-performance

My table:

CREATE TABLE `wp_users` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_login` varchar(60) NOT NULL DEFAULT '',
  `user_pass` varchar(64) NOT NULL DEFAULT '',
  `user_nicename` varchar(50) NOT NULL DEFAULT '',
  `user_email` varchar(100) NOT NULL DEFAULT '',
  `user_url` varchar(100) NOT NULL DEFAULT '',
  `user_registered` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `user_activation_key` varchar(60) NOT NULL DEFAULT '',
  `user_status` int(11) NOT NULL DEFAULT '0',
  `display_name` varchar(250) NOT NULL DEFAULT '',
  PRIMARY KEY (`ID`),
  KEY `user_login_key` (`user_login`),
  KEY `user_nicename` (`user_nicename`),
  KEY `registration_date` (`user_email`,`user_registered`),
  KEY `user_registered` (`user_registered`)
) ENGINE=MyISAM AUTO_INCREMENT=176329 DEFAULT CHARSET=utf8;

In a stored procedure I need to loop over every user in the table and preform some other queries and I noticed as it gets further in the loop the queires take longer and longer to preform. What I would expect to take a few hours is now taking days…

To loop over one row at a time I am using this query:

SELECT `ID`, `user_email` FROM `wp_users` ORDER BY `ID` ASC LIMIT @count,1;

@count starts at 0 and works its way up to the number of users in the database. At a count of 0 the query takes only a few milliseconds but as soon as the count starts increasing the time it takes to do the same query rises to when Im on a count of 170003 for example the query takes over 1 second…

Doing and explain I see:

EXPLAIN SELECT SQL_NO_CACHE `ID`, `user_email` FROM `wp_users` ORDER BY `ID` ASC LIMIT 0,1;
1   SIMPLE  wp_users    index   NULL    PRIMARY 8   NULL    1

EXPLAIN SELECT SQL_NO_CACHE `ID`, `user_email` FROM `wp_users` ORDER BY `ID` ASC LIMIT 170003,1;
1   SIMPLE  wp_users    ALL NULL    NULL    NULL    NULL    175862  Using filesort

Why is this happening?

I have found that splitting the query into two is quicker:

SELECT `ID` FROM `wp_users` ORDER BY `ID` ASC LIMIT 170003,1;
SELECT `user_email` FROM `wp_users` WHERE ID = 170469;

and only takes a few milli seconds again…

Is doing it the two queries the only quick way of doing this?

Best Answer

FIRST QUERY

SELECT SQL_NO_CACHE `ID`, `user_email` FROM `wp_users` ORDER BY `ID` ASC LIMIT 0,1;

Now, look back at the EXPLAIN plan for it:

EXPLAIN SELECT SQL_NO_CACHE `ID`, `user_email` FROM `wp_users` ORDER BY `ID` ASC LIMIT 0,1;
1   SIMPLE  wp_users    index   NULL    PRIMARY 8   NULL    1

How many rows do the Query Optimizer See? 1

SECOND QUERY

SELECT SQL_NO_CACHE `ID`, `user_email` FROM `wp_users` ORDER BY `ID` ASC LIMIT 170003,1;

Now, look back at the EXPLAIN plan for it:

EXPLAIN SELECT SQL_NO_CACHE `ID`, `user_email` FROM `wp_users` ORDER BY `ID` ASC LIMIT 170003,1;
1   SIMPLE  wp_users    ALL NULL    NULL    NULL    NULL    175862  Using filesort

How many rows do the Query Optimizer See? 175862

When the Query Optimizer sees too many rows to have to process, it decide to "throw indexes under the bus". MySQL will then do a full table scan, as indicated by the expression ALL in the EXPLAIN plan. The filesort occurs because you had ORDER BY ID.

SUGGESTION

Usually, it is recommended to retrieve rows by cursors. Personally, I hate using cursors. I have an alternative that is a little unorthodox. I would create a temp table with all the email addresses, treating it like an array but accessing it like a table.

Here is that paradigm

DELIMITER $$
DROP PROCEDURE IF EXISTS ProcessEmail;
CREATE PROCEDURE ProcessEmail()
BEGIN

    DROP TABLE IF EXISTS EmailArray; 
    CREATE TABLE EmailArray
    (
        ndx INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        user_email varchar(100) NOT NULL DEFAULT ''
    ) ENGINE=MEMORY;
    INSERT INTO EmailArray (user_email) SELECT user_email FROM wp_users;

    SET @ndxnext = 0;
    SELECT COUNT(1) INTO @ndxlast FROM IDArray;
    WHILE @ndxnext < @ndxlast DO
      SET @ndxnext = @ndxnext + 1;
      SELECT user_email INTO @GivenEmail FROM EmailArray WHERE ndx = @ndxnext;
      ...
      ... Process whatever you need with the @GivenEmail 
      ...
    END WHILE;

    DROP TABLE IF EXISTS EmailArray; 
END;
$$
DELIMITER ;

GIVE IT A TRY !!!