MySQL Query Optimization : Indexing and Pagination

MySQLoptimizationorder-by

I'm adding pagination to my 'latest news' php script and running in to an issue.

SELECT sid, title, time, bodytext, author, url FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limit

I have indexes on approved, time columns.

The problem is when I do an EXPLAIN on the query, it shows it is using the 'approved' index – but it still has to scan all 1000 rows to get the 10 new entries I want to display.

Anyway to optimze the query so that doesn't happen?

Best Answer

At this point, you will have to create an index on three columns

ALTER TABLE news ADD INDEX authored_time_sid_ndx (authored,time,sid);

You must then refactor the query just slightly to reduce two things the MySQL Query Optimizer is expected to handle:

  • the number of rows read
  • the amount of data collected

Here is your original query:

SELECT sid, title, time, bodytext, author, url FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limit

What you can do is gather all the keys first:

SELECT sid FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limit

This will perform a full index scan rather than a full table scan because all three columns are in the index. That's a lot less baggage to walk around with for temp table generation because you are scanning just 3 columns in a smaller resource (the index) as opposed to 6 columns as from a bigger resource (the table) as well as not sticking url and bodytext (which are probably VARCHAR(300) and TEXT fields) into the mix just for the sake of gathering keys.

Next, make the SELECT sid query into an inline table and connect them back to the news table using only the fetched keys. EXAMPLE : Suppose your LIMIT variables are 200,10. This means you want to move to the 201st row of the news table and get 10 keys from that point. This means that no matter which page you are on, you want to collect 10 keys at a time, and only 10 keys at a time.

Here is the new and improved query:

SELECT A.sid, A.title, A.time, A.bodytext, A.author, A.url
FROM
news A INNER JOIN
(SELECT sid FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limit) B
USING (sid);

The only real cost here is doing a full index scan for keys rather than a full table scan. The benefit is that once you get the 10 keys from the inline SELECT, then only 10 titles, bodytexts, authors and urls are retrieved.

Give it a Try !!!

CAVEAT

As a rule of thumb, anytime you index a column whose cardinality is very low (i.e., Male/Female (2), Single/Married/Divorced/Widowed (4), True/False (2), 0/1 (2)) and the number of rows for any one value exceeds 5% of the total rows in the table, the MySQL Query Optimizer will rule out any and all indexes and you will perform can full table scan or a bad index gets chosen and you end up with a full index scan. This is why it is very imperative that you find the right column distribution, or at the very least perform index scans instead of table scans.

UPDATE 2011-08-08 11:13 EDT

What was I thinking ? This was my original propsed index:

ALTER TABLE news ADD INDEX authored_time_sid_ndx (authored,time,sid);

The field should be approved not authored. This is what it should be:

ALTER TABLE news ADD INDEX approved_time_sid_ndx (approved,time,sid);

The original query you gave was

SELECT sid, title, time, bodytext, author, url FROM news WHERE approved=1 ORDER BY time desc LIMIT $start, $limit

The index I am proposing (approved_time_sid_ndx) will include approved, time, and sid.

The answer that you just submitted has this query :

SELECT sid, title, time, bodytext, author, url FROM news WHERE approved = 1 AND sid > 0 ORDER BY sid desc LIMIT 500, 10;

This being the case, the index you need now should be this:

ALTER TABLE news ADD INDEX approved_sid_ndx (approved,sid);

Both approved and sid should be together. IF they are not together, the MySQL Query Optimizer may decide to perform an internal index merge of the primary key and an index where approved is the first (or only) column. In fact, your new query should be refactored as follows:

SELECT A.sid, A.title, A.time, A.bodytext, A.author, A.url
FROM
news A INNER JOIN
(SELECT sid FROM news WHERE approved=1 and sid > 0 ORDER BY sid desc LIMIT 500,10) B
USING (sid);

Remember, you want an index that encompasses as many DB columns as possible that are embedded in WHERE and ORDER BY clauses.