From your question, let's assume your table is called TNAME
You should collect the ids that are valid
First create a table to hold IDs and a process status per ID
CREATE TABLE idlist
(
id INT NOT NULL,
processed INT NOT NULL DEFAULT 0,
PRIMARY KEY (key),
UNIQUE KEY processed (processed,key)
) ENGINE=MyISAM;
Next, preload load all new IDs
INSERT IGNORE INTO idlist (id) SELECT id FROM TNAME WHERE SETNO=1 AND flag='valid';
This will load ids that are valid. IDs that were already loaded are bypassed
Using this idlist, pickup the first 200,000 ids
CREATE TABLE idlist200K SELECT id FROM idlist WHERE processed=0 ORDER BY id LIMIT 200000;
ALTER TABLE idlist200K ADD PRIMARY KEY (id);
Process the 200K rows with an INNER JOIN with queries like this:
SELECT B.* FROM idlist200K A INNER JOIN TNAME B;
When done processing, mark the IDs as processed:
UPDATE idlist A INNER JOIN idlist200K B USING (id) SET A.processed = 1;
DROP TABLE idlist200K;
All you have to do from here is automate all these steps, except the initial table creation
INSERT IGNORE INTO idlist (id) SELECT id FROM TNAME WHERE SETNO=1 AND flag='valid';
CREATE TABLE idlist200K SELECT id FROM idlist WHERE processed=0 ORDER BY id LIMIT 200000;
ALTER TABLE idlist200K ADD PRIMARY KEY (id);
SELECT B.* FROM idlist200K A INNER JOIN TNAME B; -- Whatever processing
UPDATE idlist A INNER JOIN idlist200K B USING (id) SET A.processed = 1;
DROP TABLE idlist200K;
Give it a Try !!!
To speed up the execution of your query, you need to create a compound index:
ALTER TABLE email_messages
ADD INDEX idx1(is_sent, is_cancelled, time_created);
More information: Multiple-Column Indexes
Best Answer
At this point, you will have to create an index on three columns
You must then refactor the query just slightly to reduce two things the MySQL Query Optimizer is expected to handle:
Here is your original query:
What you can do is gather all the keys first:
This will perform a full index scan rather than a full table scan because all three columns are in the index. That's a lot less baggage to walk around with for temp table generation because you are scanning just 3 columns in a smaller resource (the index) as opposed to 6 columns as from a bigger resource (the table) as well as not sticking url and bodytext (which are probably VARCHAR(300) and TEXT fields) into the mix just for the sake of gathering keys.
Next, make the SELECT sid query into an inline table and connect them back to the news table using only the fetched keys. EXAMPLE : Suppose your LIMIT variables are 200,10. This means you want to move to the 201st row of the news table and get 10 keys from that point. This means that no matter which page you are on, you want to collect 10 keys at a time, and only 10 keys at a time.
Here is the new and improved query:
The only real cost here is doing a full index scan for keys rather than a full table scan. The benefit is that once you get the 10 keys from the inline SELECT, then only 10 titles, bodytexts, authors and urls are retrieved.
Give it a Try !!!
CAVEAT
As a rule of thumb, anytime you index a column whose cardinality is very low (i.e., Male/Female (2), Single/Married/Divorced/Widowed (4), True/False (2), 0/1 (2)) and the number of rows for any one value exceeds 5% of the total rows in the table, the MySQL Query Optimizer will rule out any and all indexes and you will perform can full table scan or a bad index gets chosen and you end up with a full index scan. This is why it is very imperative that you find the right column distribution, or at the very least perform index scans instead of table scans.
UPDATE 2011-08-08 11:13 EDT
What was I thinking ? This was my original propsed index:
The field should be approved not authored. This is what it should be:
The original query you gave was
The index I am proposing (approved_time_sid_ndx) will include approved, time, and sid.
The answer that you just submitted has this query :
This being the case, the index you need now should be this:
Both approved and sid should be together. IF they are not together, the MySQL Query Optimizer may decide to perform an internal index merge of the primary key and an index where approved is the first (or only) column. In fact, your new query should be refactored as follows:
Remember, you want an index that encompasses as many DB columns as possible that are embedded in WHERE and ORDER BY clauses.