MySQL: ORDER BY useless

MySQLorder-by

CREATE TEMPORARY TABLE tt;
INSERT INTO tt; many times
SELECT FROM tt ORDER BY id LIMIT ?,1000;

The temp table is created at the beginning of the script having an "id" auto_increment column (and others). Part 1 of the script fills up the table using various SELECTs. Part 2 should process all selected rows in chunks of 1000 but will never ever write the table again.

I assume that these two commands are the same

SELECT FROM tt ORDER BY id LIMIT ?,1000;
SELECT FROM tt LIMIT ?,1000;

but is this true? Is there any chance that mySQL will retrive the rows in a different order on multiple SELECTs even if the table is not written in between?

Sorting the temp table is the longest part while reading it and I'd be happy to get rid of the ORDER BY part but I don't want to miss a record because the order changed between SELECT LIMIT 0,1000 and SELECT LIMIT 1000,1000.

Best Answer

I guess this depends on the storage-engine. I can tell for MyISAM, a select without any order will return rows sequentially as they occur in file (even if held in memory).

As you ordered data during write, it will be ordered by id within MyISAM-temptables. So you can safely assume it will always return correct results as long as you don't delete from myISAM-temp.

But the new question is - is temptable created in myIsam-format? I assume the default storage engine will be used. This used to be MyISAM, but was changed to InnoDB after MySQL 5.1.

I don't know how innodb would handle an unsorted select.

-- edit

http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html

In big (on disk) temporary tables myISAM is used. It is not intentional behavior rows are served sequentially in order as written into, but it is by design (as documented) reliable as long as myISAM table is appended only.

This is most probably not true for any "hashed table" like InnoDB, BDB, and eventually MEMORY. But when memory table is converted to "on disk table", current order is "frozen on the disk", and therefore reliable again.

But I guess we have an other kind of problem here:

  • The 'real' problem seems to be your "order by" feels too slow, and you would like to get around that. Do you by chance know your table size and sessions max_heap_table_size? This give you a hint if your table has been converted to myisam. You can also play with Index-Hash-method (BTREE, HASH) to look for optimization.

  • Are you aware that LIMIT N,M will get painfully slow because it has to select N rows, discard them, and then select M rows to return?