Here's what I would do. I hope I understood your question correctly. It's a workaround and perhaps not very elegant:
- create a random field, physically order the table by Col1/Col2 plus that random value, then add an AUTO_INCREMENT field ("ID") as the single field of a primary key, so that IDs within each Col1/Col2 group come out consecutive but randomly shuffled inside the group (a sketch follows):
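A minimal sketch of that setup step, assuming MySQL and a MyISAM table literally named `table` (all names here are placeholders; note that InnoDB keeps rows in clustered-index order and ignores ALTER TABLE ... ORDER BY):
ALTER TABLE `table` ADD COLUMN RND DOUBLE;
UPDATE `table` SET RND = RAND();
-- physically reorder the rows: each Col1/Col2 group becomes contiguous,
-- randomly shuffled within the group (MyISAM honors this, InnoDB does not)
ALTER TABLE `table` ORDER BY Col1, Col2, RND;
-- the new AUTO_INCREMENT values are assigned in physical row order
ALTER TABLE `table` ADD COLUMN ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY;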
- create a temporary table with MIN(ID) for each combination of Col1/Col2 (the first ID per group, which is a random one within the group):
CREATE TEMPORARY TABLE temp_table AS
SELECT
    MIN(ID) AS MIN_ID, Col1, Col2
FROM
    `table`
GROUP BY
    Col1, Col2;
- add a second INT field to be the first_index:
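A one-liner for that, with the same placeholder names as above:
ALTER TABLE `table` ADD COLUMN first_index INT;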
- update the table and set first_index to the difference between the running value of the auto-incremented field and the group's minimum, plus one:
UPDATE
    `table` t, temp_table tmp
SET
    t.first_index = (t.ID - tmp.MIN_ID) + 1
WHERE
    t.Col1 = tmp.Col1 AND t.Col2 = tmp.Col2;
- optionally remove the random and ID fields, and add a new PK over the three fields Col1/Col2/first_index (with first_index auto-incremented) for subsequent inserts, as sketched below
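A sketch of that last step, again assuming MyISAM - per-group AUTO_INCREMENT on a secondary column of a composite key is a MyISAM feature that InnoDB does not support:
ALTER TABLE `table`
    DROP PRIMARY KEY,
    DROP COLUMN ID,
    DROP COLUMN RND,
    MODIFY first_index INT NOT NULL AUTO_INCREMENT,
    ADD PRIMARY KEY (Col1, Col2, first_index);
With that key in place, MyISAM numbers first_index from 1 within each Col1/Col2 group on subsequent inserts.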
Postgres physically writes tuples sequentially as INSERTed. If you do that to a new table or a table without dead tuples, you get just the same result you would achieve with CLUSTER on an index with the same sort order as your INSERT. The effect of CLUSTER deteriorates with later writes to the table in just the same way (and stays intact as long as you never DELETE or UPDATE - or INSERT breaking the desired order).
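A minimal illustration of that equivalence, with hypothetical names (big_table_sorted and the index name are made up for the example):
-- filling a fresh table in the desired order ...
CREATE TABLE big_table_sorted AS
SELECT * FROM big_table ORDER BY source_table1_id;

-- ... leaves the same physical row order behind as rewriting in place:
CREATE INDEX big_table_fk_idx ON big_table (source_table1_id);
CLUSTER big_table USING big_table_fk_idx;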
Some answers focus on those effects of later writes and miss the point of the question. The answer to your question is basically:
YES, inserting in order has the same effect as clustering,
given some conditions:
queries against this table will be filtered down by a foreign key ID column.
Meaning you access rows with the same ID at once, not a range of sequential IDs. Then all you need is tuples clustered per ID; the physical order between different IDs is irrelevant.
And:
generating this data in chunks related to the foreign key
Meaning, "chunks" include all rows for the same ID in sequence. There are no other rows for the same ID inserted sooner or later. So something like:
INSERT INTO big_table (source_table1_id, a, b, c)
SELECT s1.source_table1_id, ...
FROM   source_table1 s1
...
WHERE  s1.source_table1_id BETWEEN 123 AND 125  -- example
ORDER  BY s1.source_table1_id;
And:
I won't ever need to UPDATE it in normal usage; portions may be deleted and regenerated.
The part about DELETE is the only mildly problematic part. If you never deleted anything either, you would be done here. If by "portions" you mean all rows for a given ID at once, you are still good, mostly: while deleting and inserting in the same transaction, there is no fragmentation within IDs. (Deleted tuples are not "dead" yet and are not overwritten in the same transaction.)
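A sketch of that delete-and-regenerate pattern in a single transaction, reusing the hypothetical names from the INSERT above (and assuming the source columns are also called a, b, c):
BEGIN;
DELETE FROM big_table WHERE source_table1_id = 123;
-- re-insert the whole portion for this ID before committing,
-- so the deleted tuples are not yet reused for other rows
INSERT INTO big_table (source_table1_id, a, b, c)
SELECT s1.source_table1_id, s1.a, s1.b, s1.c
FROM   source_table1 s1
WHERE  s1.source_table1_id = 123
ORDER  BY s1.source_table1_id;
COMMIT;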
Dead tuples start bloating the table, and later inserts can start filling physical holes, which is where fragmentation can start. The bloat with dead tuples has various accumulating bad effects, but index access on all rows for a given ID is mostly unaffected.
But all that is orthogonal to your question, since the same considerations apply to CLUSTER.
Have you considered pg_repack? It can do the same as CLUSTER, just without an exclusive lock on the table. They just added Postgres 9.6 to the list of supported versions this week.
Best Answer
Your syntax won't work with PostgreSQL, but I can answer your question for PostgreSQL anyway:
Your query will often work as you intend because the three statements will often use the same execution plan. There is no guarantee for that though.
One possibility where even a sequential scan in PostgreSQL will return a different order is if another sequential scan of the same table is already in progress: then PostgreSQL will start scanning where the other statement is currently scanning, i.e. somewhere in the middle of the table, so that both scans can run somewhat synchronized and reduce the necessary I/O operations.
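That behavior is controlled by the configuration parameter synchronize_seqscans (on by default). As a sketch, assuming a generic table t, you could switch it off per session - though an explicit ORDER BY remains the only real guarantee of a stable order:
-- disable synchronized sequential scans for this session only
SET synchronize_seqscans = off;

-- the only reliable way to get a fixed row order is still an explicit sort
SELECT * FROM t ORDER BY id;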