MySQL – Limit 1000,25 vs Limit 25 Offset 1000


Recently I found out that MySQL has an OFFSET feature. I've been trying to find documentation about what OFFSET returns, or about the difference between OFFSET and the comma form of LIMIT, but I can't seem to find what I'm looking for.

Let's say I have 10,000 rows in a table and I want 25 results, starting at row 1,000. As far as I can tell, I could use either of these to get the same result:

SELECT id,name,description FROM tablename LIMIT 1000,25
SELECT id,name,description FROM tablename LIMIT 25 OFFSET 1000

What I'd like to know is the difference between the two.

Do these actually do the same thing, or is my understanding wrong?
Is one slower/faster than the other on larger tables?
Does the result of OFFSET change when I add WHERE column=1 (say column has >100 different values)?
Does the result of OFFSET change when I add ORDER BY column ASC (assuming it has random values)?

I have the feeling offset skips the first X rows found in the database, disregarding sorting and the where.

Best Answer

In terms of operation

SELECT id,name,description FROM tablename LIMIT 1000,25
SELECT id,name,description FROM tablename LIMIT 25 OFFSET 1000

there is absolutely no difference between the two statements.

siride's comment:

from https://dev.mysql.com/doc/refman/5.6/en/select.html

For compatibility with PostgreSQL, MySQL also supports the LIMIT row_count OFFSET offset syntax.

is exactly the point.

LIMIT 1000,25 means LIMIT 25 OFFSET 1000

From the same Documentation

LIMIT row_count is equivalent to LIMIT 0, row_count
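The equivalence of the two spellings can be checked with a small experiment. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server; SQLite accepts both LIMIT forms with the same meaning), it fills a hypothetical tablename with 10,000 generated rows, and it adds ORDER BY id so the row order is deterministic.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
)
# Generated sample data: ids 1..10000.
conn.executemany(
    "INSERT INTO tablename (id, name, description) VALUES (?, ?, ?)",
    [(i, f"name{i}", f"description{i}") for i in range(1, 10001)],
)

# Comma form: LIMIT offset, row_count
comma_form = conn.execute(
    "SELECT id, name, description FROM tablename ORDER BY id LIMIT 1000,25"
).fetchall()

# Keyword form: LIMIT row_count OFFSET offset
offset_form = conn.execute(
    "SELECT id, name, description FROM tablename ORDER BY id LIMIT 25 OFFSET 1000"
).fetchall()

print(comma_form == offset_form)             # True
print(comma_form[0][0], comma_form[-1][0])   # 1001 1025
```

Both statements skip the first 1000 rows of the ordered result and return the next 25.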

YOUR ACTUAL QUESTIONS

Does this actually do the same or is my understanding wrong?

Is one slower/faster in larger tables

Since both queries are the same, there is no difference in speed either.

Does the result of offset change when I do WHERE column=1 (say column has >100 different values)

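Does the result of offset change when I do ORDER BY column ASC (assuming it has random values)

LIMIT and OFFSET do not change the result set that the WHERE and ORDER BY clauses produce. They are applied afterwards and simply pick a window of rows within that result set, so the offset counts rows that have already been filtered and sorted, not the first X rows stored in the table.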
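To see that the offset counts rows after the WHERE filter has run, here is a minimal sketch. It again uses Python's sqlite3 module as a stand-in for MySQL (an assumption), and the table demo and its column col are hypothetical names invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, col INTEGER)")
# ids 1..12; col cycles through 1, 2, 3, so col = 1 matches ids 1, 4, 7, 10.
conn.executemany(
    "INSERT INTO demo (id, col) VALUES (?, ?)",
    [(i, (i - 1) % 3 + 1) for i in range(1, 13)],
)

# OFFSET 1 skips the first *matching* row (id 1), not the first table row.
filtered_page = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM demo WHERE col = 1 ORDER BY id LIMIT 2 OFFSET 1"
    )
]
print(filtered_page)  # [4, 7]
```

If the offset were applied to the raw table first, the result would start from id 2 instead of skipping within the filtered rows.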

This query

SELECT id,name,description FROM tablename ORDER BY id LIMIT 1000,25

would be different from

SELECT * FROM (SELECT id,name,description FROM tablename LIMIT 1000,25) A ORDER BY id;

because the LIMIT is being applied at a different stage.

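The first query sorts the whole table by id and then skips 1000 rows, so it returns nothing if tablename has 1000 rows or fewer.

The second query takes 25 rows in whatever order the storage engine delivers them and only then sorts those 25. It returns nothing if the subquery returns no rows, which also happens when tablename has 1000 rows or fewer.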
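The difference between the two stages can be illustrated on a tiny table. This is a sketch under assumptions: sqlite3 stands in for MySQL, the table t is hypothetical, and rows are inserted in descending id order so that the unsorted scan order is unlikely to match the sorted order. The result of the second query depends on how the engine scans the table, so no output is claimed for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is deliberately not a primary key, so no index imposes an order on scans.
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
# Insert ids 10, 9, ..., 1 so storage order differs from sorted order.
conn.executemany(
    "INSERT INTO t (id, name) VALUES (?, ?)",
    [(i, f"n{i}") for i in range(10, 0, -1)],
)

# Sort first, then skip 2 rows and take 3.
sort_then_limit = [
    r[0] for r in conn.execute("SELECT id FROM t ORDER BY id LIMIT 2,3")
]

# Skip 2 rows and take 3 in scan order, then sort only those 3.
limit_then_sort = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM (SELECT id FROM t LIMIT 2,3) A ORDER BY id"
    )
]

print(sort_then_limit)  # [3, 4, 5]
print(limit_then_sort)  # three ids, sorted, chosen by scan order
```

Only the first query gives a page that is stable with respect to the sort key; the second merely sorts whichever rows the inner LIMIT happened to pick.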

CONCLUSION

You will have to sculpt the query to make sure you are sorting the data at the right stage, that is, before the LIMIT is applied.

Related Solutions

Mysql – Different MySQL Datafile Sizes After Restoration

This makes all the sense in the world to me.

InnoDB creates data pages and index pages that are 16K each. If rows of data are being inserted, updated, deleted, committed, and rolled back, you are going to have FRAGMENTATION !!!

There are two cases where you can have internal fragmentation:

A single row could be written in multiple data pages because certain column values make a row too big to fit in the data page.
Having a TEXT column with 32K of data in it.

In those two cases, a single row spanning multiple data pages would have to be chained like a linked list. The internally generated list of data pages would always have to be navigated when the row is read.

Giving credit where credit is due, PostgreSQL implemented a very brilliant mechanism called TOAST (The Oversized-Attribute Storage Technique) to keep oversized data outside of tables to stem the tide of this kind of internal fragmentation.

When you use mysqldump to make a file with CREATE TABLE statements followed by lots of INSERTs, loading that dump into a new server gives you a fresh table with no unused space and with contiguous data and index pages.

For my explanations, let's assume you have an InnoDB table in the CUSTODIA database called userinfo

If you would like to compress a table, you have three (3) options

Option 1

OPTIMIZE TABLE CUSTODIA.userinfo;

Option 2

ALTER TABLE CUSTODIA.userinfo ENGINE=InnoDB;

Option 3

CREATE TABLE CUSTODIA.userinfo2 LIKE CUSTODIA.userinfo;
INSERT INTO CUSTODIA.userinfo2 SELECT * FROM CUSTODIA.userinfo;
DROP TABLE CUSTODIA.userinfo;
ALTER TABLE CUSTODIA.userinfo2 RENAME CUSTODIA.userinfo;

CAVEAT : Option 3 is no good on a table with constraints. Option 3 is perfect for MyISAM.

Now for your questions:

Question 1. Why is there this difference between original and restored database size?

As explained above

Question 2. Is it safe to assume that the restored database is OK, despite this difference in size?

If you want to make absolutely sure that the data on both servers are identical, simply run this command on both DB servers:

CHECKSUM TABLE CUSTODIA.userinfo;

Hopefully, the checksum value is identical for the same table on both servers. If you have dozens, even hundreds, of tables, you may have to script it.
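Scripting the comparison could look something like the sketch below. It is only an outline under assumptions: the helper names checksum_statement and mismatched_tables are hypothetical, the table orders and the checksum values are made up for illustration, and actually running the statement against each server (with whichever MySQL client library you use) is left out.

```python
def checksum_statement(schema, tables):
    """Build one CHECKSUM TABLE statement covering several tables."""
    qualified = ", ".join(f"`{schema}`.`{table}`" for table in tables)
    return f"CHECKSUM TABLE {qualified};"


def mismatched_tables(source, restored):
    """Return the tables whose checksums differ or exist on one side only.

    Both arguments map a table name to the checksum reported by a server.
    """
    names = source.keys() | restored.keys()
    return sorted(n for n in names if source.get(n) != restored.get(n))


statement = checksum_statement("CUSTODIA", ["userinfo", "orders"])
print(statement)
# CHECKSUM TABLE `CUSTODIA`.`userinfo`, `CUSTODIA`.`orders`;

# Illustrative values only, not real server output.
source_sums = {"CUSTODIA.userinfo": 111, "CUSTODIA.orders": 222}
restored_sums = {"CUSTODIA.userinfo": 111, "CUSTODIA.orders": 333}
print(mismatched_tables(source_sums, restored_sums))  # ['CUSTODIA.orders']
```

Run the generated statement on both servers, collect the two result sets into dictionaries, and any name the comparison returns deserves a closer look.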

Question 3 : How does MySQL calculate data_length? Is it an estimate?

You are using the correct method in summing up the data_length and index_length. Based on my explanation of fragmentation, it is an estimate.

Question 4. Can I safely reduce production's ibdata file size to 3.6GiB with no down-time?

GOOD NEWS !!! You absolutely can compress it !!! In fact, how would you like to compress it to a fraction of that number ??? Follow these two links because I addressed this issue in StackOverflow and ServerFault.

https://stackoverflow.com/questions/3927690/howto-clean-a-mysql-innodb-storage-engine/4056261#4056261

https://serverfault.com/questions/230551/mysql-innodb-innodb-file-per-table-cons/231400#231400

BAD NEWS !!! Sorry, but you will have a 3-5 minute window of downtime for rebuilding ib_logfile0 and ib_logfile1 as well as shrinking ibdata1 once and for all. It's well worth it since it will be a one-time operation.

MySQL – How to Handle Records Turnover

Not tested, but the idea is to use the same query twice with different LIMITs (depending on the %HOURS passed) in a UNION ALL.

(
SELECT * FROM villa_table v
 ORDER BY villa_order ASC, v.ID
 LIMIT %HOURS, 999999999999
) UNION ALL (
SELECT * FROM villa_table v
 ORDER BY villa_order ASC, v.ID
 LIMIT 0, %HOURS
)

You'll need to fill in %HOURS in your script language or stored procedure. Also once %HOURS is larger than the COUNT(*) of villa_table you'll need to restart it from 0.

Note how the parentheses are necessary.

Also note that the ORDER BY fields must uniquely identify rows (i.e. append the PRIMARY KEY!) to prevent possible ambiguous sorting.
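The wrap-around of %HOURS described above can be handled with a modulo in the calling script. The following is a rough sketch in Python; wrapped_offset and rotate are hypothetical helper names, and rotate only models on a plain list what the two LIMIT branches of the UNION ALL are meant to return.

```python
def wrapped_offset(hours_passed, row_count):
    """Restart the offset from 0 once it reaches the number of rows."""
    if row_count == 0:
        return 0
    return hours_passed % row_count


def rotate(ordered_rows, hours_passed):
    """Model of the UNION ALL: rows from the offset onward, then the skipped ones."""
    k = wrapped_offset(hours_passed, len(ordered_rows))
    return ordered_rows[k:] + ordered_rows[:k]


villas = ["A", "B", "C", "D", "E"]   # already sorted by villa_order, ID
print(wrapped_offset(7, len(villas)))  # 2
print(rotate(villas, 7))               # ['C', 'D', 'E', 'A', 'B']
```

The value returned by wrapped_offset is what would be substituted for %HOURS in both branches of the query.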