MySQL Query – Fixing Duplicate Records in Ordered Paginated Query

MySQLmysql-5.6order-byselect

Some records show up on multiple result "pages" with the following query:

SELECT 
    `description`,
    `wallpaper`
FROM
    `bbr_bar`
WHERE
    `country_id` = '2510769'
ORDER BY
    `online` DESC

For instance, record with id 99 shows up on the first "page" (LIMIT 10 OFFSET 0) but also on the last one LIMIT 10 OFFSET 50. Obviously this should not happen.

The problem disappears under the the following circumstances:

  • if I don't filter the query, or don't order it (both country_id and online are int(11), not foreign keys)
  • if I remove one of the two fields from the SELECT clause (both varchar(255), not foreign keys)
  • if I delete some tables or fields in the schema. However, afters hours of effort, I am still unable to say for sure which tables/fields have an impact and which ones don't. Looks like random (even if I know it is not). Hence I don't paste here any CREATE TABLE or INSERT statement as I don't think they are relevant.

Only thing I can say for sure is that if I mysqldump all tables and restore them in a new schema, the problem is still there.

(There are other fields in the table but adding/removing them has no effect on the result).

Questions:

  1. Am I right if I conclude that something is corrupted in my schema?
  2. How could I fix this?

(MySQL version: 5.6.24 Win32 x86)

Best Answer

I have some bad news for you: According to MySQL, IT'S NOT A BUG !!!

There are two bug reports on this behavior and is considered non-critical

Here is the rationale expressed in Bug #69732

Without a distinct ORDER BY the result order is undefined. Period.

Even if the same query, executed twice, results results in different order there is nothing wrong with that. You may be used to what looks like deterministic results but that is actually an illusion. It may be what you see in the usual case but there is nothing that guarantees this. Result order can change as statistics get updated, as index trees are reshuffled, as data is partitioned across different physical machines so that result order depends on network latencies ...

So if you want pagination then make sure you ORDER BY on something guaranteed to be UNIQUE, or live with the fact that the sort order of identical values in a non-unique sequence is not deterministic and can change at any time without prior notice.

In your particular case, the online field is probably not distinct enough. Even if you index bbr_bar by (country_id,online), this may still be not distinct enough. You may have to involve another column to drive the sort order (perhaps an auto increment field or a timestamp field).

Give it a try!!!!