MySQL – Duplicate Results When Sorting by LIMIT and OFFSET

MySQL

The fiddle and the query is here so it's easier to find, read on for the question itself.

SQL Fiddle if you guys wanna mess around with it

I'm unable to reproduce the issue using the fiddle.

Here is the query you can use in the fiddle

SELECT n.*, ns.notification_id AS is_read FROM notifications n
LEFT OUTER JOIN notification_status ns
ON n.id = ns.notification_id
LEFT JOIN notification_user_role nur
ON n.id = nur.notification_id
WHERE
(
  n.esb_consultant_id = 19291
  OR
  n.esb_consultant_id = 'role'
)
AND nur.user_role_id = 'pl_sso_regional_vice_president'
AND n.creation_date <= NOW()
AND n.expiration_date >= NOW()
ORDER BY n.creation_date DESC, (is_read IS NULL) DESC, n.priority ASC
LIMIT 0, 10

I've also placed it lower in this post, but here it's easier to catch the eyes.


I'll try to keep this as brief as possible.

I'm working on a notification system. I have 3 tables described below.

I'm trying to get notifications with a LIMIT of 10, paginated, 10 per page(so an OFFSET of 10). I'm using ajax to load the next 10.

They are to be ordered by priority(from 1 to 6, 1 being displayed first, while 6th being displayed last).

All unread notifications must be displayed first(priority still applies), while read notifications must be displayed last(priority still applies).

Notifications are per role. A user can have several roles(thus the need for another table).

The notification_status table described below is used to keep track of which notifications are read.

Which ever notification is NOT in the notification_status table is NOT read. This is very important. I didn't make this decision. I just have to live with it.


To put it into the big picture lets have an example:

Assume we have 14 notifications:

5 of them will be priority 1, unread.

4 of them will be priority > 1, unread.

3 of them will be priority 1, read.

2 of them will be priority > 1, read.


The expected display order is the following.

5 unread priority 1

4 unread priority > 1

1 read priority 1

ajax starts here since we have 10 per page

2 read priority 1

2 read priority > 1


The table structure is as follows.

notifications

+-------------------+------------------+------+-----+---------+----------------+
| Field             | Type             | Null | Key | Default | Extra          |
+-------------------+------------------+------+-----+---------+----------------+
| id                | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| type_id           | int(10) unsigned | NO   |     | NULL    |                |
| sticky            | int(10) unsigned | NO   |     | NULL    |                |
| priority          | int(10) unsigned | NO   |     | NULL    |                |
| esb_consultant_id | varchar(40)      | NO   |     |         |                |
| message_id        | varchar(100)     | NO   |     |         |                |
| esb_params        | varchar(255)     | YES  |     |         |                |
| creation_date     | datetime         | YES  |     | NULL    |                |
| expiration_date   | datetime         | YES  |     | NULL    |                |
+-------------------+------------------+------+-----+---------+----------------+

notification_user_role

+-----------------+------------------+------+-----+---------+-------+
| Field           | Type             | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| user_role_id    | varchar(150)     | NO   |     |         |       |
| notification_id | int(10) unsigned | NO   | MUL | NULL    |       |
+-----------------+------------------+------+-----+---------+-------+

notification_status

+-------------------+------------------+------+-----+---------+-------+
| Field             | Type             | Null | Key | Default | Extra |
+-------------------+------------------+------+-----+---------+-------+
| esb_consultant_id | varchar(20)      | NO   |     |         |       |
| notification_id   | int(10) unsigned | NO   | MUL | NULL    |       |
+-------------------+------------------+------+-----+---------+-------+

The query I'm using to retrieve the results:

SELECT n.*, ns.notification_id AS is_read FROM notifications n
LEFT OUTER JOIN notification_status ns
ON n.id = ns.notification_id
LEFT JOIN notification_user_role nur
ON n.id = nur.notification_id
WHERE 
(
  n.esb_consultant_id = :consultant_id 
  OR 
  n.esb_consultant_id = :role_all
)
AND nur.user_role_id = :consultant_role
AND n.creation_date <= NOW()
AND n.expiration_date >= NOW()
ORDER BY n.creation_date DESC, (is_read IS NULL) DESC, n.priority ASC
LIMIT $offset, $limit

$offset is the page times 10 – so if the page is 0(first page) the offset is 0, if the page is 1(first ajax call) the offset is 10 and so on

$limit is the limit, it's always 10.

:consultant_id is the user id – unique

:role_all is a simple string all. It is used for when certain notifications are for all roles(such as a birthday notification). All users have this notification, regardless of role since all of them have a birthday.


The problem:

Whenever I'm doing an ajax call I get certain notifications that are duplicated. I'll just post a screenshot of it since it's easier than drawing it.

Do note that the ajax itself is just part of the way I'm retrieving the results but is not responsible for the duplicates themselvs, I'm absolutely sure. It's not a display issue either, I've double and triple checked that.

Before Ajax

After ajax

What I've noticed is that if I were to remove this part

ORDER BY n.creation_date DESC, (is_read IS NULL) DESC, n.priority ASC

from the query. It works fine. No duplicates.

The dump of the query above, with limit removed and order by removed:

Dump

Sorry for the image but it's easier.


I'm using PHP to query the database.

  public function all($consultant_id, $consultant_role, $offset = 0) {
    $limit = 10;
    $offset = $offset * 10;

    $query = <<<SQL
SELECT n.*, ns.notification_id AS is_read FROM notifications n
LEFT OUTER JOIN notification_status ns
ON n.id = ns.notification_id
LEFT JOIN notification_user_role nur
ON n.id = nur.notification_id
WHERE 
(
  n.esb_consultant_id = :consultant_id 
  OR 
  n.esb_consultant_id = :role_all
)
AND nur.user_role_id = :consultant_role
AND n.creation_date <= NOW()
AND n.expiration_date >= NOW()
ORDER BY n.creation_date DESC, (is_read IS NULL) DESC, n.priority ASC
LIMIT $offset, $limit
SQL;
    $return = $this->connection
      ->query($query
        , [
          ':consultant_role' => $consultant_role,
          ':consultant_id'   => $consultant_id,
          ':role_all'        => NotificationStatus::PL_N_ALL,
        ]
      )->fetchAll(\PDO::FETCH_ASSOC);

    foreach($return as $item) { // this is added simply for display purposes
      echo $item['id'] . '<br>';
    }

    return $return;
  }

The above is a copy+paste of the code used to retrieve the results. The function simply returns the results for displaying, no other magic going around.

The foreach is added to simply display the results in the browser.

Here is the image of the output. Notification 10 is duplicated.

Here is the exact same code only with
ORDER BY n.creation_date DESC, (is_read IS NULL) DESC, n.priority ASC removed. Limit and offset still apply here.


I'm not that good at mysql or sql in general and I'm not sure where the problem itself lies.

Any pointing in the right direction is greatly appreciated. Even workarounds or "hacks" I'm fine with.


mysql> SELECT * FROM notification_status WHERE notification_id = 10;
Empty set (0.00 sec)

mysql> SELECT * FROM notification_user_role WHERE notification_id = 10;
+--------------------------------+-----------------+
| user_role_id                   | notification_id |
+--------------------------------+-----------------+
| pl_sso_regional_vice_president |              10 |
+--------------------------------+-----------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM notifications WHERE id = 10;
+----+---------+--------+----------+-------------------+-------------------------------------+------------+---------------------+---------------------+
| id | type_id | sticky | priority | esb_consultant_id | message_id                          | esb_params | creation_date       | expiration_date     |
+----+---------+--------+----------+-------------------+-------------------------------------+------------+---------------------+---------------------+
| 10 |       2 |      0 |        4 | role              | pl_n_325f1676e8a263c86432edc7b9f09c | NULL       | 2018-02-01 00:00:00 | 2018-02-28 00:00:00 |
+----+---------+--------+----------+-------------------+-------------------------------------+------------+---------------------+---------------------+
1 row in set (0.00 sec)

mysql>

Best Answer

Do you have notifications associated with 2 or more roles? That would explain the duplicate results.

From the schema we can infer that a notification could be associated with many statues and also with many roles. Therefore if a notification is associated with many - say 3 - roles, then every one of its statuses will appear many (3) times. The notifications that are associated with 1 role do not show any duplication.

We also notice that the SELECT list does not include any columns from the role table. That helps for a solution: convert the JOIN to table role to an EXISTS subquery. The query becomes:

SELECT 
    n.*, ns.notification_id AS is_read 
FROM 
    notifications n
    LEFT OUTER JOIN notification_status ns
    ON n.id = ns.notification_id
WHERE EXISTS
      ( SELECT 1
        FROM notification_user_role nur
        WHERE n.id = nur.notification_id 
          AND nur.user_role_id = :consultant_role
      )
  AND n.esb_consultant_id IN (:consultant_id, :role_all)
  AND n.creation_date <= NOW()
  AND n.expiration_date >= NOW()
ORDER BY 
    n.creation_date DESC, (is_read IS NULL) DESC, n.priority ASC
LIMIT 
    $offset, $limit 

The other reason might be that you have many statuses for a notification. In that case, the solution depends on what you want to do about it. (i.e. what is means for a notification to have 2 or more statuses, should it appear once or twice in the results, etc).
(but that's not the problem in this case)


The third reason that you may get "duplicates" is when you don't have a definite (full) ORDER BY clause. That is, when the ORDER BY cannot resolve all ties. In this case, when there are ties (say in positions 8 to 28), the LIMIT 10 OFFSET 0 means "get me 10 rows, order by that". But the "that" is only good enough to specify the first 7 rows. The last 8th, 9th and 10th are tied (with 18 more!) so MySQL decides and picks 3 of them arbitrarily. (Still no duplicates)

But in the next query, when you ask for rows 11th to 20 with OFFSET 10 LIMIT 10, MySQL resolves the ties in a different way and may give you the 3 rows that you had seen before again (now appearing as 11th, 14th and 15th!).

Why does it do that? Because it can and you haven't told her a definitive and full way to do the ORDER BY.
By the way, I wouldn't call these "duplicates" as they appear in different result sets / calls. The slightly misleading terminology was probably the reason the cause couldn't easily be identified.

The solution is easy however. Just add one more column (that is unique) in the ORDER BY clause:

ORDER BY 
    n.creation_date DESC, (is_read IS NULL) DESC, n.priority ASC,
    n.id    -- to resolve ties

(or, as the OP clarified in the discussion in comments and the chat room, a different order by was needed, still with the n.id in the end):

ORDER BY
    (is_read IS NULL) DESC, n.priority ASC, n.creation_date DESC,
    n.id DESC