PostgreSQL Left Outer Join with Order By – Avoiding Duplicates in Pagination

pagingpostgresql

I have two tables api_user and api_user with api_user having foreign key to user table. The schemas for the two tables are as listed below.

                                 Table "public.api_user"
   Column    |           Type           |                       Modifiers                       
--------------+--------------------------+-------------------------------------------------------
 id           | integer                  | not null default nextval('api_user_id_seq'::regclass)
 is_admin     | boolean                  | not null
 is_agent     | boolean                  | not null
 is_guide     | boolean                  | not null
Indexes:
    "api_user_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "api_userprofile" CONSTRAINT "api_userprofile_user_id_5a1c1c92_fk_api_user_id" FOREIGN KEY (user_id) REFERENCES api_user(id) DEFERRABLE INITIALLY DEFERRED


                                         Table "public.api_userprofile"
         Column         |          Type           |                          Modifiers                           
------------------------+-------------------------+--------------------------------------------------------------
 id                     | integer                 | not null default nextval('api_userprofile_id_seq'::regclass)
 percent_complete       | numeric(3,0)            | not null
 display_name           | character varying(128)  | not null
 city                   | character varying(64)   | not null
 user_id                | integer                 | not null
Indexes:
    "api_userprofile_pkey" PRIMARY KEY, btree (id)
    "api_userprofile_user_id_key" UNIQUE CONSTRAINT, btree (user_id)
Foreign-key constraints:
    "api_userprofile_user_id_5a1c1c92_fk_api_user_id" FOREIGN KEY (user_id) REFERENCES api_user(id) DEFERRABLE INITIALLY DEFERRED

When I run the following query:

select 
    api_user.id, 
    api_userprofile.display_name, 
    api_userprofile.city
FROM "api_user" 
LEFT OUTER JOIN "api_userprofile" ON ("api_user"."id" = "api_userprofile"."user_id") 
WHERE ((("api_user"."is_admin" = false 
    AND "api_userprofile"."percent_complete" >= 60.0 
    AND "api_userprofile"."id" IS NOT NULL)) 
    AND "api_user"."is_guide" = true)
ORDER BY "api_userprofile"."city" ASC LIMIT 20;

it returns:

id  |       display_name        |           city           
-----+---------------------------+--------------------------
 299 | Mohsin Khan               | Agra
  93 | Rizwan Mohd               | Agra
 126 | Abdhesh Sharma            | Agra
  39 | Rashid Ahmed              | Agra
 244 | Nishkam Sharma            | Ajmer
  42 | Parminder Mahla           | Amritsar
 131 | Prashant Hullatti         | Ballry
 241 | Pankaj Anand              | Bangalore
  89 | Niraj K. Singh            | Bodhgaya, Nalanda, Patna
 204 | Ravi Rocks                | Bokaro
  19 | Ian Lotriet               | Cape Town
  15 | Ivy Almacin               | Cape Town
  38 | Dr Brahm Prakaah Tripathi | Delhi
 130 | Virendra Singh            | Delhi
 271 | Satish Jain               | Delhi
 110 | Vikas Agarwal             | Delhi
 114 | Devi Singh Rathore        | Delhi
  58 | Dilip Singh Chanpawat     | Delhi
  95 | Anam Kumar Dhasmana       | Delhi
  51 | Gopal Sharma              | Delhi

Running the query again with offset 20 returns:

 id  |       display_name        |    city    
-----+---------------------------+------------
  95 | Anam Kumar Dhasmana       | Delhi
 114 | Devi Singh Rathore        | Delhi
 252 | Tarun Pratap Singh        | Delhi
 258 | Rajesh Kumar Pal          | Delhi
 255 | Chandan Singh Shekhawat   | Delhi
 268 | Amit Kumar                | Delhi
 100 | Ketan Mehra               | Delhi
 286 | Vikash Poonia             | Delhi
  61 | Belinda Schempers         | Durban
  67 | Pieter Janse Van Rensburg | Hoedspruit
 140 | Dr Hari Krishna Somanchi  | Hyderabad
 197 | Sstya Prabha              | Hyderabad
 118 | Dalpat Jodha              | Jaipur
 253 | Yash Shekhawat            | Jaipur
 120 | Govind Sharma             | Jaipur
 257 | Abhimanyu Singh           | Jaipur
  99 | Ghanshyam Singh           | Jaisalmer
 308 | Nitin Lobo                | Jodhpur
 124 | Rajendra Singh            | Jodhpur
  55 | Umed Gehlot               | Jodhpur

As seen from the output, there are some duplicates (See user with id 114) which is returned in both first query and the next query with offset 20.

Using distinct seems to work fine but why does it return duplicates when ordering by a field on the related table (userprofile)?

Apparently if I order by user.id then also it seems to work fine and does not return duplicates.

Here the relation between user and userprofile is a one-to-one with no user.id being referenced in userprofile.user_id more than once (enforced by the framework (django)).

Best Answer

There's no "duplicates" here, you're running a query twice. If you only order by city, that doesn't define the order of id's and names within the same city (e.g. Delhi). So results within a city could potentially be returned in a different order every time.