I have two tables api_user and api_user with api_user having foreign key to user table. The schemas for the two tables are as listed below.
Table "public.api_user"
Column | Type | Modifiers
--------------+--------------------------+-------------------------------------------------------
id | integer | not null default nextval('api_user_id_seq'::regclass)
is_admin | boolean | not null
is_agent | boolean | not null
is_guide | boolean | not null
Indexes:
"api_user_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "api_userprofile" CONSTRAINT "api_userprofile_user_id_5a1c1c92_fk_api_user_id" FOREIGN KEY (user_id) REFERENCES api_user(id) DEFERRABLE INITIALLY DEFERRED
Table "public.api_userprofile"
Column | Type | Modifiers
------------------------+-------------------------+--------------------------------------------------------------
id | integer | not null default nextval('api_userprofile_id_seq'::regclass)
percent_complete | numeric(3,0) | not null
display_name | character varying(128) | not null
city | character varying(64) | not null
user_id | integer | not null
Indexes:
"api_userprofile_pkey" PRIMARY KEY, btree (id)
"api_userprofile_user_id_key" UNIQUE CONSTRAINT, btree (user_id)
Foreign-key constraints:
"api_userprofile_user_id_5a1c1c92_fk_api_user_id" FOREIGN KEY (user_id) REFERENCES api_user(id) DEFERRABLE INITIALLY DEFERRED
When I run the following query:
select
api_user.id,
api_userprofile.display_name,
api_userprofile.city
FROM "api_user"
LEFT OUTER JOIN "api_userprofile" ON ("api_user"."id" = "api_userprofile"."user_id")
WHERE ((("api_user"."is_admin" = false
AND "api_userprofile"."percent_complete" >= 60.0
AND "api_userprofile"."id" IS NOT NULL))
AND "api_user"."is_guide" = true)
ORDER BY "api_userprofile"."city" ASC LIMIT 20;
it returns:
id | display_name | city
-----+---------------------------+--------------------------
299 | Mohsin Khan | Agra
93 | Rizwan Mohd | Agra
126 | Abdhesh Sharma | Agra
39 | Rashid Ahmed | Agra
244 | Nishkam Sharma | Ajmer
42 | Parminder Mahla | Amritsar
131 | Prashant Hullatti | Ballry
241 | Pankaj Anand | Bangalore
89 | Niraj K. Singh | Bodhgaya, Nalanda, Patna
204 | Ravi Rocks | Bokaro
19 | Ian Lotriet | Cape Town
15 | Ivy Almacin | Cape Town
38 | Dr Brahm Prakaah Tripathi | Delhi
130 | Virendra Singh | Delhi
271 | Satish Jain | Delhi
110 | Vikas Agarwal | Delhi
114 | Devi Singh Rathore | Delhi
58 | Dilip Singh Chanpawat | Delhi
95 | Anam Kumar Dhasmana | Delhi
51 | Gopal Sharma | Delhi
Running the query again with offset 20 returns:
id | display_name | city
-----+---------------------------+------------
95 | Anam Kumar Dhasmana | Delhi
114 | Devi Singh Rathore | Delhi
252 | Tarun Pratap Singh | Delhi
258 | Rajesh Kumar Pal | Delhi
255 | Chandan Singh Shekhawat | Delhi
268 | Amit Kumar | Delhi
100 | Ketan Mehra | Delhi
286 | Vikash Poonia | Delhi
61 | Belinda Schempers | Durban
67 | Pieter Janse Van Rensburg | Hoedspruit
140 | Dr Hari Krishna Somanchi | Hyderabad
197 | Sstya Prabha | Hyderabad
118 | Dalpat Jodha | Jaipur
253 | Yash Shekhawat | Jaipur
120 | Govind Sharma | Jaipur
257 | Abhimanyu Singh | Jaipur
99 | Ghanshyam Singh | Jaisalmer
308 | Nitin Lobo | Jodhpur
124 | Rajendra Singh | Jodhpur
55 | Umed Gehlot | Jodhpur
As seen from the output, there are some duplicates (See user with id 114) which is returned in both first query and the next query with offset 20.
Using distinct seems to work fine but why does it return duplicates when ordering by a field on the related table (userprofile)?
Apparently if I order by user.id then also it seems to work fine and does not return duplicates.
Here the relation between user and userprofile is a one-to-one with no user.id being referenced in userprofile.user_id more than once (enforced by the framework (django)).
Best Answer
There's no "duplicates" here, you're running a query twice. If you only order by city, that doesn't define the order of id's and names within the same city (e.g. Delhi). So results within a city could potentially be returned in a different order every time.