PostgreSQL DISTINCT ON forces a specific order

distinctpostgresql

Why DISTINCT ON is sorting the rows? E.g. the query

SELECT DISTINCT ON (a) a
FROM (
  VALUES (2), (3), (1)
) x(a);

will have the result

1
2
3

while without DISTINCT ON

SELECT a
FROM (
  VALUES (2), (3), (1)
) x(a);

I'll get the correct order

2
3
1

How can I preserve the natural order of defined values? By "natural order" I mean "the order in which the values were defined". I understand that DBMS may change the order if I don't specify it explicitly with an ORDER BY, but I was not expecting that DISTINCT ON my interfere in this way.

Best Answer

Why DISTINCT ON is sorting the rows?

Distinct removes "duplicate" rows from the result set. That's a whole lot easier to do if the items being compared arrive neatly sorted into order.

Check the Explain Plan for this query - it will probably contain extra "Sort" and "Unique" steps. Guess what those are doing.

"select distinct" is always a Red Flag for me. It is almost always used poorly, a band-aid to get around a bigger problem and introduces problems of its own. Remember, the performance of ...

select distinct a, b, c 
. . . 

... is roughly comparable to that of ...

select a, b, c 
. . . 
group by a, b, c 
order by a, b, c 

Ask yourself why [you think] you need to use "distinct".