PostgreSQL Order By – Order of Returned Rows with IN Statement

I know that the order of returned rows is not guaranteed with an IN clause in Postgres. For example, if I do this:

SELECT users.id FROM users WHERE users.id IN (13589, 16674, 13588)

I may get this result:

16674
13588
13589

However, I want the returned rows to respect the order of the IN list, so I found a few solutions online, such as:

SELECT users.id FROM users WHERE users.id IN (13589, 16674, 13588)
ORDER BY POSITION(id::text in '(13589, 16674, 13588)')

SELECT users.id FROM users WHERE users.id IN (13589, 16674, 13588)
ORDER BY id = 13589 desc,
         id = 16674 desc,
         id = 13588 desc;

I wonder if there is a nicer way to do this, or better yet, a more efficient one?

Best Answer

`WITH ORDINALITY` in Postgres 9.4+

Introduced with Postgres 9.4. The manual:

When a function in the FROM clause is suffixed by WITH ORDINALITY, a bigint column is appended to the output which starts from 1 and increments by 1 for each row of the function's output. This is most useful in the case of set returning functions such as unnest().

SELECT u.*
FROM   unnest('{13589, 16674, 13588}'::int[]) WITH ORDINALITY AS x(id, order_nr)
JOIN   users u USING (id)
ORDER  BY x.order_nr;
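To see the effect without a real users table, here is a self-contained sketch. The CTE named users and its rows are hypothetical sample data standing in for your table; the query itself is the one above.

```sql
-- Hypothetical stand-in for the users table (sample data only).
WITH users(id, name) AS (
   VALUES (13588, 'carol'), (13589, 'alice'), (16674, 'bob'), (20000, 'dave')
)
SELECT u.id, u.name, x.order_nr
FROM   unnest('{13589, 16674, 13588}'::int[]) WITH ORDINALITY AS x(id, order_nr)
JOIN   users u USING (id)   -- inner join: array elements without a matching user drop out
ORDER  BY x.order_nr;       -- order_nr is the 1-based position in the array
-- Returns ids 13589, 16674, 13588 (order_nr 1, 2, 3); id 20000 is not in the array.
```

Since it's a plain join, duplicate elements in the array would also produce duplicate rows in the result.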

array or set?

Postgres internally rewrites x IN (set) expressions to x = ANY (array), which is equivalent:

SELECT users.id FROM users WHERE users.id = ANY ('{13589, 16674, 13588}')

You can see for yourself with EXPLAIN.
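A quick way to check, assuming a throwaway temporary table named users (created here only so the statement has something to plan against):

```sql
-- Throwaway table, only so that EXPLAIN can run without your real schema.
CREATE TEMP TABLE users (id int);

EXPLAIN
SELECT users.id FROM users WHERE users.id IN (13589, 16674, 13588);
-- The plan's condition should read along the lines of:
--   (id = ANY ('{13589,16674,13588}'::integer[]))
```

The exact plan shape (sequential scan, index scan, ...) depends on indexes and statistics, but the IN list shows up as = ANY over an array either way.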

Postgres 9.3 or earlier

To preserve the order of elements in these older versions, you could use generate_subscripts():

SELECT u.*
FROM  (
   SELECT arr, generate_subscripts(arr, 1) AS order_nr
   FROM  (SELECT '{13589, 16674, 13588}'::int[]) t(arr)
   ) x
JOIN   users u ON u.id = x.arr[x.order_nr]
ORDER  BY x.order_nr;
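To see what the derived table x contributes, here is a sketch that runs just the array part, so no users table is needed: generate_subscripts(arr, 1) yields one row per subscript of the array's first dimension, and x.arr[x.order_nr] picks the element at that position.

```sql
-- Expand the array into (element, position) pairs without WITH ORDINALITY.
SELECT x.arr[x.order_nr] AS id, x.order_nr
FROM  (
   SELECT arr, generate_subscripts(arr, 1) AS order_nr  -- 1, 2, 3 for a 3-element array
   FROM  (SELECT '{13589, 16674, 13588}'::int[]) t(arr)
   ) x
ORDER  BY x.order_nr;
-- id 13589, 16674, 13588 with order_nr 1, 2, 3
```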


Related Solutions

SQL ORDER BY – Implementations in Subqueries

You're going to have to stop your application from putting the ORDER BY inside the subquery (maybe it has an option to avoid the needless subquery in the first place). As you've already discovered, this syntax is not supported in SQL Server without TOP. And with TOP 100 PERCENT, the only form of TOP that doesn't leave rows out, the ORDER BY is optimized away anyway.

And in Oracle and Postgres, just because the syntax is supported does not mean it is obeyed. And just because you observe it being obeyed in some scenario does not mean that it will continue to be obeyed as new versions come out, or with subtle changes to your data, statistics, the query itself, or the environment.

I can assure you that, without a doubt, if you want a guarantee about order, you need to put the ORDER BY on the outermost query. This should be a doctrine you hold close no matter what platform you're using.

You are asking for a link that officially states that something is not supported. This is like looking in your car owner's manual for an official statement that your car cannot fly.
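A minimal Postgres illustration of that rule, using inline VALUES rows as hypothetical sample data in place of a real table:

```sql
-- NOT guaranteed: the ORDER BY lives only in the derived table.
SELECT sub.id
FROM  (SELECT id FROM (VALUES (3), (1), (2)) v(id) ORDER BY id) sub;

-- Guaranteed: the ORDER BY is on the outermost query.
SELECT sub.id
FROM  (SELECT id FROM (VALUES (3), (1), (2)) v(id)) sub
ORDER  BY sub.id;
```

The first query may well return 1, 2, 3 in practice; the point is that nothing promises it will keep doing so.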

PostgreSQL – Select into Specific Array Positions with array_agg()

Your answer basically gets the job done:

SELECT b.id, array_agg(b.stock) AS stock
FROM  (
   SELECT i.id, COALESCE(i_s.stock, 0) AS stock
   FROM   item i
   CROSS  JOIN unnest('{1,2}'::int[]) n
   LEFT   JOIN item_stock i_s ON i.id = i_s.item_id AND n.n = i_s.shop_id
   ORDER  BY i.id, n.n
   ) b
GROUP  BY b.id;

Two notable changes:

Order is not guaranteed without ORDER BY in the subquery or as an additional clause to array_agg() (the latter is typically more expensive). And that's the core element of your question.
unnest('{1,2}'::int[]) instead of generate_series(1,2), since requested shop IDs will hardly be sequential all the time.

I also moved the set-returning function from the SELECT list to a separate table expression attached with CROSS JOIN. That's the standard SQL form, but just a matter of clarity and taste, not a necessity, at least in Postgres 10 or later. See:

What is the expected behaviour for multiple set-returning functions in SELECT clause?
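For comparison, a sketch of the other option named in the first point above: the ORDER BY as a clause inside array_agg() instead of in the subquery. The CTEs item and item_stock are hypothetical stand-ins with made-up rows (schema assumed from the question), so the query runs on its own.

```sql
-- Hypothetical sample data mirroring the assumed schema.
WITH item(id) AS (VALUES (1), (2))
   , item_stock(item_id, shop_id, stock) AS (VALUES (1, 2, 5), (2, 1, 7))
SELECT i.id
     , array_agg(COALESCE(i_s.stock, 0) ORDER BY n.n) AS stock  -- order set in the aggregate call
FROM   item i
CROSS  JOIN unnest('{1,2}'::int[]) n
LEFT   JOIN item_stock i_s ON i_s.item_id = i.id AND i_s.shop_id = n.n
GROUP  BY i.id
ORDER  BY i.id;
-- item 1 -> {0,5}, item 2 -> {7,0}
```

This drops the subquery, but the aggregate now has to sort its input per group, which is why it's typically the more expensive variant.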

Doing the same with CROSS JOIN LATERAL and an ARRAY constructor might be a bit faster: we don't need the outer GROUP BY, and the ARRAY constructor is typically faster than array_agg(), too:

SELECT i.id, s.stock
FROM   item i
CROSS  JOIN LATERAL (
   SELECT ARRAY(
      SELECT COALESCE(i_s.stock, 0)
      FROM   unnest('{1,2}'::int[]) n
      LEFT   JOIN item_stock i_s ON i_s.shop_id = n.n
                                AND i_s.item_id = i.id
      ORDER  BY n.n
      ) AS stock
   ) s;

And if you have more than just the two shops, a nested crosstab() should provide top performance:

SELECT i.id, COALESCE(stock, '{0,0}') AS stock
FROM   item i
LEFT   JOIN (
   SELECT id, ARRAY[COALESCE(shop1, 0), COALESCE(shop2, 0)] AS stock
   FROM   crosstab(
     $$SELECT item_id, shop_id, stock
       FROM   item_stock
       WHERE  shop_id = ANY ('{1,2}'::int[])
       ORDER  BY 1,2$$

     , $$SELECT unnest('{1,2}'::int[])$$
      ) AS ct (id int, shop1 int, shop2 int)
   ) i_s USING (id);

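This needs to be adapted in several places to cater for different shop IDs. The crosstab() function is provided by the additional module tablefunc, which must be installed once per database (CREATE EXTENSION tablefunc;). See: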

PostgreSQL Crosstab Query


Index

Make sure you have at least an index on item_stock (shop_id, item_id), typically provided by a PRIMARY KEY on those columns. For the crosstab query, it also matters that shop_id comes first. See:

Is a composite index also good for queries on the first field?

Adding stock as another index column may allow faster index-only scans. In Postgres 11 or later, consider adding it as an INCLUDE column to the PK, like so:

PRIMARY KEY (shop_id, item_id) INCLUDE (stock)

But only if you need it a lot, as it makes the index a bit bigger and possibly more susceptible to bloat from updates.
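For reference, a sketch of what such a table definition could look like. Column types, NOT NULL and DEFAULT are assumptions, since the question's actual DDL isn't shown; INCLUDE requires Postgres 11 or later.

```sql
-- Hypothetical item_stock definition; types and defaults are assumed.
CREATE TABLE item_stock (
   shop_id int NOT NULL
 , item_id int NOT NULL
 , stock   int NOT NULL DEFAULT 0
 , PRIMARY KEY (shop_id, item_id) INCLUDE (stock)  -- key columns: shop_id first; stock is payload only
);
```

stock is stored in the index leaf entries but is not part of the key, so it doesn't affect uniqueness or sort order.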