If you really only rarely have to modify these data, you can simply store the result of the CTE in a table and run queries against that table. You can define indexes based on your typical queries, then TRUNCATE and repopulate (and ANALYZE) as necessary.
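A minimal sketch of that approach, assuming `tree_view_1` is the recursive view from the question and `parent_id` is one of its columns (adapt names to your schema):

```sql
-- Materialize the result of the recursive view once
CREATE TABLE tree_cache AS
SELECT * FROM tree_view_1;

-- Index for your typical queries
CREATE INDEX ON tree_cache (parent_id);

-- Refresh whenever the underlying data changes
TRUNCATE tree_cache;
INSERT INTO tree_cache SELECT * FROM tree_view_1;
ANALYZE tree_cache;
```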
On the other hand, if you can put the CTE in a separate stored procedure rather than a view, you can easily put your conditions in the CTE part rather than in the final SELECT (which is basically what you do when querying against tree_view_1), so that far fewer rows are involved in the recursion. From the query plan it looks like PostgreSQL estimates row counts based on some far-from-true assumptions, probably producing suboptimal plans - this effect can be reduced somewhat with the SP solution.
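A hedged sketch of that idea: a set-returning function whose parameter pushes the condition into the non-recursive term of the CTE instead of filtering after the full recursion. The `nodes` table and its columns are assumptions, since the schema is not shown in the question:

```sql
-- Sketch: filter inside the CTE, not in the final SELECT
CREATE FUNCTION tree_from(_root int)
RETURNS TABLE (id int, parent_id int) AS
$$
   WITH RECURSIVE tree AS (
      SELECT n.id, n.parent_id
      FROM   nodes n
      WHERE  n.id = _root            -- condition lands in the recursive CTE

      UNION ALL
      SELECT n.id, n.parent_id
      FROM   nodes n
      JOIN   tree t ON n.parent_id = t.id
   )
   SELECT t.id, t.parent_id FROM tree t;
$$ LANGUAGE sql STABLE;

SELECT * FROM tree_from(1);
```

This way the recursion only ever visits descendants of the requested node.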
EDIT: I may be missing something, but I just noticed that you don't filter the rows in the non-recursive term. You probably want to include only root nodes there (WHERE parent_id IS NULL) - I'd expect far fewer rows and recursions this way.
EDIT 2: As it slowly became clear to me from the comments, I had misread the recursion in the original question as going the other way. Here I mean starting from the root nodes and going deeper into the recursion.
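A sketch of what the filtered non-recursive term could look like, again with an assumed `nodes` table:

```sql
WITH RECURSIVE tree AS (
   SELECT id, parent_id, 1 AS depth
   FROM   nodes
   WHERE  parent_id IS NULL          -- start only from root nodes

   UNION ALL
   SELECT n.id, n.parent_id, t.depth + 1
   FROM   nodes n
   JOIN   tree t ON n.parent_id = t.id
)
SELECT * FROM tree;
```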
WITH ORDINALITY in Postgres 9.4 or later
The new feature simplifies this class of problems. The above query can now simply be:
SELECT *
FROM regexp_split_to_table('I think Postgres is nifty', ' ') WITH ORDINALITY x(word, rn);
Or, applied to a table:
SELECT *
FROM tbl t, regexp_split_to_table(t.my_column, ' ') WITH ORDINALITY x(word, rn);
Note that the comma in the FROM list above is an implicit LATERAL join: the set-returning function is applied to each row of tbl.
Postgres 9.3 or older - and more general explanation
For a single string
You can apply the window function row_number() to remember the order of elements. However, with the usual row_number() OVER (ORDER BY col) you get numbers according to the sort order, not the original position in the string.
You could simply omit ORDER BY to get the position "as is":
SELECT *, row_number() OVER () AS rn
FROM regexp_split_to_table('I think Postgres is nifty', ' ') AS x(word);
Performance of regexp_split_to_table() degrades with long strings. unnest(string_to_array(...)) scales better:
SELECT *, row_number() OVER () AS rn
FROM unnest(string_to_array('I think Postgres is nifty', ' ')) AS x(word);
However, while this normally works and I have never seen it break in simple queries, Postgres asserts nothing as to the order of rows without an explicit ORDER BY.
To guarantee ordinal numbers of elements in the original string, use generate_subscripts() (improved with a comment by @deszo):
SELECT arr[rn] AS word, rn
FROM (
SELECT *, generate_subscripts(arr, 1) AS rn
FROM string_to_array('I think Postgres is nifty', ' ') AS x(arr)
) y;
For a table of strings
Add PARTITION BY id to the OVER clause ...
Demo table:
CREATE TEMP TABLE strings(string text);
INSERT INTO strings VALUES
('I think Postgres is nifty')
,('And it keeps getting better');
I use ctid as an ad-hoc substitute for a primary key. If you have one (or any unique column), use that instead.
SELECT *, row_number() OVER (PARTITION BY ctid) AS rn
FROM (
SELECT ctid, unnest(string_to_array(string, ' ')) AS word
FROM strings
) x;
This works without any distinct ID:
SELECT arr[rn] AS word, rn
FROM (
SELECT *, generate_subscripts(arr, 1) AS rn
FROM (
SELECT string_to_array(string, ' ') AS arr
FROM strings
) x
) y;
SQL Fiddle.
Answer to question
SELECT z.arr, z.rn, z.word, d.meaning -- , partofspeech -- ?
FROM (
SELECT *, arr[rn] AS word
FROM (
SELECT *, generate_subscripts(arr, 1) AS rn
FROM (
SELECT string_to_array(string, ' ') AS arr
FROM strings
) x
) y
) z
JOIN dictionary d ON d.wordname = z.word
ORDER BY z.arr, z.rn;
Best Answer
The problem with your query is the join condition id = ANY(ancestors). Not only does it not preserve the original order, it also eliminates duplicate elements in the array. (An id could match 10 elements in ancestors, but it would still be picked only once.) I am not sure whether the logic of your query allows duplicate elements, but if it does, I am pretty sure you want to preserve all instances - you want to keep the "original order" after all.

Assuming current Postgres 9.4+ for lack of information, I suggest a completely different approach:
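The original query of that approach is not reproduced here; as a sketch only, unnest(...) WITH ORDINALITY (Postgres 9.4+) preserves both the array order and duplicate elements, where the table and column names below are assumptions:

```sql
SELECT n.*
FROM   tbl t
CROSS  JOIN LATERAL
       unnest(t.ancestors) WITH ORDINALITY AS a(node_id, ord)
JOIN   nodes n ON n.id = a.node_id
ORDER  BY a.ord;   -- original array order, duplicates preserved
```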
Your query only works as intended if nodes.id is defined as the primary key and nodes.entity_id is unique as well. That information is missing in the question.

Normally, this simpler query without an explicit ORDER BY works as well, but there are no guarantees (Postgres 9.3+). You can make this safe as well.
SQL Fiddle demo for Postgres 9.3.
Optional optimization
You join to entity.nodes twice - to substitute for node_id and ancestors alike. An alternative would be to fold both into one array or one set and join only once. It might be faster, but you have to test.

For these alternatives we need the ORDER BY in any case:

Add node_id to the ancestors array before we unnest ...

Or add node_id to the unnested elements of ancestors before we join ...

You did not show your CTE; this might be optimized further ...
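Hedged sketches of the two fold variants (the original queries were not shown, so table and column names are assumptions):

```sql
-- Variant 1: append node_id to the ancestors array before unnesting
SELECT n.*
FROM   tbl t
CROSS  JOIN LATERAL
       unnest(t.ancestors || t.node_id) WITH ORDINALITY AS a(node_id, ord)
JOIN   entity.nodes n ON n.id = a.node_id
ORDER  BY a.ord;

-- Variant 2: add node_id as one more row after the unnested elements
SELECT n.*
FROM   tbl t
CROSS  JOIN LATERAL (
   SELECT a.node_id, a.ord
   FROM   unnest(t.ancestors) WITH ORDINALITY AS a(node_id, ord)
   UNION ALL
   SELECT t.node_id, cardinality(t.ancestors) + 1
) a
JOIN   entity.nodes n ON n.id = a.node_id
ORDER  BY a.ord;
```

Either way, each row joins to entity.nodes only once instead of twice; whether that actually wins depends on your data, so test with EXPLAIN ANALYZE.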