The problem with your query is the join condition `id = ANY(ancestors)`. Not only does it not preserve the original order, it also eliminates duplicate elements in the array. (An `id` could match 10 elements in `ancestors`, but it would still be picked only once.) Not sure whether the logic of your query allows duplicate elements, but if it does, I am pretty sure you want to preserve all instances - you want to keep "original order" after all.
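A quick illustration of that duplicate elimination (contrived values, just for demonstration):

```sql
-- the array holds the value 1 three times, but the row with id = 1
-- still qualifies (and is returned) exactly once
SELECT *
FROM  (VALUES (1), (2), (3)) n(id)
WHERE  n.id = ANY ('{1,1,1,2}'::int[]);
```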
Assuming current Postgres 9.4+ for lack of information, I suggest a completely different approach:
    SELECT n.entity_id, p.ancestors
    FROM   tree t
    JOIN   nodes n ON n.id = t.node_id
    LEFT   JOIN LATERAL (
       SELECT ARRAY (
          SELECT p.entity_id
          FROM   unnest(t.ancestors) WITH ORDINALITY a(id, ord)
          JOIN   entity.nodes p USING (id)
          ORDER  BY ord
          ) AS ancestors
       ) p ON true;
Your query only works as intended if `nodes.id` is defined as the primary key and `nodes.entity_id` is unique as well. That information is missing in the question.
Normally, this simpler query without explicit `ORDER BY` works as well, but there are no guarantees (Postgres 9.3+) ...
    SELECT n.entity_id, p.ancestors
    FROM   tree t
    JOIN   nodes n ON n.id = t.node_id
    LEFT   JOIN LATERAL (
       SELECT ARRAY (
          SELECT p.entity_id
          FROM   unnest(t.ancestors) id
          JOIN   entity.nodes p USING (id)
          ) AS ancestors
       ) p ON true;
You can make this safe as well. Detailed explanation:
SQL Fiddle demo for Postgres 9.3.
Optional optimization
You join to `entity.nodes` twice - to substitute for `node_id` and `ancestors` alike. An alternative would be to fold both into one array or one set and join only once. Might be faster, but you have to test.
For these alternatives we need the `ORDER BY` in any case.

Add `node_id` to the `ancestors` array before we unnest ...
    SELECT p.arr[1] AS entity_id, p.arr[2:2147483647] AS ancestors
    FROM   tree t
    LEFT   JOIN LATERAL (
       SELECT ARRAY (
          SELECT p.entity_id
          FROM   unnest(t.node_id || t.ancestors) WITH ORDINALITY a(id, ord)
          JOIN   entity.nodes p USING (id)
          ORDER  BY ord
          ) AS arr
       ) p ON true;
Or add `node_id` to the unnested elements of `ancestors` before we join ...
    SELECT p.arr[1] AS entity_id, p.arr[2:2147483647] AS ancestors
    FROM   tree t
    LEFT   JOIN LATERAL (
       SELECT ARRAY (
          SELECT p.entity_id
          FROM  (
             SELECT t.node_id AS id, 0 AS ord
             UNION ALL
             SELECT * FROM unnest(t.ancestors) WITH ORDINALITY
             ) x
          JOIN   entity.nodes p USING (id)
          ORDER  BY ord
          ) AS arr
       ) p ON true;
You did not show your CTE; this might be optimized further ...
With tools of the basic Postgres installation only, you might `unnest()` and count in a `LATERAL` subquery:
    SELECT i.name, i.user_ids_who_like, x.ct
    FROM   items i
         , LATERAL (
       SELECT count(*) AS ct
       FROM   unnest(i.user_ids_who_like) uid
       WHERE  uid = ANY('{3,4,11}'::int[])
       ) x
    ORDER  BY x.ct DESC;  -- add PK as tiebreaker for stable sort order
We don't need a `LEFT JOIN` to preserve rows without match, because `count()` always returns a row - 0 for "no match".
Assuming `integer` arrays without NULL values or duplicates, the intersection operator `&` of the `intarray` module would be much simpler:
    SELECT name, user_ids_who_like
         , array_length(user_ids_who_like & '{3,4,11}', 1) AS ct
    FROM   items
    ORDER  BY 3 DESC NULLS LAST;
I added `NULLS LAST` to sort empty arrays last - after the reminder from your later question:

Install `intarray` once per database for this.
Use the overlap operator `&&` in the `WHERE` clause to rule out rows without any overlap:
    SELECT ...
    FROM   ...
    WHERE  user_ids_who_like && '{3,4,11}'
    ORDER  BY ...
Why? Per documentation:

> `intarray` provides index support for the `&&`, `@>`, `<@`, and `@@` operators, as well as regular array equality.
Applies to standard array operators in a similar fashion. Details:
Alternatively and more radically, a normalized schema with a separate table instead of the array column `user_ids_who_like` would occupy more disk space, but offer simple solutions with plain btree indexes for these problems.
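For illustration, such a normalized design might look like this (table and column names are my own assumptions, not from the question):

```sql
-- one row per (item, user) pair replaces the array column
CREATE TABLE item_likes (
   item_id int NOT NULL
 , user_id int NOT NULL
 , PRIMARY KEY (item_id, user_id)
);

-- count matches per item
SELECT item_id, count(*) AS ct
FROM   item_likes
WHERE  user_id IN (3, 4, 11)
GROUP  BY item_id
ORDER  BY ct DESC;
```

An additional index on `(user_id, item_id)` could serve the query above directly from the user side.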
`WITH ORDINALITY` in Postgres 9.4 or later

The new feature simplifies this class of problems. The above query can now simply be:
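Presumably along these lines (a sketch; the sample string `'1.2.3.4'`, the delimiter and the aliases are my own):

```sql
SELECT elem, idx
FROM   unnest(string_to_array('1.2.3.4', '.')) WITH ORDINALITY x(elem, idx);
```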
Or, applied to a table:
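Applied to a hypothetical table `tbl` with a delimited text column `csv`, it might read:

```sql
SELECT t.id, x.elem, x.idx
FROM   tbl t
     , unnest(string_to_array(t.csv, '.')) WITH ORDINALITY x(elem, idx);
```

The comma before `unnest()` implies a `LATERAL` join here.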
Details:

About the implicit `LATERAL` join:

Postgres 9.3 or older - and more general explanation

For a single string
You can apply the window function `row_number()` to remember the order of elements. However, with the usual `row_number() OVER (ORDER BY col)` you get numbers according to the sort order, not the original position in the string.

You could simply omit `ORDER BY` to get the position "as is":

Performance of `regexp_split_to_table()` degrades with long strings. `unnest(string_to_array(...))` scales better:

However, while this normally works and I have never seen it break in simple queries, Postgres asserts nothing as to the order of rows without an explicit `ORDER BY`.

To guarantee ordinal numbers of elements in the original string, use `generate_subscripts()` (improved with a comment by @deszo):

For a table of strings
Add `PARTITION BY id` to the `OVER` clause ...

Demo table:
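A possible shape for that demo (names and data are my own):

```sql
CREATE TABLE strings (id int PRIMARY KEY, string text);
INSERT INTO strings VALUES
   (1, '1.2.3')
 , (2, 'a.b.c.d');

-- numbering restarts for every id
SELECT id, elem
     , row_number() OVER (PARTITION BY id) AS idx
FROM  (
   SELECT id, unnest(string_to_array(string, '.')) AS elem
   FROM   strings
   ) sub;
```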
I use `ctid` as an ad-hoc substitute for a primary key. If you have one (or any unique column), use that instead.

This works without any distinct ID:
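For instance, with `generate_subscripts()` carrying the whole row along (a sketch; assumes the hypothetical demo table `strings` with a text column `string`):

```sql
SELECT string, arr[idx] AS elem, idx
FROM  (
   SELECT string, string_to_array(string, '.') AS arr
   FROM   strings
   ) t
     , generate_subscripts(t.arr, 1) AS idx;
```

No ID is needed because the subscripts are generated laterally, per row.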
SQL Fiddle.