If you really only rarely have to modify these data, you can simply store the result of the CTE in a table and run queries against that table. You can define indexes based on your typical queries, then TRUNCATE and repopulate (and ANALYZE) as necessary.
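A minimal sketch of that approach, assuming `tree_view_1` is the recursive view from the question and `parent_id` is one of its columns (adapt names to your schema):

```sql
-- Materialize the result of the recursive view once
CREATE TABLE tree_cache AS
SELECT * FROM tree_view_1;

-- Index for your typical queries
CREATE INDEX ON tree_cache (parent_id);

-- Refresh whenever the underlying data changes
TRUNCATE tree_cache;
INSERT INTO tree_cache SELECT * FROM tree_view_1;
ANALYZE tree_cache;
```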
On the other hand, if you can put the CTE in a separate stored procedure rather than a view, you can easily put your conditions in the CTE part rather than in the final SELECT (which is basically what you do when querying against tree_view_1), so that far fewer rows are involved in the recursion. From the query plan it looks like PostgreSQL estimates row counts based on some far-from-true assumptions, probably producing suboptimal plans - this effect can be reduced somewhat with the SP solution.
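A hedged sketch of that idea: a set-returning function whose parameter pushes the condition into the non-recursive term of the CTE instead of filtering after the full recursion. The `nodes` table and its columns are assumptions, since the schema is not shown in the question:

```sql
-- Sketch: filter inside the CTE, not in the final SELECT
CREATE FUNCTION tree_from(_root int)
RETURNS TABLE (id int, parent_id int) AS
$$
   WITH RECURSIVE tree AS (
      SELECT n.id, n.parent_id
      FROM   nodes n
      WHERE  n.id = _root            -- condition lands in the recursive CTE

      UNION ALL
      SELECT n.id, n.parent_id
      FROM   nodes n
      JOIN   tree t ON n.parent_id = t.id
   )
   SELECT t.id, t.parent_id FROM tree t;
$$ LANGUAGE sql STABLE;

SELECT * FROM tree_from(1);
```

This way the recursion only ever visits descendants of the requested node.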
EDIT: I may be missing something, but I just noticed that you don't filter the rows in the non-recursive term. You probably want to include only root nodes there (WHERE parent_id IS NULL) - I'd expect far fewer rows and recursions this way.
EDIT 2: As it slowly became clear to me from the comments, I had misread the recursion in the original question as going the other way. Here I mean starting from the root nodes and going deeper into the recursion.
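A sketch of what the filtered non-recursive term could look like, again with an assumed `nodes` table:

```sql
WITH RECURSIVE tree AS (
   SELECT id, parent_id, 1 AS depth
   FROM   nodes
   WHERE  parent_id IS NULL          -- start only from root nodes

   UNION ALL
   SELECT n.id, n.parent_id, t.depth + 1
   FROM   nodes n
   JOIN   tree t ON n.parent_id = t.id
)
SELECT * FROM tree;
```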
WITH ORDINALITY in Postgres 9.4 or later
The new feature simplifies this class of problems. The above query can now simply be:
SELECT *
FROM regexp_split_to_table('I think Postgres is nifty', ' ') WITH ORDINALITY x(word, rn);
Or, applied to a table:
SELECT *
FROM tbl t, regexp_split_to_table(t.my_column, ' ') WITH ORDINALITY x(word, rn);
Note that the comma in the FROM list above is an implicit LATERAL join: the set-returning function is applied to each row of tbl.
Postgres 9.3 or older - and more general explanation
For a single string
You can apply the window function row_number() to remember the order of elements. However, with the usual row_number() OVER (ORDER BY col) you get numbers according to the sort order, not the original position in the string.
You could simply omit ORDER BY to get the position "as is":
SELECT *, row_number() OVER () AS rn
FROM regexp_split_to_table('I think Postgres is nifty', ' ') AS x(word);
Performance of regexp_split_to_table() degrades with long strings. unnest(string_to_array(...)) scales better:
SELECT *, row_number() OVER () AS rn
FROM unnest(string_to_array('I think Postgres is nifty', ' ')) AS x(word);
However, while this normally works and I have never seen it break in simple queries, Postgres asserts nothing as to the order of rows without an explicit ORDER BY.
To guarantee ordinal numbers of elements in the original string, use generate_subscripts() (improved with a comment by @deszo):
SELECT arr[rn] AS word, rn
FROM (
SELECT *, generate_subscripts(arr, 1) AS rn
FROM string_to_array('I think Postgres is nifty', ' ') AS x(arr)
) y;
For a table of strings
Add PARTITION BY id to the OVER clause ...
Demo table:
CREATE TEMP TABLE strings(string text);
INSERT INTO strings VALUES
('I think Postgres is nifty')
,('And it keeps getting better');
I use ctid as an ad-hoc substitute for a primary key. If you have one (or any unique column), use that instead.
SELECT *, row_number() OVER (PARTITION BY ctid) AS rn
FROM (
SELECT ctid, unnest(string_to_array(string, ' ')) AS word
FROM strings
) x;
This works without any distinct ID:
SELECT arr[rn] AS word, rn
FROM (
SELECT *, generate_subscripts(arr, 1) AS rn
FROM (
SELECT string_to_array(string, ' ') AS arr
FROM strings
) x
) y;
SQL Fiddle.
Answer to question
SELECT z.arr, z.rn, z.word, d.meaning -- , partofspeech -- ?
FROM (
SELECT *, arr[rn] AS word
FROM (
SELECT *, generate_subscripts(arr, 1) AS rn
FROM (
SELECT string_to_array(string, ' ') AS arr
FROM strings
) x
) y
) z
JOIN dictionary d ON d.wordname = z.word
ORDER BY z.arr, z.rn;
Best Answer
The problem with your query is the join condition id = ANY(ancestors). Not only does it not preserve the original order, it also eliminates duplicate elements in the array. (An id could match 10 elements in ancestors, but it would still be picked only once.) I am not sure whether the logic of your query allows duplicate elements, but if it does, I am pretty sure you want to preserve all instances - you want to keep the "original order" after all.

Assuming current Postgres 9.4+ for lack of information, I suggest a completely different approach:
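The original query of that approach is not reproduced here; as a sketch only, unnest(...) WITH ORDINALITY (Postgres 9.4+) preserves both the array order and duplicate elements, where the table and column names below are assumptions:

```sql
SELECT n.*
FROM   tbl t
CROSS  JOIN LATERAL
       unnest(t.ancestors) WITH ORDINALITY AS a(node_id, ord)
JOIN   nodes n ON n.id = a.node_id
ORDER  BY a.ord;   -- original array order, duplicates preserved
```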
Your query only works as intended if nodes.id is defined as the primary key and nodes.entity_id is unique as well. That information is missing in the question.

Normally, this simpler query without an explicit ORDER BY works as well, but there are no guarantees (Postgres 9.3+). You can make this safe as well.
SQL Fiddle demo for Postgres 9.3.
Optional optimization
You join to entity.nodes twice - to substitute for node_id and ancestors alike. An alternative would be to fold both into one array or one set and join only once. It might be faster, but you have to test.

For these alternatives we need the ORDER BY in any case:

Add node_id to the ancestors array before we unnest ...

Or add node_id to the unnested elements of ancestors before we join ...

You did not show your CTE; this might be optimized further ...
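Hedged sketches of the two fold variants (the original queries were not shown, so table and column names are assumptions):

```sql
-- Variant 1: append node_id to the ancestors array before unnesting
SELECT n.*
FROM   tbl t
CROSS  JOIN LATERAL
       unnest(t.ancestors || t.node_id) WITH ORDINALITY AS a(node_id, ord)
JOIN   entity.nodes n ON n.id = a.node_id
ORDER  BY a.ord;

-- Variant 2: add node_id as one more row after the unnested elements
SELECT n.*
FROM   tbl t
CROSS  JOIN LATERAL (
   SELECT a.node_id, a.ord
   FROM   unnest(t.ancestors) WITH ORDINALITY AS a(node_id, ord)
   UNION ALL
   SELECT t.node_id, cardinality(t.ancestors) + 1
) a
JOIN   entity.nodes n ON n.id = a.node_id
ORDER  BY a.ord;
```

Either way, each row joins to entity.nodes only once instead of twice; whether that actually wins depends on your data, so test with EXPLAIN ANALYZE.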