Postgresql – Optimze performance for LIMIT / OFFEST on collected result of multiple queries

optimizationorder-byperformancepostgresqlpostgresql-performanceunion

I have large tables in Postgres 10.4 (many millions rows) and three piped SELECT queries. The percentage of rows contributed to the total result for each SELECT is roughly:

  • SELECT 1 : ~5 % rows
  • SELECT 2 : ~5 % rows
  • SELECT 3 : ~90 % rows

I can use UNION and ORDER BY and a RANK column, but it's very slow because Postgres collects rows from all subqueries and applies ORDER BY RANK on the result set.

Is there a way to have SELECT 1 executed, and only if that does not provide enough rows SELECT 2 is executed, etc.

Best Answer

There is a general technique to achieve this, with UNION ALL and LIMIT:

SELECT 1 ...
UNION ALL
SELECT 2 ...
UNION ALL
SELECT 3 ...
LIMIT N;

Postgres evaluates nested SELECTs in order and stops as soon as enough rows have been returned. The rest is never executed.

This optimization does not happen with an outer ORDER BY, which forces Postgres to collect all candidate rows and sort before applying the LIMIT. Nor does it work for UNION (instead of UNION ALL) which also considers all rows before removing duplicates and, finally, the LIMIT.

You need parentheses around each nested SELECT that has ORDER BY or LIMIT in addition to the outer LIMIT.

Related: