PostgreSQL – Algorithm for Offset/Limit Over UNION ALL Result Set

database-internalsexecution-planpostgresqlpostgresql-9.4

I have two result sets:

rs1:

     id       name
   serial     text
________________
     1        Nick
....................
  1233112     Pete

rs2:

     id       name
   serial     text
________________
  123121      Mike
....................
 221233112   Junior

If we write a query:

SELECT *
FROM
(

    SELECT *
    FROM rs1

    UNION ALL

    SELECT *
    FROM rs2

) as rs
OFFSET 100000 LIMIT 10;

How the result of the query computed? What algorithm is going to be used?

I believe that the server performs lazy evaluation in the sense of not loading the entire union, iterating over it and then returning the required result.

I would be glad if you described algorithms using in other SQL servers as an addition.

UPD: I'm quite new in database internals and don't know if it's possible to ask the sql-server itself about the exectuin plan of the query.

Here is a query plan:

"Limit  (cost=77.11..77.88 rows=10 width=522)"
"  ->  Append  (cost=0.00..140.18 rows=1818 width=522)"
"        ->  Seq Scan on tbl  (cost=0.00..70.09 rows=909 width=522)"
"        ->  Seq Scan on tbl tbl_1  (cost=0.00..70.09 rows=909 width=522)"

The saqme query with order by id:

"Limit  (cost=33.16..36.42 rows=10 width=522)"
"  ->  Merge Append  (cost=0.56..593.16 rows=1818 width=522)"
"        Sort Key: tbl.id"
"        ->  Index Scan Backward using pk_tabl on tbl (cost=0.28..285.21 rows=909 width=522)"
"        ->  Index Scan Backward using pk_tbl on tbl tbl_1  (cost=0.28..285.21 rows=909 width=522)"

Best Answer

When limiting clauses like OFFSET are applied depends on the query, particularly what the result is ordered by.

If you are ordering by columns that are also being used to filter the results (in a WHERE clause for instance) and are indexed then the limitation will often be applied early. The query planner will try to find just the first rows that match the search condition. Otherwise the limit is usually applied after everything else, so the server is likely to spool the results into a temporary store then sort and apply your OFFSET/LIMIT filter.

For a UNION query on its own, it depends on the brains in the query planner so you might have to look at its plan output to know (I'm not a postgres expert), and again it may depend on the rest of the query details. However the work is done (which will affect performance) the output will be the same though: the limitation will be applied to the complete result.

In the example you give the UNION query is actually providing a derived table, named rs. The limitation will definitely be applied to the whole result here so it is functionally the same as if you had done SELECT * FROM <some_table> OFFSET 100000 LIMIT 10. Because they are the same you could rewrite your example as simply:

SELECT * FROM rs1
UNION ALL
SELECT * FROM rs2
OFFSET 100000 LIMIT 10;

and the results would be the same.

One important note: you have not specified any ordering clause here. This may be because you are giving a minimal example so forgive me if I'm telling you something you already know, but without an ordering clause you should assume the output order to be arbitrary, possibly random, and likely to change between runs of the same query. Sometimes this is fine (you just want the data and you don't care about the order) but as soon as you are using things like OFFSET/LIMIT/TOP/... you do care about the order because the query after OFFSET 10000 LIMIT 10 is likely to be OFFSET 10010 LIMIT 10 and you want that to definitely want ten different rows by some ordering (not an arbitrary ten that could in theory end up being the same ten as last time). In practise the ordering of output is usually more stable than this, but it isn't always by any means so you should always provide an ordering clause if a sudden change in output order would cause undesirable behaviour.