Postgresql – Filtering UNION ALL result is much slower than filtering each subquery

performancepostgresqlpostgresql-performanceunionview

(edit: see end for a simpler example)

I'm searching in a table named "cases" (135k rows, 29 columns). Some of the rows in this table have a type of parent-child relationship (of different types), which means that for these records a mix of parent/child fields must be used for filtering and display.

I have identified four different parent-child relationships and created views for them:

  • caselist_no_specials: not a child record, use record data as is; 116106 rows total.
  • caselist_disputes_with_ipr: child record; 138 rows total.
  • caselist_mark_children: child record; 18132 rows total.
  • caselist_design_children: child record; 671 rows total.

The results from these views do not overlap and together cover 100% of the table.

When I select a union of all of them, and separately filter each view, the query takes about 9 ms. Selecting a union of all views and filtering the result of that takes about 500 ms.

I have also tested this without views, inlining the queries they contain, which did not produce a measurable improvement.

This is the fast query (explain):

  SELECT  c.*
    FROM  caselist_no_specials c
    JOIN  case_clients cacl ON cacl.case_id = c.main_id
   WHERE  cacl.client_id = 12046

   UNION ALL

  SELECT  c.*
    FROM  caselist_disputes_with_ipr c
    JOIN  case_clients cacl ON cacl.case_id = c.main_id
   WHERE  cacl.client_id = 12046

   UNION ALL

  SELECT  c.*
    FROM  caselist_mark_children c
    JOIN  case_clients cacl ON cacl.case_id = c.main_id
   WHERE  cacl.client_id = 12046

   UNION ALL

  SELECT  c.*
    FROM  caselist_design_children c
    JOIN  case_clients cacl ON cacl.case_id = c.main_id
   WHERE  cacl.client_id = 12046

ORDER BY  sort_nr,
          id;

As you can see, the join and filter is duplicated for each view. Trying to avoid the duplication yielded this query, which takes a lot longer (explain):

  SELECT  x.*
    FROM  (
              SELECT * FROM caselist_no_specials
              UNION ALL
              SELECT * FROM caselist_disputes_with_ipr
              UNION ALL
              SELECT * FROM caselist_mark_children
              UNION ALL
              SELECT * FROM caselist_design_children
          ) x
    JOIN  case_clients cacl ON cacl.case_id = x.main_id
   WHERE  cacl.client_id = 12046
ORDER BY  x.sort_nr,
          x.id;

Is it possible to somehow let PostgreSQL know that the filter/join on the outer query can be applied to the inner subquery?

Or is there any other way to avoid filtering each view separately? The user-facing form for this query has over 20 filter fields, and there can be JOINs with up to 14 additional tables.

PostgreSQL is version 9.4.7 running on Linux.


EDIT: I've created a much simplified example, simply partitioning the original table using 3 views, but not involving other tables (and the query plans for both variants). In retrospect, this is what I should have used as an example in the first place.

Best Answer

What you're asking the DB to do in Query one is: Give me ALL from table A FILTERED Give me ALL from table B FILTERED Give me ALL from table C FILTERED Give me ALL from table D FILTERED And then Union.

In the second query you first get all the data, and only after that you do the join and the filter. JOIN and WHERE on a UNION query, which doesn't really enable you to index anything, obviously runs slower. (It has nothing to do with the Server variant or the OS).