PostgreSQL – Slow MAX and MIN Query in View

execution-planperformancepostgresqlpostgresql-performanceview

I have a view and it's a left join between two tables (table_a has hundreds of millions of rows, table_b has ~2 millions). The query

SELECT MIN("id") AS "min_id",MAX("id") AS "max_id" FROM "the_view" LIMIT 1

is very slow on the view (return in 30~60 seconds), but runs fast on both table.

Both table are using id as primary key. They joins together on id as well. I have the same structure in mysql, and it runs well (return in ms).

Is there anything I should set in Postgres to make this work?

Here's the table schema:

create table table_a (
id int primary key,
name varchar(40)
);

create table table_b(
id int primary key,
age_group varchar(5)
);

create view the_view as
select a.id, a.name, b.age_group
from table_a as a
left join table_b as b using(id);

Any ideas where to look into? Thank you.

Here's the explain analyze on the slow query:

psql=# explain analyze
       SELECT MIN("id") AS "min_id",MAX("id") AS "max_id" FROM "the_view" LIMIT 1;
  Limit  (cost=3296039.61..3296039.62 rows=1 width=4) (actual time=265398.380..265398.381 rows=1 loops=1)
   ->  Aggregate  (cost=3296039.61..3296039.62 rows=1 width=4) (actual time=265398.374..265398.374 rows=1 loops=1)
    ->  Index Only Scan using table_a_pkey on table_a a  (cost=0.57..2763865.85 rows=106434752 width=4) (actual time=0.158..133179.832 rows=106434713 loops=1)
        Heap Fetches: 0
Total runtime: 265398.506 ms

Best Answer

Bound to query

If you can only change the view, not the query: this is 100 % equivalent using a correlated subquery instead of the LEFT JOIN:

CREATE VIEW the_view_new AS
SELECT a.id, a.name
    , (SELECT age_group FROM table_b WHERE id = a.id) AS age_group
FROM   table_a a;

Your query as is just reads top and bottom row from the index now, IOW blazingly fast. This is a workaround, gets more complicated with more columns and may exhibit weak spots with other queries.

Better query

Your query:

SELECT MIN("id") AS "min_id",MAX("id") AS "max_id" FROM "the_view" LIMIT 1;
  • Only needs table_a. In the view, table_b is joined with a LEFT JOIN. Obviously, the query planner does realize that table_b is not needed for the result of min() and max() after a LEFT JOIN (contrary to what I assumed at first). The query plan does not mention table_b.

  • It can also only return a single row, so LIMIT 1 is only complicating matters for the query planner further, to no effect. (Seems not to be the tipping point here.)

  • The obfuscation confuses the planner enough to read the whole 100 million rows from the index (rows=106434752), while it would only need to look up first and last row. That's a whole lot of pointless work.

This is simpler, cleaner and faster, while returning the same:

SELECT MIN(id) AS min_id, MAX(id) AS max_id FROM table_a;

As you can see in the EXPLAIN ANALYZE output of this SQL Fiddle with 100k rows in table_a:

Simple query

(actual time=0.010..0.010 rows=1 loops=1)

Your query:

(actual time=0.012..400.498 rows=100000 loops=1)

This looks like a weakness of the query planner in any case. We should repeat the test with pg 9.4 and possibly file a bug report ...

Bound to use the view

If you are bound to use the view (as commented), there is a workaround to convince the query planner (id must be NOT NULL - true in your case since PK):

SELECT (SELECT id AS min_id FROM the_view ORDER BY id ASC  LIMIT 1) AS min_id
     , (SELECT id AS max_id FROM the_view ORDER BY id DESC LIMIT 1) AS max_id;

Check the query plans in the fiddle: two times rows=1.

Aside: I suggest you take a look at the chapter Identifiers and Key Words in the manual.