PostgreSQL – Slow MAX and MIN Query in View

execution-planperformancepostgresqlpostgresql-performanceview

I have a view and it's a left join between two tables (table_a has hundreds of millions of rows, table_b has ~2 millions). The query

SELECT MIN("id") AS "min_id",MAX("id") AS "max_id" FROM "the_view" LIMIT 1

is very slow on the view (return in 30~60 seconds), but runs fast on both table.

Both table are using id as primary key. They joins together on id as well. I have the same structure in mysql, and it runs well (return in ms).

Is there anything I should set in Postgres to make this work?

Here's the table schema:

create table table_a (
id int primary key,
name varchar(40)
);

create table table_b(
id int primary key,
age_group varchar(5)
);

create view the_view as
select a.id, a.name, b.age_group
from table_a as a
left join table_b as b using(id);

Any ideas where to look into? Thank you.

Here's the explain analyze on the slow query:

psql=# explain analyze
       SELECT MIN("id") AS "min_id",MAX("id") AS "max_id" FROM "the_view" LIMIT 1;
  Limit  (cost=3296039.61..3296039.62 rows=1 width=4) (actual time=265398.380..265398.381 rows=1 loops=1)
   ->  Aggregate  (cost=3296039.61..3296039.62 rows=1 width=4) (actual time=265398.374..265398.374 rows=1 loops=1)
    ->  Index Only Scan using table_a_pkey on table_a a  (cost=0.57..2763865.85 rows=106434752 width=4) (actual time=0.158..133179.832 rows=106434713 loops=1)
        Heap Fetches: 0
Total runtime: 265398.506 ms

Best Answer

Bound to query

If you can only change the view, not the query: this is 100 % equivalent using a correlated subquery instead of the LEFT JOIN:

CREATE VIEW the_view_new AS
SELECT a.id, a.name
    , (SELECT age_group FROM table_b WHERE id = a.id) AS age_group
FROM   table_a a;

Your query as is just reads top and bottom row from the index now, IOW blazingly fast. This is a workaround, gets more complicated with more columns and may exhibit weak spots with other queries.

Better query

Your query:

SELECT MIN("id") AS "min_id",MAX("id") AS "max_id" FROM "the_view" LIMIT 1;

Only needs table_a. In the view, table_b is joined with a LEFT JOIN. Obviously, the query planner does realize that table_b is not needed for the result of min() and max() after a LEFT JOIN (contrary to what I assumed at first). The query plan does not mention table_b.
It can also only return a single row, so LIMIT 1 is only complicating matters for the query planner further, to no effect. (Seems not to be the tipping point here.)
The obfuscation confuses the planner enough to read the whole 100 million rows from the index (rows=106434752), while it would only need to look up first and last row. That's a whole lot of pointless work.

This is simpler, cleaner and faster, while returning the same:

SELECT MIN(id) AS min_id, MAX(id) AS max_id FROM table_a;

As you can see in the EXPLAIN ANALYZE output of this SQL Fiddle with 100k rows in table_a:

Simple query

(actual time=0.010..0.010 rows=1 loops=1)

Your query:

(actual time=0.012..400.498 rows=100000 loops=1)

This looks like a weakness of the query planner in any case. We should repeat the test with pg 9.4 and possibly file a bug report ...

Bound to use the view

If you are bound to use the view (as commented), there is a workaround to convince the query planner (id must be NOT NULL - true in your case since PK):

SELECT (SELECT id AS min_id FROM the_view ORDER BY id ASC  LIMIT 1) AS min_id
     , (SELECT id AS max_id FROM the_view ORDER BY id DESC LIMIT 1) AS max_id;

Check the query plans in the fiddle: two times rows=1.

Aside: I suggest you take a look at the chapter Identifiers and Key Words in the manual.

Best Answer

Bound to query

Better query

Bound to use the view

Related Solutions

Postgresql – Optimizing ORDER BY in a full text search query

Postgresql – How to speed up a Postgres query containing lots of Joins with an ILIKE condition

Related Question