I have a view and it's a left join between two tables (table_a has hundreds of millions of rows, table_b has ~2 millions). The query
SELECT MIN("id") AS "min_id",MAX("id") AS "max_id" FROM "the_view" LIMIT 1
is very slow on the view (return in 30~60 seconds), but runs fast on both table.
Both table are using id as primary key. They joins together on id as well. I have the same structure in mysql, and it runs well (return in ms).
Is there anything I should set in Postgres to make this work?
Here's the table schema:
create table table_a (
id int primary key,
name varchar(40)
);
create table table_b(
id int primary key,
age_group varchar(5)
);
create view the_view as
select a.id, a.name, b.age_group
from table_a as a
left join table_b as b using(id);
Any ideas where to look into? Thank you.
Here's the explain analyze on the slow query:
psql=# explain analyze
SELECT MIN("id") AS "min_id",MAX("id") AS "max_id" FROM "the_view" LIMIT 1;
Limit (cost=3296039.61..3296039.62 rows=1 width=4) (actual time=265398.380..265398.381 rows=1 loops=1)
-> Aggregate (cost=3296039.61..3296039.62 rows=1 width=4) (actual time=265398.374..265398.374 rows=1 loops=1)
-> Index Only Scan using table_a_pkey on table_a a (cost=0.57..2763865.85 rows=106434752 width=4) (actual time=0.158..133179.832 rows=106434713 loops=1)
Heap Fetches: 0
Total runtime: 265398.506 ms
Best Answer
Bound to query
If you can only change the view, not the query: this is 100 % equivalent using a correlated subquery instead of the
LEFT JOIN
:Your query as is just reads top and bottom row from the index now, IOW blazingly fast. This is a workaround, gets more complicated with more columns and may exhibit weak spots with other queries.
Better query
Your query:
Only needs
table_a
. In the view,table_b
is joined with aLEFT JOIN
. Obviously, the query planner does realize thattable_b
is not needed for the result ofmin()
andmax()
after aLEFT JOIN
(contrary to what I assumed at first). The query plan does not mentiontable_b
.It can also only return a single row, so
LIMIT 1
is only complicating matters for the query planner further, to no effect. (Seems not to be the tipping point here.)The obfuscation confuses the planner enough to read the whole 100 million rows from the index (
rows=106434752
), while it would only need to look up first and last row. That's a whole lot of pointless work.This is simpler, cleaner and faster, while returning the same:
As you can see in the EXPLAIN ANALYZE output of this SQL Fiddle with 100k rows in
table_a
:Simple query
Your query:
This looks like a weakness of the query planner in any case. We should repeat the test with pg 9.4 and possibly file a bug report ...
Bound to use the view
If you are bound to use the view (as commented), there is a workaround to convince the query planner (
id
must be NOT NULL - true in your case since PK):Check the query plans in the fiddle: two times
rows=1
.Aside: I suggest you take a look at the chapter Identifiers and Key Words in the manual.