How to control the order of a query regarding to inner join

informixjoin;performancequery-performance

I have 3 tables.

They have

table_a 5m records

table_b 20m records

table_c 20m records

I'm performing the query below.

SELECT * FROM table_a a 
INNER JOIN table_b b ON a.a_id = b.a_id AND b.b_come='100' 
INNER JOIN table_c c ON b.c_id = c.c_id AND c.c_type IN ('A','B')
WHERE date(a.a_dt) = '01/01/2012 00:00:00.000'

The query is quite slow. I think it is because the data in its nature they have

about 1000 records of a.a_id = b.a_id

and

about 20m*20m records of b.c_id = c.c_id.

I think I should get the result of a.a_id = b.a_id then do the one of b.c_id = c.c_id so that the performance could be a lot better. But it seems INNER JOIN has no concept of order. How should I do this to improve the performance?

Best Answer

The easiest way to influence a query plan and make it run faster is to have good indexes and ensure the statistic on those indexes are up-to-date. In your case have indexes on the primary and foreign keys in each table. It would be best if you included b_come and c_type in their respecitive indeces, too. (Sorry, I'm not strong on Informix; if the syntax allows the include clause or filtered indexes investigate and implement these.)

Wrapping the date() function around a_dt generally precludes any use of an index on that column. If makes it "non-SARGable" in the jargon. It may be better to structure it as

WHERE a.a_dt >= '01/01/2012 00:00:00.000'
and   a.a_dt <  '01/02/2012 00:00:00.000' -- or whatever the day after 1/1 is in your locale.

Bound to query

If you can only change the view, not the query: this is 100 % equivalent using a correlated subquery instead of the LEFT JOIN:

CREATE VIEW the_view_new AS
SELECT a.id, a.name
    , (SELECT age_group FROM table_b WHERE id = a.id) AS age_group
FROM   table_a a;

Your query as is just reads top and bottom row from the index now, IOW blazingly fast. This is a workaround, gets more complicated with more columns and may exhibit weak spots with other queries.

Better query

Your query:

SELECT MIN("id") AS "min_id",MAX("id") AS "max_id" FROM "the_view" LIMIT 1;

Only needs table_a. In the view, table_b is joined with a LEFT JOIN. Obviously, the query planner does realize that table_b is not needed for the result of min() and max() after a LEFT JOIN (contrary to what I assumed at first). The query plan does not mention table_b.
It can also only return a single row, so LIMIT 1 is only complicating matters for the query planner further, to no effect. (Seems not to be the tipping point here.)
The obfuscation confuses the planner enough to read the whole 100 million rows from the index (rows=106434752), while it would only need to look up first and last row. That's a whole lot of pointless work.

This is simpler, cleaner and faster, while returning the same:

SELECT MIN(id) AS min_id, MAX(id) AS max_id FROM table_a;

As you can see in the EXPLAIN ANALYZE output of this SQL Fiddle with 100k rows in table_a:

Simple query

(actual time=0.010..0.010 rows=1 loops=1)

Your query:

(actual time=0.012..400.498 rows=100000 loops=1)

This looks like a weakness of the query planner in any case. We should repeat the test with pg 9.4 and possibly file a bug report ...

Bound to use the view

If you are bound to use the view (as commented), there is a workaround to convince the query planner (id must be NOT NULL - true in your case since PK):

SELECT (SELECT id AS min_id FROM the_view ORDER BY id ASC  LIMIT 1) AS min_id
     , (SELECT id AS max_id FROM the_view ORDER BY id DESC LIMIT 1) AS max_id;

Check the query plans in the fiddle: two times rows=1.

Aside: I suggest you take a look at the chapter Identifiers and Key Words in the manual.

Best Answer

Related Solutions

Sql-server – Forcing Join Order

PostgreSQL – Slow MAX and MIN Query in View

Bound to query

Better query

Bound to use the view

Related Question