How is a join performed by a database engine


How is a join between two tables actually performed by a database engine?

I am sure that pairing each tuple with every tuple of the other table cannot be how the join is actually performed; that is just a way of understanding what the output will look like. Otherwise, for two tables containing 1000 tuples each, a join would build an intermediate data set of 1000*1000 = 1,000,000 tuples! That's hard to believe.

Best Answer

There are a number of ways, depending on what the DBMS thinks you want and what is held in the database (in particular, which indexes exist).

1. Read a row from the first table, then read any matching rows from the second table. This is the preferred method when you are requesting very few rows and there are indexes to support the read of the second table.
2. Matching index scan: select the required set from an index of the first table, then match this set to an index of the second table (usually after a sort), then fetch the required rows. This method is usually used when a substantial number of rows are requested in a particular sequence.
3. Brute force: get all the rows from the first table and sort them into the right sequence, then get all the rows from the second table and sort them into the right sequence, then merge the results. This method is usually used when there are no usable indexes to support the join. It's a performance pig and is only used when nothing else will do.
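To make the three methods concrete, here is a minimal Python sketch over in-memory lists of (key, payload) tuples. It is an illustration under simplifying assumptions, not how any particular engine is implemented: a dict stands in for a B-tree index, Python's sorted() stands in for the engine's sort, merge_join over pre-sorted lists approximates the index-ordered input of method 2, and all function names are hypothetical. It also speaks to the concern in the question: even the naive nested loop compares every pair without ever storing the cross product.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row layout: (join_key, payload).
Row = Tuple[int, str]
Pair = Tuple[Row, Row]


def nested_loop_join(outer: Iterable[Row], inner: List[Row]) -> Iterator[Pair]:
    """Naive nested loop: compare each outer row with every inner row.

    Pairs are yielded one at a time, so the cross product is never stored;
    only matching pairs ever leave the function.
    """
    for o in outer:
        for i in inner:
            if o[0] == i[0]:
                yield o, i


def build_index(rows: Iterable[Row]) -> Dict[int, List[Row]]:
    """Stand-in for an index on the join key: key -> rows with that key."""
    index: Dict[int, List[Row]] = defaultdict(list)
    for r in rows:
        index[r[0]].append(r)
    return index


def index_nested_loop_join(outer: Iterable[Row],
                           inner_index: Dict[int, List[Row]]) -> Iterator[Pair]:
    """Method 1: for each outer row, read only the matching inner rows."""
    for o in outer:
        for i in inner_index.get(o[0], []):
            yield o, i


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Pair]:
    """Merge two inputs already sorted on the key (method 2 gets this order
    from the indexes; method 3 has to sort first)."""
    li = ri = 0
    while li < len(left) and ri < len(right):
        lk, rk = left[li][0], right[ri][0]
        if lk < rk:
            li += 1
        elif lk > rk:
            ri += 1
        else:
            # Find the run of right rows that share this key...
            run_end = ri
            while run_end < len(right) and right[run_end][0] == lk:
                run_end += 1
            # ...and pair it with every left row that has the same key.
            while li < len(left) and left[li][0] == lk:
                for r in right[ri:run_end]:
                    yield left[li], r
                li += 1
            ri = run_end


def sort_merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Pair]:
    """Method 3 ("brute force"): sort both inputs on the key, then merge."""
    def by_key(r: Row) -> int:
        return r[0]
    return merge_join(sorted(left, key=by_key), sorted(right, key=by_key))
```

With two 1000-row tables joined on a unique key, nested_loop_join makes 1,000,000 comparisons but hands back only the 1000 matching pairs; index_nested_loop_join replaces the inner scan with one lookup per outer row, and sort_merge_join needs just one pass over the two inputs once they are sorted.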

There are many variations on these three methods, which differ from RDBMS to RDBMS, and the more expensive commercial databases have dozens of subtle optimizations that they apply depending on the circumstances.

Related Solutions

Postgresql join – too long when no results found

I would create an index on controller_monitor_reading (monitor_id, timestamp); that alone should be enough.

If it isn't, I'd use a LATERAL join:

SELECT
    T.id
    ,T.timestamp
    ,T.value
FROM
    controller_monitor
    INNER JOIN LATERAL
    (
        SELECT
            controller_monitor_reading.id
            ,controller_monitor_reading.timestamp
            ,controller_monitor_reading.value
        FROM
            controller_monitor_reading
        WHERE
            controller_monitor.id = controller_monitor_reading.monitor_id
            AND controller_monitor_reading.timestamp < '2015-08-13T13:54:35.139702'::timestamp
        ORDER BY
            controller_monitor_reading.timestamp DESC LIMIT 1
    ) AS T ON true
WHERE
    controller_monitor.tag = 'DPVESELAYA.PP14905.BOXDOOR'

Since controller_monitor.tag is unique, there should be one index seek on the controller_monitor table plus one seek on the controller_monitor_reading table using the index on (monitor_id, timestamp).
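The one-seek-per-table claim can be checked in miniature. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, with a simplified, assumed schema, an invented second monitor, invented readings, and a hypothetical index name. SQLite has no LATERAL, so the per-monitor "latest reading before the cutoff" is rewritten as a correlated scalar subquery, and SQLite's EXPLAIN QUERY PLAN text is formatted differently from PostgreSQL's EXPLAIN.

```python
import sqlite3

# Assumed, simplified schema modelled on the tables named in the answer.
SCHEMA = """
CREATE TABLE controller_monitor (
    id  INTEGER PRIMARY KEY,
    tag TEXT NOT NULL UNIQUE
);
CREATE TABLE controller_monitor_reading (
    id         INTEGER PRIMARY KEY,
    monitor_id INTEGER NOT NULL REFERENCES controller_monitor (id),
    timestamp  TEXT NOT NULL,
    value      REAL
);
-- The composite index the answer recommends (index name is hypothetical).
CREATE INDEX idx_reading_monitor_ts
    ON controller_monitor_reading (monitor_id, timestamp);
"""

# No LATERAL in SQLite: a correlated scalar subquery picks the latest reading
# before the cutoff, which lets it seek on (monitor_id, timestamp).
LATEST_READING = """
SELECT r.id, r.timestamp, r.value
FROM controller_monitor AS m
JOIN controller_monitor_reading AS r
  ON r.id = (SELECT r2.id
             FROM controller_monitor_reading AS r2
             WHERE r2.monitor_id = m.id
               AND r2.timestamp < :cutoff
             ORDER BY r2.timestamp DESC
             LIMIT 1)
WHERE m.tag = :tag
"""


def make_db() -> sqlite3.Connection:
    """Build an in-memory database with a few invented readings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO controller_monitor (id, tag) VALUES (?, ?)",
        [(1, "DPVESELAYA.PP14905.BOXDOOR"), (2, "HYPOTHETICAL.OTHER.TAG")],
    )
    conn.executemany(
        "INSERT INTO controller_monitor_reading VALUES (?, ?, ?, ?)",
        [
            (10, 1, "2015-08-13T13:00:00", 1.0),
            (11, 1, "2015-08-13T13:50:00", 0.0),
            (12, 1, "2015-08-13T14:00:00", 1.0),  # after the cutoff
            (20, 2, "2015-08-13T13:53:00", 5.0),
        ],
    )
    return conn


def latest_reading(conn: sqlite3.Connection, tag: str, cutoff: str) -> list:
    return conn.execute(LATEST_READING, {"tag": tag, "cutoff": cutoff}).fetchall()


def query_plan(conn: sqlite3.Connection, tag: str, cutoff: str) -> list:
    rows = conn.execute("EXPLAIN QUERY PLAN " + LATEST_READING,
                        {"tag": tag, "cutoff": cutoff}).fetchall()
    return [row[-1] for row in rows]  # last column is the readable detail
```

In PostgreSQL itself, running EXPLAIN on the LATERAL query above is the way to confirm that the plan uses the (monitor_id, timestamp) index rather than scanning the readings.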

Update

I'm not familiar with partitioned tables, but it looks like your main table controller_monitor_reading is partitioned into controller_monitor_reading_p2015_03, controller_monitor_reading_p2015_04, etc. Make sure the index on (monitor_id, timestamp) exists on every one of these tables.
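With inheritance-style partitioning (the only kind PostgreSQL offered before version 10), each child table is a separate table with its own indexes, so the composite index has to be created on every partition. A small sketch that generates those statements, assuming the partition names are collected by hand (only the first two appear in the question) and using a hypothetical index-naming scheme:

```python
from typing import Iterable, List


def partition_index_ddl(partitions: Iterable[str]) -> List[str]:
    """Build one CREATE INDEX statement per child table.

    The "<table>_monitor_ts_idx" naming scheme is hypothetical; adjust it to
    local conventions.
    """
    return [
        f"CREATE INDEX {table}_monitor_ts_idx ON {table} (monitor_id, timestamp);"
        for table in partitions
    ]


# Only the two partitions named in the question; extend the list with the rest.
for stmt in partition_index_ddl([
    "controller_monitor_reading_p2015_03",
    "controller_monitor_reading_p2015_04",
]):
    print(stmt)
```

On PostgreSQL 11 or later with declarative partitioning, creating the index once on the partitioned parent table creates matching indexes on all partitions, so a per-partition loop is unnecessary there.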

DB2 LUW: how to influence query planner’s choice of join

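Hash joins require an "equijoin predicate". So I rewrote the query as an explicit join (instead of IN (subselect)), and instead of using A.key = B.key as the join condition, I used A.key > B.key - 1 AND A.key < B.key + 1. Assuming integer keys, this matches exactly the same rows as A.key = B.key, but because it is not an equijoin predicate, the optimizer can no longer choose a hash join.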
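Why the rewrite steers the optimizer can be seen from the shape of a hash join. The Python sketch below is a generic in-memory hash join, not DB2's implementation, with hypothetical function names and (key, payload) rows: the probe phase can only look up an exact key, so a predicate such as A.key > B.key - 1 AND A.key < B.key + 1 gives it nothing to hash on and has to be evaluated some other way (here, pair by pair). For integer keys both predicates select the same pairs; for fractional keys they do not.

```python
from collections import defaultdict
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[float, str]  # hypothetical layout: (key, payload)
key_of = itemgetter(0)


def hash_join(build: Iterable[Row], probe: Iterable[Row],
              key: Callable[[Row], float] = key_of) -> Iterator[Tuple[Row, Row]]:
    """Equijoin only: hash the build side, then look up each probe key exactly."""
    table = defaultdict(list)
    for b in build:                      # build phase
        table[key(b)].append(b)
    for p in probe:                      # probe phase: exact-key lookups only
        for b in table.get(key(p), ()):
            yield b, p


def range_predicate_join(a_rows: Iterable[Row],
                         b_rows: List[Row]) -> Iterator[Tuple[Row, Row]]:
    """A.key > B.key - 1 AND A.key < B.key + 1, checked pair by pair
    (a real engine might instead use an index range scan)."""
    for a in a_rows:
        for b in b_rows:
            if b[0] - 1 < a[0] < b[0] + 1:
                yield a, b
```

The design point is that hashing only preserves equality: a range or inequality condition cannot be turned into a single bucket lookup, which is why removing the equality predicate rules the hash join out.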
