How is a join performed by a database engine

database-internalsjoin;

How is a join between two tables actually performed by a database engine?

I am sure that listing one tuple against all tuples of the other table cannot be the way to perform the join; it's just a way of understanding what the output will look like. Otherwise, for two tables containing 1000 tuples each, a join would prepare an intermediate data set of 1000*1000 tuples! That's hard to believe.

Best Answer

There are a number of ways depending on what the DBMS thinks you want versus what is helped in the database.

  1. Read a row from the first table, then read any matching rows from the second table. This is the preferred method when you are requesting a very few rows and there are indexes to support the read of the second table.
  2. Matching index scan, select the required set from an index of the first table, then match this set to an index of the second table (usually after a sort), then fetch the required rows. Usually this method is used where a substantial number of rows are requested in a particular sequence.
  3. Brute force, get all the rows from the first table and sort them into the right sequence, then get all the rows from the second table and sort them into the right sequence, then merge the results. Usually this method is used when there are no usable indexes to support the join. Its a performance pig and only used where nothing else will do.

There are many variations on these three methods which vary from RDBMS to RDBMS and the more expensive commercial databases have dozens of subtle optimizations which they will use depending on the circumstances.