Inserting to table: Insert rows with Left Outer Join or Insert rows then Update

optimizationperformance

I have a table of Orders and another table of Deliveries. The Orders table will contain a reference to the Deliveries table if there is a corresponding row.
Not every row in Orders will have a row in Deliveries (digital media, warranties, etc), but if an order does have a delivery, it will have only one. I am trying to combine data from these two tables into a third table (OrderInformation) to be used for reporting.

I can create the final table one of two ways:

  • Inserting all orders into the OrderInformation table first and then updating only the rows that have a matching Deliveries record in a second statement.

    OR

  • Inserting all values into the table at once using a LEFT OUTER JOIN between Orders and Deliveries.

I tested this out on my own data and the 'LEFT OUTER JOIN' query consistently outperformed the 'UPDATE' query by 200%, which surprised me.

Which option would perform better in general? And why does it perform better that way? Is performance contingent on join conditions, the type of data involved, etc?

Best Answer

When Orders and Deliveries are both properly indexed, the second query will generally perform better. There are two main considerations here:

Generally speaking, and without seeing the details of your tables - what probably happens with the INSERT/UPDATE construct is that it will first allocate a number of pages in the database in the initial INSERT. But when you UPDATE those records, the added new information has to fit inside those existing rows in the OrderInformation table's pages, which causes page splits, which are very I/O intensive and can fragment your table and/or indexes.

The second query (single INSERT using LEFT JOIN) is considerably better practice, since it minimizes the locking requirements. Imagine if another simultaneous process creates a new row in Orders and then a related row in Deliveries between the time of the INSERT and the UPDATE statements in the first example - you'll get a foreign key constraint violation unless you've put all of this in a potentially large transaction, which itself implies a bit of overhead.