Oracle – WHY do an ORDER BY for INSERTs AS SELECT

best practicesinsertoracleorder-by

I have an application that was migrated from MSSQL to Oracle 11G.

It needs re-design but that's outside the scope of the project.

In short, the application gets legacy data, builds a parent table from it, then creates extract tables from those two tables. The extracts are used by Oracle BI EE as fact tables.

What I don't understand is, why would you add ORDER BY to the extract INSERTs? Is there any practical ORACLE reason, or OBIEE reason, to order INSERTs for source tables?

— everything below here is added detail:

A more detailed view of the flow of the data is:
Take legacy detail data (CHAR) and read / convert into an Oracle table (HISTORY). Table includes multiple records per individual over time, with a unique identifier per individual (OFFICER_ID).

Append some current data.

Parse the HISTORY table one OFFICER ID at a time, ordered by dates, extract some calculated data over the time for that officer and add his/her record to a separate table (OFFICER).
— basically, make a parent table from details. Both tables are indexed with separate indexes for OFFICER_ID, the table record id (PK), as well as some date fields to allow ordering by date.

From there, we just build one extract after another as separate tables, using INSERT INTO extract_xxx AS SELECT x,y,z FROM OFFICER, HISTORY, blah, blah ORDER BY OFFICER_ID .

Each extract features a join on OFFICER to HISTORY, some additional lookup tables and conditions for the particular extract.

Each extract table includes the OFFICER.OFFICER_ID, the OFFICER record number, and the HISTORY record number, and whatever additional fields. Each extract has its own sets of indexes, but all include separate indexes against the OFFICER.OFFICER_ID, the copy of the OFFICER record ID, and the copy of the HISTORY record id, and the additional fields. Each extract table INSERT has an ORDER BY.

The extract tables are fact tables for an Oracle BI EE front end for the end users.

The initial import of the HISTORY data is the only INSERT that isn't ordered. The parent table is ordered by default, because they build the OFFICER table by ordered distinct officer ID, in date order.

With the initial ordering, all of the individual table indexes, and the fact that OBIEE has its own powerful grouping, ordering, etc. capabilities, WHY would they put an ORDER BY on each extract table?

Or are these maybe left over from the original implementation? I have limited access to the original (MSSQL) authors, but I don't know if they know some of the reasons they did what they did for MSSQL — their migration was from FoxPro, which was migrated from Unix.

This was much longer than I envisioned.


Things I forgot to mention:
One reason I'm interested is to optimize the inserts.

1) This is pretty much a static database.
They will run it once a year to add another 75k records, and regenerate everything. This would normally not be a problem, but the way it runs now, it can take DAYS (well, WEEKS) to generate the extracts.

2) The database and ETL runs are also isolated — I'm not sure what the correct term is. The ETL runs on its own, no other processes are accessing the tables to write or update. Only when everything is done do they use OBIEE to see the data.

For the customer, they won't care if it takes days and days… they only update in yearly. For the developers, they need a run completed in enough time to see the results. Weeks is not an option. DAYS is not an option. And the guy they have to wrangle code (me) was supposed to be a app developer, not a code-anthropoligist/clairvoyant DBA.

I think that there is definitely something wrong with the instance, or constraints, or indexes… I had a four cycle run finishing in under 20 hours… and I thought that THAT was too long. Some developer submitted some code changes, and I found a constraint that disabled at one point and never re-enabled… and I re-enabled it. So… I'm looking for any place I can recoup the lost performance, without leaving things disabled.

Thank you for the insights… if I ever get enough time, I'd like to try several things, including running w/ straight queries w/out the ORDER BYs – and see what the difference in the ETL is, and what the effects on the OBIEE are.

Best Answer

It can make sense to insert data in order. There's lots of caveats to this though. If the data isn't frequently updated and if you're using certain types of tables or indexes (e.g. IOT, clustered indexes).

Being in order means that if you're doing a range scan of the ordered column (e.g. BETWEEN x AND Y) then the data is more likely to be in a contiguous set of blocks.