Most effective architecture to apply multiple functions

oracle-11g-r2

I have a table that is the only source for a second table. Currently the second table is created using a CTAS (create table as select) statement that pulls in all the columns from the first table and adds a dozen more columns computed by functions we have written. These functions take a column or two from the source table and transform them into new data elements.

In some cases, the columns that are used as parameters for the functions are sparsely populated. The functions are compiled natively and use the result cache. The first thing each function does is a null check on its input parameters, returning NULL immediately if any is null.

I have a few questions:

  • I have full control over the architecture of the source table (as I create it in an earlier process). Is there anything I can do with the source tables (index, partitions, etc.) that will help the CTAS run more efficiently?
  • Should I be using a CTAS at all, or would an UPDATE be more efficient? (My gut here says no way, but I'd like some validation.)
  • Is there something else I can do to the functions to make them more efficient?
  • Is there anything else I should be thinking about?

Best Answer

I have full control over the architecture of the source table (as I create it in an earlier process). Is there anything I can do with the source tables (index, partitions, etc.) that will help the CTAS run more efficiently?

Optimizing a CTAS is generally the same as optimizing the SELECT part of the CTAS. Run an explain plan on it to see what it's doing. The CREATE part just sequentially writes the output of the SELECT, so it is rarely the cause of slowness.
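As a quick way to inspect the SELECT, you can run it through EXPLAIN PLAN and display the result with DBMS_XPLAN (the table, column, and function names below are placeholders for your own):

    EXPLAIN PLAN FOR
    SELECT s.*,
           my_func_1(s.col_a)          AS derived_1,
           my_func_2(s.col_b, s.col_c) AS derived_2
    FROM   source_table s;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

If the plan shows a single full scan of the source table, the SELECT is already about as cheap as it can get and the function calls themselves are the remaining cost.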

Should I be using a CTAS at all, or would an UPDATE be more efficient? (My gut here says no way, but I'd like some validation.)

It depends on the size of the UPDATE. If the target table is indexed and you have few updates, then the amount of work to do the UPDATE would be less. Specifics matter here, as there's no general rule for which is faster. For an extreme example, if you have 100M rows and you're only changing one row via an indexed UPDATE, the UPDATE would clearly be faster.

In the end what counts is the total amount of I/O operations that will happen (both reads and writes). You can either analyze it by studying the EXPLAIN plans or test both approaches to get stats on which takes longer.
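To make the comparison concrete, here is a sketch of the two approaches side by side (table, column, and function names are hypothetical stand-ins for your own schema):

    -- CTAS: rebuild the target in one pass; direct-path writes, minimal undo
    CREATE TABLE target_table AS
    SELECT s.*,
           my_func(s.col_a) AS derived_1
    FROM   source_table s;

    -- UPDATE: touch only the rows that need the derived value;
    -- better when few rows change, but generates undo/redo per row
    UPDATE target_table t
    SET    t.derived_1 = my_func(t.col_a)
    WHERE  t.col_a IS NOT NULL;

Timing both against your real data volumes is the most reliable way to settle it.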

Is there something else I can do to the functions to make them more efficient?

Well, you could remove them entirely. Switching between SQL and PL/SQL is fairly quick and isn't really much of a bottleneck if it's in the SELECT columns. If it's part of the WHERE clause and you're filtering a column on the result of a PL/SQL function, it can be very slow because function results are usually not indexed (though Oracle does allow indexes on deterministic functions).

If it makes logical sense, you can mark the function(s) as DETERMINISTIC. This tells Oracle that the result will always be the same for the same inputs (like a mathematical function, e.g. x^2). This means Oracle can cache the results between function calls. That may speed things up.
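A minimal sketch of what that declaration looks like, assuming a function of your own (the name, parameter, and body here are placeholders; you already mentioned using RESULT_CACHE, which can be combined with DETERMINISTIC):

    CREATE OR REPLACE FUNCTION my_func (p_val IN NUMBER)
      RETURN NUMBER
      DETERMINISTIC
      RESULT_CACHE
    IS
    BEGIN
      -- early exit on null input, as in your existing functions
      IF p_val IS NULL THEN
        RETURN NULL;
      END IF;
      RETURN p_val * p_val;  -- placeholder transformation
    END my_func;
    /

Only mark a function DETERMINISTIC if it truly depends on nothing but its inputs (no table reads, no sysdate, no session state); Oracle does not verify the claim, and a wrong declaration can silently return stale results.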

You can also try wrapping the function calls in a CASE expression so the function is only called when the input is not null. Again, I don't think it makes much difference since SQL-to-PL/SQL context switches are fairly fast, but it's something to try out:

CASE
    WHEN my_column IS NULL THEN NULL
    ELSE my_func(my_column)
END AS result_column

Is there anything else I should be thinking about?

If the functions are querying lookup tables (e.g. fn_country_name(country_cd IN VARCHAR2)), then consider replacing them with outer joins to the lookup tables directly. That should be faster than a function call, because the optimizer can use a hash join instead of the row-by-row lookups the function would perform.
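For example, a lookup function like the hypothetical fn_country_name above could be replaced by joining the lookup table directly (the countries table and its columns are assumptions standing in for your real lookup table):

    -- Instead of: SELECT fn_country_name(s.country_cd) AS country_name ...
    SELECT s.*,
           c.country_name
    FROM   source_table s
    LEFT   JOIN countries c
           ON  c.country_cd = s.country_cd;

The LEFT JOIN preserves rows with a null or unmatched country_cd, mirroring the NULL your function returns in those cases.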

Finally if things are really slow/confusing it's good to take a step back and see what the process is supposed to do end to end. You may need to go back a step in your processing chain to make things more efficient.