Are there significant differences with 500+ million row tables in Oracle

oracle-11g-r2

I am a database designer in a data warehouse environment. I am used to dealing with tables of at most 1 million rows and am now faced with tables of more than half a billion rows. Are there any significant differences with the tools in the "efficiency toolbox"? Can I trust my previous knowledge of indexes, partitions, and the like, or are some of these familiar tools more of a hindrance than a help with data this large? Any other tips for dealing with such tables?

(Already found a great post on Updating 700 million rows to same value)

Best Answer

The fundamentals of indexing etc. all work in exactly the same way, so strictly speaking the only difference is the cost of getting this wrong!

That said, here's a (not necessarily complete) list of things worth bearing in mind:

  • B-tree indexes are likely to have an extra level in them, so the cost of using them is slightly higher. However, in a DW you should generally be using bitmap indexes (assuming you've got Enterprise Edition) - see the bitmap index sketch after this list.
  • It'll take a lot longer to gather stats for the whole table - to the point where it may not be possible in the normal overnight window. This can be mitigated by (see the DBMS_STATS sketch after this list):
    • Using a smaller estimate_percent when gathering stats, so less of the table is sampled.
    • Using incremental stats gathering (only relevant for partitioned tables, where global stats can be derived from per-partition synopses instead of re-scanning the whole table).
  • Histograms are limited to 254 buckets (they are gathered on columns, not indexes). More rows likely means more distinct values, so "nearly popular" values can be a bigger issue with skewed data.
  • The chance that your whole table will fit in the buffer cache approaches nil, meaning you're likely to see many more physical (disk) reads. Your normal working set may also be too large to be cached.
  • Partitioning can be your friend - if you get it right! If your typical statements modify and query data across many partitions, it can cost you more than a plain table (see the partitioning sketch after this list).
  • Materialized views can be very useful for reducing your working set. E.g. if you have 10+ years' worth of data but the vast majority of user queries only touch the past 2 years, an MV limited to just that data can be a big help (see the materialized view sketch after this list).
  • The bigger the database, the less likely it is the business will (be able to) fund a test database that is a full duplicate of the live environment. This makes it harder to reproduce performance issues in test, as slow queries may be due to the scale and/or physical storage of the data. You can't count on being able to extrapolate query performance from a much smaller test DB to the corresponding behaviour in live.
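
A minimal sketch of the bitmap index point above. The table and column names (sales_fact, channel_id) are made up for illustration, and the LOCAL clause assumes the table is partitioned:

    -- Bitmap indexes suit low-cardinality dimension keys in a DW (Enterprise Edition only)
    CREATE BITMAP INDEX sales_fact_channel_bix
        ON sales_fact (channel_id)
        LOCAL;   -- one index segment per table partition; omit LOCAL if the table isn't partitioned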
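
For the stats point, something along these lines (the DW schema and SALES_FACT table are hypothetical; AUTO_SAMPLE_SIZE lets Oracle choose the sample size, and the INCREMENTAL preference only matters for partitioned tables):

    BEGIN
        -- Maintain global stats from per-partition synopses instead of re-scanning everything
        DBMS_STATS.SET_TABLE_PREFS(
            ownname => 'DW',
            tabname => 'SALES_FACT',
            pname   => 'INCREMENTAL',
            pvalue  => 'TRUE');

        DBMS_STATS.GATHER_TABLE_STATS(
            ownname          => 'DW',
            tabname          => 'SALES_FACT',
            estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
            granularity      => 'AUTO',
            degree           => 8);   -- gather in parallel to fit the batch window
    END;
    /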
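
For partitioning, a range/interval layout keyed on the date most queries filter on is the usual starting point (column names again made up). Queries with a sale_date predicate then only touch the relevant monthly partitions:

    CREATE TABLE sales_fact (
        sale_id    NUMBER,
        sale_date  DATE NOT NULL,
        channel_id NUMBER,
        amount     NUMBER(12,2)
    )
    PARTITION BY RANGE (sale_date)
    INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))   -- new monthly partitions created automatically
    (
        PARTITION p_initial VALUES LESS THAN (DATE '2010-01-01')
    );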
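
And for the materialized view point, a rough sketch of a summary limited to recent data (the cut-off is a literal here purely for illustration; in practice you'd roll it forward as part of regular maintenance):

    CREATE MATERIALIZED VIEW sales_recent_mv
        BUILD IMMEDIATE
        REFRESH COMPLETE ON DEMAND     -- refresh after each load
        ENABLE QUERY REWRITE           -- lets the optimizer use it transparently
    AS
    SELECT channel_id,
           TRUNC(sale_date, 'MM') AS sale_month,
           SUM(amount)            AS total_amount
    FROM   sales_fact
    WHERE  sale_date >= DATE '2010-01-01'
    GROUP  BY channel_id, TRUNC(sale_date, 'MM');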

If you're not already comfortable reading and understanding execution plans, I'd spend some time learning them: you're bound to run into performance issues at some point, and knowing how to diagnose the problem correctly becomes more important when adding new indexes or making schema changes is expensive at these row counts.
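
As a starting point, EXPLAIN PLAN plus DBMS_XPLAN is enough to check whether partition pruning and the expected indexes are being used (the query below is just an example against the hypothetical sales_fact table):

    EXPLAIN PLAN FOR
    SELECT channel_id, SUM(amount)
    FROM   sales_fact
    WHERE  sale_date >= DATE '2011-01-01'
    GROUP  BY channel_id;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

    -- For the last statement actually executed, including run-time row counts
    -- (needs the GATHER_PLAN_STATISTICS hint or STATISTICS_LEVEL = ALL):
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));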