A lot of data in the table is only accessed for 4 days or so, how to handle it afterwards

oracletable

We have a table that accumulates messages from local air traffic. It's approximately 5k-20k data rows a day that will only be actively accessed for a week at most. Afterwards it's just a log in case of some emergency (like inspection) and basically a very dead chunk of data. Like, archaeologically dead.

What is the optimal way to deal with it? Our "DB guy" argues that "it's all indexed" and there is no harm in having millions upon millions of accumulated unused records, but I am not really convinced. Is this really a good idea to allow the table to grow without limits?

UPD: We are using Oracle 11. The access patterns are like this:

1) A couple of applications keep track of the new rows added, querying on id > lastId and indexed TM interval of last day each approximately 2-3 times a minute

2) A couple of rows per minute are added to this table and then actively parsed into other tables

3) every time a user selects a message or does a search (I guess 10-20 times a minute per app running) a query is done into this table with custom where on approx. 10 indexed fields.

Best Answer

Yes, there is a cost to having lots of cold data in actively-queried tables. One obvious case is that bigger tables have deeper indexes (i.e. more pages between the root and the leaves) so require more IO to read and write.

I'd suggest you use partitioning. Keep the active partitions in the current table and swap the others out. Ideally you'd then use a maintenance task to move the historical data to a separate DB for archive. Make it clear to your external users what the time cut-off is and allow them to query either accordingly.

Related Solutions

Oracle 11g – Best Way to Delete Very Large Recordsets

The logic with 'A' and 'B' might be "hidden" behind a virtual column on which you could do the partitioning:

alter session set nls_date_format = 'yyyy-mm-dd';
drop   table tq84_partitioned_table;

create table tq84_partitioned_table (
  status varchar2(1)          not null check (status in ('A', 'B')),
  date_a          date        not null,
  date_b          date        not null,
  date_too_old    date as
                       (  case status
                                 when 'A' then add_months(date_a, -7*12)
                                 when 'B' then            date_b
                                 end
                        ) virtual,
  data            varchar2(100) 
)
partition   by range  (date_too_old) 
( 
  partition p_before_2000_10 values less than (date '2000-10-01'),
  partition p_before_2000_11 values less than (date '2000-11-01'),
  partition p_before_2000_12 values less than (date '2000-12-01'),
  --
  partition p_before_2001_01 values less than (date '2001-01-01'),
  partition p_before_2001_02 values less than (date '2001-02-01'),
  partition p_before_2001_03 values less than (date '2001-03-01'),
  partition p_before_2001_04 values less than (date '2001-04-01'),
  partition p_before_2001_05 values less than (date '2001-05-01'),
  partition p_before_2001_06 values less than (date '2001-06-01'),
  -- and so on and so forth..
  partition p_ values less than (maxvalue)
);

insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('B', date '2008-04-14', date '2000-05-17', 
 'B and 2000-05-17 is older than 10 yrs, must be deleted');


insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('B', date '1999-09-19', date '2004-02-12', 
 'B and 2004-02-12 is younger than 10 yrs, must be kept');


insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('A', date '2000-06-16', date '2010-01-01', 
 'A and 2000-06-16 is older than 3 yrs, must be deleted');


insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('A', date '2009-06-09', date '1999-08-28', 
 'A and 2009-06-09 is younger than 3 yrs, must be kept');

select * from tq84_partitioned_table order by date_too_old;

-- drop partitions older than 10 or 3 years, respectively:

alter table tq84_partitioned_table drop partition p_before_2000_10;
alter table tq84_partitioned_table drop partition p_before_2000_11;
alter table tq84_partitioned_table drop partition p2000_12;

select * from tq84_partitioned_table order by date_too_old;

OracleBulkCopy – Functionality and Performance Optimization

A few more days of reading and experimentation and I was able to (mostly) answer a lot of these:

I found this buried in the ODP.NET documentation (ironically not in the OracleBulkCopy docs):

The ODP.NET Bulk Copy feature uses a direct path load approach, which is similar to, but not the same as Oracle SQL*Loader. Using direct path load is faster than conventional loading (using conventional SQL INSERT statements).

So it appears that it does use direct path.
This I was able to verify by doing a large bulk copy operation and getting the index properties from SQL Developer. The index did appear as UNUSABLE while the bulk copy was in progress. However, I've also discovered that OracleBulkCopy.WriteToServer will refuse to run if the index starts in an UNUSABLE state, so clearly there's more going on here, because if it were as simple as disabling and rebuilding the index then it shouldn't care about initial state.
It does make a difference specifically if the index is also a constraint. Found this little gem in the documentation linked above:
Enabled Constraints
During an Oracle bulk copy, the following constraints are automatically enabled by default:
- NOT NULL
- UNIQUE
- PRIMARY KEY (unique-constraints on not-null columns)
NOT NULL constraints are checked at column array build time. Any row that violates the NOT NULL constraint is rejected.

UNIQUE constraints are verified when indexes are rebuilt at the end of the load. The index is left in an Index Unusable state if it violates a UNIQUE constraint.
The documentation is a little hazy on what happens during the load, especially with primary keys, but one thing is absolutely certain - it behaves differently with a primary key vs. without one. Since the OracleBulkCopy will happily allow you violate index constraints (and punt the index into UNUSABLE state when it's done), my hunch is that it's building the PK index during the bulk copy but simply not validating it until afterward.
I'm not sure whether the difference observed is within Oracle itself or just a quirk of the OracleBulkCopy. The jury's still out on this one.
OracleBulkCopy will throw an exception if an index is initially in the UNUSABLE state, so it's really a moot point.
If there are other factors, indexes (and especially PK indexes) are still the most important, as I found out by:
Creating a global temporary table with the same schema (using CREATE AS), then bulk copying into the temporary table, and finally doing a plain old INSERT from the temp table into the real table. Since the temp table has no index, the bulk copy happens very fast, and the final INSERT is also fast because the data is already in a table (I haven't tried the append hint yet, since a 5M row table-to-table copy already takes less than 1 minute).

I'm not yet sure of the potential ramifications of (ab)using the temporary table space this way, but so far it hasn't given me any trouble, and it's much safer than the alternative by way of preventing corruption of either the rows or indexes.

The success of this also pretty clearly demonstrates that the PK index is the problem, as that is the only practical difference between the temp table and the permanent table - both started with zero rows during the performance tests.

Conclusion: Don't bother trying to bulk copy more than around 100k rows into an indexed Oracle table using ODP.NET. Either drop the index (if you don't actually need it) or "preload" the data into a different (non-indexed) table.

Best Answer

Related Solutions

Oracle 11g – Best Way to Delete Very Large Recordsets

OracleBulkCopy – Functionality and Performance Optimization

Related Question