Capturing Table Changes without killing the server

bulk-insert, oracle, snapshot

I have an application that uses Oracle. Among many other things, this application handles timesheet entry. Supervisors enter their employees' hours, allocating them to charge codes under the appropriate rate (straight or overtime) – and save the timesheet.

We need to report (upload) hour data to a third party each day. If hours are changed after they have been uploaded, we need to submit an adjustment upload with the delta.

At first I thought I'd handle this with triggers. Add a timestamp column to the timecard table, and every time we upload, update the timestamp of all unstamped timecards. Then the trigger would fire whenever an insert or update takes place, and if there is a timestamp, it would take the delta between the old and new values and store it in a new table – which we upload as adjustments. In my head, this was nice and tidy.

However, upon closer examination of the data (looking at ROWIDs), I realized that whenever ANY value in a given timesheet changes and a SAVE is executed, the application issues a DELETE followed by an INSERT. So any timestamps would be wiped out – and all I'd manage to do is balloon my adjustments table with a bunch of useless data.

So then I started to think that if I created a snapshot of the timecard table every time the hours are reported, I could simply compare the timecard table to the snapshot and obtain the adjustments. Then I'd refresh the snapshot after the adjustments are pulled.

The snapshot option worries me. On my test server, where I'm trying this stuff out, I only have 2,500 records to play with – in production it's nearly a million. Plus, I'm not certain that deleting all records and then doing an INSERT INTO on a table that size is really smart.
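
If I did go the duplicate-table route, I assume a TRUNCATE plus a direct-path insert would be kinder to the server than row-by-row deletes – something like this (the copy's name is just an example):

-- Refresh the full copy with minimal undo/redo overhead: TRUNCATE is DDL
-- (it commits), and the APPEND hint requests a direct-path load
TRUNCATE TABLE timecardsnap;

INSERT /*+ APPEND */ INTO timecardsnap
SELECT * FROM timecard;

COMMIT;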

I looked into CREATE SNAPSHOT, as I figured the snapshot log would be more efficient than creating a duplicate table – but I didn't see that you could query the snapshot the same way you can a regular table.
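
For reference, what I was looking at is roughly this (names and refresh options are only illustrative):

-- The snapshot log records changed rows so the snapshot can be fast-refreshed
CREATE SNAPSHOT LOG ON timecard WITH PRIMARY KEY;

CREATE SNAPSHOT timecard_mv
  REFRESH FAST ON DEMAND
  AS SELECT * FROM timecard;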

Another option would be to copy the timecard table structure to a new table (timecardsnap), create a trigger on the timecard table (for each row), and then introduce some logic to check whether a row has been exported yet – if it has, insert the delta; if it hasn't, update the existing data. This presents a problem if the charge code is changed, however, as I'd end up duplicating values. And given it's a DELETE / INSERT – two separate operations – it's not easy to compare the old vs. new values.

What would be the best way of tackling this problem that wouldn't negatively impact server performance?

Edit: Oracle 10g.

UPDATE – Jan 3 – 5pm MST

I started playing with the idea of a temporary table. I basically broke the activity on the table down by operation type and asked myself what action should be performed for each:

  • On DELETE – we need to back out all the hours that were just deleted.
  • On INSERT – we need to add the new hours to the next export, minus what has already been uploaded. We also need to make sure no previous adds that haven't yet been exported are left behind.

To accomplish this I create two tables: one to store PENDING changes and one to store EXPORTED data. The PENDING table changes as rows are added and deleted in the application, whereas the EXPORTED table remains static.

I use the term "split" to refer to our key fields – EMPID, WORKDAY, COSTCODE, DEPT – since they basically define the level of granularity at which we record the values (hours).
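
A minimal sketch of the two tables (the table names and column types are guesses; HOURS is the only measure I'm tracking):

-- Hours waiting to go out in the next adjustment export
CREATE TABLE timecard_pending (
  empid    NUMBER        NOT NULL,
  workday  DATE          NOT NULL,
  costcode VARCHAR2(20)  NOT NULL,
  dept     VARCHAR2(20)  NOT NULL,
  hours    NUMBER(7,2)   NOT NULL
);

-- Hours already reported to the third party, rolled up by split
CREATE TABLE timecard_exported (
  empid    NUMBER        NOT NULL,
  workday  DATE          NOT NULL,
  costcode VARCHAR2(20)  NOT NULL,
  dept     VARCHAR2(20)  NOT NULL,
  hours    NUMBER(7,2)   NOT NULL
);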

So now that I have the two tables, I create a trigger (sketched below, after the two lists). The trigger essentially says:

If DELETE operation:

  • Multiply all hours just deleted (OLD) by -1
  • Add hours to PENDING table.

If INSERT operation:

  • DELETE all hours from the PENDING table that match the same split
  • Total the hours from the EXPORTED table that match the same split.
  • Add the delta between the newly inserted hours (NEW) and the already-exported hours to the PENDING table.
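
In PL/SQL the trigger comes out roughly like this – a sketch only; TIMECARD is my real table, but the PENDING/EXPORTED table names and the HOURS column are placeholders:

-- Rough sketch of the row-level trigger described above
CREATE OR REPLACE TRIGGER trg_timecard_adj
  AFTER INSERT OR DELETE ON timecard
  FOR EACH ROW
DECLARE
  v_exported NUMBER;
BEGIN
  IF DELETING THEN
    -- back out the hours that were just removed
    INSERT INTO timecard_pending (empid, workday, costcode, dept, hours)
    VALUES (:OLD.empid, :OLD.workday, :OLD.costcode, :OLD.dept, :OLD.hours * -1);
  ELSIF INSERTING THEN
    -- drop any pending rows for this split that were never exported
    DELETE FROM timecard_pending
     WHERE empid = :NEW.empid AND workday = :NEW.workday
       AND costcode = :NEW.costcode AND dept = :NEW.dept;

    -- how much has already gone out for this split?
    SELECT NVL(SUM(hours), 0) INTO v_exported
      FROM timecard_exported
     WHERE empid = :NEW.empid AND workday = :NEW.workday
       AND costcode = :NEW.costcode AND dept = :NEW.dept;

    -- queue the difference between the new hours and what was already exported
    INSERT INTO timecard_pending (empid, workday, costcode, dept, hours)
    VALUES (:NEW.empid, :NEW.workday, :NEW.costcode, :NEW.dept,
            :NEW.hours - v_exported);
  END IF;
END;
/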

Then, whenever an export occurs, we simply roll up all the hours in the PENDING table by split and append them to the EXPORTED table.
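
The export itself then boils down to something like this (same placeholder names; I'm assuming the PENDING table is emptied once its rows have been appended):

-- Roll up pending hours by split and append them to the exported table
INSERT INTO timecard_exported (empid, workday, costcode, dept, hours)
SELECT empid, workday, costcode, dept, SUM(hours)
FROM   timecard_pending
GROUP  BY empid, workday, costcode, dept;

-- Clear the queue so these hours aren't picked up again next time
DELETE FROM timecard_pending;

COMMIT;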

That SHOULD do the trick. I'll also write an UPDATE trigger just in case the application does do updates on the TIMECARD table somewhere else… just to be certain.

UPDATE – Jan 3 – 5:45pm MST

I read up on flashback and got a test query working – it's fast, and I like the fact that there's no chance for anything to be missed. I think I'm going to go with this for the FINAL solution, but in the meantime I'll implement what I have. I'll set up another table to record the TIMESTAMPS of when the data is exported; this way, if and when the DBA increases the flashback space available, I'll already have a record of my export points.
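
The test query is along these lines – just a sketch, rolling hours up by split; in practice the AS OF timestamp would come from my table of export points rather than the one-day offset used here:

-- Net change per split between the table now and as of the last export point
SELECT NVL(cur.empid, prev.empid)       AS empid,
       NVL(cur.workday, prev.workday)   AS workday,
       NVL(cur.costcode, prev.costcode) AS costcode,
       NVL(cur.dept, prev.dept)         AS dept,
       NVL(cur.hours, 0) - NVL(prev.hours, 0) AS delta
FROM  (SELECT empid, workday, costcode, dept, SUM(hours) AS hours
         FROM timecard
        GROUP BY empid, workday, costcode, dept) cur
FULL OUTER JOIN
      (SELECT empid, workday, costcode, dept, SUM(hours) AS hours
         FROM timecard AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' DAY)
        GROUP BY empid, workday, costcode, dept) prev
  ON  prev.empid    = cur.empid
  AND prev.workday  = cur.workday
  AND prev.costcode = cur.costcode
  AND prev.dept     = cur.dept
WHERE NVL(cur.hours, 0) <> NVL(prev.hours, 0);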

Best Answer

Just from a logical standpoint, you're going to have to keep two copies of the data, or a log of each change (including the deletes/inserts). You can use the tables/triggers like you said, or use flashback query (which will work on 11g, by the way, assuming all the archive logs for the day are still online – and, in my experience, it's not that expensive a task, depending on the query).

You can also approach it a little differently. Although this method is more complex, it might be faster to some extent, especially if there's some limit on which rows are "in scope" – meaning that you can put in some kind of rule to ignore the vast majority of the rows in the table (like a rule that hours more than a month old cannot be updated, or something similar).

You create a second table that has all the rows that are "in scope" for the update – it's just a plain old copy of the other table as it sits right now.
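
For example – assuming your WORKDAY column drives the scope rule, and a one-month window purely for illustration:

-- build the copy from whatever rows are currently "in scope"
create table t2 as
select *
  from t1
 where workday >= add_months(trunc(sysdate), -1);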

When your job runs, it first deletes records from that table that are now "out of scope" and then runs a query that gets the edited/deleted rows:

-- t1 = live timecard table, t2 = the "in scope" copy; driving from t2 so
-- rows deleted from t1 still show up
select t1.a as a1, t2.a as a2, t1.b as b1, t2.b as b2
from t2
left join t1 on t1.id = t2.id
...

Start building your file with this info. Be sure to UPDATE t2 with the new values (and perhaps delete the records that were deleted).
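
That sync could be a MERGE along these lines (columns a and b stand in for the real ones):

-- bring the copy up to date for rows that still exist in the live table
merge into t2
using t1
   on (t2.id = t1.id)
 when matched then
   update set t2.a = t1.a, t2.b = t1.b;

-- and drop rows from the copy that no longer exist in the live table
delete from t2
 where not exists (select 1 from t1 where t1.id = t2.id);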

Then you run another query that gets the inserted rows. Add this info to your file as well, and likewise insert those rows into table 2.
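
Something like this would cover that step:

-- rows present in the live table but not yet in the copy
select t1.*
  from t1
 where not exists (select 1 from t2 where t2.id = t1.id);

-- ...and the same predicate to bring the copy up to date
insert into t2
select t1.*
  from t1
 where not exists (select 1 from t2 where t2.id = t1.id);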

In the end, it's a coin flip whether or not this is faster – but at least it'll run when you can schedule it. The flashback query is basically the same thing, with Oracle handling it for you.

It's much simpler and perhaps less error-prone to just use flashback query, in my opinion.