Why is an Oracle Package Variable Intermittently Incorrect after Multiple Updates

debuggingoracleoracle-11gplsqltrigger

I am supporting an application that runs on Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production.

We have a table that stores hourly data that uses triggers call a package to sync to a monthly table that stores a total amount.

When the hourly table is updated, the on update trigger saves the old amount to a table package variable in this format: primary key fields, amount.

TYPE HOURLY_CHANGE_TAB IS TABLE OF HOURLY_REC INDEX BY BINARY_INTEGER;

TYPE HOURLY_REC IS RECORD (
      KEY       NUMBER(10),
      SUB_KEY   NUMBER,
      MONTH     VARCHAR2(6),
      AMOUNT NUMBER);

Amount is set to -old_amount. It then adds the new amount to amount. It does this for each hour record that gets updated. So if there are 120 hours in a month, it would subtract the old amount 120 times from the variable, and add the new amount 120 times to get the total change. An after statement trigger or manual call would update the months record with this total and clear the package variable. When we are updating a large number of hours, we set a flag to disable the month update and then manually call the update method once at the end for performance. Regardless, the issue has already happened by the time this method gets called.

When I update a deal that runs for multiple years, the total amount is off by new amount – old amount in a random number of months. So if old amount was 100 per hour and new amount is 10 per hour then the amount we push to the month record is -90 what it should be. The majority of the months would be correct. I have seen zero, one, two, and four months broken after each update / rollback testing run. Each time I run the exact same update and then rollback after checking for the error.

I logged every time the package variable changes to a new table and saw the following:
LOGGED DATA

KEY and SUB_KEY are keys in this example. Month is the monthly record our hour is part of. The amount column shows the amount to be added to the package variable. Index is the index of the table variable that we are updating for this row. Starting amount it the value of the amount column at the start of logging the update. Calc Amount is what the package variable should store after we add the amount to the starting amount. Ending Amount is the value retrieved from the package variable after the update is saved. Row index is the incrementing unique sequence in the logging table to keep track of the order in which the updates ran. The correct amount is a window function summing the amount ordered by row_index, partitioned by the keys, to show what the package variable should store on every row.

Notice what happens between rows 2 and 3. The ending amount on 2 is the correct value of 540. The starting amount on 3 is 550, shifted by new volume(20) – old volume(10). This relationship held true for an update of 100 to 10, the shifted amount was -90, not 10. And yet there is no gap in the row index. We go straight from 16188 to 16189 without a gap. I created a copy of the package and removed everything but the method to update the package variable. There is no other path to update the table variable except via the method that is logged. The package variable is defined within the package body so no other package can update it, and I deleted all other methods in the package but the one to update it. I can't fathom how this variable is changing between the end of one method and the start of the next.

CREATE OR REPLACE PACKAGE BODY TEST_PKG AS

   PKG_HOUR_COUNT  NUMBER := 0;
   HOUR_TAB            HOURLY_CHANGE_TAB ;
...

Again, the number of months affected by this issue is random, and they are all broken by the same amount, one month isn't offsetting another by some off by one error.

Note that I have endeavored to reduce the following code to a minimum version of the code. Table names and columns have been changed. If I attempt to run the trigger code in a loop versus running an update the issue does not happen.

I was able to create a new version of the package, delete everything not related to these calls and reproduce the issue and was successful. I tried to create a mini copy of the table with just the rows I needed and may have been able to reproduce the issue once. I know that's vague but it hasn't worked since on the mini copied table.

SUB_KEY is not used in the package variable since it only defines the hours in this table so all hours with the same KEY and SUB_KEy will be in the same month, regardless of SUB_SUB_KEY.

Table Definition:

CREATE TABLE ORDER_HOURS
(KEY NUMBER(12,0) NOT NULL ENABLE, 
    SUB_KEY NUMBER(6,0) NOT NULL ENABLE, 
    SUB_SUB_KEY NUMBER(3,0) NOT NULL ENABLE, 
    GMT_TIME DATE NOT NULL, 
    DAY DATE NOT NULL ENABLE, 
    HOUR VARCHAR2(4 BYTE) NOT NULL ENABLE, 
    AMOUNT NUMBER(10,3) NOT NULL ENABLE, 
    CONFIRMED_FLAG VARCHAR2(1 BYTE) DEFAULT 'N' NOT NULL ENABLE,        
     CONSTRAINT HOUR_PK PRIMARY KEY (KEY, SUB_KEY, SUB_SUB_KEY, GMT_TIME)
  USING INDEX 
   ) PARTITION BY RANGE ("DAY") INTERVAL (NUMTOYMINTERVAL(3,'MONTH'))
   ( PARTITION p0 VALUES LESS THAN (TO_DATE('1-1-2007', 'DD-MM-YYYY')),
      PARTITION p1 VALUES LESS THAN (TO_DATE('1-1-2008', 'DD-MM-YYYY')),
      PARTITION p2 VALUES LESS THAN (TO_DATE('1-7-2009', 'DD-MM-YYYY')),
      PARTITION p3 VALUES LESS THAN (TO_DATE('1-1-2010', 'DD-MM-YYYY')) ); 

Trigger:

CREATE OR REPLACE TRIGGER UPDATE_HOUR BEFORE
  UPDATE OF AMOUNT
    ON ORDER_HOURS
    FOR EACH ROW    
    BEGIN 
        IF :NEW.CONFIRMED_FLAG = 'Y' AND :OLD.CONFIRMED_FLAG = 'Y' 
            THEN RAISE_APPLICATION_ERROR(-20914, 'You cannot Update.');
        ELSE
            IF :NEW.CONFIRMED_FLAG = 'A' THEN
                :NEW.CONFIRMED_FLAG := 'Y';
            END IF;
            IF TEST_PKG.GET_UPDATE_HOURS_FLAG = 'Y' THEN
                TEST_PKG.UPDATE_HOURS(:OLD.KEY, :NEW.KEY, :OLD.SUB_KEY, :NEW.SUB_KEY, :OLD.SUB_SUB_KEY, :NEW.SUB_SUB_KEY, :OLD.DAY, :NEW.DAY, :OLD.AMOUNT, :NEW.AMOUNT);
            END IF;
        END IF;
    END;
/

Logging Table:

CREATE TABLE HOURS_DATA(
KEY   number,
SUB_KEY   number,
MONTH   VARCHAR2(4000),
AMOUNT NUMBER,
VARIABLE_INDEX NUMBER,
STARTING_AMOUNT NUMBER,
CALC_AMOUNT NUMBER,
ENDING_AMOUNT NUMBER,
ROW_INDEX NUMBER,
CREATE_DATE DATE
);

Trimed down package, only portion remaining that touches HOUR_TAB:

CREATE OR REPLACE PACKAGE TEST_PKG AS

UPDATE_HOURS_FLAG VARCHAR2(1) DEFAULT 'Y';

FUNCTION GET_UPDATE_HOURS_FLAG RETURN VARCHAR2;

PROCEDURE UPDATE_HOURS(
   OLD_KEY NUMBER,
   NEW_KEY NUMBER,
   OLD_SUB_KEY NUMBER,
   NEW_SUB_KEY NUMBER,
   OLD_SUB_SUB_KEY NUMBER,
   NEW_SUB_SUB_KEY NUMBER,
   OLD_DAY DATE,
   NEW_DAY DATE,
   OLD_AMOUNT NUMBER,
   NEW_AMOUNT NUMBER);

 PROCEDURE UPDATE_HOUR_TAB (
   KEY  NUMBER,
   SUB_KEY NUMBER,
   MONTH VARCHAR2,
   AMOUNT NUMBER);

PROCEDURE FLUSH_HOUR_TAB;

TYPE HOURLY_REC IS RECORD (
      KEY       NUMBER(10),
      SUB_KEY   NUMBER,
      MONTH     VARCHAR2(6),
      AMOUNT NUMBER);   

TYPE HOURLY_CHANGE_TAB IS TABLE OF HOURLY_REC INDEX BY BINARY_INTEGER;      

END TEST_PKG;
/

CREATE OR REPLACE PACKAGE BODY TEST_PKG AS

   PKG_HOUR_COUNT  NUMBER := 0;
   HOUR_TAB            HOURLY_CHANGE_TAB ;

FUNCTION GET_UPDATE_HOURS_FLAG RETURN VARCHAR2 IS

BEGIN
   RETURN UPDATE_HOURS_FLAG;
END GET_UPDATE_HOURS_FLAG;

PROCEDURE UPDATE_HOUR_TAB (
   KEY  NUMBER,
   SUB_KEY NUMBER,
   MONTH VARCHAR2,
   AMOUNT NUMBER) IS

   CNT NUMBER;
   STARTING_AMOUNT number := 0;
   CALC_AMOUNT number := 0;

BEGIN

   CNT := HOUR_TAB.FIRST;

   WHILE CNT IS NOT NULL LOOP
      EXIT WHEN HOUR_TAB(CNT).KEY = KEY AND HOUR_TAB(CNT).SUB_KEY = SUB_KEY AND HOUR_TAB(CNT).MONTH = MONTH;
      CNT := HOUR_TAB.NEXT(CNT);
   END LOOP;

   IF CNT IS NULL THEN
      PKG_HOUR_COUNT := PKG_HOUR_COUNT + 1;
      HOUR_TAB(PKG_HOUR_COUNT).KEY := KEY;
      HOUR_TAB(PKG_HOUR_COUNT).SUB_KEY := SUB_KEY;
      HOUR_TAB(PKG_HOUR_COUNT).MONTH := MONTH;
      HOUR_TAB(PKG_HOUR_COUNT).AMOUNT := AMOUNT;
   ELSE
      STARTING_AMOUNT := HOUR_TAB(CNT).AMOUNT;
      CALC_AMOUNT := HOUR_TAB(CNT).AMOUNT + AMOUNT;
      HOUR_TAB(CNT).AMOUNT := HOUR_TAB(CNT).AMOUNT + AMOUNT;      
   END IF;

   IF CNT IS NULL THEN
    CNT := PKG_HOUR_COUNT;
    END IF;
   INSERT INTO HOURS_DATA
   VALUES(KEY,
   SUB_KEY,
   MONTH,
   AMOUNT,
   CNT,
   STARTING_AMOUNT,
   CALC_AMOUNT,
   HOUR_TAB(CNT).AMOUNT,
   (SELECT COUNT(*) FROM HOURS_DATA),
   SYSDATE);

END UPDATE_HOUR_TAB;

PROCEDURE UPDATE_HOURS(
   OLD_KEY NUMBER,
   NEW_KEY NUMBER,
   OLD_SUB_KEY NUMBER,
   NEW_SUB_KEY NUMBER,
   OLD_SUB_SUB_KEY NUMBER,
   NEW_SUB_SUB_KEY NUMBER,
   OLD_DAY DATE,
   NEW_DAY DATE,
   OLD_AMOUNT NUMBER,
   NEW_AMOUNT NUMBER) IS

BEGIN

      UPDATE_HOUR_TAB (
         OLD_KEY,
         OLD_SUB_KEY,
         TO_CHAR(OLD_DAY,'YYYYMM'),
         -OLD_AMOUNT);      

      UPDATE_HOUR_TAB (
         NEW_KEY,
         NEW_SUB_KEY,
         TO_CHAR(NEW_DAY,'YYYYMM'),
         NEW_AMOUNT); 

END UPDATE_HOURS;

PROCEDURE FLUSH_HOUR_TAB IS
   CNT NUMBER;

BEGIN

   CNT := HOUR_TAB.FIRST;

   WHILE CNT IS NOT NULL LOOP   
      CNT := HOUR_TAB.NEXT(CNT);
      --UPDATE MONTHS, DATA ALREADY BROKEN AT THIS POINT
   END LOOP;

   HOUR_TAB.DELETE;
   PKG_HOUR_COUNT := 0;

END FLUSH_HOUR_TAB;

END TEST_PKG;
/

Insert Script for test data:

DECLARE
    ORDER_KEY NUMBER := 10;
    SUB_KEY NUMBER := 0;
    SUB_SUB_KEY NUMBER := 1;
    V_START_TIME TIMESTAMP WITH TIME ZONE := '01-FEB-19 09.00.00 AM UTC';
    V_END_TIME TIMESTAMP WITH TIME ZONE := '31-DEC-22 08.00.00 AM UTC';
    V_CURRENT_TIME TIMESTAMP WITH TIME ZONE;
    V_DAY DATE;
    V_HOUR NUMBER(2);
BEGIN    
    V_CURRENT_TIME := V_START_TIME;
    WHILE V_CURRENT_TIME <= V_END_TIME LOOP
    --DBMS_OUTPUT.PUT_LINE(V_CURRENT_TIME);

    V_DAY := CAST(V_CURRENT_TIME
                AT TIME ZONE ('US/PACIFIC') AS DATE);                
    V_HOUR := TO_CHAR(V_DAY, 'HH24') + 1;

    --EXCLUDE WEEKENDS                
    IF MOD(TO_CHAR(V_DAY, 'J'), 7) + 1 NOT IN(6, 7) THEN
        --DBMS_OUTPUT.PUT_LINE(V_DAY);
        INSERT INTO ORDER_HOURS
        VALUES(ORDER_KEY, SUB_KEY, SUB_SUB_KEY, V_CURRENT_TIME, TRUNC(V_DAY), V_HOUR, 10, 'N');
    END IF;

    V_CURRENT_TIME := V_CURRENT_TIME + INTERVAL '1' HOUR;

END LOOP;    
    COMMIT;
END;
/

Update I ran – only causes error on original table:

DECLARE
    P_AMOUNT NUMBER := 20;
    P_KEY NUMBER := 10;
BEGIN

FOR UPDATE_HOUR IN(SELECT * FROM ORDER_HOURS
                    WHERE KEY = P_KEY                       
                    AND DAY >= '1-MAR-20'
                    ) LOOP                           
                                    UPDATE ORDER_HOURS
                                    SET AMOUNT = P_AMOUNT
                                    WHERE KEY = P_KEY
                                        AND DAY = UPDATE_HOUR.DAY
                                        AND HOUR = UPDATE_HOUR.HOUR
                                        AND (AMOUNT <> P_AMOUNT);
--Running the trigger code directly has never worked
--            IF :NEW.CONFIRMED_FLAG = 'Y' AND :OLD.CONFIRMED_FLAG = 'Y' 
--                THEN RAISE_APPLICATION_ERROR(-20914, 'You cannot Update.');
--            ELSE
--                IF :NEW.CONFIRMED_FLAG = 'A' THEN
--                    :NEW.CONFIRMED_FLAG := 'Y';
--                END IF;
--                IF TEST_PKG.GET_UPDATE_HOURS_FLAG = 'Y' THEN
--                    TEST_PKG.UPDATE_HOURS(:OLD.KEY, :NEW.KEY, :OLD.SUB_KEY, :NEW.SUB_KEY, :OLD.SUB_SUB_KEY, :NEW.SUB_SUB_KEY, :OLD.DAY, :NEW.DAY, :OLD.AMOUNT, :NEW.AMOUNT);
--                END IF;
--            END IF;                                                        
    END LOOP;
    --DOES THE MONTHLY UPDATE BASED ON PACKAGE VARIABLE AND CLEARS THE VARIABLE
    TEST_PKG.FLUSH_HOUR_TAB;
END;
/

Code to check logs for errors:

SELECT HD.*
FROM (
SELECT HD.*, SUM(AMOUNT) OVER(PARTITION BY KEY, SUB_KEY, MONTH ORDER BY ROW_INDEX) CORRECT_AMOUNT, COUNT(*) OVER(PARTITION BY SUB_KEY) SEQ_COUNT
FROM HOURS_DATA HD
) HD
WHERE HD.ENDING_AMOUNT <> CORRECT_AMOUNT
ORDER BY ROW_INDEX;

I have checked to see if there are more rows than there should be with an the same index or sub_key but the number of sub_key rows and index_rows is always the same. Right now my only hypothesis is a bug in Oracle. I am happy to provide more data or code if needed, but since I have not been able to duplicate this on another table, I am not sure how to help someone else reproduce it themselves.

The Order Hours table has 294,574,145 rows, filling 138 partitions, and using 7045 MB for data. When I run the attached code against the mini table ORDER_HOUR, it does not error out. When I only change the table name to the actual table with all the data, I do get the error. This makes me suspect that this is a bug with Oracle around triggers on large partitioned tables since the only difference is the table the trigger runs against. This makes it more difficult for someone to test against and try to reproduce if the issue is table specific. Any suggestions around cause or additional traces to run is appreciated.

While I may be able to solve this by changing how we update the monthly table, or update the month table after every update to an hour despite the performance cost(there are triggers on the month table that do a monthly cost calculation so each update is expensive), this issue is strange enough that I would like to solve it if possible.

Thank you for your help and patience reading through this wall of text.

Best Answer

I'd investigate if the issue is caused by update restart

the restart is done when:

a) consistent read (the search component of your modification) says "this row, when we started, was one you were interested in"

b) the current read (get the row as of right now, we have to modify the CURRENT value) is done

c) a comparision of the current read to the consistent read says "the row was modified".

When the update is restarted, your update_hours procedure in the trigger will run twice for the same row. Is this ok for your code?

What if you rolled back the update in the session and then called flush_hour_tab (rollback does not affect package variables). Would that be ok?

If you lock the table before running your script, you should get a correct answer if the issue is update restart. In that case save, the changes to standard database table (instead of update_hour_tab) and update your monthly table from there.