Oracle – Why SET UNUSED COLUMN Causes Internal UPDATE SET X=X and Huge Log Generation

Tags: oracle, partitioning

On Oracle 11.2.0.4.1, we tried to run a simple ALTER TABLE PARTITIONED_LARGE SET UNUSED COLUMN XXXXX; on a large reference-partitioned child table to decommission a column.

The table has hundreds of millions of rows across thousands of partitions. Most partitions have been truncated and have no segments left; only about 220 still contain data. As per standard practice, the point of SET UNUSED COLUMN was to avoid redo generation and long downtime while still allowing us to reclaim the space later.
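For context, this is the usual two-step pattern the question refers to (table and column names taken from the question; the CHECKPOINT interval is an illustrative value). SET UNUSED is normally a metadata-only dictionary update, which is why it was expected to complete in seconds with negligible redo:

```sql
-- Step 1: metadata-only; the column is hidden and no row data is rewritten.
ALTER TABLE PARTITIONED_LARGE SET UNUSED COLUMN XXXXX;

-- Step 2 (later, in a maintenance window): physically reclaim the space,
-- checkpointing periodically to keep undo generation bounded.
ALTER TABLE PARTITIONED_LARGE DROP UNUSED COLUMNS CHECKPOINT 10000;
```

Only the second step is expected to touch row data; the first should never generate bulk undo/redo.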

Something went horribly awry: the command started generating dozens of GB of redo, eventually filled up UNDOTBS, then failed with the error below and began rolling back.

Fri Apr 24 23:48:13 2015
ORA-01555 caused by SQL statement below (SQL ID: 1ksq8yfdxv3ys, Query Duration=13500 sec, SCN: 0x0b5b.6392295a):
update "SCHEMA"."PARTITIONED_LARGE" set "XXXXX" = "XXXXX"

Looking at the statement above, it's obvious why so much undo/redo was generated — but why did Oracle internally decide to run such an update during a SET UNUSED COLUMN?

Other points that may or may not be relevant:

  • At the time the command was started, a restore point had been created for flashback purposes. It was dropped partway through when flashback log size became a problem.
  • The column is defined as BINARY_DOUBLE DEFAULT 0 CONSTRAINT XXXXX_NN NOT NULL.
  • The SET UNUSED had been tested earlier on a test DB holding about 75% of the data of the database in question (though not exactly the same data) with no issues; it completed within seconds.
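The column definition in the second bullet matters because of the 11g "add column optimization": since 11g, adding a NOT NULL column with a DEFAULT is itself a metadata-only operation — existing rows are not physically updated, and the default value is served from the data dictionary. A hypothetical illustration (names reused from the question):

```sql
-- Metadata-only add under the 11g optimization: no rows are rewritten,
-- DEFAULT 0 is recorded in the dictionary rather than stored per row.
ALTER TABLE PARTITIONED_LARGE
  ADD (XXXXX BINARY_DOUBLE DEFAULT 0 CONSTRAINT XXXXX_NN NOT NULL);

-- Rows that predate the column still return 0, materialized from metadata.
SELECT COUNT(*) FROM PARTITIONED_LARGE WHERE XXXXX = 0;
```

This NOT NULL + DEFAULT combination, with the value held only in metadata, is exactly the setup the bug note in the answer below describes.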

Can anyone explain this behaviour, either in terms of expected behaviour or known issues?

Best Answer

Oracle said it wasn't a known issue and asked us to re-run with tracing enabled. Unfortunately, on a large production database, and given the long downtime the rollback required last time, that wasn't really an option.

We suspected it was related to this bug, fixed in 11.2.0.4.2:

Bug 17325413 - Drop column with DEFAULT value and NOT NULL definition ends up with dropped column data hitting disk leading to corruption (Doc ID 17325413.8)

This problem is introduced in 11.2.0.4 and 12.1.0.1 by the fix for bug 14222244

Drop of a column with a DEFAULT value and a NOT NULL definition ends up with dropped column 
data being written to disk leading to block corruptions.
This causes problems for generating undo which cannot be applied; a ROLLBACK fails.

Rediscovery Notes
 If a NOT NULLable column added via "add column optimization" is writing to disk, then it
 is likely you have run into the bug.

 This can result in block corruption causing ORA-600 block check errors with check code 
 errors 6264, 6110, or 6103 when db_block_checking is enabled:
  ORA-600 [kdBlkCheckError] [file#] [block#] [6264] --> needs _db_block_check_for_debug to catch it.  This is a ZLC logical corruption (Zero Length Column instead of NULL) 
  ORA-600 [kdBlkCheckError] [file#] [block#] [6110]
  ORA-600 [kdBlkCheckError] [file#] [block#] [6103]

or other internal errors when db_block_checking is not enabled like ORA-600 [13013], ORA-600 [17182] or external errors like ORA-12899.
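If you suspect you have already hit this, two checks are worth running before anything else (a sketch — VALIDATE STRUCTURE takes time on a table this size and should go in a maintenance window):

```sql
-- Verify block integrity of the table and its indexes.
ANALYZE TABLE PARTITIONED_LARGE VALIDATE STRUCTURE CASCADE;

-- Check the current block-checking setting, which determines whether the
-- ORA-600 [kdBlkCheckError] variants above are raised at all.
SELECT value FROM v$parameter WHERE name = 'db_block_checking';
```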

Workaround

 Note that once error occurs, table has to be rebuilt: 
   Note:1527738.1 may skip the rows affected and can be used with _disable_block_checking=true as:
      alter session set "_disable_block_checking"=true;
       and in the same session use the PLSQL in Note:1527738.1 section "SKIP ORA-600 in a Table"

After patching to 11.2.0.4.6, the SET UNUSED ran fine in production without the unexpected redo generation. Might be worth a go if anyone else stumbles into this problem, although I suspect Oracle will eventually have to raise a proper bug for this issue once someone helps them narrow it down properly.
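Before re-running, you can confirm the applied PSU level from inside the database — on 11.2, applied PSUs are recorded in DBA_REGISTRY_HISTORY (11.2.0.4.2 or later should carry the fix for bug 17325413):

```sql
-- List applied patch sets and PSUs, most recent last.
SELECT action_time, action, version, id, comments
FROM   dba_registry_history
ORDER  BY action_time;
```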