Oracle – ANALYZE TABLE..VALIDATE STRUCTURE Runs Forever

corruption | oracle | oracle-10g | performance

So, as I've mentioned before, I am a SQL Server DBA/consultant who is helping a customer with their Oracle databases. Mostly I've been able to figure out what I need and work through things with the help of the docs and Google, but I am currently dealing with something that seems inscrutable.

One of the databases is an Oracle 10g LIMS database that has been around (and upgraded) for at least 10 years (maybe 20). It's a critical app, but they've had numerous reliability issues with it over the past couple of years. To get a handle on what's really wrong with it, and what we might be able to do about it, we had the hosting/managed-services provider make a physical copy of it on another (identical) server. (I am pretty sure the provider interpreted that as "RMAN backup and restore".)

The idea being that I could then do whatever analysis, investigation, and repair attempts I wanted, with any tools, without any danger of impacting production. So far, so good. I ran DBVerify (dbv) on all of the data files: no problems.
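For reference, a dbv run looks roughly like this (the data file path below is a placeholder, not the actual file name; blocksize must match db_block_size, 8192 here):

```
dbv file=D:\oradata\LIMS\sqllims_results01.dbf blocksize=8192
```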

Then I ran ANALYZE TABLE .. VALIDATE STRUCTURE CASCADE ONLINE; on every table in the database. For two of the tables, the command took much longer than the other ~2500 tables combined, so I cancelled those two and let the rest finish, which they did with no problems and no reported errors.
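(For the record, I generated the per-table statements from the dictionary rather than typing them; a sketch of the generator, assuming the single O$LIMS schema, is:)

```sql
-- Emit one ANALYZE statement per table; spool/run the output as a script.
SELECT 'ANALYZE TABLE ' || owner || '.' || table_name
       || ' VALIDATE STRUCTURE CASCADE ONLINE;' AS cmd
  FROM dba_tables
 WHERE owner = 'O$LIMS'
 ORDER BY table_name;
```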

These two tables, as it turns out, are also the biggest application data tables. So I started with the largest table (23 GB, 36M rows) and decided to analyze it in parts (without the CASCADE): first the table itself, then the indexes. The table finished in 30-60 minutes (I do not remember exactly), but the ANALYZE INDEX .. VALIDATE STRUCTURE ONLINE; for the first index never finishes.

I decided to let it run for a while and to see if I could find a way to monitor its progress. Googling this, I found many (old) claims that ANALYZE TABLE can be monitored via v$session_longops, but it never showed anything for my command, and I came to the conclusion that this must work only for the statistics-gathering function, not the VALIDATE function. After trying a bunch of things, I eventually found that by looking at the v$session row for the executing command, I could use the P1/P2 wait columns to track which file and block were being read.

This showed me that it was indeed reading from the expected data files, and that it was reading different blocks over time. That, plus the fact that disk utilization on the server was at 100% while the command ran (and nothing else was running), convinced me that it was really doing something and was not blocked/locked/hung. So I just let it run.
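The monitoring query I used was something like the following. For the I/O wait events ('db file sequential read' / 'db file scattered read'), P1 is the absolute file number and P2 the block number, which dba_extents can map back to a segment (the SID 123 is a placeholder for the session running the ANALYZE):

```sql
SELECT s.event, s.p1 AS file#, s.p2 AS block#,
       e.owner, e.segment_name, e.segment_type
  FROM v$session s, dba_extents e
 WHERE s.sid = 123
   AND e.file_id = s.p1
   AND s.p2 BETWEEN e.block_id AND e.block_id + e.blocks - 1;
```

(The dba_extents lookup itself can be slow on a database with many extents, so I only ran it occasionally.)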

It has now been running for over 3 days (72+ hours), it shows no signs of finishing, and I am kind of at a loss here.

Additional info:

The table's DDL:

-- Unable to render TABLE DDL for object O$LIMS.N__RESULTS with DBMS_METADATA attempting internal generator.
CREATE TABLE O$LIMS.N__RESULTS 
(
  SAMPLE_ID NUMBER(10, 0) NOT NULL 
, SUBMISSION_ID NUMBER(10, 0) NOT NULL 
, RESULT_ID NUMBER(10, 0) NOT NULL 
, RESULT_VERSION NUMBER(3, 0) NOT NULL 
, TASK_ID NUMBER(10, 0) NOT NULL 
, TASK_REPETITION NUMBER(3, 0) NOT NULL 
, TASK_VERSION NUMBER(3, 0) NOT NULL 
, REQUIRED VARCHAR2(1 BYTE) NOT NULL 
, METHOD_DATAGROUP VARCHAR2(40 BYTE) NOT NULL 
, COMPONENT VARCHAR2(40 BYTE) NOT NULL 
, MEASURE VARCHAR2(40 BYTE) NOT NULL 
, UNITS VARCHAR2(40 BYTE) NOT NULL 
, STATUS VARCHAR2(20 BYTE) NOT NULL 
, PLANNED_RESULT VARCHAR2(3 BYTE) NOT NULL 
, RESULT_ORIGIN VARCHAR2(1 BYTE) NOT NULL 
, CONDITION VARCHAR2(20 BYTE) NOT NULL 
, CONDITION_LEVEL VARCHAR2(20 BYTE) 
, VALUE_TYPE VARCHAR2(20 BYTE) NOT NULL 
, NUMBER_VALUE NUMBER 
, TEXT_VALUE VARCHAR2(80 BYTE) 
, TIME_VALUE DATE 
, REASON VARCHAR2(40 BYTE) 
, INLIMIT VARCHAR2(3 BYTE) 
, INDETECTION VARCHAR2(3 BYTE) 
, INSPEC VARCHAR2(3 BYTE) 
, ENTRY_USERID VARCHAR2(20 BYTE) 
, ENTRY_DATE DATE 
, SPEC_ID NUMBER(10, 0) 
, SPEC_VERSION NUMBER(3, 0) 
, DETECTION_ID NUMBER(10, 0) 
, DETECTION_VERSION NUMBER(3, 0) 
, LIMIT_ID NUMBER(10, 0) 
, LIMIT_VERSION NUMBER(3, 0) 
, CUSTOMER_DATAGROUP VARCHAR2(40 BYTE) 
, ANALYST VARCHAR2(20 BYTE) 
, REPORT VARCHAR2(3 BYTE) 
, TIMESTAMP DATE 
, USERSTAMP VARCHAR2(20 BYTE) 
, MEASURE_LINK O$LIMS.N_UT_MEASURE 
, RESULT_PLAN_LIST_LINK O$LIMS.N_UT_RESULT_PLAN_LIST 
, TEXT O$LIMS.N_TT_TEXT 
, ATTRIBUTES O$LIMS.N_TT_ATTRIBUTES 
, SEQUENCE NUMBER(4, 0) 
, CONSTRAINT N_C_RESULTS_1 PRIMARY KEY 
  (
    RESULT_ID 
  , RESULT_VERSION 
  )
  USING INDEX 
  (
      CREATE UNIQUE INDEX O$LIMS.N_C_RESULTS_1 ON O$LIMS.N__RESULTS (RESULT_ID ASC, RESULT_VERSION ASC) 
      LOGGING 
      TABLESPACE "SQLLIMS_INDEX" 
      PCTFREE 10 
      INITRANS 2 
      STORAGE 
      ( 
        INITIAL 311296 
        NEXT 1048576 
        MINEXTENTS 1 
        MAXEXTENTS UNLIMITED 
        BUFFER_POOL DEFAULT 
      ) 
      NOPARALLEL 
  )
  ENABLE 
) 
LOGGING 
TABLESPACE "SQLLIMS_RESULTS" 
PCTFREE 25 
INITRANS 1 
STORAGE 
( 
  INITIAL 566231040 
  NEXT 1048576 
  MINEXTENTS 1 
  MAXEXTENTS UNLIMITED 
  BUFFER_POOL DEFAULT 
) 
NOCOMPRESS 
NOPARALLEL 
NESTED TABLE TEXT STORE AS RESULT_TEXT RETURN AS VALUE  
NESTED TABLE ATTRIBUTES STORE AS RESULT_ATTRIBUTES RETURN AS VALUE

The DDL of the index currently being analyzed:

-- Unable to render INDEX DDL for object O$LIMS.SYS_C0010496 with DBMS_METADATA attempting internal generator.
CREATE UNIQUE INDEX O$LIMS.SYS_C0010496 ON O$LIMS.N__RESULTS (ATTRIBUTES ASC) 
LOGGING 
TABLESPACE "SQLLIMS_RESULTS" 
PCTFREE 10 
INITRANS 2 
STORAGE 
( 
  INITIAL 65536 
  NEXT 1048576 
  MINEXTENTS 1 
  MAXEXTENTS UNLIMITED 
  BUFFER_POOL DEFAULT 
) 
NOPARALLEL

My questions:

  1. Is it reasonable for the ANALYZE INDEX to take this long when the ANALYZE TABLE took at most an hour?

  2. What are my options at this point? Assuming I kill this, how can I proceed with checking for (logical) corruption, etc.? Is there something I can do to get this command to finish in a reasonable time, or is there some other tool or approach I can use instead?


Version, etc. information:

  • Host: Windows 2003 server, sp2
  • Oracle version: 10g r2 ps3 10.2.0.4.0
  • Compatibility: 10.2.0.3.0

Here are the relevant SGA parameters (I think):

  • sga_target: 7012876288 (6688M)
  • sga_max_size: 10015997952 (9552M)
  • db_block_size: 8192
  • db_cache_size: 2885681152 (2752M)
  • db_*k_cache_size: all are zero

Best Answer

I think the article Raj quoted (https://www.pythian.com/blog/analyze-index-validate-structure-dark-side/) describes this pretty well. "Clustering factor" was also my first guess while reading the description of your problem. I also prefer to use RMAN to check for corruption:

RMAN> backup check logical validate database;

Afterwards you can query V$DATABASE_BLOCK_CORRUPTION for details on any corrupt blocks.
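For example:

```sql
-- One row per range of blocks RMAN found corrupt (empty = no corruption found).
SELECT file#, block#, blocks, corruption_change#, corruption_type
  FROM v$database_block_corruption;
```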


Normally there is a reason for huge clustering. You can check the following details:

  • How many rows does the index have?
  • How many blocks does it consume?

This may help you determine whether the clustering_factor is high. The clustering_factor itself is in dba_indexes; dba_extents (or dba_segments) tells you how many blocks are actually allocated.

  • What type of segment space management does the tablespace use? I'd guess it's not ASSM, since the database is pretty old and has been upgraded several times.
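All three checks can be done with one dictionary query along these lines (owner and index name taken from the question):

```sql
-- num_rows/leaf_blocks/clustering_factor come from optimizer statistics
-- in dba_indexes (so they are only as fresh as the last stats gathering);
-- dba_segments shows the allocated blocks; dba_tablespaces shows whether
-- the tablespace uses AUTO (ASSM) or MANUAL segment space management.
SELECT i.num_rows, i.leaf_blocks, i.clustering_factor,
       s.blocks AS allocated_blocks,
       t.segment_space_management
  FROM dba_indexes     i
  JOIN dba_segments    s ON s.owner = i.owner
                        AND s.segment_name = i.index_name
  JOIN dba_tablespaces t ON t.tablespace_name = i.tablespace_name
 WHERE i.owner = 'O$LIMS'
   AND i.index_name = 'SYS_C0010496';
```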

Why is clustering bad for performance? When Oracle reads data from disk or cache, it always reads whole blocks. If a block is half empty, you lose 50% of your read throughput. When you do a full table or full index scan, Oracle scans every block belonging to the segment (index or table), from the first block up to the high-water mark (HWM); it does not check whether a block is empty. If your index occupies 10 million blocks but only needs 1 million, Oracle reads 9 million blocks of junk.
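You can measure how sparse an index really is: after ANALYZE INDEX ... VALIDATE STRUCTURE completes, the session-private view INDEX_STATS holds one row for that index (it is overwritten by the next ANALYZE in the same session). A sketch, using the primary-key index from the question since the problem index never finishes:

```sql
ANALYZE INDEX O$LIMS.N_C_RESULTS_1 VALIDATE STRUCTURE;

-- High del_lf_rows relative to lf_rows, or a low pct_used,
-- indicates a sparse index worth rebuilding.
SELECT name, height, lf_rows, del_lf_rows,
       btree_space, used_space,
       ROUND(used_space / btree_space * 100) AS pct_used
  FROM index_stats;
```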

ASSM (Automatic Segment Space Management) helps a lot to reduce/prevent this. If possible, you should migrate manually segment-space-managed tablespaces to ASSM tablespaces.


Why check for corruption at all? I always start analyzing a database by gathering AWR or Statspack reports from the production database (not a test/dev copy). This does not hurt anyone and will give you a lot of detail about the database and its little secrets. Corruption does not usually affect performance, since the database does not try to repair it -- it just fails the query that hits it.