Not knowing a huge amount about what version you're using (I've assumed 11g) and other environment issues, something like this might be useful.
I'd set the commit level as high as your environment can comfortably handle, as the fewer commits there are the faster it will be. What you can comfortably handle will be dictated by factors such as:
- Are other users using the system
- Memory available
- Are you running anything else whilst this is running
etc.
You can alter the constant c_bulk_limit
to whatever size you can cope with.
DECLARE
   c_bulk_limit CONSTANT PLS_INTEGER := 5000;
   --
   CURSOR c_itemloc
   IS
      SELECT item,
             loc
        FROM dc_item_loc;
   --
   TYPE itemloctype IS TABLE OF c_itemloc%ROWTYPE
      INDEX BY PLS_INTEGER;

   itemloc_tab   itemloctype;
BEGIN
   -- Open cursor
   OPEN c_itemloc;

   LOOP
      -- Fetch bulk data
      FETCH c_itemloc
         BULK COLLECT INTO itemloc_tab
         LIMIT c_bulk_limit;

      FORALL x IN INDICES OF itemloc_tab
         UPDATE item_loc il
            SET il.status = 'D',
                il.last_update_datetime = get_vdate,
                il.last_update_id = 'CNVOBJ_RNG'
          WHERE il.item = itemloc_tab(x).item
            AND il.loc = itemloc_tab(x).loc;

      -- Commit every c_bulk_limit records
      COMMIT;

      -- Loop exit criteria
      EXIT WHEN itemloc_tab.COUNT < c_bulk_limit;
   END LOOP;

   CLOSE c_itemloc;
EXCEPTION
   WHEN OTHERS
   THEN
      IF c_itemloc%ISOPEN
      THEN
         CLOSE c_itemloc;
      END IF;
      --
      DBMS_OUTPUT.put_line ('YOU FAIL: ' || SQLERRM);
      RAISE;   -- re-raise so the error isn't silently swallowed
END;
At the very least you'll be using more bulk operations and therefore significantly less context switching, so it should save you time.
Of course the best method would be a single UPDATE
statement but with that many records your DB might not be able to cope without interim commits.
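As a sketch of that single-statement approach (assuming the same dc_item_loc driving table, item_loc target table and get_vdate function used in the PL/SQL block above), it might look like:

```sql
-- Single-statement alternative: one UPDATE, one commit.
-- Needs enough undo/redo space for the whole transaction.
UPDATE item_loc il
   SET il.status = 'D',
       il.last_update_datetime = get_vdate,
       il.last_update_id = 'CNVOBJ_RNG'
 WHERE (il.item, il.loc) IN (SELECT item, loc
                               FROM dc_item_loc);

COMMIT;
```

Whether this is viable depends mostly on whether your undo tablespace can absorb the full row count in one transaction.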
EDIT: Having just seen @Florin's answer, it might be a good idea to use his cursor to get the ROWIDs of the target table's rows before then using the BULK commands to perform the update, as ROWID is the fastest access method.
E.G.
DECLARE
   c_bulk_limit CONSTANT PLS_INTEGER := 5000;
   --
   CURSOR c_itemloc
   IS
      SELECT --+ use_hash(d i)
             i.rowid rid
        FROM dc_item_loc d
             JOIN item_loc i
                ON (i.item = d.item AND i.loc = d.loc);
   --
   itemloc_tab   DBMS_SQL.urowid_table;
BEGIN
   -- Open cursor
   OPEN c_itemloc;

   LOOP
      -- Fetch bulk data
      FETCH c_itemloc
         BULK COLLECT INTO itemloc_tab
         LIMIT c_bulk_limit;

      FORALL x IN INDICES OF itemloc_tab
         UPDATE item_loc
            SET status = 'D',
                last_update_datetime = get_vdate,
                last_update_id = 'CNVOBJ_RNG'
          WHERE ROWID = itemloc_tab(x);

      -- Commit every c_bulk_limit records
      COMMIT;

      -- Loop exit criteria
      EXIT WHEN itemloc_tab.COUNT < c_bulk_limit;
   END LOOP;

   CLOSE c_itemloc;
EXCEPTION
   WHEN OTHERS
   THEN
      IF c_itemloc%ISOPEN
      THEN
         CLOSE c_itemloc;
      END IF;
      --
      DBMS_OUTPUT.put_line ('YOU FAIL: ' || SQLERRM);
      RAISE;   -- re-raise so the error isn't silently swallowed
END;
Indexes store actual data (data pages or index pages, depending on the type of index we are talking about), and statistics store the data distribution. Therefore, CREATE INDEX is the DDL to create an index (clustered, nonclustered, etc.) and CREATE STATISTICS is the DDL to create statistics on columns within a table.
I recommend you read about these aspects of relational data. Below are a couple of beginner, introductory articles. These are very broad topics, and therefore the information on them can go very wide and very deep. Read up on the general idea of them below, and ask more specific questions when they arise.
BOL reference on Table and Index Organization
BOL reference on Clustered Index Structure
BOL reference on Nonclustered Index Structures
SQL Server Central on the Introduction to Indexes
BOL reference on Statistics
Here is a working example to see these two parts in action (commented to explain):
use testdb;
go

create table MyTable1
(
    id int identity(1, 1) not null,
    my_int_col int not null
);
go

insert into MyTable1(my_int_col)
values(1);
go 100

-- this statement will create a clustered index
-- on MyTable1. The index key is the id field,
-- but due to the nature of a clustered index
-- it will contain all of the table data
create clustered index MyTable1_CI
on MyTable1(id);
go

-- by default, SQL Server will create a statistics object
-- on this index. Here is proof: we see a stat created
-- with the name of the index, whose statistics column
-- is the index key column
select
    s.name as stats_name,
    c.name as column_name
from sys.stats s
inner join sys.stats_columns sc
    on s.object_id = sc.object_id
    and s.stats_id = sc.stats_id
inner join sys.columns c
    on sc.object_id = c.object_id
    and sc.column_id = c.column_id
where s.object_id = object_id('MyTable1');

-- here is a standalone statistics object on a single column
create statistics MyTable1_MyIntCol
on MyTable1(my_int_col);
go

-- now look at the statistics that exist on the table:
-- we have the additional statistics object that doesn't
-- necessarily correspond to an index
select
    s.name as stats_name,
    c.name as column_name
from sys.stats s
inner join sys.stats_columns sc
    on s.object_id = sc.object_id
    and s.stats_id = sc.stats_id
inner join sys.columns c
    on sc.object_id = c.object_id
    and sc.column_id = c.column_id
where s.object_id = object_id('MyTable1');

-- what does a stat look like? run DBCC SHOW_STATISTICS
-- to get a better idea of what is stored
dbcc show_statistics('MyTable1', 'MyTable1_CI');
go
Here is what a test sample of statistics can look like:
Notice that statistics are the containment of the data distribution. They help SQL Server determine an optimal plan. A good analogy: imagine you are going to lift a heavy object. If you knew how much it weighed because there was a weight marking on it, you'd determine the best way to lift it and with which muscles. That's roughly what SQL Server does with statistics.
-- create a nonclustered index
-- with my_int_col as the key column
create index IX_MyTable1_MyIntCol
on MyTable1(my_int_col);
go

-- let's look at this index
select
    object_name(object_id) as object_name,
    name as index_name,
    index_id,
    type_desc,
    is_unique,
    fill_factor
from sys.indexes
where name = 'IX_MyTable1_MyIntCol';

-- now let's see some physical aspects
-- of this particular index
-- (I retrieved index_id from the above query)
select *
from sys.dm_db_index_physical_stats
(
    db_id('TestDB'),
    object_id('MyTable1'),
    4,
    null,
    'detailed'
);
We can see from the example above that the index actually contains data (depending on the type of index, the leaf pages will differ).
This post has shown only a very brief overview of these two large aspects of SQL Server. Both could take up chapters, even whole books. Read some of the references above, and then you will have a better grasp.
Best Answer
Do not use
analyze table ... compute statistics
; it is supported only for backward compatibility (this was already the case in Oracle 10g), see ANALYZE. Use
DBMS_STATS.GATHER_TABLE_STATS()
instead, see DBMS_STATS. By default Oracle collects statistics automatically during the night, and usually this is sufficient. Inserting data does not necessarily mean that you also need new statistics: typically new data are similar to existing data, so the min/max values or the distribution of values do not vary much.
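A minimal sketch of such a call (the schema and table names here are placeholders, not from the question):

```sql
BEGIN
   DBMS_STATS.GATHER_TABLE_STATS(
      ownname => 'MY_SCHEMA',   -- placeholder: your schema
      tabname => 'ITEM_LOC',    -- placeholder: your table
      cascade => TRUE);         -- also gather statistics on the table's indexes
END;
/
```

Left at their defaults, the other parameters (estimate_percent, method_opt, etc.) let Oracle choose appropriate sampling and histogram settings.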