Oracle Index Tuning – Function Based Index with XPATH extractValue

indexindex-tuningoracleoracle-11g-r2xml

// Oracle Version 11.2.0.4  Standard Edition
// OS RedHat

I am trying to create a function index on a large table, based on using an XML object.

So I have done stuff such as:

CREATE INDEX test_function_itemgroup_id 
ON test_function 
( 
    extractValue
    (
        RECORD, 
        '/Product/PublishingDetail/ItemGroup/ItemGroupId', 
        'xmlns="http://www.editeur.org/onix/3.0/reference"' 
    )
)

So when I have a query such as:

select PRODUCT_ID 
from TEST_FUNCTION pr
where 
    extractValue
    ( 
        RECORD, 
        '/Product/PublishingDetail/ItemGroup/ItemGroupId',
        'xmlns="http://www.editeur.org/onix/3.0/reference"' 
    ) = 'NAME-CLASS'

The function Index is used and I know I have the right namespace for the xml.

When I do stuff such as:

CREATE INDEX test_function_nbd 
ON test_function 
( 
    extractValue
    ( 
        RECORD, 
        '/Product/Extras[ExtrasIDType="RECORDCREATEDBY"]/IDValue',
        'xmlns="http://www.editeur.org/onix/3.0/reference"' 
    )
)

select PRODUCT_ID 
from TEST_FUNCTION pr 
where 
    extractValue
    ( 
        RECORD, 
        '/Product/Extras[ExtrasIDType="RECORDCREATEDBY"]/IDValue',
        'xmlns="http://www.editeur.org/onix/3.0/reference"' 
    ) = 'Other'  

The system does a full table scan instead of using the function index.

I have used an XMLIndex, but the performance of that has been so slow and does not seem to allow the function based indexes to be used.

Has anyone got any suggestions about how to proceed with this?

Update:

Added the full xmlns as have enough reputation now

There are no other indexes on the table.

What appears is that the Optimizer seems to be converting the XPATH to
two different queries.

/Product/Extras/ExtrasIDType and
/Product/Extras/IDValue

which are not indexed by the function.

In my particular case I can't index these individually as these can have multiple values and function based indexes can't use multiple values.

Also if I have to use two function based indexes the system will do a full table scan. (According to oracle 11 documentation)

Best Answer

Use the /*+ NO_XML_QUERY_REWRITE */ hint. My environment for the below is 11.2.0.3.0 on Linux_x86-64 platform, Enterprise Edition.

create table xml_table (c1 xmltype) pctfree 99;

(PCTFREE 99, just a simple trick to spread the table over many blocks, thus making full table scan even more costly and inefficient. 1 row/block in this case.)

insert into xml_table select 
xmltype('
  <stuff>
    <something>
      <x1>' || rownum || '</x1>
      <x2>' || rownum || '</x2>
    </something>
  </stuff>
')
from dual connect by level <= 10000;
commit;

So I have a table with 10000 rows, with XMLType values like:

<stuff>
   <something>
      <x1>1</x1>
      <x2>1</x2>
   </something>
</stuff>

<stuff>
   <something>
      <x1>2</x1>
      <x2>2</x2>
   </something>
</stuff>

...

And I want to run queries like, let's say:

select * from xml_table 
where extractvalue(c1, '/stuff/something[x1=5]/x2/text()') = '5';

Create the function-based index and gather statistics:

create index xml_table_fbi1 on 
  xml_table(extractvalue(c1, '/stuff/something[x1=5]/x2/text()'));

exec dbms_stats.gather_table_stats(user, 'XML_TABLE', no_invalidate=>false);

Run the query and see what happened:

alter session set statistics_level=all;

select * from xml_table where extractvalue(c1, '/stuff/something[x1=5]/x2/text()') = '5';

C1
--------------------
<stuff>
  <something>
    <x1>5</x1>
    <x2>5</x2>
  </something>
</stuff>

Check the plan and runtime statistics:

select * from table(dbms_xplan.display_cursor(format=>'allstats last cost'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  a78z6npnp94va, child number 0
-------------------------------------
select * from xml_table where extractvalue(c1,
'/stuff/something[x1=5]/x2/text()') = '5'

Plan hash value: 3307077916

---------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name      | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |      1 |        |   226K(100)|      1 |00:00:00.43 |   10239 |
|*  1 |  FILTER              |           |      1 |        |            |      1 |00:00:00.43 |   10239 |
|   2 |   TABLE ACCESS FULL  | XML_TABLE |      1 |  10000 |  2764   (2)|  10000 |00:00:00.04 |   10239 |
|   3 |   SORT AGGREGATE     |           |  10000 |      1 |            |  10000 |00:00:00.37 |       0 |
|   4 |    NESTED LOOPS SEMI |           |  10000 |    667K|   223K  (1)|      1 |00:00:00.28 |       0 |
|   5 |     XPATH EVALUATION |           |  10000 |        |            |  10000 |00:00:00.12 |       0 |
|   6 |     XPATH EVALUATION |           |  10000 |        |            |      1 |00:00:00.14 |       0 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(SYS_XMLTYPE2SQL(SYS_XQSEQ2CON(SYS_XQEXTRACT(,'/something/x2/text()')))
              ='5')

The database transformed the XPath expression, so the function-based index is not a valid candidate anymore, so it is a full table scan with over 10000 buffer gets.

While running the same with the above hint:

select /*+ NO_XML_QUERY_REWRITE */ * from xml_table where
extractvalue(c1, '/stuff/something[x1=5]/x2/text()') = '5';

C1
--------------------
<stuff>
  <something>
    <x1>5</x1>
    <x2>5</x2>
  </something>
</stuff>

Plan and runtime statistics:

select * from table(dbms_xplan.display_cursor(format=>'allstats last cost'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
SQL_ID  gby3tfs2z7mqm, child number 0
-------------------------------------
select /*+ NO_XML_QUERY_REWRITE */ * from xml_table where
extractvalue(c1, '/stuff/something[x1=5]/x2/text()') = '5'

Plan hash value: 2259369741

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |      1 |        |     2 (100)|      1 |00:00:00.01 |       2 |
|   1 |  TABLE ACCESS BY INDEX ROWID| XML_TABLE      |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN          | XML_TABLE_FBI1 |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       1 |
---------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("XML_TABLE"."SYS_NC00003$"='5')

Much better.

Also notice the Cost column in the plans. The cost of the first plan was 226K, but with disabling XML rewrite, the cost of the second plan was 2. I didn't investigate this topic deeper, but it seems this is another kind of query transformation, that is not based on the cost.