DB2 – XML Indexes and ‘ORDER BY’ Clause

db2indexorder-byxml

My table looks like this:

CREATE TABLE "Content" (
    "id" VARCHAR(36) NOT NULL,
    "xml" XML
);

I have an index defined

create index "idx_creation_date" on "Content"("xml") generate keys 
    using xmlpattern '/*/dateCreated' as sql timestamp;

This index works fine when using it in comparisons (XMLEXISTS), but now I want to use that index to sort my table according with the date my content was created and select the first 20 items.

Sadly, this didn't work

SELECT "Content".* FROM "Content" 
    ORDER BY XMLCAST(XMLQUERY('$xml/*/dateCreated' PASSING "xml" as "xml")
    as TIMESTAMP) DESC FETCH FIRST 20 ROWS ONLY

because XMLQUERY doesn't use indexes. The same goes for

xquery
let $items :=
   for $i in db2-fn:xmlcolumn("Content.xml")
   order by $i/*/dateCreated
   return $i
for $i at $count in subsequence($items, 1, 20)
return $i

because let and return can't use indexes.

Anyone has an idea how I can use the existing (and working) index on dateCreated in a ORDER BY query?

Best Answer

Indexes over XML data (a.k.a. "XML Index") differ from relational indexes in several ways. They are used to identify documents and the position of the nodes within for the indexed pattern. Several nodes within a single XML document may match that pattern. For a relational index each row is indexed, for an XML index there might be zero, one or more matches per document (which is stored as part of a relational row). Some more information can be found in the DB2 documentation on Indexing XML Data.

By SQL Standard the XML data type is kind of a black box. It does not allow comparison and hence sorting. What you are trying to do is sort an XML fragment by a specific node. The fragment is already derived from the stored XML value/document. At the time of the processing there could be no connection between the XML index and the fragment. A good reference on XML indexes and their usage was written by Matthias Nicola.

With that background information we can now focus on your specific question. How would DB2 know that a timestamp/date value is present for each row and the related "Content"? How should the sorting be organized? Because of the index entry/document/row relationship outlined above the sorting in the SQL statement won't work.

For your XQuery I assume that the data is untyped, no schema is used. How would DB2 know that dateCreated is of timestamp and that the index should be used? Have you tried a cast around the dateCreated to let DB2 know that timestamp data is expected?

BTW: You can specify multiple XML indexes for the same pattern, but with different data types.