Optimal way of loading, registering, indexing and querying a huge XMLDB

optimizationoracleoracle-11g-r2performancexml

I've got 35 million highly structured XML documents, with a total size of 500 GB. An XSD is available for this data, and the end goal is to transform the data into an RDB. All fields can be considered varchar, number, or date; except maybe one or two CLOB fields. The XSD structure contains a lot of 1-n relationships, and sub-nested structures, so is quite complex.

The data has to get loaded once, and then transformed into an RDB once, so speed of update/insert queries is irrelevant. Also the speed of loading/index creation is not so important, since it only has to run once. As long as it finishes within a week or two i'm happy.

At the moment I'm setting up the XMLDB by registering the XSD as Object Relational, with create objects and create tables set to true. For now, I haven't added any other significant annotations to the XSD, and I'm only loading about 50MB's of data as a test. I'm using a whole set of XMLTable/flwor queries to transform the XML data into RDB tables. I fear scaling this to 500 GB will cause queries that run for eternity.

As I'm quite new to XMLDB, and there are -a lot- of possible tweaks and settings in the XMLDB documentation, I'm wondering if there are some crucial features/annotations/optimizations…. i should be looking out for that can have an impact on the overall speed of the final queries, or other parts of the process.

This is a very broad question, but, any tips are welcome, and if there's more information that might be relevant, ask away!

Best Answer

Wouter got, hopefully, useful answers here:

https://community.oracle.com/message/12608939

That said, being the Oracle ACE Director for this technology. Oracle XMLDB functionality is nowadays so interwoven with the Oracle database that this is an MANDATORY (install) functionality for Oracle 12c and onwards. It is the basis for much in the database like ACL security, database security, Database Vault, Oracle APEX, Multitenent functionality, import and export, DBMS_packages like DBMS_METADATA, Oracle performance implementations (AWR XMLtype columns etc etc), DBA/Developer tools like DB Express...

XML is (alas?) everywhere nowadays. To give one example, you using it now (that is if you use a mobile phone...)