How should I store a large number of dynamically generated heterogeneous tables

oracleoracle-11g-r2

I'm a developer on a reporting application that runs complex user-defined queries, stores the results, and then sends the result to Excel generation, web-based tables, and graphs, and so on. Currently, we're storing the entire result set serialized as a clob in an existing table.

These result sets may be large (some >1GB) and expensive to generate (many minutes). Regenerating them on-the-fly is not an option. They also are heterogeneous — each result set will likely have very different columns. The result sets are immutable — they will never be modified, although they will eventually be deleted. Each result set will be read occasionally, but often in bursts (e.g. if someone views the data via our web interface, then we expect many more requests soon).

Currently, these thousands of result sets are taking up a lot of DB space. In addition, it is difficult to manipulate the values other than to select the entire result set or the first n records. We would like to be able to sort and filter the result set. It seems logical that if we have tabular data that we'd like to sort and filter, then storing it as a table is the obvious answer.

However, I'm not sure what the right way would be to do this in Oracle. The naive solution would be to dynamically run CREATE TABLE statements for every one, naming them from a GUID or something. This kind of vaguely feels like an unwise thing to do, maybe unless we also move this data to a separate database. I looked up global temporary tables, but we need them to persist across sessions. A co-worker suggested using external tables, but I'm not sure I understand the performance implications of that. Basically, I'm in over my head from an Oracle architecture standpoint, and I'd appreciate advice from more experienced developers and DBAs.

So: given these constraints, what approaches should I consider?

Best Answer

You could use a data-defined schema, which isn't terrific for performance or manageability but comes closer to doing what the database was designed to do.

CREATE TABLE report_result_tab (
   report_result_id NUMBER NOT NULL,
   row_id           NUMBER NOT NULL,
   column_name      VARCHAR2(30) NOT NULL,
   value_number     NUMBER,
   value_varchar    VARCHAR2(4000));

ALTER TABLE report_result_tab
   ADD CONSTRAINT report_result_pk
   PRIMARY KEY (report_result_id, row_id, column_name);

With a table like that, you get a variable number of "columns" by putting the column as part of the key.

Let's say we're reporting on a table like this:

CREATE TABLE employee (
   employee_id NUMBER,
   last_name   VARCHAR2(255));

Then to report on it, with one report having two records, it would look like this:

INSERT INTO report_result_tab (
   report_result_id,
   row_id,
   column_name,
   value_number,
   value_varchar)
VALUES (
   0,             /* report_result_id */
   0,             /* row_id */
   'EMPLOYEE_ID', /* column_name */
   37,            /* value_number */
   NULL);         /* value_varchar */

INSERT INTO report_result_tab (
   report_result_id, 
   row_id,
   column_name,
   value_number,
   value_varchar)
VALUES (
   0,             /* report_result_id */
   0,             /* row_id */
   'LAST_NAME',   /* column_name */
   NULL,          /* value_number */
   'Smith');      /* value_varchar */

INSERT INTO report_result_tab (
   report_result_id, 
   row_id,
   column_name,
   value_number,
   value_varchar)
VALUES (
   0,             /* report_result_id */
   1,             /* row_id */
   'EMPLOYEE_ID', /* column_name */
   42,            /* value_number */
   NULL);         /* value_varchar */

INSERT INTO report_result_tab (
   report_result_id, 
   row_id,
   column_name,
   value_number,
   value_varchar)
VALUES (
   0,             /* report_result_id */
   1,             /* row_id */
   'LAST_NAME',   /* column_name */
   NULL,          /* value_number */
   'Jones');      /* value_varchar */

You'll have to figure out what to put as the COLUMN_NAME in the base data. You could use a numeric COLUMN_ID which would save space. In either case, you'll also have to figure out a mapping to get the user-friendly column names and their sort order.

The method above is advantageous for sparse data, but keeping so many copies of the column IDs is expensive.

I've also seen a much uglier version which could potentially be cheaper on disk space. You could define a particular report type, which is nothing more than a collection of column metadata, then build your table to handle the maximum number of columns of each type.

CREATE TABLE report_result_tab (
   report_result_id NUMBER NOT NULL,
   row_id           NUMBER NOT NULL,
   n00              NUMBER,
   n01              NUMBER,
   n02              NUMBER,
   n03              NUMBER,
   n04              NUMBER,
   /* ... */
   s00              VARCHAR2(4000),
   s01              VARCHAR2(4000),
   s02              VARCHAR2(4000),
   s03              VARCHAR2(4000),
   s04              VARCHAR2(4000),
   /* ... */
   d00              DATE,
   d01              DATE,
   d02              DATE,
   d03              DATE,
   d04              DATE);

INSERT INTO report_result_tab (
   report_result_id,
   row_id,
   n00,
   n01,
   n02,
   n03,
   n04,
   /* ... */
   s00,
   s01,
   s02,
   s03,
   s04,
   /* ... */
   d00,
   d01,
   d02,
   d03,
   d04)
VALUES (
   0,        /* report_result_id */
   0,        /* row_id */
   37,       /* n00 */
   NULL,     /* n01 */
   NULL,     /* n02 */
   NULL,     /* n03 */
   NULL,     /* n04 */
   /* ... */
   'Smith',  /* s00 */
   NULL,     /* s01 */
   NULL,     /* s02 */
   NULL,     /* s03 */
   NULL,     /* s04 */
   /* ... */
   NULL,     /* d00 */
   NULL,     /* d01 */
   NULL,     /* d02 */
   NULL,     /* d03 */
   NULL);    /* d04 */

INSERT INTO report_result_tab (
   report_result_id,
   row_id,
   n00,
   n01,
   n02,
   n03,
   n04,
   /* ... */
   s00,
   s01,
   s02,
   s03,
   s04,
   /* ... */
   d00,
   d01,
   d02,
   d03,
   d04)
VALUES (
   0,        /* report_result_id */
   0,        /* row_id */
   42,       /* n00 */
   NULL,     /* n01 */
   NULL,     /* n02 */
   NULL,     /* n03 */
   NULL,     /* n04 */
   /* ... */
   'Jones',  /* s00 */
   NULL,     /* s01 */
   NULL,     /* s02 */
   NULL,     /* s03 */
   NULL,     /* s04 */
   /* ... */
   NULL,     /* d00 */
   NULL,     /* d01 */
   NULL,     /* d02 */
   NULL,     /* d03 */
   NULL);    /* d04 */