I have a huge table RECORD
— a few billion rows — which contains data loaded continuously in many batches. I can either assign a primary key RECORD_ID
from a sequence, or use the natural composite key BATCH_NUM + RECORD_NUM
, which gives me a nice, unfragmented table where a batch with thousands of records spans only a handful of physical blocks. That locality is useful for the next, batch-aware step of processing.
But later I need to group together rows from arbitrary batches based on another field, some RECORD_KEY
. There are hundreds of millions of distinct RECORD_KEY
values spread across the rows of the table, so if I want to pick 10 rows with the same RECORD_KEY
, that will almost certainly require retrieving 10 different physical blocks. Due to the table size these blocks cannot easily be cached either, so it means 10 physical reads (plus the overhead of the index traversal). This is obviously very slow.
Example:
Table RECORD:
- RECORD_ID is the primary key, populated from a sequence
- BATCH_NUM and RECORD_NUM have a unique constraint (and could be the primary key)
- RECORD_KEY is an indexed column
Sample data:
| RECORD_ID | BATCH_NUM | RECORD_NUM | RECORD_KEY |
|-----------|-----------|------------|------------|
| 1         | 1         | 1          | 987654321  |
| 2         | 1         | 2          | 876543219  |
| 3         | 1         | 3          | 765432198  |
| 4         | 2         | 1          | 654321987  |
| 5         | 2         | 2          | 543219876  |
| ...       | ...       | ...        | ...        |
| 100000006 | 3000003   | 2          | 432198765  |
| 100000007 | 3000003   | 3          | 876543219  |
| 100000008 | 3000003   | 4          | 321987654  |
| ...       | ...       | ...        | ...        |
| 200000009 | 6000004   | 3          | 219876543  |
| 200000010 | 6000004   | 4          | 876543219  |
| 200000011 | 6000004   | 5          | 198765432  |
| ...       | ...       | ...        | ...        |
This SQL command will be fast because it will only have to retrieve one physical block:
select RECORD_ID from RECORD where BATCH_NUM = 1
This SQL command will be slow because it will have to retrieve three physical blocks — one for each retrieved row:
select RECORD_ID from RECORD where RECORD_KEY = 876543219
EDIT:
The above is just an example. Typically I would have:
- thousands of rows per BATCH_NUM
- tens of rows per RECORD_KEY
- thousands of rows per lookup, by BATCH_NUM or RECORD_KEY
SQLs look like this, both retrieving around 1000 rows:
select RECORD_ID from RECORD where BATCH_NUM = 123456
select RECORD_ID from RECORD where RECORD_KEY in (
select COLUMN_VALUE from TABLE(record_key_tbl) -- 100 values
)
Both execution plans look reasonable and similar, each using the respective index. However, while the BATCH_NUM lookup executes in under 1 second, the RECORD_KEY lookup takes around 20 seconds.
Options which I considered:
- Partitioning: I could hash-partition on RECORD_KEY. That would put the records a bit closer together, increasing the chance of them sharing a block, and would enable some partition-wise joins. It might help a bit but would not fully resolve the problem.
- Index-organized table: Since an IOT requires a primary key, I would have to make the primary key RECORD_KEY + RECORD_ID. The lookup on BATCH_NUM would then become slow.
- Re-inserting the rows: A rather ugly solution, where I would regularly select rows ordered by RECORD_KEY, delete them, and insert-append them, effectively placing them next to each other.
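For reference, the hash-partitioning variant from the first option might be sketched like this (column types and the partition count are assumptions, not recommendations from the question):

```sql
-- Sketch only: hash-partitioned variant of the RECORD table.
-- Types and the partition count (64) are illustrative assumptions.
CREATE TABLE RECORD (
  RECORD_ID   NUMBER NOT NULL,
  BATCH_NUM   NUMBER NOT NULL,
  RECORD_NUM  NUMBER NOT NULL,
  RECORD_KEY  NUMBER NOT NULL,
  CONSTRAINT RECORD_PK PRIMARY KEY (RECORD_ID),
  CONSTRAINT RECORD_UK UNIQUE (BATCH_NUM, RECORD_NUM)
)
PARTITION BY HASH (RECORD_KEY) PARTITIONS 64;
```

Note that this only narrows the search: rows sharing a RECORD_KEY land in the same partition, but within that partition they are still scattered across blocks, which is why partitioning alone does not resolve the problem.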
Please note that I am not an experienced DBA, just a developer stuck with this non-trivial DB problem.
Best Answer
Create an index on
(record_key, record_id)
. This way Oracle will most likely answer your slow-running query from the index alone. You can add more columns to that index so that it also covers the content you need; hopefully you don't need all the columns. Be careful about what impact this index has on your load performance.
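A sketch of that index, using the column names from the question (the index name is an arbitrary assumption):

```sql
-- Covering index: when the query touches only indexed columns,
-- Oracle can answer it from the index without visiting the table.
CREATE INDEX RECORD_KEY_IX ON RECORD (RECORD_KEY, RECORD_ID);
```

With this index in place, a query like `select RECORD_ID from RECORD where RECORD_KEY = :k` can be satisfied by an index range scan alone: the entries for a given RECORD_KEY are stored contiguously in the index leaf blocks, so the scattered table-block reads disappear for this query.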