Oracle Horizontal Partitioning – How to Sort Fetched Data

javaoraclepagingpartitioning

I have a telco billing software system. In it there are daily logs of users' calls. The logs are horizontally partitioned by date (month). Each partition is stored in a separate database and may be spread over multiple instances.

In the UI the user will specify a date range. The data returned can be sorted on any field. The date range may span over multiple partitions. The application must support paging through the date range's data.

I cannot load too many records into memory for sorting. Putting sort inside the query only gives me sorted data inside one result-set.

So I need to sort data from multiple partitions which are each individually sorted. How can I return sorted records to the UI from multiple sorted result-sets?

Best Answer

It sounds like your architecture has gone past partitioning and is sharded. If you have system software or an application framework in place which can handle sharding that is where this requirement should be implemented.

In order to get the correct sort sequence you will have to have the results from all involved shards in the same place at the same time. There is no way to get some results from one shard and then get more results from another shard and be able to guarantee any sort sequence the user may choose.

One way (as BriteSponge suggested) would be to define a view which spans all the shards. Let the DBMS deal with the complexities of marshalling the various shards' results sets. This would be my preferred solution if it can be managed. It separates the application from the DB implementation details.

If you have to write this yourself there is some hope in the fact you have paging in place. This limits the maximum number or rows you will show in the UI and, therefore, the maximum number of rows that have to be returned from each shard. From the date range you can determine which shards are required. To each shard you submit a query of the form

select <columns>
from <table>
where date between <start> and <end>
order by <user selection>
fetch first <number> rows only

The <number> will be the maximum number of rows you want to show in a single page in the UI. While it is unlikely they will all come from a single shard it is possible so that's what the limit must be.

If your version of Oracle does not support the final limiting clause the following will still work, but will be much less efficient.

Perform a multi-way merge on the resultsets. Stop when you reach the limit for one page in the UI. Discard the remainder of the resultsets.

When the user pages forward, re-submit the SELECT using standard paging techniques.

When the user chooses a different sort sequence amend the SQL to match and resubmit to each shard.

My preference would be to implement this server side in a stored procedure. This will keep application and storage as separate as possible. Also, rather than a multi-way merge, the various resultsets can all go into a single working table which is then sorted and the correct number of rows returned to the application.

If you do implement it client-side, rather than discarding the unused portion of the resultsets they could be cached. This which may reduce the load for subsequent pages at the cost of additional complexity.