Querying a database efficiently for a huge chunk of data

oracleperformancequery

Here is my query

SELECT *
FROM
    (
        SELECT a.*, rownum rnum
        FROM
            (
                SELECT id, data
                FROM t
                ORDER BY id
            ) a
       WHERE rownum <= HIGHER
    )
WHERE rnum >= LOWER;

I have a huge set of data and hence I am getting parts of it one at a time, say from 1 to 100 in first attempt, 101-200 in the second attempt and so on.

If I execute the above query it will do the following piece for each portion of data:

SELECT id, data
FROM t
ORDER BY id

I think it is pretty inefficient to do it for each part. Can't I do something like order my data set once and apply a rownum? Will it be of any use at all?

Best Answer

Is your intention to get every row eventually? For example, are you fetching all the rows and passing them to some sort of process that needs to process every row? Or is your intention to present pages of results to a human who will likely only look at the first or second page of results.

Assuming that you are presenting results to a human that will only be looking at the first couple pages of data, the query you have is likely to be reasonably efficient. Assuming that there is an index on the id column, in order to fetch rows 101-200, Oracle would simply have to read the first 200 id values from the index then filter out rows 1-100. That's not quite as efficient as getting the first page of results but it's still pretty efficient.

Of course, as you get further and further into the data, the pagination query gets less and less efficient. That's a problem if your intention is to eventually fetch every row. It's generally not a problem, though, if you are presenting results to a human. Humans don't really care how long it takes to fetch page 50 of a result set-- they're going to give up long before then.

If your intention is to send the data to a Java process to process the data (this will be substantially less efficient than processing the data in the database-- Oracle and PL/SQL are designed specifically to process large amounts of data), it would generally make sense to issue a single query without an ORDER BY, have a master thread on the Java application server that fetches N rows of data, spawns a thread to process that data, fetches the next N rows of data, etc.