HOW does ROWNUM work with this “optimization”

oracleperformance

I'm trying to optimize a query in which I have some requirements where I need to be able to arbitrarily select a row ROWX from a query. Why is it that

SELECT XML_DATA FROM 
    (Select ROWNUM AS MYROWNUM , EXTRACT(COL1,'expresion').getStringVal() AS XML_DATA
    From "Table1" Where Col2 = 'val' AND ROWNUM < ROWX+1)
WHERE MYROWNUM = ROWX ;

Runs much faster than

SELECT XML_DATA FROM (Select ROWNUM AS MYROWNUM , EXTRACT(COL1,'expresion').getStringVal() AS XML_DATA
    From "Table1" Where Col2 = 'val') AND MYROWNUM = ROWX

Obviously as ROWX grows there is some performance degradation with the first query but the second query has a constant run time (outside of the case where ROWX=1) of about 4 minutes where the base runtime for the first query is non-existent and growth is more acceptable than having to wait 4 minutes per iteration.

Version:
Oracle Database 11g Enterprise Edition  11.2.0.3.0  64bit Production

Sorry I'm a software developer (not a DBA!) that primarily works with MySQL/SQL Server so I'm not familiar with Oracle's internal workings feel free to point me in an entirely different direction,

Thanks.


Based on Justin Cave's answer I wrote the query to be deterministic and a bit cleaner (less hacky)

SELECT EXTRACT(COL1,'expresion').getStringVal() AS XML_DATA
FROM (Select ROWNUM AS MYROWNUM , CLUSTERED_ID From "Table1" Where Col2 = 'val' ORDER BY CLUSTERED_ID) t1 
JOIN table1 ON (t1.CLUSTERED_ID = table1.CLUSTERED_ID)
WHERE MYROWNUM = ROWX

This has the benifit of running in constant time (where n = rowx rather than count()) and is much easier to understand.

Best Answer

The rownum pseudocolumn is assigned as rows are processed. The predicate rownum = <<x>> will evaluate to false if <<x>> is anything other than 1.

Conceptually (and I emphasize that this isn't how Oracle actually works, it's just a useful abstraction), the second query does something like

  1. Read a row from table1
  2. Evaluate the col2 = 'val' expression
  3. If col2 = 'val', evaluate the extract
  4. Assign the rownum to the row that has been returned. Since no other rows have satisfied all the criteria yet, the rownum will be 1
  5. Evaluate the rownum = <<x>> predicate. If <<x>> is anything other than 1, the row is rejected
  6. Go back to step 1 and read the next row

In this loop, every row will be read from table1. And every row will be rejected (assuming <<x>> is something other than 1) because of the rownum = <<x>> predicate. So you'll incur the cost of reading every row from the table, evaluating the extract for every row, and you'll never return any data.

Your first query appears to be working because it is filtering out <<x>> + 1 rows in the subquery. Of course, since there is no ORDER BY, your first query always returns an arbitrary row. It would be entirely legal for Oracle to return the same row for every value of <<x>> in your first query. It probably won't, of course, but it is entirely possible that you'd get different rows over time, that some rows would never be returned, that other rows would be returned multiple times, etc. If you want a deterministic result (prior to 12.1 which has some simpler syntax options), you'd need to do two levels of nesting

SELECT *
  FROM (SELECT b.*,
               rownum rn
          FROM (SELECT a.*
                  FROM some_table a
                 ORDER BY some_column) b
         WHERE rownum <= <<upper_limit>>) c
 WHERE rn >= <<lower_limit>>

For a pretty thorough discussion of the various issues, take a look through this askTom thread.