Why does adding ROWNUM to a query improve performance

oracleperformancequery-performancesubquery

I have two queries:

1) This query has a ROWNUM column (takes 20 seconds to execute):

SELECT
     ROWNUM
     ,ROAD_ID
     ,VERTEX_INDEX
     ,SDE.ST_X(ST_POINT) AS X
     ,SDE.ST_Y(ST_POINT) AS Y
FROM
(
     SELECT  
           ROWNUM
           ,a.ROAD_ID
           ,b.NUMBERS VERTEX_INDEX
           ,SDE.ST_PointN(a.SHAPE, b.NUMBERS) AS ST_POINT
     FROM  ENG.ROAD a
           CROSS JOIN ENG.NUMBERS b
     WHERE b.NUMBERS <= SDE.ST_NUMPOINTS(a.SHAPE)
)
--removed to do explain plan: ORDER BY ROAD_ID, VERTEX_INDEX

-------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                      |  5996 |   322K|       |   262   (1)| 00:00:01 |
|   1 |  COUNT                 |                      |       |       |       |            |          |
|   2 |   VIEW                 |                      |  5996 |   322K|       |   262   (1)| 00:00:01 |
|   3 |    COUNT               |                      |       |       |       |            |          |
|   4 |     MERGE JOIN         |                      |  5996 |  1545K|       |   262   (1)| 00:00:01 |
|   5 |      INDEX FULL SCAN   | R23715_SDE_ROWID_UK  |    30 |    90 |       |     1   (0)| 00:00:01 |
|*  6 |      SORT JOIN         |                      |  3997 |  1018K|  2392K|   261   (1)| 00:00:01 |
|   7 |       TABLE ACCESS FULL| ROAD                 |  3997 |  1018K|       |    34   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
"   6 - access(""B"".""NUMBERS""<=""SDE"".""ST_NUMPOINTS""(""A"".""SHAPE""))"
"       filter(""B"".""NUMBERS""<=""SDE"".""ST_NUMPOINTS""(""A"".""SHAPE""))"

2) This query does not have a ROWNUM column (takes 40 seconds to execute):

SELECT
    RDSEC
    ,VERTEX_INDEX
    ,SDE.ST_X(ST_POINT) AS X
    ,SDE.ST_Y(ST_POINT) AS Y
FROM
(
    SELECT  
          a.RDSEC
          ,b.NUMBERS VERTEX_INDEX
          ,SDE.ST_PointN(a.SHAPE, b.NUMBERS) AS ST_POINT
    FROM  INFRASTR.STRLN_ROUTE_SIMPLIFY a
          CROSS JOIN INFRASTR.NUMBERS b
    WHERE b.NUMBERS <= SDE.ST_NUMPOINTS(a.SHAPE)
)
--removed to do explain plan: ORDER BY RDSEC, VERTEX_INDEX

----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name                 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                      |  5996 |  1545K|       |   262   (1)| 00:00:01 |
|   1 |  MERGE JOIN         |                      |  5996 |  1545K|       |   262   (1)| 00:00:01 |
|   2 |   INDEX FULL SCAN   | R23715_SDE_ROWID_UK  |    30 |    90 |       |     1   (0)| 00:00:01 |
|*  3 |   SORT JOIN         |                      |  3997 |  1018K|  2392K|   261   (1)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| ROAD                 |  3997 |  1018K|       |    34   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
"   3 - access(""B"".""NUMBERS""<=""SDE"".""ST_NUMPOINTS""(""A"".""SHAPE""))"
"       filter(""B"".""NUMBERS""<=""SDE"".""ST_NUMPOINTS""(""A"".""SHAPE""))"

Why does adding ROWNUM in the first query improve performance? I would have thought it would have slowed things down, if anything.

There is some background info on the query here.

Update:

I had incorrectly assumed that I was unable to run explain plans. I'm using mere GIS software to run the query; not SQLplus (or any other proper database software).

I have been told otherwise; I'm now using DBMS_XPLAN to run explain plans.

Best Answer

Posting the query plan for the two versions of the query (and the exact SQL for the other version just for clarity's sake) would certainly be helpful. I would guess that doing so would prove out the following theory. Without it, though, I can guess what is likely going on but I can't be sure.

In general, the database is free to evaluate the elements of a query in whatever order it expects to be most efficient. That may mean that it executes the inline view first in its entirety and then applies the outer projection which would include calling the SDE.ST_X and SDE.ST_Y functions. Or it may mean that it transforms the query so that the outer functions get called for every row and that outer predicates, if any, get pushed into the inline view.

If you add a rownum to the inline view, in any current version of Oracle, you prevent the optimizer from pushing logic into the inline view since the optimizer can't be sure that it won't impact the results that are returned. In theory, some version of the future optimizer might be smart enough to figure out that evaluating the SDE.ST_X and SDE.ST_Y functions in the inline view wouldn't actually change the rownum that is generated but that pushing the order by could change the result so some transforms would be allowed and others are rejected. But in the present, adding a rownum acts to basically force the optimizer's hand to leave the inline view "as is".

Most likely, the optimizer's estimate for the cost and the selectivity of the various function calls are off. If the b.NUMBERS <= SDE.ST_NUMPOINTS(a.SHAPE) function is less expensive than the optimizer expects and filters out much more data than the optimizer expects, it may decide that it is more efficient to call SDE.ST_X and SDE.ST_Y on every row rather than filtering all the data first and then taking a second pass through it to call the functions and do the sorting. Ideally, you'd give the optimizer better statistics about the various functions so that it could come up with the more efficient plan without you needing to add a rownum (particularly since that trick might suddenly stop working at some point in the future) but I've certainly been known to add a rownum like this a time or two (along with a comment explaining why) if I needed a quick-and-dirty fix.