Why does adding ROWNUM to a query improve performance

oracleperformancequery-performancesubquery

I have two queries:

1) This query has a ROWNUM column (takes 20 seconds to execute):

SELECT
     ROWNUM
     ,ROAD_ID
     ,VERTEX_INDEX
     ,SDE.ST_X(ST_POINT) AS X
     ,SDE.ST_Y(ST_POINT) AS Y
FROM
(
     SELECT  
           ROWNUM
           ,a.ROAD_ID
           ,b.NUMBERS VERTEX_INDEX
           ,SDE.ST_PointN(a.SHAPE, b.NUMBERS) AS ST_POINT
     FROM  ENG.ROAD a
           CROSS JOIN ENG.NUMBERS b
     WHERE b.NUMBERS <= SDE.ST_NUMPOINTS(a.SHAPE)
)
--removed to do explain plan: ORDER BY ROAD_ID, VERTEX_INDEX

-------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                      |  5996 |   322K|       |   262   (1)| 00:00:01 |
|   1 |  COUNT                 |                      |       |       |       |            |          |
|   2 |   VIEW                 |                      |  5996 |   322K|       |   262   (1)| 00:00:01 |
|   3 |    COUNT               |                      |       |       |       |            |          |
|   4 |     MERGE JOIN         |                      |  5996 |  1545K|       |   262   (1)| 00:00:01 |
|   5 |      INDEX FULL SCAN   | R23715_SDE_ROWID_UK  |    30 |    90 |       |     1   (0)| 00:00:01 |
|*  6 |      SORT JOIN         |                      |  3997 |  1018K|  2392K|   261   (1)| 00:00:01 |
|   7 |       TABLE ACCESS FULL| ROAD                 |  3997 |  1018K|       |    34   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
"   6 - access(""B"".""NUMBERS""<=""SDE"".""ST_NUMPOINTS""(""A"".""SHAPE""))"
"       filter(""B"".""NUMBERS""<=""SDE"".""ST_NUMPOINTS""(""A"".""SHAPE""))"

2) This query does not have a ROWNUM column (takes 40 seconds to execute):

SELECT
    RDSEC
    ,VERTEX_INDEX
    ,SDE.ST_X(ST_POINT) AS X
    ,SDE.ST_Y(ST_POINT) AS Y
FROM
(
    SELECT  
          a.RDSEC
          ,b.NUMBERS VERTEX_INDEX
          ,SDE.ST_PointN(a.SHAPE, b.NUMBERS) AS ST_POINT
    FROM  INFRASTR.STRLN_ROUTE_SIMPLIFY a
          CROSS JOIN INFRASTR.NUMBERS b
    WHERE b.NUMBERS <= SDE.ST_NUMPOINTS(a.SHAPE)
)
--removed to do explain plan: ORDER BY RDSEC, VERTEX_INDEX

----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name                 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                      |  5996 |  1545K|       |   262   (1)| 00:00:01 |
|   1 |  MERGE JOIN         |                      |  5996 |  1545K|       |   262   (1)| 00:00:01 |
|   2 |   INDEX FULL SCAN   | R23715_SDE_ROWID_UK  |    30 |    90 |       |     1   (0)| 00:00:01 |
|*  3 |   SORT JOIN         |                      |  3997 |  1018K|  2392K|   261   (1)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| ROAD                 |  3997 |  1018K|       |    34   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
"   3 - access(""B"".""NUMBERS""<=""SDE"".""ST_NUMPOINTS""(""A"".""SHAPE""))"
"       filter(""B"".""NUMBERS""<=""SDE"".""ST_NUMPOINTS""(""A"".""SHAPE""))"

Why does adding ROWNUM in the first query improve performance? I would have thought it would have slowed things down, if anything.

There is some background info on the query here.

Update:

I had incorrectly assumed that I was unable to run explain plans. I'm using mere GIS software to run the query; not SQLplus (or any other proper database software).

I have been told otherwise; I'm now using DBMS_XPLAN to run explain plans.

Best Answer

Posting the query plan for the two versions of the query (and the exact SQL for the other version just for clarity's sake) would certainly be helpful. I would guess that doing so would prove out the following theory. Without it, though, I can guess what is likely going on but I can't be sure.

In general, the database is free to evaluate the elements of a query in whatever order it expects to be most efficient. That may mean that it executes the inline view first in its entirety and then applies the outer projection which would include calling the SDE.ST_X and SDE.ST_Y functions. Or it may mean that it transforms the query so that the outer functions get called for every row and that outer predicates, if any, get pushed into the inline view.

If you add a rownum to the inline view, in any current version of Oracle, you prevent the optimizer from pushing logic into the inline view since the optimizer can't be sure that it won't impact the results that are returned. In theory, some version of the future optimizer might be smart enough to figure out that evaluating the SDE.ST_X and SDE.ST_Y functions in the inline view wouldn't actually change the rownum that is generated but that pushing the order by could change the result so some transforms would be allowed and others are rejected. But in the present, adding a rownum acts to basically force the optimizer's hand to leave the inline view "as is".

Most likely, the optimizer's estimate for the cost and the selectivity of the various function calls are off. If the b.NUMBERS <= SDE.ST_NUMPOINTS(a.SHAPE) function is less expensive than the optimizer expects and filters out much more data than the optimizer expects, it may decide that it is more efficient to call SDE.ST_X and SDE.ST_Y on every row rather than filtering all the data first and then taking a second pass through it to call the functions and do the sorting. Ideally, you'd give the optimizer better statistics about the various functions so that it could come up with the more efficient plan without you needing to add a rownum (particularly since that trick might suddenly stop working at some point in the future) but I've certainly been known to add a rownum like this a time or two (along with a comment explaining why) if I needed a quick-and-dirty fix.

Related Solutions

Sql-server – Optimising join on large table

Your ix_hugetable looks quite useless because:

it is the clustered index (PK)
the INCLUDE makes no difference because a clustered index INCLUDEs all non-key columns (non-key values at lowest leaf = INCLUDEd = what a clustered index is)

In addition: - added or fk should be first - ID is first = not much use

Try changing the clustered key to (added, fk, id) and drop ix_hugetable. You've already tried (fk, added, id). If nothing else, you'll save a lot of disk space and index maintenance

Another option might be to try the FORCE ORDER hint with table order boh ways and no JOIN/INDEX hints. I try not to use JOIN/INDEX hints personally because you remove options for the optimiser. Many years ago I was told (seminar with a SQL Guru) that FORCE ORDER hint can help when you have huge table JOIN small table: YMMV 7 years later...

Oh, and let us know where the DBA lives so we can arrange for some percussion adjustment

Edit, after 02 Jun update

The 4th column is not part of the non-clustered index so it uses the clustered index.

Try changing the NC index to INCLUDE the value column so it doesn't have to access the value column for the clustered index

create nonclustered index ix_hugetable on dbo.hugetable (
    fk asc, added asc
) include(value)

Note: If value is not nullable then it is the same as COUNT(*) semantically. But for SUM it need the actual value, not existence.

As an example, if you change COUNT(value) to COUNT(DISTINCT value) without changing the index it should break the query again because it has to process value as a value, not as existence.

The query needs 3 columns: added, fk, value. The first 2 are filtered/joined so are key columns. value is just used so can be included. Classic use of a covering index.

Oracle aggregate functions slows query down massively

Thanks for all your input - I seem to have solved it. I recalculated table statistics using:

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'RESULTS
_DATA');

After this, the queries worked fine.

Best Answer

Related Solutions

Sql-server – Optimising join on large table

Oracle aggregate functions slows query down massively

Related Question