Cross join on a numbers table to get line vertices, is there a better way

geometrynumber-tableoraclespatial

The Question:

I have a spatial table (road lines), stored using ESRI's SDE.ST_GEOMETRY user-defined datatype in an Oracle 12c geodatabase. I want to list the line vertices so that I can ultimately access & update their coordinates. If I was using SDO_GEOMETRY/Oracle Locator, then I would use the
SDO_UTIL.GETVERTICES function. But I'm not using SDO_GEOMETRY/Oracle Locator, and there is no equivalent function in SDE.ST_GEOMETRY. The only SDE.ST_GEOMETRY functions I can find that pertain to vertices are ST_PointN and ST_NumPoints.

I've come up with a query that successfully does all this – gets the line vertices as rows (inspired by this page):

1    SELECT   a.ROAD_ID
2             ,b.NUMBERS VERTEX_INDEX
3             ,a.SDE.ST_X(SDE.ST_PointN(a.SHAPE, b.NUMBERS)) AS X
4             ,a.SDE.ST_Y(SDE.ST_PointN(a.SHAPE, b.NUMBERS)) AS Y
5    FROM     ENG.ROADS a
6             CROSS JOIN ENG.NUMBERS b
7    WHERE    b.NUMBERS <= SDE.ST_NumPoints(a.SHAPE)
8    --removed to do explain plan: ORDER BY ROAD_ID, b.NUMBERS

----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name                 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                      |  5996 |  1545K|       |   262   (1)| 00:00:01 |
|   1 |  MERGE JOIN         |                      |  5996 |  1545K|       |   262   (1)| 00:00:01 |
|   2 |   INDEX FULL SCAN   | R23715_SDE_ROWID_UK  |    30 |    90 |       |     1   (0)| 00:00:01 |
|*  3 |   SORT JOIN         |                      |  3997 |  1018K|  2392K|   261   (1)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| ROAD                 |  3997 |  1018K|       |    34   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
"   3 - access(""B"".""NUMBERS""<=""SDE"".""ST_NumPoints""(""A"".""SHAPE""))"
"       filter(""B"".""NUMBERS""<=""SDE"".""ST_NumPoints""(""A"".""SHAPE""))"

It CROSS JOINS the lines in the ROADS table to a NUMBERS table (and limits the results to the number of vertices in each line).

Stats: (updated)

  • Each line has a maximum of 30 vertices (average of 4.38 vertices per line)
  • ROADS has 3,997 lines
  • NUMBERS has 30 rows (sequential numbers starting at 1)
  • The result set has 17,536 rows

However, the performance is poor (40 seconds), and I can't help but think – is there a more elegant way to do this? To me, using a numbers table and a cross join seems like a sloppy approach. Is there a better way?

Layman's terms would be appreciated; I'm a Public Works guy, not a DBA.


Update #1:

If I remove lines 3 & 4 (string of X & Y related functions) from the query, it executes instantly. But of course, I can't just remove these lines, I need the X & Y columns. So this leads me to believe that the slow performance has something to do with the X & Y functions.

However, if I export the points to a static table, and then run the X & Y functions on it, this executes instantly as well.

So, does this mean that the slow performance is caused by the X & Y functions, except, well, no it isn't? I'm confused.


Update #2:

If I bring the X and Y out of the query, put them in an outer query, and add ROWNUM to the inner query, then it's much quicker (16 seconds – updated):

    SELECT
        ROWNUM
        ,ROAD_ID
        ,VERTEX_INDEX
        ,SDE.ST_X(ST_POINT) AS X
        ,SDE.ST_Y(ST_POINT) AS Y
    FROM
    (
        SELECT  
              ROWNUM
              ,a.ROAD_ID
              ,b.NUMBERS VERTEX_INDEX
              ,SDE.ST_PointN(a.SHAPE, b.NUMBERS) AS ST_POINT
        FROM  ENG.ROAD a
              CROSS JOIN ENG.NUMBERS b
        WHERE b.NUMBERS <= SDE.ST_NumPoints(a.SHAPE)
    )
    --removed to do explain plan: ORDER BY ROAD_ID, VERTEX_INDEX

-------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                      |  5996 |   322K|       |   262   (1)| 00:00:01 |
|   1 |  COUNT                 |                      |       |       |       |            |          |
|   2 |   VIEW                 |                      |  5996 |   322K|       |   262   (1)| 00:00:01 |
|   3 |    COUNT               |                      |       |       |       |            |          |
|   4 |     MERGE JOIN         |                      |  5996 |  1545K|       |   262   (1)| 00:00:01 |
|   5 |      INDEX FULL SCAN   | R23715_SDE_ROWID_UK  |    30 |    90 |       |     1   (0)| 00:00:01 |
|*  6 |      SORT JOIN         |                      |  3997 |  1018K|  2392K|   261   (1)| 00:00:01 |
|   7 |       TABLE ACCESS FULL| ROAD                 |  3997 |  1018K|       |    34   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
"   6 - access(""B"".""NUMBERS""<=""SDE"".""ST_NumPoints""(""A"".""SHAPE""))"
"       filter(""B"".""NUMBERS""<=""SDE"".""ST_NumPoints""(""A"".""SHAPE""))"

Justin Cave explains why ROWNUM helps performance here: Why does adding ROWNUM to a query improve performance?

While this performance improvement is good, it's not yet good enough. And I can't help but think that I still don't fully understand how the query works or why it is as slow as it is.

The question still stands: is there a better way?

Best Answer

I know a bit about Oracle performance and pretty much nothing about custom data types, but I'll try to give you a plan to improve performance.

1) Verify that you cannot get an explain plan.

It's possible to get explain plans even if you don't have sophisicated database software. What happens if you execute set autotrace on explain?

You could also try DBMS_XPLAN. First save off the plan by wrapping your query with a few extra key words:

explain plan for (SELECT... your query goes here); 

Then execute this:

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

It's possible that neither of those will work and you truly cannot get an explain plan. I just wanted to verify that because with an explain plan it'll be much easier for the community to help you.

2) Consider requirements.

You said that 20 seconds isn't good enough. Have you or someone else defined exactly what is good enough? Is there any room for negotiation? Does your query need to be exactly one SELECT query? Could you populate a global temporary table in one step and select the results you wanted in the next? Could you create a stored procedure that returns a result set and call that?

3) Establish a lower bound for the time required to complete the query.

I suggest running a simple query that "cheats" to figure out what a well-optimized query would look like. For example, how long does this query that gets only the first vertices take?

SELECT
    ROWNUM
    ,ROAD_ID
    ,VERTEX_INDEX
    ,SDE.ST_X(ST_POINT) AS X
    ,SDE.ST_Y(ST_POINT) AS Y
FROM
(
    SELECT  
          ROWNUM
          ,a.ROAD_ID
          ,1 VERTEX_INDEX
          ,SDE.ST_PointN(a.SHAPE, 1) AS ST_POINT
    FROM  ENG.ROAD a
)
ORDER BY ROAD_ID, VERTEX_INDEX;

I suspect that will give you 4000 rows. If you multiply that query's response time by 17.5/4 that could give you a good lower bound for the total execution time.

If your lower bound for the total execution time is longer than what you established in step 2 then you either need to get creative with your data model by calculating results ahead of time and storing them in tables or you need to renegotiate the required response time.

4) Benchmark to figure out which functions are contributing the most to your execution time.

You were on the right track with Update #1 but you need to try to control for the amount of work being done. For example, is it possible to write a group of relatively simple queries that execute each function exactly 10000 times? How do the response times compare?

5) Go to work.

Depending on the requirements established in step 2 and what you found in step 4 try any trick that you can think of to reduce the query runtime. Are you able to pre-compute results and save off them? If the problem relates to the number of times the functions are executed then the undocumented materialize hint may be helpful. That forces Oracle to create a hidden temp table behind the scenes to store the results. I do not know if it is compatible with the special data types that you are using.

For example, maybe something like this performs better? Apologies if it does not compile but I have no way to test.

WITH ROAD_CTE (ROAD_ID, VERTEX_INDEX, SHAPE) AS
(
    SELECT /*+ materalize */
      a.ROAD_ID
    , b.NUMBERS VERTEX_INDEX
    , a.SHAPE
    FROM ENG.ROAD a
    CROSS JOIN ENG.NUMBERS b
    WHERE b.NUMBERS <= SDE.ST_NUMPOINTS(a.SHAPE)
)
, CTE_WITH_ST_POINT (ROAD_ID, VERTEX_INDEX, ST_POINT) AS
(
    SELECT /*+ materalize */
      rcte.ROAD_ID
    , rcte.VERTEX_INDEX
    , SDE.ST_PointN(rcte.SHAPE, rcte.VERTEX_INDEX) ST_POINT
    FROM ROAD_CTE rcte
)
SELECT 
      ROAD_ID
    , VERTEX_INDEX
    , SDE.ST_X(ST_POINT) AS X
    , SDE.ST_Y(ST_POINT) AS Y
FROM CTE_WITH_ST_POINT
ORDER BY ROAD_ID, VERTEX_INDEX;

If you're still stuck after all of this I suspect that it'll at least give you additional information that you can edit into the question. Good luck!