Oracle – Why Spatial Index Not Used in JOIN Queries?

indexoraclespatial

I have the following table:

CREATE TABLE MY_DATA (
  MY_DATA_ID NUMBER(38,0) PRIMARY KEY,
  GEOM SDO_GEOMETRY
);

I've set its SDO metadata and created a spatial index:

INSERT INTO USER_SDO_GEOM_METADATA (TABLE_NAME, COLUMN_NAME, SRID, DIMINFO)
VALUES ('MY_DATA', 'GEOM', 4326, MDSYS.SDO_DIM_ARRAY(MDSYS.SDO_DIM_ELEMENT('X',-180.0000,180.0000,0.00000000001),MDSYS.SDO_DIM_ELEMENT('Y',-90.0000,90,0.00000000001)));

CREATE INDEX SPIX_MY_DATA ON MY_DATA (GEOM) INDEXTYPE IS MDSYS.SPATIAL_INDEX;

The table has about 150,000 rows.

I need to find rows where the geometries intersect one specific row. This seems like a simple enough task. All it takes is a JOIN and a WHERE clause:

SELECT /*+ INDEX(MD2 SPIX_MY_DATA)*/
    MD2.MY_DATA_ID
FROM MY_DATA MD1
JOIN MY_DATA MD2 ON SDO_RELATE(MD1.GEOM, MD2.GEOM, 'mask=ANYINTERACT') = 'TRUE'
WHERE MD1.MY_DATA_ID = 143668 AND MD2.MY_DATA_ID != 143668

But this query is very slow. It takes over 85 seconds to fetch only 8 related rows. We can see from the query plan that it's not using the spatial index, in spite of the index hint:

Plan hash value: 4008712617

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |   379 |  9854 |  1390   (1)| 00:00:17 |
|   1 |  NESTED LOOPS                |               |   379 |  9854 |  1390   (1)| 00:00:17 |
|   2 |   TABLE ACCESS BY INDEX ROWID| MY_DATA       |     1 |    13 |     2   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C00755898 |     1 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL          | MY_DATA       |   379 |  4927 |  1388   (1)| 00:00:17 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("MD1"."MY_DATA_ID"=143668)
   4 - filter("MD2"."MY_DATA_ID"<>143668 AND 
              "MDSYS"."SDO_RTREE_RELATE"("MD1"."GEOM","MD2"."GEOM",'mask=ANYINTERACT 
              querytype=window  ')='TRUE')

(Row 2 is MD1, and row 4 is MD2.)

I tried changing the hint to /*+ INDEX(MY_DATA SPIX_MY_DATA)*/, but this also had no effect. I've made sure my statistics are up to date.

How can I speed up this query? How can I get it to use the spatial index?

I'm currently using Oracle 11.2.

Best Answer

Behavior of SDO_RELATE and the query planner

SDO_RELATE will only use the spatial index of the first parameter. The documentation doesn't really say this explicitly, but it can be gleaned from some details there.

The documentation for the two geometry arguments say:

geometry1: Specifies a geometry column in a table. The column must be spatially indexed. Data type is SDO_GEOMETRY.

geometry2 Specifies either a geometry from a table or a transient instance of a geometry. (Specified using a bind variable or SDO_GEOMETRY constructor.) Data type is SDO_GEOMETRY.

(emphasis mine)

This suggests very strongly that SDO_RELATE is designed specifically to use the spatial index on its first argument. However, since you have a primary key filter on MD1 in your WHERE clause, the planner rightly concludes that using the primary key index will be faster that using the spatial index on MD1. It never considers reversing the arguments to the function, and since SDO_RELATE uses the index on the first argument instead of the second, it never considers actually using the spatial index.

SDO_FILTER behaves the same way, as far as I can tell.

The solution

The answer is as simple as it is unintuitive. Just switch the order of the arguments to SDO_RELATE:

SELECT
    MD2.MY_DATA_ID
FROM MY_DATA MD1
JOIN MY_DATA MD2 ON SDO_RELATE(MD2.GEOM, MD1.GEOM, 'mask=ANYINTERACT') = 'TRUE'
WHERE MD1.MY_DATA_ID = 143668 AND MD2.MY_DATA_ID != 143668

This causes it to use the spatial index now:

Plan hash value: 3780744499

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |  1487 | 38662 |   534   (1)| 00:00:07 |
|   1 |  NESTED LOOPS                |               |  1487 | 38662 |   534   (1)| 00:00:07 |
|   2 |   TABLE ACCESS BY INDEX ROWID| MY_DATA       |     1 |    13 |     2   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C00755898 |     1 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| MY_DATA       |  1487 | 19331 |   534   (1)| 00:00:07 |
|*  5 |    DOMAIN INDEX              | SPIX_MY_DATA  |       |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("MD1"."MY_DATA_ID"=143668)
   4 - filter("MD2"."MY_DATA_ID"<>143668)
   5 - access("MDSYS"."SDO_RTREE_RELATE"("MD2"."GEOM","MD1"."GEOM",'mask=ANYINTERACT 
              querytype=window  ')='TRUE')

The query is now nearly instantaneous, running in about 50 milliseconds.

Ignore the ORDERED hint part in the docs for cases like this

The documentation goes on to say later:

geometry2 can come from a table or be a transient SDO_GEOMETRY object, such as a bind variable or SDO_GEOMETRY constructor.

  • If the geometry2 column is not spatially indexed, the operator indexes the query window in memory and performance is very good.

  • If two or more geometries from geometry2 are passed to the operator, the ORDERED optimizer hint must be specified, and the table in geometry2 must be specified first in the FROM clause.

From this, we might think we could also force the planner to use the index by putting MD2 first in the list and specifying the ORDERED hint:

SELECT /*+ ORDERED INDEX(MD2 SPIX_MY_DATA)*/
    MD2.MY_DATA_ID
FROM MY_DATA MD2
JOIN MY_DATA MD1 ON SDO_RELATE(MD1.GEOM, MD2.GEOM, 'mask=ANYINTERACT') = 'TRUE'
WHERE MD1.MY_DATA_ID = 143668 AND MD2.MY_DATA_ID != 143668

which gives us a different execution plan:

Plan hash value: 3307523706

--------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |               |  1487 | 38662 | 14806   (1)| 00:02:58 |
|   1 |  NESTED LOOPS                    |               |  1487 | 38662 | 14806   (1)| 00:02:58 |
|*  2 |   TABLE ACCESS FULL              | MY_DATA       |   148K|  1887K|  4752   (1)| 00:00:58 |
|   3 |   BITMAP CONVERSION TO ROWIDS    |               |       |       |            |          |
|   4 |    BITMAP AND                    |               |       |       |            |          |
|   5 |     BITMAP CONVERSION FROM ROWIDS|               |       |       |            |          |
|   6 |      SORT ORDER BY               |               |       |       |            |          |
|*  7 |       DOMAIN INDEX               | SPIX_MY_DATA  |     1 |       |     0   (0)| 00:00:01 |
|   8 |     BITMAP CONVERSION FROM ROWIDS|               |       |       |            |          |
|*  9 |      INDEX RANGE SCAN            | SYS_C00755898 |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MD2"."MY_DATA_ID"<>143668)
   7 - access("MDSYS"."SDO_RTREE_RELATE"("MD1"."GEOM","MD2"."GEOM",'mask=ANYINTERACT 
              querytype=window  ')='TRUE')
   9 - access("MD1"."MY_DATA_ID"=143668)

This does force the index to be used, but it's used in the wrong way. It's still doing a full table scan on MD2. Instead of filtering MD2 by the lone geometry from MD1's result set, it grabs almost all the rows in the table from MD2 (note the 148K count on row 2 of the plan), performs two searches on MD1 (one by the primary key and the other comparing all the rows in MD2 using the spatial index), and bitmap-ands the result. This actually makes performance worse, doing a completely unnecessary scan on the spatial index and extra work comparing that scan to the primary key one.

As a general rule, you want to put the table with the smallest set of geometries (after other filters) first in your FROM clause and let it use the spatial index on the second table.

Just specifying the ORDERED hint on the original query does not change the plan from the original.