I have the following table:
CREATE TABLE MY_DATA (
MY_DATA_ID NUMBER(38,0) PRIMARY KEY,
GEOM SDO_GEOMETRY
);
I've set its SDO metadata and created a spatial index:
INSERT INTO USER_SDO_GEOM_METADATA (TABLE_NAME, COLUMN_NAME, SRID, DIMINFO)
VALUES ('MY_DATA', 'GEOM', 4326, MDSYS.SDO_DIM_ARRAY(MDSYS.SDO_DIM_ELEMENT('X',-180.0000,180.0000,0.00000000001),MDSYS.SDO_DIM_ELEMENT('Y',-90.0000,90,0.00000000001)));
CREATE INDEX SPIX_MY_DATA ON MY_DATA (GEOM) INDEXTYPE IS MDSYS.SPATIAL_INDEX;
The table has about 150,000 rows.
I need to find rows where the geometries intersect one specific row. This seems like a simple enough task. All it takes is a JOIN
and a WHERE
clause:
SELECT /*+ INDEX(MD2 SPIX_MY_DATA)*/
MD2.MY_DATA_ID
FROM MY_DATA MD1
JOIN MY_DATA MD2 ON SDO_RELATE(MD1.GEOM, MD2.GEOM, 'mask=ANYINTERACT') = 'TRUE'
WHERE MD1.MY_DATA_ID = 143668 AND MD2.MY_DATA_ID != 143668
But this query is very slow. It takes over 85 seconds to fetch only 8 related rows. We can see from the query plan that it's not using the spatial index, in spite of the index hint:
Plan hash value: 4008712617
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 379 | 9854 | 1390 (1)| 00:00:17 |
| 1 | NESTED LOOPS | | 379 | 9854 | 1390 (1)| 00:00:17 |
| 2 | TABLE ACCESS BY INDEX ROWID| MY_DATA | 1 | 13 | 2 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | SYS_C00755898 | 1 | | 1 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | MY_DATA | 379 | 4927 | 1388 (1)| 00:00:17 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("MD1"."MY_DATA_ID"=143668)
4 - filter("MD2"."MY_DATA_ID"<>143668 AND
"MDSYS"."SDO_RTREE_RELATE"("MD1"."GEOM","MD2"."GEOM",'mask=ANYINTERACT
querytype=window ')='TRUE')
(Row 2 is MD1, and row 4 is MD2.)
I tried changing the hint to /*+ INDEX(MY_DATA SPIX_MY_DATA)*/
, but this also had no effect. I've made sure my statistics are up to date.
How can I speed up this query? How can I get it to use the spatial index?
I'm currently using Oracle 11.2.
Best Answer
Behavior of
SDO_RELATE
and the query plannerSDO_RELATE
will only use the spatial index of the first parameter. The documentation doesn't really say this explicitly, but it can be gleaned from some details there.The documentation for the two geometry arguments say:
(emphasis mine)
This suggests very strongly that
SDO_RELATE
is designed specifically to use the spatial index on its first argument. However, since you have a primary key filter onMD1
in yourWHERE
clause, the planner rightly concludes that using the primary key index will be faster that using the spatial index onMD1
. It never considers reversing the arguments to the function, and sinceSDO_RELATE
uses the index on the first argument instead of the second, it never considers actually using the spatial index.SDO_FILTER
behaves the same way, as far as I can tell.The solution
The answer is as simple as it is unintuitive. Just switch the order of the arguments to
SDO_RELATE
:This causes it to use the spatial index now:
The query is now nearly instantaneous, running in about 50 milliseconds.
Ignore the
ORDERED
hint part in the docs for cases like thisThe documentation goes on to say later:
From this, we might think we could also force the planner to use the index by putting
MD2
first in the list and specifying theORDERED
hint:which gives us a different execution plan:
This does force the index to be used, but it's used in the wrong way. It's still doing a full table scan on
MD2
. Instead of filteringMD2
by the lone geometry fromMD1
's result set, it grabs almost all the rows in the table fromMD2
(note the 148K count on row 2 of the plan), performs two searches onMD1
(one by the primary key and the other comparing all the rows inMD2
using the spatial index), and bitmap-ands the result. This actually makes performance worse, doing a completely unnecessary scan on the spatial index and extra work comparing that scan to the primary key one.As a general rule, you want to put the table with the smallest set of geometries (after other filters) first in your
FROM
clause and let it use the spatial index on the second table.Just specifying the
ORDERED
hint on the original query does not change the plan from the original.