PostGIS Performance – Comparing && vs ST_Intersects

postgis, postgresql, spatial

I am trying to figure out which points reside inside a rectangular area (envelope). I am having a little difficulty understanding the performance implications of using the && operator compared to ST_Intersects.

However, I think I answered my own question while trying to formulate it. I am submitting it anyway in case it is useful to somebody.

The manual for && says (ironically the page name is geometry_overlaps.html):

&& — Returns TRUE if A's 2D bounding box intersects B's 2D bounding box.

The manual for ST_Intersects says:

Returns TRUE if the Geometries/Geography "spatially intersect in 2D" – (share any portion of space) and FALSE if they don't (they are Disjoint). For geography — tolerance is 0.00001 meters (so any points that close are considered to intersect)

ST_Intersects will return true in such cases. For my objective, the && operator does almost the same thing, since I am using a rectangular bounding box in either case. Is && the fastest operator for my purpose? It feels like && has to do fewer checks, so it should be much more efficient.

Here is an example, adapted directly from the && documentation with ST_Intersects added:

SELECT
        t1.id AS t1,
        t2.id AS t2,
        t1.ln && t2.ln AS "&&",
        ST_Intersects(t1.ln,t2.ln)
FROM ( VALUES
        (1, 'LINESTRING(0 0, 3 3)'::geometry),
        (2, 'LINESTRING(0 1, 0 5)'::geometry)
) AS t1(id,ln)
CROSS JOIN (VALUES
        (3, 'LINESTRING(1 2, 4 6)'::geometry)
) AS t2(id,ln);

 t1 | t2 | && | st_intersects 
----+----+----+---------------
  1 |  3 | t  | f
  2 |  3 | f  | f
(2 rows)

Below is a simple plot of how these lines look. I plotted it just to see exactly how the bounding box works for lines. For my purposes, the points are always inside the box (this line example does not really reflect what I am trying to do, but it is good for illustration).

Question 1: Is && significantly faster than ST_Intersects when used with ST_MakeEnvelope (a rectangular boundary) to find points inside a rectangular bounding box?

Question 2: Am I understanding correctly that, when checking points inside a rectangular boundary, && does exactly the same thing as ST_Intersects?

Best Answer

Background, functionality and performance

`&&` operator

&& is bounding-box overlaps. All operators call functions in PostgreSQL; you can see this with \doS+ &&. In this case, && literally calls the PostGIS function geometry_overlaps. The only catch here is that && will make use of an index. From the docs:

In general, you will want to use the "intersects operator" (&&) which tests whether the bounding boxes of features intersect. The reason the && operator is useful is because if a spatial index is available to speed up the test, the && operator will make use of this. This can make queries much much faster.

You can see in the definition of geometry_overlaps that it calls an internal C function gserialized_overlaps_2d. The function gserialized_overlaps_2d uses 4 comparisons to determine whether or not there is an overlap in the bounding box. That's not usually all that useful except for adding selectivity, so you don't normally want it.

That means this isn't really a performance question; && just doesn't do much. However, what && does do can make use of a GiST index.
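As a rough sketch of the index point above, the following sets up a throwaway table and runs a bounding-box filter against an envelope. The table name `pts`, the index name, the SRID, and the random data are all assumptions made up for illustration; they are not from the question.

```sql
-- Hypothetical table of points; names and SRID are assumptions for illustration.
CREATE TABLE pts (
    id   serial PRIMARY KEY,
    geom geometry(Point, 4326)
);

-- Fill it with random points in a 10 x 10 degree square.
INSERT INTO pts (geom)
SELECT ST_SetSRID(ST_MakePoint(random() * 10, random() * 10), 4326)
FROM generate_series(1, 100000);

-- The GiST index stores bounding boxes, which is what && compares.
CREATE INDEX pts_geom_gist ON pts USING gist (geom);
ANALYZE pts;

-- Bounding-box filter against a rectangular envelope.
EXPLAIN ANALYZE
SELECT id
FROM pts
WHERE geom && ST_MakeEnvelope(2, 2, 3, 3, 4326);
```

With the index in place, the plan would typically show an index or bitmap index scan on `pts_geom_gist` rather than a sequential scan, though the planner's choice depends on the data and statistics.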

`ST_Intersects`

From the ST_Intersects docs:

This function call will automatically include a bounding box comparison that will make use of any indexes that are available on the geometries.

The reason is simple: only the bounding-box comparison uses the index. That means it does a && AND something else. You can see that with \dfS+ st_intersects:

SELECT $1 && $2 AND _ST_Intersects($1,$2);

So the extra bit it does is call either geos_intersects or sfcgal_intersects, depending on your chosen back end. In the best case, where you get geos_intersects, you can see what it does in the source.

In essence, it is telling you if any point intersects without making any assumptions (other than floating point math).

Mixed SRIDs

As a last note, it may be worth noting that these two operations handle mixed SRIDs differently.
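A minimal sketch of what that difference looks like, assuming the behaviour I have seen in PostGIS: && compares only the bounding boxes and does not look at the SRID, while ST_Intersects refuses to compare geometries whose SRIDs differ. The two points here are made up for illustration.

```sql
-- Two points at the same coordinates, one with SRID 0 and one with SRID 4326.
-- && compares only the bounding boxes.
SELECT ST_MakePoint(0, 0) && ST_SetSRID(ST_MakePoint(0, 0), 4326) AS bbox_overlaps;

-- ST_Intersects checks that both inputs share an SRID first,
-- so this statement is expected to fail with a mixed-SRID error
-- instead of returning a row.
SELECT ST_Intersects(ST_MakePoint(0, 0), ST_SetSRID(ST_MakePoint(0, 0), 4326));
```

If your data can contain mixed SRIDs, this means && may silently compare coordinates from different reference systems, so it is worth checking on your own version.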

Your questions

Question 1: Is && significantly faster than ST_Intersects when used with ST_MakeEnvelope (a rectangular boundary) to find points inside a rectangular bounding box?

Yes, it's faster -- significantly. It does less. It doesn't find "points" inside a rectangular bounding box unless one side is a simple point. Other than that, it finds bounding-box overlaps, which are subject to false positives when the bounding boxes overlap but the geometries themselves do not touch.
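To make the false-positive case concrete for points, here is a small sketch in the same style as the query in the question. The triangle and the two points are made up for illustration.

```sql
SELECT
        ST_AsText(pt)          AS pt,
        pt && tri              AS bbox_overlaps,
        ST_Intersects(pt, tri) AS intersects
FROM ( VALUES
        ('POINT(1 1)'::geometry),
        ('POINT(3 3)'::geometry)
) AS p(pt)
CROSS JOIN ( VALUES
        -- Right triangle whose bounding box is the square (0 0) to (4 4).
        ('POLYGON((0 0, 4 0, 0 4, 0 0))'::geometry)
) AS t(tri);
```

POINT(1 1) lies inside the triangle, so both columns are true. POINT(3 3) lies inside the triangle's bounding box but outside the triangle itself, so && is true while ST_Intersects is false.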

Question 2: Am I understanding correctly that, when checking points inside a rectangular boundary, && does exactly the same thing as ST_Intersects?

No. It should be clear why now.

Related Solutions

MongoDB geospatial query with sort – performance issues

A query in MongoDB can only use one index at a time, so it's a case of one or the other - it can't use the 2d index first, then do a sort on the _id index. In order to use indexes for both the selection and the sort, you would need a compound index like this:

db.markers.ensureIndex( { latlng : "2d" , _id : 1 } );

Try that, or something similar, and see how it affects the results. Bear in mind that once you define it you can remove the original 2d index to save space, and that this new index will have to be loaded into memory to be efficient.

Update: as mentioned in the summary, the above did not improve things, and the query still ends up with a scanAndOrder. This also happens with range-based queries, as explained in this excellent blog post:

http://blog.mongolab.com/2012/06/cardinal-ins/

As explained in that post, the usual resolution for range based query performance is to switch the order of the indexes. However this is currently not possible with geo indexes. There is a Jira issue already open for this here for voting and tracking purposes:

https://jira.mongodb.org/browse/SERVER-4247

Sql-server – sql server spatial index performance

That your indexing is slowing some queries but not others has a lot to do with the way spatial indexes work:

The spatial index is broken into 4 levels, each of which is a grid. For each level you can specify a density of low, medium, or high, corresponding to 16, 64, or 256 cells respectively. Each cell is then subdivided by the grid of the next level down, using that level's density setting.

All of this means that, depending on your settings and your data distribution, you may need to adjust the density at different levels to get proper index performance. It may also mean that you need to balance the performance of your main queries on this data against your edge cases.
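As a sketch of where those densities are set, a spatial index on a geometry column can be created with an explicit density per level. The table, column, and index names and the bounding box values below are placeholders, not taken from the question, and the densities are just one possible starting point.

```sql
-- Hypothetical table and column names; bounding box values are placeholders.
-- The table needs a clustered primary key before a spatial index can be created.
CREATE SPATIAL INDEX six_places_geom
ON dbo.places (geom)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (xmin = 0, ymin = 0, xmax = 500, ymax = 200),
    -- One density per level: LOW = 16, MEDIUM = 64, HIGH = 256 cells.
    GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = HIGH, LEVEL_4 = HIGH),
    CELLS_PER_OBJECT = 16
);
```

Tuning usually means rebuilding the index with different GRIDS settings and comparing the timings of both your main queries and the edge cases.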

Take a look at this Microsoft TechNet library article for a more detailed explanation:

Spatial Indexing Overview