SQL Server Geometry – Poor Cardinality Estimate on Intersection

cardinality-estimates, spatial, sql-server, sql-server-2012

I'm intersecting a point with a set of polygons. The query uses a spatial index and the polygons do not overlap, but the query plan estimates 18k rows instead of 1, and this results in a bad query plan.

In particular, the rightmost nodes of the query plan estimate that the STPointFromText function returns 1000 rows, and that intersecting this point set with the geometry index returns 30% of the 54k rows.
(I ran 1 million points through the table without finding a counter-example that actually returned more than 1 row.)

The result isn't horrible in this abbreviated query, but when I join its output to anything else, the high cardinality estimate forces a table scan and hash match on the upstream table, even though the overall query returns 1 row. The extended query runs a few times per second, so I'm wondering how I can optimise it.

The spatial index uses HHHH (HIGH grid density at all four levels), which gives a finest cell size of approximately 80 x 50 m over the approximately 4000 km longest side of the domain. There are 56k polygons in the index, with an expected minimum size of ~100 m.

Estimated vs Actual Cardinality
Note the difference between the estimated and actual row counts.

Estimated query plan.

Best Answer

Sounds like you want row goals to play their part in the query, so try using TOP (1), perhaps with a test to exclude NULLs (in case of non-matching SRIDs). That way you can get the "nearest neighbour" functionality to kick in. I know you're using Contains, but you want a method that tells the query optimizer (QO) that only a single row will come back.

http://blogs.lobsterpot.com.au/2014/08/14/sql-spatial-getting-nearest-calculations-working-properly/ might have a few tips...
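
As a rough illustration of the row-goal idea, the sketch below wraps the spatial lookup in TOP (1). The original query is not shown in the question, so the table and column names (dbo.Zones, dbo.ZoneDetails, ZoneID, Shape), the sample point, and the SRID are all hypothetical, and whether the optimizer actually switches to a seek-based plan should be checked against the real execution plan.

```sql
-- Hypothetical schema (not from the question):
--   dbo.Zones(ZoneID int, Shape geometry) with a spatial index on Shape
--   dbo.ZoneDetails(ZoneID int, ...) as the table joined afterwards
DECLARE @pt geometry = geometry::STPointFromText('POINT(1500 2500)', 0);

-- TOP (1) sets a row goal of one row on the spatial lookup.
SELECT TOP (1) z.ZoneID
FROM dbo.Zones AS z
WHERE z.Shape.STIntersects(@pt) = 1;

-- The same lookup as a derived table, so the join to the other table
-- is costed against at most one row from the spatial side.
SELECT d.*
FROM (
    SELECT TOP (1) z.ZoneID
    FROM dbo.Zones AS z
    WHERE z.Shape.STIntersects(@pt) = 1
) AS hit
JOIN dbo.ZoneDetails AS d
    ON d.ZoneID = hit.ZoneID;
```

This relies on the polygons not overlapping, as stated in the question; with overlapping polygons, TOP (1) without an ORDER BY would return an arbitrary match.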

Related Solutions

Sql-server – Does WITH SCHEMABINDING on a multi-statement TVF improve cardinality estimates

In my tests, no, adding WITH SCHEMABINDING does not improve cardinality estimates. I created a simple table:

CREATE TABLE dbo.myobjects(id INT PRIMARY KEY);

INSERT dbo.myobjects SELECT [object_id] FROM sys.all_objects;

Then two functions:

CREATE FUNCTION dbo.noschemabinding(@UserID INT)
RETURNS @x TABLE (id INT)
AS
BEGIN
  INSERT @x SELECT id FROM dbo.myobjects;

  RETURN;
END
GO

CREATE FUNCTION dbo.withschemabinding(@UserID INT)
RETURNS @x TABLE (id INT)
WITH SCHEMABINDING
AS
BEGIN
  INSERT @x SELECT id FROM dbo.myobjects;

  RETURN;
END
GO

Comparing the actual plans, both show estimated rows = 1, actual rows = 2112 (this latter number may differ on your system depending on version/SP etc).

Comparing the speed:

SET NOCOUNT ON;
GO
SELECT SYSDATETIME();
GO
SELECT id INTO #x FROM dbo.noschemabinding(1);
DROP TABLE #x;
GO 1000
GO
SELECT SYSDATETIME();
GO
SELECT id INTO #x FROM dbo.withschemabinding(1);
DROP TABLE #x;
GO 1000
SELECT SYSDATETIME();

Results:

                    run 1               run 2
----------------    ------------------  ------------------
No schemabinding    14632 milliseconds  14079 milliseconds
Schemabinding       14251 milliseconds  13979 milliseconds

So, does it matter much? Nope.

SCHEMABINDING in this case is used for a more important goal: underlying schema stability. You will probably have much better optimization opportunities if you pursue converting your function to an inline TVF than to chase down obscure plan-affecting differences in a multi-statement TVF.
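
For comparison, an inline version of the same test function might look like the sketch below. The name dbo.inlineversion is made up for this example; the body is the same SELECT used in the two multi-statement functions above.

```sql
-- Hypothetical inline TVF equivalent of the test functions above.
-- An inline TVF is expanded into the calling query, so estimates come
-- from the statistics on dbo.myobjects rather than a fixed guess.
CREATE FUNCTION dbo.inlineversion(@UserID INT)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    SELECT id FROM dbo.myobjects
);
GO

SELECT id FROM dbo.inlineversion(1);
```

Comparing the actual plan for this query with the plans for the multi-statement versions should show whether the estimated row count now tracks the actual one on your system.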

Sql-server – Improving the performance of STIntersects

Firstly, check whether a spatial index is being used by looking at the query execution plan and seeing whether it contains a Clustered Index Seek (Spatial) operator.

Assuming it is being used, you could try adding a secondary/simplified filter based on a bounding box with simplified polygons to check for first. Matches against these simplified polygons could then be run through the primary filter to get the final results.

1) Add new geometry and geography columns to the [dbo].[T_POLYGON] table:

ALTER TABLE [dbo].[T_POLYGON] ADD SimplePolysGeom geometry;
ALTER TABLE [dbo].[T_POLYGON] ADD SimplePolysGeog geography;

2) Create the bounding box polygons (this involves an initial conversion to geometry to take advantage of STEnvelope()):

UPDATE [dbo].[T_POLYGON] SET SimplePolysGeom = geometry::STGeomFromWKB(
    COORD.STAsBinary(), COORD.STSrid).STEnvelope();

UPDATE [dbo].[T_POLYGON] SET SimplePolysGeog = geography::STGeomFromWKB(
    SimplePolysGeom.STAsBinary(), SimplePolysGeom.STSrid);

3) Create a spatial index on the simplified geography column
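
A minimal sketch of what that index could look like follows. The index name, grid densities, and CELLS_PER_OBJECT value are placeholder assumptions rather than tuned settings, and the statement assumes [dbo].[T_POLYGON] already has a clustered primary key, which a spatial index requires.

```sql
-- Hypothetical index name and untuned settings; adjust for your data.
CREATE SPATIAL INDEX IX_T_POLYGON_SimplePolysGeog
    ON [dbo].[T_POLYGON] (SimplePolysGeog)
    USING GEOGRAPHY_GRID
    WITH (
        GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM,
                 LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
        CELLS_PER_OBJECT = 16
    );
```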

4) Get the intersections against this simplified geography column, then filter again on the matching geography data types. Roughly, something like this:

;WITH cte AS
(
   SELECT pinID, polygonID FROM T_PIN INNER JOIN T_POLYGON
    ON T_PIN.Coord.STIntersects(T_POLYGON.SimplePolysGeog ) = 1
)
SELECT COUNT(*)
FROM T_PIN 
INNER JOIN T_POLYGON
    ON T_PIN.Coord.STIntersects(T_POLYGON.COORD) = 1
    AND T_PIN.pinID IN (SELECT pinID FROM cte)
    AND T_POLYGON.polygonID IN (SELECT polygonID FROM cte)

EDIT: You can replace (1) and (2) with this computed, persisted column. Credit to Paul White for the suggestion.

ALTER TABLE [dbo].[T_POLYGON] ADD SimplePolysGeog AS  ([geography]::STGeomFromWKB([geometry]::STGeomFromWKB([COORD].[STAsBinary](),[COORD].[STSrid]).STEnvelope().STAsBinary(),(4326))) PERSISTED
