Sql-server – SQL Server WHERE clause on CLR method (spatial) peformance

performancequery-performancesql server

I have a query that joins to a table containing 2 million rows and need to filter on its geometry column by its geometry type (using geom.STGeometryType()). The query takes less than a second to run without this WHERE clause but over 40 seconds to run with the WHERE clause.

The execution plan seems to indicate that the filter on STGeometryType() is being run over the geometry column of the whole locations table and there's an accompanying warning that there are no statistics for the geometry column.

If I first select into a temporary table without the filter and then filter on the temporary table, I get the results in under a second. This is because it is only testing for geometry type over a result set of 25000 records. So one solution would be to do this (though it would require re-factoring the application).

I've also tried using a CTE and derived table to do all of the joins 'first' and then filter but that doesn't improve performance.

Is there a way I can structure the query so that this relatively expensive filter is applied last without the use of intermediate tables?

   select o.code as Code, ro.ReportTaxonCode as TaxonCode, t1.TaxonName as TaxonName,
   case
       when o.NumberObserved IS NOT NULL THEN o.NumberObserved
       when o.NumberObserved IS NULL then su.Mean
   end as Scale,
   ro.OriginalTaxonCode as RecordedCode, t2.TaxonName as RecordedName, s.code as SetCode, CAST(s.StartDate AS nvarchar), CAST(s.EndDate AS nvarchar), l.code as Location, l.geom
   from obsveg.obs_observations o
   inner join obsveg.obs_sets s
   on o.obs_setsguid = s.guid
   inner join obsveg.obs_locations l
   on o.obs_locationsguid = l.guid
   inner join obsveg.obs_rpt_reportobservations ro
   on o.guid = ro.obs_observationsguid
   inner join obsveg.obs_taxon t1
   on ro.ReportTaxonCode = t1.code
   inner join obsveg.obs_taxon t2
   on ro.OriginalTaxonCode = t2.code
   left join obsveg.OBS_MeasurementMethodScaleUnits mmsu
   on o.OBS_MeasurementMethodScaleUnitsGUID = mmsu.GUID
   left join obsveg.obs_ScaleUnits su
   on mmsu.OBS_ScaleUnitsGUID = su.GUID
   where ro.reportguid = '64c0a4af-ee4d-4e83-a194-2a14e8a6ab0e'
   and l.geom.STGeometryType() = 'Point'

EDIT Warning in execution plan added
execution plan warning

EDIT Execution plan posted to pastebin
Execution plan

Best Answer

The first thing to look at is your indexing strategy. Bad execution plans are often caused by insufficient indexes or stale statistics. Your statistics warning might hint at that.

If that does not resolve your problem there are a few hacks that you can try:

A top operation requires SQL Server to separate query sections in the execution plan.

If you know you are always dealing with less then 2 billion rows you could write your query like this:

SELECT * FROM(
  SELECT TOP(2000000000) *
    FROM <complex join>
    WHERE ro.reportguid = '64c0a4af-ee4d-4e83-a194-2a14e8a6ab0e'
  )X
WHERE l.geom.STGeometryType() = 'Point'

An alternative is to write a scalar valued function that takes in the l.geom column and a few other columns from the other tables and returns the STGeometryType() value while ignoring all the other values. Because SQL Server does not consider the function logic at optimization time, it is forced to execute the function after the join. That does not guarantee that the other filter is executed first but often it works out that way.

The third option is to play around with join hints and join order. They often lead to a change in where filters are applied.

All three options I would consider a last resort because they make the code ugly and you run the risk that someone removes that ugliness later trying to make the code better.