Sql-server – SQL Server 2008 – Return records in chronological order, given a separation factor

indexsql server

This question was originally posted here https://stackoverflow.com/questions/8492149/return-records-in-chronological-order-given-a-separation-factor and re-posted on this forum given a user's advice.

I have a massive table in SQL Server 2008, it contains the position reported by technicians every minute. I need to report on this table but in order to control the amount of records that are displayed in the report both a time and distance separation factors need to be taken into account.

So, a query may look like : "Return all records with no less than 5 minutes and/or 300 feet between them".

The time part is done, but I'm having a hard time with the distance factor. I have the latitude and longitude for each point, and I have no problem if I need to include a SQL Server 2008 spatial UDT in order to resolve the problem.

Things I have considered:

  • Bring the records by the time factor, and apply the separation constrain in the client by calculating the distance between adjacent points and discard those which falls inside the the factor. (the easiest, but it must be the one consuming more resources).

  • Keep the last record per technician in a cache, pre-calculate the distance between the record and its predecessor, and resolve the constrain in the client. (should consume less resources than 1) since the distance is pre-calculated, however and since the table is BIG It will increase the size of the dataset, not sure if the space is worth the processing savings).

  • Use the spatial functions in SQL Server 2008, but honestly I had been reading and I couldn't find anything that helps me resolve this type of requirement. Any GIS expert??

I would like to go with the best option possible (maybe not listed above?) and IMO should be the one using the SQL Server features most efficiently.

Best Answer

Prior to SQL 2008, the most common solution was to use a UDF to calculate the great-circle distance between two points on a sphere. The Haversine formula is probably the most commonly used method.

Of course the Earth is not actually a perfect sphere, but this was considered "good enough" for most uses.

In SQL 2008, as you anticipated, such calculations are simplified and made more accurate by the introduction of the Geography and Geometry data types. Here's a brief sample of how you can use them to simplify distance calculations.

DECLARE @locations TABLE(locname VARCHAR(100), coord geography)
DECLARE @loc1 geography
DECLARE @loc2 geography


INSERT INTO @locations 
VALUES('HOME', geography::Point(-81.810194, 41.478156, 4326))   --Note: Lat, Long, SRID
                                                                --The 4326 is the SRID (spatial reference id) used by SQL as 
                                                                --a reference to the WGS 84 Standard. This is the same reference
                                                                --used by the GPS system
INSERT INTO @locations 
VALUES('WORK', geography::Point(-81.687771, 41.498227, 4326))

SELECT * FROM @locations

SELECT @loc1 = coord FROM @locations WHERE locname = 'HOME'
SELECT @loc2 = coord FROM @locations WHERE locname = 'WORK'

SELECT @loc1.STDistance(@loc2) * 3.2808399  --STDistance is in meters so we multiply to convert to feet 

SELECT @loc1.STIntersects(@loc2.STBuffer(300 / 3.2808399)) as isWithin300Ft  --This formula returns True when the point @loc1 intersects with the 300ft buffer zone around @loc2

The SRID is the key to the improved accuracy. The WGS 84 specification to which it refers includes a standardized coordinate system and a reference ellipsoid. In other words, it accounts for the non-spherical nature of the Earth, giving better results than a pure spherical Great Circle calculation.

If GIS accuracy is important to your work, this is the simplest way to implement it in SQL 2008.