Sql-server – How to setup solution for multi-dimensional distance search with SQL Server spatial data types

geometrynearest neighborspatialsql server

I have an existing database solution that is not performing very well. I am searching on multi-dimensional data with a traditional table structure using floats, etc. The database contains ~ 1-2M rows.

After some searching I came across the SQL Server spatial data types where you can define a geometry based on Point or MultiPoint where you can search with a distance. I think this might be a solution but I need some help to get started in the right direction.

My current table structure is as follows where each row contains 2 (eventually more) XYZ values:

    | ID | X1  | Y1  | Z1  | X2  | Y2  | Z2  |
    | 1  | 1.1 | 2.2 | 5.1 | 1.2 | 2.1 | 4.1 |
    | 2  | 3.2 | 5.1 | 4.1 | 3.2 | 3.1 | 3.1 |
    | 3  | 4.1 | 2.3 | 6.3 | 4.2 | 4.1 | 2.1 |
    | 4  | 2.4 | 3.5 | 2.1 | 3.2 | 2.1 | 4.1 |

Im performing search with a tolerance for each X, Y and Z value.

So for example:

    (pseudocode)
    declare targetXYZ1 = (4.1, 2.2, 3.1);
    declare targetXYZ2 = (2.8, 2.2, 4.2);

    declare toleranceXYZ = (2,2,2)

    Select ID from MyXYZTable tb
    Where 
           Math.Abs(tb.X1 - targetXYZ1.X1) < toleranceXYZ.X 
       AND Math.Abs(tb.Y1 - targetXYZ1.Y1) < toleranceXYZ.Y
       AND Math.Abs(tb.Z1 - targetXYZ1.Z1) < toleranceXYZ.Z

       AND Math.Abs(tb.X2 - targetXYZ1.X2) < toleranceXYZ.X 
       AND Math.Abs(tb.Y2 - targetXYZ1.Y2) < toleranceXYZ.Y
       AND Math.Abs(tb.Z2 - targetXYZ1.Z2) < toleranceXYZ.Z

After some analysis I would probably need to make a table with 1 or 2 geometry columns for xyz1 and xyz2 with geometry::Point(0, 0, 0)

My questions:

  1. Is there a solution possible with SQL Server spatial types?
  2. Do I need to use nearest neighbor search or can I use STDistance
  3. Do I need to setup a spatial index for the table?

Any advice or tips and tricks are very welcome!

Best Answer

There might be some spatial methods that are useful to some degree, but your biggest problem is going to be the Z value:

Z-coordinates are not used in any calculations made by the library and are not carried through any library calculations.

This is by design. Don't think of SQL spatial objects as "true" 3D-geometric objects, think of them as map coordinates with an "elevation tag" that isn't factored into any calculations. So the distance function STDistance() works great for "map distances", but not true 3D distances.

You can define the Z value (and the "M" value) of a point, but it isn't used by any calculations.

To demonstrate:

DECLARE @a geometry = geometry::Parse('POINT(0 0 0 2)');
DECLARE @b geometry = geometry::Parse('POINT(1 0 5 1)');
SELECT @a.Z, @b.Z, @a.STDistance(@b)

will return the values

0   5   1

Clearly the distance between these two points should be more than 1, if it used the Z properly. So, they may not work like you hope.

But if you're ok with ignoring the Z value, then performing these calculations will be pretty straight-forward. You could read these values from your existing table like so:

SELECT geometry::Point(X1,Y1,0).STDistance(geometry::Point(X2,Y2,0))
From myTable;  

(That 0 in this use is not a Z value, it's an SRID.)

or, if you take the time to change how they're actually stored:

CREATE TABLE myPoints (ID INT, P1 geometry, P2 geometry)

INSERT myPoints 
SELECT 1, geometry::Point(X1, Y1, 0), geometry::Point(X2, Y2, 0)
FROM oldTable

SELECT ID, P1.STDistance(P2)
FROM myPoints

There is some related discussion in one of my older answers here.

EDIT: Some further thoughts:

  1. Regarding performance, I have absolutely no clue how geometry objects will compare, performance wise, to you doing the math manually. I imagine converting to Geometry objects on the fly will have a certain degree of overhead, but perhaps converting how your raw data is stored will do better. Testing will be key here.
  2. The Z parameter might not be used by the STDistance() function, but clearly you can manually query Z, and use it to do that part of the math yourself, just use Pythagoras: True3D_Distance^2 = STDistance^2 + (a.Z-b.Z)^2
  3. But note that if your goal is to simply rank the points by closeness, and you don't need to calculate the actual distance, you can simplify your calculation. I'm thinking something like WHERE PointA.STDistance(PointB) + ABS(PointA.Z-PointB.Z) < tolerance or perhaps WHERE PointA.STDistance(PointB) < tolerance AND ABS(PointA.Z-PointB.Z) < tolerance