SQL Server 2016 – Alternative to MakeValid() for Spatial Data

spatialsql serversql-server-2016

I have a very large table of geography LINESTRING data that I'm moving from Oracle to SQL Server. There are a number of evaluations that are executed against this data in Oracle, and they will need to be executed against the data in SQL Server, too.

The problem: SQL Server has tighter requirements for a valid LINESTRING than Oracle; "The LineString instance cannot overlap itself over an interval of two or more consecutive points". It just so happens that a percentage of our LINESTRINGs do not meet that criterion, which means that the functions we need to evaluate the data fail. I need to adjust the data so that it can be successfully validated in SQL Server.

For example:

Validating a very simple LINESTRING that doubles back on itself:

select geography::STGeomFromText(
    'LINESTRING (0 0 1, 0 1 2, 0 -1 3)',4326).IsValidDetailed()
24413: Not valid because of two overlapping edges in curve (1).

Executing the MakeValid function against it:

select geography::STGeomFromText(
    'LINESTRING (0 0 1, 0 1 2, 0 -1 3)',4326).MakeValid().STAsText()
LINESTRING (0 -0.999999999999867, 0 0, 0 0.999999999999867)

Unfortunately the MakeValid function changes the order of the points and removes the third dimension, which makes it unusable for us. I'm looking for another approach that solves this problem without reordering or removing the 3rd dimension.

Any ideas?

My actual data contains hundreds/thousands of points.

Best Answer

Let me caveat that I am playing with spatial data in SQL server for the first time (so you probably already know this first part), but it took me a while to figure out that SQL Server isn't treating (x y z) coordinates as true 3D values, it is treating them as (latitude longitude) with an optional "elevation" value, Z, which is ignored by validation and other functions.

Evidence:

select geography::STGeomFromText('LINESTRING (0 0 1, 0 1 2, 0 -1 3)', 4326)
    .IsValidDetailed()

24413: Not valid because of two overlapping edges in curve (1).

Your first example seemed weird to me because (0 0 1), (0 1 2), and (0 -1 3) are not collinear in 3D space (I'm a mathematician, so I was thinking in those terms). IsValidDetailed (and MakeValid) is treating these as (0 0), (0 1), and (0, -1), which does make an overlapping line.

To prove it, just swap the X and Z, and it validates:

select geography::STGeomFromText('LINESTRING (1 0 0, 2 1 0, 3 -1 0)', 4326)
    .IsValidDetailed()

24400: Valid

This actually makes sense if we think of these as regions or paths traced on the surface of our globe, instead of points in mathematical 3D space.


The second part of your issue is that Z (and M) point values are not preserved by SQL through functions:

Z-coordinates are not used in any calculations made by the library and are not carried through any library calculations.

This is unfortunately by design. This was reported to Microsoft in 2010, the request was closed as "Won't Fix". You might find that discussion relevant, their reasoning is:

Assigning Z and M is ambiguous, because MakeValid splits and merges spatial elements. Points often gets created, removed or moved during this process. Therefore MakeValid (and other constructions) drops Z and M values.

For example:

DECLARE @a geometry = geometry::Parse('POINT(0 0 2 2)');
DECLARE @b geometry = geometry::Parse('POINT(0 0 1 1)');
SELECT @a.STUnion(@b).AsTextZM()

Values Z and M are ambiguous for point (0 0). We decided to drop Z and M completely instead of returning half-correct result.

You can assign them later on if you know exactly how. Alternatively you can change the way you generate your objects to be valid on input, or keep two versions of your objects, one that is valid and another one that preserves all your features. If you explain your scenario better and what you do with the objects maybe we could be able to give you additional workarounds.

In addition, as you've already seen, MakeValid can also do other unexpected things, like change the order of points, return a MULTILINESTRING, or even return a POINT object.


One idea I came across was to store them as a MULTIPOINT object instead:

The problem is when your linestring actually retraces a continuous section of line between two points that was previously traced by the line. By definition, if you're retracing existing points, then the linestring is no longer the simplest geometry that can represent this pointset, and MakeValid() will give you a multilinestring instead (and lose your Z/M values).

Unfortunately, if you're working with GPS data or similar then it's quite likely that you might have retraced your path at some point in the route, so linestrings are not always that useful in these scenarios :( Arguably, such data should be stored as a multipoint anyway since your data represents the discrete location of an object sampled at regular points in time.

In your case it validates just fine:

select geometry::STGeomFromText('MULTIPOINT (0 0 1, 0 1 2, 0 -1 3)',4326)
    .IsValidDetailed()

24400: Valid

If you absolutely need to maintain these as LINESTRINGS, then you will have to write your own version of MakeValid that slightly adjusts some of the source X or Y points by some tiny value, while still preserving Z (and doesn't do other crazy things like convert it into other object types).

I'm still working on some code, but give a look at some of the starting ideas here:


EDIT Ok, a few things I found while testing:

  • If the geometry object is invalid, you just can't do much with it. You can't read the STGeometryType, you can't get the STNumPoints or use STPointN to iterate through them. If you can't use MakeValid, you're basically stuck with operating on the text representation of the geographic object.
  • Using STAsText() will return the text representation of even an invalid object, but doesn't return Z or M values. Instead, we want AsTextZM() or ToString().
  • You can't create a function that calls RAND() (functions need to be deterministic), so I just made it nudge by successively larger and larger values. I really have no idea what the precision of your data is, or how tolerant it is of small changes, so use or modify this function at your own discretion.

I have no idea if there are possible inputs that will cause this loop to go on forever. You have been warned.

CREATE FUNCTION dbo.FixBadLineString (@input geography) RETURNS geography
AS BEGIN
DECLARE @output geography

IF @input.STIsValid() = 1   --send valid objects back as-is
  SET @output = @input;
ELSE IF LEFT(@input.IsValidDetailed(),6) = '24413:'
--"Not valid because of two overlapping edges in curve"
BEGIN
  --make a new MultiPoint object from the LineString text
  DECLARE @mp geography = geography::STGeomFromText(
      REPLACE(@input.AsTextZM(), 'LINESTRING', 'MULTIPOINT'), 4326);
  DECLARE @newText nvarchar(max); --to build output
  DECLARE @point int 
  DECLARE @tinynum float = 0;

  SET @output = @input;
  --keep going until it validates
  WHILE @output.STIsValid() = 0
  BEGIN
    SET @newText = 'LINESTRING (';
    SET @point = 1
    SET @tinynum = @tinynum + 0.00000001

    --Loop through the points, add a bit and append to the new string
    WHILE @point <= @mp.STNumPoints()
    BEGIN
      SET @newText = @newText + convert(varchar(50),
               @mp.STPointN(@point).Long + @tinynum) + ' ';
      SET @newText = @newText + convert(varchar(50),
               @mp.STPointN(@point).Lat - @tinynum) + ' ';
      SET @newText = @newText + convert(varchar(50), 
               @mp.STPointN(@point).Z) + ', ';
      SET @tinynum = @tinynum * -2
      SET @point = @point + 1
    END

    --close the parens and make the new LineString object
    SET @newText = LEFT(@newText, LEN(@newText) - 1) + ')'
    SET @output = geography::STGeomFromText(@newText, 4326);
  END; --this will loop if it is still invalid
  RETURN @output;
END;
--Any other unhandled error, just send back NULL
ELSE SET @output = NULL;

RETURN @output;
END

Instead of parsing the string, I chose to create a new MultiPoint object using the same set of points, so I could iterate through them and nudge them, then reassemble a new LineString. Here's some code to test it, 3 of these values (including your sample) start invalid but are fixed:

declare @geostuff table (baddata geography)

INSERT INTO @geostuff (baddata)
          SELECT geography::STGeomFromText('LINESTRING (0 0 1, 0 1 2, 0 -1 3)',4326)
UNION ALL SELECT geography::STGeomFromText('LINESTRING (0 2 0, 0 1 0.5, 0 -1 -14)',4326)
UNION ALL SELECT geography::STGeomFromText('LINESTRING (0 0 4, 1 1 40, -1 -1 23)',4326)
UNION ALL SELECT geography::STGeomFromText('LINESTRING (1 1 9, 0 1 -.5, 0 -1 3)',4326)
UNION ALL SELECT geography::STGeomFromText('LINESTRING (6 6 26.5, 4 4 42, 12 12 86)',4326)
UNION ALL SELECT geography::STGeomFromText('LINESTRING (0 0 2, -4 4 -2, 4 -4 0)',4326)

SELECT baddata.AsTextZM() as before, baddata.IsValidDetailed() as pretest,
 dbo.FixBadLineString(baddata).AsTextZM() as after,
 dbo.FixBadLineString(baddata).IsValidDetailed() as posttest 
FROM @geostuff