Sql-server – How to compare large stored procedures

procedure-definitionschemasql server

I want to compare stored procedures that should be identical in several different databases, by retrieving and comparing the definitions from sys.sql_modules.

Currently I'm thinking about procedures which are "identical" except that one or other copy has blank lines or other white space at the start or end. These seem to be added by making a script of a procedure. I'd like to remove the blank lines and such and compare what's left. So I could do:

IF ( LEFT(@definition1, 2) = NCHAR(13) + NCHAR(10) )
    SET @definition1 = SUBSTRING(@definition1, 3, 1048576)

That will work if the length of the procedure is less than 1048576 bytes. But what is the actual limit on length – if any?

I suppose actually I should ask: what is a good way to compare procedures? (Apart from "Buy the SQL Compare utility from Red Gate.")

Updated

(revised)
I just realised that sys.sql_modules.definition has data type nvarchar(max) and that has a specified length… in bytes, POWER(2, 31)-1, except that that formula doesn't work 😉 In nchar characters, no more than POWER(2, 30)-1.

So this ought to do it:

IF ( LEFT(@definition1, 2) = NCHAR(13) + NCHAR(10) )
    SET @definition1 = SUBSTRING(@definition1, 3, 2147483647)

Yes, that is 2^31-1, it ought to be 2^30-1, but as long as it's bigger than @definition1 is or can be, I get away with it. Until they increase "max" to 2^63-1 bytes. Before there was "max", I used 8000.

And maybe I should include double-checking the DATALENGTH() in case it accidentally gets truncated to the first 8000 bytes somewhere.

As for your wise words about good code management to not have this need in the first place – useful ammunition, thank you (now to extract the bullets from my own hide). I recognise the value of methodical working practices, even when they have to apply to me, as well. But a problem is management who let's just say hypothetically just want a report or a result produced -now-, and who have been getting and expecting service like that for many years – and now they've learned to call it "Agile".

Best Answer

This question exists on two levels:

  1. Comparing between two databases:

    In this context you do not need to worry about length (outside of perhaps using DATALENGTH as a quick indicator of there being a difference and hence no need to compare the actual string values). Here you just need to make sure that the definitions are identical, and that does not require SUBSTRING, but it does require adding COLLATE Latin1_General_100_BIN2 to the comparison to ensure that you catch differences in case, etc that otherwise evaluate as being equal (using an everything-sensitive Collation, such as anything ending with _CS_AS_WS_KS[_SC], can't catch differences when equality is determined by combining characters and/or linguistic rules).

    Within a particular environment, an object should be identical across all databases where it is deployed to. There can't be any differences due to scripting since none of the definitions are being compared to the actual source.

    While comparisons on this level are easier (since you need not worry about white-space differences), they can also be misleading if all definitions are identical, yet they are all different from the source definition in the exact same way.

  2. Comparing between database and source script

    In this context you do need to worry about length and white-space variation depending on how your source object scripts are written and how they are packaged up for deployment. Here T-SQL is not going to help you as there are too many variations in what could be different yet still be "the same".

    This is where several 3rd party tools come into play and you should choose one of them. They can be configured to ignore certain white-space as well as include additional properties and sometimes even permissions.

    If you want to know if there are differences in Production that should either be removed OR factored into the source definition, you can only determine that by comparing to the source object script.

    If you do not have source scripts for each object, then you absolutely need to do this and now is as good of a time as any to start.

    If you do not have your source scripts in some sort of version control (Git, SVN, TFS, anything..), then you absolutely need to do this and now is as good of a time as any to start.


On a related note, the equation for the max size (in bytes, not "characters") is correct:

SELECT POWER(CONVERT(BIGINT, 2), 31) - 1
-- 2147483647

That value matches the max size of an INT as noted here. Since NVARCHAR is UTF-16, that is either 2 or 4 bytes per Code Point (a "character" is comprised of 1 or more Code Points, such as when using combining characters), hence the max "length" of a string is half of that, assuming that all "characters" are of the single, 2-byte variety (and they usually are). Meaning:

SELECT 2147483647 / 2
-- 1073741823

So the max amount of "characters" in an NVARCHAR(MAX) column is 1,073,741,823.

HOWEVER, this is irrelevant for SUBSTRING since it doesn't error in T-SQL if requested to return more characters than are in the input string. For example:

SELECT SUBSTRING('A', 1, 100);
-- A

So you can just put an arbitrary high value as the 3rd parameter.