In an attempt to decouple an application from our monolithic database, we've tried to change the INT IDENTITY columns of various tables to be PERSISTED computed columns that use COALESCE. Basically, we need to give the decoupled application the ability to update the database for common data shared across many applications, while still allowing existing applications to create data in these tables without any code or procedure modification.
So essentially, we've moved from a column definition of:

```sql
PkId INT IDENTITY(1,1) PRIMARY KEY
```

to:

```sql
PkId AS COALESCE(old_id, external_id, new_id) PERSISTED NOT NULL,
old_id INT NULL, -- values here are from existing records of PkId before the table change
external_id INT NULL,
new_id INT IDENTITY(2000000,1) NOT NULL
```
In all cases the PkId is also a PRIMARY KEY, and in all but one case it is CLUSTERED. All tables have the same foreign keys and indexes as before. In essence, the new format allows the PkId to be supplied by the decoupled application (as external_id), but also allows the PkId to take the IDENTITY column value, thereby allowing existing code that relies on the IDENTITY column through SCOPE_IDENTITY and @@IDENTITY to work as it used to.
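For illustration, a minimal sketch of the two insert paths against this structure (the table name `SomeTable` and the `SomeValue` column are made up for this example):

```sql
CREATE TABLE SomeTable
(
    PkId AS COALESCE(old_id, external_id, new_id) PERSISTED NOT NULL
        CONSTRAINT PK_SomeTable PRIMARY KEY CLUSTERED,
    old_id      INT NULL,  -- pre-existing PkId values, populated only during migration
    external_id INT NULL,  -- supplied by the decoupled application
    new_id      INT IDENTITY(2000000, 1) NOT NULL,
    SomeValue   VARCHAR(50) NULL
);

-- Existing applications insert as before; PkId resolves to the IDENTITY value
INSERT INTO SomeTable (SomeValue) VALUES ('legacy app row');
SELECT SCOPE_IDENTITY(); -- still returns the new_id value, as legacy code expects

-- The decoupled application supplies the key explicitly via external_id
INSERT INTO SomeTable (external_id, SomeValue) VALUES (42, 'decoupled app row');
```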
The problem is that we've come across a couple of queries that used to run in an acceptable time but now blow out completely. The query plans generated for these queries are nothing like what they were before.
Given the new column is a PRIMARY KEY, the same data type as before, and PERSISTED, I would have expected queries and query plans to behave the same as they did before. Should the COMPUTED PERSISTED INT PkId essentially behave the same way as an explicit INT definition in terms of how SQL Server will produce the execution plan? Are there other likely issues with this approach that you can see?
The purpose of this change was supposed to allow us to change the table definition without the need to modify existing procedures and code. Given these issues, I don't feel like we can go with this approach.
Best Answer
FIRST

You probably don't need all three columns: `old_id`, `external_id`, and `new_id`. The `new_id` column, being an `IDENTITY`, will have a new value generated for each row, even when you insert into `external_id`. But `old_id` and `external_id` are pretty much mutually exclusive: either there is already an `old_id` value, or that column, in the current conception, will just be `NULL` when using `external_id` or `new_id`. Since you won't be adding a new "external" id to a row that already exists (i.e. one that has an `old_id` value), and there won't be any new values coming in for `old_id`, one column can be used for both purposes.

So, get rid of the `external_id` column and rename `old_id` to something like `old_or_external_id`, or whatever. This shouldn't require any real changes to anything, yet reduces some of the complication. At most you might need to call the column `external_id`, even if it contains "old" values, if app code is already written to insert into `external_id`.

That reduces the new structure to be just:
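Presumably something like the following sketch, using the merged column name suggested above:

```sql
PkId AS COALESCE(old_or_external_id, new_id) PERSISTED NOT NULL,
old_or_external_id INT NULL, -- existing PkId values OR values supplied by the decoupled app
new_id INT IDENTITY(2000000,1) NOT NULL
```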
Now you have only added 8 bytes per row instead of 12 (assuming that you aren't using the `SPARSE` option or Data Compression). And you didn't need to change any code, T-SQL or app code.

SECOND
Continuing down this path of simplification, let's look at what we have left:

- The `old_or_external_id` column either has values already, or will be given a new value from the app, or will be left as `NULL`.
- `new_id` will always have a new value generated, but that value will only be used if the `old_or_external_id` column is `NULL`.

There is never a time when you would need values in both `old_or_external_id` and `new_id`. Yes, there will be times when both columns have values due to `new_id` being an `IDENTITY`, but those `new_id` values are ignored. Again, these two fields are mutually exclusive. So what now?

Now we can look into why we needed the `external_id` column in the first place. Considering that it is possible to insert into an `IDENTITY` column using `SET IDENTITY_INSERT {table_name} ON;`, you could get away with making no schema changes at all, and only modify your app code to wrap the `INSERT` statements / operations in `SET IDENTITY_INSERT {table_name} ON;` and `SET IDENTITY_INSERT {table_name} OFF;` statements. You then need to determine what starting range to reset the `IDENTITY` column to (for newly generated values), as it will need to be well above the values that the app code will be inserting, since inserting a higher value will cause the next auto-generated value to be greater than the current MAX value. But you can always insert a value that is below the `IDENT_CURRENT` value.

Combining the `old_or_external_id` and `new_id` columns also does not increase the chances of running into an overlapping value situation between auto-generated values and app-generated values, since the intention of having the 2, or even 3, columns is to combine them into a Primary Key value, and those are always unique values.

In this approach, you just need to:
- Leave the tables as they are: `PkId INT IDENTITY(1,1) PRIMARY KEY`. This adds 0 bytes to each row, instead of 8, or even 12.
- Wrap the app code's `INSERT` statements in `SET IDENTITY_INSERT {table_name} ON;` and `SET IDENTITY_INSERT {table_name} OFF;` statements.

SECOND, Part B
A variation on the approach noted directly above would be to have the app code insert values starting with -1 and going down from there. This leaves the `IDENTITY` values as the only ones going up. The benefit here is that you not only don't complicate the schema, you also don't need to worry about running into overlapping IDs (the app-generated values running into the auto-generated range). This is only an option if you aren't already using negative ID values (and it seems pretty rare for people to use negative values on auto-generated columns, so this should be a likely possibility in most situations).

In this approach, you just need to:
- Leave the tables as they are: `PkId INT IDENTITY(1,1) PRIMARY KEY`. This adds 0 bytes to each row, instead of 8, or even 12.
- Have the app code insert values starting with `-1` and going down.
- Wrap the app code's `INSERT` statements in `SET IDENTITY_INSERT {table_name} ON;` and `SET IDENTITY_INSERT {table_name} OFF;` statements.

Here you still need to do the `IDENTITY_INSERT`, but: you don't add any new columns, don't need to "reseed" any `IDENTITY` columns, and have no future risk of overlaps.

SECOND, Part 3
One last variation of this approach would be to swap out the `IDENTITY` columns and instead use Sequences. The reason to take this approach is to be able to have the app code insert values that are positive, above the auto-generated range (not below), and with no need for `SET IDENTITY_INSERT ON / OFF`.

In this approach, you just need to copy the `IDENTITY` column to a new column that doesn't have the `IDENTITY` property, but does have a `DEFAULT` constraint using the `NEXT VALUE FOR` function. This adds 0 bytes to each row, instead of 8, or even 12.
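A sketch of that swap (the sequence, schema, and table names are made up; it assumes app-supplied values will live in a separate, higher range):

```sql
-- Sequence to replace the IDENTITY property; start it just above the
-- current IDENT_CURRENT value so auto-generation continues seamlessly
CREATE SEQUENCE dbo.Seq_SomeTable_PkId
    AS INT
    START WITH 2000000
    INCREMENT BY 1;

-- New column without the IDENTITY property, defaulting to the sequence
ALTER TABLE dbo.SomeTable
    ADD PkId_new INT NULL
        CONSTRAINT DF_SomeTable_PkId DEFAULT (NEXT VALUE FOR dbo.Seq_SomeTable_PkId);

-- Copy the existing IDENTITY values into the new column
UPDATE dbo.SomeTable
SET    PkId_new = PkId;
```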
There is also no need for any `SET IDENTITY_INSERT {table_name} ON;` and `SET IDENTITY_INSERT {table_name} OFF;` statements.

HOWEVER, due to the requirement that code using either `SCOPE_IDENTITY()` or `@@IDENTITY` still functions properly, switching to Sequences is not currently an option, as it appears that there is no equivalent of those functions for Sequences :-(. Sad!
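For reference, the `SET IDENTITY_INSERT` wrapping described in the SECOND approach could be sketched as follows (table and column names are made up):

```sql
-- Decoupled app inserting an explicit key into the unchanged IDENTITY column;
-- an explicit column list is required while IDENTITY_INSERT is ON
SET IDENTITY_INSERT dbo.SomeTable ON;

INSERT INTO dbo.SomeTable (PkId, SomeValue)
VALUES (42, 'row supplied by the decoupled application');

SET IDENTITY_INSERT dbo.SomeTable OFF;
```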