SQL Server SSDT – Adding Non-Nullable Column to Existing Table

Tags: null, scripting, sql-server, ssdt

Due to business logic, we need a new column in a table, and it is critical that it is always populated. It should therefore be added to the table as NOT NULL. Unlike previous questions that explain how to do this manually, this needs to be managed by the SSDT publish.

I have been banging my head against the wall for a while over this simple-sounding task, thanks to a few realizations:

  1. A default value is not appropriate, and it cannot be a computed column. The column might be a foreign key, for instance; and for other columns (e.g. numeric data) we cannot use a placeholder value like 0 or -1, because those values might have real significance.
  2. Adding the column in a pre-deployment script will fail the publish when it automatically tries to create the same column a second time, even if the pre-deployment script is written to be idempotent. (This one is really aggravating, as I can otherwise think of an easy solution.)
  3. Altering the column to NOT NULL in a post-deployment script will be reverted each time the SSDT schema refresh occurs (so, at the very least, source control will no longer match what is actually on the server)
  4. Adding the column as nullable now with the intention of changing to NOT NULL in the future does not work across multiple branches/forks in source control, as the target systems will not necessarily all have the table in the same state next time they are upgraded (not that this is a good approach anyway IMO)

The approach I have heard from others is to update the table definition directly (so the schema refresh stays consistent), write a pre-deployment script that moves the entire contents of the table to a temporary table (including the logic that populates the new column), then move the rows back in a post-deployment script. This seems risky as all hell, though, and it still pisses off the Publish Preview when it detects a NOT NULL column being added to a table with existing data (since that validation runs before the pre-deployment script).

How should I go about adding a new, non-nullable column without risking orphaned data, and without moving data back and forth on every publish with lengthy migration scripts that are inherently risky?

Thanks.

Best Answer

I'll share how I have done this in the past. It is designed to solve the specific limitation of pre-deployment scripts that you call out in your second point:

Adding the column in a pre-deployment script will fail the publish when it automatically tries to create the same column a second time, even if the pre-deployment script is written to be idempotent.

Why pre-deployment scripts don't work for this

When you deploy an SSDT project, it stitches things together like this (a bit simplified, but in general):

  1. Do the "schema compare" between the source (dacpac file) and the target (database)
  2. Generate a deployment script based on the results of that compare
  3. Process any pre-deployment scripts in the dacpac (doing token replacement, etc.) and insert the contents into the beginning of the deployment script
  4. Do the same for post-deployment scripts, appending to the end of the deployment script

When a new column exists in the dacpac but not in the target database, step #2 will generate code to add that column. So if the pre-deployment script also adds the column, the main part of the script will fail, because it assumes the column doesn't exist based on the results of the schema compare in step #1.
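To make the failure concrete, here is a minimal sketch with hypothetical table and column names. Even an idempotent pre-deployment script collides with the generated script, because the generated script was built from a compare that ran before the pre-deployment script did:

-- Pre-deployment script (idempotent; names are hypothetical)
IF COL_LENGTH('dbo.Orders', 'RegionId') IS NULL
    ALTER TABLE dbo.Orders ADD RegionId INT NULL;
GO

-- The generated portion of the deployment script still contains its own
-- unconditional copy of the change, which now fails at run time because
-- the column already exists:
ALTER TABLE [dbo].[Orders]
    ADD [RegionId] INT NOT NULL;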

Solution: pre-SSDT script

Martin Smith mentioned this option in a comment, and it's the solution that has worked best for me so far:

We use premodel scripts in our deployment pipeline. This is not something that is part of SSDT, but a step that runs before the dacfx publish. So in this case the premodel script could add the column with the desired values and make it not null, and by the time the publish happens it is already in the state expected by SSDT, so it doesn't have anything to do. I'm still yet to find much use for predeploy scripts. – Martin Smith Jun 1 at 21:45

The steps to implement this solution in general are:

  1. Create a script in the SSDT project to hold your "pre-SSDT" T-SQL code
    • Depending on how your deployment process works, code in these files should probably be idempotent
  2. Make sure to set this script to "Build Action=None" and "Copy to Output Directory=Copy always"
    • the "copy always" option is especially important, as the deployment process needs to be able to find this script in your deployment artifacts
  3. In your deployment process, locate and run this script (or scripts) before the SSDT schema compare occurs
  4. Once that script has been executed successfully, you can engage DacServices / DacFx / whatever to complete your deployment as usual

In the end, this allows you to add the column using whatever custom code you like, populated using complex business logic, in the pre-SSDT script.
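As a concrete sketch (all object names hypothetical), a pre-SSDT script for the scenario in the question might look like this. It is written to be idempotent so it can safely run on every deployment:

-- Pre-SSDT script: runs before the schema compare / publish
IF COL_LENGTH('dbo.Orders', 'RegionId') IS NULL
    -- Add the column as nullable first, so existing rows are allowed
    ALTER TABLE dbo.Orders ADD RegionId INT NULL;
GO

-- Populate it using the real business logic (a join, a lookup, whatever applies)
UPDATE o
SET    o.RegionId = c.RegionId
FROM   dbo.Orders AS o
JOIN   dbo.Customers AS c ON c.CustomerId = o.CustomerId
WHERE  o.RegionId IS NULL;
GO

-- Tighten to NOT NULL once every row has a value
IF EXISTS (SELECT 1 FROM sys.columns
           WHERE object_id = OBJECT_ID('dbo.Orders')
             AND name = 'RegionId'
             AND is_nullable = 1)
    ALTER TABLE dbo.Orders ALTER COLUMN RegionId INT NOT NULL;

In the pipeline, you can run this with whatever you already use to execute T-SQL (sqlcmd, Invoke-Sqlcmd, a deployment task, etc.) before invoking SqlPackage/DacServices.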

You also add the column definition in the SSDT project (so source control still matches the real-life state of the database). But when the schema compare runs, it sees no changes related to that column (because you've already deployed it).

Other uses of pre-SSDT

I often find when testing deployments that SSDT performs a "table rebuild" operation* when it's completely unnecessary. This is where a new table is created with the updated schema, all data is copied to that table, the old table is dropped, and the new table is renamed to replace the old one.

This can lead to massive transaction log file growth and other problems if the table is large. If I notice that a schema change is causing this, I'll instead make the change myself in pre-SSDT (which is usually a simple ALTER TABLE statement) and avoid the table rebuild.
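For example, here's a minimal sketch (hypothetical names, and assuming a simple column widening is what's triggering the rebuild): apply the change in place before the compare runs, and SSDT then sees nothing to do:

-- Pre-SSDT: widen the column in place so the schema compare finds no difference
IF EXISTS (SELECT 1 FROM sys.columns
           WHERE object_id = OBJECT_ID('dbo.MyTable')
             AND name = 'Description'
             AND max_length = 100)  -- still the old NVARCHAR(50) (50 * 2 bytes)
    ALTER TABLE dbo.MyTable ALTER COLUMN Description NVARCHAR(200) NOT NULL;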

Is this a good idea?

I think so. If you read Critiquing two different approaches to delivering databases: Migrations vs state by Alex Yates, this is essentially combining the two approaches a bit. SSDT is state based, but we incorporate a migration step (before SSDT) to handle some of the more complex scenarios that SSDT just has no way of dealing with in a general way.

In doing some searching while writing this answer, this is actually a very common approach discussed in the SSDT user community once you know what to search for. I've seen it called:

  • pre-compare
  • pre-model
  • pre-DAC
  • pre-SSDT

Etc. Here's a great article that covers a lot of the points that I mentioned above:

Pre-Compare & Pre-Deployment Scripts to SSDT

And one from Red Gate (in the #4 – Changing from system type to user defined type section) that also refers to this as pre-compare:

How to Fix Ten SSDT Deployment Snags, With or Without ReadyRoll

So what's the point of pre-deployment scripts?

Martin points out that he hasn't found "much use for predeploy scripts." I tend to feel the same way. But there are scenarios where they can be useful.

One example a coworker pointed out to me was storing some data in a temp table in the pre-deployment script, to be used in the post-deployment script (say you're moving a column from one table to another).
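A minimal sketch of that pattern (names hypothetical). It works because the pre- and post-deployment scripts end up in the same generated deployment script, running on the same connection, so a temp table created in one is still visible in the other:

-- Pre-deployment: stash the values of the column that is about to move
IF OBJECT_ID('tempdb..#LegacyNotes') IS NOT NULL
    DROP TABLE #LegacyNotes;

SELECT Id, Notes
INTO   #LegacyNotes
FROM   dbo.OldTable;
GO

-- Post-deployment: copy the stashed values into the column's new home
UPDATE n
SET    n.Notes = l.Notes
FROM   dbo.NewTable AS n
JOIN   #LegacyNotes AS l ON l.Id = n.OldTableId;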


*The table rebuild looks like this, which is horrifying, right?

GO
PRINT N'Starting rebuilding table [dbo].[MyTable]...';


GO
BEGIN TRANSACTION;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SET XACT_ABORT ON;

CREATE TABLE [dbo].[tmp_ms_xx_MyTable] (
    [Id] BIGINT IDENTITY (1, 1) NOT NULL,
    -- etc, other columns
);

IF EXISTS (SELECT TOP 1 1 
           FROM   [dbo].[MyTable])
    BEGIN
        SET IDENTITY_INSERT [dbo].[tmp_ms_xx_MyTable] ON;
        INSERT INTO [dbo].[tmp_ms_xx_MyTable] ([Id], ...)
        SELECT   [Id],
                 -- etc, other columns
        FROM     [dbo].[MyTable]
        ORDER BY [Id] ASC;
        SET IDENTITY_INSERT [dbo].[tmp_ms_xx_MyTable] OFF;
    END

DROP TABLE [dbo].[MyTable];

EXECUTE sp_rename N'[dbo].[tmp_ms_xx_MyTable]', N'MyTable';

COMMIT TRANSACTION;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;