SQL Server – Changing the datatype of the primary key columns of partitioned tables

Tags: ddl, sql-server, sql-server-2012

I am currently blessed with the following situation:

  • We have a production database that is partitioned and uses primary keys of type int
  • Multiple (~20) tables that each contain more than 100,000,000 rows; two of these are nearing 2,000,000,000 rows
  • These tables all have a column of type int as primary key which increments by one each row
  • These tables are partitioned using a partition function of type int with RANGE RIGHT FOR VALUES (0), so effectively all data lives in a single partition

Since some tables are nearing the int32 maximum value, we must alter these column types to bigint. Here's where the problems start:

  • Cannot modify the type of a primary key column. I thought I could dodge this by dropping the primary key, modifying the type, and recreating the primary key, but even after dropping the primary key constraint I still cannot modify the column type; the attempt fails with the following exception:

    {"The object 'XXX_XXX' is dependent on column
    'ABCD'. ALTER TABLE ALTER COLUMN ABCD failed because one or
    more objects access this column."}

As far as I can see there is a schema-bound dependency of the partition scheme (and function) on the table preventing the datatype modification. There are no foreign keys referencing these primary keys.
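
For reference, this is roughly the sequence I attempted (the table, constraint, and column names here are placeholders):

-- Dropping the primary key constraint succeeds
ALTER TABLE dbo.MyLogTable DROP CONSTRAINT PK_MyLogTable;

-- Widening the column then still fails with the error above,
-- because the partition scheme/function still references the column
ALTER TABLE dbo.MyLogTable ALTER COLUMN ABCD BIGINT NOT NULL;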

I'm not seeing a way to update the column type on this database at all, let alone on a production database. We can afford some maintenance downtime, say ±60 minutes, but no longer. What are my options?

Additional information

  • I'm starting to believe my only option is to create new tables with the correct structure and pump the existing table data into the new ones…

  • Tables are already partitioned based on their primary key value. Currently there is just one effective partition (RANGE RIGHT FOR VALUES (0)); future partitions should be possible for new values (RIGHT(x), where the boundary x is yet to be determined)

  • These are data-logging tables, only inserts, no updates or deletes

  • DML is always being executed against these tables, except during maintenance windows, which occur once every 6-8 weeks and last about 90 minutes at most

  • The indexes based on the primary key are all clustered

  • There are no non-clustered indexes on these tables

Best Answer

First, before I provide a solution, I want to confirm your assumption:

As far as I can see there is a schema-bound dependency of the partition scheme (and function) on the table preventing the datatype modification.

You are correct: the fact that you have a partition function defined for a datatype of INT is preventing you from altering the data type of your column. Sadly, you're stuck with the current table, because to my knowledge there's no way to adjust the datatype of a partition function without dropping and recreating it, which you can't do while tables depend on it. It's a chicken-and-egg scenario.
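
If you want to see that dependency for yourself, a query along these lines (a sketch using the standard catalog views) lists every table and index bound to a partition scheme and its underlying function:

-- List objects that depend on a partition scheme/function
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       ps.name                  AS partition_scheme,
       pf.name                  AS partition_function
FROM sys.indexes i
    JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
    JOIN sys.partition_functions pf ON pf.function_id = ps.function_id;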

I'm not seeing a way to update the column type on this database at all, let alone on a production database. We can afford some maintenance downtime, say ±60 minutes, but no longer. What are my options?

One possible solution to your issue is to take advantage of Partitioned Views. This functionality has been around forever and is often overlooked, but a Partitioned View will provide you with a way to sidestep your INT data limit by adding new data to a different table where the ID column is a BIGINT datatype. This new table should also have a better partition function/scheme underlying it and hopefully will be maintained a little better going forward. Any queries that are pointing to the old table would then be pointed to the Partitioned View and you'll not run into a data limit issue.

Let me explain with an example. Below is a trivial recreation of your current partitioned table, consisting of one effective partition:

-- Create your very frustrating partition function
CREATE PARTITION FUNCTION dear_god_why_this_logic (INT)
AS RANGE RIGHT FOR VALUES (0);

-- Scheme that stores all data on the PRIMARY filegroup
CREATE PARTITION SCHEME dear_god_why_this_scheme
AS PARTITION dear_god_why_this_logic ALL TO ([PRIMARY]);

-- Create Partitioned Table
CREATE TABLE dbo.TestPartitioned
(
    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    VAL CHAR(1)
) ON dear_god_why_this_scheme (ID);

--Populate Table with Data
INSERT INTO dbo.TestPartitioned WITH (TABLOCK) (VAL)
SELECT CHAR(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 128)
FROM master.sys.configurations t1
    CROSS JOIN master.sys.configurations t2
    CROSS JOIN master.sys.configurations t3;

Let's say for argument's sake that I'm getting close to the INT limit (even though I'm obviously not). Because I don't want to run out of valid INT values in the ID column, I'm going to create a similar table, but use BIGINT for the ID column instead. This table will be defined as follows:

-- New partition function keyed on BIGINT, with real boundary values going forward
CREATE PARTITION FUNCTION lets_maintain_this_going_forward (BIGINT)
AS RANGE RIGHT FOR VALUES (0, 500000, 1000000, 1500000, 2000000);

-- Scheme to store data on PartitionedData Filegroup
CREATE PARTITION SCHEME lets_maintain_this_going_forward_scheme
AS PARTITION lets_maintain_this_going_forward ALL TO ([PartitionedData]);

-- Table for New Data going forward with new ID datatype of BIGINT
CREATE TABLE dbo.TestPartitioned_BIGINT
(
    ID BIGINT IDENTITY(500000,1) PRIMARY KEY CLUSTERED,
    VAL CHAR(1)
) ON lets_maintain_this_going_forward_scheme (ID);

-- Add check constraint to be used by Partitioned View
ALTER TABLE dbo.TestPartitioned_BIGINT ADD CONSTRAINT CK_ID_BIGINT CHECK (ID > CAST(499999 AS BIGINT));

A few notes here:

Table Partitioning on the New Table

This new table is also going to be partitioned based on your requirements. It's a good idea to think about future maintenance needs, so create some new filegroups, define a better partition alignment strategy, etc. My example is keeping it simple, so I'm throwing all of my partitions into one filegroup. Don't do that in production; instead, follow Table Partitioning Best Practices, courtesy of Brent Ozar et al.
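
As a sketch of what a better layout could look like (the database name, filegroup name, and file path below are made up; size and count them for your environment):

-- Hypothetical: give the hottest partitions a dedicated filegroup
ALTER DATABASE MyDb ADD FILEGROUP PartitionedData2;
ALTER DATABASE MyDb ADD FILE
    (NAME = 'PartitionedData2_1', FILENAME = 'D:\Data\PartitionedData2_1.ndf')
TO FILEGROUP PartitionedData2;

-- A scheme needs one filegroup entry per partition (boundary count + 1);
-- with 5 boundaries that means 6 entries, the last two on the new filegroup
CREATE PARTITION SCHEME lets_maintain_this_going_forward_scheme_v2
AS PARTITION lets_maintain_this_going_forward
TO ([PartitionedData], [PartitionedData], [PartitionedData], [PartitionedData], PartitionedData2, PartitionedData2);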

Check Constraint

Because I want to take advantage of Partitioned Views, I need to add a CHECK CONSTRAINT to this new table. I know that my insert statement generated ~440k records, so to be safe, I'm going to start my IDENTITY seed at 500k and create a CONSTRAINT defining this as well. The constraint will be used by the optimizer when evaluating which tables can be eliminated when the eventual Partitioned View is called.
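
If you want to double-check that a chosen seed is safe before creating the new table, a quick sanity query like this (a sketch, using standard built-ins) does the trick:

-- Confirm the highest existing ID (and the current identity value) before picking the new seed
SELECT MAX(ID)                              AS max_id,
       IDENT_CURRENT('dbo.TestPartitioned') AS current_identity
FROM dbo.TestPartitioned;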

Now to Mash the Two Tables Together

Partitioned Views don't handle mixed datatypes well in the partitioning column across the underlying tables. To get around this, we have to persist the current ID column in your current table as a BIGINT value. We're going to do that by adding a Persisted Computed Column, as follows:

-- Append Calculated Columns of New Datatype to Old Table
-- WARNING: This will take a while against a large data set and will likely lock the table
ALTER TABLE dbo.TestPartitioned ADD ID_BIG AS (CAST(ID AS BIGINT)) PERSISTED
GO

-- Add Constraints on Calculated Column
-- WARNING: This will likely lock the table
ALTER TABLE dbo.TestPartitioned ADD CONSTRAINT CK_ID_TestPartitioned_BIGINT CHECK(ID_BIG <= CAST(499999 AS BIGINT));
GO

-- Create a new Nonclustered index on ID_BIG to act as new "pkey"
CREATE NONCLUSTERED INDEX IX_TestPartitioned__BIG_ID__VAL ON dbo.TestPartitioned (ID_BIG) INCLUDE (VAL);

I've also added another CHECK CONSTRAINT on the old table (to aid with Partition Elimination from our eventual Partitioned View) and a new Non-Clustered Index that is going to act like a primary key index (because lookups originating from the Partitioned View are going to occur against ID_BIG instead of ID).
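
One caveat worth checking: the optimizer only uses CHECK constraints for elimination when they are enabled and trusted. A quick look at sys.check_constraints (a sanity check, not strictly required) confirms that:

-- Both constraints must be enabled and trusted (is_disabled = 0, is_not_trusted = 0)
-- for the optimizer to eliminate member tables of the partitioned view
SELECT OBJECT_NAME(parent_object_id) AS table_name,
       name                          AS constraint_name,
       is_disabled,
       is_not_trusted
FROM sys.check_constraints
WHERE name IN ('CK_ID_TestPartitioned_BIGINT', 'CK_ID_BIGINT');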

With the new Computed Column and Check Constraint in place, I can finally define the Partitioned View, as follows:

-- Build a Partitioned View on Top of the old and new tables
CREATE VIEW dbo.vw_TableAll
WITH SCHEMABINDING
AS
SELECT ID_BIG AS ID, VAL FROM dbo.TestPartitioned
UNION ALL
SELECT ID, VAL FROM dbo.TestPartitioned_BIGINT
GO

Running a quick query against this view will confirm we have partition elimination working (as you don't see any lookups occurring against the new table we've created):

SELECT *
FROM dbo.vw_TableAll
WHERE ID < CAST(500 AS BIGINT);

Execution Plan:

[execution plan screenshot]

At this stage, you'll need to make a few changes to your application:

  1. Stop inserting new records into the old table, and instead insert them into the new table. We're not allowed to insert records into the Partitioned View because we're using IDENTITY values in the underlying tables. Partitioned Views don't allow for this, so you have to insert records directly into the new table in this scenario.
  2. Adjust any SELECT queries (that point to the old table) to point to the Partitioned View. Alternatively, you can rename the old table (e.g. TableName_Old) and create the view as the old table's name (e.g. TableName); how fancy you want to get here is up to you. A sketch of the rename approach follows this list.
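
If you go the rename route, the swap is just a couple of statements. Note that if you already created dbo.vw_TableAll with SCHEMABINDING, you'd need to drop it first, since schema binding blocks the rename (this is a sketch; pick whichever names suit you):

-- Rename the old table out of the way...
EXEC sp_rename 'dbo.TestPartitioned', 'TestPartitioned_Old';
GO

-- ...then recreate the Partitioned View under the original table name
CREATE VIEW dbo.TestPartitioned
WITH SCHEMABINDING
AS
SELECT ID_BIG AS ID, VAL FROM dbo.TestPartitioned_Old
UNION ALL
SELECT ID, VAL FROM dbo.TestPartitioned_BIGINT;
GO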

New Records to the New Table

At this point, new records should be getting inserted into the new table. I'll simulate this by running the following:

--Populate Table with Data
INSERT INTO dbo.TestPartitioned_BIGINT WITH (TABLOCK) (VAL)
SELECT CHAR(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 128)
FROM master.sys.configurations t1
    CROSS JOIN master.sys.configurations t2
    CROSS JOIN master.sys.configurations t3;
GO

Again, my Identity Seed is configured so I won't have any ID conflicts between the two tables. The CHECK CONSTRAINTS on both tables should also enforce this. Let's confirm that Partition Elimination is still occurring:

SELECT *
FROM dbo.vw_TableAll
WHERE ID > CAST(300000 AS BIGINT)
    AND ID < CAST(300500 AS BIGINT);

SELECT *
FROM dbo.vw_TableAll
WHERE ID > CAST(500000 AS BIGINT)
    AND ID < CAST(500500 AS BIGINT);

Execution Plans:

[execution plan screenshots]

Do take note that most queries will likely span both tables. While we won't be able to take advantage of partition elimination in these scenarios, the query plans should remain as optimal as possible with the benefit that you don't have to rewrite your underlying queries (if you decided to name the view the same as your old table).

What to do now?

Well this is wholly dependent upon you. If the current records are never going to disappear and you're happy with performance from the Partitioned View, pour a celebratory beer because you're done. Huzzah!

If you want to consolidate the old table into the new table, you're going to have to pick maintenance windows within which to do the next series of operations. Basically, you're going to drop the constraints on the tables (which will break the partition elimination component of the Partitioned View), copy your most recent records over from the old table to the new table, purge these records from the old table, and then update constraints (so the Partitioned View is able to take advantage of partition elimination once again). Because of the volume of data you have in the existing tables, you may have to go through a few rounds of this process to consolidate everything. Those steps are summarized as follows:

-- During a maintenance window, transfer old records to new table if you so choose
-- Drop Check Constraint while transferring over records
ALTER TABLE dbo.TestPartitioned_BIGINT DROP CONSTRAINT CK_ID_BIGINT;

-- Retain Identity Values
SET IDENTITY_INSERT dbo.TestPartitioned_BIGINT ON

-- Copy records into the new table
INSERT INTO dbo.TestPartitioned_BIGINT (ID, VAL)
SELECT ID_BIG, VAL
FROM dbo.TestPartitioned
WHERE ID > 300000

SET IDENTITY_INSERT dbo.TestPartitioned_BIGINT OFF

-- Recreate the Check Constraint once the transfer is complete, ensuring it is defined with the new boundary ID value
ALTER TABLE dbo.TestPartitioned_BIGINT ADD CONSTRAINT CK_ID_BIGINT CHECK (ID > CAST(300000 AS BIGINT));
GO

-- Purge records from original table
DELETE
FROM dbo.TestPartitioned
WHERE ID > 300000

-- Recreate Check Constraint on original table, ensuring the Check Constraint is defined on the proper ID value
ALTER TABLE dbo.TestPartitioned DROP CONSTRAINT CK_ID_TestPartitioned_BIGINT;
GO
ALTER TABLE dbo.TestPartitioned ADD CONSTRAINT CK_ID_TestPartitioned_BIGINT CHECK(ID_BIG <= CAST(300000 AS BIGINT));
GO
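
Once the old table has been fully drained through these rounds, you can retire it entirely (again, a sketch using my example's names; do this in a maintenance window):

-- When dbo.TestPartitioned is empty, collapse the view onto the new table only
ALTER VIEW dbo.vw_TableAll
WITH SCHEMABINDING
AS
SELECT ID, VAL FROM dbo.TestPartitioned_BIGINT;
GO

-- With the schema binding released, the old table and its partition objects can go
DROP TABLE dbo.TestPartitioned;
DROP PARTITION SCHEME dear_god_why_this_scheme;
DROP PARTITION FUNCTION dear_god_why_this_logic;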

If possible, test this out in a non-production environment; I don't condone testing in production. Hopefully this answer helps, and if you have any questions, feel free to post a comment and I'll do my best to get back to you quickly.