I am currently blessed with the following situation:
- We have a production database that is partitioned and uses primary keys of type `int`
- Multiple (~20) tables that each contain > 100,000,000 rows; 2 of these are nearing 2,000,000,000 rows
- These tables all have a column of type `int` as primary key, which increments by one each row
- These tables are partitioned, using a partition function of type `int` with `RIGHT(0)`, so effectively all data is in the one and only partition

Since some tables are nearing the int32 maximum value, we must alter these column types to `bigint`. Here's where the problems start:
- Cannot modify the type of a primary key column. This I could dodge by dropping the primary key, modifying the type, and recreating the primary key again; however, I still cannot modify the column type after dropping the primary key constraint, getting the following exception:

> The object 'XXX_XXX' is dependent on column 'ABCD'. ALTER TABLE ALTER COLUMN ABCD failed because one or more objects access this column.
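To illustrate, the sequence I'm attempting looks roughly like this (table and constraint names are placeholders):

```sql
-- Placeholder names; this just mirrors the attempted sequence.
ALTER TABLE dbo.MyTable DROP CONSTRAINT PK_MyTable;        -- dropping the PK succeeds

ALTER TABLE dbo.MyTable ALTER COLUMN ABCD BIGINT NOT NULL; -- still fails
-- The heap left behind stays on the partition scheme, and the
-- scheme/function depend on the ABCD column.
```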
As far as I can see, there is a schema-bound dependency of the partition scheme (and function) on the table, preventing the datatype modification. There are no foreign keys referencing these primary keys.
I'm not seeing a solution to updating the column type on this database, let alone on a production database. We can afford some maintenance downtime, say ±60 minutes, but no longer. What are my options?
Additional information
- I'm starting to believe my only option is to create new tables with the correct structure and pump the existing table data into the new ones…
- Tables are already partitioned based on their primary key value; currently there is just 1 partition (`RIGHT(0)`), and future partitions should be possible for new values (`RIGHT(x)`, where the boundary `x` is yet to be determined)
- These are data-logging tables: only inserts, no updates or deletes
- There is always DML executing on these tables, except during maintenance windows, which occur once every 6-8 weeks for about 90 minutes tops
- The indexes based on the primary key are all clustered
- There are no non-clustered indexes on these tables
Best Answer
First, before I provide a solution, I want to confirm your assumption:
You are correct; the fact that you have a partition function defined for a datatype of `INT` is preventing you from altering the data type of your column. Sadly, you're kind of stuck with the current table, because there's no way, to my knowledge, to adjust the datatype of a partition function without dropping and recreating it... which you can't do, because you have tables dependent upon it. So basically this is a chicken-and-egg scenario.

One possible solution to your issue is to take advantage of Partitioned Views. This functionality has been around forever and is often overlooked, but a Partitioned View will provide you with a way to sidestep your `INT` data limit by adding new data to a different table where the `ID` column is a `BIGINT` datatype. This new table should also have a better partition function/scheme underlying it, and hopefully it will be maintained a little better going forward. Any queries that point to the old table would then be pointed to the Partitioned View, and you'll not run into a data limit issue.

Let me explain with an example. The following is a trivial recreation of your current partitioned table, consisting of one partition:
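Something along these lines (a minimal sketch; the object names and row volume are placeholders, not anything from your actual schema):

```sql
/* Placeholder names throughout; this just mimics the shape of the problem. */
CREATE PARTITION FUNCTION pf_LogData (INT)
    AS RANGE RIGHT FOR VALUES (0);   -- single boundary: all positive IDs share one partition
GO

CREATE PARTITION SCHEME ps_LogData
    AS PARTITION pf_LogData ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.LogData
(
    ID       INT IDENTITY(1, 1) NOT NULL,
    LogValue NVARCHAR(50)       NOT NULL,
    CONSTRAINT PK_LogData PRIMARY KEY CLUSTERED (ID) ON ps_LogData (ID)
);
GO

/* Load some sample rows (roughly 440k, which matters for the IDENTITY
   seed chosen later). */
INSERT INTO dbo.LogData (LogValue)
SELECT TOP (440000) CONVERT(NVARCHAR(50), NEWID())
FROM sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;
GO
```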
Let's say, for argument's sake, that I'm getting close to the `INT` limit (even though I'm obviously not). Because I don't want to run out of valid `INT` values in the `ID` column, I'm going to create a similar table, but use `BIGINT` for the `ID` column instead. This table will be defined as follows:
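Again a sketch with placeholder names; the parts that matter are the `BIGINT` datatype, the `IDENTITY` seed, and the `CHECK` constraint (the boundary values are illustrative):

```sql
CREATE PARTITION FUNCTION pf_LogData_New (BIGINT)
    AS RANGE RIGHT FOR VALUES (500000, 1000000, 1500000);
GO

/* One filegroup only to keep the demo simple -- see the notes below. */
CREATE PARTITION SCHEME ps_LogData_New
    AS PARTITION pf_LogData_New ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.LogData_New
(
    ID       BIGINT IDENTITY(500000, 1) NOT NULL,  -- seeded safely past the old table's max
    LogValue NVARCHAR(50)               NOT NULL,
    CONSTRAINT PK_LogData_New PRIMARY KEY CLUSTERED (ID) ON ps_LogData_New (ID),
    /* Tells the optimizer this table only holds IDs >= 500k, which is what
       makes table elimination in the Partitioned View possible. */
    CONSTRAINT CK_LogData_New_ID CHECK (ID >= 500000)
);
GO
```

A few notes here: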
Table Partitioning on the New Table
This new table is also going to be partitioned, based on your requirements. It's a good idea to think about future maintenance needs, so create some new filegroups, define a better partition alignment strategy, etc. My example keeps it simple, so I'm throwing all of my partitions into one filegroup. Don't do that in production; instead, follow Table Partitioning Best Practices, courtesy of Brent Ozar et al.
Check Constraint
Because I want to take advantage of Partitioned Views, I need to add a `CHECK CONSTRAINT` to this new table. I know that my insert statement generated ~440k records, so to be safe, I'm going to start my `IDENTITY` seed at 500k and create a `CONSTRAINT` defining this as well. The constraint will be used by the optimizer when evaluating which tables can be eliminated when the eventual Partitioned View is called.

Now to Mash the Two Tables Together
Partitioned Views don't do particularly well when you throw mixed datatypes at them in the partition column of the underlying tables. To get around this, we have to persist the current `ID` column in your current table as a `BIGINT` value. We're going to do that by adding a Persisted Computed Column, as follows:
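Continuing the sketch under the same placeholder naming (the extra constraint and index described next are included here as well):

```sql
/* Persist the INT ID as a BIGINT so both tables expose the same datatype
   to the view. */
ALTER TABLE dbo.LogData
    ADD ID_BIG AS CONVERT(BIGINT, ID) PERSISTED NOT NULL;
GO

/* Bound the old table's range so the optimizer can skip it entirely when
   a query only needs the new table. */
ALTER TABLE dbo.LogData
    ADD CONSTRAINT CK_LogData_ID_BIG CHECK (ID_BIG < 500000);
GO

/* Lookups through the view will come in on ID_BIG, so give it a unique
   index that behaves like the primary key. */
CREATE UNIQUE NONCLUSTERED INDEX IX_LogData_ID_BIG
    ON dbo.LogData (ID_BIG);
GO
```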
I've also added another `CHECK CONSTRAINT` on the old table (to aid with Partition Elimination from our eventual Partitioned View) and a new Non-Clustered Index that is going to act like a primary key index (because lookups originating from the Partitioned View are going to occur against `ID_BIG` instead of `ID`).

With the new Computed Column and Check Constraint in place, I can finally define the Partitioned View, as follows:
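Under the same naming assumptions, the view might look like:

```sql
CREATE VIEW dbo.vw_LogData
AS
SELECT ID_BIG AS ID,   -- the persisted BIGINT stands in for the old INT ID
       LogValue
FROM   dbo.LogData
UNION ALL
SELECT ID,
       LogValue
FROM   dbo.LogData_New;
GO
```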
Running a quick query against this view will confirm that partition elimination is working (you won't see any lookups occurring against the new table we've created):
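For example (the ID value is arbitrary):

```sql
/* An ID that can only exist in the old table; the plan should show no
   access to dbo.LogData_New at all. */
SELECT ID, LogValue
FROM   dbo.vw_LogData
WHERE  ID = 12345;
```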
Execution Plan:
At this stage, you'll need to make a few changes to your application:

- Inserts can no longer go through the view, because of the `IDENTITY` values in the underlying tables. Partitioned Views don't allow for this, so you have to insert records directly into the new table in this scenario.
- Update your `SELECT` queries (that point to the old table) to point to the Partitioned View. Alternatively, you can rename the old table (e.g. TableName_Old) and create the view under the old table's name (e.g. TableName); how fancy you want to get here is up to you.

New Records to the New Table
At this point, new records should be getting inserted into the new table. I'll simulate this by running the following:
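A sketch of such a simulation, with an arbitrary row count:

```sql
/* Simulate new application traffic landing directly in the new table
   (inserts bypass the view because of the IDENTITY columns). */
INSERT INTO dbo.LogData_New (LogValue)
SELECT TOP (100000) CONVERT(NVARCHAR(50), NEWID())
FROM sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;
GO
```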
Again, my Identity Seed is configured so I won't have any ID conflicts between the two tables. The `CHECK CONSTRAINTS` on both tables should also enforce this. Let's confirm that Partition Elimination is still occurring:
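For example, one probe into each range (the values are arbitrary):

```sql
/* One lookup that should touch only the old table, and one that should
   touch only the new one. */
SELECT ID, LogValue FROM dbo.vw_LogData WHERE ID = 12345;    -- old range
SELECT ID, LogValue FROM dbo.vw_LogData WHERE ID = 650000;   -- new range
```

Execution Plans: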
Do take note that most queries will likely span both tables. While we won't be able to take advantage of partition elimination in these scenarios, the query plans should remain as optimal as possible with the benefit that you don't have to rewrite your underlying queries (if you decided to name the view the same as your old table).
What to do now?
Well this is wholly dependent upon you. If the current records are never going to disappear and you're happy with performance from the Partitioned View, pour a celebratory beer because you're done. Huzzah!
If you want to consolidate the old table into the new table, you're going to have to pick maintenance windows within which to do the next series of operations. Basically, you're going to drop the constraints on the tables (which will break the partition elimination component of the Partitioned View), copy your most recent records over from the old table to the new table, purge these records from the old table, and then update constraints (so the Partitioned View is able to take advantage of partition elimination once again). Because of the volume of data you have in the existing tables, you may have to go through a few rounds of this process to consolidate everything. Those steps are summarized as follows:
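In outline, each round looks like this:

1. Drop the `CHECK` constraints on both tables (partition elimination through the view is temporarily lost).
2. Copy the most recent batch of records from the old table into the new one.
3. Purge those records from the old table.
4. Re-create the `CHECK` constraints around the new boundary.

A hypothetical sketch of a single round, reusing the placeholder names and an assumed new boundary of 340,000 (nothing here comes from your actual system; batch sizing, the boundary, and transaction handling all need real tuning):

```sql
/* 1. Drop the constraints; elimination is lost until step 4. */
ALTER TABLE dbo.LogData     DROP CONSTRAINT CK_LogData_ID_BIG;
ALTER TABLE dbo.LogData_New DROP CONSTRAINT CK_LogData_New_ID;

/* 2. Copy the most recent slice of the old table into the new one. */
SET IDENTITY_INSERT dbo.LogData_New ON;

INSERT INTO dbo.LogData_New (ID, LogValue)
SELECT ID_BIG, LogValue
FROM   dbo.LogData
WHERE  ID_BIG >= 340000;

SET IDENTITY_INSERT dbo.LogData_New OFF;

/* 3. Purge what was just copied. */
DELETE FROM dbo.LogData
WHERE  ID_BIG >= 340000;

/* 4. Re-create the constraints around the new boundary. */
ALTER TABLE dbo.LogData
    ADD CONSTRAINT CK_LogData_ID_BIG CHECK (ID_BIG < 340000);
ALTER TABLE dbo.LogData_New
    ADD CONSTRAINT CK_LogData_New_ID CHECK (ID >= 340000);
GO
```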
If possible, test this out in a non-production environment. I don't condone testing in production. Hopefully this answer helps; if you have any questions, feel free to post a comment and I'll do my best to get back to you quickly.