Sql-server – Change collation on partition column (SQL Server 2017)

collationpartitioningsql server

I'm trying to change collation on column:

ALTER TABLE [dbo].[yourdad] ALTER COLUMN [yourmam] varchar(36) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL; 

and it gives message like this:

Msg 5074, Level 16, State 1, Line 1 The object 'yourdad' is dependent
on column 'yourmam'. Msg 4922, Level 16, State 9, Line 1 ALTER TABLE
ALTER COLUMN yourmam failed because one or more objects access this
column.

It has partition schema and function that depend on this column.

Currently I have like over 40k table with such columns where collation should be changed.

Is it possible to change collation without a table recreation?

  1. Drop the partitioned index(es) – deleted.

  2. But when I'm trying to delete schema:

    DROP PARTITION SCHEME yourdad

Msg 7717, Level 16, State 1, Line 1 The partition scheme
"yourdad" is currently being used to partition one or more
tables.

It's a new case insensitive database collation on database I wan't to install, old one was – case sensitive.

Best Answer

You don't need to drop the table(s).

One option (that I have tested) is:

  1. Drop the partitioned non-clustered index(es); move partitioned clustered indexes, primary keys, and unique constraints to a non-partitioned file group; and unpartition heaps by creating a (temporary) clustered index on a non-partitioned file group. For help identifying which indexes are affected and what action needs to be taken for each one, please see the query I have posted in the "Finding and Fixing Ignored Objects" section of "Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?" (which also contains examples of dealing with clustered indexes, contraints, and heaps).
  2. Drop the partition scheme(s)
  3. Drop the partition function(s)
  4. Alter the column definition with the new collation
  5. Drop and recreate the partition function to specify the COLLATE clause with the new collation for the input parameter (assumption here is that you are not changing the default collation of the database, and the input parameter will use the database's default collation for string types unless specified using the COLLATE clause, and the CREATE INDEX statement using this scheme and function will fail if the collation of this parameter does not match the collation of the partitioning column). Meaning, if you change the column's collation to SQL_Latin1_General_CP1_CI_AI but the database is still using SQL_Latin1_General_CP1_CI_AS, then you need to:

    CREATE PARTITION FUNCTION PartitionFunc
    (
      VARCHAR(36) COLLATE SQL_Latin1_General_CP1_CI_AI
    )
    ...
    

Now, 40k+ tables is perhaps more than a few, but if they are all using the same partition scheme and function, then this is only 4 extra statements (drop and create for the partition schema and partition function) compared to what you already need to do even if partitioning wasn't being used (i.e. you would need to drop and recreate the indexes no matter what). So, while this does require a lot of I/O as the indexes are dropped and recreated, there is no way around that when changing a column's collation.

I am assuming (and have fingers crossed) that these columns are not:

  • part of the clustered index
  • key fields in a primary key or unique index/constraint that is foreign-keyed to

If either of those cases is true then you have a lot more work to do, but that has nothing to do with the partitioning.


Another option (given the desire to not recreate everything) might be to use the undocumented sqlservr.exe -q option to change the collation on ALL user columns and databases on the instance. This approach changes the meta-data for the database, columns, parameters, etc, and then rebuilds the indexes (again, no way around this when changing collations, so with 40k+ tables this will likely take a few minutes (or more) and the database, or the entire instance if you update the collation globally, will be offline during that time).

I just tested this scenario on SQL Server 2019 RC1, and it does not work for partition functions, but that doesn't mean that this approach isn't part of the overall solution. It just means that you really have no choice but to go through the process noted in the section above. The sqlservr.exe -q approach will handle everything else (except for user-defined table types, I believe). There are plenty of restrictions on being able to change a column's collation, and even more on changing the database's default collation (lots of dependencies in both cases), so if you are wanting to have a case-sensitive database now, then likely you will have other obstacles besides just the partitioned columns.

The thing that doesn't work with partition functions is that the collation of the input parameter is stored in sys.partition_parameters. This particular meta-data is not changed with the otherwise-awesome-even-if-undocumented -q option. The effects this has are:

  1. Assuming the data is either VARCHAR but no change of code page, or NVARCHAR, then the data will still be partitioned as it was before. For example, let's say that you have a RANGE LEFT partition function using FOR VALUES ( 'D', 'H', 'O', 'V' ). In this case, partition #2 contains values > D and <= H.

    • When using the case-insensitive collation of SQL_Latin1_General_CP1_CI_AI, partition #2 would contain E, e, F, f, G, g, H, h. Simple enough, right?
    • However, when using the case-sensitive collation of SQL_Latin1_General_CP1_CS_AI, partition #2 would contain d, E, e, F, f, G, g, H. This partition now contains lower-case d, and does not contain lower-case h. This is due to most of the SQL Server collations sorting upper-case before lower-case (Windows collations sort lower-case first).
  2. If data is VARCHAR and the code page is different, then things might get interesting. Any characters with values between 128 and 255 can potentially either have a different value and/or sort position, be mapped to a similar character because it is't available on code page of collation used for partition function collation (i.e. "best fit" mapping), or it might be missing on the old code page hence it becomes a "?" and sorts as that. Depending on what characters are being used, how they map (if they even do) to the other code page, and how the partition ranges are set up, you can experience anything ranging from "no effect at all" to "really strange and difficult to trace/debug behavior".

I have submitted a request to have this particular meta-data added to the sqlservr.exe -q operation, but Microsoft has no real incentive to fix it given that it's an undocumented feature. So, please support it (i.e. vote for it) so they can at least see that people do want it (thanks!): Update Collation of Partition Functions and User-Defined Table Types via SQLSERVR -Q

That being said, the two main questions are:

  1. What is the effect of changing the collation in the first place? What is the current collation and what is the new collation?

    If the code page is changing, will there be data loss (the example code in the question shows a VARCHAR column, which makes this a possibility; not so for NVARCHAR)? This approach does not do a code page conversion, so characters with values between 128 and 255 might become some other character, and there are no warnings/errors when this happens (but if you don't have such data, then this is a non-issue).

    If sensitivities are changing (i.e. going from Accent-Sensitive to Accent-Insensitive) then will there now be violations of uniqueness in an existing unique index/constraint?

  2. Do you want the new collation used for EVERYTHING? If so, this might work. If not, then this approach is not viable as it changes EVERYTHING.

For complete details on this approach, please see my post:

Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong? (be sure to review the "Finding and Fixing Ignored Objects" section which contains queries to help with fixing partition functions)

If you have multiple databases and want to only change this one (again, EVERYTHING in this database, including the database's default collation), then you can:

  1. Detach (or backup) this database.
  2. Create a new, temporary instance just for making this change, using the same instance-level collation being used for the instance that this database currently belongs to.
  3. Attach (or restore) the database to the new, temporary instance.
  4. Update the temp instance's collation using the sqlservr.exe -q method.
  5. Detach (or backup) the database from the temp instance.
  6. Re-attach (or restore) the database to the instance it came from.