SQL Server – Efficient way of changing VARCHAR to NVARCHAR fields in a large table in SQL Server 2008

database-design, performance, sql-server-2008-r2, type-conversion, varchar

I am aware that when adding new fields to large tables, it is recommended to add them at the end of the fields rather than somewhere in the middle, and I am wondering whether something similar applies when changing field types.

I have a table with about a million records that has several VARCHAR fields. I would like to change these to NVARCHAR, but as I understand it this will take some time and resources, since the fields are in the middle of the table and SQL Server has to do a lot of copying and reordering.

What is an efficient way of accomplishing this?

Best Answer

Directly answering the question, there are two ways to execute the operation.

  • If the number of VARCHAR columns involved in the table is small (one or two), it is more practical to create pseudo-temp columns
  • If the number of VARCHAR columns is larger, the approach above is not very practical, so you create a pseudo-temp table instead. This is the approach most often used in the metadata-update scripts generated by database modeling tools like ERwin or ER/Studio (I have used both tools and reviewed the generated scripts before applying them)

Note on large tables: if the table has a few thousand records or fewer, you can do the operation in one shot. For million-record tables, it is more practical to work in batches (say, hundreds or thousands of records at a time).

Pseudo-temp columns

Pseudo-temp columns (I forget if there is another, more appropriate name) are columns used to store the result of a conversion. In this case, they also become the final columns once the process is finished.

  1. Create the new columns with the intended length. Do not forget to include any check constraints or defaults in the new definition.
  2. Perform an update (or several updates; see the note on large tables above) to copy the data from the old column into the new one.
  3. Perform a log backup and issue a checkpoint so the log does not grow absurdly large.
  4. If the old column has any constraint(s) associated with it, drop them.
  5. Drop the old column.
  6. Rename the new column to the old column's name.
  7. Rebuild the affected indexes (or all of them, if the affected column was also part of a clustered primary key constraint - it is rare for someone to use an (N)VARCHAR as a PK, but I have seen it).
This is the same process detailed in Aaron's answer.
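
As a rough illustration, here is a minimal T-SQL sketch of those steps, assuming a hypothetical dbo.Orders table with a VARCHAR(100) column named CustomerName (table, column and batch size are assumptions; adjust to your case):

    -- Step 1: add the pseudo-temp column (hypothetical names throughout)
    ALTER TABLE dbo.Orders ADD CustomerName_new NVARCHAR(100) NULL;
    GO

    -- Step 2: copy the data in batches so the transaction log stays manageable
    WHILE 1 = 1
    BEGIN
        UPDATE TOP (1000) dbo.Orders
           SET CustomerName_new = CustomerName
         WHERE CustomerName_new IS NULL
           AND CustomerName IS NOT NULL;
        IF @@ROWCOUNT = 0 BREAK;  -- nothing left to copy
    END
    GO

    -- Step 3 (full recovery model): BACKUP LOG ...; CHECKPOINT;

    -- Steps 4-6: drop the old column and rename the new one into its place
    ALTER TABLE dbo.Orders DROP COLUMN CustomerName;
    GO
    EXEC sp_rename 'dbo.Orders.CustomerName_new', 'CustomerName', 'COLUMN';
    GO

If the original column was NOT NULL, reapply that after the copy completes with ALTER TABLE dbo.Orders ALTER COLUMN CustomerName NVARCHAR(100) NOT NULL.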

Pseudo-temp tables

When the modification involves more than a handful of columns, it is more practical to create a new table based on the schema of the old one.

  1. Create a new table without any table-level constraints (PK, FK, etc.). Include only the column-level ones at this point (NOT NULL, DEFAULT, CHECK, etc.).
  2. Insert the data from the old table into the new one (see the note on large tables above). SET IDENTITY_INSERT is a must here if the table has an identity column.
  3. Now drop all the table-level constraints (PK, FKs, checks) and triggers on the old table, and recreate those constraints and triggers on the new table.
  4. Recreate all the other indexes of the old table on the new one (all at once or one at a time, depending on your maintenance window). Unless the table has no clustered index, this has to be done after step 3, or at least after the PK constraint has been created.
  5. Check that everything went right (that you did not forget a trigger or a constraint in the process) and, if all is OK, drop the old table.
  6. Rename the new table to the old table's name.

Note on step 4: if any duplicate indexes are detected (detecting duplicate indexes is a long subject in its own right; see Kimberly Tripp's blog on SQLSkills.com), that is your chance to get rid of them.
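
A minimal sketch of the table-swap variant, again using a hypothetical dbo.Orders table with an identity PK (column and constraint names are assumptions for illustration):

    -- Step 1: new table with column-level constraints only
    CREATE TABLE dbo.Orders_new
    (
        OrderID      INT IDENTITY(1,1) NOT NULL,
        CustomerName NVARCHAR(100)     NOT NULL,
        Notes        NVARCHAR(500)     NULL
    );
    GO

    -- Step 2: copy the rows, preserving the identity values
    SET IDENTITY_INSERT dbo.Orders_new ON;
    INSERT INTO dbo.Orders_new (OrderID, CustomerName, Notes)
    SELECT OrderID, CustomerName, Notes FROM dbo.Orders;
    SET IDENTITY_INSERT dbo.Orders_new OFF;
    GO

    -- Step 3: move the table-level constraints (assumed constraint name)
    ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;
    ALTER TABLE dbo.Orders_new ADD CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID);
    GO

    -- Steps 5-6: drop the old table and take over its name
    DROP TABLE dbo.Orders;
    GO
    EXEC sp_rename 'dbo.Orders_new', 'Orders';
    GO

For million-row tables, the INSERT ... SELECT can be broken into ranged batches (for example, by PK ranges), following the same idea as the batched UPDATE shown earlier.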

Performance implications

Changing from VARCHAR to NVARCHAR has some performance implications, at least for any SQL Server version below 2008 R2. For SQL Server 2008 R2, Aaron Bertrand has some blog posts on the Unicode compression feature, which can counterbalance the overhead when NVARCHAR columns are used to store content that would fit in VARCHAR columns. I have not read them as thoroughly as the articles deserve, but the subject is interesting.
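
For reference, on 2008 R2 and later, Unicode compression is applied automatically to eligible NVARCHAR data when row or page compression is enabled on the object (table name hypothetical):

    -- Rebuild with row compression; eligible NVARCHAR values are then
    -- stored with Unicode compression automatically (SQL Server 2008 R2+)
    ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = ROW);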

NVARCHAR columns typically (IOW, before 2008 R2) store every character using 2 bytes per char. For example, the string 'MSSQL' is stored in 5 bytes in a VARCHAR column and 10 bytes in an NVARCHAR one. Since non-LOB string columns are limited to a maximum of 8000 bytes, VARCHAR can store up to 8000 characters, while NVARCHAR is limited to 4000.
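
You can verify the byte counts directly with DATALENGTH:

    SELECT DATALENGTH('MSSQL')  AS varchar_bytes,   -- 5
           DATALENGTH(N'MSSQL') AS nvarchar_bytes;  -- 10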

Implications of those facts:

  • Since index keys are limited to 900 bytes (see the docs on CREATE INDEX), trying to index an NVARCHAR(500) column will not make the command fail (if it is the only column in the index key), but an INSERT or UPDATE of a row whose value exceeds 450 characters (minus the total size of any other columns in the index key, where applicable) will fail. See the sketch after this list.
  • The more bytes there are to operate on, the more work there is to do: you read, write, compare and cache twice as many bytes.
  • Depending on how massive the table is, how much the string columns contribute to the table's stored size, and how much the table contributes to the database size, you can expect the (used) database size to grow, along with everything it affects directly or indirectly (backup/restore times, index maintenance, etc.).
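
A quick way to see the 900-byte limit in action (table and index names are made up for the demo):

    CREATE TABLE dbo.KeyLimitDemo (Val NVARCHAR(500));
    CREATE INDEX IX_KeyLimitDemo_Val ON dbo.KeyLimitDemo (Val);
    -- CREATE INDEX succeeds, but raises a warning about the 900-byte maximum
    INSERT INTO dbo.KeyLimitDemo (Val) VALUES (REPLICATE(N'x', 450));  -- 900 bytes: OK
    INSERT INTO dbo.KeyLimitDemo (Val) VALUES (REPLICATE(N'x', 451));  -- 902 bytes: fails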

EDIT: as gbn stated, it is not worth contriving something just to keep using VARCHAR when you have a clear requirement that needs NVARCHAR columns to be fulfilled.