Ask yourself another question: If the entire database is in memory and I never have to touch the disk, do I want to store my data in an ordered B-tree or do I want to store my data in an unordered heap?
The answer to this question depends on your access pattern. In most cases, access consists of single-row lookups (i.e., seeks) and range scans. These access patterns require a B-tree; otherwise they are inefficient. Some other access patterns, common in DW and OLAP workloads, always aggregate over the entire table end-to-end and do not benefit from range scans. As you drill further, other requirements come to light: for example, the speed of insert and allocation into a heap vs. a B-tree may matter for huge ETL transfer jobs. But most of the time the answer boils down to one question: do you seek or range-scan? The overwhelming majority of the time the answer is YES, and therefore the overwhelming majority of the time the design requires a clustered index.
In other words: just because it is cheap to read from disk in random order does not mean you should trash your TLBs and L2 cache lines in a 64 GB RAM scan bonanza...
Directly answering the question, there are two ways to execute the operation.
- If the number of varchar columns involved in the table is small (one or two), it is more practical to create pseudo-temp columns.
- If the number of varchar columns is larger, the approach above is not very practical, so you create a pseudo-temp table instead. This approach is commonly used in the metadata update scripts generated by database modeling tools like ERwin or ER/Studio (I have used both tools and reviewed the generated scripts before applying them).
Note on large tables: If the table has a few thousand records or less, you can do the operation in one shot. For tables with millions of records, it is more practical to perform it in batches (say, hundreds or thousands of records at a time).
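A batched update along those lines can be sketched like this (a minimal sketch; the table and column names are hypothetical, and it assumes the new column is nullable so unprocessed rows can be identified):

```sql
-- Hypothetical sketch: copy data into the new column 1000 rows at a time,
-- so each transaction stays small and the log can be backed up between batches.
WHILE 1 = 1
BEGIN
    UPDATE TOP (1000) dbo.BigTable
    SET NewCol = OldCol
    WHERE NewCol IS NULL
      AND OldCol IS NOT NULL;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to copy
END;
```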
Pseudo-temp columns
Pseudo-temp columns (I forget if there is another, more appropriate name) are columns used to store the result of a conversion. In this case, they will also be the final columns after the process.
- Create the new columns with the intended length. Do not forget to include any check constraints or defaults in the new definition.
- Perform an update (or updates, see the observation above) to store the data of the old column in the new one.
- Perform a log backup and a checkpoint, so the log does not grow absurdly large.
- If the old column has any constraint(s) associated with it, drop them.
- Drop the old column.
- Rename the new column to the old column name.
- Rebuild the affected indexes (or all of them, if the affected column was also part of a clustered primary key constraint - it's rare for someone to use an (n)varchar as a PK, but I have seen it).
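Put together, the steps above look roughly like this (a minimal sketch with a hypothetical table, assuming the old column has no constraints attached and the update is small enough to run in one shot):

```sql
-- Widening dbo.Customers.Name from VARCHAR(50) to NVARCHAR(100) (hypothetical names)
ALTER TABLE dbo.Customers ADD Name_New NVARCHAR(100) NULL;   -- step 1
GO
UPDATE dbo.Customers SET Name_New = Name;                    -- step 2 (batch it on large tables)
GO
ALTER TABLE dbo.Customers DROP COLUMN Name;                  -- step 5
GO
EXEC sp_rename 'dbo.Customers.Name_New', 'Name', 'COLUMN';   -- step 6
GO
```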
This is the same process detailed in Aaron's answer.
Pseudo-temp tables
When the modification involves more than a handful of columns, it is more practical to create a new table based on the schema of the old one.
- Create a new table without any table-level constraints (PK, FK, etc.). Include only column-level ones at this time (NOT NULL, DEFAULT, CHECK, etc.).
- Insert the data from the old table into the new table (see the note on large tables above). SET IDENTITY_INSERT is a must here if the table has an identity column.
- Now, drop all the table constraints (PK, FKs, checks) and triggers on the old table, and recreate those constraints and triggers on the new table.
- Recreate all the other indexes from the old table on the new table (all at once or one at a time, depending on your maintenance window). Unless the table has no clustered index, this has to be done after step 3 - or, at least, after the creation of the PK constraint.
- Check that everything went right (that you did not forget a trigger or a constraint in the process) and, if all is OK, drop the old table.
- Rename the new table to the old table's name.
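The skeleton of that process, using a hypothetical dbo.Orders table, might look like this (constraint, trigger, and index recreation elided):

```sql
-- Step 1: new table with column-level constraints only (hypothetical schema)
CREATE TABLE dbo.Orders_New (
    OrderID  INT IDENTITY(1,1) NOT NULL,
    Customer NVARCHAR(200)     NOT NULL,
    Notes    NVARCHAR(1000)    NULL
);
GO
-- Step 2: copy the data, preserving identity values
SET IDENTITY_INSERT dbo.Orders_New ON;
INSERT INTO dbo.Orders_New (OrderID, Customer, Notes)
SELECT OrderID, Customer, Notes FROM dbo.Orders;
SET IDENTITY_INSERT dbo.Orders_New OFF;
GO
-- Steps 3-4: recreate PK/FK constraints, triggers, and indexes on dbo.Orders_New
-- Step 5: once everything checks out, drop the old table
DROP TABLE dbo.Orders;
GO
-- Step 6: take over the old name
EXEC sp_rename 'dbo.Orders_New', 'Orders';
GO
```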
Note on step 4: If you detect any duplicate indexes (detecting duplicate indexes is a long subject in itself; see Kimberly Tripp's blog on SQLSkills.com), this is your chance to get rid of them.
Performance implications
Changing from VARCHAR to NVARCHAR has some performance implications, at least on any SQL Server version before 2008 R2. For SQL Server 2008 R2, Aaron Bertrand has some blog posts on the Unicode compression feature, which can counterbalance the overhead when NVARCHAR columns are used to store content that could be stored in VARCHAR columns. I have not read them as thoroughly as they deserve, but the subject is interesting.
NVARCHAR columns typically (i.e., before 2008 R2) store every character in the column using 2 bytes per character. For example, the string 'MSSQL' will be stored in 5 bytes in a VARCHAR column and 10 bytes in an NVARCHAR one. Since non-LOB string columns are limited to a maximum of 8000 bytes, VARCHAR can store 8000 characters, while NVARCHAR is limited to 4000.
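You can verify the storage sizes yourself with DATALENGTH (which returns bytes) versus LEN (which returns characters):

```sql
SELECT DATALENGTH(CAST('MSSQL' AS VARCHAR(10)))  AS varchar_bytes,   -- 5
       DATALENGTH(CAST('MSSQL' AS NVARCHAR(10))) AS nvarchar_bytes,  -- 10
       LEN(CAST('MSSQL' AS NVARCHAR(10)))        AS nvarchar_chars;  -- 5
```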
Implications of these facts:
- Since index keys are limited to 900 bytes (see the docs on CREATE INDEX), if you try to index an NVARCHAR(500) column the command will not fail (if it is the only column in the index key), but any UPDATE or INSERT of a row with more than 450 characters (minus the total size of the other columns in the index key, if any) will fail.
- The more bytes there are to operate on, the more work there is to do: you read, write, compare, and cache twice as many bytes.
- Depending on how massive the table is, how much the string columns contribute to the table's stored size, and how much the table contributes to the database size, you can expect the (used) database size to grow, along with everything that depends on it directly or indirectly (backup/restore time, index maintenance, etc.).
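The first point can be demonstrated with a small (hypothetical) repro: the index creation only warns, and the failure surfaces at DML time.

```sql
CREATE TABLE dbo.WideKeys (Val NVARCHAR(500) NOT NULL);

-- Succeeds, but with a warning that the key can exceed the 900-byte limit
CREATE INDEX IX_WideKeys_Val ON dbo.WideKeys (Val);

-- 451 characters = 902 bytes: this INSERT fails at runtime
INSERT INTO dbo.WideKeys (Val) VALUES (REPLICATE(N'x', 451));
```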
EDIT: as gbn stated, it is not worth creating workarounds just to use VARCHAR when you have a clear requirement that needs NVARCHAR columns to be fulfilled.
Best Answer
Actually, that depends on the setting of the "large value types out of row" option, which can be set using sp_tableoption. From the documentation: the default is for MAX values to be stored in-row, up to 8000 bytes, if they fit. Unless you have used sp_tableoption to change the default, your MAX data will most likely be stored in-row.

That said, it is poor practice to use MAX data types for values that will never exceed 8000 bytes - use a non-MAX type instead. Aside from anything else, performance is often significantly poorer when dealing with MAX types, because SQL Server must be prepared to cope with data that might be up to 2 GB in size.

Only the MAX ones. In addition, if a previously in-row MAX column is moved off-row, only that column in that row is affected. It is replaced in-row by a pointer to the off-row LOB structure. There are also circumstances where non-MAX columns may be moved off-row.

Scanning the clustered index traverses only in-row data. If off-row data is needed for the query, it is looked up using the in-row pointer.
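For reference, the option can be changed and inspected like this (the table name is hypothetical):

```sql
-- Force MAX values off-row for one table
EXEC sp_tableoption 'dbo.Documents', 'large value types out of row', 1;

-- Check the current setting (1 = MAX values always stored off-row,
-- replaced in-row by a pointer to the LOB structure)
SELECT name, large_value_types_out_of_row
FROM sys.tables
WHERE name = N'Documents';
```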