SQL Server – How to efficiently downsize some Unicode fields

sql-server-2012 unicode

We have a SQL Server 2012 Enterprise live transactional database that is now growing by over 1 GB per month and is becoming a size problem for us. It is currently at 23 GB. The character-type fields are all Unicode, and I have calculated a savings of 5 GB by converting just 2 such fields, averaging 206 characters each, to non-Unicode, and almost 10 GB if we convert a few more of them from nchar and nvarchar to char and varchar types. These fields will never be required to hold Unicode characters that cannot be represented in the SQL_Latin1_General_CP1_CI_AS collation, since the data arrives as plain ASCII and always will per the protocol standard.

I’m the software architect and chief C# developer, though only a DBA hack, or I would not have designed our database with Unicode fields in high-volume tables that did not need Unicode when the database was created 3 years ago. I want to correct this mistake now, before we finish converting to an AlwaysOn environment to help with various performance and backup issues.

After downsizing these two (or more) fields, we would like to shrink the database one time to take advantage of the space savings for full backups and for seeding the AlwaysOn environment.

Questions are –

  1. What is the safest and most efficient technique for downsizing columns from nchar/nvarchar to char/varchar, especially when there are multiple fields in the same table to be converted? I tested an “add new column, set new = old, drop old, rename new to old” approach for the two main fields I want to convert from nvarchar(max) to varchar(max); a batched sketch of it appears after this list. It ran for 81 minutes on our test server (4 virtual cores, 8 GB memory) before failing with “Could not allocate space for object 'dbo.abc'.'PK_xyz' in database 'xxx' because the 'PRIMARY' filegroup is full”, even though there was 8 GB left on the disk and the database has unlimited size set. (I did delete an old database before it finished, after getting a disk warning, so maybe it did not count that new space.) Regardless, it was too slow. This was on just the two largest of these columns (12.6M rows), and it ran at only 2 to 3% CPU, so it seemed quite inefficient and pointed to unacceptable downtime if we were to convert even these two fields, much less any additional ones. The average size for these two fields is only 206 characters, or 412 bytes, each. Another technique I plan to try is to create the new table definition in a new schema, SELECT INTO it from the old table, then move the tables between schemas and delete the old table. I have a FK and indexes on the table to contend with.

  2. If I figure out how to do #1 efficiently within an acceptable maintenance window, what is the safest practice for doing a one-time shrink that ends with reorganized/rebuilt indexes and updated statistics? (A sketch of the sequence I have in mind also appears after this list.) I understand the reasoning against doing regular shrinks, and that a shrink can sometimes actually increase the size.

  3. Is there any third-party tool that could take a backup and restore it into a new database with the modified field definitions, or otherwise convert certain field types?
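
Here is the batched sketch of the “add new column, copy, drop old, rename” approach referenced in #1. The table name, column name, and batch size (dbo.abc, BigCol, 50,000 rows) are placeholders, and it assumes the log can be kept in check between batches (SIMPLE recovery or frequent log backups):

    -- Add the narrower column, copy the data over in batches, then swap names.
    ALTER TABLE dbo.abc ADD BigCol_new varchar(max) NULL;
    GO

    DECLARE @Rows int = 1;
    WHILE @Rows > 0
    BEGIN
        UPDATE TOP (50000) dbo.abc
        SET    BigCol_new = CONVERT(varchar(max), BigCol)
        WHERE  BigCol_new IS NULL
          AND  BigCol IS NOT NULL;

        SET @Rows = @@ROWCOUNT;
        CHECKPOINT;   -- under SIMPLE recovery, lets log space be reused between batches
    END;
    GO

    ALTER TABLE dbo.abc DROP COLUMN BigCol;
    EXEC sp_rename 'dbo.abc.BigCol_new', 'BigCol', 'COLUMN';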
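
For #2, the kind of one-time sequence I have in mind is below; the file name, target size, and table name are placeholders:

    -- One-time shrink, then rebuild indexes and refresh statistics.
    -- Note: the rebuilds need free space and will regrow the file somewhat.
    DBCC SHRINKFILE (N'xxx_Data', 25000);   -- target size in MB; leave some headroom

    ALTER INDEX ALL ON dbo.abc REBUILD;     -- repeat per large table; a rebuild also
                                            -- updates statistics on the rebuilt indexes
    EXEC sys.sp_updatestats;                -- refresh the remaining statistics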

Any suggestions and best practices are most welcome.

Thanks, Dave

Best Answer

Regarding:

I tested an “add new column, set new = old, drop old, rename new to old” approach for the two main fields I want to convert from nvarchar(max) to varchar(max); it ran for 81 minutes on our test server ... before running out of disk space ... it was too slow.

and

Another technique I plan to try is to create the new table definition in a new schema, SELECT INTO it from the old table, then move the tables between schemas and delete the old table.

Generally speaking, making a copy of the table with the ideal schema is my preferred approach. But if you only barely have enough space now to convert the two columns, are you sure you have enough space to make a copy of the entire table?

Also, the new table would just need to have a different name. It wouldn't need to be in a different schema.
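
For example, once the new copy is fully loaded and verified, the swap can be done with a pair of renames rather than a schema transfer (the table names here are placeholders, and the FK and indexes you mentioned would still need to be recreated against the new table):

    -- Swap the tables by name once the new copy is loaded and verified.
    BEGIN TRANSACTION;
        EXEC sp_rename 'dbo.abc', 'abcOLD';
        EXEC sp_rename 'dbo.abcNEW', 'abc';
    COMMIT TRANSACTION;
    -- DROP TABLE dbo.abcOLD;   -- once everything checks out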

Since you are on Enterprise Edition, have you considered, or even looked at, enabling Data Compression? Not only would it have the effect you are looking for on the NCHAR / NVARCHAR fields, but it would also save space on fields of other types as well.

There are two types of Compression: Row and Page. You should read up on them and run the stored procedure that estimates what your savings would be.
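
That procedure is sp_estimate_data_compression_savings; something along these lines gives a per-index estimate (the table name is a placeholder, and you would run it once for ROW and once for PAGE to compare):

    -- Estimate the savings for one table; repeat with N'ROW' to compare.
    EXEC sys.sp_estimate_data_compression_savings
         @schema_name      = N'dbo',
         @object_name      = N'abc',   -- placeholder table name
         @index_id         = NULL,     -- NULL = all indexes on the table
         @partition_number = NULL,     -- NULL = all partitions
         @data_compression = N'PAGE';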

Enabling compression can be done as an ONLINE operation, but it might require a bit of disk space. If you do not have the space available for that, then you might consider a hybrid approach where you build a copy of the table as TableNEW, with the clustered index already created, and created with Compression enabled. Then you should be able to slowly fill TableNEW, and the data will compress as it goes in. Of course, you will want to use INSERT INTO...SELECT and do it in batches. And the full benefit of the compression might not be realized until after you drop the original table and do a full rebuild of the Clustered Index on TableNEW.
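
As a rough sketch of both variants (the index, table, column, and key names are placeholders; PK_xyz is just borrowed from the error message in the question):

    -- Variant 1: compress the existing table in place as an ONLINE rebuild
    -- (needs working space roughly the size of the index being rebuilt).
    ALTER INDEX PK_xyz ON dbo.abc
        REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);

    -- Variant 2 (hybrid): pre-create TableNEW with its clustered index already
    -- compressed, then fill it in batches so rows compress as they arrive.
    -- Assumes an ever-increasing clustering key named Id.
    DECLARE @LastId int = 0,
            @Rows   int = 1;

    WHILE @Rows > 0
    BEGIN
        INSERT INTO dbo.TableNEW (Id, Col1, Col2 /* , ... */)
        SELECT TOP (50000) Id, Col1, Col2 /* , ... */
        FROM   dbo.abc
        WHERE  Id > @LastId
        ORDER BY Id;

        SET @Rows = @@ROWCOUNT;
        IF @Rows > 0
            SELECT @LastId = MAX(Id) FROM dbo.TableNEW;   -- advance the watermark
    END;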

And keep in mind that there are scenarios where you might not save that much space, or where saving the space isn't worth the increase in CPU activity. But that all depends on a lot of factors, so it is really something that should be tested on your system.

You could always take the approach of:

  1. Enable compression, either directly on the current table as an ONLINE operation (if there is enough space to support it), or by loading into a separate table that has Compression enabled.
  2. If you find that CPU usage has actually increased beyond the benefit of the space savings, then you have the option of building the table again with regular VARCHAR fields and no Compression. Because compression was already enabled, you should definitely have room for this now.

But again, like anything we do, it should be tested. I have heard for years how horrible "XML parsing" is on CPU, and how bad Compression is supposed to be, but in practice those concerns are quite often overstated. The only way to know is to test on your system. (P.S. Just in case it is not clear in a text-only medium: these final statements are not attacking what @Kin said in his answer about needing to be cautious of increased CPU activity. He is correct, at least to a degree. I am just reminding everyone to put everything into the perspective of current hardware, software, and system setup.)