Sql-server – the best practice for storing large data when nvarchar(4000) is sufficient 99% of the time

sql serversql-server-2005

I have a table with about 100MM rows that stores information about a user.

CREATE TABLE [dbo].[UserData](
        [UserDataID] [int] IDENTITY(1,1) NOT NULL,
        [UserID] [int] NOT NULL,
        [DataId] int NOT NULL,
        [DataValue] [nvarchar](4000) NOT NULL,
        [EncryptedDataValue] [varbinary](max) NULL)

I need to be able to store data in DataValue that is > nvarchar(4000). But, only a very small percentage of the rows actually need this column as nvarchar(max). I know that as soon as DataValue is over nvarchar(4000), internally, sql will store the data as a blob[?], substantially increasing the time it takes to make this change. (not sure how read/write time will be affected later).

I thought of a few potential options…

  1. Change DataValue from nvarchar(4000) to nvarchar(max) and just eat the time it takes to make the change; not worrying that only 1% of the rows are using MAX?

  2. Alongside DataValue, add a DataValueXL column that is nvarchar(max) and introduce application logic to save in the appropriate column depending on the size of the data? (Marking both as NULL)

  3. Create a new table FKed to UserDataId to store only large DataValues > 4000?

Which–if any–should I go with?

thanks

Best Answer

as soon as DataValue is over nvarchar(4000), internally, sql will store the data as a blob

This is incorrect. SQL Server will use off-row storage for both max and non-max types. In other words your NVARCHAR(4000) may also be stored off-row. The actual storage location will depend on whether the row fits or not on a page and on the exact combination of sp_tableoption values for the large value types out of row and text in row settings.

Your best bet is to change it to max. SQL will store the value in-row whenever possible, offering fast access. This makes application programing much simpler as you don't have to deal with the actual location (Column) based on size. You should only be concerned if you have an objection on adding the first max type column to a table (eg. it will prevent online rebuild operations), but you are already willing to add one so that should not be an issue.