SQL Server – Using MAX Text or More Specific, Smaller Type

database-designdatatypesperformancesql servervarchar

Someone was reviewing my DDL code for creating tables and suggested, when they saw I saw using VARCHAR(256) fields for text I expect to be pretty small, like a first name or whatever, that I should always just use VARCHAR(MAX) and linked Why use anything but varchar(max). I read it but it seemed dated, as it was focusing on 2005, and didn't seem to offer any real justification to allocate potentially up to 2 GB per row on all text fields.

From a performance, storage, etc. standpoint, how should one go about deciding whether to use VARCHAR(MAX) or a smaller more specific type for modern versions of SQL Server? (e.g., 2008, 2012, 2014)

Best Answer

Should I always use (n)varchar(max) for text columns?

No.

For SQL Server, the max data types should only be specified when there is no alternative. One should instead choose the correct base type (varchar or nvarchar) and specify an explicit maximum length that is appropriate to the data to be stored.

Physical storage is identical whether the column is typed as varchar(n) or varchar(max), so that is not the concern.

The reasons to not choose (n)varchar(max) everywhere revolve around features, plan quality, and performance.

An exhaustive list is probably not practical, but among other things, max columns:

Features

  • Require a separate constraint to enforce a maximum length
  • Cannot be a key in an index (so no unique constraints either)
  • May prevent online DDL (including index rebuilds and adding a new non-null column)
  • Are generally not supported for 'newer' features e.g. columnstore
  • See the product documentation for more specific features and limitations. The general pattern is that there are awkward limitations and restrictions around max data types. Not all limitations and side-effects are documented.

Performance

  • Require special handling in the execution engine, to account for the potentially very large size. Typically, this involves using a less efficient code path, with a streaming interface
  • May have similar unanticipated consequences for external code (and other SQL Server components like SSIS), which must also be prepared to handle data up to 2GB in size
  • Are assumed to be 4000 bytes wide in memory grant calculations. This is likely to lead to excessive memory reservation, which limits concurrency, and pushes valuable index and data pages out of cache memory
  • Disable several important performance optimizations
  • May extend lock duration
  • May prevent the optimizer choosing a (non-dynamic) seek plan
  • Prevent filters being pushed into scans and seeks as a residual
  • May increase tempdb pressure and contention (version dependant), since variables and parameters are also likely to be typed as max to match column definitions

In summary, there are so many subtle (and undesirable) side effects of unnecessarily using the max specifier that it makes no sense to do this. The minor 'convenience' of using a single declaration is no sort of compensation.

Evaluate each type in context, use the correct base type (varchar or nvarchar), and a sensible explicit length.

Further reading: