SQL Server VARCHAR Column Width

sql-server, sql-server-2008, varchar

Searching around the web, I have found conflicting advice on whether there is a performance impact when specifying overly-wide VARCHAR columns, e.g. VARCHAR(255) when VARCHAR(30) will probably do.

I consistently see agreement that there's a performance hit if the entire row exceeds 8060 bytes. Other than that, I see disagreement.

Is the claim true that "the default is SET ANSI_PADDING ON = potential for lots of trailing spaces"? As long as the total row width is less than 8060 bytes, are there any real performance concerns in over-sizing VARCHAR columns?
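
Regarding the ANSI_PADDING claim: with ANSI_PADDING ON (the default), trailing spaces supplied in the data are stored as-is, but a varchar value is never padded out to its declared width. A minimal sketch, using an invented demo table:

    -- ANSI_PADDING must be ON at CREATE TABLE time to apply to the column.
    SET ANSI_PADDING ON;

    CREATE TABLE dbo.PaddingDemo (val varchar(255));

    -- Three trailing spaces are preserved: not trimmed, not padded to 255.
    INSERT INTO dbo.PaddingDemo (val) VALUES ('abc   ');

    -- DATALENGTH counts stored bytes: returns 6, not 255.
    SELECT val, DATALENGTH(val) AS bytes_stored FROM dbo.PaddingDemo;

(With ANSI_PADDING OFF, those trailing spaces would be trimmed on insert.)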

Evidence that column width matters


The same goes for CHAR and VARCHAR data types. Don’t specify more characters in character columns than you need.

http://www.sql-server-performance.com/2007/datatypes/


  • Length is a constraint on the data (like CHECK, FK, NULL etc)
  • Performance when the row exceeds 8060 bytes
  • Cannot have a unique constraint or index on the column (index key width must be under 900 bytes; see the sketch after this list)
  • The default is SET ANSI_PADDING ON = potential for lots of trailing spaces

What are the consequences of setting varchar(8000)?
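
The index-key point is easy to demonstrate: SQL Server will create an index over an over-wide varchar key, but warns that some values cannot be indexed. A minimal sketch with invented names; on SQL Server 2008 the index key limit is 900 bytes:

    CREATE TABLE dbo.WideKeyDemo (val varchar(1000) NOT NULL);

    -- Succeeds, but raises a warning that for some combination of large
    -- values the insert/update operation will fail (900-byte key limit).
    CREATE UNIQUE INDEX IX_WideKey ON dbo.WideKeyDemo (val);

    -- Fails at runtime: the index entry would be 901 bytes.
    INSERT INTO dbo.WideKeyDemo (val) VALUES (REPLICATE('x', 901));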


Evidence that column width DOES NOT matter


If you're talking about varchar and nvarchar then no, there is no penalty for allowing a higher field length.

https://stackoverflow.com/questions/7025996/overstating-field-size-in-database-design


The varchar datatype, by contrast, consumes only the amount of actual space used plus 2 bytes for overhead

http://sqlfool.com/content/PerformanceConsiderationsOfDataTypes.pdf


Best Answer

The question might be better stated as:

"What is the advantage of over-specifying the maximum length of a variable-length column?"

In general, there is little advantage, and several disadvantages as the various linked answers point out. Aside from any other concerns, consider that SQL Server is not open-source: there are many 'magic numbers' and heuristics applied based on the information provided to the system. Without source code access, we could never be entirely sure what the impact of this practice might be.

In some cases, where the average length of a column is significantly higher than the 50% of the declared maximum assumed by SQL Server when calculating sort/hash memory grants, you may see a performance improvement by over-specifying the maximum length. This is a dubious workaround, and should probably only be applied via an explicit CAST or CONVERT (with comments!) rather than by changing the base column definition. Sometimes it will be preferable to rewrite the query to sort keys instead of whole rows anyway.
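
To make the memory-grant point concrete: a sort is sized assuming each variable-length value is half its declared maximum, so a column whose values run near the declared width can spill to tempdb. A hedged sketch of the CAST workaround, with invented table and column names:

    -- Suppose dbo.Notes.Body is varchar(100) but averages ~95 characters.
    -- The grant for this sort assumes ~50 bytes per value and may spill:
    SELECT Body FROM dbo.Notes ORDER BY Body;

    -- Widening the value in the query raises the estimate to ~100 bytes
    -- without touching the base column. Comment the intent so the cast
    -- is not 'tidied away' later:
    SELECT CAST(Body AS varchar(200)) AS Body   -- deliberate: avg length ~95
    FROM dbo.Notes
    ORDER BY CAST(Body AS varchar(200));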

Where the maximum row size might exceed the in-row limit (even if no rows actually do), deleting rows can cause unexpected page-splitting if a trigger is present. Updates may also lead to fragmentation via the same mechanism.
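
A quick way to find tables whose declared maximum row size already exceeds the 8060-byte in-row limit, even if no stored row does, is to sum the declared column lengths from the catalog views. A rough sketch: max_length is in bytes and is -1 for MAX types, and record overhead is ignored, so treat the output as approximate:

    SELECT t.name AS table_name,
           SUM(CASE WHEN c.max_length = -1 THEN 16   -- MAX types: off-row pointer
                    ELSE c.max_length END) AS declared_bytes
    FROM sys.tables AS t
    JOIN sys.columns AS c ON c.object_id = t.object_id
    GROUP BY t.name
    HAVING SUM(CASE WHEN c.max_length = -1 THEN 16
                    ELSE c.max_length END) > 8060;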

SQL Server generally does a good job when the metadata it is given is accurate. Compromising this principle for 'convenience' seems unwise to me. A sensible approach is to choose a maximum length that is reasonable for the actual data to be stored, allowing for any foreseeable changes.
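
When choosing that maximum, the existing data is the obvious guide. A simple profiling sketch, again with invented names:

    -- Measure real lengths before settling on (or tightening) a width,
    -- then add sensible headroom for foreseeable growth.
    SELECT MIN(LEN(Body)) AS min_len,
           AVG(LEN(Body)) AS avg_len,
           MAX(LEN(Body)) AS max_len
    FROM dbo.Notes;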