SQL Server – Impact of Long Columns on Performance and Disk Usage

database-design, datatypes, sql-server-2008

In our current project it happens all too often that we need to extend columns by a couple of characters: from varchar(20) to varchar(30), and so on.

In reality, how much does it really matter? How well is this optimized? What is the impact of simply allowing 100, 200, or even 500 characters for ordinary "input" fields? An e-mail address can have at most 320 characters, so there is a sensible upper limit there. But what do I gain if I set it to 200 because I don't expect e-mail addresses longer than that?

Usually our tables will not have more than 100,000 rows, with up to 20 or 30 such columns each.

We use SQL Server 2008 at the moment, but it would be interesting to know how different DBs handle this issue.

If the impact is very low, as I would expect, it would help to have some good arguments (backed up with links?) to convince my DBA that this long-field paranoia isn't really necessary.

If it is, I'm here to learn 🙂

Best Answer

The specific answer to your question (at least for Oracle, and probably for other databases) is that the length of the field doesn't matter, only the length of the data. However, that alone shouldn't determine whether to set a field to its maximum allowable length. Here are some other issues you should consider before maxing out field sizes.
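You can see this for yourself on SQL Server with a quick experiment (the table and column names here are made up for illustration). DATALENGTH returns the number of bytes actually stored, and it is the same regardless of the declared maximum:

create table narrow_col (v varchar(30));
create table wide_col (v varchar(500));
insert into narrow_col values ('hello');
insert into wide_col values ('hello');
-- Both queries return 5: storage depends on the data, not on the declared size.
select datalength(v) from narrow_col;
select datalength(v) from wide_col;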

Formatting

Any client tool that formats data based on the declared size of the fields will require special formatting considerations. Oracle's SQL*Plus, for example, by default sizes its display of Varchar2 columns to their maximum width even if the data is only one character long. Compare…

create table f1 (a varchar2(4000), b varchar2(4000));
create table f2 (a varchar2(5), b varchar2(5));
insert into f1 values ('a','b');
insert into f2 values ('a','b');
-- f1's single-character values display in columns 4000 characters wide;
-- f2's display in columns only 5 characters wide.
select * from f1;
select * from f2;
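In SQL*Plus the oversized display can be worked around with COLUMN formatting commands, but that is exactly the kind of special handling oversized columns force on every client tool. A minimal sketch against the tables above:

-- Restrict the display width of each column to 10 characters
column a format a10
column b format a10
select * from f1;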

Bad Data

Field length provides an additional mechanism to catch and prevent bad data. An interface shouldn't attempt to insert 3,000 characters into a 100-character field, but if that field is defined as 4,000 characters, it just might. The error wouldn't be caught at the data-entry stage, and the system may have trouble further down when another application tries to process the data and chokes. As an example, if you later decide to index the field in Oracle, you could exceed the maximum key length (depending on block size and concatenation). See…

-- Depending on the database block size, this can fail with
-- ORA-01450: maximum key length exceeded.
create index i1 on f1(a);

Memory

If a client application allocates memory based on the maximum size of the fields, it will allocate significantly more memory than necessary. Special steps would have to be taken to avoid this.

Documentation

The size of a field provides another data point of documentation about the data. We could call all tables t1, t2, t3, etc. and all fields f1, f2, f3, etc., but by specifying meaningful names we understand the data better; field sizes work the same way. For example, if an address table for a company with customers in the U.S. has a field called State that is two characters long, we expect the two-character state abbreviation to go in it. On the other hand, if the field is one hundred characters long, we might expect the full state name to go in it.
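As an illustration, consider a hypothetical address table; the declared sizes alone tell you what kind of value belongs in each field:

-- char(2) documents that the two-letter abbreviation goes here ('TX'),
-- while a varchar(100) State field would suggest a full name ('Texas').
create table address (
  street varchar(100),
  city   varchar(60),
  state  char(2)
);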


All that being said, it does seem prudent to be prepared for change. Just because all your product names today fit in 20 characters doesn’t mean they always will. Don’t go overboard and make it 1000, but do leave room for plausible expansion.
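It also helps that widening a column later is cheap. In SQL Server, increasing the declared length of a varchar column is normally a metadata-only change, so you are not locked into today's guess (hypothetical table below):

-- Extending varchar(20) to varchar(30) does not rewrite the data pages;
-- re-state NULL/NOT NULL to preserve the column's existing nullability.
alter table product alter column name varchar(30) not null;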