Sql-server – ‘Row Chaining/Overflow’ in SQL Server when adding column to table

sql serversql-server-2012

Each row in a table can be 8060 bytes, since page size is 8KB. If a row size exceeds this, the largest column is moved to another page ROW_OVERFLOW_DATA and a pointer is created on the original page.

What happens when a new column is added to an existing table ? Is the data stored at the 'end' of the table, and pointers are created after each existing row in a page to point to this new data ? Clearly this would cause performance issues..

Or is the entire table reorganized so the new column 'fits' in the original page ?

We're running SQL 2012 (& above) Enterprise, but also interested to hear about lower versions.
I'm interested in both heaps & clustered indexes, because in either one, the existing row must be modified – to add the new column, or to add a pointer to the new column, right ?

Best Answer

There are three separate concepts here and I think that it's important to keep them separate.

1. Adding a column to a table

Adding a column to a table is an optimized operation designed to not take a lot of time. Most of the time it will not require all pages in the table to be modified. You can verify by this by measuring the size of the transaction required to do so or by looking at the space used by the table before and after.

Suppose that you add a column to a table with a default value of 1. The table has 100 billion rows. Does it really make sense to modify every data page to write a "1" for every row? That could take a really long time. Instead, you can interpret the data in a way to work around it. If a value for the new row hasn't been written to the page yet then you can assume that it's the default value.

Martin Smith pointed out that this optimization only apply to Enterprise edition. One rule of thumb is that SQL Server needs to modify every page if the new value for the column isn't constant, such as a computed column or a column with a default of NEWID(). There are edge cases which may result in all pages getting modified, such as this one.

2. Updating a column for an existing row

If you update a column it's possible that the full row will no longer fit on the data page. For heaps, the row will be moved to a new page with a pointer. This is known as forwarded records. For clustered indexes some of the rows will be moved to a new page. This is known as a page split and many people have written extensively about them. If a table has so many forwarded records and page splits that you're experiencing some kind of performance problem then that can be cleaned up with a REBUILD. But you probably have better things to worry about.

3. Rows larger than 8k

As of SQL Server 2008 rows can be larger than 8k. Some of the mechanics involved in doing that are explained in the link that you have in the question. Conceptually this really has very little to do with adding a column to a table.