Sql-server – SQL Server – Physical Table Design – Column Order – Null and Non Null Columns

database-designsql serverssmstable

We are in the process of building out a database for a new application. I got some feedback from the architect that he read somewhere there is a memory utilization benefit for having all of the non-nullable columns in a table first, then all the nullable columns. If there are nullable columns in-between non-nullable columns that some kind of memory or storage benefit is forfeited because those nullable columns are between non-nullable columns.

The columns were ordered in a slightly ambiguous order but in general it was prioritized for columns which were more likely to be used first.

I was under the impression that SSMS orders the columns on the page file by its own logic instead of the specific order they are created on the table. That the order I put the columns in on the table was totally independent from how the data was stored in the page file.

Is there any documentation somewhere that would support this finding/understanding?

EDIT:

Having some more conversations around figuring out where this idea came from. The request was an over simplification of having high density of data, or columns used most frequently in SELECT or JOIN operations in the front of the table to prevent them from moving to the Overflow section of the page file.

That the order of the columns assigned to the table is the same order the data is added onto the page file. For a given row, the data is stored on the page file in the order the columns are set on the table. If the row holds more data than the page file allows (assuming we are not using anything that would go into a LOB IMAGE or VARCHAR(MAX) or something like that) then the remainder is put onto the overflow file. If that overflow data is needed than there is extra effort and time spent doing the lookup to that overflow file. That can result in noticeable impacts to performance in high row environments (10's or 100's of millions of rows +). So we would want to prioritize the column order in how often we think that column will be accessed, not necessarily if it is NULL or not.

Does that understanding/analysis sound right?

Best Answer

I am posting this here as an answer so that I have more room to post resources and links, as well as explain a bit.

To answer the specific question about NULLABLE vs NON-NULLABLE columns and their relative cardinality, Kimberly Tripp has an article here: https://www.sqlskills.com/blogs/kimberly/column-order-doesnt-matter-generally-but-it-depends/

Unfortunately I cannot find the reference that supports my statement about having variable width columns at the end of the column order list, but the reasoning was that updating a variable width column at the end of a row would be less likely to cause fragmentation (and off-page spill) than one in the middle of the column order.

But as I've previously said, these are hard to get right and unless you know exactly your data access patterns you will probably get it wrong. There are many other low hanging fruit to tune over worrying about column order. Switching to properly sized static column lengths would be an easier way to get better memory estimates and avoid fragmentation.

Indexes with proper fill factors, statistic updates, stored procedures for data access (so you can more tightly control execution plans) and more. If you are on enterprise edition then you have more options, table compression will allow you to fit more data into memory and that's a far easier (and more widely applicable), partitioning will allow you to do really cool things on management side (although using it to accelerate your reads is a challenge (partition elimination is another hard thing to get right)).

Note, when I say hard to get right, I usually mean that it's hard to get SQL to use them to best advantage, so getting ROI for time spent optimizing is usually much better spent elsewhere.

To answer the last section of the question in the EDIT section. Yes, pre-loading the column with primary keys, foreign keys and the other join columns would probably help. But if you have indexes on those then it shouldn't matter as the index will be used to determine which rows to retrieve and then a keylookup performed. I just don't think spending architecture time on it is worth the impact. You are probably going to have an index on those columns in any case.