Sql-server – SQL Server – Physical Table Design – Column Order – Null and Non Null Columns

database-designsql serverssmstable

We are in the process of building out a database for a new application. I got some feedback from the architect that he read somewhere there is a memory utilization benefit for having all of the non-nullable columns in a table first, then all the nullable columns. If there are nullable columns in-between non-nullable columns that some kind of memory or storage benefit is forfeited because those nullable columns are between non-nullable columns.

The columns were ordered in a slightly ambiguous order but in general it was prioritized for columns which were more likely to be used first.

I was under the impression that SSMS orders the columns on the page file by its own logic instead of the specific order they are created on the table. That the order I put the columns in on the table was totally independent from how the data was stored in the page file.

Is there any documentation somewhere that would support this finding/understanding?

EDIT:

Having some more conversations around figuring out where this idea came from. The request was an over simplification of having high density of data, or columns used most frequently in SELECT or JOIN operations in the front of the table to prevent them from moving to the Overflow section of the page file.

That the order of the columns assigned to the table is the same order the data is added onto the page file. For a given row, the data is stored on the page file in the order the columns are set on the table. If the row holds more data than the page file allows (assuming we are not using anything that would go into a LOB IMAGE or VARCHAR(MAX) or something like that) then the remainder is put onto the overflow file. If that overflow data is needed than there is extra effort and time spent doing the lookup to that overflow file. That can result in noticeable impacts to performance in high row environments (10's or 100's of millions of rows +). So we would want to prioritize the column order in how often we think that column will be accessed, not necessarily if it is NULL or not.

Does that understanding/analysis sound right?

Best Answer

I am posting this here as an answer so that I have more room to post resources and links, as well as explain a bit.

To answer the specific question about NULLABLE vs NON-NULLABLE columns and their relative cardinality, Kimberly Tripp has an article here: https://www.sqlskills.com/blogs/kimberly/column-order-doesnt-matter-generally-but-it-depends/

Unfortunately I cannot find the reference that supports my statement about having variable width columns at the end of the column order list, but the reasoning was that updating a variable width column at the end of a row would be less likely to cause fragmentation (and off-page spill) than one in the middle of the column order.

But as I've previously said, these are hard to get right and unless you know exactly your data access patterns you will probably get it wrong. There are many other low hanging fruit to tune over worrying about column order. Switching to properly sized static column lengths would be an easier way to get better memory estimates and avoid fragmentation.

Indexes with proper fill factors, statistic updates, stored procedures for data access (so you can more tightly control execution plans) and more. If you are on enterprise edition then you have more options, table compression will allow you to fit more data into memory and that's a far easier (and more widely applicable), partitioning will allow you to do really cool things on management side (although using it to accelerate your reads is a challenge (partition elimination is another hard thing to get right)).

Note, when I say hard to get right, I usually mean that it's hard to get SQL to use them to best advantage, so getting ROI for time spent optimizing is usually much better spent elsewhere.

To answer the last section of the question in the EDIT section. Yes, pre-loading the column with primary keys, foreign keys and the other join columns would probably help. But if you have indexes on those then it shouldn't matter as the index will be used to determine which rows to retrieve and then a keylookup performed. I just don't think spending architecture time on it is worth the impact. You are probably going to have an index on those columns in any case.

Related Solutions

Sql-server – Check constraint only one of three columns is non-null

The following should do the trick:

CREATE TABLE MyTable (col1 FLOAT NULL, col2 NVARCHAR(30) NULL, col3 DATETIME NULL);
GO

ALTER TABLE MyTable
ADD CONSTRAINT CheckOnlyOneColumnIsNull
CHECK 
(
    ( CASE WHEN col1 IS NULL THEN 0 ELSE 1 END
    + CASE WHEN col2 IS NULL THEN 0 ELSE 1 END
    + CASE WHEN col3 IS NULL THEN 0 ELSE 1 END
    ) = 1
)
GO

Database Design: New Table versus New Columns

What you are wrestling with is vertical partitioning. This is a physical database design technique to improve performance. As with any physical database design technique, its applicability depends on the specific queries you are trying to optimize and if this technique will optimize them. From a logical standpoint, if these new fields depend upon the candidate key for your entity then they are facts about it that belong with it. First you should make sure you fully understand the functional dependence of these new fields on your candidate keys to verify they really are facts about daily page views. If they are, deciding to partition them into another table is a performance optimization that should only be done if it achieves your performance goals.

In general, vertical partitioning is useful if you will query these new columns infrequently and distinctly from the other columns in the original table. By placing those columns in another table that shares the same PK as your existing table, you can query it directly when you want those new columns and get much greater through-put as you will have many more rows per page on disk for this new table as all the columns from the original table won't be sitting on those rows. However, if you will always query these columns along with the columns in the original table then a vertical partition wouldn't make much sense as you will always have to outer join to get them. Pages from tables on disk come into the buffer pool of a DBMS independently, never pre-joined, and so that join will have to happen with every query execution even if the data is pinned in the buffer pool. In this scenario making them NULLABLE columns on the original table would enable the DBMS storage engine to store them efficiently when NULL and eliminate the need to join on retrieval.

It sounds to me like your use case is the latter and adding them as NULLABLE to your original table is the way to go. But as with everything else in database design, it depends, and in order to make the right decision you need to know your expected workload and what making a good choice depends on. One good example of a proper use case for vertical partitioning would be a person search panel, where your application has some very rarely populated information about a person that someone might want to search on but rarely does. If you put that information into a different table you have some good options for performance. You can write the search so that you have 2 queries - one that uses the main, always populated information to search (like last name or ssn) only, and one that outer joins the very infrequently populated information only when it is requested for search. Or you could take advantage of the DBMS optimizer if it is smart enough to recognize for a given set of host variables that the outer join is not needed and won't perform it, and thus you only have to create 1 query.

What DBMS platform are you using? The way in which the platform handles NULL column storage, optimizes your query, as well as the availability of sparse column support (SQL Server has this) will impact the decision. Ultimately I would recommend trying out both designs in a test environment with production sized data and workload and seeing which better achieves your performance objectives.

Best Answer

Related Solutions

Sql-server – Check constraint only one of three columns is non-null

Database Design: New Table versus New Columns

Related Question