SQL Server – Store VARCHAR(MAX) [Notes] Column on Same or Separate Table

Tags: database-design, sql-server, varchar

Disclaimer

I had an idea for a table design a little while ago that made sense to me at the time. Recently I have realized I have a habit of "over-engineering", or spending too much time trying to "over-optimize" solutions. I am assuming this is one of those instances where I am overthinking the solution and will create extra overhead with little to no actual gain.

Question

Assume that for an object (a Shipment table, for example) there is some kind of NOTE varchar(MAX) data element we want to keep track of. The NOTE column can have its data pushed into overflow storage and can dramatically increase the size of the row (thus limiting the number of rows that fit on a page). As I understand it, this then negatively impacts the execution time of various operations on the table as a whole.

Is there ever an instance where we should push that column into a separate, standalone ShipmentNote table instead of keeping it as a column on the Shipment table? The theory is that moving the NOTE column into a separate table keeps the Shipment rows smaller, so more of them fit on each page, which allows all operations on the Shipment table to perform better.
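To put rough numbers on the row-size argument, here is a small Python sketch. It uses the estimate from Microsoft's guidance on sizing a clustered index: about 8,096 bytes of each 8 KB page hold rows, and every row also takes a 2-byte slot-array entry. The 200-byte Shipment row and the note lengths are made-up assumptions, and the model ignores row-header and null-bitmap overhead.

```python
# Rough rows-per-page estimate for a SQL Server data page.
# Assumes ~8,096 usable bytes per 8 KB page and a 2-byte slot-array entry per row.
USABLE_PAGE_BYTES = 8096
SLOT_ENTRY_BYTES = 2


def rows_per_page(row_bytes: int) -> int:
    """Return how many rows of the given size fit on one data page (rounded down)."""
    return USABLE_PAGE_BYTES // (row_bytes + SLOT_ENTRY_BYTES)


# Hypothetical Shipment row: 200 bytes of non-note columns (an assumption).
BASE_ROW_BYTES = 200

for note_bytes in (0, 500, 2000, 4000):
    per_page = rows_per_page(BASE_ROW_BYTES + note_bytes)
    print(f"note stored in-row: {note_bytes:>5} bytes -> {per_page} rows per page")
```

Keeping even a few kilobytes of note in-row drops the estimate from 40 rows per page to a handful, which is the effect the question describes; how much that matters depends on how often queries scan those pages.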

(See the schema examples in the ER diagram below.)

[Figure: ER Diagram]
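The two layouts being compared can be sketched as DDL. This uses Python's built-in sqlite3 module, with SQLite standing in for SQL Server (TEXT plays the role of VARCHAR(MAX)); the TrackingNo column and the ShipmentWithNote table name are hypothetical, and none of SQL Server's page or LOB storage behaviour is modelled.

```python
import sqlite3

# SQLite stands in for SQL Server; TEXT plays the role of VARCHAR(MAX).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Option A (hypothetical names): the note lives on the shipment row itself.
conn.execute("""
    CREATE TABLE ShipmentWithNote (
        ShipmentId INTEGER PRIMARY KEY,
        TrackingNo TEXT NOT NULL,
        Note       TEXT NOT NULL
    )
""")

# Option B: the note moves to a 1:1 child table keyed by the shipment's id.
conn.executescript("""
    CREATE TABLE Shipment (
        ShipmentId INTEGER PRIMARY KEY,
        TrackingNo TEXT NOT NULL
    );
    CREATE TABLE ShipmentNote (
        ShipmentId INTEGER PRIMARY KEY REFERENCES Shipment(ShipmentId),
        Note       TEXT NOT NULL
    );
""")
```

In option B, a query that only needs shipment data touches the Shipment table alone, which is the benefit the question is after.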

The main cases where this might be a good idea are when:

  • The Note column is regularly over 8000 bytes (which I think is when SQL Server starts using the extra LOB pages)
  • The Note column is only returned in SELECT operations and is rarely or never used as part of a JOIN or WHERE
  • The remaining columns on Shipment are going to be queried independently of Note on a regular basis (i.e., most of our operations will not use Note, with JOIN or WHERE conditions happening on other columns in Shipment)

The disadvantages I am seeing (apart from the possibility that this may not actually make a noticeable improvement when working with the Shipment table outside of the Note column):

  • It now becomes impossible (or at least requires a trigger or something else) to ensure that there is always some kind of value for NOTE (i.e., because it is now on a separate child table, we can't ensure that NOTE is NOT NULL for every row in Shipment)
  • Any operation using NOTE will now take extra effort, because of the need to do a JOIN to make sure we are working with the correct record
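One way to approximate the NOT NULL guarantee with a separate table is a trigger that creates a note row for every new shipment. After that, the JOIN from the second point is needed whenever the note is read. The sketch below uses SQLite through Python's sqlite3 as a stand-in for SQL Server, where an AFTER INSERT trigger would read from the `inserted` pseudo-table instead of `NEW`. Table and column names are hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shipment (
        ShipmentId INTEGER PRIMARY KEY,
        TrackingNo TEXT NOT NULL
    );
    CREATE TABLE ShipmentNote (
        ShipmentId INTEGER PRIMARY KEY REFERENCES Shipment(ShipmentId),
        Note       TEXT NOT NULL
    );
    -- Approximate "every shipment has a note": create an empty note row
    -- whenever a shipment is inserted.
    CREATE TRIGGER trg_Shipment_DefaultNote
    AFTER INSERT ON Shipment
    BEGIN
        INSERT INTO ShipmentNote (ShipmentId, Note) VALUES (NEW.ShipmentId, '');
    END;
""")

conn.execute("INSERT INTO Shipment (ShipmentId, TrackingNo) VALUES (1, 'TRK-001')")
conn.execute("UPDATE ShipmentNote SET Note = 'Leave at dock 4' WHERE ShipmentId = 1")

# Reading a shipment together with its note now needs a JOIN.
row = conn.execute("""
    SELECT s.ShipmentId, s.TrackingNo, n.Note
    FROM Shipment AS s
    JOIN ShipmentNote AS n ON n.ShipmentId = s.ShipmentId
    WHERE s.ShipmentId = 1
""").fetchone()
print(row)  # (1, 'TRK-001', 'Leave at dock 4')
```

The trigger only covers inserts. Nothing here stops a note row from being deleted later, so the guarantee is weaker than a NOT NULL column on Shipment.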

Best Answer

While it's great that you are even thinking about this, I would consider it to be micro-optimisation. SQL Server already has optimisations built in for handling VARCHAR(MAX) and NVARCHAR(MAX), so generally I would say not to second-guess it until it becomes a problem. However, two main points came to mind when reading your question.

The first is (as you touched on with your last point): would there actually be any performance gained by splitting the data out into another table, compared with letting SQL Server manage the pages and data itself? I don't have hard evidence with me right now, but I suspect not. The reason is that, as you mention, your data is regularly over 8000 bytes, so it will be moved to LOB pages anyway, whether it lives on Shipment or in a separate ShipmentNote table. You would just be moving the perceived problem from one table to another. Add the overhead of performing a JOIN whenever you need to fetch the data, and you are likely to end up slightly worse off than you were to begin with.
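The point about LOB pages can be added to the same kind of back-of-the-envelope estimate. In this simplified model, a note stays in-row only if it is at most 8,000 bytes and the whole row still fits within SQL Server's 8,060-byte row limit. Otherwise only a pointer stays in the row. The 24-byte pointer size and the 200-byte base row are assumptions (the exact pointer size depends on the table's storage options). The model also ignores the separate table's own pages and the cost of the JOIN.

```python
# Simplified model of how much of a VARCHAR(MAX) note stays inside the Shipment row.
# Assumptions: 8,096 usable bytes per page, 2-byte slot entry per row,
# 8,060-byte maximum row size, and a 24-byte in-row pointer for off-row values.
USABLE_PAGE_BYTES = 8096
SLOT_ENTRY_BYTES = 2
MAX_ROW_BYTES = 8060
MAX_INROW_VALUE_BYTES = 8000
POINTER_BYTES = 24  # assumed; varies with storage options


def rows_per_page(row_bytes: int) -> int:
    """Rows of the given size that fit on one data page (rounded down)."""
    return USABLE_PAGE_BYTES // (row_bytes + SLOT_ENTRY_BYTES)


def in_row_note_bytes(note_bytes: int, base_row_bytes: int) -> int:
    """Bytes the note occupies inside the row: the value itself, or just a pointer."""
    fits = (
        note_bytes <= MAX_INROW_VALUE_BYTES
        and base_row_bytes + note_bytes <= MAX_ROW_BYTES
    )
    return note_bytes if fits else POINTER_BYTES


BASE_ROW_BYTES = 200  # hypothetical size of the non-note columns
NOTE_BYTES = 12_000   # a note that is "regularly over 8000 bytes"

same_table = rows_per_page(BASE_ROW_BYTES + in_row_note_bytes(NOTE_BYTES, BASE_ROW_BYTES))
separate_table = rows_per_page(BASE_ROW_BYTES)  # note lives in ShipmentNote instead
print(f"note on Shipment: {same_table} rows/page; note split out: {separate_table} rows/page")
```

With a 12,000-byte note, the estimate is 35 Shipment rows per page with the note on the same table versus 40 with it split out. That gap is much smaller than when large notes sit fully in-row.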

Secondly, as you tagged the question with database-design, my actual recommendation is that you do split the notes out into another table. However, my reasoning is driven not so much by performance optimisation as by adding flexibility to the application. It is highly likely that a shipment will need more than one note attached to it (for example, if it is delayed in transit, or a customer calls in, etc.). If the note is on the shipment row itself, this is harder to do without duplicating data. If you extract notes into their own table, you will be able to assign multiple notes to a shipment and track who entered them and when. This makes your application and database more flexible.
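A minimal one-to-many version of that design follows. It again uses SQLite through Python's sqlite3 as a stand-in for SQL Server, and the column names (NoteText, CreatedBy, CreatedAt) are hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shipment (
        ShipmentId INTEGER PRIMARY KEY,
        TrackingNo TEXT NOT NULL
    );
    -- One row per note, so a shipment can have any number of notes.
    CREATE TABLE ShipmentNote (
        ShipmentNoteId INTEGER PRIMARY KEY,
        ShipmentId     INTEGER NOT NULL REFERENCES Shipment(ShipmentId),
        NoteText       TEXT NOT NULL,
        CreatedBy      TEXT NOT NULL,
        CreatedAt      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE INDEX IX_ShipmentNote_ShipmentId ON ShipmentNote (ShipmentId);
""")

conn.execute("INSERT INTO Shipment (ShipmentId, TrackingNo) VALUES (1, 'TRK-001')")
conn.executemany(
    "INSERT INTO ShipmentNote (ShipmentId, NoteText, CreatedBy) VALUES (?, ?, ?)",
    [
        (1, "Delayed in transit due to weather", "dispatcher"),
        (1, "Customer called asking for an ETA", "support"),
    ],
)

# Who said what about shipment 1, in the order the notes were entered.
notes = conn.execute("""
    SELECT CreatedBy, NoteText
    FROM ShipmentNote
    WHERE ShipmentId = 1
    ORDER BY ShipmentNoteId
""").fetchall()
for created_by, text in notes:
    print(f"{created_by}: {text}")
```

Ordering by the surrogate key keeps the example deterministic. A real application would more likely order by CreatedAt.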

If you want to optimise then I would recommend removing the need to query the notes unless someone specifically wants to view them (such as having a dedicated form in your application to do so). That way, you only need to perform the join/lookup when somebody explicitly wants to view the data. Even if you didn't externalise the notes to another table, not including the column in the query if it is not needed will remove the need for SQL Server to read the LOB page. If possible I'd always optimise queries to return the least amount of information required before optimising at a micro level.
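The shape of that advice is a list query that never names the note column, plus a separate lookup that runs only when a user asks for the notes. The sketch below shows this pattern. SQLite (via Python's sqlite3) stands in for SQL Server, so it shows the query pattern only, not the LOB page reads it avoids. Function, table, and column names are hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shipment (
        ShipmentId INTEGER PRIMARY KEY,
        TrackingNo TEXT NOT NULL,
        Status     TEXT NOT NULL
    );
    CREATE TABLE ShipmentNote (
        ShipmentNoteId INTEGER PRIMARY KEY,
        ShipmentId     INTEGER NOT NULL REFERENCES Shipment(ShipmentId),
        NoteText       TEXT NOT NULL
    );
    INSERT INTO Shipment VALUES (1, 'TRK-001', 'In transit'), (2, 'TRK-002', 'Delivered');
    INSERT INTO ShipmentNote (ShipmentId, NoteText) VALUES (1, 'Customer called');
""")


def list_shipments(conn):
    """List view: name only the columns the screen shows; never touch the notes."""
    return conn.execute(
        "SELECT ShipmentId, TrackingNo, Status FROM Shipment ORDER BY ShipmentId"
    ).fetchall()


def get_notes(conn, shipment_id):
    """Detail view: read the notes only when a user explicitly asks for them."""
    rows = conn.execute(
        "SELECT NoteText FROM ShipmentNote WHERE ShipmentId = ? ORDER BY ShipmentNoteId",
        (shipment_id,),
    ).fetchall()
    return [text for (text,) in rows]


print(list_shipments(conn))
print(get_notes(conn, 1))
```

Naming columns explicitly rather than using SELECT * is what keeps the list query from pulling the notes back in if a Note column is ever kept on Shipment.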