Disclaimer
I had an idea for a table design a little while ago that made sense to me at the time. Recently I have realized that I have a habit of "over-engineering", or spending too much time trying to "over-optimize" solutions. I assume this is one of those instances where I am overthinking the solution and will create extra overhead with little to no actual gain.
Question
Assume that for an object (a Shipment table, for example) there is some kind of NOTE varchar(MAX) data element we want to keep track of. The NOTE column has the opportunity to have data pushed into an overflow and dramatically increase the size of the row (thus limiting the number of rows that can be saved on the page). As I understand it, this then negatively impacts the execution time of various operations on the table as a whole.
Is there ever an instance where we should push that column into a separate, standalone ShipmentNote table instead of keeping it as a column on the Shipment table? The theory being that if we push the NOTE column into a separate table, it saves pages on the Shipment table, which allows all operations on the Shipment table to perform better, because the row size is smaller and you can fit more rows onto the same page.
(See schema examples below):
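A minimal T-SQL sketch of the two layouts being compared. The table name Shipment_Inline and the columns ShippedOn and Destination are assumptions made up for illustration; they are not taken from the original schema.

```sql
-- Layout 1: the note lives on the shipment row itself.
-- Shipment_Inline, ShippedOn and Destination are hypothetical names.
CREATE TABLE dbo.Shipment_Inline (
    ShipmentId  INT          NOT NULL PRIMARY KEY,
    ShippedOn   DATE         NOT NULL,
    Destination VARCHAR(200) NOT NULL,
    Note        VARCHAR(MAX) NOT NULL  -- NOT NULL is enforced for every shipment
);

-- Layout 2: the note moves to a 1:1 child table.
CREATE TABLE dbo.Shipment (
    ShipmentId  INT          NOT NULL PRIMARY KEY,
    ShippedOn   DATE         NOT NULL,
    Destination VARCHAR(200) NOT NULL
);

CREATE TABLE dbo.ShipmentNote (
    -- The primary key doubles as the foreign key: at most one note per shipment.
    ShipmentId INT          NOT NULL PRIMARY KEY
        REFERENCES dbo.Shipment (ShipmentId),
    Note       VARCHAR(MAX) NOT NULL
);
```

In layout 2 the NOT NULL constraint only applies to rows that exist in ShipmentNote; nothing forces a Shipment row to have a matching note row.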
The main use case where this might be a good idea is if:
- The Note column is regularly over 8000 characters (which I think is when we start using the extra paging)
- The Note column is only returned in SELECT operations and is rarely or never used as part of a JOIN or WHERE
- The remaining columns on Shipment are going to be queried independently of Note on a regular basis (i.e. most of our operations will not utilize Note, whether in JOIN or WHERE conditions on other columns in Shipment)
The disadvantages I am seeing (aside from the fact that this may not actually make a noticeable improvement when working with the Shipment table outside of the Note column):
- It now becomes impossible (or at least requires a trigger or something else) to ensure that there is always some kind of value for NOTE (i.e. because it is now on a separate child table, we can't ensure that NOTE is NOT NULL for every row in Shipment)
- Any operation utilizing NOTE will now take extra effort, because of the need to do a JOIN to make sure we are working with the correct record
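Both disadvantages can be seen in a small T-SQL sketch. It uses temp tables with a made-up Destination column and made-up sample rows, so treat it as an illustration under those assumptions rather than the real schema:

```sql
-- Hypothetical stand-ins for the split design (temp tables, so no foreign key).
CREATE TABLE #Shipment (
    ShipmentId  INT          NOT NULL PRIMARY KEY,
    Destination VARCHAR(200) NOT NULL
);
CREATE TABLE #ShipmentNote (
    ShipmentId INT          NOT NULL PRIMARY KEY,
    Note       VARCHAR(MAX) NOT NULL
);

INSERT INTO #Shipment (ShipmentId, Destination) VALUES (1, 'Leeds'), (2, 'Porto');
INSERT INTO #ShipmentNote (ShipmentId, Note) VALUES (1, 'Fragile');

-- Disadvantage 2: reading a note now needs a JOIN back to the shipment.
SELECT s.ShipmentId, s.Destination, n.Note
FROM #Shipment AS s
JOIN #ShipmentNote AS n
    ON n.ShipmentId = s.ShipmentId;

-- Disadvantage 1: NOT NULL on Note cannot guarantee that a note row exists.
-- This lists shipments with no note at all (shipment 2 in this sample data).
SELECT s.ShipmentId
FROM #Shipment AS s
WHERE NOT EXISTS (
    SELECT 1 FROM #ShipmentNote AS n WHERE n.ShipmentId = s.ShipmentId
);
```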
Best Answer
While it's great that you are even thinking about this, I would consider it to be micro-optimisation. SQL Server already has optimisations built into it for handling NVARCHAR(MAX), so generally I would say not to second-guess it until it becomes a problem. However, there are two main points that came to mind when reading your question.
The first is (as you touched on with your last point): would there actually be any performance gained by splitting the data out into another table, compared with letting SQL Server optimise the pages and data itself? I don't have hard evidence with me right now, but I suspect not. The reason is that, as you mention, your data is regularly over 8000 bytes and will therefore be moved to LOB pages in your Note table anyway. You would just be moving the perceived problem from one table to another. Add the overhead of performing a JOIN whenever you have to fetch the data and you are likely to end up slightly worse off than you were to begin with.
Secondly, as you tagged the question with database design, my actual recommendation is that you do split the notes out into another table. However, my reasoning is driven not so much by performance optimisation as by adding flexibility to the application. It is highly likely that a shipment will need more than one note attached to it (for example, if it is delayed in transit, or a customer calls in, etc.). If you keep the note on the shipment itself, it is harder to do this without duplicating data. If you extract it into its own table, you will be able to assign multiple notes to a shipment and track who entered them and when. This makes your application and database more flexible.
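As a rough sketch of that one-to-many design in T-SQL: the table name ShipmentNoteEntry, every column name in it, and the index are assumptions chosen for illustration, and the parent table is reduced to just its key.

```sql
-- Parent table reduced to its key; created only if it does not already exist.
IF OBJECT_ID(N'dbo.Shipment', N'U') IS NULL
    CREATE TABLE dbo.Shipment (
        ShipmentId INT NOT NULL PRIMARY KEY
    );

-- ShipmentNoteEntry and all of its column names are hypothetical.
CREATE TABLE dbo.ShipmentNoteEntry (
    ShipmentNoteEntryId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ShipmentId INT           NOT NULL REFERENCES dbo.Shipment (ShipmentId),
    NoteText   VARCHAR(MAX)  NOT NULL,
    CreatedBy  NVARCHAR(128) NOT NULL,                           -- who entered the note
    CreatedAt  DATETIME2(0)  NOT NULL DEFAULT SYSUTCDATETIME()   -- when it was entered
);

-- Supports the likely lookup "all notes for one shipment, oldest first".
CREATE INDEX IX_ShipmentNoteEntry_ShipmentId
    ON dbo.ShipmentNoteEntry (ShipmentId, CreatedAt);
```

Because the child table has its own surrogate key, a shipment can have zero, one or many notes without any change to the Shipment table.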
If you want to optimise then I would recommend removing the need to query the notes unless someone specifically wants to view them (such as having a dedicated form in your application to do so). That way, you only need to perform the join/lookup when somebody explicitly wants to view the data. Even if you didn't externalise the notes to another table, not including the column in the query if it is not needed will remove the need for SQL Server to read the LOB page. If possible I'd always optimise queries to return the least amount of information required before optimising at a micro level.
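To illustrate that last point, here is a small T-SQL sketch with a hypothetical temp table and made-up data. The list query names only the columns it needs; the note is fetched separately, for one shipment, when someone asks for it:

```sql
-- Hypothetical table that keeps the note inline; names and data are made up.
CREATE TABLE #ShipmentWithNote (
    ShipmentId  INT          NOT NULL PRIMARY KEY,
    Destination VARCHAR(200) NOT NULL,
    Note        VARCHAR(MAX) NULL
);

INSERT INTO #ShipmentWithNote (ShipmentId, Destination, Note)
VALUES
    -- The CONVERT to VARCHAR(MAX) stops REPLICATE truncating at 8000 bytes.
    (1, 'Leeds', REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 20000)),
    (2, 'Porto', NULL);

-- List view: no SELECT *, and Note is not referenced anywhere in the query.
SELECT ShipmentId, Destination
FROM #ShipmentWithNote
ORDER BY ShipmentId;

-- Detail view: fetch the note only for the shipment the user opened.
DECLARE @ShipmentId INT = 1;

SELECT Note
FROM #ShipmentWithNote
WHERE ShipmentId = @ShipmentId;
```

The same split between list and detail queries applies if the notes live in their own table; the detail query would simply read from that table instead.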